Chapter 14 (The Chi-Square Test)
MAT131/MAT2231: MATHEMATICS & STATISTICS 8
CHAPTER 14: HYPOTHESIS TESTING: CATEGORICAL DATA
This chapter describes two types of tests:
1. Tests of hypotheses about contingency tables, called tests of independence.
2. Tests of hypotheses about experiments with more than two categories, called goodness-of-fit tests.

All of these tests are performed by using the chi-square distribution. It is written as the χ² distribution, which is pronounced "kai" (as in "kite"). Like the t distribution, the chi-square distribution has only one parameter, called the degrees of freedom (df). The shape of a specific chi-square distribution depends on the number of degrees of freedom. The random variable χ² assumes nonnegative values only. Hence a chi-square distribution curve starts at the origin and lies entirely to the right of the vertical axis. If we know the degrees of freedom and the area in the right tail of a chi-square distribution curve, we can find the value of χ² from the chi-square table.

14.1 R × C CONTINGENCY TABLES
Information can be summarized and presented using a two-way classification table called a contingency table, also known as a cross-tabulation. In a test of independence for a contingency table, we test the null hypothesis that two characteristics of the elements of a given population are not related (that is, they are independent) against the alternative hypothesis that the two characteristics are related (that is, they are dependent). For example, we may want to test whether there is an association between being male or female and having a preference for watching sports or soap operas on television. We perform such a test by using the chi-square distribution.

The degrees of freedom for a test of independence are
df = (R − 1)(C − 1)
where R and C are the number of rows and the number of columns, respectively, in the given contingency table.

14.1.1 A TEST OF INDEPENDENCE OR HOMOGENEITY
The value of the test statistic χ² for a test of independence is calculated as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O and E are the observed and expected frequencies, respectively, for a cell, and the sum is taken over all cells of the table. The null hypothesis in a test of independence is always that the two attributes are not related; the alternative hypothesis is that the two attributes are related. The frequencies obtained from the performance of an experiment for a contingency table are called the observed frequencies. The expected frequency E for a cell is calculated as
$$E = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$$
where the sample size is the grand total of all observed frequencies in the table.
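As an illustration, the formulas of Section 14.1 (expected frequency, the χ² statistic and df = (R − 1)(C − 1)) can be applied to the data of Problem 1 in Section 1 of the problem set. The following is a minimal Python sketch; the function names are hypothetical, and the 5% significance level with its table value 9.488 (df = 4) is an assumption added here, since that problem does not state α.

```python
# Minimal sketch of a chi-square test of independence (Section 14.1).
# Function names are hypothetical; the data are Problem 1, Section 1.


def expected_frequencies(table):
    """E = (row total)(column total) / sample size, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # sample size = grand total of the table
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_test(table):
    """Return the chi-square statistic and df = (R - 1)(C - 1)."""
    expected = expected_frequencies(table)
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df


# Rows: Production, Sales, Administration
# Columns: In favour, Uncertain, Against
observed = [
    [89, 42, 9],
    [53, 36, 11],
    [38, 12, 10],
]

chi_sq, df = independence_test(observed)
critical = 9.488  # assumed: table value for df = 4 and right-tail area 0.05
print(f"chi-square = {chi_sq:.2f}, df = {df}")
print("reject H0" if chi_sq > critical else "do not reject H0")
```

Running it prints `chi-square = 8.98, df = 4`, which agrees with the answer listed for that problem; since 8.98 < 9.488, H0 would not be rejected at the assumed 5% level.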
14.2 CHI-SQUARE GOODNESS-OF-FIT TEST
This section explains how to make tests of hypotheses about experiments with more than two possible outcomes (categories). Such experiments, called multinomial experiments, possess four characteristics.

A Multinomial Experiment
An experiment with the following characteristics is called a multinomial experiment:
1. It consists of n identical trials.
2. Each trial results in one of k possible outcomes (categories), where k > 2.
3. The trials are independent.
4. The probabilities of the various outcomes remain constant for each trial.
14.2.1 Observed and expected frequencies
The frequencies obtained from the actual performance of an experiment are called observed frequencies and are denoted by O. In a goodness-of-fit test, we test the null hypothesis that the observed frequencies for an experiment follow a certain pattern or theoretical distribution. It is called a goodness-of-fit test because the hypothesis tested is how well the observed frequencies fit a given pattern. The expected frequencies, denoted by E, are the frequencies that we would expect to obtain if the null hypothesis is true. The expected frequency for a category is obtained as
E = np
where n is the sample size and p is the probability that an element belongs to that category if the null hypothesis is true.
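The rule E = np can be sketched in a few lines of Python. In this minimal sketch the function name `expected_counts` is hypothetical; it accepts probabilities, percentages or ratios and rescales them so that the category probabilities sum to 1 before multiplying by n. The data are those of Problem 7 in Section 2 (market shares in the ratio 45 : 25 : 20 : 10, with n = 1000 car owners).

```python
# Minimal sketch of E = np for a goodness-of-fit test (Section 14.2.1).
# The function name is hypothetical.


def expected_counts(n, weights):
    """Expected frequency E = n * p for each category.

    `weights` may be probabilities, percentages or ratios; they are
    rescaled so that the category probabilities p sum to 1.
    """
    total = sum(weights)
    return [n * w / total for w in weights]


# Problem 7, Section 2: market shares 45 : 25 : 20 : 10, n = 1000 car owners
shares = [45, 25, 20, 10]
print(expected_counts(1000, shares))
```

This prints `[450.0, 250.0, 200.0, 100.0]`: the numbers of car owners expected to prefer each company if the market shares have not changed. The expected frequencies always add up to n.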
14.2.2 Degrees of freedom for a goodness-of-fit test
In a goodness-of-fit test, the degrees of freedom are
df = k − 1
where k denotes the number of possible outcomes (categories) for the experiment.

14.2.3 Test statistic for a goodness-of-fit test
The test statistic for a goodness-of-fit test is χ², and its value is calculated as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O = observed frequency for a category
E = expected frequency for a category
Remember that a chi-square goodness-of-fit test is always a right-tailed test: large differences between the observed and expected frequencies give a large value of χ², so the rejection region lies in the right tail.
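The whole goodness-of-fit procedure, including the right-tailed decision rule, can be sketched in Python. This is a minimal sketch under stated assumptions: instead of reading the critical value from a printed χ² table, it computes right-tail areas numerically from the power series for the regularized incomplete gamma function (a swapped-in technique; in practice a library routine such as SciPy's `scipy.stats.chi2.ppf` would normally be used). All function names are hypothetical. The data are those of Problem 7 in Section 2, tested at the 1% level.

```python
import math

# Minimal sketch of a chi-square goodness-of-fit test (Section 14.2).
# Function names are hypothetical. Critical values are computed numerically
# instead of being read from a printed chi-square table.


def chi2_right_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom.

    Uses the power series for the lower regularized incomplete gamma
    function P(df/2, x/2); the right-tail area is 1 - P.
    """
    if x <= 0:
        return 1.0  # the whole curve lies to the right of the origin
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def chi2_critical(alpha, df):
    """Chi-square value whose right-tail area is alpha (bisection search)."""
    lo, hi = 0.0, 1.0
    while chi2_right_tail(hi, df) > alpha:  # widen until hi is past the target
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def goodness_of_fit(observed, weights, alpha):
    """Return (chi-square statistic, df = k - 1, critical value)."""
    n = sum(observed)
    total = sum(weights)
    expected = [n * w / total for w in weights]  # E = np
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi_sq, df, chi2_critical(alpha, df)


# Problem 7, Section 2: Shell, Esso, Petronas, Others; shares 45 : 25 : 20 : 10
chi_sq, df, critical = goodness_of_fit([420, 300, 210, 70], [45, 25, 20, 10], 0.01)
print(f"chi-square = {chi_sq:.2f}, df = {df}, critical value = {critical:.3f}")
# Right-tailed test: reject H0 only if the statistic exceeds the critical value
print("reject H0" if chi_sq > critical else "do not reject H0")
```

For these data the sketch reports χ² = 21.50 with df = 3 and a critical value of about 11.345 (the usual table value for df = 3 and a right-tail area of 0.01), so the hypothesis of no change in market shares would be rejected at the 1% level.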
Chi-Square Goodness-of-Fit Test
PROBLEM SET
Section 1
1. 300 employees of a company were selected at random and asked whether they were in favour of a scheme to introduce flexible working hours. The following table shows the opinions and the departments of the employees.

                         Opinion
Department        In favour   Uncertain   Against
Production            89          42          9
Sales                 53          36         11
Administration        38          12         10

Test whether there is evidence of a significant association between opinion and department. (8.98)
2. A group of executives was classified according to total income and age, as shown below.

                                  Income
Age            Less than $100,000   $100,000 to $399,999   $400,000 or more
Under 40               6                      9                     5
40 to 54              18                     19                     8
55 or older           11                     12                    17

Test the hypothesis that age is not related to the level of income. (6.85)

3. Suppose a personnel department investigated absenteeism by categorizing absentees according to the shift on which they worked, as shown in the following table.

                      Day of the week
Shift      Monday   Tuesday   Wednesday   Thursday   Friday
Day          49        36         43          40        45
Evening      50        38         40          40        41
Is there sufficient evidence at the 5% significance level of an association between the days on which the employees are absent and the shift on which the employees work? (0.3217)

4. A company owns hypermarts in various parts of the country. The hypermarts are situated near large cities, and each has a large car park that is free for customers to use. The directors think that there are regional differences in the distances that customers travel to reach these stores. A hypermart was selected in each of three regions, and a random sample of customers at each store was asked how far they had traveled to reach the store. The results were as follows:

                                  Region
Distance traveled           South   Middle   North
Less than 5 miles             50       80       70
Between 5 and 10 miles        80       60       20
More than 10 miles            70       60       10

Examine at the 5% significance level whether there is any relationship between distance traveled and region. (57.85)

5. The marketing director for a metropolitan daily newspaper is studying the relationship between the type of community the reader lives in and the portion of the paper he or she reads first. For a sample of readers, the following information is obtained:

Community   National news   Sports   Comics
Urban            170          140       90
Rural            100          110      100
Farm             130          100       60

At the 0.05 significance level, can we conclude that there is a relationship between the type of community where the person resides and the portion of the paper he or she reads first? (80.678)

6. The following data concerning industrial accidents and absences are classified according to the type of employee.

                      Absence following accident
Type of employee   Up to one month   One month or longer
Men                       26                  14
Women                     16                   9
Juvenile                   8                   7

Is there any evidence to suggest that the severity of accident is associated with the type of employee? Use a 5% significance level. (0.6618)
7. A tile company was interested in comparing the fraction of new house builders favouring three types of tiles as floor coverings for their houses in three different areas of the Klang Valley, i.e. Subang Jaya, Puchong and Petaling Jaya. A survey was conducted and the data were as follows:

                                 Area
Floor covering   Subang Jaya   Puchong   Petaling Jaya
Type I               224          165          36
Type II              196          152          44
Type III              80           83          20

Test at the 5% significance level whether there is any association between the types of tiles used and the areas concerned. (5.4)

8. A large consultancy firm regularly recruits MBA graduates. The personnel director has categorized each business school producing MBA graduates as top rate, adequate or bad to assist their recruitment strategy. A survey of the performance of 100 recent recruits has rated them as excellent, average or poor. A cross-classification of the results of the survey is shown in the table below.

                                Rating of graduates
Rating of business school   Excellent   Average   Poor
Top rate                        10          10       5
Adequate                         7          30       8
Bad                              3          20       7

Is there a relationship between the rating of these recruits and the business school at which they were trained? (Test at the 5% significance level.) (9.44)

Section 2
1. A group of 385 mental patients has been classified according to parental social class, with the following results:

Social class   Upper   Upper middle   Middle   Lower middle   Lower
Frequency        18         31           46         126         164

Test at the 5% significance level whether the data are consistent with the assumption that all social classes are equally likely to be represented. (9.48)

2. It is claimed that motor vehicle production is the same each day. The following information is given:

Day          Number of vehicles
Monday              160
Tuesday             140
Wednesday           139
Thursday            141
Friday              170
Test at the 10% significance level whether the number of vehicles is the same throughout the week. (5.36)

3. It has been estimated that employee absenteeism costs Malaysian companies more than RM 500 million per year. The personnel department of a large corporation recorded the weekdays during which individuals in a sample of 422 absentees were away over the past several months. Do these data suggest that absenteeism is higher on some days of the week? (Use α = 0.05.) (4.091)

Day          Number absent
Monday             99
Tuesday            74
Wednesday          83
Thursday           80
Friday             86

4. A company keeps detailed records of staff accidents. During a recent safety review, a random sample of 60 accidents was selected and classified by the day of the week on which they occurred.

Day                   Monday   Tuesday   Wednesday   Thursday   Friday
Number of accidents      8        12          9          14        17

Test at the 5% level of significance whether there is any evidence that accidents are more likely to happen on some days than on others. (4.5)

5. A study reports an analysis of 35 key product categories. At the time of the study, 72.9% of the products sold were of a national brand, 23% were private-label and 4.1% were generic. Suppose that you want to test whether these percentages are still valid for the market today. You collect a random sample of 1000 products in the 35 product categories studied, and you find the following: 610 products are of a national brand, 290 are private-label and 100 are generic. Conduct the test at the 0.025 level of significance. (119.98)

6. A farmer's apples are graded on a scale from A to D before sale. Past experience shows that the percentages of apples in the four grades are as follows:

Grade    A    B    C    D
%       29   38   27    6

The farmer introduces a new treatment and applies it to a small number of trees to see if it affects the distribution of grades. The apples produced by these trees are graded as follows:

Grade              A    B    C    D
Number of apples  79   94   58   19

Test at the 5% level of significance whether the new treatment has affected the distribution of grades. (3.08)

7. In a certain town in the state of Selangor, the retail market for petrol is shared among several companies. Their market shares are in the ratio 45 : 25 : 20 : 10, respectively. A survey was conducted recently among 1000 car owners in that town and their preferences were tabulated as follows:
Oil company            Shell   Esso   Petronas   Others
Number of car owners    420     300     210        70

Use a χ² test at the 1% significance level to test the hypothesis that there has been no change in the market share for petrol. (21.5)

8. An organization recently published the number of acts of violence seen in various types of television programs:

Type of program    Drama   Old movies   Cartoon   Police   Comedy   News program
Acts of violence     42        57          83        92       38          81

The organization claimed that such acts occur with equal frequency across all types of program. Test this claim at the 10% level. (40.14)

9. Seattle Aircraft Company Inc. manufactures and sells Twin Otters in the U.S. Records of the company showed that sales, by region, in previous years were distributed according to the following proportions:

Region           West Coast   North Central   North East   South   South East
Percentage (%)       30             25             20         10        15

This year, the numbers of planes sold in these regions are:

Region           West Coast   North Central   North East   South   South East
No. of planes       330            220            170        120       160

Can you conclude at the 1% significance level that the sales distribution for this year differs significantly from that of previous years? (15.1)

10. The LDP Expressway, which has five lanes after the toll gate, was studied to see whether drivers preferred to drive on the inside lanes. A total of 1000 automobiles was observed during the early morning traffic, and the number of cars in each lane was recorded. The results were as follows:

Lane             1     2     3     4     5
Observed count  96   154   275   225   171

Do the data provide sufficient evidence at the 5% level of significance to indicate that some lanes are preferred over others? (101.81)
11. A survey of the employees of a large company was conducted to see whether competence in computing skills was related to age. The results of the survey are given below.

                          Computing skill
Age group (years)     Good   Average   Poor
18 and under 30        70       20       10
30 and under 45        40       30       30
45 and over            30       30       60

(i) Assuming that the survey was conducted by means of a random sample, test the hypothesis that computing skill is associated with age. (55.713)
(ii) In a previous assessment of computing skills taken for all employees 5 years ago, it was found that 30% were good, 20% average and 50% poor. Combine the three age groups of the data in part (i) and test whether there is any evidence of a change in computing skill. (46.67)