Analysis of Variance

November 9, 2017 | Author: Sanchit Mishra | Category: Analysis Of Variance, Experiment, Errors And Residuals, Design Of Experiments, Variance


Short Description

This document briefly describes the concept of analysis of variance, or ANOVA...

Description

ANALYSIS OF VARIANCE (ANOVA)

ANOVA is a technique for testing the significance of the differences among more than two sample means.

Assumptions in ANOVA:

1. Each of the samples is drawn from a normal population.
2. The variances of the populations from which the samples have been drawn are equal.
3. The variation of each value around its own sample mean is independent of the variation of every other value.

Basic steps in ANOVA:

1. Determine one estimate of the population variance from the variance among the sample means.
2. Determine a second estimate of the population variance from the variance within the samples.
3. Compare the two estimates. If they are approximately equal in value, accept the null hypothesis.
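The two estimates named in the basic steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `variance_estimates` and the data are hypothetical, and the sketch assumes every sample has the same size n, so that the between-sample estimate is n times the variance of the sample means.

```python
from statistics import mean, variance


def variance_estimates(samples):
    """Return (between, within) estimates of the population variance.

    Assumes every sample has the same size n.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    # Estimate 1: n times the variance among the sample means.
    between = n * variance([mean(s) for s in samples])
    # Estimate 2: the average of the variances within each sample.
    within = mean([variance(s) for s in samples])
    return between, within


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
between, within = variance_estimates(samples)
print("between-sample estimate:", between)
print("within-sample estimate:", within)
print("ratio:", between / within)
```

If the null hypothesis holds, both numbers estimate the same population variance, so their ratio should be close to 1. A ratio well above 1 suggests that the sample means differ by more than chance alone would explain.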

Analysis of Variance Table (One-Way Classification)

Null hypothesis: the samples are drawn from the same population.

Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between samples | SSC (sum of squares between samples) | $\nu_1 = K - 1$ | $MSC = \frac{SSC}{K - 1}$ | $F_C = \frac{MSC}{MSE}$
Within samples | SSE (sum of squares within samples) | $\nu_2 = N - K$ | $MSE = \frac{SSE}{N - K}$ |

Here K is the number of samples and N is the total number of items in the given data.

Conclusion: If $F_C < F_T$ (the tabulated value), the difference is not significant. If $F_C > F_T$, the difference is significant.

Calculation procedure:

1. Sum of all items: $T = \sum X_1 + \sum X_2 + \cdots$
2. Correction factor: $CF = \frac{T^2}{N}$
3. Total sum of squares: $TSS = \sum X_1^2 + \sum X_2^2 + \cdots - \frac{T^2}{N}$
4. Sum of squares between samples: $SSC = \left( \frac{(\sum X_1)^2}{n} + \frac{(\sum X_2)^2}{n} + \cdots \right) - \frac{T^2}{N}$
5. Mean square between samples: $MSC = \frac{SSC}{df}$, with $df = K - 1$
6. Sum of squares within samples: $SSE = TSS - SSC$
7. Mean square within samples: $MSE = \frac{SSE}{df}$, with $df = N - K$

1. A common test was given to a number of students taken at random from a particular class of each of the four departments concerned, to assess the significance of possible variation in performance. Make an analysis of variance of the following data. (Take the level of significance as 5%.)

Departments
C | M | E | I
9 | 12 | 17 | 13
10 | 13 | 17 | 12
13 | 11 | 15 | 12
9 | 14 | 9 | 18
9 | 5 | 7 | 15

Solution: Here N = 20 and n = 5 (the number of items in each sample).

Sample I | | Sample II | | Sample III | | Sample IV |
$X_1$ | $X_1^2$ | $X_2$ | $X_2^2$ | $X_3$ | $X_3^2$ | $X_4$ | $X_4^2$
9 | 81 | 12 | 144 | 17 | 289 | 13 | 169
10 | 100 | 13 | 169 | 17 | 289 | 12 | 144
13 | 169 | 11 | 121 | 15 | 225 | 12 | 144
9 | 81 | 14 | 196 | 9 | 81 | 18 | 324
9 | 81 | 5 | 25 | 7 | 49 | 15 | 225
Total: 50 | 512 | 55 | 655 | 65 | 933 | 70 | 1006

Step 1: Sum of all items: $T = \sum X_1 + \sum X_2 + \sum X_3 + \sum X_4 = 50 + 55 + 65 + 70 = 240$

Step 2: Correction factor: $CF = \frac{T^2}{N} = \frac{(240)^2}{20} = 2880$

Step 3: Total sum of squares (sum of squares of all items minus CF): $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \frac{T^2}{N} = 512 + 655 + 933 + 1006 - 2880 = 226$

Step 4: Sum of squares between samples: $SSC = \left( \frac{(50)^2}{5} + \frac{(55)^2}{5} + \frac{(65)^2}{5} + \frac{(70)^2}{5} \right) - 2880 = 2930 - 2880 = 50$

Step 5: Mean square between samples: $MSC = \frac{SSC}{df} = \frac{50}{4 - 1} = 16.67$

Step 6: Sum of squares within samples: $SSE = TSS - SSC = 226 - 50 = 176$

Step 7: Mean square within samples: $MSE = \frac{SSE}{df} = \frac{176}{16} = 11$

Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between samples | SSC = 50 | $\nu_1 = K - 1 = 3$ | $MSC = \frac{SSC}{K - 1} = 16.67$ | $F_C = \frac{MSC}{MSE} = 1.515$
Within samples | SSE = 176 | $\nu_2 = N - K = 20 - 4 = 16$ | $MSE = \frac{SSE}{N - K} = 11$ |

Tabulated value for (3, 16) df at the 5% level of significance = 3.24
Calculated value = 1.515

Conclusion: The calculated value is less than the tabulated value, so we accept the null hypothesis. The samples could have come from the same population.
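The seven steps of the calculation procedure translate almost line for line into code. The following is a minimal sketch, not a library routine: the function name `one_way_anova` and the dictionary keys are illustrative choices, and the data are the department scores from the problem above.

```python
def one_way_anova(samples):
    """One-way ANOVA by the correction-factor (shortcut) method."""
    k = len(samples)                          # number of samples, K
    n_total = sum(len(s) for s in samples)    # total number of items, N
    t = sum(sum(s) for s in samples)          # Step 1: sum of all items, T
    cf = t ** 2 / n_total                     # Step 2: correction factor
    # Step 3: total sum of squares
    tss = sum(x * x for s in samples for x in s) - cf
    # Step 4: sum of squares between samples
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    msc = ssc / (k - 1)                       # Step 5: df = K - 1
    sse = tss - ssc                           # Step 6
    mse = sse / (n_total - k)                 # Step 7: df = N - K
    return {"T": t, "CF": cf, "TSS": tss, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse}


departments = {
    "C": [9, 10, 13, 9, 9],
    "M": [12, 13, 11, 14, 5],
    "E": [17, 17, 15, 9, 7],
    "I": [13, 12, 12, 18, 15],
}
result = one_way_anova(list(departments.values()))
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The calculated F is then compared by hand with the tabulated value for (K - 1, N - K) degrees of freedom, exactly as in the worked solution.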

2. Three different machines are used for a production. On the basis of the outputs, set up a one-way ANOVA table and test whether the machines are equally effective.

Outputs
Machine I | Machine II | Machine III
10 | 9 | 20
15 | 7 | 16
11 | 5 | 10
20 | 6 | 14

Given that the value of F at the 5% level of significance for (2, 9) df is 4.26.

Solution: The machine totals are 56, 27 and 60, so T = 143 and $CF = \frac{(143)^2}{12} = 1704.08$.

Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between samples | SSC = 162.17 | $\nu_1 = K - 1 = 3 - 1 = 2$ | $MSC = \frac{SSC}{K - 1} = 81.085$ | $F_C = \frac{MSC}{MSE} = 5.95$
Within samples | SSE = 122.75 | $\nu_2 = N - K = 12 - 3 = 9$ | $MSE = \frac{SSE}{N - K} = 13.64$ |

Conclusion: Since $F_C = 5.95$ exceeds the tabulated value 4.26, we reject the null hypothesis. The three machines are not equally effective.

Analysis of Variance for the Two-Way Classification Model

When two independent factors might affect the variable of interest, a test can be designed so that an analysis of variance tests the effects of both factors simultaneously. Such a test is called a two-way classification ANOVA, or a two-factor ANOVA.

Analysis of Variance Table (Two-Way Classification)

Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between columns (k = number of columns) | SSC | $k - 1$ | $MSC = \frac{SSC}{k - 1}$ | $F_C = \frac{MSC}{MSE}$
Between rows (r = number of rows) | SSR | $r - 1$ | $MSR = \frac{SSR}{r - 1}$ | $F_R = \frac{MSR}{MSE}$
Residual (or error) | SSE | $(k - 1)(r - 1)$ | $MSE = \frac{SSE}{(r - 1)(k - 1)}$ |

Conclusion: If $F_C < F_T$, the null hypothesis is accepted. If $F_C > F_T$, the null hypothesis is rejected.

1. Three breeds of cattle A, B and C were fed 4 different rations P, Q, R and S. The following table gives the gains in weight. Test whether there is any significant difference between breeds and between rations at the 5% level of significance.

Breed | Ration P | Ration Q | Ration R | Ration S
1 | 6 | 3 | 2 | 9
2 | 1 | 3 | 8 | 7
3 | 7 | 3 | 5 | 2

Null hypothesis: i) There is no significant difference between breeds. ii) There is no significant difference between rations.

Breed | P | Q | R | S | Total
1 | 6 | 3 | 2 | 9 | 20
2 | 1 | 3 | 8 | 7 | 19
3 | 7 | 3 | 5 | 2 | 17
Total | 14 | 9 | 15 | 18 | 56 (T)

Step 1: Total: T = 56

Step 2: Correction factor: $CF = \frac{T^2}{N} = \frac{(56)^2}{12} = 261.33$

Step 3: Sum of squares between columns (rations): $SSC = \left( \frac{(\sum X_1)^2}{n} + \frac{(\sum X_2)^2}{n} + \cdots \right) - \frac{T^2}{N} = \left( \frac{(14)^2}{3} + \frac{(9)^2}{3} + \frac{(15)^2}{3} + \frac{(18)^2}{3} \right) - 261.33 = 14$

Step 4: Sum of squares between rows (breeds): $SSR = \left( \frac{(20)^2}{4} + \frac{(19)^2}{4} + \frac{(17)^2}{4} \right) - 261.33 = 1.17$

Step 5: Total sum of squares (sum of squares of all values minus CF): $TSS = \sum X_1^2 + \sum X_2^2 + \cdots - \frac{T^2}{N} = (6)^2 + (3)^2 + (2)^2 + (9)^2 + \cdots + (2)^2 - 261.33 = 78.67$

Step 6: Residual: $SSE = TSS - (SSC + SSR) = 78.67 - (14 + 1.17) = 63.5$

Source of variation | Sum of squares | Degrees of freedom | Mean square
Between columns (k = number of columns) | SSC = 14 | $k - 1 = 4 - 1 = 3$ | $MSC = \frac{SSC}{k - 1} = 4.67$
Between rows (r = number of rows) | SSR = 1.17 | $r - 1 = 3 - 1 = 2$ | $MSR = \frac{SSR}{r - 1} = 0.585$
Residual (or error) | SSE = 63.5 | $(k - 1)(r - 1) = 6$ | $MSE = \frac{SSE}{(r - 1)(k - 1)} = 10.58$

Since MSE is larger than both MSC and MSR, each F-ratio is formed with MSE in the numerator:
i) Rations: $F = \frac{MSE}{MSC} = \frac{10.58}{4.67} = 2.27$, with (6, 3) df
ii) Breeds: $F = \frac{MSE}{MSR} = \frac{10.58}{0.585} = 18.09$, with (6, 2) df

Tabulated values: i) for (6, 3) df at the 5% level, 8.94; ii) for (6, 2) df at the 5% level, 19.3.

Conclusion: i) Calculated value < tabulated value, so there is no significant difference between the rations. ii) Calculated value < tabulated value, so there is no significant difference between the breeds.

3. The following table gives the monthly sales (in thousand rupees) of a certain firm in three states by its four salesmen.

State | Salesman I | Salesman II | Salesman III | Salesman IV
A | 6 | 5 | 3 | 8
B | 8 | 9 | 6 | 5
C | 10 | 7 | 8 | 7

Set up the analysis of variance table and test whether there is any significant difference i) between the sales by the firm's salesmen and ii) between the sales in the three states.

Solution: Null hypothesis: i) There is no significant difference between the sales by the firm's salesmen. ii) There is no significant difference between the sales in the three states.

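Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between columns (k = number of columns) | SSC = 8.334 | $k - 1 = 4 - 1 = 3$ | $MSC = \frac{SSC}{k - 1} = 2.778$ | $F_C = \frac{MSC}{MSE} = 0.81$
Between rows (r = number of rows) | SSR = 12.667 | $r - 1 = 3 - 1 = 2$ | $MSR = \frac{SSR}{r - 1} = 6.334$ | $F_R = \frac{MSR}{MSE} = 1.84$
Residual (or error) | SSE = 20.667 | $(k - 1)(r - 1) = 6$ | $MSE = \frac{SSE}{(r - 1)(k - 1)} = 3.444$ |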
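The entries of this table can be checked with a short two-way ANOVA routine. The sketch below is illustrative only: the name `two_way_anova` is a hypothetical helper, it assumes one observation per cell, and it takes the rows to be the states A, B, C and the columns to be salesmen I to IV. Under this layout it gives SSR of about 12.67 and SSE of about 20.67, the values consistent with the mean squares MSR = 6.334 and MSE = 3.444 quoted in the table.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell; table is a list of rows."""
    r, k = len(table), len(table[0])          # number of rows and columns
    t = sum(sum(row) for row in table)        # grand total, T
    cf = t ** 2 / (r * k)                     # correction factor, T^2 / N
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    ssc = sum(c ** 2 for c in col_totals) / r - cf   # each column has r items
    ssr = sum(x ** 2 for x in row_totals) / k - cf   # each row has k items
    tss = sum(x * x for row in table for x in row) - cf
    sse = tss - (ssc + ssr)                   # residual sum of squares
    msc, msr = ssc / (k - 1), ssr / (r - 1)
    mse = sse / ((r - 1) * (k - 1))
    return {"SSC": ssc, "SSR": ssr, "SSE": sse, "MSC": msc, "MSR": msr,
            "MSE": mse, "FC": msc / mse, "FR": msr / mse}


# Rows: states A, B, C. Columns: salesmen I, II, III, IV.
sales = [
    [6, 5, 3, 8],
    [8, 9, 6, 5],
    [10, 7, 8, 7],
]
anova = two_way_anova(sales)
for name, value in anova.items():
    print(f"{name}: {value:.3f}")
```

The same function applies unchanged to the cattle data of the previous problem, with breeds as rows and rations as columns.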

Tabulated values: i) for (6, 3) df at the 5% level, 8.94; ii) for (2, 6) df at the 5% level, 5.14. Because MSC is smaller than MSE, the column ratio is inverted before it is compared with the (6, 3) value: $\frac{MSE}{MSC} = \frac{3.444}{2.778} = 1.24$.

Conclusion: i) There is no significant difference between the sales of the salesmen at the 5% level of significance. ii) There is no significant difference between the sales in the three states.

Designs of Experiment

Aim of the Design of Experiments:

A statistical experiment in any field is performed to verify a particular hypothesis. For example, an agricultural experiment may be performed to verify the claim that a particular manure increases the yield of paddy. Here the quantity of manure used and the amount of yield are the two variables directly involved. They are called experimental variables. Apart from these two, there are other variables, such as the fertility of the soil, the quantity of seed used and the amount of rainfall, which also affect the yield of paddy. Such variables are called extraneous variables. The main aim of the design of experiments is to control the extraneous variables and hence to minimize the experimental error, so that the results of the experiment can be attributed only to the experimental variables.

Basic Principles of Experimental Design:

Randomization, replication and local control.

1. Randomization: Since it is not possible to eliminate completely the contribution of extraneous variables to the value of the response variable, we try to control it by randomization. The group of experimental units (plots of the same size) in which the manure is used is called the experimental group. The other group of plots, in which the manure is not used and which provides a basis for comparison, is called the control group. We select the plots for the experimental and control groups in a random manner, which provides the most efficient way of eliminating any unknown bias in the experiment.

2. Replication: Replication means repetition. It is essential to carry out more than one test on each manure in order to estimate the amount of experimental error and hence to get some idea of the precision of the estimates of the manure effects.

3. Local control: To provide adequate control of extraneous variables, another essential principle used in experimental design is local control. This includes techniques such as grouping, blocking and balancing of the experimental units. By grouping, we mean combining sets of homogeneous plots into groups, so that different manures may be used in different groups. The number of plots in different groups need not be the same. By blocking, we mean assigning the same number of plots to different blocks. The plots in the same block may be assumed to be relatively homogeneous. We can use as many fertilizers as there are plots in a block, assigned in a random way. By balancing, we mean adjusting the grouping and blocking procedures and assigning the fertilizers so that a balanced configuration is obtained.

Basic Designs of Experiments:

1. Completely Randomised Design (C.R.D.) (one-factor classification)

Suppose that we wish to compare h treatments and that N plots are available for the experiment. Let the i-th treatment be repeated $n_i$ times, so that $n_1 + n_2 + \cdots + n_h = N$. The plots to which the different treatments are to be given are found by the following randomisation principle. The plots are numbered serially from 1 to N. N identical cards are taken, numbered from 1 to N and shuffled thoroughly. We randomly draw $n_1$ cards, and the numbers on these cards give the numbers of the plots to which the first treatment is to be given, and so on. This design is called a CRD. It is used when the plots are homogeneous.

2. Randomised Block Design (R.B.D.) (two-factor classification)

Let us consider an agricultural experiment in which we wish to test the effect of k fertilizing treatments on the yield of a crop. We assume that we have some information about the soil fertility of the plots. We then divide the plots into h blocks according to soil fertility, each block containing k plots. Thus the plots within each block are as homogeneous as possible. Within each block, the k treatments are given to the k plots in a perfectly random manner, such that each treatment occurs only once in any block. The same k treatments are repeated from block to block.

Null hypothesis: Rows and columns are homogeneous.

3. Latin Square Design (L.S.D.) (three-factor classification)

We consider an agricultural experiment in which $n^2$ plots are taken and arranged in the form of an n x n square, such that the plots in each row are as homogeneous as possible with respect to one factor of classification (say, soil fertility) and the plots in each column are as homogeneous as possible with respect to another factor of classification (say, seed quality). The n treatments are given to these plots such that each treatment occurs only once in each row and only once in each column. The various possible arrangements obtained in this manner are known as Latin squares of order n. Here rows, columns and letters stand for the three factors, say fertility, seed quality and treatment respectively.

Null hypothesis: Rows, columns and letters are homogeneous.

Comparison of RBD and LSD:

1. In LSD the number of replications of each treatment is equal to the number of treatments, whereas there is no such restriction on treatments and replications in RBD.
2. LSD can be performed only on a square field, while RBD can be performed on either a square or a rectangular field.
3. LSD is known to be suitable when the number of treatments is between 5 and 12, whereas RBD can be used for any number of treatments.
4. The main advantage of LSD is that it controls the effect of two extraneous variables, whereas RBD controls the effect of only one extraneous variable. Hence the experimental error is reduced to a larger extent in LSD than in RBD.
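The randomisation rules of the three designs can be sketched with Python's standard `random` module. Everything named here is an illustrative assumption rather than part of the notes: the helper names `crd_layout`, `rbd_layout` and `latin_square`, the treatment labels, the seed, and in particular the choice to build the Latin square by permuting the rows and columns of a cyclic square, which yields a valid Latin square but does not sample uniformly from all squares of order n.

```python
import random


def crd_layout(replications, rng):
    """CRD: shuffle plot numbers 1..N and deal n_i plots to treatment i."""
    n_total = sum(replications.values())
    plots = list(range(1, n_total + 1))
    rng.shuffle(plots)                  # plays the role of shuffling the cards
    layout, start = {}, 0
    for treatment, n_i in replications.items():
        layout[treatment] = sorted(plots[start:start + n_i])
        start += n_i
    return layout


def rbd_layout(treatments, n_blocks, rng):
    """RBD: each block receives every treatment once, in its own random order."""
    return [rng.sample(treatments, len(treatments)) for _ in range(n_blocks)]


def latin_square(treatments, rng):
    """LSD: a cyclic Latin square with its rows and columns randomly permuted."""
    n = len(treatments)
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    return [[treatments[(i + j) % n] for j in cols] for i in rows]


rng = random.Random(2017)   # arbitrary fixed seed, so the layouts are reproducible
print(crd_layout({"T1": 3, "T2": 4, "T3": 2}, rng))
print(rbd_layout(["A", "B", "C"], 4, rng))
for row in latin_square(["B1", "B2", "B3"], rng):
    print(row)
```

In the CRD the treatments may have different numbers of replications, while the RBD and LSD helpers enforce the one-per-block and one-per-row-and-column restrictions by construction.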
1. Three varieties A, B and C of a crop are tested in an RBD with four replications. The plot yields in pounds are as follows, with each column forming one block.

A 6 | C 5 | A 8 | B 9
C 8 | A 4 | B 6 | C 9
B 7 | B 6 | C 10 | A 6

Analyse the experimental yields and state your conclusion.

Solution: Null hypothesis: There is no significant difference between varieties (rows) or between blocks (columns).

Rearranged by variety, the yields in blocks 1 to 4 are A: 6, 4, 8, 6; B: 7, 6, 6, 9; C: 8, 5, 10, 9.

Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between columns (k = number of columns) | SSC = 18 | $k - 1 = 4 - 1 = 3$ | $MSC = \frac{SSC}{k - 1} = 6$ | $F_C = \frac{MSC}{MSE} = 3.6$
Between rows (r = number of rows) | SSR = 8 | $r - 1 = 3 - 1 = 2$ | $MSR = \frac{SSR}{r - 1} = 4$ | $F_R = \frac{MSR}{MSE} = 2.4$
Residual (or error) | SSE = 10 | $(k - 1)(r - 1) = 6$ | $MSE = \frac{SSE}{(r - 1)(k - 1)} = 1.667$ |

Tabulated values: i) for (3, 6) df at the 5% level of significance, 4.76; ii) for (2, 6) df at the 5% level of significance, 5.14.

Conclusion: i) There is no significant difference between blocks. ii) There is no significant difference between varieties.

2. The following data resulted from an experiment to compare three burners B1, B2 and B3. A Latin square design was used, as the tests were made on 3 engines and were spread over 3 days.

Day | Engine 1 | Engine 2 | Engine 3
Day 1 | B1 - 16 | B2 - 17 | B3 - 20
Day 2 | B2 - 16 | B3 - 21 | B1 - 15
Day 3 | B3 - 15 | B1 - 12 | B2 - 13

Test the hypothesis that there is no difference between the burners.
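One way this test could be set up is sketched below. It rests on assumptions that should be checked against a textbook treatment: the layout is read with days as rows and engines as columns (Day 1: B1 16, B2 17, B3 20; Day 2: B2 16, B3 21, B1 15; Day 3: B3 15, B1 12, B2 13), the total sum of squares is split in the standard Latin square way into row, column, treatment and error components, and the error term has (n - 1)(n - 2) degrees of freedom. The function name `latin_square_anova` is a hypothetical helper.

```python
def latin_square_anova(values, letters):
    """Sums of squares and F-ratio for the treatments of an n x n Latin square.

    values[i][j]  : observation in row i, column j
    letters[i][j] : treatment label applied to that cell
    """
    n = len(values)
    t = sum(sum(row) for row in values)            # grand total, T
    cf = t ** 2 / (n * n)                          # correction factor
    tss = sum(x * x for row in values for x in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in values) / n - cf
    ss_cols = sum(sum(row[j] for row in values) ** 2 for j in range(n)) / n - cf
    totals = {}                                    # total per treatment letter
    for i in range(n):
        for j in range(n):
            totals[letters[i][j]] = totals.get(letters[i][j], 0) + values[i][j]
    ss_treat = sum(v ** 2 for v in totals.values()) / n - cf
    sse = tss - ss_rows - ss_cols - ss_treat       # residual sum of squares
    df_treat, df_error = n - 1, (n - 1) * (n - 2)
    f_treat = (ss_treat / df_treat) / (sse / df_error)
    return {"SS_rows": ss_rows, "SS_cols": ss_cols, "SS_treat": ss_treat,
            "SSE": sse, "df": (df_treat, df_error), "F_treat": f_treat}


# Rows: days 1 to 3. Columns: engines 1 to 3 (assumed reading of the layout).
values = [[16, 17, 20], [16, 21, 15], [15, 12, 13]]
letters = [["B1", "B2", "B3"], ["B2", "B3", "B1"], ["B3", "B1", "B2"]]
burners = latin_square_anova(values, letters)
for name, value in burners.items():
    print(name, value)
```

The resulting `F_treat` is the calculated value for the burners on (2, 2) degrees of freedom, to be compared with the tabulated 5% value for those degrees of freedom in the same way as in the earlier problems.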
