Spearman's Rank Correlation
Spearman's Rank Correlation Coefficient: Karl Pearson's coefficient of correlation is calculated from the actual observations, on the assumption that the population is normally distributed. When the population is not normal, or the shape of the distribution is not known, the coefficient of correlation is calculated not from the actual observations but from the ranks of both variables, assigned in either ascending or descending order. This method, developed by Charles Edward Spearman, gives the rank correlation coefficient

$$R = 1 - \frac{6\sum D^2}{N(N^2-1)}$$

where R denotes the rank correlation coefficient, D denotes the difference in ranks between paired items of the two series, and N denotes the number of pairs of observations. Just as Karl Pearson's coefficient of correlation lies between +1 and −1, Spearman's rank correlation coefficient R also lies between +1 and −1.

Rank correlation coefficient when ranks are given:
Steps:
1) Find the difference in ranks of each of the N paired items, $D = R_1 - R_2$.
2) Square the rank differences and add them to get $\sum D^2$.
3) Use the formula $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$ to get the value of the rank correlation coefficient.
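The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `spearman_from_ranks` is an assumption made for this sketch, and it applies only when the ranks are already given and no rank is tied. The data are the ten pairs of ranks used in Example 22.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Spearman's R from two equal-length lists of ranks (no ties assumed)."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both series must have the same number of items")
    n = len(ranks_x)
    # Steps 1 and 2: rank differences D, squared and summed
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    # Step 3: R = 1 - 6 * sum(D^2) / (N (N^2 - 1))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks of 10 students in subjects A and B (Example 22)
ranks_a = [6, 5, 3, 10, 2, 4, 9, 7, 8, 1]
ranks_b = [3, 8, 4, 9, 1, 6, 10, 7, 5, 2]
print(round(spearman_from_ranks(ranks_a, ranks_b), 4))  # 0.7818
```

Identical rankings give R = +1 and exactly reversed rankings give R = −1, which matches the stated range of the coefficient.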
Example 22: The rankings of 10 students in two subjects A and B are as follows:

A: 6 5 3 10 2 4 9 7 8 1
B: 3 8 4 9 1 6 10 7 5 2

Calculate the rank correlation coefficient.

Solution:

Rank R1 (A): 6 5 3 10 2 4 9 7 8 1
Rank R2 (B): 3 8 4 9 1 6 10 7 5 2
D = R1 − R2: 3 −3 −1 1 1 −2 −1 0 3 −1
D²: 9 9 1 1 1 4 1 0 9 1

$\sum D^2 = 36$

$R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 36}{10(10^2-1)} = 1 - \frac{6 \times 36}{10 \times 99} = 0.7818$
Example 23: Two judges in a beauty competition rank the 12 entries as follows:

X: 1 2 3 4 5 6 7 8 9 10 11 12
Y: 12 9 6 10 3 5 4 7 8 2 11 1

What degree of agreement is there between the two judges?

Solution:

Rank R1 (X): 1 2 3 4 5 6 7 8 9 10 11 12
Rank R2 (Y): 12 9 6 10 3 5 4 7 8 2 11 1
D = R1 − R2: −11 −7 −3 −6 2 1 3 1 1 8 0 11
D²: 121 49 9 36 4 1 9 1 1 64 0 121

$\sum D^2 = 416$

$R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 416}{12(12^2-1)} = 1 - \frac{6 \times 416}{12 \times 143} = 1 - 1.4545 = -0.4545$

The degree of agreement between the two judges is measured by the rank correlation coefficient, which is negative in this case, indicating disagreement.
Example 24: The ranks of the same 15 students in two subjects A and B are given below. The two numbers within brackets denote the ranks of the same student in A and B respectively. (1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), (10,1), (9,11), (10,15), (11,9), (12,5), (13,14), (14,12), (15,13). Find Spearman's rank correlation coefficient.

Solution:

R1 (A): 1 2 3 4 5 6 7 10 9 10 11 12 13 14 15
R2 (B): 10 7 2 6 4 8 3 1 11 15 9 5 14 12 13
D = R1 − R2: −9 −5 1 −2 1 −2 4 9 −2 −5 2 7 −1 2 2
D²: 81 25 1 4 1 4 16 81 4 25 4 49 1 4 4

$\sum D^2 = 304$

Spearman's correlation coefficient is given by

$R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 304}{15 \times 224} = 1 - 0.5428 = 0.4571$
Example 25: Ten competitors in a beauty contest are ranked by three judges in the following order:

1st judge: 1 6 5 10 3 2 4 9 7 8
2nd judge: 3 5 8 4 7 10 2 1 6 9
3rd judge: 6 4 9 8 1 2 3 10 5 7

Use the rank correlation coefficient to determine which pair of judges has the nearest approach to common taste in beauty.

Solution: If R1, R2 and R3 are the respective rankings of the three judges, there are three pairs of judgments to compare: (R1, R2), (R1, R3) and (R2, R3).

R1: 1 6 5 10 3 2 4 9 7 8
R2: 3 5 8 4 7 10 2 1 6 9
R3: 6 4 9 8 1 2 3 10 5 7
D1 = R1 − R2: −2 1 −3 6 −4 −8 2 8 1 −1
D2 = R1 − R3: −5 2 −4 2 2 0 1 −1 2 1
D3 = R2 − R3: −3 1 −1 −4 6 8 −1 −9 1 2
D1²: 4 1 9 36 16 64 4 64 1 1
D2²: 25 4 16 4 4 0 1 1 4 1
D3²: 9 1 1 16 36 64 1 81 1 4

$\sum D_1^2 = 200$, $\sum D_2^2 = 60$, $\sum D_3^2 = 214$

Rank correlation coefficient between the judgments of the 1st and 2nd judges:

$R_{12} = 1 - \frac{6 \times 200}{10(10^2-1)} = 1 - \frac{6 \times 200}{10 \times 99} = 1 - 1.2121 = -0.2121$

Rank correlation coefficient between the judgments of the 1st and 3rd judges:

$R_{13} = 1 - \frac{6 \times 60}{10(10^2-1)} = 1 - \frac{6 \times 60}{10 \times 99} = 1 - 0.3636 = 0.6364$

Rank correlation coefficient between the judgments of the 2nd and 3rd judges:

$R_{23} = 1 - \frac{6 \times 214}{10(10^2-1)} = 1 - \frac{6 \times 214}{10 \times 99} = 1 - 1.2970 = -0.2970$

Of R12, R13 and R23, only R13 is positive. Hence the first and third judges have the nearest approach to common taste in beauty.

Example 26: If the sum of squares of the rank differences of 9 pairs of values is 80, find the correlation coefficient between them.

Solution: $\sum D^2 = 80$, N = 9.

$R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 80}{9(9^2-1)} = 1 - \frac{6 \times 80}{9 \times 80} = 1 - \frac{6}{9} = \frac{1}{3} = 0.333$

Example 27: In bivariate data of N pairs of observations, the sum of squares of the differences between the ranks of the observed values of the two variables is 231 and the rank correlation coefficient is −0.4. Find the value of N.

Solution: $\sum D^2 = 231$, $R = -0.4$. Since $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$, we have
$-0.4 = 1 - \frac{6 \times 231}{N(N^2-1)}$, or $\frac{6 \times 231}{N(N^2-1)} = 1 + 0.4 = 1.4$, or $N(N^2-1) = \frac{6 \times 231}{1.4} = 990 = 10 \times 99 = 10(100-1) = 10(10^2-1)$. Hence N = 10.

Rank correlation coefficient when ranks are not given:
Steps:
1. Assign ranks to all the items in one series (X) and, separately, to all the items in the other series (Y). Ranking can start from either the highest or the lowest value, but the same criterion must be followed for both variables.
2. Find the difference in ranks of each of the N paired items, $D = R_1 - R_2$.
3. Square the rank differences and add them to get $\sum D^2$.
4. Use the formula $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$ to get the value of the rank correlation coefficient.
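As a rough sketch of these steps, the Python below first assigns ranks to raw values and then applies the formula. The helper names `assign_ranks` and `spearman` are assumptions made for this illustration, and the sketch assumes that no value is repeated within a series. The data are the marks from Example 31.

```python
def assign_ranks(values, descending=False):
    """Step 1: rank distinct values 1..N (assumes no repeated values)."""
    order = sorted(values, reverse=descending)
    position = {v: i + 1 for i, v in enumerate(order)}
    return [position[v] for v in values]


def spearman(x, y, descending=False):
    """Steps 2 to 4, using the same ranking criterion for both series."""
    r1 = assign_ranks(x, descending)
    r2 = assign_ranks(y, descending)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Marks given by judges X and Y to 10 students (Example 31)
marks_x = [52, 53, 42, 60, 45, 41, 37, 38, 25, 27]
marks_y = [65, 68, 43, 38, 77, 48, 35, 30, 25, 50]
print(assign_ranks(marks_x))                 # [8, 9, 6, 10, 7, 5, 3, 4, 1, 2]
print(round(spearman(marks_x, marks_y), 4))  # 0.5394
```

Ranking both series from the highest value instead (`descending=True`) reverses the sign of every D but leaves each D², and hence R, unchanged.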
Example 28: The coefficient of rank correlation of the marks obtained by 10 students in statistics and accountancy was found to be 0.2. It was later discovered that the difference in ranks in the two subjects obtained by one of the students was wrongly taken as 9 instead of 7. Find the correct coefficient of rank correlation.

Solution: Let Rc and Rw be the correct and wrong coefficients of rank correlation respectively, and Dc and Dw the correct and wrong differences respectively.

$R = 1 - \frac{6\sum D^2}{N(N^2-1)}$, so $R_w = 1 - \frac{6\sum D_w^2}{N(N^2-1)}$, or

$0.2 = 1 - \frac{6\sum D_w^2}{10(10^2-1)} = 1 - \frac{6\sum D_w^2}{10 \times 99}$, or $\frac{6\sum D_w^2}{10 \times 99} = 1 - 0.2 = 0.8$

$\sum D_w^2 = \frac{0.8 \times 10 \times 99}{6} = 132$

Now $\sum D_c^2 = \sum D_w^2 - (\text{wrong rank difference})^2 + (\text{correct rank difference})^2 = 132 - 9^2 + 7^2 = 132 - 81 + 49 = 100$

So $R_c = 1 - \frac{6\sum D_c^2}{N(N^2-1)} = 1 - \frac{6 \times 100}{10(10^2-1)} = 1 - \frac{6 \times 100}{10 \times 99} = 1 - 0.606 = 0.394$
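The correction in Example 28 can be sketched in Python as follows. The function name `corrected_coefficient` and its parameter names are assumptions made for this illustration. It recovers the wrong ∑D² from the wrong coefficient, swaps the wrong squared difference for the correct one, and recomputes R.

```python
def corrected_coefficient(r_wrong, n, d_wrong, d_correct):
    """Recompute R after one rank difference was recorded incorrectly."""
    denom = n * (n ** 2 - 1)
    # Invert R = 1 - 6 * sum(D^2) / denom to recover the wrong sum of squares
    sum_d2_wrong = (1 - r_wrong) * denom / 6
    # Remove the wrong squared difference and add the correct one
    sum_d2_correct = sum_d2_wrong - d_wrong ** 2 + d_correct ** 2
    return 1 - 6 * sum_d2_correct / denom


# Example 28: R = 0.2 for N = 10, difference taken as 9 instead of 7
print(round(corrected_coefficient(0.2, 10, 9, 7), 3))  # 0.394
```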
Example 29: A test in statistics was taken by 7 students. The teacher ranked his students according to their academic achievement. The order of achievement from high to low, together with the family income of each pupil, is given as follows: Rai (Rs 8700), Bhatnagar (Rs 4200), Tuli (Rs 5700), Desai (Rs 8200), Gupta (Rs 20000), Choudhary (Rs 18000) and Singh (Rs 17500). Compute Spearman's coefficient of rank correlation between academic achievement and family income.

Solution: There are 7 students whose academic achievement and family income are to be correlated. Ranked from high to low by academic achievement, they are Rai, Bhatnagar, Tuli, Desai, Gupta, Choudhary and Singh. Ranked from high to low by family income, they are Gupta, Choudhary, Singh, Rai, Desai, Tuli and Bhatnagar.

| Name of student | Rank as per academics (R1) | Rank as per family income (R2) | D = R1 − R2 | D² |
|---|---|---|---|---|
| Rai | 1 | 4 | −3 | 9 |
| Bhatnagar | 2 | 7 | −5 | 25 |
| Tuli | 3 | 6 | −3 | 9 |
| Desai | 4 | 5 | −1 | 1 |
| Gupta | 5 | 1 | 4 | 16 |
| Choudhary | 6 | 2 | 4 | 16 |
| Singh | 7 | 3 | 4 | 16 |

$\sum D^2 = 92$

Spearman's coefficient of rank correlation is

$R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 92}{7(7^2-1)} = 1 - \frac{6 \times 92}{7 \times 48} = 1 - 1.6429 = -0.6429$
Example 30: Quotations of index numbers of security prices of a certain joint stock company are given below:

| Year | Debenture price | Share price |
|---|---|---|
| 1 | 97.8 | 73.2 |
| 2 | 99.2 | 85.8 |
| 3 | 98.8 | 78.9 |
| 4 | 98.3 | 75.8 |
| 5 | 98.4 | 77.2 |
| 6 | 96.7 | 87.2 |
| 7 | 97.1 | 83.8 |

Using the rank correlation method, determine the relationship between debenture prices and share prices.

Solution: Debenture and share price data are given for 7 years, so N = 7. Ranking from highest to lowest for both debenture and share prices and tabulating:

| Debenture price | Debenture price rank (R1) | Share price | Share price rank (R2) | D = R1 − R2 | D² |
|---|---|---|---|---|---|
| 97.8 | 5 | 73.2 | 7 | −2 | 4 |
| 99.2 | 1 | 85.8 | 2 | −1 | 1 |
| 98.8 | 2 | 78.9 | 4 | −2 | 4 |
| 98.3 | 4 | 75.8 | 6 | −2 | 4 |
| 98.4 | 3 | 77.2 | 5 | −2 | 4 |
| 96.7 | 7 | 87.2 | 1 | 6 | 36 |
| 97.1 | 6 | 83.8 | 3 | 3 | 9 |

$\sum D^2 = 62$

The coefficient of rank correlation is

$R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 62}{7(7^2-1)} = 1 - \frac{6 \times 62}{7 \times 48} = 1 - 1.1071 = -0.1071$
Example 31: Calculate Spearman's coefficient of correlation between the marks assigned to 10 students by judges X and Y in a certain competitive test, as shown below:

No.: 1 2 3 4 5 6 7 8 9 10
Marks by judge X: 52 53 42 60 45 41 37 38 25 27
Marks by judge Y: 65 68 43 38 77 48 35 30 25 50

Solution: There are 10 students marked by judges X and Y, so N = 10. Ranking the students by the marks given by each judge from lowest to highest and tabulating:

| Marks by judge X | Rank by judge X (R1) | Marks by judge Y | Rank by judge Y (R2) | D = R1 − R2 | D² |
|---|---|---|---|---|---|
| 52 | 8 | 65 | 8 | 0 | 0 |
| 53 | 9 | 68 | 9 | 0 | 0 |
| 42 | 6 | 43 | 5 | 1 | 1 |
| 60 | 10 | 38 | 4 | 6 | 36 |
| 45 | 7 | 77 | 10 | −3 | 9 |
| 41 | 5 | 48 | 6 | −1 | 1 |
| 37 | 3 | 35 | 3 | 0 | 0 |
| 38 | 4 | 30 | 2 | 2 | 4 |
| 25 | 1 | 25 | 1 | 0 | 0 |
| 27 | 2 | 50 | 7 | −5 | 25 |

$\sum D^2 = 76$

Spearman's coefficient of correlation is

$R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 76}{10(10^2-1)} = 1 - \frac{6 \times 76}{990} = 1 - 0.4606 = 0.5394$
Rank correlation coefficient when ranks are equal: This is a special case of finding the rank correlation coefficient when ranks are not given and two or more items in a series have the same value, so that they would receive equal ranks. The steps are the same as in the previous case (ranks not given), except that the ranks are assigned in the following manner. For two repeated items in a series, ranked on the basis of increasing values, if one were assigned rank r then the other would get rank r + 1, on the assumption that it has a marginally higher value. In reality the two values are equal, so both repeated items are assigned the average of the two ranks, $\frac{r + (r+1)}{2}$. The rank of the next non-repeated item in the series is r + 2. For an item repeated n times in the series, the average rank is $\frac{r + (r+1) + \dots + [r + (n-1)]}{n}$ and the rank of the next non-repeated item in the series is r + n.

The rank correlation coefficient in this case is given by the formula

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \dots\right]}{N(N^2-1)}$$

where D is the rank difference of the N paired items in the two series X and Y, and m1, m2, m3, ... are the numbers of times each repeated item occurs, counted over both series.
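The averaging of tied ranks and the correction term can be sketched in Python as below. The names `average_ranks`, `tie_correction` and `spearman_with_ties` are assumptions made for this illustration, and the sketch follows the correction-term formula stated above. Routines that instead compute Pearson's coefficient on the averaged ranks may return a slightly different value when ties are present. The data are the heights and weights from Example 32.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; repeated values share the mean of the ranks they span."""
    first = {}
    for i, v in enumerate(sorted(values), start=1):
        first.setdefault(v, i)  # first rank r occupied by each value
    counts = Counter(values)
    # Mean of r, r+1, ..., r+(m-1) is r + (m-1)/2
    return [first[v] + (counts[v] - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(x, y):
    n = len(x)
    r1, r2 = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))


# Heights (inches) and weights (lbs) of 10 students (Example 32)
heights = [48, 49, 50, 51, 52, 53, 54, 55, 56, 57]
weights = [100, 105, 105, 104, 111, 115, 125, 130, 132, 137]
print(average_ranks(weights)[:4])                       # [1.0, 3.5, 3.5, 2.0]
print(round(spearman_with_ties(heights, weights), 4))  # 0.9576
```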
Example 32: The relationship between the height and weight of a batch of 10 students is given in the following table:

Height (inches): 48 49 50 51 52 53 54 55 56 57
Weight (lbs): 100 105 105 104 111 115 125 130 132 137

Solution: There are 10 pairs of observations, so N = 10. There is no repetition in the data for height, but 105 lbs occurs twice in the weight series, so m1 = 2. Arranging the weights in ascending order: 100, 104, 105, 105, 111, 115, 125, 130, 132, 137. The ranks of 100 and 104 are 1 and 2. The rank of the repeated weight 105 lbs is $\frac{3+4}{2} = 3.5$. Tabulating the data:

| X (height) | Rank (R1) | Y (weight) | Rank (R2) | D = R1 − R2 | D² |
|---|---|---|---|---|---|
| 48 | 1 | 100 | 1 | 0 | 0 |
| 49 | 2 | 105 | 3.5 | −1.5 | 2.25 |
| 50 | 3 | 105 | 3.5 | −0.5 | 0.25 |
| 51 | 4 | 104 | 2 | 2 | 4 |
| 52 | 5 | 111 | 5 | 0 | 0 |
| 53 | 6 | 115 | 6 | 0 | 0 |
| 54 | 7 | 125 | 7 | 0 | 0 |
| 55 | 8 | 130 | 8 | 0 | 0 |
| 56 | 9 | 132 | 9 | 0 | 0 |
| 57 | 10 | 137 | 10 | 0 | 0 |

$\sum D^2 = 6.5$

Rank correlation coefficient:

$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1)\right]}{N(N^2-1)} = 1 - \frac{6\left[6.5 + \frac{1}{12}(2^3 - 2)\right]}{10(10^2-1)} = 1 - \frac{6(6.5 + 0.5)}{10 \times 99} = 1 - \frac{6 \times 7}{990} = 0.9576$
Example 33: Calculate the rank correlation coefficient of the following data:

Marks in 1st subject: 40 46 54 60 70 80 82 85 85 90 95
Marks in 2nd subject: 45 45 50 43 40 75 55 72 65 42 70

Solution: There are 11 pairs of observations, so N = 11. Marks are repeated in both subjects: 85 is repeated in the 1st subject and 45 is repeated in the 2nd subject, so m1 = 2 and m2 = 2. Arranging the marks in ascending order:

1st subject: 40 46 54 60 70 80 82 85 85 90 95. 85 lies in the 8th and 9th places, so its average rank is $\frac{8+9}{2} = 8.5$.
2nd subject: 40 42 43 45 45 50 55 65 70 72 75. 45 lies in the 4th and 5th places, so its average rank is $\frac{4+5}{2} = 4.5$.

| 1st subject (X) | Rank (R1) | 2nd subject (Y) | Rank (R2) | D = R1 − R2 | D² |
|---|---|---|---|---|---|
| 40 | 1 | 45 | 4.5 | −3.5 | 12.25 |
| 46 | 2 | 45 | 4.5 | −2.5 | 6.25 |
| 54 | 3 | 50 | 6 | −3 | 9 |
| 60 | 4 | 43 | 3 | 1 | 1 |
| 70 | 5 | 40 | 1 | 4 | 16 |
| 80 | 6 | 75 | 11 | −5 | 25 |
| 82 | 7 | 55 | 7 | 0 | 0 |
| 85 | 8.5 | 72 | 10 | −1.5 | 2.25 |
| 85 | 8.5 | 65 | 8 | 0.5 | 0.25 |
| 90 | 10 | 42 | 2 | 8 | 64 |
| 95 | 11 | 70 | 9 | 2 | 4 |

$\sum D^2 = 140$

Rank correlation coefficient:

$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{N(N^2-1)} = 1 - \frac{6\left[140 + \frac{1}{12}(2^3-2) + \frac{1}{12}(2^3-2)\right]}{11(11^2-1)} = 1 - \frac{6\left(140 + \frac{1}{2} + \frac{1}{2}\right)}{11 \times 120} = 1 - \frac{6 \times 141}{11 \times 120} = 0.359$
Example 34: Obtain the rank correlation coefficient between the variables X and Y from the following pairs of observed values.

X: 50 55 65 50 55 60 50 65 70 75
Y: 110 110 115 125 140 115 130 120 115 160

Solution: There are 10 pairs of observations, so N = 10. In the X series 50 occurs three times, 55 twice and 65 twice, so m1 = 3, m2 = 2 and m3 = 2. In the Y series 115 occurs three times and 110 twice, so m4 = 3 and m5 = 2. Arranging in ascending order:

X series: 50 50 50 55 55 60 65 65 70 75
50 lies in the first, second and third places, so its average rank is $\frac{1+2+3}{3} = 2$.
55 lies in the fourth and fifth places, so its average rank is $\frac{4+5}{2} = 4.5$.
65 lies in the seventh and eighth places, so its average rank is $\frac{7+8}{2} = 7.5$.

Y series: 110 110 115 115 115 120 125 130 140 160
110 lies in the first and second places, so its average rank is $\frac{1+2}{2} = 1.5$.
115 lies in the third, fourth and fifth places, so its average rank is $\frac{3+4+5}{3} = 4$.

| X | Rank (R1) | Y | Rank (R2) | D = R1 − R2 | D² |
|---|---|---|---|---|---|
| 50 | 2 | 110 | 1.5 | 0.5 | 0.25 |
| 55 | 4.5 | 110 | 1.5 | 3 | 9 |
| 65 | 7.5 | 115 | 4 | 3.5 | 12.25 |
| 50 | 2 | 125 | 7 | −5 | 25 |
| 55 | 4.5 | 140 | 9 | −4.5 | 20.25 |
| 60 | 6 | 115 | 4 | 2 | 4 |
| 50 | 2 | 130 | 8 | −6 | 36 |
| 65 | 7.5 | 120 | 6 | 1.5 | 2.25 |
| 70 | 9 | 115 | 4 | 5 | 25 |
| 75 | 10 | 160 | 10 | 0 | 0 |

$\sum D^2 = 134$

Rank correlation coefficient:

$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \dots + \frac{1}{12}(m_5^3 - m_5)\right]}{N(N^2-1)} = 1 - \frac{6\left[134 + \frac{1}{12}(3^3-3) + \frac{1}{12}(2^3-2) + \frac{1}{12}(2^3-2) + \frac{1}{12}(3^3-3) + \frac{1}{12}(2^3-2)\right]}{10(10^2-1)}$

$= 1 - \frac{6\left[134 + 2 + \frac{1}{2} + \frac{1}{2} + 2 + \frac{1}{2}\right]}{10 \times 99} = 1 - \frac{6 \times 139.5}{990} = 0.1545$
Example 35: Calculate the coefficient of correlation from the following data by the method of rank differences.

Rank of X: 10 4 2 5 8 5 6 9
Rank of Y: 10 6 2 5 8 4 5 9

Solution: N = 8, as there are only 8 pairs of observations, yet the data mention ranks of 9 and 10. This is not possible for 8 items. To calculate the rank correlation coefficient, the given figures must therefore be treated as observations rather than ranks, and ranks must then be assigned to them. The values in ascending order are:

X series: 2 4 5 5 6 8 9 10
Y series: 2 4 5 5 6 8 9 10

Both the X and Y series contain the repeated item 5, so m1 = 2 and m2 = 2. In both series the repeated items occupy the third and fourth positions, so their average rank is $\frac{3+4}{2} = 3.5$.

| X | Rank (R1) | Y | Rank (R2) | D = R1 − R2 | D² |
|---|---|---|---|---|---|
| 10 | 8 | 10 | 8 | 0 | 0 |
| 4 | 2 | 6 | 5 | −3 | 9 |
| 2 | 1 | 2 | 1 | 0 | 0 |
| 5 | 3.5 | 5 | 3.5 | 0 | 0 |
| 8 | 6 | 8 | 6 | 0 | 0 |
| 5 | 3.5 | 4 | 2 | 1.5 | 2.25 |
| 6 | 5 | 5 | 3.5 | 1.5 | 2.25 |
| 9 | 7 | 9 | 7 | 0 | 0 |

$\sum D^2 = 13.5$

Rank correlation coefficient:

$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{N(N^2-1)} = 1 - \frac{6\left[13.5 + \frac{1}{12}(2^3-2) + \frac{1}{12}(2^3-2)\right]}{8(8^2-1)} = 1 - \frac{6(13.5 + 0.5 + 0.5)}{8 \times 63} = 1 - \frac{6 \times 14.5}{8 \times 63} = 0.8274$
Example 36: The coefficient of rank correlation between the debenture prices and share prices of a company is found to be 0.143. If the sum of the squares of the differences in ranks is 48, find the value of N.

Solution: $\sum D^2 = 48$ and $R = 0.143$. The rank correlation coefficient is $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$. Substituting the values from the data,

$0.143 = 1 - \frac{6 \times 48}{N(N^2-1)}$, or $\frac{6 \times 48}{N(N^2-1)} = 1 - 0.143 = 0.857$

$N(N^2-1) = \frac{6 \times 48}{0.857} \approx 336 = 7 \times 48 = 7(49-1) = 7(7^2-1)$

Therefore, N = 7.
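Finding N from a given ∑D² and R, as in Examples 27 and 36, amounts to solving $N(N^2-1) = \frac{6\sum D^2}{1-R}$ for a whole number N. A minimal Python sketch follows. The function name `pairs_from_coefficient` and the search limit `max_n` are assumptions made for this illustration. Because R is usually quoted to only a few decimals, the sketch picks the integer N whose N(N² − 1) is nearest to the target rather than requiring an exact match.

```python
def pairs_from_coefficient(sum_d2, r, max_n=1000):
    """Find the number of pairs N consistent with a given sum(D^2) and R."""
    if r >= 1:
        raise ValueError("R = 1 implies sum(D^2) = 0; N cannot be recovered")
    target = 6 * sum_d2 / (1 - r)  # this should equal N (N^2 - 1)
    # Choose the integer N whose N (N^2 - 1) is closest to the target
    return min(range(2, max_n + 1),
               key=lambda n: abs(n * (n ** 2 - 1) - target))


print(pairs_from_coefficient(48, 0.143))  # 7   (Example 36)
print(pairs_from_coefficient(231, -0.4))  # 10  (Example 27)
```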
Concurrent deviation method: This is the easiest of all the methods of studying correlation. It considers only the direction of change, that is, whether each value of the variables X and Y is an increase or a decrease over the preceding value. The concurrent deviation is the product of the changes in X and Y. Only its sign, positive or negative, is considered, not the actual magnitude of the change. The coefficient of correlation by the concurrent deviation method is given by

$$r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$$

where c denotes the number of concurrent deviations, that is, the number of +ve signs obtained as the product of the deviations Dx and Dy (signs only, not the actual deviation values) in the variables X and Y respectively, and n is one less than N, the number of pairs of observations. n = N − 1 because in both the X and Y series no value precedes the first value, so no change (deviation) can be found for it. The correlation coefficient calculated by this method, also termed the coefficient of concurrent deviation, lies between +1 and −1.

Steps:
1. For the X variable, find the deviation or direction of change Dx. The change for the first value cannot be determined, as it has no predecessor, so it is left blank. Compare the first and second values of the X series. If the second value is more than the first, mark a +ve sign in the second place of the Dx column. If the second value is less than the first, mark a −ve sign, and if both values are equal, mark zero. In the same manner compare the second and third values, and subsequently all the remaining adjacent values of the X variable, marking the Dx column accordingly.
2. Give the same treatment to the values of the Y variable, marking the +ve sign, −ve sign or zero, as the case may be, in the Dy column.
3. Find the product of each Dx and the corresponding Dy marking and record it in the Dx·Dy column.
4. Count all the +ve signs in the Dx·Dy column to get c.
5. To obtain the value of the coefficient of correlation, substitute the values of c and n in the formula $r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$.

The +ve and −ve signs inside and outside the square root have a significance. The square root of a +ve number is a real number, which could be either +ve or −ve but of the same magnitude. The square root of a −ve number is not a real number. If 2c − n is −ve, then $\frac{2c-n}{n}$ is also −ve, as n is a +ve number. So the negative $\frac{2c-n}{n}$ has to be multiplied by −1 to make the quantity under the root positive, otherwise a real value of the correlation coefficient cannot be obtained. Once the negative sign inside the square root has been used, the negative sign outside the square root must also be used, thereby establishing −ve correlation. In other words, if 2c − n is −ve the correlation is −ve, otherwise +ve correlation exists. There is absolutely no ambiguity on account of the +ve and −ve signs.
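The five steps and the sign rule above can be sketched in Python as follows. The names `sign` and `concurrent_deviation` are assumptions made for this illustration. As in the method described above, only products with a +ve sign count towards c, so a zero deviation in either series does not count. The data are those of Example 37.

```python
import math


def sign(v):
    """Return +1, -1 or 0 according to the direction of change."""
    return (v > 0) - (v < 0)


def concurrent_deviation(x, y):
    # Steps 1 and 2: direction of change between adjacent values
    dx = [sign(b - a) for a, b in zip(x, x[1:])]
    dy = [sign(b - a) for a, b in zip(y, y[1:])]
    n = len(dx)  # n = N - 1
    # Steps 3 and 4: count the products Dx * Dy that are positive
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    # Step 5: the sign of (2c - n) is applied inside and outside the root
    ratio = (2 * c - n) / n
    return math.copysign(math.sqrt(abs(ratio)), ratio)


x = [60, 55, 50, 56, 30, 70, 40, 35, 80, 80, 75]
y = [65, 40, 35, 75, 63, 80, 35, 20, 80, 60, 60]
print(round(concurrent_deviation(x, y), 4))  # 0.7746
```

Using `math.copysign` is one way to express the ± convention: the root is taken of the absolute value, and the result carries the sign of 2c − n.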
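Example 37: Calculate the coefficient of concurrent deviation from the following:

X: 60 55 50 56 30 70 40 35 80 80 75
Y: 65 40 35 75 63 80 35 20 80 60 60

Solution:

| X | Dx | Y | Dy | Dx·Dy |
|---|---|---|---|---|
| 60 | | 65 | | |
| 55 | − | 40 | − | + |
| 50 | − | 35 | − | + |
| 56 | + | 75 | + | + |
| 30 | − | 63 | − | + |
| 70 | + | 80 | + | + |
| 40 | − | 35 | − | + |
| 35 | − | 20 | − | + |
| 80 | + | 80 | + | + |
| 80 | 0 | 60 | − | 0 |
| 75 | − | 60 | 0 | 0 |

Number of +ve signs in the Dx·Dy column: c = 8. N = 11, n = N − 1 = 10.

Coefficient of concurrent deviation:

$r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = \pm\sqrt{\pm\frac{2 \times 8 - 10}{10}} = +\sqrt{\frac{6}{10}} = 0.7746$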
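Example 38: Calculate the coefficient of concurrent deviation from the following data:

Price: 368 384 385 361 347 384 395 403 400 385
Imports: 22 21 24 20 22 26 24 29 28 27

Solution:

| Price (X) | Dx | Imports (Y) | Dy | Dx·Dy |
|---|---|---|---|---|
| 368 | | 22 | | |
| 384 | + | 21 | − | − |
| 385 | + | 24 | + | + |
| 361 | − | 20 | − | + |
| 347 | − | 22 | + | − |
| 384 | + | 26 | + | + |
| 395 | + | 24 | − | − |
| 403 | + | 29 | + | + |
| 400 | − | 28 | − | + |
| 385 | − | 27 | − | + |

Number of +ve signs in the Dx·Dy column: c = 6. N = 10, n = N − 1 = 10 − 1 = 9.

Coefficient of concurrent deviation:

$r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = \pm\sqrt{\pm\frac{2 \times 6 - 9}{9}} = +\sqrt{\frac{1}{3}} = 0.5774$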
Example 39: Calculate the coefficient of correlation using the method of concurrent deviation from the following data:

Year: 1998 1999 2000 2001 2002 2003 2004
Supply: 150 154 160 172 160 165 180
Demand: 200 180 170 160 190 180 172

Solution:

| Year | Supply (X) | Dx | Demand (Y) | Dy | Dx·Dy |
|---|---|---|---|---|---|
| 1998 | 150 | | 200 | | |
| 1999 | 154 | + | 180 | − | − |
| 2000 | 160 | + | 170 | − | − |
| 2001 | 172 | + | 160 | − | − |
| 2002 | 160 | − | 190 | + | − |
| 2003 | 165 | + | 180 | − | − |
| 2004 | 180 | + | 172 | − | − |

N = 7, n = N − 1 = 7 − 1 = 6, and c = 0, as there is not a single +ve sign in the Dx·Dy column.

Coefficient of concurrent deviation:

$r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = \pm\sqrt{\pm\frac{2 \times 0 - 6}{6}} = \pm\sqrt{\pm\left(-\frac{6}{6}\right)} = \pm\sqrt{\pm(-1)}$

The square root of −1 is not a real number, so the −ve sign is taken both inside and outside the square root:

$r_c = -\sqrt{-(-1)} = -\sqrt{1} = -1$

This indicates perfect −ve correlation between supply and demand.