# Spearman’s Rank Correlation

September 22, 2017 | Author: floretsonal | Category: Correlation And Dependence, Statistical Analysis, Scientific Method, Probability Theory, Data Analysis

#### Description

Spearman’s Rank Correlation Coefficient: Karl Pearson’s coefficient of correlation is calculated from the actual observations, on the assumption that the population is normally distributed. When the population is not normal, or when the shape of its distribution is not known, the coefficient of correlation is calculated from the ranks of both variables instead of the actual observations. The ranks may be assigned in either ascending or descending order. This method, developed by Charles Spearman, gives the rank correlation coefficient:

$$
R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}
$$

Here R denotes the rank correlation coefficient, D denotes the difference in ranks between paired items of the two series, and N denotes the number of pairs of observations. Karl Pearson’s coefficient of correlation lies between -1 and +1, and Spearman’s rank correlation coefficient R also lies between -1 and +1.

#### Rank correlation coefficient when ranks are given

Steps:

1. Find the difference in ranks of the N paired items: D = R1 - R2.
2. Square the rank differences and add them to get ∑D².
3. Use the formula $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$ to get the value of the rank correlation coefficient.
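The formula can be traced back to Karl Pearson’s coefficient applied to the ranks themselves. The sketch below assumes untied ranks 1, 2, …, N in each series, so both series have mean (N + 1)/2. The symbols x and y stand for the deviations of the ranks from that mean. The derivation uses the standard sums ∑r = N(N + 1)/2 and ∑r² = N(N + 1)(2N + 1)/6. Because both series have the same mean, D = R1 - R2 = x - y.

$$
\begin{aligned}
\sum x^2 = \sum y^2 &= \frac{N(N+1)(2N+1)}{6} - N\left(\frac{N+1}{2}\right)^2 = \frac{N(N^2-1)}{12} \\
\sum D^2 = \sum (x - y)^2 &= \sum x^2 + \sum y^2 - 2\sum xy = \frac{N(N^2-1)}{6} - 2\sum xy \\
R = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} &= \frac{\frac{N(N^2-1)}{12} - \frac{1}{2}\sum D^2}{\frac{N(N^2-1)}{12}} = 1 - \frac{6\sum D^2}{N(N^2-1)}
\end{aligned}
$$

The two limits follow directly from this result:

- Identical rankings give ∑D² = 0, so R = +1.
- Completely reversed rankings give ∑D² = N(N² - 1)/3, so R = -1.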

Example 22. The rankings of 10 students in two subjects, A and B, are as follows:

A: 6 5 3 10 2 4 9 7 8 1

B: 3 8 4 9 1 6 10 7 5 2

Calculate the rank correlation coefficient.

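Solution:

| Rank R1 (A) | Rank R2 (B) | D = R1 - R2 | D² |
|---|---|---|---|
| 6 | 3 | 3 | 9 |
| 5 | 8 | -3 | 9 |
| 3 | 4 | -1 | 1 |
| 10 | 9 | 1 | 1 |
| 2 | 1 | 1 | 1 |
| 4 | 6 | -2 | 4 |
| 9 | 10 | -1 | 1 |
| 7 | 7 | 0 | 0 |
| 8 | 5 | 3 | 9 |
| 1 | 2 | -1 | 1 |
| | | | ∑D² = 36 |

$$
R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 36}{10(10^2-1)} = 1 - \frac{216}{990} = 0.7818
$$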
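The three steps can be checked with a short Python sketch. The function name `spearman_from_ranks` is only an illustrative choice. The sketch assumes the ranks are already given and contain no ties. Here it reproduces Example 22.

```python
def spearman_from_ranks(r1, r2):
    """Return (sum of D^2, R) for two equally long lists of untied ranks."""
    if len(r1) != len(r2):
        raise ValueError("both series must have the same number of ranks")
    n = len(r1)
    # Step 1: rank differences D = R1 - R2
    d = [a - b for a, b in zip(r1, r2)]
    # Step 2: square the differences and add them
    sum_d2 = sum(x * x for x in d)
    # Step 3: R = 1 - 6 * sum(D^2) / (N (N^2 - 1))
    r = 1 - 6 * sum_d2 / (n * (n * n - 1))
    return sum_d2, r


# Example 22
a = [6, 5, 3, 10, 2, 4, 9, 7, 8, 1]
b = [3, 8, 4, 9, 1, 6, 10, 7, 5, 2]
sum_d2, r = spearman_from_ranks(a, b)
print(sum_d2, round(r, 4))  # 36 0.7818
```

The same call with X = 1, …, 12 and the Y ranks of Example 23 returns ∑D² = 416 and R ≈ -0.4545.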

Example 23. Two judges in a beauty competition rank the 12 entries as follows:

X: 1 2 3 4 5 6 7 8 9 10 11 12

Y: 12 9 6 10 3 5 4 7 8 2 11 1

What degree of agreement is there between the two judges?

Solution:

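| Rank R1 (X) | Rank R2 (Y) | D = R1 - R2 | D² |
|---|---|---|---|
| 1 | 12 | -11 | 121 |
| 2 | 9 | -7 | 49 |
| 3 | 6 | -3 | 9 |
| 4 | 10 | -6 | 36 |
| 5 | 3 | 2 | 4 |
| 6 | 5 | 1 | 1 |
| 7 | 4 | 3 | 9 |
| 8 | 7 | 1 | 1 |
| 9 | 8 | 1 | 1 |
| 10 | 2 | 8 | 64 |
| 11 | 11 | 0 | 0 |
| 12 | 1 | 11 | 121 |
| | | | ∑D² = 416 |

$$
R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 416}{12(12^2-1)} = 1 - \frac{2496}{1716} = 1 - 1.4545 = -0.4545
$$

The rank correlation coefficient measures the degree of agreement between the two judges. It is negative in this case, which indicates disagreement.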

Example 24. The ranks of the same 15 students in two subjects, A and B, are given below. In each pair of brackets, the first number is the student's rank in A and the second is the rank in B: (1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), (8,1), (9,11), (10,15), (11,9), (12,5), (13,14), (14,12), (15,13). Find Spearman’s rank correlation coefficient.

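Solution:

| R1 (A) | R2 (B) | D = R1 - R2 | D² |
|---|---|---|---|
| 1 | 10 | -9 | 81 |
| 2 | 7 | -5 | 25 |
| 3 | 2 | 1 | 1 |
| 4 | 6 | -2 | 4 |
| 5 | 4 | 1 | 1 |
| 6 | 8 | -2 | 4 |
| 7 | 3 | 4 | 16 |
| 8 | 1 | 7 | 49 |
| 9 | 11 | -2 | 4 |
| 10 | 15 | -5 | 25 |
| 11 | 9 | 2 | 4 |
| 12 | 5 | 7 | 49 |
| 13 | 14 | -1 | 1 |
| 14 | 12 | 2 | 4 |
| 15 | 13 | 2 | 4 |
| | | | ∑D² = 272 |

Spearman’s rank correlation coefficient is given by

$$
R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 272}{15(15^2-1)} = 1 - \frac{1632}{3360} = 1 - 0.4857 = 0.5143
$$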

Example 25. Ten competitors in a beauty contest are ranked by three judges in the following order:

1st judge: 1 6 5 10 3 2 4 9 7 8

2nd judge: 3 5 8 4 7 10 2 1 6 9

3rd judge: 6 4 9 8 1 2 3 10 5 7

Use the rank correlation coefficient to determine which pair of judges has the nearest approach to common taste in beauty.

Solution: Let R1, R2 and R3 be the rankings of the three judges respectively. There are three pairs of judgments to compare: (R1, R2), (R1, R3) and (R2, R3).

| R1 | R2 | R3 | D1 = R1 - R2 | D2 = R1 - R3 | D3 = R2 - R3 | D1² | D2² | D3² |
|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 6 | -2 | -5 | -3 | 4 | 25 | 9 |
| 6 | 5 | 4 | 1 | 2 | 1 | 1 | 4 | 1 |
| 5 | 8 | 9 | -3 | -4 | -1 | 9 | 16 | 1 |
| 10 | 4 | 8 | 6 | 2 | -4 | 36 | 4 | 16 |
| 3 | 7 | 1 | -4 | 2 | 6 | 16 | 4 | 36 |
| 2 | 10 | 2 | -8 | 0 | 8 | 64 | 0 | 64 |
| 4 | 2 | 3 | 2 | 1 | -1 | 4 | 1 | 1 |
| 9 | 1 | 10 | 8 | -1 | -9 | 64 | 1 | 81 |
| 7 | 6 | 5 | 1 | 2 | 1 | 1 | 4 | 1 |
| 8 | 9 | 7 | -1 | 1 | 2 | 1 | 1 | 4 |

Sums of squared differences: ∑D1² = 200, ∑D2² = 60, ∑D3² = 214.

Rank correlation coefficient between the judgments of the 1st and 2nd judges:

$$
R_{12} = 1 - \frac{6\sum D_1^2}{N(N^2-1)} = 1 - \frac{6 \times 200}{10(10^2-1)} = 1 - \frac{1200}{990} = 1 - 1.2121 = -0.2121
$$

Rank correlation coefficient between the judgments of the 1st and 3rd judges:

$$
R_{13} = 1 - \frac{6\sum D_2^2}{N(N^2-1)} = 1 - \frac{6 \times 60}{10(10^2-1)} = 1 - \frac{360}{990} = 1 - 0.3636 = 0.6364
$$

Rank correlation coefficient between the judgments of the 2nd and 3rd judges:

$$
R_{23} = 1 - \frac{6\sum D_3^2}{N(N^2-1)} = 1 - \frac{6 \times 214}{10(10^2-1)} = 1 - \frac{1284}{990} = 1 - 1.2970 = -0.2970
$$

Of R12, R13 and R23, only R13 is positive. Hence the first and third judges have the nearest approach to common taste in beauty.

Ex – 26 The sum of squares of the rank differences of 9 pairs of values is 80. Find the rank correlation coefficient between them.

Solution: ∑D² = 80, N = 9.

$$
R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 80}{9(9^2-1)} = 1 - \frac{480}{720} = 1 - \frac{2}{3} = \frac{1}{3} = 0.333
$$

Ex – 27 Bivariate data consist of N pairs of observations. The sum of squares of the differences between the ranks of the two variables is 231, and the rank correlation coefficient is -0.4. Find the value of N.

Solution: ∑D² = 231, R = -0.4. Substituting into $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$:

$$
\begin{aligned}
-0.4 &= 1 - \frac{6 \times 231}{N(N^2-1)} \\
\frac{6 \times 231}{N(N^2-1)} &= 1 + 0.4 = 1.4 \\
N(N^2-1) &= \frac{6 \times 231}{1.4} = 990 = 10 \times 99 = 10(10^2-1)
\end{aligned}
$$

Therefore N = 10.

#### Rank correlation coefficient when ranks are not given

Steps:

1. Assign ranks to all the items in one series (X) and, separately, to all the items in the other series (Y). Ranks can start from either the highest or the lowest value, but the same criterion must be followed for both variables.
2. Find the difference in ranks of the N paired items: D = R1 - R2.
3. Square the rank differences and add them to get ∑D².
4. Use the formula $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$ to get the value of the rank correlation coefficient.
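When raw values are given, step 1 amounts to sorting each series. The following is a minimal Python sketch that assumes no repeated values, since ties are treated separately. The names `assign_ranks` and `spearman_from_values` are illustrative. Ranking from the highest or from the lowest value gives the same ∑D², because reversing the direction for both series only flips the sign of every D.

```python
def assign_ranks(values, descending=True):
    """Rank 1 goes to the highest value (descending) or to the lowest (ascending).

    Assumes every value is distinct; repeated values need average ranks.
    """
    if len(set(values)) != len(values):
        raise ValueError("repeated values found; use average ranks instead")
    order = sorted(values, reverse=descending)
    return [order.index(v) + 1 for v in values]


def spearman_from_values(x, y, descending=True):
    """Rank both series with the same criterion, then apply Spearman's formula."""
    r1 = assign_ranks(x, descending)
    r2 = assign_ranks(y, descending)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ex – 30: debenture and share prices, both ranked from highest to lowest
debentures = [97.8, 99.2, 98.8, 98.3, 98.4, 96.7, 97.1]
shares = [73.2, 85.8, 78.9, 75.8, 77.2, 87.2, 83.8]
sum_d2, r = spearman_from_values(debentures, shares)
print(sum_d2, round(r, 4))  # 62 -0.1071
```

Passing `descending=False` ranks from the lowest value, as in Ex – 31. For that data the function returns ∑D² = 76 with either direction.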

Ex – 28 The coefficient of rank correlation between the marks obtained by 10 students in statistics and accountancy was found to be 0.2. It was later discovered that one student's difference in ranks between the two subjects was wrongly taken as 9 instead of 7. Find the correct coefficient of rank correlation.

Solution: Let Rc and Rw be the correct and wrong coefficients of rank correlation respectively. Let Dc and Dw be the correct and wrong rank differences respectively. Since $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$,

$$
\begin{aligned}
R_w = 0.2 &= 1 - \frac{6\sum D_w^2}{10(10^2-1)} = 1 - \frac{6\sum D_w^2}{10 \times 99} \\
\frac{6\sum D_w^2}{990} &= 1 - 0.2 = 0.8 \\
\sum D_w^2 &= \frac{0.8 \times 10 \times 99}{6} = 132
\end{aligned}
$$

Now

$$
\sum D_c^2 = \sum D_w^2 - (\text{wrong rank difference})^2 + (\text{correct rank difference})^2 = 132 - 9^2 + 7^2 = 132 - 81 + 49 = 100
$$

So

$$
R_c = 1 - \frac{6\sum D_c^2}{N(N^2-1)} = 1 - \frac{6 \times 100}{10(10^2-1)} = 1 - \frac{600}{990} = 1 - 0.606 = 0.394
$$
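The correction in Ex – 28 needs only the reported R, the value of N and the two rank differences, so it fits in a small helper. This is a sketch: `corrected_rank_correlation` is a hypothetical name, and the helper assumes exactly one difference was mis-recorded.

```python
def corrected_rank_correlation(r_wrong, n, wrong_d, correct_d):
    """Return (corrected sum of D^2, corrected R) after fixing one rank difference."""
    denom = n * (n ** 2 - 1)
    # Invert R = 1 - 6 * sum(D^2) / denom to recover the wrong sum of D^2
    sum_d2_wrong = (1 - r_wrong) * denom / 6
    # Remove the wrong squared difference and add the correct one
    sum_d2_correct = sum_d2_wrong - wrong_d ** 2 + correct_d ** 2
    return sum_d2_correct, 1 - 6 * sum_d2_correct / denom


sum_d2, r = corrected_rank_correlation(0.2, 10, wrong_d=9, correct_d=7)
print(round(sum_d2, 6), round(r, 4))  # 100.0 0.3939
```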

Ex – 29 A test in statistics was taken by 7 students, and the teacher ranked the students according to their academic achievement. The students are listed below in order of achievement, from high to low, with each pupil's family income: Rai (Rs 8700), Bhatnagar (Rs 4200), Tuli (Rs 5700), Desai (Rs 8200), Gupta (Rs 20000), Choudhary (Rs 18000) and Singh (Rs 17500). Compute Spearman’s coefficient of rank correlation between academic achievement and family income.

Solution: The 7 students are already listed from high to low in academic achievement: Rai, Bhatnagar, Tuli, Desai, Gupta, Choudhary and Singh. Their academic ranks R1 are therefore 1 to 7 in that order. Ranked from high to low by family income, the order is Gupta, Choudhary, Singh, Rai, Desai, Tuli and Bhatnagar, which gives the income ranks R2.

| Name of student | Rank as per academic achievement (R1) | Rank as per family income (R2) | D = R1 - R2 | D² |
|---|---|---|---|---|
| Rai | 1 | 4 | -3 | 9 |
| Bhatnagar | 2 | 7 | -5 | 25 |
| Tuli | 3 | 6 | -3 | 9 |
| Desai | 4 | 5 | -1 | 1 |
| Gupta | 5 | 1 | 4 | 16 |
| Choudhary | 6 | 2 | 4 | 16 |
| Singh | 7 | 3 | 4 | 16 |
| | | | | ∑D² = 92 |

Spearman’s coefficient of rank correlation is

$$
R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 92}{7(7^2-1)} = 1 - \frac{552}{336} = 1 - 1.6429 = -0.6429
$$

Ex – 30 Quotations of index numbers of the security prices of a certain joint stock company are given below:

| Year | Debenture price | Share price |
|---|---|---|
| 1 | 97.8 | 73.2 |
| 2 | 99.2 | 85.8 |
| 3 | 98.8 | 78.9 |
| 4 | 98.3 | 75.8 |
| 5 | 98.4 | 77.2 |
| 6 | 96.7 | 87.2 |
| 7 | 97.1 | 83.8 |

Using the rank correlation method, determine the relationship between debenture prices and share prices.

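Solution: Data for 7 years are given, so N = 7. Ranking both debenture prices and share prices from highest to lowest and tabulating:

| Debenture price | Debenture price rank (R1) | Share price | Share price rank (R2) | D = R1 - R2 | D² |
|---|---|---|---|---|---|
| 97.8 | 5 | 73.2 | 7 | -2 | 4 |
| 99.2 | 1 | 85.8 | 2 | -1 | 1 |
| 98.8 | 2 | 78.9 | 4 | -2 | 4 |
| 98.3 | 4 | 75.8 | 6 | -2 | 4 |
| 98.4 | 3 | 77.2 | 5 | -2 | 4 |
| 96.7 | 7 | 87.2 | 1 | 6 | 36 |
| 97.1 | 6 | 83.8 | 3 | 3 | 9 |
| | | | | | ∑D² = 62 |

The coefficient of rank correlation is

$$
R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 62}{7(7^2-1)} = 1 - \frac{372}{336} = 1 - 1.1071 = -0.1071
$$

The small negative value indicates a weak negative relationship between debenture prices and share prices.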

Ex – 31 Calculate Spearman’s coefficient of correlation between the marks assigned to 10 students by judges X and Y in a certain competitive test, as shown below:

| Student no. | Marks by judge X | Marks by judge Y |
|---|---|---|
| 1 | 52 | 65 |
| 2 | 53 | 68 |
| 3 | 42 | 43 |
| 4 | 60 | 38 |
| 5 | 45 | 77 |
| 6 | 41 | 48 |
| 7 | 37 | 35 |
| 8 | 38 | 30 |
| 9 | 25 | 25 |
| 10 | 27 | 50 |

Solution: Judges X and Y marked 10 students, so N = 10. Ranking the students by each judge's marks, from lowest to highest, and tabulating:

| Marks by judge X | Rank by judge X (R1) | Marks by judge Y | Rank by judge Y (R2) | D = R1 - R2 | D² |
|---|---|---|---|---|---|
| 52 | 8 | 65 | 8 | 0 | 0 |
| 53 | 9 | 68 | 9 | 0 | 0 |
| 42 | 6 | 43 | 5 | 1 | 1 |
| 60 | 10 | 38 | 4 | 6 | 36 |
| 45 | 7 | 77 | 10 | -3 | 9 |
| 41 | 5 | 48 | 6 | -1 | 1 |
| 37 | 3 | 35 | 3 | 0 | 0 |
| 38 | 4 | 30 | 2 | 2 | 4 |
| 25 | 1 | 25 | 1 | 0 | 0 |
| 27 | 2 | 50 | 7 | -5 | 25 |
| | | | | | ∑D² = 76 |

Spearman’s coefficient of correlation is

$$
R = 1 - \frac{6\sum D^2}{N(N^2-1)} = 1 - \frac{6 \times 76}{10(10^2-1)} = 1 - \frac{456}{990} = 1 - 0.4606 = 0.5394
$$

#### Rank correlation coefficient when ranks are equal (tied ranks)

This is a special case of the previous one (ranks not given) in which two or more items in a series have equal values, in other words, they are repeated. The steps for calculating the rank correlation coefficient are the same as in the previous case, but the ranks are assigned differently. Assume ranks are assigned in increasing order of value. If one of two repeated items is assigned rank r, the other would normally get rank r + 1, as though it had a marginally higher value. In reality the two values are equal. Both repeated items are therefore assigned the average of the two ranks, [r + (r + 1)] ÷ 2, and the next non-repeated item in the series gets rank r + 2. For an item repeated m times, the average rank is

$$
\frac{r + (r+1) + \cdots + [r + (m-1)]}{m} = r + \frac{m-1}{2}
$$

and the next non-repeated item in the series gets rank r + m. The rank correlation coefficient in this case is given by

$$
R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \cdots\right]}{N(N^2-1)}
$$

where D is the rank difference of the N paired items in the two series X and Y. Each of m1, m2, m3, … is the number of times a repeated item occurs. There is one correction term for every group of repeated values, in either series.
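A Python sketch of the tied-rank procedure follows. It assumes ranks are assigned in ascending order of value. The names `average_ranks`, `tie_correction` and `spearman_with_ties` are illustrative. Each group of m equal values adds (m³ - m)/12 to ∑D². That term is 0.5 for a pair and 2 for a triple, the values used in the worked examples below.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; equal values share the mean of the places they occupy."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first place (r) occupied by v
        m = order.count(v)           # how many times v is repeated
        # mean of r, r + 1, ..., r + (m - 1) is r + (m - 1) / 2
        ranks.append(first + (m - 1) / 2)
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(x, y):
    """Return (sum of D^2, R) using average ranks and the tie correction."""
    n = len(x)
    r1, r2 = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = tie_correction(x) + tie_correction(y)
    return sum_d2, 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Ex – 34
x = [50, 55, 65, 50, 55, 60, 50, 65, 70, 75]
y = [110, 110, 115, 125, 140, 115, 130, 120, 115, 160]
sum_d2, r = spearman_with_ties(x, y)
print(sum_d2, round(r, 4))  # 134.0 0.1545
```

The same function applied to the data of Ex – 32, Ex – 33 and Ex – 35 returns ∑D² = 6.5, 140 and 13.5 respectively.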

Ex – 32 The heights and weights of a batch of 10 students are given below. Find the rank correlation coefficient.

Height (inches): 48 49 50 51 52 53 54 55 56 57

Weight (lbs): 100 105 105 104 111 115 125 130 132 137

Solution: There are 10 pairs of observations, so N = 10. The heights contain no repeated values, but the weight 105 lbs appears twice, so m1 = 2. Arranging the weights in ascending order: 100, 104, 105, 105, 111, 115, 125, 130, 132, 137. The ranks of 100 and 104 are 1 and 2. The repeated weight 105 lbs occupies the 3rd and 4th places, so each gets the average rank (3 + 4) ÷ 2 = 3.5. Tabulating the data:

| X (height) | Rank (R1) | Y (weight) | Rank (R2) | D = R1 - R2 | D² |
|---|---|---|---|---|---|
| 48 | 1 | 100 | 1 | 0 | 0 |
| 49 | 2 | 105 | 3.5 | -1.5 | 2.25 |
| 50 | 3 | 105 | 3.5 | -0.5 | 0.25 |
| 51 | 4 | 104 | 2 | 2 | 4 |
| 52 | 5 | 111 | 5 | 0 | 0 |
| 53 | 6 | 115 | 6 | 0 | 0 |
| 54 | 7 | 125 | 7 | 0 | 0 |
| 55 | 8 | 130 | 8 | 0 | 0 |
| 56 | 9 | 132 | 9 | 0 | 0 |
| 57 | 10 | 137 | 10 | 0 | 0 |
| | | | | | ∑D² = 6.5 |

The rank correlation coefficient is

$$
R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1)\right]}{N(N^2-1)} = 1 - \frac{6\left[6.5 + \frac{1}{12}(2^3 - 2)\right]}{10(10^2-1)} = 1 - \frac{6 \times (6.5 + 0.5)}{990} = 1 - \frac{42}{990} = 0.9576
$$

Ex – 33 Calculate the rank correlation coefficient for the following data:

Marks in 1st subject: 40 46 54 60 70 80 82 85 85 90 95

Marks in 2nd subject: 45 45 50 43 40 75 55 72 65 42 70

Solution: There are 11 pairs of observations, so N = 11. Marks are repeated in both subjects: 85 in the 1st subject and 45 in the 2nd subject, so m1 = 2 and m2 = 2. Arranging the marks in ascending order:

1st subject: 40 46 54 60 70 80 82 85 85 90 95. The mark 85 occupies the 8th and 9th places, so its average rank is (8 + 9) ÷ 2 = 8.5.

2nd subject: 40 42 43 45 45 50 55 65 70 72 75. The mark 45 occupies the 4th and 5th places, so its average rank is (4 + 5) ÷ 2 = 4.5.

| 1st subject (X) | Rank (R1) | 2nd subject (Y) | Rank (R2) | D = R1 - R2 | D² |
|---|---|---|---|---|---|
| 40 | 1 | 45 | 4.5 | -3.5 | 12.25 |
| 46 | 2 | 45 | 4.5 | -2.5 | 6.25 |
| 54 | 3 | 50 | 6 | -3 | 9 |
| 60 | 4 | 43 | 3 | 1 | 1 |
| 70 | 5 | 40 | 1 | 4 | 16 |
| 80 | 6 | 75 | 11 | -5 | 25 |
| 82 | 7 | 55 | 7 | 0 | 0 |
| 85 | 8.5 | 72 | 10 | -1.5 | 2.25 |
| 85 | 8.5 | 65 | 8 | 0.5 | 0.25 |
| 90 | 10 | 42 | 2 | 8 | 64 |
| 95 | 11 | 70 | 9 | 2 | 4 |
| | | | | | ∑D² = 140 |

The rank correlation coefficient is

$$
R = 1 - \frac{6\left[140 + \frac{1}{12}(2^3-2) + \frac{1}{12}(2^3-2)\right]}{11(11^2-1)} = 1 - \frac{6(140 + 0.5 + 0.5)}{11 \times 120} = 1 - \frac{6 \times 141}{1320} = 0.359
$$

Ex – 34 Obtain the rank correlation coefficient between the variables X and Y from the following pairs of observed values:

X: 50 55 65 50 55 60 50 65 70 75

Y: 110 110 115 125 140 115 130 120 115 160

Solution: There are 10 pairs of observations, so N = 10.

In the X series, 50 appears 3 times, 55 twice and 65 twice, so m1 = 3, m2 = 2 and m3 = 2. In the Y series, 115 appears 3 times and 110 twice, so m4 = 3 and m5 = 2.

Arranging each series in ascending order gives the following average ranks.

X series: 50 50 50 55 55 60 65 65 70 75.

- The value 50 occupies the 1st, 2nd and 3rd places, so its average rank is (1 + 2 + 3) ÷ 3 = 2.
- The value 55 occupies the 4th and 5th places, so its average rank is (4 + 5) ÷ 2 = 4.5.
- The value 65 occupies the 7th and 8th places, so its average rank is (7 + 8) ÷ 2 = 7.5.

Y series: 110 110 115 115 115 120 125 130 140 160.

- The value 110 occupies the 1st and 2nd places, so its average rank is (1 + 2) ÷ 2 = 1.5.
- The value 115 occupies the 3rd, 4th and 5th places, so its average rank is (3 + 4 + 5) ÷ 3 = 4.

| X | Rank (R1) | Y | Rank (R2) | D = R1 - R2 | D² |
|---|---|---|---|---|---|
| 50 | 2 | 110 | 1.5 | 0.5 | 0.25 |
| 55 | 4.5 | 110 | 1.5 | 3 | 9 |
| 65 | 7.5 | 115 | 4 | 3.5 | 12.25 |
| 50 | 2 | 125 | 7 | -5 | 25 |
| 55 | 4.5 | 140 | 9 | -4.5 | 20.25 |
| 60 | 6 | 115 | 4 | 2 | 4 |
| 50 | 2 | 130 | 8 | -6 | 36 |
| 65 | 7.5 | 120 | 6 | 1.5 | 2.25 |
| 70 | 9 | 115 | 4 | 5 | 25 |
| 75 | 10 | 160 | 10 | 0 | 0 |
| | | | | | ∑D² = 134 |

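The rank correlation coefficient is

$$
\begin{aligned}
R &= 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots + \frac{1}{12}(m_5^3 - m_5)\right]}{N(N^2-1)} \\
&= 1 - \frac{6\left[134 + \frac{1}{12}(3^3-3) + \frac{1}{12}(2^3-2) + \frac{1}{12}(2^3-2) + \frac{1}{12}(3^3-3) + \frac{1}{12}(2^3-2)\right]}{10(10^2-1)} \\
&= 1 - \frac{6\left[134 + 2 + 0.5 + 0.5 + 2 + 0.5\right]}{10 \times 99} = 1 - \frac{6 \times 139.5}{990} = 0.1545
\end{aligned}
$$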

Ex – 35 Calculate the coefficient of correlation from the following data by the method of rank differences:

Rank of X: 10 4 2 5 8 5 6 9

Rank of Y: 10 6 2 5 8 4 5 9

Solution: There are only 8 pairs of observations, so N = 8. The data nevertheless contain "ranks" of 9 and 10, which is impossible with only 8 items. To calculate the rank correlation coefficient, the given figures must therefore be treated as observations rather than ranks, and ranks must then be assigned to these values. In ascending order:

X series: 2 4 5 5 6 8 9 10

Y series: 2 4 5 5 6 8 9 10

Both series contain the repeated value 5, so m1 = 2 and m2 = 2. In both series it occupies the 3rd and 4th places, so its average rank is (3 + 4) ÷ 2 = 3.5.

| X | Rank (R1) | Y | Rank (R2) | D = R1 - R2 | D² |
|---|---|---|---|---|---|
| 10 | 8 | 10 | 8 | 0 | 0 |
| 4 | 2 | 6 | 5 | -3 | 9 |
| 2 | 1 | 2 | 1 | 0 | 0 |
| 5 | 3.5 | 5 | 3.5 | 0 | 0 |
| 8 | 6 | 8 | 6 | 0 | 0 |
| 5 | 3.5 | 4 | 2 | 1.5 | 2.25 |
| 6 | 5 | 5 | 3.5 | 1.5 | 2.25 |
| 9 | 7 | 9 | 7 | 0 | 0 |
| | | | | | ∑D² = 13.5 |

The rank correlation coefficient is

$$
R = 1 - \frac{6\left[13.5 + \frac{1}{12}(2^3-2) + \frac{1}{12}(2^3-2)\right]}{8(8^2-1)} = 1 - \frac{6(13.5 + 0.5 + 0.5)}{8 \times 63} = 1 - \frac{6 \times 14.5}{504} = 0.8274
$$

Ex – 36 The coefficient of rank correlation between the debenture prices and share prices of a company is found to be 0.143. If the sum of the squares of the differences in ranks is 48, find the value of N.

Solution: ∑D² = 48 and R = 0.143. Substituting these values into $R = 1 - \frac{6\sum D^2}{N(N^2-1)}$:

$$
\begin{aligned}
0.143 &= 1 - \frac{6 \times 48}{N(N^2-1)} \\
\frac{6 \times 48}{N(N^2-1)} &= 1 - 0.143 = 0.857 \\
N(N^2-1) &= \frac{6 \times 48}{0.857} \approx 336 = 7 \times 48 = 7(7^2-1)
\end{aligned}
$$

Therefore N = 7.
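Ex – 27 and Ex – 36 both run the formula backwards. The value 0.143 is itself rounded, so N(N² - 1) comes out only approximately (about 336.06 here). One approach is to search for the whole number N whose N(N² - 1) is closest to 6∑D² ÷ (1 - R). The function `solve_for_n` and the search limit `max_n` are illustrative assumptions.

```python
def solve_for_n(sum_d2, r, max_n=1000):
    """Integer N in [2, max_n] that best satisfies R = 1 - 6*sum_d2 / (N(N^2 - 1))."""
    if r >= 1:
        raise ValueError("R must be less than 1 when sum_d2 > 0")
    target = 6 * sum_d2 / (1 - r)  # required value of N(N^2 - 1)
    return min(range(2, max_n + 1), key=lambda n: abs(n * (n ** 2 - 1) - target))


print(solve_for_n(48, 0.143))   # Ex – 36: 7
print(solve_for_n(231, -0.4))   # Ex – 27: 10
```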

#### Concurrent deviation method

This is the simplest of all the methods of studying correlation. It considers only the direction of change, that is, whether the values of the variables X and Y increase or decrease. The method then examines the concurrent deviation, which is the product of the directions of change in X and Y. Only the positive or negative sign is considered, not the actual magnitude of the change. The coefficient of correlation by the concurrent deviation method is given by

$$
r_c = \pm\sqrt{\pm\frac{2c - n}{n}}
$$

Here c denotes the number of concurrent deviations. It is the number of positive signs obtained as products of the deviations Dx and Dy in the variables X and Y respectively, using signs only and not the actual deviation values. The symbol n is one less than N, the number of pairs of observations. This is because, in both the X and Y series, no value precedes the first value, so the first change (deviation) cannot be found. The coefficient of correlation calculated by this method is also termed the coefficient of concurrent deviation, and it lies between -1 and +1.

Steps:

1. Find the deviation, or direction of change, Dx in the X variable.
   - The first value has no predecessor, so its change cannot be determined and that place is left blank.
   - Compare the first and second values of the X series. If the second value is greater than the first, mark a + sign in the second place of the Dx column. If it is less, mark a - sign. If both values are equal, mark 0.
   - Compare the second and third values in the same manner, and continue for all the remaining adjacent values of the X variable, marking each result in the Dx column.
2. Treat the values of the Y variable in the same way, marking +, - or 0, as the case may be, in the Dy column.
3. Find the product of each Dx and the corresponding Dy and record it in the Dx·Dy column.
4. Count all the + signs in the Dx·Dy column to get c.
5. Substitute the values of c and n in the formula $r_c = \pm\sqrt{\pm\frac{2c - n}{n}}$ to obtain the coefficient of correlation.

The + and - signs inside and outside the square root have a significance.

- The square root of a positive number is a real number, which can be taken as either positive or negative with the same magnitude. The square root of a negative number is not a real number.
- If 2c - n is negative, then (2c - n)/n is also negative, since n is a positive number. The negative value must then be multiplied by -1 to make the quantity under the square root positive; otherwise no real value of the correlation coefficient can be obtained.
- Once the negative sign inside the square root has been used, the negative sign outside the square root must be used as well, which establishes a negative correlation.

In other words, if 2c - n is negative, the correlation is negative; otherwise it is positive. There is therefore no ambiguity on account of the + and - signs.
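The five steps translate almost line for line into Python. In this sketch the function names are illustrative. Each direction of change is encoded as +1, -1 or 0, and only products equal to +1 count as concurrent. `math.copysign` applies the sign of 2c - n both inside and outside the square root.

```python
import math


def directions(series):
    """+1, -1 or 0 for each value compared with the one before it (the first has none)."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def concurrent_deviation(x, y):
    """Return (c, n, r_c) for two equally long series."""
    dx, dy = directions(x), directions(y)
    n = len(dx)  # n = N - 1
    # c counts the + signs in the Dx.Dy column (0 products are not concurrent)
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    ratio = (2 * c - n) / n
    # Same sign inside and outside the root: sqrt of |ratio|, carrying ratio's sign
    return c, n, math.copysign(math.sqrt(abs(ratio)), ratio)


# Ex – 37
x = [60, 55, 50, 56, 30, 70, 40, 35, 80, 80, 75]
y = [65, 40, 35, 75, 63, 80, 35, 20, 80, 60, 60]
c, n, rc = concurrent_deviation(x, y)
print(c, n, round(rc, 4))  # 8 10 0.7746
```

For the supply and demand data of Ex – 39, c = 0 and n = 6, so (2c - n)/n = -1. The function then returns -1.0, matching the sign rule explained above.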

Ex – 37 Calculate the coefficient of concurrent deviation from the following data:

X: 60 55 50 56 30 70 40 35 80 80 75

Y: 65 40 35 75 63 80 35 20 80 60 60

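Solution:

| X | Dx | Y | Dy | Dx·Dy |
|---|---|---|---|---|
| 60 | | 65 | | |
| 55 | - | 40 | - | + |
| 50 | - | 35 | - | + |
| 56 | + | 75 | + | + |
| 30 | - | 63 | - | + |
| 70 | + | 80 | + | + |
| 40 | - | 35 | - | + |
| 35 | - | 20 | - | + |
| 80 | + | 80 | + | + |
| 80 | 0 | 60 | - | 0 |
| 75 | - | 60 | 0 | 0 |

c = number of + signs in the Dx·Dy column = 8; N = 11, so n = N - 1 = 10.

$$
r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = \pm\sqrt{\pm\frac{2 \times 8 - 10}{10}} = +\sqrt{\frac{6}{10}} = 0.7746
$$

Since 2c - n = 6 is positive, the positive sign is taken.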

Ex – 38 Calculate the coefficient of concurrent deviation from the following data:

Price: 368 384 385 361 347 384 395 403 400 385

Imports: 22 21 24 20 22 26 24 29 28 27

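Solution:

| X (price) | Dx | Y (imports) | Dy | Dx·Dy |
|---|---|---|---|---|
| 368 | | 22 | | |
| 384 | + | 21 | - | - |
| 385 | + | 24 | + | + |
| 361 | - | 20 | - | + |
| 347 | - | 22 | + | - |
| 384 | + | 26 | + | + |
| 395 | + | 24 | - | - |
| 403 | + | 29 | + | + |
| 400 | - | 28 | - | + |
| 385 | - | 27 | - | + |

c = number of + signs in the Dx·Dy column = 6; N = 10, so n = N - 1 = 9.

$$
r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = \pm\sqrt{\pm\frac{2 \times 6 - 9}{9}} = +\sqrt{\frac{1}{3}} = 0.5774
$$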

Ex – 39 Calculate the coefficient of correlation by the method of concurrent deviation from the following data:

| Year | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 |
|---|---|---|---|---|---|---|---|
| Supply | 150 | 154 | 160 | 172 | 160 | 165 | 180 |
| Demand | 200 | 180 | 170 | 160 | 190 | 180 | 172 |

Solution:

| Year | Supply X | Dx | Demand Y | Dy | Dx·Dy |
|---|---|---|---|---|---|
| 1998 | 150 | | 200 | | |
| 1999 | 154 | + | 180 | - | - |
| 2000 | 160 | + | 170 | - | - |
| 2001 | 172 | + | 160 | - | - |
| 2002 | 160 | - | 190 | + | - |
| 2003 | 165 | + | 180 | - | - |
| 2004 | 180 | + | 172 | - | - |

c = 0, as there is not a single + sign in the Dx·Dy column; N = 7, so n = N - 1 = 6.

$$
r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = \pm\sqrt{\pm\frac{2 \times 0 - 6}{6}} = \pm\sqrt{\pm(-1)}
$$

The square root of -1 is not a real number, so the negative sign is taken both inside and outside the square root:

$$
r_c = -\sqrt{-(-1)} = -\sqrt{1} = -1
$$

This indicates a perfect negative correlation between supply and demand.