The Spearman

July 16, 2022 | Author: Anonymous | Category: N/A

The Spearman's Rank Correlation Coefficient is used to discover the strength of a link between two sets of data. This example looks at the strength of the link between the price of a convenience item (a 50cl bottle of water) and distance from the Contemporary Art Museum in El Raval, Barcelona.

Example: The hypothesis tested is that prices should decrease with distance from the key area of gentrification surrounding the Contemporary Art Museum. The line followed is Transect 2 in the map below, with continuous sampling of the price of a 50cl bottle of water at every convenience store.

Hypothesis

We might expect to find that the price of a bottle of water decreases as distance from the Contemporary Art Museum increases. Higher property rents close to the museum should be reflected in higher prices in the shops. The hypothesis might be written like this:

The price of a convenience item decreases as distance from the Contemporary Art Museum increases.

The more objective scientific research method is always to assume that no such price-distance relationship exists and to express the null hypothesis as: there is no significant relationship between the price of a convenience item and distance from the Contemporary Art Museum.

Data collected (see data table below) suggests a fairly strong negative relationship, as shown in this scatter graph:

The scatter graph shows the possibility of a negative correlation between the two variables, and the Spearman's rank correlation technique should be used to see if there is indeed a correlation, and to test the strength of the relationship.
Spearman's Rank correlation coefficient

A correlation can easily be drawn as a scatter graph, but the most precise way to compare several pairs of data is to use a statistical test. This establishes whether the correlation is really significant or whether it could have been the result of chance alone. Spearman's Rank correlation coefficient is a technique which can be used to summarise the strength and direction (negative or positive) of a relationship between two variables. The result will always be between 1 and minus 1.

Create a table from your data.



 



Rank the two data sets. Ranking is achieved by giving the ranking '1' to the biggest number in a column, '2' to the second biggest value and so on. The smallest value in the column will get the lowest ranking (the largest rank number, which is 10 in a set of ten). This should be done for both sets of measurements.

 



Tied scores are given the mean (average) rank. For example, the three tied scores of 1 euro in the example below are ranked fifth in order of price, but occupy three positions (fifth, sixth and seventh) in a ranking hierarchy of ten. The mean rank in this case is calculated as (5+6+7) ÷ 3 = 6.
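The ranking rule, including the mean rank for tied scores, can be sketched in Python. This is a minimal illustration rather than part of the original method: the function name `rank_descending` is hypothetical, and the prices are those from the data table.

```python
def rank_descending(values):
    """Rank values so the largest gets rank 1; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for value in set(values):
        first = ordered.index(value) + 1        # first position the value occupies
        count = ordered.count(value)            # how many tied scores there are
        ranks[value] = first + (count - 1) / 2  # mean of the positions occupied
    return [ranks[v] for v in values]


# Price of a 50cl bottle (euros) at convenience stores 1 to 10
prices = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]
price_ranks = rank_descending(prices)
print(price_ranks)
```

The three 1.00 euro prices occupy positions 5, 6 and 7, so each receives the mean rank of 6.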

 



Find the difference in the ranks (d): This is the difference between the ranks of the two values on each row of the table. The rank of the second value (price) is subtracted from the rank of the first (distance from the museum).

 



Square the differences (d²) to remove negative values and then sum them (Σd²).

Data Table: Spearman's Rank Correlation

Convenience store  Distance from CAM (m)  Rank distance  Price of 50cl bottle (€)  Rank price  Difference between ranks (d)  d²
1                  50                     10             1.80                      2           8                             64
2                  175                    9              1.20                      3.5         5.5                           30.25
3                  270                    8              2.00                      1           7                             49
4                  375                    7              1.00                      6           1                             1
5                  425                    6              1.00                      6           0                             0
6                  580                    5              1.20                      3.5         1.5                           2.25
7                  710                    4              0.80                      9           -5                            25
8                  790                    3              0.60                      10          -7                            49
9                  890                    2              1.00                      6           -4                            16
10                 980                    1              0.85                      8           -7                            49
                                                                                                                             Σd² = 285.5
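The difference and squaring steps can be checked with a short Python sketch. The variable names are hypothetical; the ranks are copied from the data table.

```python
# Ranks for convenience stores 1 to 10, taken from the data table
rank_distance = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
rank_price = [2, 3.5, 1, 6, 6, 3.5, 9, 10, 6, 8]

# d: rank of distance minus rank of price, row by row
d = [rd - rp for rd, rp in zip(rank_distance, rank_price)]

# d squared removes the negative signs; the sum feeds into the formula
d_squared = [x ** 2 for x in d]
sum_d_squared = sum(d_squared)

print(d)
print(sum_d_squared)
```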

 

 



Calculate the coefficient (Rs) using the formula below. The answer will always be between 1.0 (a perfect positive correlation) and -1.0 (a perfect negative correlation). When written in mathematical notation the Spearman Rank formula looks like this:

$$R_s = 1 - \frac{6 \sum d^2}{n^3 - n}$$

Now to put all these values into the formula.

 



Find the sum of the d² values (Σd²) by adding up all the values in the d² column. In our example this is 285.5. Multiplying this by 6 gives 1713.

 



Now for the bottom line of the equation. The value n is the number of sites at which you took measurements. This, in our example, is 10. Substituting this value into n³ - n we get 1000 - 10 = 990.

We now have the formula: Rs = 1 - (1713/990), which gives a value for Rs:

1 - 1.73 = -0.73

What does this Rs value of -0.73 mean?

The closer Rs is to +1 or -1, the stronger the likely correlation. A perfect positive correlation is +1 and a perfect negative correlation is -1. The Rs value of -0.73 suggests a fairly strong negative relationship.
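The substitution above can be written as a small Python function. This is a sketch only: the name `spearman_rs` is hypothetical, and it takes the already-summed d² values and the number of sites as inputs.

```python
def spearman_rs(sum_d_squared, n):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (n^3 - n)."""
    return 1 - (6 * sum_d_squared) / (n ** 3 - n)


# Values from the worked example: sum of d squared = 285.5, n = 10 sites
rs = spearman_rs(285.5, 10)
print(round(rs, 2))  # -0.73
```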

A further technique is now required to test the significance of the relationship. The Rs value of -0.73 must be looked up on the Spearman Rank significance table below as follows:

 



Work out the 'degrees of freedom' you need to use. This is the number of pairs in your sample minus 2 (n-2). In the example it is 8 (10 - 2).

 

Now plot your result on the table.

 

If it is below the line marked 5%, then it is possible your result was the product of chance and you must reject the hypothesis.

 



If it is above the 0.1% significance level, then we can be 99.9% confident the correlation has not occurred by chance.

 



If it is above 1%, but below 0.1%, you can say you are 99% confident.

 



If it is above 5%, but below 1%, you can say you are 95% confident (i.e. statistically there is a 5% likelihood the result occurred by chance).

In the example, the value 0.73 gives a significance level of slightly less than 5%. That means that the probability of the relationship you have found being a chance event is about 5 in 100. You are 95% certain that your hypothesis is correct. The reliability of your sample can be stated in terms of how many researchers completing the same study as yours would obtain the same results: 95 out of 100.

Graph of significance levels for Spearman's Rank correlation coefficients using Student's t distribution
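The significance graph is based on Student's t distribution, so the look-up can be approximated in Python. This sketch makes assumptions that are not in the source: it converts Rs to a t statistic with the usual formula t = Rs × √((n - 2) / (1 - Rs²)), and it uses standard two-tailed critical t values for 8 degrees of freedom taken from a general t table, not from the graph itself. The function names are hypothetical.

```python
import math

# Assumed two-tailed critical t values for 8 degrees of freedom (n = 10)
CRITICAL_T_DF8 = {"5%": 2.306, "1%": 3.355, "0.1%": 5.041}


def t_statistic(rs, n):
    """Convert a rank correlation coefficient into a t statistic."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))


def significance_level(rs):
    """Return the strictest level passed for n = 10, or None if none is passed."""
    t = abs(t_statistic(rs, 10))
    passed = [level for level, critical in CRITICAL_T_DF8.items() if t >= critical]
    return passed[-1] if passed else None


print(round(t_statistic(-0.73, 10), 2))
print(significance_level(-0.73))
```

Under these assumptions the example result passes the 5% level but not the 1% level, which agrees with the reading taken from the graph.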

 



The fact two variables correlate cannot prove anything - only further research can actually prove that one thing affects the other.

 



Data reliability is related to the size of the sample. The more data you collect, the more reliable your result. 

 

 

 

 

Minimum Sample Size Calculation

The larger the size of the sample, the greater is the probability that it accurately reflects the distribution of the parent population. The example below shows how many pebble long axes are required to be measured at a beach site to obtain an average (or mean) at the 99% confidence level. The formula uses the standard deviation (the measure of the spread of data around the mean).

What is the expected pebble long axes size distribution?

There are, as is well known, lies, damned lies and statistics. And within statistics there is the bell curve. This is the shape of the frequency distribution one gets when conducting measurements of just about anything in the natural world. It first came to prominence in the early nineteenth century when Adolph Quetelet, the Belgian Astronomer Royal, collected data on the chest measurements of Scottish soldiers and the heights of French soldiers, and found that when both sets of measurements were plotted they tended to cluster in a symmetrical shape around a mean. Or, less technically, most soldiers were in a height range fairly close to the average. The bell curve became so ubiquitous in measurements of natural phenomena that it was eventually christened the 'normal distribution', and it has conditioned our thinking about statistical data ever since.

The example below simulates how the random distribution of dropping balls creates a bell curve. At first there does not seem to be any pattern, but after a few minutes the stacks conform to the superimposed curve.

What is the standard deviation?

The standard deviation is a statistic that tells you how tightly the pebble sizes are clustered around the mean. When the sizes are tightly clustered and the distribution curve is steep (see graph below), the standard deviation is small. When the examples are spread apart and the distribution curve is relatively flat, that tells you that there is a relatively large standard deviation.

Normal distribution curve (bell curve)

Key to graph colours above

Standard Deviation                                                                    % of Population
One standard deviation away from the mean in either direction on the horizontal axis  68%
Two standard deviations away from the mean                                            95%
Three standard deviations away from the mean                                          99%

Why is this useful?

Smaller standard deviations reflect more clustered data. More clustered data means fewer extreme values. A data set with fewer extreme values has a more reliable mean. The standard deviation is therefore a good measure of the reliability of the mean value. The formula is as follows:
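Written out, and assuming the sample form that divides by n - 1 (which is what Excel's STDEV function computes), the standard deviation s of n measurements x₁ … xₙ with mean x̄ is:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

The symbols s, xᵢ and x̄ are chosen here for illustration; the population form divides by n instead of n - 1.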

 

 

Is there an easy way to calculate it?

The Microsoft Excel programme will automatically calculate the standard deviation and mean for a set of data listed in a spreadsheet column. Method:

         



List data set in a single column



Click on the empty cell below the last data item



Open INSERT menu > FUNCTION > STDEV > click OK



The standard deviation is then shown and will appear in the empty cell.



The Excel screen example below is for a data set of 3 items.
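The same calculation can be sketched outside a spreadsheet with Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation, as Excel's STDEV does. The three values below are hypothetical and are not the ones in the Excel screen example.

```python
import statistics

# Hypothetical data set of 3 pebble long axes (cm), for illustration only
long_axes_cm = [8, 10, 12]

mean = statistics.mean(long_axes_cm)
sd = statistics.stdev(long_axes_cm)  # sample standard deviation (divides by n - 1)

print(mean)
print(sd)
```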

Example

The standard deviation for a pebble data set is shown below:

30 pebble long axes, Beach 18, Sitges, Site 1

Pebble number  Long Axis (cms)
1              10
2              9
3              8
4              8
5              16
6              12
7              8.5
8              10
9              12
10             9
11             13
12             14
13             10
14             14
15             17
16             12
17             6
18             17
19             9
20             5
21             10
22             7.5
23             13
24             13
25             7.5
26             15
27             12
28             8
29             22
30             16

Mean                11.20
Standard Deviation  3.81

The sample size for a 99% confidence level, with the mean pebble long axis within +/- 0.1 cm, is calculated using the following formula:

$$n = \left( \frac{z \sigma}{d} \right)^2$$

where σ is the standard deviation (3.81), d is the accepted margin either side of the mean, and z is the number of standard deviations for the chosen confidence level (3 for 99%, 2 for 95%, 1 for 68%).

i.e. 11.43 ÷ 0.1 = 114.3 (11.43 is three standard deviations: 3 × 3.81)
n = 114.3²
Minimum sample required = 13,064

The time taken to measure over 13,000 pebbles suggests it is better to accept a lower level of confidence, and at the 95% level, with a mean pebble long axis within +/- 0.5 cm, a sample size of 30 is still inadequate. This is calculated as follows:

i.e. 7.62 ÷ 0.5 = 15.24 (7.62 is two standard deviations: 2 × 3.81)
n = 15.24²
Minimum sample required = 232.26

You may, given time constraints, have to accept a 68% level of confidence, with a mean pebble long axis within +/- 0.5 cm. The minimum sample size is calculated as follows:

i.e. 3.81 ÷ 0.5 = 7.62 (3.81 is one standard deviation)
n = 7.62²
Minimum sample required = 58.06

It is therefore necessary to measure a minimum of 58 pebble long axes, given the pilot data at this site.
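The three sample size calculations follow the same arithmetic, which can be sketched as one Python function. The function and parameter names are hypothetical; the inputs are the standard deviation of 3.81 from the pilot data, the accepted margin in cm, and the number of standard deviations used for each confidence level in the text (3, 2 and 1).

```python
def minimum_sample_size(standard_deviation, margin, z):
    """Minimum sample size: (z * standard deviation / margin) squared."""
    return (z * standard_deviation / margin) ** 2


SD = 3.81  # standard deviation of the 30 pilot pebbles (cm)

n_99 = minimum_sample_size(SD, margin=0.1, z=3)  # 99% level, within 0.1 cm
n_95 = minimum_sample_size(SD, margin=0.5, z=2)  # 95% level, within 0.5 cm
n_68 = minimum_sample_size(SD, margin=0.5, z=1)  # 68% level, within 0.5 cm

print(round(n_99), round(n_95, 2), round(n_68, 2))
```

Halving the margin quadruples the required sample, which is why the 0.1 cm target is so much more demanding than the 0.5 cm one.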

 

Example: Service structure data for selected urban areas can be plotted on a three-sided triangular graph. The important features of a triangular graph are:

     



Each axis is divided into 100, representing percentages.



From each 100-0% axis, lines are drawn at angles of 60 degrees to carry the values.



The data used must be in the form of three components, each component representing a percentage value, and the three component percentage values must add up to 100 per cent.

The position of the plots indicates the relative dominance of each of the three components, and the value of the graph arises in giving a quick visual comparison of contrasting component dominance for different areas. It is particularly useful in identifying changes over time, since a position on the graph will change as the relative dominance of the components changes.

The graph can be used to show contrasting service structures for 4 locations in El Raval, an inner-city area of Barcelona which has been the subject of radical urban reform. The choice of the three graph components is important and must be in the context of the investigation. An example of data from one location (El Raval Site 2) is shown in map 1 below, and this has been used along with data from three other sites (1, 3 and 4) to compile the triangular graph.

Key: Gentrification, Immigrant Services, Local Services, Professional Services, Services of Poverty, Training Centres, Workshops

Map 1: Service Structure in El Raval, Site 2

Service Structure Data Summary Chart for Sites 1-4

El Raval Service Structure

Service                 Site 1  Site 2  Site 3  Site 4
% Gentrification        60      11.4    3       0
% Immigrant Services    5       15.2    20      50
% Other Local Services  35      73.4    76      50
Total                   100     100     100     100

Data example is for training purposes only. Its accuracy cannot be guaranteed.

 

 

Triangular graph to show the contrasting service structure for four areas of El Raval
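The position of each site on a triangular graph can be worked out by converting its three percentages into x, y coordinates inside an equilateral triangle. The Python sketch below rests on assumptions not taken from the source: the function name `ternary_to_xy` is hypothetical, and the corner layout (gentrification at the top, immigrant services bottom left, other local services bottom right) may differ from the original graph. Site 3's values as printed sum to 99, so the sketch rescales each row to a total of 1 before plotting.

```python
import math


def ternary_to_xy(a, b, c):
    """Map three components to a point in an equilateral triangle of side 1.

    Assumed corners: a = top (0.5, sqrt(3)/2), b = bottom left (0, 0),
    c = bottom right (1, 0).
    """
    total = a + b + c
    a, b, c = a / total, b / total, c / total  # rescale so the shares sum to 1
    x = c + a / 2
    y = a * math.sqrt(3) / 2
    return x, y


# (% gentrification, % immigrant services, % other local services)
sites = {
    "Site 1": (60, 5, 35),
    "Site 2": (11.4, 15.2, 73.4),
    "Site 3": (3, 20, 76),
    "Site 4": (0, 50, 50),
}

points = {name: ternary_to_xy(*values) for name, values in sites.items()}
for name, (x, y) in points.items():
    print(name, round(x, 3), round(y, 3))
```

A site dominated by one component plots close to that component's corner, so Site 1 sits nearest the gentrification corner while Site 4 lies on the bottom edge, midway between the two service corners.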
