Lesson 4 Measure of Central Tendency

February 18, 2019 | Author: rahmanzaini | Category: Mode (Statistics), Arithmetic Mean, Median, Skewness, Symmetry


LESSON 4 Measures of Central Tendency

Introduction

Histograms and polygons give a general idea of how data are distributed. When data sets are to be compared, or when further statistical analysis is to be done, exact measures are required to describe the characteristics of the data. These numerical measures are also referred to as summary statistics.

In this lesson and the following, we will discuss two measures that can be used to describe the characteristics of a distribution. They are measures of central tendency and measures of dispersion.

LEARNING OUTCOMES

Upon completion of this lesson, you should be able to:
a) discuss the mean, mode and median as measures of central tendency;
b) discuss the advantages and disadvantages of each measure of central tendency;
c) find the mean, mode and median for ungrouped and grouped data from a given data set.


Measure of Central Tendency

When statisticians study a group of measurements, they try to determine which measure is most representative of the group. The score about which most of the other scores tend to cluster is a measure of central tendency. A measure of central tendency is an average that represents the data; it pinpoints the center of the data. These measures are commonly known as averages. We will discuss three averages:

i) arithmetic mean (or simply the mean),
ii) median, and
iii) mode.

Let us see how these measures are calculated from raw data and from ungrouped and grouped frequency distributions. Note that all measures presented here are computed from sample data.

Raw Data and Ungrouped Frequency Distributions

Arithmetic mean

If we have a sample of n observations, $x_1, x_2, x_3, \ldots, x_n$, the sample mean, denoted by $\bar{X}$, is defined as the sum of all observations divided by the sample size:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Referring to example 3, the mean number of train stations a passenger passes before alighting from the train is

$$\bar{X} = \frac{3 + 4 + 4 + 2 + 2 + 4 + 1 + 2 + 2 + 0 + 3 + 2 + 3 + 2 + 1 + 3 + 2 + 2 + 1 + 2}{20} = \frac{45}{20} = 2.25$$

This means that on average, a passenger would pass approximately 2 train stations before getting off the train.
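The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` is an illustrative choice, and the list holds the 20 observations quoted in the text.

```python
# Number of train stations passed by each of the 20 passengers (from the text).
stations = [3, 4, 4, 2, 2, 4, 1, 2, 2, 0, 3, 2, 3, 2, 1, 3, 2, 2, 1, 2]

def sample_mean(values):
    """Sum of all observations divided by the sample size."""
    return sum(values) / len(values)

print(sum(stations), len(stations))  # 45 20
print(sample_mean(stations))         # 2.25
```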

From the ungrouped frequency distribution in Table 1, we calculate the mean using the following formula:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{n}, \quad \text{where } n = \sum_{i=1}^{k} f_i$$

and k = the number of class intervals (here, the number of distinct values).

Table 1

| Number of train stations passed, X | Number of passengers, f | fX |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 3 | 3 |
| 2 | 9 | 18 |
| 3 | 4 | 12 |
| 4 | 3 | 12 |
| Total | Σf = 20 | Σfx = 45 |

Hence, we have the mean, $\bar{X} = \frac{45}{20} = 2.25$.

In obtaining the mean, all the observations from the sample or population are considered. Therefore, if the data contain extreme values (very large or very small), the mean is not a suitable measure to represent the distribution of the data. The median would be a better measure of central tendency.
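Both points above, the frequency-table formula and the sensitivity of the mean to extreme values, can be illustrated with a short sketch. The frequencies are those of the train-station data; the extreme value of 40 is a hypothetical observation introduced only for illustration, and `mean_from_frequencies` is an assumed helper name.

```python
import statistics

# Train-station data: number of stations passed -> number of passengers.
table_1 = {0: 1, 1: 3, 2: 9, 3: 4, 4: 3}

def mean_from_frequencies(freq):
    """Mean of an ungrouped frequency distribution: sum(f * x) / sum(f)."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n

print(mean_from_frequencies(table_1))  # 2.25

# Hypothetical illustration (not from the text): replace one observation
# of 4 stations with an extreme value of 40, then compare mean and median.
original = [x for x, f in table_1.items() for _ in range(f)]
with_extreme = original[:-1] + [40]

print(statistics.mean(original), statistics.median(original))
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```

In this sketch the single extreme value pulls the mean well above 2.25, while the median stays at 2.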

Example 1

The marks of five candidates in a mathematics test with a maximum possible mark of 20 are given below.

15, 13, 19, 18, 14

Find the mean value.

Solution:

$$\bar{X} = \frac{15 + 13 + 19 + 18 + 14}{5} = \frac{79}{5} = 15.8$$

So, the mean value is 15.8.

Example 2

A survey was taken in a Mathematics class regarding the number of story books read by each student in January. The table shows the class data with the frequency of responses. The mean of this data is 2.5. Find the value of k in the table.

| Books | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Frequency | 5 | k | 8 | 4 | 1 |

Solution

$$\frac{1(5) + 2(k) + 3(8) + 4(4) + 5(1)}{5 + k + 8 + 4 + 1} = 2.5$$

$$\frac{50 + 2k}{18 + k} = 2.5$$

$$50 + 2k = 45 + 2.5k$$

$$0.5k = 5$$

$$k = 10$$
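The algebra in Example 2 can be sketched in code as well. The helper `missing_frequency` below is hypothetical (it is not part of the lesson); it rearranges (S + x·k) / (N + k) = mean into k = (S − mean·N) / (mean − x), where S and N are the sum of f·x and the sum of f over the known rows.

```python
from fractions import Fraction

def missing_frequency(known, x_missing, target_mean):
    """Solve (S + x_missing * k) / (N + k) = target_mean for k.

    known maps each value to its known frequency; target_mean is given
    as a string so that Fraction keeps the arithmetic exact.
    Raises ZeroDivisionError if target_mean equals x_missing.
    """
    s = sum(x * f for x, f in known.items())
    n = sum(known.values())
    m = Fraction(target_mean)
    return (s - m * n) / (m - x_missing)

# Example 2: the frequency for 2 books is unknown, and the mean is 2.5.
known_rows = {1: 5, 3: 8, 4: 4, 5: 1}
k = missing_frequency(known_rows, x_missing=2, target_mean="2.5")
print(k)  # 10
```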


Median

The median, denoted by $\tilde{X}$, is the middle value of the observations after they have been arranged in ascending or descending order. If the number of observations is odd, the median is the middle value; if the number of observations is even, the median is the mean of the two middle values.

Let’s take the data from example 3 and arrange them in ascending order as follows.

0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4

In this case, the number of observations is even; therefore the median is midway between the tenth and the eleventh values. Hence, $\tilde{X} = \frac{2 + 2}{2} = 2$. In other words, 50% of the observations lie at or below the median and the other 50% lie at or above it. Since the median depends only on the middle of the ordered data, it is not affected by extreme values. In an ungrouped frequency distribution, the median is obtained by looking at the point where 50% of the cumulative frequency lies. The same value is obtained using the ungrouped frequency distribution in Table 1.
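As a rough sketch of the cumulative-frequency approach, the code below walks through the frequency table of the train-station data until the running total reaches the required position. The function names are illustrative assumptions, not part of the lesson.

```python
# Train-station data: number of stations passed -> number of passengers.
table_1 = {0: 1, 1: 3, 2: 9, 3: 4, 4: 3}

def value_at_position(freq, position):
    """Value of the observation at a 1-based position in sorted order."""
    cumulative = 0
    for x in sorted(freq):
        cumulative += freq[x]
        if position <= cumulative:
            return x
    raise ValueError("position exceeds the number of observations")

def median_from_frequencies(freq):
    """Median of an ungrouped frequency distribution."""
    n = sum(freq.values())
    if n % 2 == 1:
        return value_at_position(freq, (n + 1) // 2)
    # Even n: average the two middle observations.
    lower = value_at_position(freq, n // 2)
    upper = value_at_position(freq, n // 2 + 1)
    return (lower + upper) / 2

print(median_from_frequencies(table_1))  # 2.0
```

With n = 20, the tenth and eleventh observations both fall in the row for 2 stations (cumulative frequency 13), which agrees with the value found from the ordered list.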

Example 3

The marks of five candidates in a geography test for which the maximum possible mark was 20 are given below:

19, 18, 16, 15, 20

Find the median mark.

Solution:

Arrange the marks in ascending order of magnitude:

15, 16, 18, 19, 20

The third score, 18, is the middle one in this arrangement.

Median = 18


Note: In general, if the number of values in the data set is even, then the median is the average of the two middle values.

Example 4

Find the median of the following scores:

11, 17, 15, 20, 9, 12

Solution:

Arrange the score values in ascending order of magnitude:

9, 11, 12, 15, 17, 20

There are 6 scores in the data set.

The third and fourth scores, 12 and 15, are in the middle. That is, there is no single middle value, so the median is the average of these two scores:

$$\text{Median} = \frac{12 + 15}{2} = 13.5$$


Note:

Half of the values in the data set lie below the median and half lie above the median.
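The odd and even cases from Examples 3 and 4 can be combined in one small function. This is a minimal sketch; the name `median` is chosen for illustration, and Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two

print(median([19, 18, 16, 15, 20]))     # 18   (Example 3)
print(median([11, 17, 15, 20, 9, 12]))  # 13.5 (Example 4)
```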

Mode

The mode, denoted by $\hat{X}$, is the most frequently occurring value among the observations. For the data in example 3, we find that the most frequently occurring value is 2. This means that more passengers pass 2 train stations before alighting than any other number of stations.

In an ungrouped frequency distribution, we determine the mode by looking at the highest frequency, in this case 9. Hence the mode is 2.

The concept of the mode is easy to understand, and the mode is simple to obtain. The mode is not affected by extreme values. The disadvantage of this measure of central tendency is that it might not exist or might not be unique: a set of observations can have no mode, one mode, or two or more modes.

Example 5

The marks awarded to seven pupils for an assignment were as follows:

19, 15, 19, 16, 13, 20, 19

a. Find the median mark.
b. State the mode.

Solution:

a. Arrange the marks in ascending order of magnitude:

13, 15, 16, 19, 19, 19, 20

The fourth score, 19, is the middle data value in this arrangement.

Median = 19

b. 19 is the score that occurs most often.

Mode = 19
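The possibility of no mode, one mode, or several modes can be sketched as follows. The convention used here, that a data set has no mode when two or more distinct values all occur equally often, is an assumption for illustration; textbooks differ on this point, and the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency (possibly none)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if several distinct values all tie, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(x for x, c in counts.items() if c == highest)

print(modes([19, 15, 19, 16, 13, 20, 19]))  # [19]   (Example 5)
print(modes([1, 1, 2, 2, 3]))               # [1, 2] two modes
print(modes([1, 2, 3]))                     # []     no mode
```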

Grouped Frequency Distributions

For data that have been summarized in grouped frequency distributions, the measures of central tendency computed are only estimates of the true value. This is because accuracy has been lost when summarizing the data.

Arithmetic Mean

In finding the mean from a grouped frequency distribution, we choose one value from each class interval as a representative. This value is the class mark. We denote the class mark as m. The formula for the mean is given below.


$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{n}, \quad \text{where } n = \sum_{i=1}^{k} f_i \text{ and } k = \text{number of class intervals}$$

Using the grouped frequency distribution in Table 1, the mean speed of the 55 cars can be estimated. We add more columns to the table so that the values to be substituted into the formula are easy to read off.

| Speed (km/h), X | Number of cars, f | Class mark, x | fx |
|---|---|---|---|
| 45 - 49 | 4 | 47 | 188 |
| 50 - 54 | 14 | 52 | 728 |
| 55 - 59 | 19 | 57 | 1083 |
| 60 - 64 | 7 | 62 | 434 |
| 65 - 69 | 5 | 67 | 335 |
| 70 - 74 | 4 | 72 | 288 |
| 75 - 79 | 2 | 77 | 154 |
| Total | Σf = 55 | | Σfx = 3210 |

$$\bar{X} = \frac{3210}{55} \approx 58.36 \text{ km/h}$$

Notice that when we compare this value with the actual sample mean, there is a difference. This is due to the loss of accuracy when the data are grouped.

Example 6

Work out an estimate for the mean height.

| Height (cm), X | Number of people, f | Mid-point, x | fx |
|---|---|---|---|
| 101 - 120 | 1 | 110.5 | 110.5 |
| 121 - 130 | 3 | 125.5 | 376.5 |
| 131 - 140 | 5 | 135.5 | 677.5 |
| 141 - 150 | 7 | 145.5 | 1018.5 |
| 151 - 160 | 4 | 155.5 | 622 |
| 161 - 170 | 2 | 165.5 | 331 |
| 171 - 190 | 1 | 180.5 | 180.5 |
| Total | Σf = 23 | | Σfx = 3316.5 |

$$\text{mean} = \bar{X} = \frac{\sum fx}{\sum f} = \frac{3316.5}{23} \approx 144 \text{ cm (3 sf)}$$
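The grouped-mean estimate used for the car speeds and for Example 6 can be sketched in a few lines. Each class is written as a hypothetical tuple (lower limit, upper limit, frequency), and the class mark is taken as the midpoint of the two limits, as in the tables above.

```python
# Each class: (lower limit, upper limit, frequency).
speed_classes = [(45, 49, 4), (50, 54, 14), (55, 59, 19), (60, 64, 7),
                 (65, 69, 5), (70, 74, 4), (75, 79, 2)]
height_classes = [(101, 120, 1), (121, 130, 3), (131, 140, 5), (141, 150, 7),
                  (151, 160, 4), (161, 170, 2), (171, 190, 1)]

def grouped_mean(classes):
    """Estimate the mean as sum(f * class mark) / sum(f)."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n

print(round(grouped_mean(speed_classes), 2))  # 58.36
print(round(grouped_mean(height_classes)))    # 144
```

Because every observation in a class is replaced by the class mark, the result is only an estimate of the mean of the raw data.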

Median

To estimate the median from a grouped frequency distribution, we must first locate the class interval containing the $\frac{n}{2}$th observation. We call this class interval the median class. The estimated value of the median can be obtained using the following formula.

$$\tilde{X} = L + \left(\frac{\frac{n}{2} - \sum f_{m-1}}{f_m}\right) c$$

where

$L$ = lower class boundary of the median class
$\sum f_{m-1}$ = cumulative frequency of the classes before the median class
$f_m$ = frequency of the median class
$c$ = width of the median class

Referring to the grouped frequency distribution in Table 1, the median class is 55 – 59.

| Speed (km/h), X | Number of cars, f | Class mark, x | fx |
|---|---|---|---|
| 45 - 49 | 4 | 47 | 188 |
| 50 - 54 | 14 | 52 | 728 |
| 55 - 59 | 19 | 57 | 1083 |
| 60 - 64 | 7 | 62 | 434 |
| 65 - 69 | 5 | 67 | 335 |
| 70 - 74 | 4 | 72 | 288 |
| 75 - 79 | 2 | 77 | 154 |
| Total | Σf = 55 | | Σfx = 3210 |

The following values are obtained and substituted into the formula for the median.

$\frac{n}{2} = \frac{55}{2} = 27.5$
lower class boundary of the median class, $L = 54.5$
$\sum f_{m-1} = 4 + 14 = 18$
$f_m = 19$
$c = 5$

$$\text{Median speed, } \tilde{X} = 54.5 + \left(\frac{27.5 - 18}{19}\right)(5) = 57 \text{ km/h}$$

This means that about 50% of the cars are being driven at 57 km/h or less and about 50% of the cars are being driven above 57 km/h.
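The interpolation for the grouped median can be sketched as below. The sketch assumes integer class limits with a gap of 1 between consecutive classes (as in the speed table), so that class boundaries lie 0.5 below the lower limit and 0.5 above the upper limit; the tuple layout and the function name `grouped_median` are illustrative choices.

```python
# Each class: (lower limit, upper limit, frequency), in ascending order.
speed_classes = [(45, 49, 4), (50, 54, 14), (55, 59, 19), (60, 64, 7),
                 (65, 69, 5), (70, 74, 4), (75, 79, 2)]

def grouped_median(classes):
    """Estimate the median by linear interpolation inside the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative_before = 0
    for lo, hi, f in classes:
        if cumulative_before + f >= half:
            lower_boundary = lo - 0.5   # assumed boundary convention
            width = (hi + 0.5) - lower_boundary
            return lower_boundary + (half - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("median class not found")

print(grouped_median(speed_classes))  # 57.0
```

For the speed data the loop stops at the class 55 – 59, with 18 observations before it, which matches the values substituted by hand.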

Mode

The class interval with the highest frequency is identified as the modal class. The formula for the mode of a grouped frequency distribution is

$$\hat{X} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) c$$

where

$L$ = lower class boundary of the modal class
$\Delta_1$ = frequency of the modal class – frequency of the class before the modal class
$\Delta_2$ = frequency of the modal class – frequency of the class after the modal class
$c$ = width of the modal class

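The modal class for Table 1 is 55 – 59. It is a coincidence that the median class and the modal class are the same in this particular example; do not generalize from it. There are times when the median class and the modal class are not the same. We obtain the following values to be substituted into the formula for the mode.

lower class boundary of the modal class, $L = 54.5$
$\Delta_1 = 19 - 14 = 5$ (modal class frequency minus the frequency of the class before)
$\Delta_2 = 19 - 7 = 12$ (modal class frequency minus the frequency of the class after)
$c = 5$

$$\hat{X} = 54.5 + \left(\frac{5}{5 + 12}\right)(5) \approx 55.97 \text{ km/h}$$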

This means that the most common speed is approximately 56 km/h, which is below the city speed limit. When computing the mean, all the observations are taken into consideration, but computing the median only involves the scores in the middle of the distribution. Hence, whenever we have extreme scores in the distribution, or when the distribution is skewed, the median is a better measure for the average.
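The grouped-mode calculation for the speed data can be sketched in the same style. Several choices here are assumptions rather than part of the lesson: class boundaries are taken as the limits ± 0.5, a missing neighbouring class is treated as having frequency 0, the first class is used if several share the highest frequency, and the class midpoint is returned when both differences are zero.

```python
# Each class: (lower limit, upper limit, frequency), in ascending order.
speed_classes = [(45, 49, 4), (50, 54, 14), (55, 59, 19), (60, 64, 7),
                 (65, 69, 5), (70, 74, 4), (75, 79, 2)]

def grouped_mode(classes):
    """Estimate the mode from the modal class and its two neighbours."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # first class with the highest frequency
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before       # difference with the class before
    d2 = freqs[i] - after        # difference with the class after
    lo, hi, _ = classes[i]
    lower_boundary = lo - 0.5    # assumed boundary convention
    width = (hi + 0.5) - lower_boundary
    if d1 + d2 == 0:
        return lower_boundary + width / 2  # assumed fallback: class midpoint
    return lower_boundary + d1 / (d1 + d2) * width

print(round(grouped_mode(speed_classes), 2))  # 55.97
```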

Advantages of the MEAN:

• The calculation of the arithmetic mean is based on all values in the data set.
• The arithmetic mean is simple to calculate and it is unique; that is, every data set has one and only one mean.
• The arithmetic mean is a reliable single value that reflects all values in the data set.
• It can be used for further statistical calculations and mathematical manipulations.

Disadvantages of the MEAN:

• It is easily affected by extreme values.
• In grouped data with open-ended class intervals, the mean cannot be computed.
• It is not appropriate for highly skewed data.

Advantages of the MEDIAN:

• It is very simple to understand and easy to calculate. In some cases it is obtained simply by inspection.
• The median lies at the middle of the series and hence is not affected by extreme values.
• It can be computed even for grouped data with open-ended class intervals.
• In a grouped frequency distribution it can be located graphically by drawing ogives.
• It is especially useful in open-ended distributions, since it is the position rather than the value of an item that matters for the median.


Disadvantages of the MEDIAN:

• In a simple data set, the values have to be arranged in order. If the series contains a large number of items, the process becomes tedious.
• It is a less representative average because it does not depend on all the items in the series.
• Medians of different data sets cannot be combined directly: the observations have to be merged to obtain a new median, whether grouped or ungrouped data are involved.
• In a simple data set with an even number of items, the median cannot be found exactly; it is taken as the mean of the two middle values.
• Moreover, the interpolation formula applied to continuous series is based on the unrealistic assumption that the frequency of the median class is evenly spread over the class interval.
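The point about merging observations can be illustrated with a small sketch that reuses the marks from Examples 3 and 4. The helper `combined_mean` is hypothetical; it shows that two means can be combined from the means and sample sizes alone, whereas the median of the combined data has to be recomputed from the merged observations.

```python
import math
import statistics

group_a = [15, 16, 18, 19, 20]     # marks from Example 3
group_b = [9, 11, 12, 15, 17, 20]  # scores from Example 4

def combined_mean(mean_a, n_a, mean_b, n_b):
    """Weighted average of two group means."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)

merged = group_a + group_b

# The mean of the merged data follows from the two group means.
from_means = combined_mean(statistics.mean(group_a), len(group_a),
                           statistics.mean(group_b), len(group_b))
print(math.isclose(from_means, statistics.mean(merged)))  # True

# The medians are 18 and 13.5, but the merged median is 16,
# which is not the average of the two group medians (15.75).
print(statistics.median(group_a), statistics.median(group_b))
print(statistics.median(merged))
```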

Advantages of the MODE:

• The mode is easy to understand and to calculate. The modal class can also be located by inspection.
• The mode is not affected by extreme values in the distribution.
• The mode can also be calculated for open-ended frequency distributions.
• The mode can be used to describe quantitative as well as qualitative data. For example, it is used for comparing consumer preferences for various types of products, say cigarettes, soaps, toothpastes, or other products.

Disadvantages of the MODE:

• The mode is not a rigidly defined measure, as there are several methods for calculating its value.
• It is difficult to locate the modal class in the case of multi-modal frequency distributions.
• The mode is not suitable for algebraic manipulations.


Skewness Definition

Skewness in statistics is defined with respect to symmetry; in fact, it is the opposite of symmetry. Symmetry is a concept used to describe the shape of a distribution in its graphical representation. A distribution is said to be symmetric if it looks the same on the left and right sides of the center point (refer to Figure 1). The vertical line through the center point is called the axis of symmetry. This graph shows an example of a symmetric distribution.

Figure 1

Here, for a symmetric distribution with a single peak, the measures of central tendency (mean, median and mode) are equal to each other, and the axis of symmetry, which is the ordinate at the mean, divides the distribution into two equal parts such that one side is a mirror image of the other.

So we can define skewness as a measure of the asymmetry of a distribution; that is, it measures how far the distribution departs from symmetry. It describes which side of the distribution has the longer tail.

On the basis of shape, skewness can be interpreted in three ways.

Positive skewness: If the right tail is longer than the left tail in the graph of the distribution, the distribution is said to have positive skewness (refer to Figure 2). The presence of extreme observations on the right-hand side of a distribution makes it positively skewed.


So, if mean > median > mode in a distribution, the distribution can be said to be positively skewed.

Figure 2

When the distribution is skewed to the right (positively skewed), the mean is the largest of the three averages, followed by the median and then the mode. Why is this so? This is because the mean is affected by the extreme values more than the median and the mode are. Therefore, when we have a positively skewed distribution, the mean is not a suitable average to describe the distribution. The median and the mode would be more appropriate.

Negative skewness: If the left tail is longer than the right tail in the graph of the distribution, the distribution is said to have negative skewness (refer to Figure 3). The presence of extreme observations on the left-hand side of a distribution makes it negatively skewed.


So, if mean < median < mode in a distribution, the distribution can be said to be negatively skewed.

Figure 3

Zero skewness: If the two tails are of the same length and shape, then we say that the distribution has zero skewness (Figure 4). The distribution is then symmetric; the normal distribution is one example.

Figure 4


It is useful to report all three measures because their relative positions can provide some idea about the shape of the distribution. We give some of the common cases encountered. In summary the different types of graphical representation of skewness are presented below
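The rule of thumb relating the three averages to the direction of skewness can be written as a small function. This is only a sketch of the comparisons stated in this lesson, not a formal skewness statistic; the function name and the "inconclusive" label for orderings the lesson does not cover are assumptions.

```python
def skew_from_averages(mean, median, mode):
    """Classify skewness from the relative positions of the three averages."""
    if mean > median > mode:
        return "positive"
    if mean < median < mode:
        return "negative"
    if mean == median == mode:
        return "zero"
    return "inconclusive"  # ordering not covered by the rule of thumb

# Estimates obtained earlier for the car-speed data (km/h).
print(skew_from_averages(58.36, 57, 55.97))  # positive
```

For the car-speed data the mean exceeds the median, which exceeds the mode, so by this rule the distribution is positively skewed.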


Exercise 1

1. The table displays the frequency of scores on a Science quiz (max score 10). Find the median of the scores.

| Score | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|
| Frequency | 1 | 5 | 8 | 14 | 12 | 7 |

2. The table displays the number of cars owned per family among students in Form 5 Science 1. Find the mean, median and mode of the number of cars owned per family for this data set. Express answers to the nearest hundredth.

| Cars owned | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Frequency | 2 | 5 | 4 | 6 | 10 | 8 |

3. Four students take an IQ test. Their scores are 96, 100, 106, 114. Find the mean and median scores.

4. Find the mean, median and mode for this grouped data of test scores.

| Scores | Frequency |
|---|---|
| 65 | 2 |
| 70 | 3 |
| 75 | 2 |
| 80 | 5 |
| 85 | 8 |
| 90 | 7 |
| 95 | 5 |
| 100 | 3 |


5. The table shows the number of hours (x) children spent watching television in a week.

| Class Interval | Mid-point (x) | Frequency (f) | fx |
|---|---|---|---|
| 10 ≤ x < 15 | | 42 | |
| 15 ≤ x < 20 | | 38 | |
| 20 ≤ x < 25 | | 45 | |
| 25 ≤ x < 30 | | 38 | |
| | | Σf = | Σfx = |

First complete the table. Using the information, answer the questions:
(a) Find the modal class.
(b) Estimate the median hours of watching TV (whole number).
(c) Estimate the mean hours of watching TV (2 d.p.).

6. The table shows the number of visits (x) that patients at a surgery make to the doctor in a year.

| Class Interval | Mid-point (x) | Frequency (f) | fx |
|---|---|---|---|
| 0 ≤ x < 5 | | 5 | |
| 5 ≤ x < 10 | | 47 | |
| 10 ≤ x < 15 | | 11 | |
| | | Σf = | Σfx = |

First complete the table. Using this information, answer the following questions.
(a) What is the modal class?
(b) Estimate the median number of visits made to the surgery in a year (whole number).
(c) Estimate the mean number of visits made to the surgery in a year (2 d.p.).

7. The following frequency distribution shows the quiz scores of a sample of students:

| Score | Frequency |
|---|---|
| 14 - 18 | 2 |
| 19 - 23 | 5 |
| 24 - 28 | 12 |
| 29 - 33 | 1 |

For the above data, compute the following.
a. The mean
b. The standard deviation
