
This article was downloaded by: [University of Otago] On: 31 December 2014, At: 11:16 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

The Journal of Experimental Education Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/vjxe20

The Advanced Raven’s Progressive Matrices Steven M. Paul


University of California, Berkeley Published online: 16 Apr 2014.

To cite this article: Steven M. Paul (1986) The Advanced Raven’s Progressive Matrices, The Journal of Experimental Education, 54:2, 95-100, DOI: 10.1080/00220973.1986.10806404 To link to this article: http://dx.doi.org/10.1080/00220973.1986.10806404

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Downloaded by [University of Otago] at 11:17 31 December 2014

The Advanced Raven's Progressive Matrices: Normative Data for an American University Population and an Examination of the Relationship with Spearman's g

STEVEN M. PAUL
University of California, Berkeley

ABSTRACT Normative data for the Advanced Raven's Progressive Matrices are presented based on 300 University of California, Berkeley, students. Correlations with the Wechsler Adult Intelligence Scale and the Terman Concept Mastery Test are reported. The relationship between the Advanced Raven's Progressive Matrices and Spearman's g is explored.

THE RAVEN'S PROGRESSIVE MATRICES (RPM) are the best known and most widely used culture-reduced tests of mental ability. British geneticist Lionel Penrose and British psychologist J. C. Raven were the first to present perceptual analogy and inductive reasoning problems in the form of a matrix. In their matrices the perceptual analogies involve horizontal and vertical transformations simultaneously. The variety of figures, relationships, and transformations is virtually limitless. Figures may increase or decrease in size; elements may be added or subtracted, shaded or unshaded, flipped, rotated, or mirror imaged, or may show many other progressive changes in pattern. In each case the lower right corner of the matrix is missing, and the subject must select the best of six or eight multiple-choice alternatives to fill the empty corner.

Raven described the Progressive Matrices as "a test of a person's present capacity to form comparisons, reason by analogy, and develop a logical method of thinking, regardless of previously acquired information" (Raven, 1938, p. 12). He was responsible for publishing the first Progressive Matrices test and its subsequent improvements and extensions (Raven, 1938, 1947, 1960). Three forms of the RPM are now in use: the Standard Progressive Matrices (SPM), the Colored Progressive Matrices (CPM), and the Advanced Progressive Matrices (APM). Considerable research has been conducted with the SPM and CPM, but little information is available concerning the APM. Adequate standardization norms are lacking in the United States, and research is notably absent for university students, the type of population the APM is best suited to measure.

The Standard Progressive Matrices consists of 60 items grouped in five sets (A, B, C, D, E) of 12 items each. Each set involves different principles of matrix transformation, and within each set the items become progressively more difficult. The SPM was designed to cover the widest possible range of mental ability and to be equally useful with persons of all ages, whatever their education, nationality, or physical condition. The scale is intended to cover the entire range of intellectual development, starting from the time a child is able to grasp the idea of finding a missing piece to complete a pattern. It is long enough to assess a person's maximum capacity to form comparisons and reason by analogy without being overly taxing or unwieldy. A person's total score provides an index of his intellectual ability. The scores obtained by adults tend to cluster in the upper half of the scale.

The Colored Progressive Matrices, Sets A, Ab, and B, were devised as a test for young children and old people, for anthropological studies, and for clinical work. The CPM can be used with people who, for whatever reason, cannot understand or speak the English language, suffer from physical disabilities, are intellectually subnormal, or have deteriorated. To make the test independent of verbal instructions, the problems are printed on colored backgrounds, and the scale is arranged so that it can be presented either as illustrations printed in a book or as boards with movable pieces. Success in Set Ab depends on comprehending discrete figures as spatially related "wholes"; in combination with Sets A and B, this set adequately covers the cognitive processes of which children under 11 years of age are usually capable.

The Advanced Progressive Matrices, Sets I and II, were constructed as a test of intellectual efficiency that can be used with people of more than average intellectual ability and that will differentiate clearly between individuals of even superior ability. The difficulty level of the APM makes it unsuitable for persons scoring below a raw score of about 50 on the SPM, and for the general adult population the APM has too small a range of scores to be useful. The APM is intended for intellectually superior youths and adults, university students, and others for whom the SPM is too easy. It was originally created in 1943 for use at the War Office Selection Boards. In 1947 a revision was prepared for general use as a nonverbal test of the intellectual efficiency with which a person is able to form comparisons between figures and to develop a logical method of reasoning.
Based on Foulds's experimental work with the 1947 edition (Foulds & Raven, 1950) and an item analysis carried out by Forbes (1964), the 1962 edition of the APM dropped from Set II 12 problems that contributed nothing to the score distributions of adults of more than average intellectual ability, and arranged the remaining problems in order of the frequency with which they were solved as the total score on the revised set increased from 0 to 36. Raven arranged the 1962 edition so that it could be used either without a time limit, to assess a person's total capacity for observation and clear thinking, or with a time limit, to assess the examinee's intellectual efficiency. It consists of two sets. Set I contains 12 problems designed to introduce a person to the method of working and to cover all the intellectual processes needed for success in Set II. The 36 problems in Set II are identical in presentation and argument with those in Set I, but they increase in difficulty more steadily and become considerably more complex.

To assess a person's total capacity for observation and clear thinking, Raven suggests that the examinee be shown the problems of Set I as examples to explain the principle of the test. The subject can then be allowed to work through Set II at his own speed from beginning to end without interruption. To assess a person's intellectual efficiency, Set I can be given as a short practice test followed by Set II as a speed test. The most common time limit is 40 minutes. Examination of the literature reveals a preference for administering the APM without a time limit. Yates, in particular, states that even the shorter 1962 edition has not overcome the problem of power and speed contamination when given with a 40-minute time limit. In a study involving 960 freshman university students, he found that the number of persons not attempting problems increased with the later items of the test; consequently, the difficulty levels of those items could not be determined (Yates, 1966). Unlimited working time practically eliminates unattempted items and enables a determination of the true difficulty of each item, unconfounded by differences in working speed.

The present study was undertaken to provide normative information about the APM administered to an American university population. Comparisons with other mental ability tests are presented, and the relationship between the APM and Spearman's g is explored.

Method

Subjects

Three hundred students (190 female, 110 male) from the University of California, Berkeley, served as subjects. Their average age was 252 months (21 years), with a standard deviation of 32 months.

Procedure

Each subject was tested individually. The experimenter explained the basic procedure of the matrices test using examples from the SPM (problems A1 and C5). Subjects were instructed to put down an answer for every question and were given a loose time limit of 1 hour; subjects who had not finished within the hour were given an additional 10 to 15 minutes to complete the test. A subject's score was the total number of items answered correctly. One hundred fifty of the subjects were also individually given the Terman Concept Mastery Test (CMT), a high-level test of verbal ability. A different set of 62 of the 300 subjects were also individually administered the Wechsler Adult Intelligence Scale (WAIS).

The mean total score for the sample of 300 students was 27.0, with a standard deviation of 5.14. The median total score was also 27.0. The mean total score of the normative group of 170 university students presented by Raven (1965) was only 21 (SD = 4). Gibson (1975) also reported APM scores significantly higher than the published university norms: the mean total score of 281 applicants to a psychology honors course at Hatfield Polytechnic in Great Britain was 24.28 (SD = 4.67).

Table 1 presents the absolute frequency, cumulative frequency percentile, t score, and normalized t score for each total APM score value in the sample of 300 students. The 95th percentile corresponds to a total score between 34 and 35 for this sample, whereas the 95th percentile for Raven's normative group of similar age falls between 23 and 24. The Berkeley sample scored much higher overall than the normative sample for Raven's 1962 edition of the APM.

The internal consistency reliability based on the Kuder-Richardson formula 20 (KR-20) is .83; that is, approximately 83% of the variance in total test scores is attributable to true score variance, i.e., to what the APM is actually testing.

There is strong agreement (r = .94) between Raven's rank order of the items, by the frequency with which they are solved, and the rank order determined for this sample. There is, however, one noteworthy exception. The item Raven ranked 13th turned out to be much more difficult for the Berkeley students than expected: it ranked only 22nd in frequency of solution. The item involves changes in three variables: object shape (diamond, square, circle), number of internal lines (one, two, three), and slant of internal lines (45°, 90°, 135°). Most subjects who did not choose the correct response (#2) were attracted to a distractor (#5) that ignored the necessary change in the slant of the internal lines.
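The KR-20 coefficient and the t-score columns of Table 1 rest on standard formulas, sketched below in Python. Everything in the sketch is illustrative: the response matrix and frequency table are hypothetical, population (divide-by-N) variances are assumed, and because the article does not state how its percentiles were computed, the midpoint percentile rank used for the normalized t score is an assumption rather than the author's procedure.

```python
from statistics import NormalDist, pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 scores."""
    n_persons = len(responses)
    n_items = len(responses[0])
    # Variance of the total scores (population variance, dividing by N).
    total_var = pvariance([sum(row) for row in responses])
    # Sum of item variances p * q, where p is the proportion passing each item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_persons
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / total_var)


def linear_t(score, mean, sd):
    """Linear t score: rescales z = (score - mean) / sd to mean 50, SD 10."""
    return 50 + 10 * (score - mean) / sd


def midpoint_percentile_rank(freqs, score):
    """Proportion scoring below `score` plus half of those scoring exactly `score`.

    `freqs` maps each total score to the number of subjects obtaining it.
    """
    n = sum(freqs.values())
    below = sum(count for value, count in freqs.items() if value < score)
    return (below + freqs.get(score, 0) / 2) / n


def normalized_t(percentile_rank):
    """Normalized t score: the normal deviate for a percentile rank, rescaled."""
    return 50 + 10 * NormalDist().inv_cdf(percentile_rank)


if __name__ == "__main__":
    # Hypothetical 4-person, 3-item response matrix (1 = correct).
    toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(kr20(toy))  # 0.75
    # Linear t at the sample mean reported above (M = 27.0, SD = 5.14).
    print(linear_t(27, 27.0, 5.14))  # 50.0
    # Hypothetical frequency table: total score -> number of subjects.
    freqs = {25: 1, 27: 2, 29: 1}
    print(normalized_t(midpoint_percentile_rank(freqs, 27)))  # 50.0
```

Applied to the full 300 × 36 matrix of APM responses, `kr20` should give a value comparable to the reported .83, although the variance convention can shift the result slightly; the toy matrix only checks the arithmetic.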
Information beyond that provided by total scores can sometimes be found in an examination of the incorrect responses to the APM (Thissen, 1976). The selection of distractors (incorrect multiple-choice alternatives) for each problem of the APM was examined to determine whether patterns emerged that would help discriminate between subjects. Two subgroups of the total sample of 300 were formed. The low group consisted of the bottom 24% of the sample, who received total scores of 23 or less (n = 72). The high group consisted of the top 26%, who scored 31 or more (n = 78). The two groups were compared to see whether the distractors chosen by the high group differed from, or were perhaps better (i.e., closer to the correct response) than, the incorrect responses chosen by the low group. No differences between the two groups were found.

Unlike most studies of the Raven's Progressive Matrices, this study found a significant difference (α = .05) between the average total scores of males and females. In this sample the males (M = 28.40, SD = 4.85, n = 110) outscored the females (M = 26.23, SD = 5.11, n = 190). Four percent of the variance in APM total scores can be explained by sex. The sex differences occasionally reported in the literature are thought to be attributable to sampling error; no true sex differences have been reliably demonstrated (Court & Kennedy, 1976).

TABLE 1. Absolute Frequency, Cumulative Frequency Percentile, t Score, and Normalized t Score for Total APM Score Values (N = 300)

Total score   Absolute frequency   Cumulative frequency percentile   t score   Normalized t score

One hundred fifty of the subjects who took the APM were also individually given the Terman Concept Mastery Test. There was a moderate positive relationship (r = .44) between total scores on the two tests (APM: M = 27.24, SD = 5.14; CMT: M = 81.69, SD = 32.80). Sixty-two of the subjects were also administered the WAIS. WAIS Full Scale IQ scores correlated .69 with APM total scores. Correcting this correlation for restriction of range by the method given by McNemar (1949, p. 127), based on a population WAIS IQ SD of 15, raises it to .84 (APM: M = 28.23, SD = 5.08; WAIS: M = 122.84, SD = 9.30). In a similar study, McLaurin et al. (1973) reported a correlation of .55 (.74 corrected for restriction of range) between the APM and the WAIS based on 131 students at the University of Alabama in Birmingham. These results indicate that the APM, the CMT, and the WAIS tap some of the same general ability. The possible nature of that ability is examined in the following section.
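Two of the figures above can be checked from the reported summary statistics. The Python sketch below assumes that McNemar's method is the usual correction for direct selection on one variable (often called Thorndike's Case 2), and that the "four percent of the variance" is a squared point-biserial correlation between sex and APM score; both are assumptions, as is treating the overall SD of 5.14 as the total-sample SD for the second calculation.

```python
import math


def correct_for_range_restriction(r, sd_restricted, sd_unrestricted):
    """Correct r for direct range restriction on one variable (Thorndike Case 2 form)."""
    k = sd_unrestricted / sd_restricted  # ratio of unrestricted to restricted SD
    return r * k / math.sqrt(1 - r**2 + r**2 * k**2)


def point_biserial(mean_1, n_1, mean_0, n_0, sd_total):
    """Point-biserial r between a dichotomy and scores, computed from group means.

    sd_total should be the population (divide-by-N) SD of the combined sample.
    """
    n = n_1 + n_0
    p, q = n_1 / n, n_0 / n  # proportions in each group
    return (mean_1 - mean_0) / sd_total * math.sqrt(p * q)


if __name__ == "__main__":
    # WAIS Full Scale IQ: r = .69 in the sample (SD = 9.30); population SD = 15.
    print(round(correct_for_range_restriction(0.69, 9.30, 15.0), 2))  # 0.84
    # Males vs. females on APM total score; overall SD taken as 5.14.
    r_sex = point_biserial(28.40, 110, 26.23, 190, 5.14)
    print(round(r_sex**2, 2))  # 0.04, i.e., about 4% of the variance
```

With these inputs the sketch reproduces the corrected WAIS correlation of .84 and the roughly 4% of variance attributed to sex, which is consistent with, though not proof of, the computations used in the article.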


Spearman's g

One of the most solidly established phenomena in psychology is that scores on all mental ability tests, no matter how diverse the mental skills or areas they cover, are positively intercorrelated when obtained in a representative sample of the general population. Spearman was the first to hypothesize that there is a "general factor" of mental ability measured in common by all intercorrelated mental tests. He gave this general factor the label "g." Spearman developed the mathematical method known as factor analysis, which enabled him to extract g from the intercorrelations among a collection of diverse tests and to show the correlation between each test and the hypothetical general ability factor. The correlation of a particular test with the g factor common to all tests in the analysis is called the test's g loading. The square of a test's g loading indicates the proportion of the total variance in scores on the test that is due to individual differences in this general ability. It is important to note that the g factor may not show up on some tests given to highly selected groups, such as the often-tapped pool of university students, even though these tests show moderate g loadings when given to the general population. The explanation is that such groups have already been highly selected on g-loaded tests, such as college entrance examinations, and therefore show less individual variation on the g factor. This restricted variation limits the intercorrelations among the tests and thereby prevents the g factor from showing up strongly in a factor analysis of the matrix of intercorrelations. Spearman originally hypothesized that each test measures only g plus some specific ability, s, which is tapped only by that particular test.
This theory that any given test score is composed only of g + s, plus measurement error, was soon refuted by the finding that many mental ability tests share common factors other than g. These cannot be considered general factors, however, because they do not enter into all tests, as g does, but only into certain groups of tests. In a factor analysis of a large number of diverse mental tests, the first unrotated factor (or principal component) is g, or general mental ability. It usually accounts for almost half of the total variance in a large battery of diverse tests. The several smaller factors, the group factors, show highly differential loadings on tests that are often characterized as verbal, numerical, spatial, or memory tests. Factor analysis by itself does not and cannot explain the basis for the existence of g. Spearman himself stated that factor analysis cannot reveal the essential nature of g, but only where to look for it. Examining the characteristics of a wide variety of tests in connection with their g loadings can provide some descriptive generalizations about the common features that characterize tests with relatively high g loadings as compared with tests with relatively low g loadings.

Spearman originally tried to get at the psychological nature of g by factor analyzing more than 100 tests, each fairly homogeneous in content, and then comparing their g loadings (Spearman & Jones, 1950). He characterized the most g-loaded tests essentially as those requiring "the eduction of relations and correlates," that is, perceiving relationships, inducing the general from the particular, and deducing the particular from the general. Such tests require inductive or inventive behavior, as contrasted with reproductive or rule-applying behavior. The most g-loaded test in the whole battery was the Raven's Progressive Matrices (RPM), which, as previously mentioned, depends almost entirely on perceiving key features and relationships and discovering the abstract rules that govern the differences among the elements in the matrix. Much more test material is available now than was available to Spearman more than 50 years ago, and this has led to broader generalizations about g. The g factor is manifested in tests to the degree that they involve mental manipulation of the input elements, choice, decision, invention as contrasted with selection, meaningful as contrasted with rote memory, long-term as contrasted with short-term memory, and distinguishing relevant from irrelevant information in solving complex problems (Jensen, 1979). Task complexity and the amount of conscious mental manipulation required seem to be the most basic determinants of a task's g loading. There are many examples in which a slight increase in task complexity is accompanied by an increase in the task's g loading. Virtually any task involving mental activity complex enough to be recognized as requiring some conscious mental effort is substantially g loaded. It is a task's complexity rather than its content that is most related to g.
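The g loadings discussed in this section are often read from the first unrotated factor or principal component of a test intercorrelation matrix. A minimal Python sketch of that extraction by plain power iteration follows; the four-test matrix (every pair correlating .36) and the function name are hypothetical, and power iteration assumes the largest eigenvalue is dominant, which holds for a matrix of strictly positive correlations. The squared loadings give each test's share of variance due to the component, and the eigenvalue divided by the number of tests gives the component's share of total variance.

```python
import math


def first_principal_component(corr, iterations=200):
    """Largest eigenvalue and loadings of a symmetric correlation matrix via power iteration."""
    n = len(corr)
    v = [1.0] * n  # all-positive starting vector suits a matrix of positive correlations
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    # Loadings are eigenvector elements scaled by the square root of the eigenvalue.
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


if __name__ == "__main__":
    # Hypothetical battery: four tests, every pair correlating .36.
    R = [[1.0 if i == j else 0.36 for j in range(4)] for i in range(4)]
    eigenvalue, loadings = first_principal_component(R)
    print(round(eigenvalue, 2))                # 2.08
    print([round(l, 3) for l in loadings])     # [0.721, 0.721, 0.721, 0.721]
    # Squared loading: share of each test's variance due to the component.
    print([round(l**2, 2) for l in loadings])  # [0.52, 0.52, 0.52, 0.52]
    # Share of total battery variance (the trace of R equals the number of tests).
    print(round(eigenvalue / len(R), 2))       # 0.52
```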
An almost infinite variety of test items, regardless of sensory modality, substantive or cultural content, or the form of effector activity involved in the response, is capable of measuring g. This observation led to Spearman's principle of "the indifference of the indicator," meaning that the manifestation of g is not limited to any particular type of information or item. Previous research suggests that the Raven's Progressive Matrices, administered to the general population, measures g and little else (Burke, 1958). The occasional loadings found on other factors, independent of g, are mostly trivial and inconsistent from one analysis to another. Although many other tests measure g to a similar extent, unlike the Raven they also have loadings on major group factors such as verbal, numerical, spatial, and memory. Contrary to common belief, the RPM does not measure perceptual ability or spatial-visualization ability; in fact, the Raven has very small loadings on these factors when g is excluded.

Factor analysis of the RPM at the item level should result in only a single factor. Some investigations that found more than one factor employed improper orthogonal rotations of the principal components. This method can create the appearance of several factors even in correlation matrices artificially constructed to contain only one factor plus random error (Jensen, 1980). Some of the small spurious factors that emerge from factor analysis of inter-item correlations are not really ability factors at all but "difficulty" factors, due to varying degrees of restriction of variance on items of widely differing difficulty and to nonlinear regression of item difficulties on age and ability (McDonald, 1965). When these psychometric artifacts are taken into account, the RPM seems to measure only a single factor of mental ability, which can be termed g.

The APM results of this study were factor analyzed at the item level. The first principal factor of the intercorrelation matrix of the 36 items, scored correct or incorrect, accounts for 15% of the total inter-item variance. The correlation matrix reproduced from the loadings on this first principal factor was subtracted from the original inter-item correlation matrix. The resulting residual matrix was tested and found not significantly different from zero at α = .05. Therefore, the APM can be considered to measure only one factor. That this factor accounts for only 15% of the total inter-item variance indicates that the variance of each item is due mostly to uniqueness, that is, item specificity and error. The items are not highly intercorrelated; however, what they do have in common may indeed be Spearman's g. The first principal factor should not be considered just a difficulty factor.
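The residual check described in the preceding paragraph can be sketched as follows. The article does not say how communalities were estimated for its principal factor solution, so this Python sketch assumes a single pass of principal-axis factoring in which each diagonal element is replaced by the largest absolute correlation in its row (a common rough estimate); it also omits the significance test of the residuals. The matrix and function names are hypothetical: every pair of items correlates .36, as a single factor with loadings of .6 would imply.

```python
import math


def first_principal_factor(corr, iterations=200):
    """One-pass principal-axis extraction of a single factor.

    Diagonal communalities are estimated by each row's largest absolute
    off-diagonal correlation (an assumed, rough estimate). Power iteration
    assumes positive correlations so that the dominant eigenvalue is positive.
    """
    n = len(corr)
    reduced = [row[:] for row in corr]
    for i in range(n):
        reduced[i][i] = max(abs(corr[i][j]) for j in range(n) if j != i)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(reduced[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(reduced[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v]


def residual_matrix(corr, loadings):
    """Observed correlations minus those reproduced by one factor (l_i * l_j)."""
    n = len(corr)
    return [[corr[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical one-factor matrix: four items, every pair correlating .36.
    R = [[1.0 if i == j else 0.36 for j in range(4)] for i in range(4)]
    loadings = first_principal_factor(R)
    print([round(l, 3) for l in loadings])  # [0.6, 0.6, 0.6, 0.6]
    res = residual_matrix(R, loadings)
    off_diag = [abs(res[i][j]) for i in range(4) for j in range(4) if i != j]
    print(max(off_diag) < 1e-9)  # True: one factor reproduces every correlation
    # Share of total variance carried by the factor (sum of squared loadings / n).
    print(round(sum(l * l for l in loadings) / len(R), 2))  # 0.36
```

With real item data the off-diagonal residuals will not vanish, and deciding whether they differ significantly from zero requires a formal test that this sketch does not attempt.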
The correlation between the item loadings on the first principal factor and item difficulty levels (percent passing) is -.36. (Only 13% of the variance in the first principal factor loadings can be explained by differences in item difficulty.) The correlation between the item loadings on the first principal factor and item variance is .41. (Sixteen percent of the variance in the first principal factor loadings can be explained by differences in item variance.) The correlation between item difficulty and item variance is -.37. When item variance is held constant, i.e., partialled out, the correlation between item loadings on the first principal factor and difficulty levels is .01. The loadings of each item on the first principal factor and the correlations of each item with total score are shown in Table 2.

The total score on the APM can be considered a reasonable measure of general mental ability. This notion is at the very least intuitively appealing. There is near-perfect agreement between the correlations of each item with total score and the correlations of each item with the hypothesized g factor: the correlation between the 36 item-total point-biserial correlations and the 36 g loadings, with the effect of item variance partialled out, is .99. This evidence supports the claim that the first principal factor of this analysis is not just a difficulty factor and that it measures general mental ability.

TABLE 2. APM Item Correlations with Total Score and First Principal Factor (N = 300)

Item  Total score  First principal factor    Item  Total score  First principal factor
   1      .08               .04                19      .30               .26
   2      .20               .24                20      .27               .23
   3      .25               .29                21      .50               .49
   4      .38               .43                22      .41               .36
   5      .34               .37                23      .56               .57
   6      .17               .17                24      .53               .50
   7      .24               .24                25      .37               .30
   8      .35               .37                26      .36               .30
   9      .34               .41                27      .50               .44
  10      .34               .37                28      .45               .38
  11      .30               .34                29      .50               .44
  12      .42               .46                30      .45               .38
  13      .24               .18                31      .47               .41
  14      .30               .29                32      .39               .33
  15      .24               .20                33      .43               .37
  16      .45               .47                34      .53               .51
  17      .39               .36                35      .49               .45
  18      .35               .31                36      .43               .37

Further evidence that the APM measures g comes from a closer look at the relationship between the APM and the WAIS. Two subsets of 13 items each, matched on item variance, were formed from the 36 APM items. The high-g subset had an average g loading of .46 and an average item variance of .15. The low-g subset had an average g loading of .27 and an average item variance of .15. Correlations were computed for the 62 subjects who took both tests. The high-g items correlated .67 with WAIS Full Scale IQ scores; the low-g items correlated .56. Although the two correlations are not significantly different from each other (t = 1.39, α = .05), a trend was apparent. Despite the fact that the items of the APM and the WAIS are drastically different in content, the items correlating most highly with the hypothesized g factor derived from the APM show a stronger relationship to WAIS Full Scale IQ scores than the items with low correlations with the hypothesized APM g. Since WAIS Full Scale IQ scores have been shown to be highly g loaded in previous research (Matarazzo, 1972; Jensen, 1980), this pattern can be interpreted to indicate that the hypothesized g of the APM is the same g that is measured by the WAIS.

In summary, the distribution of scores on the Advanced Raven's Progressive Matrices for a large cross section of University of California students is markedly higher than the estimated score distribution of university students that accompanies the 1962 version of the test. A moderate positive association exists between the APM and the Terman Concept Mastery Test, and an even stronger positive relationship exists between the APM and the Wechsler Adult Intelligence Scale. Examination of the internal factor structure of the test items indicates that the APM measures only one factor, and this factor is not just a difficulty factor. The results support the notion that the APM provides a measure of Spearman's g.


REFERENCES

Burke, H. R. (1958). Raven's Progressive Matrices: A review and critical evaluation. Journal of Genetic Psychology, 93, 199-228.
Court, J. H., & Kennedy, R. J. (1976). Sex as a variable in Raven's Standard Progressive Matrices. Proceedings of the 21st International Congress of Psychology, Paris, France.
Forbes, A. R. (1964). An item analysis of the Advanced Matrices. British Journal of Educational Psychology, 34, 1-14.
Foulds, G. A., & Raven, J. C. (1950). An experimental survey with Progressive Matrices (1947). British Journal of Educational Psychology, 20, 4-10.
Gibson, H. B. (1975). Relations between performance on the Advanced Matrices and the EPI in high-intelligence subjects. British Journal of Social and Clinical Psychology, 14, 363-369.
Jensen, A. R. (1979). g: Outmoded theory or unconquered frontier? Creative Science and Technology, 2, 16-29.
Jensen, A. R. (1980). Bias in mental testing. New York: The Free Press.
Matarazzo, J. D. (1972). Wechsler's measurement and appraisal of adult intelligence (5th ed.). Baltimore: Williams & Wilkins.
McDonald, R. P. (1965). Difficulty factors and nonlinear factor analysis. British Journal of Mathematical and Statistical Psychology, 18, 11-23.
McLaurin, W. A., Jenkins, J. F., Farrar, W. E., & Rumore, M. C. (1973). Correlation of IQ's on verbal and nonverbal tests of intelligence. Psychological Reports, 33, 821-822.
McNemar, Q. (1949). Psychological statistics. New York: Wiley.
Raven, J. C. (1938). Progressive Matrices: A perceptual test of intelligence, 1938, Individual form. London: H. K. Lewis.
Raven, J. C. (1947). Coloured Progressive Matrices. London: H. K. Lewis.
Raven, J. C. (1960). Guide to the Standard Progressive Matrices. London: H. K. Lewis.
Raven, J. C. (1965). Advanced Progressive Matrices Sets I and II. London: H. K. Lewis.
Spearman, C., & Jones, L. L. W. (1950). Human ability. London: Macmillan.
Thissen, D. M. (1976). Information in wrong responses to the Raven Progressive Matrices. Journal of Educational Measurement, 13, 201-214.
Yates, A. J. (1966). A note on Progressive Matrices (1962). Australian Journal of Psychology, 18, 281-283.

View more...
The Journal of Experimental Education Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/vjxe20

The Advanced Raven’s Progressive Matrices Steven M. Paul

a

a

University of California, Berkeley Published online: 16 Apr 2014.

To cite this article: Steven M. Paul (1986) The Advanced Raven’s Progressive Matrices, The Journal of Experimental Education, 54:2, 95-100, DOI: 10.1080/00220973.1986.10806404 To link to this article: http://dx.doi.org/10.1080/00220973.1986.10806404

PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions

Downloaded by [University of Otago] at 11:17 31 December 2014

The Advanced Raven's .Progressive Matrices: Normative Data for an American University Population and an Examination of the Relationship with Spearman's g STEVEN M. PAUL University of California, Berkeley

ABSTRACT Normative data for the Advanced Raven's Progressive Matrices are presented based on 300 University of California, Berkeley, students. Correlations with the Wechsler Adult Intelligence Scale and the Terman Concept Mastery Test are reported. The relationship between the Advanced Raven's Progressive Matrices and Spearman's g is explored.

THE RAVEN'S PROGRESSIVE MATRICES (RPM) are the best known and most widely used culture-reduced tests of mental ability. British geneticist Lionel Penrose and British psychologist J. C. Raven were the first to present perceptual analogy and inductive reasoning problems in the form of a matrix. In their matrices the perceptual analogies simultaneously involve both horizontal and vertical transformations. The variety of figures, relationships, and transformations is virtually limitless: figures may increase or decrease in size; elements may be added or subtracted, shaded or unshaded, flipped, rotated, or mirror-imaged; and many other progressive changes in pattern are possible. In each case the lower right corner of the matrix is missing, and the subject must select the best of six or eight multiple-choice alternatives to fill the empty corner.

Raven described the Progressive Matrices as "a test of a person's present capacity to form comparisons, reason by analogy, and develop a logical method of thinking, regardless of previously acquired information" (Raven, 1938, p. 12). He was responsible for publishing the first Progressive Matrices test and its subsequent improvements and extensions (Raven, 1938, 1947, 1960). Three forms of the RPM are now in use: the Standard Progressive Matrices (SPM), the Colored Progressive Matrices (CPM), and the Advanced Progressive Matrices (APM). Considerable research has been conducted with the SPM and CPM, but little information is available concerning the APM. Adequate standardization norms are lacking in the United States, and research is notably absent for university students, the population the APM is best suited to measure.

The Standard Progressive Matrices consists of 60 items grouped in five sets (A, B, C, D, E) of 12 items each. Each set involves different principles of matrix transformation, and within each set the items become progressively more difficult. The SPM was designed to cover the widest possible range of mental ability and to be equally useful with persons of all ages, whatever their education, nationality, or physical condition. The scale is intended to cover the entire range of intellectual development, starting from the time a child is able to grasp the idea of finding a missing piece to complete a pattern. It is long enough to assess a person's maximum capacity to form comparisons and reason by analogy without being overly taxing or unwieldy. A person's total score provides an index of his intellectual ability. The scores obtained by adults tend to cluster in the upper half of the scale.

The Colored Progressive Matrices, Sets A, Ab, and B, were devised as a test for young children and old people, for anthropological studies, and for clinical work. They can be used with people who, for whatever reason, cannot understand or speak the English language, suffer from physical disabilities, are intellectually subnormal, or have deteriorated intellectually. To make the test independent of verbal instructions, the problems are printed on colored backgrounds, and the scale is arranged so that it can be presented either as illustrations printed in a book or as boards with movable pieces. Success in Set Ab depends on the comprehension of discrete figures as spatially related "wholes"; in combination with Sets A and B, it adequately covers the cognitive processes of which children under 11 years of age are usually capable.

The Advanced Progressive Matrices, Sets I and II, were constructed as a test of intellectual efficiency that can be used with people of more than average intellectual ability and that will differentiate clearly between individuals of even superior ability. The difficulty level of the APM makes it unsuitable for persons scoring below a raw score of about 50 on the SPM, and for the general adult population its range of scores is too small to be useful. The APM is intended for intellectually superior youths and adults, university students, and others for whom the SPM is too easy. It was originally created in 1943 for use at the War Office Selection Boards. In 1947 a revision was prepared for general use as a nonverbal test of the intellectual efficiency with which a person is able to form comparisons between figures and to develop a logical method of reasoning.
Based on Foulds's experimental work with the 1947 edition (Foulds & Raven, 1950) and an item analysis carried out by Forbes (1964), the 1962 edition of the APM dropped from Set II 12 problems that made no contribution to the score distributions of adults of more than average intellectual ability, and arranged the remaining problems in order of the frequency with which they were solved as the total score on the revised set increased from 0 to 36. Raven arranged the 1962 edition so that it could be used either without a time limit, to assess a person's total capacity for observation and clear thinking, or with a time limit, to assess the examinee's intellectual efficiency. The test consists of two sets. Set I contains 12 problems designed to introduce the person to the method of working and to cover all the intellectual processes needed for success in Set II. The 36 problems in Set II are identical in presentation and argument to those in Set I; they differ only in that they increase in difficulty more steadily and become considerably more complex.

To assess a person's total capacity for observation and clear thinking, Raven suggests that the examinee be shown the problems of Set I as examples to explain the principle of the test. The subject can then be allowed to work through Set II at his own speed from beginning to end without interruption. To assess a person's intellectual efficiency, Set I can be given as a short practice test followed by Set II as a speed test. The most common time limit is 40 minutes. The literature reveals a preference for administering the APM without a time limit. Yates, in particular, states that even the shorter 1962 edition has not overcome the problem of power and speed contamination when given with a 40-minute time limit. In a study of 960 freshman university students, he found that the number of persons not attempting problems increases toward the later items of the test, so that the difficulty levels of those items cannot be determined (Yates, 1966). Unlimited working time practically eliminates unattempted items and enables a determination of the true difficulty of each item, unconfounded by differences in speed of working.

The present study was undertaken to provide normative information about the APM administered to an American university population. Comparisons with other mental ability tests are presented, and the relationship between the APM and Spearman's g is explored.

Method

Subjects

Three hundred students (190 female, 110 male) from the University of California, Berkeley, served as subjects. Their average age was 252 months (21 years), with a standard deviation of 32 months.

Procedure

Each subject was tested individually. The experimenter explained the basic procedure of the matrices test using two examples (problems A1 and C5) from the SPM. Subjects were instructed to mark an answer for every question and were given a loose time limit of 1 hour; subjects who had not finished within the hour were given an additional 10 to 15 minutes to complete the test. A subject's score was the total number of items answered correctly. One hundred fifty of the subjects were also individually given the Terman Concept Mastery Test (CMT), a high-level test of verbal ability. A different set of 62 of the 300 subjects was individually administered the Wechsler Adult Intelligence Scale (WAIS).

The mean total score for the sample of 300 students was 27.0, with a standard deviation of 5.14; the median total score was also 27.0. The mean total score of the normative group of 170 university students presented by Raven (1965) was only 21 (SD = 4). Gibson (1975) also obtained APM scores significantly higher than the published university norms: the mean total score of 281 applicants to a psychology honors course at Hatfield Polytechnic in Great Britain was 24.28 (SD = 4.67).

Table 1 presents the absolute frequency, cumulative frequency percentile, t score, and normalized t score for each total APM score value in the sample of 300 students. The 95th percentile corresponds to a total score between 34 and 35 for this sample, whereas the 95th percentile based on Raven's normative group of similar age lies between 23 and 24. The Berkeley sample scored much higher overall than the normative sample for Raven's 1962 edition of the APM.

The internal consistency reliability based on the Kuder-Richardson formula 20 (KR-20) is .83; that is, approximately 83% of the variance in total test scores is attributable to true-score variance, i.e., to what the APM is actually testing. The rank order of the items by the frequency with which they are solved agrees closely with the order presented by Raven (r = .94). There is, however, one noteworthy exception. The item Raven ranked 13th turned out to be much more difficult for the Berkeley students than expected; it was only the 22nd most frequently solved item. The item involves changes in three variables: object shape (diamond, square, circle), number of internal lines (one, two, three), and slant of internal lines (45°, 90°, 135°). Most subjects who did not choose the correct response (#2) were attracted to a distractor (#5) that ignored the necessary change in the slant of the internal lines.
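The reliability coefficient and the Table 1 norms rest on standard formulas. The following is a minimal Python sketch, not the author's code: `kr20` applies the Kuder-Richardson formula 20 to a matrix of 0/1 item scores, and `norms_table` builds percentile ranks, linear t scores (50 + 10z), and normalized t scores (50 + 10z, with z read from the normal curve at each percentile rank). Both function names are hypothetical, and the midpoint percentile-rank convention is an assumption, since the article does not state how its percentiles were computed.

```python
from statistics import NormalDist, mean, pstdev, pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    k = len(responses[0])                       # number of items
    totals = [sum(row) for row in responses]    # total score per examinee
    total_var = pvariance(totals)               # variance of total scores
    # Sum of item variances p * (1 - p), where p is the proportion passing the item
    item_var_sum = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)
        item_var_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def norms_table(scores):
    """Rows of (score, frequency, percentile rank, linear t, normalized t)."""
    n = len(scores)
    m, sd = mean(scores), pstdev(scores)
    rows, below = [], 0
    for x in sorted(set(scores)):
        f = scores.count(x)
        # Midpoint percentile rank (an assumed convention; the article does not specify one)
        pr = (below + f / 2) / n
        t_linear = 50 + 10 * (x - m) / sd               # linear t: 50 + 10z
        t_norm = 50 + 10 * NormalDist().inv_cdf(pr)     # z read from the normal curve at pr
        rows.append((x, f, round(100 * pr, 1), round(t_linear, 1), round(t_norm, 1)))
        below += f
    return rows


# Hypothetical responses of four examinees to three items
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(responses), 2))  # 0.75
for row in norms_table([sum(r) for r in responses]):
    print(row)
```

The linear t score preserves the shape of the raw-score distribution, while the normalized t score forces it onto a normal curve; the two diverge most in the tails of a skewed distribution.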
Information beyond that provided by total scores can sometimes be found in an examination of the incorrect responses to the APM (Thissen, 1976). The selection of distractors (incorrect multiple-choice alternatives) on each APM problem was therefore examined to determine whether patterns emerged that would aid in discriminating between subjects. Two subgroups of the total sample of 300 were formed. The low group comprised the bottom 24% of the sample, those with total scores of 23 or less (n = 72). The high group comprised the top 26%, those with total scores of 31 or more (n = 78). The two groups were compared to see whether the distractors chosen by the high group differed from, or were perhaps better (i.e., closer to the correct response) than, the incorrect responses chosen by the low group. No differences between the two groups were found.

Unlike most studies of the Raven's Progressive Matrices, this study found a significant difference (α = .05) between the average total scores of males and females. In this sample the males (M = 28.40, SD = 4.85, n = 110) outscored the females (M = 26.23, SD = 5.11, n = 190); sex accounts for 4% of the variance in APM total scores. The sex differences occasionally reported in the literature are thought to be attributable to sampling error; no true sex differences have been reliably demonstrated (Court & Kennedy, 1976).

TABLE 1. Absolute Frequency, Cumulative Frequency Percentile, t Score, and Normalized t Score for Total APM Score Values (N = 300)
[Columns: Total score; Absolute frequency; Cumulative frequency percentile; t score; Normalized t score]

One hundred fifty of the APM examinees were also individually given the Terman Concept Mastery Test. There was a moderate positive relationship (r = .44) between total scores on the two tests (APM: M = 27.24, SD = 5.14; CMT: M = 81.69, SD = 32.80). Sixty-two of the subjects were also administered the WAIS, and their WAIS Full Scale IQ scores correlated .69 with APM total scores (APM: M = 28.23, SD = 5.08; WAIS: M = 122.84, SD = 9.30). Correcting this correlation for restriction of range, based on the population WAIS IQ SD of 15 and using the method given by McNemar (1949, p. 127), raises it to .84. In a similar study, McLaurin et al. (1973) reported a correlation of .55 (.74 corrected for restriction of range) between the APM and the WAIS based on 131 students at the University of Alabama in Birmingham. These results indicate that the APM, CMT, and WAIS tap some of the same general ability. The possible nature of that ability is examined in the following section.
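Two of the figures in this section can be checked with short calculations. The following Python sketch assumes that McNemar's method is the standard correction for direct restriction of range on one variable (often called Thorndike's Case 2); with r = .69, a sample WAIS SD of 9.30, and a population SD of 15, it gives .84, matching the reported value. It also recovers the roughly 4% of variance attributed to sex as the squared point-biserial correlation (equivalently, eta squared) computed from the group means and SDs, assuming those SDs use the n - 1 denominator. Both function names are hypothetical.

```python
import math


def correct_for_range_restriction(r, sd_restricted, sd_unrestricted):
    """Correct a correlation for direct restriction of range on one variable.

    Assumed formula (Thorndike's Case 2):
        R = r*k / sqrt(1 - r**2 + r**2 * k**2),  k = SD_unrestricted / SD_restricted
    """
    k = sd_unrestricted / sd_restricted
    return r * k / math.sqrt(1 - r ** 2 + (r * k) ** 2)


def variance_explained_by_groups(m1, sd1, n1, m2, sd2, n2):
    """Eta squared (= squared point-biserial r) for two groups from summary statistics.

    Assumes sd1 and sd2 are sample SDs (n - 1 denominator).
    """
    n = n1 + n2
    grand = (n1 * m1 + n2 * m2) / n
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    return ss_between / (ss_between + ss_within)


# Values reported in the text
print(round(correct_for_range_restriction(0.69, 9.30, 15), 2))  # 0.84
print(round(variance_explained_by_groups(28.40, 4.85, 110, 26.23, 5.11, 190), 3))  # 0.042
```

The correction assumes that selection operated directly on WAIS IQ and that the regression is linear with homoscedastic errors in the full population; if those conditions fail, the corrected value is only approximate.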


Spearman's g

One of the most solidly established phenomena in psychology is that scores on all mental ability tests, no matter how diverse the mental skills or areas they cover, are positively intercorrelated when obtained in a representative sample of the general population. Spearman was the first to hypothesize that some "general factor" of mental ability is measured in common by all of the intercorrelated mental tests; he gave this general factor the label "g." Spearman developed the mathematical method known as factor analysis, which enabled him to extract g from the intercorrelations among a collection of diverse tests and to show the correlation between each test and the hypothetical general ability factor. The correlation of a particular test with the g factor common to all tests in the analysis is called the test's g loading. The square of a test's g loading indicates the proportion of the total variance in scores on the test that is due to individual differences in this general ability; a test with a g loading of .70, for example, owes about 49% of its variance to g.

It is important to note that the g factor may not show up strongly in tests given to highly selected groups, such as the often-tapped pool of university students, even though the same tests show moderate g loadings in the general population. The reason is that such groups have already been selected on g-loaded tests, such as college entrance examinations, and their scores therefore show less individual variation in g. This restricted variation limits the intercorrelations among the tests and thereby prevents the g factor from emerging strongly in a factor analysis of the intercorrelation matrix. Spearman originally hypothesized that each test measures only g plus some specific ability, s, which is tapped only by that particular test.
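The link between g loadings and explained variance can be illustrated with a principal-components sketch in Python using NumPy. This is not Spearman's original extraction method; it takes the first principal component of a correlation matrix, which corresponds to the first unrotated factor (or principal component) mentioned in the text. The four-test correlation matrix is hypothetical, built from assumed loadings of .8, .7, .6, and .5 on one common factor, and shrinking those loadings serves only as a crude stand-in for a group preselected on g. The function name is also hypothetical.

```python
import numpy as np


def first_component_loadings(R):
    """Loadings on the first principal component of a correlation matrix R.

    Returns (loadings, share): each loading is the correlation of a test with the
    component, and share is the proportion of total variance the component explains.
    """
    R = np.asarray(R, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]     # largest eigenvalue and its eigenvector
    if v.sum() < 0:                          # eigenvector sign is arbitrary
        v = -v
    loadings = v * np.sqrt(lam)
    return loadings, lam / R.shape[0]


# Hypothetical battery: four tests whose intercorrelations come from one common factor
g = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(g, g)
np.fill_diagonal(R, 1.0)

loadings, share = first_component_loadings(R)
print("loadings:", np.round(loadings, 2))
print("squared loadings (variance share per test):", np.round(loadings ** 2, 2))
print("share of total variance:", round(share, 2))

# Weaker common variance, as in a group preselected on g, lowers the first component's share
R_sel = np.outer(0.6 * g, 0.6 * g)
np.fill_diagonal(R_sel, 1.0)
print("share in selected group:", round(first_component_loadings(R_sel)[1], 2))
```

Principal-component loadings run somewhat higher than the assumed factor loadings because the unit diagonal folds each test's unique variance into the component; that is one reason component and common-factor solutions differ in detail.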
This theory, that any given test score is composed only of g + s plus measurement error, was soon refuted by the finding that many mental ability tests share common factors other than g. These cannot be considered general factors, however, because unlike g they do not enter into all tests but only into certain groups of tests. In a factor analysis of a large number of diverse mental tests, the first unrotated factor (or principal component) is g, or general mental ability. It usually accounts for almost half of the total variance in a large battery of diverse tests. The several smaller factors, the group factors, show highly differential loadings on tests that are often characterized as verbal, numerical, spatial, or memory tests. Factor analysis by itself does not and cannot explain the basis for the existence of g. Spearman himself stated that factor analysis cannot reveal the essential nature of g but only where to look for it. Examining the characteristics of a wide variety of tests in connection with their g loadings can yield descriptive generalizations about the common features of tests with relatively high g loadings as compared with tests with relatively low g loadings.

Spearman originally tried to get at the psychological nature of g by factor analyzing more than 100 tests, each fairly homogeneous in content, and then comparing their g loadings (Spearman & Jones, 1950). He characterized the most g-loaded tests essentially as those requiring "the eduction of relations and correlates," that is, perceiving relationships, inducing the general from the particular, and deducing the particular from the general. Such tests require inductive or inventive behavior, as contrasted with reproductive or rule-applying behavior. The most g-loaded test in the whole battery was the Raven's Progressive Matrices, which, as previously mentioned, depends almost entirely on perceiving key features and relationships and discovering the abstract rules that govern the differences among the elements of the matrix. Much more test material is available now than was available to Spearman more than 50 years ago, and this has led to broader generalizations about g. The g factor is manifested in tests to the degree that they involve mental manipulation of the input elements, choice, decision, invention in contrast to selection, meaningful memory in contrast to rote memory, long-term memory in contrast to short-term memory, and distinguishing relevant from irrelevant information in solving complex problems (Jensen, 1979). Task complexity and the amount of conscious mental manipulation required seem to be the most basic determinants of a task's g loading. There are many examples in which a slight increase in task complexity is accompanied by an increase in the g loading of the task. Virtually any task involving mental activity complex enough to be recognized as requiring some kind of conscious mental effort is substantially g loaded. It is a task's complexity, rather than its content, that is most related to g.
An almost infinite variety of test items, regardless of sensory modality, substantive or cultural content, or the form of effector activity involved in the required response, is capable of measuring g. This observation led to Spearman's principle of "the indifference of the indicator," meaning that the manifestation of g is not limited to any particular type of information or item. Previous research suggests that the Raven's Progressive Matrices, when administered to the general population, measures g and little else (Burke, 1958). The occasional loadings found on other factors, independent of g, are mostly trivial and inconsistent from one analysis to another. Although many other tests measure g to a similar extent, unlike the Raven they also have loadings on the major group factors such as verbal, numerical, spatial, and memory. Contrary to common belief, the RPM does not measure perceptual ability or spatial-visualization ability; once g is excluded, the Raven has very small loadings on these factors.

Factor analysis of the RPM at the item level should therefore yield only a single factor. Some investigations that have found more than one factor have employed improper orthogonal rotations of the principal components, a method that can create the appearance of several factors even in correlation matrices artificially constructed to contain only one factor plus random error (Jensen, 1980). Some of the small spurious factors that emerge from factor analysis of inter-item correlations are not ability factors at all but "difficulty" factors, due to varying degrees of restriction of variance on items of widely differing difficulty and to nonlinear regression of item difficulties on age and ability (McDonald, 1965). When these psychometric artifacts are taken into account, the RPM seems to measure only a single factor of mental ability, which can be termed g.

The APM results of this study were factor analyzed at the item level. The first principal factor of the intercorrelation matrix of the 36 items, each scored correct or incorrect, accounts for 15% of the total inter-item variance. The correlation matrix reproduced from the loadings on this first principal factor was subtracted from the original inter-item correlation matrix, and the resulting residual matrix was tested and found not to differ significantly from zero (α = .05). The APM can therefore be considered to measure only one factor. That this factor accounts for only 15% of the total inter-item variance indicates that the variance of each item is due mostly to uniqueness, that is, to item specificity and error: the items are not highly intercorrelated. However, what they do have in common may indeed be Spearman's g. The first principal factor should not be considered merely a difficulty factor.
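The item-level procedure described above (extract the first factor from the inter-item correlations, rebuild the correlations it implies, and inspect what is left) can be sketched in Python with NumPy. Several assumptions apply: the sketch uses phi coefficients (Pearson correlations of 0/1 scores) and the first principal component with unities on the diagonal, which may differ from the author's principal-factor extraction; the response data are simulated from a single hypothetical latent ability; and it reports the root-mean-square residual rather than a significance test, because the article does not name the test it used. The function name is hypothetical.

```python
import numpy as np


def first_factor_residuals(X):
    """Fit one factor to inter-item correlations and return what it leaves unexplained.

    X: examinees x items array of 0/1 scores.
    Returns (loadings, share, residual).
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)             # phi coefficients between items
    eigvals, eigvecs = np.linalg.eigh(R)         # ascending eigenvalues
    lam, v = eigvals[-1], eigvecs[:, -1]         # first (largest) component
    if v.sum() < 0:                              # eigenvector sign is arbitrary
        v = -v
    loadings = v * np.sqrt(lam)
    share = lam / R.shape[0]                     # proportion of total inter-item variance
    residual = R - np.outer(loadings, loadings)  # correlations the first factor does not reproduce
    np.fill_diagonal(residual, 0.0)              # diagonal holds uniqueness, not covariation
    return loadings, share, residual


# Hypothetical data: 300 examinees, 36 items driven by one simulated latent ability
rng = np.random.default_rng(0)
ability = rng.normal(size=(300, 1))
difficulty = np.linspace(-2.0, 2.0, 36)
p_correct = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
X = (rng.random((300, 36)) < p_correct).astype(int)

loadings, share, residual = first_factor_residuals(X)
off_diag = residual[~np.eye(36, dtype=bool)]
print(f"first factor share of inter-item variance: {share:.2f}")
print(f"RMS off-diagonal residual: {np.sqrt(np.mean(off_diag ** 2)):.3f}")
```

Even with truly one-dimensional simulated data, the first factor's share stays modest because phi coefficients between 0/1 items are small, which is consistent with the low 15% figure reported in the text.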
The correlation between the item loadings on the first principal factor and item difficulty levels (percent passing) is -.36; only 13% of the variance in the first principal factor loadings can be explained by differences in item difficulty. The correlation between the item loadings on the first principal factor and item variance is .41; 16% of the variance in the loadings can be explained by differences in item variance. The correlation between item difficulty and item variance is -.37. When item variance is held constant (partialled out), the correlation between item loadings on the first principal factor and item difficulty is .01. The loadings of each item on the first principal factor and the correlations of each item with total score are shown in Table 2.

The total score on the APM can be considered a reasonable measure of general mental ability, a notion that is at the very least intuitively appealing. There is near-perfect agreement between the correlations of each item with total score and the correlations of each item with the hypothesized g factor: with the effect of item variance partialled out, the correlation between the 36 item-total point-biserial correlations and the 36 g loadings is .99. This evidence supports the claim that the first principal factor of this analysis is not just a difficulty factor and that it measures general mental ability.

TABLE 2. APM Item Correlations with Total Score and First Principal Factor (N = 300)

Item  Total score  First principal factor  |  Item  Total score  First principal factor
1     .08          .04                     |  19    .30          .26
2     .20          .24                     |  20    .27          .23
3     .25          .29                     |  21    .50          .49
4     .38          .43                     |  22    .41          .36
5     .34          .37                     |  23    .56          .57
6     .17          .17                     |  24    .53          .50
7     .24          .24                     |  25    .37          .30
8     .35          .37                     |  26    .36          .30
9     .34          .41                     |  27    .50          .44
10    .34          .37                     |  28    .45          .38
11    .30          .34                     |  29    .50          .44
12    .42          .46                     |  30    .45          .38
13    .24          .18                     |  31    .47          .41
14    .30          .29                     |  32    .39          .33
15    .24          .20                     |  33    .43          .37
16    .45          .47                     |  34    .53          .51
17    .39          .36                     |  35    .49          .45
18    .35          .31                     |  36    .43          .37

Further evidence that the APM measures g comes from a closer look at the relationship between the APM and the WAIS. Two subgroups of 13 items each, matched on item variance, were formed from the 36 APM items. The high-g group had an average g loading of .46 and an average item variance of .15; the low-g group had an average g loading of .27 and an average item variance of .15. Correlations with the WAIS were obtained for the 62 subjects who took both tests. Scores on the high-g items correlated .67 with WAIS Full Scale IQ, and scores on the low-g items correlated .56. Although the two correlations are not significantly different from each other (t = 1.39, α = .05), a trend was apparent. Even though the items of the APM and the WAIS are drastically different in content, the items correlating highest with the hypothesized g factor derived from the APM show a stronger relationship to WAIS Full Scale IQ than do the items with low correlations with the hypothesized APM g. Because WAIS Full Scale IQ scores have been shown to be highly g loaded in previous research (Matarazzo, 1972; Jensen, 1980), this pattern can be interpreted as indicating that the hypothesized g of the APM is the same g that is measured by the WAIS.

In summary, the distribution of scores for a large cross section of University of California students on the Advanced Raven's Progressive Matrices is markedly higher than the estimated score distribution of university students that accompanies the 1962 version of the test. A moderate positive association exists between the APM and the Terman Concept Mastery Test, and an even stronger positive relationship exists between the APM and the Wechsler Adult Intelligence Scale. Examination of the internal factor structure of the test items indicates that the APM measures only one factor and that this factor is not just a difficulty factor. The results support the notion that the APM provides a measure of Spearman's g.
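The item statistics discussed with Table 2 (difficulty as percent passing, item variance, and item-total point-biserial correlations) can be computed as follows. This Python sketch is illustrative only: the function name and the small response matrix are hypothetical, and including each item in its own total is an assumption about how the article's item-total correlations were computed. Because the variance of a 0/1 item is p(1 - p), which peaks at p = .5, item variance is related to difficulty but not linearly, one reason the text treats the two as separate variables.

```python
import numpy as np


def item_statistics(X):
    """Difficulty, variance, and item-total point-biserial r for 0/1 items.

    X: examinees x items array of 0/1 scores. The total includes the item itself
    (an assumption about how the article's item-total correlations were computed).
    """
    X = np.asarray(X, dtype=float)
    total = X.sum(axis=1)
    p = X.mean(axis=0)                       # difficulty: proportion passing
    item_var = p * (1 - p)                   # variance of a 0/1 item, largest at p = .5
    # Point-biserial r is the Pearson r between a 0/1 item and the total score
    r_pb = np.array([np.corrcoef(X[:, j], total)[0, 1] for j in range(X.shape[1])])
    return p, item_var, r_pb


# Hypothetical responses of four examinees to three items
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]])
p, item_var, r_pb = item_statistics(X)
print("difficulty:", p)
print("item variance:", item_var)
print("item-total r:", np.round(r_pb, 2))
```

Subtracting each item from the total before correlating would give the "corrected" item-total correlation, which is lower, especially for short tests.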


REFERENCES

Burke, H. R. (1958). Raven's Progressive Matrices: A review and critical evaluation. Journal of Genetic Psychology, 93, 199-228.
Court, J. H., & Kennedy, R. J. (1976). Sex as a variable in Raven's Standard Progressive Matrices. Proceedings of the 21st International Congress of Psychology, Paris, France.
Forbes, A. R. (1964). An item analysis of the Advanced Matrices. British Journal of Educational Psychology, 34, 1-14.
Foulds, G. A., & Raven, J. C. (1950). An experimental survey with Progressive Matrices (1947). British Journal of Educational Psychology, 20, 4-10.
Gibson, H. B. (1975). Relations between performance on the Advanced Matrices and the EPI in high-intelligence subjects. British Journal of Social and Clinical Psychology, 14, 363-369.
Jensen, A. R. (1979). g: Outmoded theory or unconquered frontier? Creative Science and Technology, 2, 16-29.
Jensen, A. R. (1980). Bias in mental testing. New York: The Free Press.
Matarazzo, J. D. (1972). Wechsler's measurement and appraisal of adult intelligence (5th ed.). Baltimore: Williams & Wilkins.
McDonald, R. P. (1965). Difficulty factors and nonlinear factor analysis. British Journal of Mathematical and Statistical Psychology, 18, 11-23.
McLaurin, W. A., Jenkins, J. F., Farrar, W. E., & Rumore, M. C. (1973). Correlation of IQ's on verbal and nonverbal tests of intelligence. Psychological Reports, 33, 821-822.
McNemar, Q. (1949). Psychological statistics. New York: Wiley.
Raven, J. C. (1938). Progressive Matrices: A perceptual test of intelligence, 1938, Individual form. London: H. K. Lewis.
Raven, J. C. (1947). Coloured Progressive Matrices. London: H. K. Lewis.
Raven, J. C. (1960). Guide to the Standard Progressive Matrices. London: H. K. Lewis.
Raven, J. C. (1965). Advanced Progressive Matrices Sets I and II. London: H. K. Lewis.
Spearman, C., & Jones, L. L. W. (1950). Human ability. London: Macmillan.
Thissen, D. M. (1976). Information in wrong responses to the Raven Progressive Matrices. Journal of Educational Measurement, 13, 201-214.
Yates, A. J. (1966). A note on Progressive Matrices (1962). Australian Journal of Psychology, 18, 281-283.
