Psychometrics and Misinterpretation:
A Look at Rushton's Work on Intelligence and Race

Mark Ferrari
Department of Psychology
Simon Fraser University

Introduction

Ever since its widespread introduction by Sir Francis Galton in the nineteenth century (Sternberg, 1990), psychometrics has been prominent in the field of psychology. The use of psychometrics, loosely defined as any set of measures that can quantify a psychological trait, is now so commonplace that even those with no psychological background are familiar with the names and concepts of several measures. For psychology, the relationship with psychometrics has been a good one: the use of these measures has led to several pragmatic applications, which in turn have raised the social status of the discipline (Danziger, 1990). Yet few are aware that, in striving for knowledge and prominence, some researchers are stretching the boundary of what is commonly regarded as proper scientific research. This is especially true in the area of psychometrics, where there seems to be a growing trend to use various measures for purposes for which they were not developed. This in turn has raised questions about the validity and purpose of many psychological studies. The purpose of this paper is to examine this trend, looking specifically at the work of psychologist Jean Philippe Rushton in the area of intelligence and race. The paper presents his work and the views of his critics, and also examines how psychologists' desire to become more "scientific" actually contributes to unscientific, misinterpreted research. Finally, I end the paper by discussing a real-life example of psychometric misinterpretation in order to show how this trend has now permeated the thinking of many psychologists. It is hoped that exposing instances of misinterpretation in psychometric research will help the reader become more critical of research findings and thus avoid being negatively affected by faulty research.

Since this paper examines research by Rushton that makes extensive use of psychometric tests of intelligence, it is prudent to first examine the purpose and meaning of the term. The concept of intelligence was developed by Binet at the turn of the twentieth century in France (Sternberg, 1990). Binet had been commissioned by the Ministry of Public Instruction to devise a test that would reveal which students were "retarded", or slowed in their learning, so that they could be placed in special education classes. To do this, Binet devised what is now regarded as the first intelligence test. While several researchers such as Stern and Terman have radically transformed the test over the years, many similarities remain (Sternberg, 1990). By far the most crucial of these is that the purpose of intelligence tests is to give a measure of future performance in a "Euro-American" style school setting. The intelligence score is therefore nothing more than a convenient summary of the test's prediction. While this may seem to be stating the obvious, what is important to note is that there are definite limits on the interpretation and generalizability of the results of intelligence tests. One of the main ways in which misinterpretation occurs is through disregard of these limits.

Rushton's Studies

The work of Rushton (1988) provides an excellent example of how psychometric measures can be misinterpreted in research. In a study that created much controversy, Rushton (1988) tried to relate intelligence to brain size in order to explain variations in intelligence scores among races. He proposed that there is a natural hierarchy in intelligence among the races, with Mongoloids (a category consisting mostly of those of Asian descent) > Caucasoids > Negroids, and that brain size can be used to explain this. To support this, data from several previous studies were recalculated for both intelligence and brain size. In addressing the issue of brain size, calculations were done to assess the cranial capacity of over 200 skulls from over 17 populations. The figures resulted in an average brain size of 1448 cm³ for Mongoloids, 1408 cm³ for Caucasoids, and 1334 cm³ for Negroids. Rushton (1988) claims that the differences would be even larger if adjustments were made for the brain-body allometric regression since, at least in the United States, Mongoloids are smaller in body size, and Negroids larger, than Caucasoids. The issue of differences in intelligence is addressed by examining data on intelligence scores for various races. The study points out that despite attempts to remedy the situation, black Americans have consistently shown lower intelligence scores than their white counterparts. Also, since intelligence tests were standardized in Japan in the 1950s, that nation's average intelligence score has consistently been 1/3 to 2/3 of a standard deviation higher than that of white Americans and Europeans. Rushton (1988) also points out that the proportion of Asian-American students who receive a high math score (above 650) on the Scholastic Aptitude Test is twice the American average, while the proportion of black students is less than 1/4 the American average. According to Rushton (1988), this seems to prove that intelligence varies among races with Mongoloid > Caucasoid > Negroid.

It is suggested that since the ordering of the races is consistent for both intelligence and brain size, the variables must be related and there is a causal relationship between them. One of the most controversial aspects of Rushton's (1988) work is that he took the same logic used to claim a relationship between brain size and intelligence and applied it to other psychological and biological measures. In doing so, he wanted to show that intelligence was related to these other measures. Rushton's (1988) findings did indicate that across several measures there was a distinct difference between the races, with the pattern consistently showing Mongoloids and Negroids at opposite ends of the continuum and Caucasoids somewhere in between. For instance, the study claimed that Mongoloids showed the highest levels of "social organization" and Negroids the lowest. In terms of personality, Negroids showed the highest levels of "aggressiveness" and "impulsivity" and Mongoloids the lowest. Even variables related to sexuality, such as gestation time, intercourse frequencies, and age at first pregnancy, were shown to follow the familiar pattern, with Negroids highest on each of these measures. Rushton (1988) claims that these differences are mostly innate and thus have shown, and should continue to show, stability over time.

Numerous researchers have found fault with Rushton's work. For instance, Vincent (1991) states that differences in intelligence scores between the races may be a result of culturally biased tests and unequal economic and educational opportunities. Vincent points out that most of the common intelligence tests were revised in the past 15 years so that the questions asked would not give an advantage to any one group or culture. Vincent (1991) also notes that marked changes in educational and economic opportunities for blacks occurred during the same period. Vincent therefore predicted that black-white differences in intelligence scores should have narrowed over that period, and while the results were modest, a decrease in the difference was found. Vincent (1991) hypothesizes that increased opportunities for blacks, together with increased sensitivity to cultural bias in tests, can eventually remove the difference in intelligence scores between blacks and whites.

Other researchers have criticized Rushton's work on the basis that it ignores the influence of environmental variables in promoting individual differences. Capron and Duyme (1989) studied the effect of the socio-economic status (SES) of adoptive parents on the intelligence scores of their adopted children. Children adopted into a household with a high SES tended to have higher intelligence scores than children adopted into homes with a lower SES. The effect of SES on intelligence occurred whether the biological parents had low or high intelligence. The researchers feel that this research shows the enormous impact the environment has on the formation of a child's intelligence. The preceding studies are only two of several that have criticized Rushton's (1988) work. While most of these have focused on flaws in his research design and conclusions, an important point that is not addressed is the profound effect his work may have on the interpretation of intelligence, specifically, how the rationale employed in his research can have serious implications when adopted by others.

Misinterpretations: A World of Implications

The misinterpretation of psychometrics is easily seen when one closely examines Rushton's work. Through his arguments relating intelligence (as measured by intelligence tests) to a number of other variables, he has made both causal and deterministic claims. Rushton's (1988) research would lend support to the argument that if someone knew your intelligence score, not only would they have an indication of your brain size, but they would also be able to infer several aspects of your personality, your sexual behavior, your level of social organization, your likelihood of committing crime, the relative length of your lifespan, your degree of mental health, and numerous other factors relating to your psychological, social, and biological make-up. Rushton essentially took a simple measure of predicted future success in school and gave it much more meaning. It is this form of research that leads to the misinterpretation of psychometric measures. Once knowledge such as this reaches the public, it is possible that people (Rushton himself included) will interpret the results of an intelligence test as a measure of all these variables, even though the results remain controversial and the purpose and design of the test is solely to predict future success in school. Further, no psychometric measure has perfect predictive power or is wholly reliable or valid. This means that all results should be interpreted with caution and should not be seen as being causally related. However, most people do not possess the psychological or research background necessary to understand these limitations and thus are more prone to interpret the results as deterministic. Not only is this hurtful to the consumers of these tests, it is hurtful to the discipline of psychology as well.

As noted previously, the ability to apply psychometrics to pragmatic purposes served to increase psychologists' social status so that their practices would be considered scientific, which was, and still is, a major concern for many researchers and practitioners in the field (Danziger, 1990). External observers may, however, question the legitimacy of the discipline when researchers in the field have trouble agreeing on what their tests can legitimately measure. Thus, an irony exists in that the tests that served to realize psychology's goal of becoming accepted as a science may lead to its dismissal as nothing more than a pseudo-science. Also ironic is the fact that psychology embraced several research methods (e.g., statistics) that supposedly would make the discipline more scientific. It is the adoption of these methods and the rejection of alternate forms of inquiry that have led to the current dilemma regarding misinterpretation. Rushton's (1988) work provides a clear example of how this is so.

According to Danziger (1990), in an attempt to become more scientific, psychology implicitly began to set criteria for what constituted "scientific" research. To a large extent this meant that aggregate data and statistical analysis were to become commonplace while qualitative data were to be rejected. It was felt that the use of statistics and aggregate data could lead researchers to discover actual laws of human behavior. Danziger (1990) points out that the imposition of such arbitrary standards has led to a certain "methodolatry", whereby it is expected today that almost all research should contain statistical analysis if it is to be published or acknowledged by one's peers as legitimate. The problem is that while statistics can be a useful tool for the psychologist, they can also mislead by revealing relationships that are statistically significant but not necessarily meaningful. According to Cernovsky (1991), this is precisely what has happened with Rushton's (1988) work.

Cernovsky (1991) points out that Rushton's first statistical flaw is that he attached more meaning to his measures of correlation than was warranted by the data. For instance, Cernovsky states that of all the previous data cited by Rushton, the strongest correlation between brain size and intelligence is r = 0.35. Interpreted another way, this means that the amount of variation in intelligence that can be accounted for by brain size is about 12.3%. Considering that this is the strongest value, and that on average the relationship explains only 3.2% of the variance, one can see that the relationship is not as meaningful as Rushton's paper would have us believe. Yet, since the relationship is statistically significant, the results were still deemed publishable.
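
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not a reanalysis of any data: the correlation of 0.35 and the 3.2% average come from the text above, while the sample size of 500 is a purely hypothetical value chosen to show how a weak correlation can still pass a significance test.

```python
import math


def variance_explained(r: float) -> float:
    """Proportion of variance accounted for by a linear relationship
    with correlation r (the coefficient of determination, r squared)."""
    return r ** 2


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs
    from zero, given n paired observations (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


strongest_r = 0.35            # strongest correlation cited in the text
average_r = math.sqrt(0.032)  # correlation implied by the 3.2% average
hypothetical_n = 500          # ASSUMED sample size, for illustration only

print(f"strongest r = {strongest_r:.2f}, "
      f"variance explained = {variance_explained(strongest_r):.4f}")
print(f"average r = {average_r:.3f}, "
      f"t at n = {hypothetical_n}: {t_statistic(average_r, hypothetical_n):.2f}")
```

Under these assumptions the weaker correlation still yields a t statistic well above the conventional large-sample cutoff of about 1.96, even though it leaves nearly 97% of the variance unexplained. Statistical significance and practical meaning are separate questions.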

A second statistical flaw in Rushton's (1988) work is his attempt to generalize his results to races even though there is more variability within races than there is between races (Peters, 1991). A close look at the data reveals that there is enormous overlap in the distributions for each race. The overlap is so great that it would be erroneous for anyone to base a judgment about intelligence solely on the race of an individual. In direct contrast to Rushton's theory, it is quite possible to take one individual from each of the measured races and find that their ordering of intelligence is Negroid > Caucasoid > Mongoloid. As Danziger (1990) points out, when we aggregate data on subjective categories, we tend to lose sight of individual differences. Individuals in each category might then be mistakenly assumed to possess qualities that characterize not themselves but only the group average. It appears that in its drive to become more scientific, psychology has been willing to accept any statistical difference without first considering whether or not the difference is meaningful.
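
The degree of overlap can be made concrete with a short sketch. It rests on assumptions that are not taken from the studies discussed: scores in each group are treated as normally distributed with equal standard deviations, and the gaps of one third and two thirds of a standard deviation are borrowed from the range quoted earlier purely as illustrative inputs. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_reversal(d: float) -> float:
    """Probability that a randomly chosen member of the lower-mean group
    outscores a randomly chosen member of the higher-mean group, when the
    group means differ by d standard deviations (equal-variance normals)."""
    # The difference of two independent scores has SD sqrt(2).
    return normal_cdf(-d / math.sqrt(2.0))


def overlap_coefficient(d: float) -> float:
    """Shared area under two equal-variance normal curves whose means
    differ by d standard deviations (1.0 means identical curves)."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)


for d in (1 / 3, 2 / 3):  # ASSUMED gaps, in standard deviation units
    print(f"gap = {d:.2f} SD: reversal probability = {prob_reversal(d):.2f}, "
          f"overlap = {overlap_coefficient(d):.2f}")
```

With no gap at all the reversal probability is one half. Under these assumptions, gaps of this size move it only part of the way toward zero, which is why a group average says little about any one individual.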

Another way in which psychology's drive to become more scientific has aided faulty research such as Rushton's is its marginal acceptance of non-empirical research. As has already been stated, the "power brokers" of the discipline highly regard empirical work, and consequently researchers in the field are strongly persuaded to pursue this form of inquiry if they wish to be acknowledged. We have seen that even in criticism of Rushton's work, researchers have tended to dispute his numbers by calculating new numbers of their own. The main problem with this is that sometimes all that is needed to refute an empirical study is a logical examination of the data. This may not be undertaken, however, for fear that it will not be accepted for publication, or that it will not be accepted by one's peers, since this approach is "unscientific". An example of the possible benefits of non-empirical (or theoretical) research is a study by Cain and Vanderwolf (1990) that refuted Rushton's (1988) theory and did make it to publication. In their paper, Cain and Vanderwolf (1990) note that when previous research (cited by Rushton) has made comparisons of brain sizes, the researchers have tended to adjust their calculations for differences in body mass between the races but not for differences between the sexes within each race. Cain and Vanderwolf also note that the average female is 12-15% smaller in body size than the average male. One would then expect the differences in brain size between the sexes to be just as great as or greater than the differences between the races, since the differences in body mass between the sexes are as large as those between the races. Cain and Vanderwolf (1990) state that this would tremendously alter the ranking proposed by Rushton (1988). The ranking would most likely place men as superior in intellect to women.

While research has found that there are differences in the patterns of intelligence between the sexes, there is no research that proves there is an overall difference in intelligence between the sexes. This presents a serious problem for Rushton's theory, since it would have to be changed to argue that the biological processes that cause differences in brain size and intelligence between the races are different in principle from the biological processes that cause brain size differences between the sexes yet fail to result in differences in intelligence between the sexes. Cain and Vanderwolf's paper appears to effectively discredit Rushton's theory without having to collect data or run analyses. In essence, this "unscientific" paper has done more to advance its viewpoint than most of its "scientific" equivalents. This example stresses the point made earlier, namely that despite opinions to the contrary, an exclusive emphasis on empirical research can easily foster destructive and poor-quality research. Only by accepting non-empirical research, and by checking whether statistical differences are truly meaningful, can psychology avoid lending its name to faulty research such as Rushton's.

It might appear that ideas pertaining to misinterpretation are of little value to the consumer of psychometric tests. After all, there are probably few people familiar enough with Rushton's work to remember all the variables he correlated with intelligence. However, as stated previously, this is only one example of misinterpretation. The main problem arises because the research flaws mentioned earlier have become so commonplace that many psychological practitioners are constantly guilty of misinterpreting psychometric results without even being aware of it. One example is a real-life situation that I personally witnessed while working for the provincial Corrections Branch on projects involving the examination of youth in custody. During this time, I was able to examine several files, many of which contained a Young Offenders Act (YOA) assessment. This type of assessment contained the results of one to two weeks of psychological testing performed on youth in custody or remand, usually before sentencing. The purpose of the assessment was to establish the youth's level of mental and social functioning and then to use these measures in compiling recommendations for the court about the direction in which sentencing should proceed. It was felt that the psychologists would be able to make recommendations that would both aid in the rehabilitation of the youth and lower the chances of recidivism.

While the goal of the YOA assessments was (at the very least) noble, what soon became apparent was that the methods used to make decisions were flawed. Youth were given a battery of psychometric tests that included measures of intelligence and personality. It was on the basis of these tests that the clinicians made their recommendations regarding sentencing and the possibility of further crime. Like Rushton, the clinicians took a test that predicted future school performance and used it for another purpose, namely as a predictor of recidivism. Who would have thought that one's intelligence score could make the difference regarding whether or not one goes to jail? Fortunately, studies done in previous years had found that the YOA assessments offered little in alleviating the high rate of recidivism. What was found to be more effective was an examination of the youth's social history. This example clearly illustrates that the misinterpretation of psychometrics is prevalent and that it does have an impact on people's lives.

Summary

The purpose of this paper has been to document the misinterpretation of psychometric measures in psychology. Using the work of Rushton (1988) as an example, the paper has explained how various measures can be misinterpreted and how this can lead to erroneous conclusions in research. Danziger's (1990) idea of psychologists' struggle to make their profession more scientific was presented as a catalyst of this problem. It was concluded that the problem of misinterpretation could be halted by carefully examining whether statistical differences are meaningful, and by allowing a forum for logical and theoretical examinations of research. As stated previously, the pragmatic application of psychometrics has proven beneficial to the discipline of psychology in that, for many, it has served to legitimize psychology as a science. It would be wasteful and inappropriate to harm the legitimacy of the discipline by misinterpreting the products and uses of these tools.

References

Cain, D. P. and Vanderwolf, C. H. (1990). A critique of Rushton on race, brain size, and intelligence. Personality and Individual Differences, 11, 777-784.

Capron, C. and Duyme, M. (1989). Assessment of effects of socio-economic status on IQ in a full cross-fostering study. Nature, 340, 552-553.

Cernovsky, Z. Z. (1991). Intelligence and race: Further comments on J. P. Rushton's work. Psychological Reports, 68, 481-482.

Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge: Cambridge University Press.

Peters, M. (1991). Sex differences in human brain size and the general meaning of differences in brain size. Canadian Journal of Psychology, 45(4), 507-522.

Rushton, J. P. (1988). Race differences in behaviour. Personality and Individual Differences, 9(6), 1009-1024.

Sternberg, R. J. (1990). Metaphors of mind: Conceptions of the nature of intelligence. Cambridge: Cambridge University Press.

Vincent, K. (1991). Black/white IQ differences: Does age make the difference? Journal of Clinical Psychology, 47, 266-270.