As scientists, we are supposed to be objective and unbiased. We are trained to use sound scientific methods and experimental design to let the data speak for itself. By doing this, we remove our preconceptions and biases from the equation, because as human beings, we are both subjective and biased. Early this summer, there was quite a buzz about an article published in PLoS Biology (1) that debunked a paper (and prize-winning book) by Stephen Jay Gould (1941–2002) that many of us probably read at some point in our academic training as scientists.
Stephen Gould was a well-known evolutionary biologist, paleontologist and science historian who (along with Niles Eldredge) is known for the theory of punctuated equilibrium. He was a prolific writer, and his popular science essays and best-selling books have been credited with increasing both public interest in and understanding of science. In his paper, “Morton’s ranking of races by cranial capacity: unconscious manipulation of data may be a scientific norm” (2), and the book that followed, The Mismeasure of Man (3), Gould used the case of Samuel Morton (1799–1851) to support his argument that “unconscious manipulation of data may be a scientific norm” because “scientists are human beings rooted in cultural contexts, not automatons directed toward external truth.”
Morton was a 19th-century physician and physical anthropologist who was famous for his detailed measurements of nearly 1,000 human skulls from all over the world. Morton focused on cranial capacity, the skeletal equivalent of brain size, in hopes of determining whether the different human populations were one species resulting from one event (monogenesis) or separate species arising from several events (polygenesis). Although this question seems archaic and fraught with bigotry now, it was a major debate during the pre-Darwinian era of science in which Morton lived. In fact, Morton’s approach of objectively gathering data by measuring large numbers of specimens was groundbreaking in his day. Morton’s results ranked the populations by cranial capacity in the order Caucasians/“Malays”/blacks/“Mongolians”/Native Americans.
Gould took issue with what he inferred to be Morton’s equation of cranial capacity and intelligence, and he used his case study of Morton’s work to support his hypothesis that unconscious “finagling” or doctoring of data is common and unavoidable in science, a “profession that awards status and power for clean and unambiguous discovery”. In both his Science paper (2) and his book (3), Gould contended that Morton held an a priori bias toward elevating Caucasians above the other populations. To this end, Gould charged, Morton had selectively reported data, manipulated sample composition, mismeasured skulls, and made and ignored analytical errors, all so that the results would support his (Morton’s) views on intelligence (i.e., cranial capacity) and differences in human populations. In fact, according to Gould’s analysis, there were only trivial differences between the populations Morton had measured. Virtually overnight, Samuel Morton became the poster child of scientific misconduct and an often-cited example of how scientists are vulnerable to their own biases.
And so the story went for 30 years
The story would have continued thus were it not for a group of anthropologists who set about reassessing Morton’s results and Gould’s analysis of them. The team located and remeasured almost half of the skulls Morton originally measured. This is something Gould never did. His arguments were based solely on reanalyzing Morton’s measurements, which this team did as well. Then they turned their attention to Gould’s analysis and subsequent arguments, and that is where things get interesting.
Means and Bias
Gould claimed in his Science paper that Morton had selectively reported his data. “It is intriguing that Morton often reported Caucasian means by subsamples, which permitted him to assert the superiority of Teutons and Anglo-Saxons. But he never broke down the Indian mean.…Thus, the fact that some Indian subsamples (Iroquois at 91.5 in³, N = 4) exceeded the mean for Americans of Anglo-Saxon stock remained hidden in his raw data…” (2). Unfortunately, this often-quoted claim is false. Morton did report “Indian” subsample means; he did so at least 12 times in Crania Americana (4), the publication Gould was referring to. These subsample means did include the Iroquois numbers. Gould also claimed that Morton’s Native American average capacity was artificially decreased by using a straight mean (the average of every specimen in the entire sample) rather than a grouped mean (calculating the average of each subpopulation and then taking the mean of those means). This, Gould contended, would allow the numbers to be skewed by the differences in sample sizes of “large headed” versus “small headed” populations. However, if Morton had done his calculations the way Gould contended he should have, the result would have been a slight decrease in the Native American average (79.9 in³ vs. 80.2 in³).
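To make the difference between the two averaging methods concrete, here is a minimal Python sketch. The group names and capacities are invented for illustration and are not Morton's measurements; the function names are my own.

```python
# Hypothetical cranial capacities in cubic inches. These are made-up
# numbers for illustration only, NOT values from Morton's tables.
samples = {
    "group_a": [74, 76, 75, 77, 73, 75],  # many skulls, smaller values
    "group_b": [90, 92],                   # few skulls, larger values
}

def straight_mean(groups):
    """Average over every specimen, so large groups carry more weight."""
    values = [v for group in groups.values() for v in group]
    return sum(values) / len(values)

def grouped_mean(groups):
    """Average of the per-group averages, so each group counts equally."""
    group_means = [sum(group) / len(group) for group in groups.values()]
    return sum(group_means) / len(group_means)

print(straight_mean(samples))  # 79.0
print(grouped_mean(samples))   # 83.0
```

In this invented sample the grouped mean comes out higher because the larger group happens to have the smaller skulls. Which method gives the higher number depends entirely on how group size and group average line up, which is why the switch lowered, rather than raised, the Native American average in Morton's actual data.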
Leaving aside which calculation method would be the best one to use, Morton clearly did not select his method to skew results toward his supposed bias. Yet when Gould reanalyzed Morton’s numbers, he calculated a higher average for the Native American skulls (83.8 in³ vs. 79.9 in³). How did he get this number? Well, Gould only used population samples with an n greater than 4, and then erroneously excluded 6 crania, all with small cranial capacities. Further, Gould only included skulls that Morton had measured both with mustard seed (his early measuring method) and with lead shot (his later method, which he adopted to eliminate the variation that using seed might introduce). Interestingly, the authors point out, Gould did not use these same criteria when reanalyzing other populations.
Seeds and Shot
In his book, Gould speculated how Morton may have biased his seed measurements by loosely or tightly packing the seed into the skulls (3). Gould based his claims that Morton had mismeasured on his comparisons of Morton’s seed-based and lead shot-based measurements. In his reconstruction, Gould claimed that the average capacity for different groups had different increases when going from seed- to shot-based measurements, with the Caucasian skulls seeing the smallest increase. This led him to suspect a problem with the original seed-based measurements, and was his evidence for his famous “plausible scenario”.
The problem with Gould’s approach was that Morton reported individual seed-based measurements only in his volume Crania Americana (4), and these were only for Native American crania. Gould reported an average increase for these crania of 2.2 in³. When the authors looked at the individual numbers and not just the average, they found that there were both increases and decreases, and these changes did not appear to be patterned by group; one skull in a subpopulation increased by 12 in³ and another in the same subpopulation decreased by 5.5 in³. This casts doubt on the idea that the mismeasurements were a result of bias. Since the only individual seed-based measurements Morton reported were for the Native American subpopulations, how did Gould arrive at his claims about the changes in other populations? Well, these authors contend, he must have done so by “guessing” which skulls had been included.
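The point about averages hiding individual changes can be sketched in a few lines of Python. The paired values below are invented for illustration (only the +12 and -5.5 changes echo the figures quoted above); they are not Morton's measurements.

```python
# Hypothetical paired measurements of the same four skulls, in cubic
# inches. Made-up numbers for illustration only, NOT Morton's data.
seed = [78.0, 85.0, 80.0, 76.5]  # earlier seed-based values
shot = [90.0, 79.5, 83.0, 79.0]  # later shot-based values

# Per-skull change when going from seed to shot.
changes = [after - before for before, after in zip(seed, shot)]
average_change = sum(changes) / len(changes)

print(changes)                     # [12.0, -5.5, 3.0, 2.5]
print(average_change)              # 3.0
print(min(changes), max(changes))  # -5.5 12.0
```

The average alone suggests a uniform modest increase, while the individual changes run in both directions. That kind of scatter is what one would expect from an imprecise measuring method, not from a systematic thumb on the scale.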
Morton himself acknowledged the likelihood of errors in the seed-based measurements. Some of the measurements in Crania Americana were done by an assistant, and Morton later found that this person had made errors. He stated as much in his publication Catalogue of skulls of man and the inferior animals, Third Edition (5).
Typo or Bias?
In the final table of Morton’s Crania Americana, the Native American mean cranial capacity was erroneously reported as 82.4 in³ rather than 80.2 in³. In this error, Gould saw Morton’s deliberate attempt to maintain his scale of Caucasian/Native American/Blacks. However, the correct value is given in the text, so a typographical error in the table seems likely. In addition, the authors found reports of copies of Crania Americana inscribed by Morton with the number corrected, and later reproductions of the table also contain the corrected value. This suggests that the error was recognized and corrected. Finally, the overall order of mean cranial capacity didn’t change using either number, effectively removing Morton’s supposed motivation for allowing the error to go uncorrected.
The Irony
Of all the accusations Gould leveled against Morton, the authors of this study found only two to be substantiated. First, there were several errors in the summary table of Morton’s final catalog published in 1849. However, counter to Gould’s arguments, the authors found that had Morton not made these errors, the numbers would have actually supported his presumed bias better than the published numbers did.
Secondly, Morton undoubtedly believed in the idea of different races. This belief is clear in the opening pages of his Crania Americana, and Morton made no effort to hide it. Yet despite his bias, the authors found that Morton’s measurements are reliable and fully reported.
Ironically, it seems that it was Gould’s analysis that was flawed and influenced by his biases. While the results reported in this study falsify Gould’s hypothesis that Morton manipulated his data, they also lend support to his greater hypothesis that “Unconscious or dimly perceived finagling is probably endemic in science”, as his analysis of Morton is a strong example of bias influencing results.
The Warning and the Hope
When I read this paper, my first reaction was to shake my head and chuckle at the irony. My second reaction was sadness. For thirty years science has held up Gould’s analysis of Morton as a warning to new scientists. As this study’s authors point out, we now know that most variation in human populations is within rather than between subpopulations (6,7) and that cranial capacity variation is mostly a function of climate (8). We found Morton’s views on race repugnant. We wanted Gould to be right and so we didn’t bother to evaluate the arguments critically. As scientists we failed; we liked the results so we didn’t bother to question them.
Some scientists have railed against Gould for “letting us down”, but were we not all capable of looking at the numbers and checking the facts? As an undergraduate assigned to read Gould’s Science paper as preparation for a discussion on ethics and self-awareness in science, I couldn’t have remeasured Morton’s skulls, but I could have checked Gould’s calculations. I could have found Morton’s original results and checked Gould’s claims. I didn’t have to accept the results of the paper just because my professor assigned it. I was being trained to think critically, and I didn’t.
My third reaction was relief. Despite the occurrences of scientific misconduct and fraud that we hear about, the final message of this paper is that the scientific method, when properly applied, is sound. Morton, despite his biases, used methods that kept his biases from influencing his data. As scientists we cannot be “automatons”, as Gould pointed out, but we can teach and use sound scientific methods that will shield the outcome of our work from our inevitable biases. Finally, we owe it to ourselves and our colleagues to turn as critical an eye to results we like as to those we don’t.
References
1. Lewis, J.E., Degusta, D., Meyer, M.R., Monge, J.M., Mann, A.E., Holloway, R.L. (2011) The mismeasure of science: Stephen Jay Gould versus Samuel George Morton on skulls and bias. PLoS Biol 9(6). PMID: 21666803
2. Gould, S.J. (1978) Morton’s ranking of races by cranial capacity: unconscious manipulation of data may be a scientific norm. Science 200, 503–509.
3. Gould, S.J. (1981) The mismeasure of man. New York: W. W. Norton and Company.
4. Morton, S.G. (1839) Crania Americana; or, a comparative view of the skulls of various aboriginal nations of North and South America: to which is prefixed an essay on the varieties of the human species. Philadelphia: J. Dobson.
5. Morton, S.G. (1849) Catalogue of skulls of man and the inferior animals, Third Edition. Philadelphia: Merrihew and Thomson Printers.
6. Brace, C.L. (2005) “Race” is a four-letter word: the genesis of the concept. New York: Oxford University Press.
7. Cartmill, M. (1998) The status of the race concept in physical anthropology. Am Anthropol 100, 651–660.
8. Beals, K.L., Smith, C.L., Dodd, S.M. (1984) Brain size, cranial morphology, climate, and time machines. Curr Anthropol 25, 301–330.
Kelly Grooms
Great post! This really shows the great responsibility we have as scientists to make sure we carefully and clearly report our data because people assume/trust that if it is published, it is correct.
It’s also important to remember that negative results are not bad results. Human nature makes us want to only share positive results, which may leave many important discoveries hidden.
Thanks, Karen. What bothered me the most was that no one bothered to check the claims before or after they were published (including myself when I read the paper in 1991). You are right that we assume that something that has been published is correct. As scientists (and a society) I am afraid that is leading us down a dangerous path where it is not okay to question previously published results.
You also make a good point about publishing only positive results. If only the positive results are reported, we may only be getting half the story. I don’t know how many people had a professor like I did who spent several lectures discussing the ethical responsibilities of scientists. They were fascinating discussions. I spent a semester working in his lab, where I learned that sometimes failure (of the experiment) not only is an option, sometimes it is the answer. He was fond of telling me “If everything worked the first time, they wouldn’t call it REsearch, they would just call it SEARCH.”
Great summary. Thank you for emphasizing the fact that Gould alleged an unconscious bias, rather than deliberate fraud. It strikes me that Gould had a case against Morton. Where were the skulls obtained? Were they representative of the subpopulations sampled? Did the unevenness in the numbers of skulls in each group affect the outcome? Why did Morton not challenge the assumption that cranial capacity is linked to intelligence?
Friedrich Tiedemann tested the hypothesis that no differences exist in cranial capacity among groups by race or gender. Measurements of his skull collection yielded different results from Morton’s. Tiedemann’s study is subject to the same potential weaknesses, yet his results have not been challenged.
One lesson students draw from this case is to be scrupulously careful when critiquing another scientist. Morton did not make the link between cranial capacity and intelligence directly in Crania Americana or his other works, that I could find. However, he was delighted when phrenologist George Combe contributed an appendix that linked those two variables and made the implications explicit. If not a perpetrator, Morton was at least a bystander. Morton’s work may have focused on polygenism, but he missed an opportunity to challenge the bigotry of his peers and followers.
As Kelly implied above, research is an ongoing process. No single scientist – or group – has the final word, including Lewis and colleagues.
Hi Anne. Thank you for your comment. You are correct that Morton could have challenged (and did not challenge) the assumption of cranial capacity being linked to intelligence. As a scientist and a human being, I wish that Morton had stood up and challenged the bigotry of his time. Likewise, I wish that Gould’s peers and reviewers had checked carefully and perhaps stood up to him if they needed to.
There probably should be some ethical concerns with how Morton’s skulls were obtained. I think that there is a good opportunity for using these points as a spring board for a discussion about showing respect for all test subjects dead or alive, human or animal.
As scientists, we should all be open to critique. After all, if our work cannot stand up to challenges, then how strong are our results really? At the same time, we need to treat our fellow scientists with respect. Our critiques should be of the science and not of the person. It seems to me that science has always worked best as a collaborative effort, and, unfortunately, this is something that is getting lost in the high-pressure, highly competitive world of today’s science.
I noticed that you don’t mention at all what I still think was an important point of Gould’s – that Morton embarked on his measurements without knowing whether he had an even distribution of male to female skulls in his collection for each race, even though female skulls are on average smaller than males – a fact as well known in the nineteenth century as in ours. Therefore Morton chose to do his studies and draw the conclusions he wanted to on the races while it was perfectly within his capacity to understand the possibly invalid nature of his sample – which IMO does show a priori bias on his part.
I’ll take your word and The Mismeasure of Man’s word on the problems of the rest of Gould’s points.