In most animal species, males and females differ. This is true for people and other mammals, as well as many species of birds, fish and reptiles. But what about dinosaurs? In 2015, I proposed that variation found in the iconic back plates of stegosaur dinosaurs was due to sex differences.
I was surprised by how strongly some of my colleagues disagreed, arguing that differences between sexes, called sexual dimorphism, did not exist in dinosaurs.
I am a paleontologist, and the debate sparked by my 2015 paper has made me reconsider how researchers studying ancient animals use statistics.
The limited fossil record makes it hard to declare whether a dinosaur was sexually dimorphic. But I and some others in my field are beginning to shift away from traditional black-or-white statistical thinking that relies on p-values and statistical significance to define a true finding.
Instead of only looking for yes or no answers, we are beginning to consider the estimated magnitude of sexual variation in a species, the degree of uncertainty in that estimate and how these measures compare to other species. This approach offers a more nuanced analysis to challenging questions in paleontology as well as many other fields of science.
Differences between males and females
Sexual dimorphism is when males and females of a certain species differ on average in a particular trait – not including their reproductive anatomy. Classic examples are how male deer have antlers and male peacocks have flashy tail feathers, while the females lack these traits.
Dimorphism can also be subtle and unflashy. Often the difference is one of degree, like differences in the average body size between males and females – as in gorillas. In these modest cases, researchers use statistics to determine whether a trait differs on average between males and females.
The dinosaur dilemma
Studying sexual dimorphism in extinct animals is fraught with uncertainty. If you and I independently dig up similar fossils of the same species, they are inevitably going to be slightly different.
These differences could be due to sex, but they could also be driven by age – young birds are fuzzy, adult birds are sleek. They could also be due to genetics unrelated to sex, like eye color in humans.
If paleontologists had thousands of fossils of every species to study, the many sources of biological variation wouldn’t matter as much. Unfortunately, the ravages of time have left the fossil record painfully incomplete, often with fewer than a dozen good specimens for large, extinct vertebrate species.
Additionally, there is currently no way to identify the sex of an individual fossil except in rare cases where obvious clues exist, like eggs preserved within the body cavity.
So where does all this leave the debate on whether male and female dinosaurs differed in their traits? On the one hand, birds – which are direct descendants of dinosaurs – commonly show sexual dimorphism. So do crocodilians, dinosaurs’ next closest living relatives.
Evolutionary theory also predicts that, since dinosaurs reproduced with sperm and egg, there would be a benefit to sexual dimorphism.
These things all suggest that dinosaurs likely were sexually dimorphic. But in science you need to be quantitative. The challenge is that there is little in the way of statistically significant analyses of the fossil record to support dimorphism.
There are a couple of ways paleontologists could test for sexual dimorphism. They could look to see if there are statistically significant differences between fossils from presumed males and females, but there are very few specimens where researchers know the sex.
Another method is to see whether there are two distinct groupings of a trait, called a bimodal distribution, which could suggest a difference between males and females.
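One way to make the idea of "two distinct groupings" concrete is to ask whether the measurements are better described by one bell curve or by a mix of two. The sketch below is only an illustration of that general idea, not the method used in any particular study. The trait values are invented, the function names are hypothetical, and the choice of a two-component normal mixture compared by the Bayesian information criterion (BIC) is an assumption. With only a handful of fossils, a comparison like this is unreliable.

```python
import math
from statistics import fmean, pstdev

def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def one_group_loglik(values):
    """Log-likelihood of the data under a single normal distribution."""
    mu, sigma = fmean(values), pstdev(values)
    return sum(normal_logpdf(x, mu, sigma) for x in values)

def two_group_loglik(values, n_iter=200):
    """Log-likelihood under a two-component normal mixture fitted by EM."""
    xs = sorted(values)
    half = len(xs) // 2
    # Assumed guard: stop a component from collapsing onto a single point.
    floor = 0.05 * pstdev(xs)
    mu = [fmean(xs[:half]), fmean(xs[half:])]
    sigma = [max(pstdev(xs[:half]), floor), max(pstdev(xs[half:]), floor)]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each value belongs to the first component.
        resp = []
        for x in xs:
            p0 = weight[0] * math.exp(normal_logpdf(x, mu[0], sigma[0]))
            p1 = weight[1] * math.exp(normal_logpdf(x, mu[1], sigma[1]))
            resp.append(p0 / (p0 + p1))
        # M-step: update each component from its weighted share of the data.
        for k in (0, 1):
            r = resp if k == 0 else [1 - v for v in resp]
            total = max(sum(r), 1e-12)
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / total
            sigma[k] = max(math.sqrt(var), floor)
            weight[k] = total / len(xs)
    return sum(
        math.log(sum(weight[k] * math.exp(normal_logpdf(x, mu[k], sigma[k])) for k in (0, 1)))
        for x in xs
    )

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_obs) - 2 * loglik

# Invented trait measurements with two obvious clusters (not real fossil data).
trait = [9.8, 10.1, 10.0, 9.9, 10.2, 14.9, 15.1, 15.0, 15.2, 14.8]
bic_one = bic(one_group_loglik(trait), 2, len(trait))  # mean, sd
bic_two = bic(two_group_loglik(trait), 5, len(trait))  # two means, two sds, one weight
print(f"one group BIC = {bic_one:.1f}, two groups BIC = {bic_two:.1f}")
```

For these deliberately clean numbers the two-group model has the lower BIC. Even then, two clusters only hint at sex differences, since age or mixed species could produce the same pattern.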
To tell whether a perceived difference between two groups is real, scientists have traditionally used a tool called the p-value. A p-value is the probability of seeing a difference at least as large as the one observed if there were truly no difference between the groups and only random chance were at work. If a p-value is low enough, the result is deemed “statistically significant” and considered unlikely to have happened by chance alone.
But p-values can be heavily influenced by sample size and the design of the study, in addition to the actual degree of sexual dimorphism. Because fossil sample sizes are so small, relying on this statistical technique makes it exceedingly difficult to categorically proclaim which dinosaur species were dimorphic.
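The effect of sample size on p-values can be seen in a small simulation. The sketch below is a minimal illustration using invented numbers, not fossil measurements. It keeps the average difference between two groups fixed and changes only how many specimens there are. The function name `permutation_p_value` and the choice of a permutation test are assumptions made for the example.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_b) - fmean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffle the group labels and recompute the difference in means.
        rng.shuffle(pooled)
        shuffled = abs(fmean(pooled[n_a:]) - fmean(pooled[:n_a]))
        if shuffled >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Invented measurements: group B averages 1.5 units larger than group A.
pattern_a = [8.0, 10.0, 12.0, 10.0]
pattern_b = [9.5, 11.5, 13.5, 11.5]

small_a, small_b = pattern_a, pattern_b            # 4 specimens per group
large_a, large_b = pattern_a * 10, pattern_b * 10  # 40 specimens per group

p_small = permutation_p_value(small_a, small_b)
p_large = permutation_p_value(large_a, large_b)
print(f"4 per group:  p = {p_small:.3f}")
print(f"40 per group: p = {p_large:.4f}")
```

The average difference is identical in both cases, yet the four-specimen comparison lands well above the conventional 0.05 cutoff while the 40-specimen comparison falls far below it. The verdict changes even though the size of the difference does not.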
The weakness of the black-or-white approach that focuses solely on whether a result is statistically significant has led to hundreds of scientists calling to abandon significance testing with p-values in favor of something called effect size statistics.
Using this approach, researchers would simply report the measured difference between two groups and the uncertainty in that measurement.
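In practice, "the measured difference and its uncertainty" often means an estimate accompanied by an interval. The sketch below shows one common way to get such an interval, a percentile bootstrap. It is only an illustration: the numbers are invented, the function name `bootstrap_difference` is hypothetical, and the bootstrap is an assumed choice rather than the procedure of any specific paper. It also assumes the two groups are already known, which is rarely true for fossils.

```python
import random
from statistics import fmean

def bootstrap_difference(group_a, group_b, n_resamples=2000, level=0.95, seed=0):
    """Difference in means (B minus A) with a percentile bootstrap interval."""
    rng = random.Random(seed)
    estimate = fmean(group_b) - fmean(group_a)
    resampled = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        resampled.append(fmean(b) - fmean(a))
    resampled.sort()
    tail = (1 - level) / 2
    lower = resampled[int(tail * n_resamples)]
    upper = resampled[int((1 - tail) * n_resamples) - 1]
    return estimate, lower, upper

# Invented measurements: group B averages 1.5 units larger than group A.
pattern_a = [8.0, 10.0, 12.0, 10.0]
pattern_b = [9.5, 11.5, 13.5, 11.5]

est_small, lo_small, hi_small = bootstrap_difference(pattern_a, pattern_b)
est_large, lo_large, hi_large = bootstrap_difference(pattern_a * 10, pattern_b * 10)
print(f"4 per group:  {est_small:.2f} ({lo_small:.2f} to {hi_small:.2f})")
print(f"40 per group: {est_large:.2f} ({lo_large:.2f} to {hi_large:.2f})")
```

Both samples report the same estimated difference, but the small sample comes with a much wider interval. Instead of a yes or no, the reader sees how large the difference appears to be and how much it could plausibly vary.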
Effect size statistics
I have begun to apply effect size statistics in my research on dinosaurs. My colleagues and I compared sexual dimorphism in body size between three different dinosaurs: the duck-billed Maiasaura, Tyrannosaurus rex and Psittacosaurus, a small relative of Triceratops.
None of these species would be expected to show statistically significant size differences between males and females according to p-values. But that approach does not capture the nature of the variation within these species.
When we instead used effect size statistics, we were able to estimate that male and female Maiasaura differed more in body mass than males and females of the other two species did, and we could be more confident in this estimate as well. A few characteristics of the data helped reduce the uncertainty.
First, we had a large number of Maiasaura fossils, from individuals of various ages. These bones very nicely fit with trajectories of how size changes as an individual grows from juvenile to adult, so we could control for differences due to age and instead focus on differences due to sex.
Additionally, the Maiasaura fossils all come from a single bone bed of individuals that died in the same place at the same time. This means that variation between individuals is likely not due to them being different species from different regions or time periods.
If my colleagues and I had approached the problem expecting a yes or no answer on whether males and females differed in size, we would have completely missed all of these intricacies.
Effect size statistics allow researchers to produce much more nuanced and, I think, informative results. It is almost as much a difference in the philosophical approach to science as it is a mathematical one.
Studying dinosaur dimorphism is not the only place p-values create issues. Many fields of science, including medicine and psychology, are having similar debates about issues in statistics and a worrying problem of unrepeatable studies.
Embracing uncertainty in data – rather than looking for black-or-white answers to questions like whether male and female dinosaurs were sexually dimorphic – can help elucidate dinosaur biology. But this shift in thinking may be felt far and wide across the sciences. A careful consideration of problems within statistics could have deep impacts across many fields.