In January, Science Advances published a massive project that analyzed the peer-review outcomes of 350,000 manuscripts from 145 journals and found no evidence of gender bias following manuscript submission. Just a month earlier, my colleagues and I published in mBio a similar, though smaller-scale, study that analyzed the peer-review outcomes of 108,000 manuscript submissions to 13 American Society for Microbiology (ASM) journals. Our study found a consistent trend for manuscripts submitted by women corresponding authors to receive more negative outcomes than those submitted by men. Both projects analyzed six years’ worth of submission data that are available only to journal publishers, yet came to different conclusions.
In November 2020, Nature Communications published a paper concluding that women trainees should seek out male advisors because they are better mentors than women. This conclusion contradicts data showing that female role models improve the performance and retention of women in STEM. A closer reading of the Nature Communications paper reveals that after finding that men are cited more often than women, the authors reached their conclusion by equating citations with the quality of mentorship. Furthermore, the authors did not include a robust literature review, which would have contextualized their results and refined their conclusion. After a push from scientists on social media, the article’s publication was investigated, and the paper has since been retracted.
While such studies may be conducted with good intentions, they are dangerous if not conducted properly. Conflicting results can erode trust in research on equity issues—preexisting beliefs are difficult to change—and obstruct policy changes designed to reduce those inequities. Similarly, the recent study of gender bias in peer review may seem robust, with a large dataset and an analysis that asks the right questions, but a closer look reveals missed opportunities that instead cloud discussions of equity in peer review. Below are three issues with the larger, more recent study that may have affected its results.
The journal selection is not robust
This analysis included manuscript submission and peer-review outcomes from three major for-profit publishers: Elsevier, Wiley, and Springer Nature. These publishers are responsible for more than 6,000 journals, including Cell, The Lancet, The BMJ, and Nature. Instead of using a random selection process, the publishers chose 157 journals that were grouped into four fields. From that database, the authors eliminated journals that lacked journal impact factors, were published by “learned societies, or [had] specific legal status.” The rationale and criteria for the selection process were unclear, and it resulted in poor representation of social science journals (20/157) and an otherwise insufficiently robust or rigorous sampling to yield broadly generalizable results.
Each manuscript submission is treated as a single unit
Upon submission, manuscripts are assigned a unique number, and while journal protocols vary, these manuscript numbers can be used to track a manuscript through multiple outcomes at a single journal or journals within a franchise, particularly when title and author data are available. The selected publishers each maintain journal franchises with tiered publishing structures. For instance, manuscripts rejected from Nature may instead be published in Nature Chemistry or Scientific Reports.
Unfortunately, in the Science Advances paper, journal titles were not available to see if this was likely. The authors noted that the highest journal impact factor in their dataset, which covered papers published between 2010 and 2016, was 10. Many journals in these publishing franchises have high impact factors, often between 16 and 35, which seems to exclude them from the study. However, in 2016, journals including Lancet HIV, BMC Medicine, Nature Protocols, and Cell Reports had impact factors less than 10, so it is possible that this scenario applies to manuscripts in this study.
Furthermore, by treating each manuscript submission as a single unit, rather than linking a manuscript through multiple submissions and rejections (e.g., by titles, authors, or related manuscript numbers), the analysis fails to capture the whole story. Not only is it unclear whether a manuscript was rejected by other journals before being accepted, but the analysis also obscures other gender-based penalties. For instance, the amount of time that women authors spend completing revisions—our mBio study showed an additional 1 to 9 days, despite similar decision times and an equivalent number of revisions—may point to differences in reviewer suggestions, available resources, and/or publication output.
Desk rejections are not evaluated
Peer review is most frequently associated with the lengthy, occasionally abusive, feedback provided by two or more fellow academics. Conversely, the role of editors in the process may be overlooked or excluded despite their academic and field-specific expertise. In fact, editors are the first peers whose expectations must be met or exceeded, and frequently their decisions are unilateral. Accordingly, editorial rejections (so-called “desk rejections”) were the greatest source of gender-based differences in outcomes in our ASM study. Failing to evaluate this crucial step in the process ignores a large potential source of bias. The authors explained their focus on papers that went to review by stating that “data on desk rejections were not consistently available.” However, with a dataset of more than 300,000 reviewed manuscripts from 145 journals, it is reasonable to conclude that they had sufficient data for a robust analysis of this stage in peer review.
Like the study that equated citations with mentorship, the Science Advances paper missed the mark. While the authors were careful to frame their discussion in the context of previous literature, they did not evaluate all potential outcomes of the peer-review process, thus glossing over many potential sources of inequity. This clouds the discussion surrounding the role of journals in scientific inequity and prevents accountability and change at multiple levels, from the individual to the journal and publisher.
The bottom line is that a robust sampling system, investigation of journal franchises, and evaluation of editorial rejections should have been reviewer requirements. However, journals do not have the infrastructure to appropriately evaluate equity-based studies, as is evidenced by the growing number of retractions of sexist and racist papers. This is largely because editors and reviewers have little to no training and/or expertise in studying equity issues (race, gender, disability, etc.). Our experience as humans and scientists blinds us to the fact that conducting and evaluating studies of STEM equity issues requires field-specific expertise that not all scientists have. There is an urgent need for editors to be aware of this when vetting equity-based research, and to act accordingly, preferably by ensuring that such manuscripts adhere to an equity rubric and that the appropriate reviewers are recruited and compensated. Whether by maintaining a pool of equity researchers to review equity-based papers submitted to science journals, or by some other means, publishers and journals must either step up and enforce robust reviews of such manuscripts or stop accepting them. Anything else is unethical.
Ada Hagan is a microbiologist with a passion for making science accessible. In 2019, Hagan founded Alliance SciComm & Consulting, LLC to enable her to use her strong background in communications and higher education to help make scientific concepts more easily understood and make the academy more inclusive to future scientists from all backgrounds.
Response: Despite Limitations, Study Offers Clues to Gender Bias
Estimating possible sources of gender inequalities in peer review and editorial processes at scholarly journals is a difficult endeavor for various reasons. There are serious obstacles to solid, cross-journal experimental studies that test causal hypotheses by manipulating information and contexts of manuscripts, authors, and referees. Performing retrospective studies is therefore the only option, and this too is far from simple due to the lack of a data-sharing infrastructure between publishers and journals. While this makes any generalization of findings problematic and limited, I believe it is essential to study peer review with large-scale, cross-journal data, to avoid overemphasizing individual cases. This is what we have tried to do in our recent Science Advances article.
While we knew our findings could be controversial, I am surprised by the way Ada Hagan has misinterpreted our research and would like to comment on the three points on which she based her opinion.