Abstract
The robustness of scholarly peer review has been challenged by evidence of disparities in publication outcomes based on authors’ gender and nationality. To address this, we examine the peer review outcomes of 23,873 initial submissions and 7,192 full submissions that were submitted to the biosciences journal eLife between 2012 and 2017. Women and authors from nations outside of North America and Europe were underrepresented both as gatekeepers (editors and peer reviewers) and as last authors. We found a homophilic interaction between the demographics of the gatekeepers and authors in determining the outcome of peer review; that is, gatekeepers favor manuscripts from authors of the same gender and from the same country. The acceptance rate for manuscripts with male last authors was significantly higher than for female last authors, and this gender inequity was greatest when the team of reviewers was all male; mixed-gender gatekeeper teams led to more equitable peer review outcomes. Similarly, manuscripts were more likely to be accepted when reviewed by at least one gatekeeper with the same national affiliation as the corresponding author. Our results indicated that homogeneity between author and gatekeeper gender and nationality is associated with the outcomes of scientific peer review. We conclude with a discussion of mechanisms that could contribute to this effect, directions for future research, and policy implications. Code and anonymized data have been made available at https://github.com/murrayds/elife-analysis.
Author summary
Peer review, the primary method by which scientific work is evaluated and developed, is ideally a fair and equitable process, in which scientific work is judged solely on its own merit. However, the integrity of peer review has been called into question based on evidence that outcomes often differ between male and female authors, and between authors in different countries. We investigated such disparities at the biosciences journal eLife by analyzing the demographics of authors and gatekeepers (editors and peer reviewers) and the review outcomes of all submissions between 2012 and 2017. We found evidence of disparities in outcomes that disfavored women and those outside of North America and Europe, and that these groups were underrepresented among authors and gatekeepers. The gender disparity was greatest when reviewers were all male; mixed-gender reviewer teams led to more equitable outcomes. Similarly, manuscripts were more likely to be accepted when reviewed by at least one gatekeeper from the same country as the corresponding author. Our results indicated that gatekeeper characteristics are associated with the outcomes of scientific peer review. We discuss mechanisms that could contribute to this effect, directions for future research, and policy implications.
Introduction
Peer review is foundational to the development, gatekeeping, and dissemination of research, while also underpinning the professional hierarchies of academia. Normatively, peer review is expected to follow the ideal of “universalism” [1], where scholarship is judged solely on its intellectual merit. However, confidence in the extent to which peer review accomplishes the goal of promoting the best scholarship has been eroded by questions about whether social biases [2], based on or correlated with the characteristics of the scholar, could also influence outcomes of peer review [3,4]. This challenge to the integrity of peer review has prompted an increasing number of funding agencies and journals to assess the disparities and potential influence of bias in their peer review processes.
Several terms are often conflated in discussions of bias in peer review. We use the term disparities to refer to unequal composition between groups, inequities to characterize unequal outcomes, and bias to refer to departures from impartiality in judgment. Disparities and inequities have been widely studied in scientific publishing, most notably with regard to gender and country of affiliation. Globally, women account for about 30 percent of scientific authorship [5] and are underrepresented in the scientific workforce [6,7]. Articles authored by women are disproportionately underrepresented in the most prestigious and high-profile scientific journals [8–13]. Moreover, developed countries dominate the production of highly cited publications [14,15].
The underrepresentation of authors from certain groups may reflect differences in submission rates, or it may reflect differences in success rates during peer review (the percentage of submissions accepted). Analyses of success rates have yielded mixed results regarding the presence and magnitude of such inequities. Some analyses have found lower success rates for female-authored papers [16,17] and grant applications [18,19], while other studies have found no gender differences in review outcomes (see, for example, [20–23]). Inequities in journal success rates based on authors’ nationalities have also been documented, with reports that authors from English-speaking and scientifically advanced countries have higher success rates [24,25]; however, other studies found no evidence that the language or country of affiliation of an author influences peer review outcomes [25–28]. These inconsistencies could be explained by several factors, such as differences in study context, research design, and sample size.
The nature of bias and its contribution to inequities in scientific publishing is highly controversial. Implicit bias, that is, the macro-level social and cultural stereotypes that can subtly influence everyday interpersonal judgments and thereby produce and perpetuate status inequalities and hierarchies [29,30], has been suggested as a possible mechanism to explain differences in peer review outcomes based on socio-demographic and professional characteristics [3,31]. When faced with uncertainty, which is quite common in academia, people often weigh the social status and other ascriptive characteristics of others to help make decisions [32]. Hence, scholars are more likely to consider particularistic characteristics (e.g., gender, institutional status) of an author under conditions of uncertainty [33,34], such as at the frontier of new scientific knowledge [35]. However, given the stratification of scholars within institutions and across countries, it can be difficult to pinpoint the nature of a potential bias. For example, women are underrepresented in prestigious educational institutions [36–38], which conflates gender and prestige biases. These institutional differences can be compounded by gendered differences in age, professional seniority, research topic, and access to top mentors [39]. Another potential source of bias is what has been dubbed cognitive particularism [40], in which scholars harbor preferences for work and ideas similar to their own [41]. Evidence of this process has been reported in peer review at one journal, in the form of reciprocity (i.e., correspondence between the patterns of recommendations received by authors and the patterns of recommendations given by reviewers in the same social group) between authors and reviewers of the same race and gender [42] (see also [43,44]). Reciprocity can exacerbate or mitigate existing inequalities in science.
If the work and ideas favored by gatekeepers are unevenly distributed across author demographics, this could be conducive to Matthew effects [1], whereby scholars accrue cumulative advantages via a priori status privileges. Consistent with this, inclusion of more female reviewers was reported to attenuate biases that disfavor women in the awarding of R01 grants at the National Institutes of Health [17]. However, the opposite effect was reported in [45] in the evaluation of candidates for professorships: when female evaluators were present, male evaluators became less favorable toward female candidates. Thus, the nature and potential impact of cognitive biases during peer review are multiple and complex.
Another challenge is to disentangle the contribution of bias during peer review from factors external to the review process that could influence success rates. For example, there are gendered differences in access to funding, domestic responsibilities, and cultural expectations of career preferences and ability [46,47] that may adversely affect manuscript preparation and submission. Furthermore, women have been found to be less likely to compete [48] and to hold themselves to higher standards [49]; hence, they may self-select higher-quality work for submission to prestigious journals. At the country level, disparities in peer review outcomes could reflect structural factors related to a nation’s scientific investment [14,50], publication incentives [51,52], local challenges [53], and research culture [54], all of which could influence the actual and perceived quality of submissions from different nations. Because multiple factors external to the peer review process can influence peer review outcomes, unequal success rates for authors with particular characteristics do not necessarily reflect bias; conversely, equal success rates do not necessarily reflect a lack of bias.
Several approaches have been applied in an attempt to identify and analyze bias. One quasi-experimental approach is to compare outcomes in anonymized (double-blind) and non-anonymized (single-blind) peer review. The results of these studies demonstrate that double-blind review yields more equitable outcomes [31] and mitigates inequities that favor famous authors, elite institutions [55–57], and those from high-income and English-speaking nations [26]. However, although double-blind review is generally viewed positively by the scientific community [58,59], this process has not been widely used in peer review in the biosciences, and authors rarely opted in when offered [60]. Therefore, the experimental approach has not resolved the debate over the role of bias in the outcomes of peer review in scholarly publishing.
Here, we use an alternative approach to assess the extent to which gender and national disparities manifest in peer review outcomes at eLife, an open-access journal in the life and biomedical sciences. In particular, we study the extent to which the magnitude of these disparities varies across different gender and national compositions of gatekeeper teams. Peer review at eLife differs from the traditional forms of peer review used in the life sciences in that review at eLife is done through deliberation between reviewers (usually three in total) on an online platform. Previous studies have shown that deliberative scientific evaluations are influenced by social dynamics between evaluators [61,62]. Therefore, we assessed the extent to which the composition of the reviewer teams relates to peer review outcomes. Using all research papers (Research Articles, Short Reports, and Tools and Resources) submitted between 2012 and 2017 (n = 23,873), we investigated the extent to which an interaction manifests between the gender and nationality of authors (first, last, and corresponding) and gatekeepers (editors and invited peer reviewers), similar to the approach used by [2]. We acknowledge that inequity in success rates could result from a variety of factors unrelated to the peer review process. However, if the outcomes vary significantly based on the demographic characteristics of the reviewers in relation to authors, we contend that this provides evidence of potential bias in the peer review process.
Data and methods
Consultative peer review and eLife
Founded in 2012 by the Howard Hughes Medical Institute (United States), the Max Planck Society (Germany), and the Wellcome Trust (United Kingdom), eLife is an open-access journal that publishes research in the life and biomedical sciences. Manuscripts submitted to eLife progress through several stages. In the first stage, the manuscript is assessed by a Senior Editor, who may confer with one or more Reviewing Editors before deciding whether to reject the manuscript or encourage the authors to provide a full submission. In May 2018, eLife had 45 Senior Editors, including the Editor-in-Chief and three Deputy Editors, and 339 Reviewing Editors, all of whom were active scientists. When a full manuscript is submitted, the Reviewing Editor recruits a small number of peer reviewers (typically two or three) to write reports on the manuscript. The Reviewing Editor is encouraged to serve as one of the peer reviewers; in our sample, the Reviewing Editor was listed as a peer reviewer for 58.9 percent of full submissions. When all individual reports have been submitted, both the Reviewing Editor and peer reviewers discuss the manuscript and their reports using a private online chat system hosted by eLife. At this stage the identities of the Reviewing Editor and peer reviewers are known to one another. If the consensus of this group is to reject the manuscript, all the reports are usually sent to the authors. If the consensus is that the manuscript requires revision, the Reviewing Editor and the peer reviewers agree on the essential points that need to be addressed before the paper can be accepted. In this case, a decision letter outlining these points is sent to the authors (and the original reports by the peer reviewers are not usually sent in their entirety to the authors). When a manuscript is accepted, the decision letter and the authors’ response are published along with the manuscript. The name of the Reviewing Editor is also published.
Peer reviewers can also choose to have their names published. This process has been referred to as consultative peer review (see [63,64] for a more in-depth description of the eLife peer-review process). Consultative peer review provides a unique context for analyzing deliberation and social valuation in professional groups.
Data
We retrieved metadata for research papers submitted to eLife between its inception in 2012 and mid-September 2017 (n = 23,873). Submissions fell into three main categories: 20,945 Research Articles (87.7 percent), 2,186 Short Reports (9.2 percent), and 742 Tools and Resources (3.1 percent). Not included in this total were six Scientific Correspondence articles, which were excluded because they follow a distinct review process. Each record potentially included four submissions: an initial submission, a full submission, and up to two revision submissions (though in some cases manuscripts remain in revision even after two revised submissions). Fig 1 depicts the flow of all 23,873 manuscripts through each review stage. The majority (70.0 percent) of initial submissions for which a decision was made were rejected. Only 7,111 manuscripts were encouraged to make a full submission. A total of 7,192 manuscripts were submitted as full submissions; this number is slightly larger than the number of encouraged initial submissions due to appeals of initial decisions and other special circumstances. Most full submissions, 52.4 percent (n = 3,767), received a decision of revise, while another 43.9 percent (n = 3,154) were rejected. A small number of full submissions (n = 54) were accepted without any revisions. On average, full submissions that were ultimately accepted underwent 1.23 revisions. Within our dataset, 3,426 full submissions were eventually accepted for publication. A breakdown of the number of revisions requested before a final decision was made, by gender and nationality of the last author, is provided in S1 Fig. On the date on which the data were collected (mid-September 2017), a portion of initial submissions (n = 147) and full submissions (n = 602) remained in various stages of processing and deliberation (without final decisions).
For another portion of initial and full submissions (n = 619), the authors appealed the decision, which moved some decisions from “Reject” to “Accept” or “Revise”.
The review process at eLife was highly selective, and it became more selective over time. Fig 2 shows that while the total count of manuscripts submitted to eLife has rapidly increased since the journal’s inception, the count of encouraged initial submissions and accepted full submissions has grown more slowly. The encourage rate (the percentage of initial submissions encouraged to submit full manuscripts) was 44.6 percent in 2012 and dropped to 26.6 percent in 2016. The overall accept rate (the percentage of initial submissions eventually accepted) began at 27.0 percent in 2012 and decreased to 14.0 percent in 2016. The accept rate (the percentage of full submissions accepted) was 62.4 percent in 2012 and decreased to 53.0 percent in 2016. While eLife garnered only 307 submissions in 2012, it accrued 8,061 submissions in 2016. In the present analysis we considered the outcomes of all manuscripts without respect to submission year, though we note that the data were skewed toward the large number of manuscripts submitted in recent years.
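As a rough consistency check on these definitions, the overall accept rate should be close to the product of the encourage rate and the full-submission accept rate. The Python sketch below (`implied_overall` is a hypothetical helper, not part of the authors' code) multiplies the reported yearly figures. Small gaps relative to the reported overall rates are expected, because appeals and submissions still awaiting a decision, both described above, mean the two stages do not chain exactly.

```python
# Reported rates (percent) taken from the text, by submission year.
reported = {
    2012: {"encourage": 44.6, "accept_full": 62.4, "overall": 27.0},
    2016: {"encourage": 26.6, "accept_full": 53.0, "overall": 14.0},
}


def implied_overall(encourage_pct: float, accept_full_pct: float) -> float:
    """Overall accept rate implied if the two review stages chained exactly."""
    return encourage_pct * accept_full_pct / 100.0


for year, r in reported.items():
    approx = implied_overall(r["encourage"], r["accept_full"])
    print(f"{year}: implied {approx:.1f}% vs reported {r['overall']:.1f}%")
# 2012: implied 27.8% vs reported 27.0%
# 2016: implied 14.1% vs reported 14.0%
```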
In addition to authorship data, we obtained information about the gatekeepers involved in the processing of each submission. In our study, we define gatekeepers to include any Senior Editor or Reviewing Editor at eLife, or any invited peer reviewer, involved in the review of at least one initial or full submission between 2012 and mid-September 2017. Gatekeepers at eLife often serve in multiple roles, for example, acting as both a Reviewing Editor and peer reviewer on a given manuscript. For initial submissions, we had data on the corresponding author of the manuscript and the Senior Editor tasked with making the decision. For full submissions, we had data on the corresponding author, first author, last author, Senior Editor, Reviewing Editor, and members of the team of peer reviewers. Data for each individual included their stated name, institutional affiliation, and country of affiliation. A small number of submissions were removed, such as cases where a paper had a first but no last author, or papers that did not have a valid submission type. Country names were manually disambiguated (for example, “USA” was normalized to “United States” and “Viet Nam” to “Vietnam”).
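The country names were normalized manually. One way to make such a normalization reproducible is a lookup table of known variants, sketched below in Python. Only the “USA” and “Viet Nam” mappings come from the text; the “UK” entry and the helper name `normalize_country` are hypothetical illustrations.

```python
# Variant spellings (lower-cased, whitespace-collapsed) -> canonical name.
# Only the first two entries are taken from the text; the third is hypothetical.
COUNTRY_ALIASES = {
    "usa": "United States",
    "viet nam": "Vietnam",
    "uk": "United Kingdom",  # hypothetical example
}


def normalize_country(raw: str) -> str:
    """Map a raw country string to a canonical name, leaving unknown names as-is."""
    cleaned = " ".join(raw.split())  # trim and collapse internal whitespace
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


print(normalize_country("USA"))          # United States
print(normalize_country(" Viet  Nam "))  # Vietnam
print(normalize_country("Germany"))      # Germany
```

A table like this keeps every mapping decision visible and versionable, and unknown names pass through unchanged so they can be reviewed by hand.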
Full submissions included 6,669 distinct gatekeepers, 6,694 distinct corresponding authors, 6,691 distinct first authors, and 5,580 distinct last authors. Authors could appear on multiple manuscripts and hold a different authorship role on each: in 26.5 percent of full submissions the corresponding author was also the first author, while in 71.2 percent the corresponding author was also the last author. We did not have access to the full authorship list, including middle authors. Note that in the biosciences, the last author is typically the most senior researcher involved [65] and responsible for more conceptual work, whereas the first author is typically less senior and performs more of the scientific labor (such as lab work and analysis) to produce the study [66–68].
Gender assignment
Gender variables for authors and gatekeepers were coded using an updated version of the algorithm developed in [5]. This algorithm used a combination of the first name and country of affiliation to assign a gender to each author on the basis of several universal and country-specific name-gender lists (e.g., United States Census). This list of names was complemented with an algorithm that searches Wikipedia for pronouns associated with names. This new list was validated by applying it to a dataset of names with known gender. We used data collected from RateMyProfessor.com, a website containing anonymous student-submitted ratings and comments for professors, lecturers, and teachers at universities in the United States, United Kingdom, and Canada. We limited the dataset to individuals with at least five comments and counted the total number of gendered pronouns that appeared in comments; if the count of one gendered-pronoun type was at least the square of the other, we assigned the gender of the majority pronoun to the individual. To compare with this pronoun-based assignment, we assigned gender using the previously detailed first-name-based algorithm. In total, there were 384,127 profiles on RateMyProfessor.com that had at least five comments and for which pronouns indicated a gender. Our first-name-based algorithm assigned a gender of male or female to 91.26 percent of these profiles. The raw match rate between these two assignments was 88.6 percent. Of the profiles that were assigned a gender, our first-name-based assignment matched the pronoun-based assignment in 97.1 percent of cases and for 90.3 percent of distinct first names. While RateMyProfessor.com and the authors submitting to eLife represent different populations (RateMyProfessor.com being biased towards teachers in the United States, United Kingdom, and Canada), the results of this validation provide some credibility to the first-name-based gender assignment used here.
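The pronoun rule used for validation can be stated compactly. In the Python sketch below, `gender_from_pronouns` is a hypothetical name. A profile receives the majority pronoun's gender when that count is at least the square of the other count. The additional requirement that the majority count be strictly larger is our assumption, added so that ties such as 0 versus 0 or 1 versus 1 remain unassigned. The final lines check that the reported figures are internally consistent. If unassigned profiles count as non-matches, the raw match rate should equal the coverage (91.26 percent) times the agreement among assigned profiles (97.1 percent).

```python
from typing import Optional


def gender_from_pronouns(n_female: int, n_male: int) -> Optional[str]:
    """Assign a gender when one pronoun count is at least the square of the other.

    Requiring a strictly larger majority count is an assumption made here so
    that ties (e.g. 0 vs 0, 1 vs 1) stay unassigned.
    """
    if n_female >= n_male ** 2 and n_female > n_male:
        return "female"
    if n_male >= n_female ** 2 and n_male > n_female:
        return "male"
    return None  # evidence too mixed to assign


print(gender_from_pronouns(9, 3))  # female (9 >= 3**2)
print(gender_from_pronouns(8, 3))  # None (8 < 3**2)

# Consistency check on the reported validation figures.
coverage = 0.9126   # share of profiles the name-based method assigned
agreement = 0.971   # match rate among assigned profiles
print(f"{coverage * agreement:.3f}")  # 0.886, the reported raw match rate
```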
We also manually identified the gender of three Senior Editors and 24 Reviewing Editors to whom our algorithm did not assign a gender, by searching for them on the web and inspecting the resulting photos to determine whether the individual was presenting as male or female.
Through the combination of manual efforts and our first-name-based gender-assignment algorithm, we assigned a gender of male or female to 92.3 percent (n = 34,333) of the 37,195 name/role combinations that appear in our dataset. Of these 37,195 combinations, 26.0 percent (n = 9,675) were assigned female, 66.3 percent (n = 24,658) were assigned male, and the remaining 7.7 percent (n = 2,862) were assigned no gender. This gender distribution roughly matches the gender distribution observed globally across scientific publications [5].
Analysis
When comparing peer review outcomes between groups, we used χ² tests of independence. We maintain the convention of 0.05 as the threshold of statistical significance, though we also report p-values less than or equal to 0.1 as marginally significant. When visualizing our results, we superimposed 95 percent confidence intervals for the sample proportions. When comparing groups based on gender, we excluded submissions for which no gender could be identified. Data processing, statistical testing, and visualization were performed using R version 3.4.2 and RStudio version 1.1.383.
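Each χ² test reported below compares two groups on a binary outcome, such as accepted versus not accepted, which reduces to a 2×2 contingency table. The following Python sketch illustrates the technique and is not the authors' R code. It computes the uncorrected Pearson statistic with one degree of freedom and a normal-approximation (Wald) interval for a proportion. Both choices are assumptions. R's `chisq.test` applies Yates's continuity correction to 2×2 tables by default, and the text does not state which interval method was used, so values from this sketch may differ slightly from those reported. The counts in the example are made up.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom and no continuity
    correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    stat = n * (a * d - b * c) ** 2 / denom
    # A chi-square variable with 1 df is a squared standard normal, so its
    # upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def wald_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical counts: rows = author group, columns = (accepted, rejected).
stat, p = chi2_2x2(10, 20, 20, 10)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
print(wald_interval(50, 100))
```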
Results
Gatekeeper representation
We first analyzed whether the gender and national affiliations of the population of gatekeepers at eLife were similar to those of the authors of initial and full submissions (Fig 2). The population of gatekeepers was composed primarily of invited peer reviewers, as there were far fewer Senior and Reviewing Editors. A gender breakdown by type of gatekeeper is provided in S1 Table, and a national breakdown is provided in S2 Table.
Fig 3 illustrates the gender and national demographics of authors and gatekeepers at eLife. The population of gatekeepers at eLife was largely male. Only 20.6 percent (n = 1,372) of gatekeepers were identified as female, compared with 26.4 percent (n = 4,803) of corresponding authors (including authors of initial submissions), 33.6 percent (n = 2,256) of first authors, and 22.2 percent (n = 1,243) of last authors. The difference between the gender composition of gatekeepers and authors was statistically significant for corresponding authorship, χ²(1, n = 16,774) = 465.9, p < 0.0001; first authorship, χ²(1, n = 6,087) = 837.6, p < 0.0001; and last authorship, χ²(1, n = 5,162) = 16.4, p < 0.0001. Thus, the gender composition of gatekeepers at eLife was male-skewed in comparison with the authorship profile.
The population of gatekeepers at eLife was heavily dominated by those from North America, who constituted 59.9 percent (n = 3,992). Gatekeepers from Europe were the next most represented, constituting 32.4 percent (n = 2,161), followed by Asia with 5.7 percent (n = 379). Individuals from South America, Africa, and Oceania each made up less than two percent of the population of gatekeepers. As with gender, we identified significant differences between the international composition of gatekeepers and that of the authors. Gatekeepers from North America were overrepresented, whereas gatekeepers from Asia and Europe were underrepresented, compared with the population of corresponding authors, χ²(5, n = 18,191) = 6904.6, p < 0.0001; first authors, χ²(5, n = 6,670) = 480.4, p < 0.0001; and last authors, χ²(5, n = 5,564) = 428.2, p < 0.0001. The international representation of gatekeepers was most similar to first and last authorship, and least similar to corresponding authorship. This likely resulted from the fact that our population of corresponding authors included initial submissions, which tend to be more internationally diverse than full submissions, for which we had information about first and last authors as well as corresponding authors.
Authorship, Gender, and Outcomes
Male authorship dominated eLife submissions: men accounted for 76.8 percent (n = 5,113) of gender-identified last authorships and 74.0 percent (n = 4,913) of gender-identified corresponding authorships of full submissions (see S2 Fig). First authorship of full submissions was closest to gender parity, although still skewed towards male authorship at 63.2 percent (n = 4,125).
We found small but statistically significant gender inequities favoring men in the outcomes of each stage of the review process. The percentage of initial submissions encouraged was higher for male corresponding authors (30.6 versus 28.6 percent; χ²(1, n = 21,841) = 7.79, p < 0.01; see S2 Fig). Likewise, the percentage of full submissions accepted was higher for male corresponding authors (53.4 versus 50.4 percent; χ²(1, n = 6,013) = 4.0, p < 0.05). These disparities compounded across stages, yielding significantly higher overall accept rates (the percentage of initial submissions eventually accepted) for male corresponding authors (15.4 percent) than for female corresponding authors (13.6 percent), χ²(1, n = 21,217) = 10.5, p < 0.01 (see S2 Fig).
Fig 4 shows the gendered acceptance rates of full submissions for corresponding, first, and last authors. The greatest gender disparity in success rates was observed for last authors: the accept rate of full submissions was 3.7 percentage points higher for male last authors (53.4 versus 49.7 percent; χ²(1, n = 6,035) = 5.70, p < 0.05). There was no significant relationship between the gender of the first author and the percentage of full submissions accepted, χ²(1, n = 5,913) = 0.42, p > 0.1. Differences may be present at the intersection of gender and national affiliation (see S3 Fig), though the data were not sufficient to support statistically valid conclusions.
Gender homogeneity and peer review outcome
To examine the relationship between author-gatekeeper gender homogeneity and review outcomes, we analyzed the gender composition of the gatekeepers and authors of full submissions. Each manuscript was assigned a reviewer composition category of all-male, all-female, mixed, or uncertain. Reviewer teams labeled all-male and all-female were teams for which we could identify a gender for every member, and for which all members were identified as male or as female, respectively. Teams labeled as mixed were those for which we could identify a gender for at least two members, and which had at least one male and at least one female peer reviewer. Teams labeled as uncertain were those for which we could not assign a gender to every member and which were not mixed. A full submission is typically reviewed by two to three peer reviewers, who may or may not include the Reviewing Editor. However, the Reviewing Editor is likely always involved to some degree in the review process of a manuscript, so we always considered the Reviewing Editor as a member of the reviewing team. Of 7,192 full submissions, a final decision of accept or reject was given for 6,590 during the dates analyzed; of these, 40.9 percent (n = 2,696) were reviewed by all-male teams, 1.2 percent (n = 81) by all-female teams, and 49.0 percent (n = 3,226) by mixed-gender teams; the remaining 587 reviewer teams were classified as uncertain.
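The team-composition rules above can be written as a small classifier. In the Python sketch below, the function name `classify_team` and the encoding of each member's gender as `"male"`, `"female"`, or `None` (no gender assigned) are assumptions. It takes the genders of every team member, with the Reviewing Editor always included, and returns one of the four categories.

```python
from typing import Iterable, Optional


def classify_team(genders: Iterable[Optional[str]]) -> str:
    """Classify a reviewer team (Reviewing Editor included) by gender composition.

    Each entry is "male", "female", or None when no gender was assigned.
    """
    members = list(genders)
    n_male = members.count("male")
    n_female = members.count("female")
    # Mixed: at least one identified man and one identified woman, which
    # implies at least two members with an identified gender.
    if n_male and n_female:
        return "mixed"
    # All-male / all-female: every member has an identified gender.
    if members and None not in members:
        return "all-male" if n_male else "all-female"
    # Otherwise some member's gender is unknown and the team is not mixed.
    return "uncertain"


print(classify_team(["male", "male", "male"]))  # all-male
print(classify_team(["female", None, "male"]))  # mixed
print(classify_team(["female", None]))          # uncertain
```

Checking for a mixed team first matters. A team with one man, one woman, and one unidentified member is still mixed, because no assignment of the missing gender could make it single-gender.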
Fig 4 illustrates higher acceptance rates for full submissions from male corresponding and last authors (submissions with authors of unidentified gender excluded). Fig 5 shows that this disparity arose largely from instances in which the reviewer team was all male. When all reviewers were male, the acceptance rate of full submissions was significantly higher for male than for female last authors (Fig 5; χ²(1, n = 2,472) = 6.6, p < 0.05) and corresponding authors (S3 Fig; χ²(1, n = 2,472) = 4.5, p < 0.05). For mixed-gender reviewer teams, the disparity in author success rates by gender was smaller and not statistically significant. All-female reviewer teams were rare (only 81 of 6,590 processed full submissions). In the few cases of all-female reviewer teams, the acceptance rate was higher for female last, corresponding, and first authors; however, this difference was not statistically significant, and the number of observations was too small to draw conclusions. There was no significant relationship between first-author gender and acceptance rates, regardless of the gender composition of the reviewer team (see S3 Fig). In summary, we found that full submissions with male corresponding and last authors were more often accepted when they were reviewed by a team of gatekeepers consisting only of men; greater parity in outcomes was observed when gatekeeper teams contained both men and women. We refer to this tendency of reviewers to favor authors who share their gender as homophily.
Country of affiliation and peer review outcome
Fig 6 shows the proportions and rates of manuscripts submitted, encouraged, and accepted to eLife for corresponding authors from the eight most prolific countries (in terms of initial submissions). Manuscripts with corresponding authors from these eight countries accounted for a total of 73.9 percent of all initial submissions, 81.2 percent of all full submissions, and 86.5 percent of all accepted publications. Many countries were underrepresented in full and accepted submissions compared with their share of initial submissions. For example, while papers with Chinese corresponding authors accounted for 6.9 percent of initial submissions, they comprised only 3.0 percent of full and 2.4 percent of accepted submissions. The only countries that were overrepresented (making up a greater portion of full and accepted submissions than expected given their initial submissions) were the United States, United Kingdom, and Germany. In particular, corresponding authors from the United States made up 35.8 percent of initial submissions, yet constituted 48.5 percent of full submissions and the majority (54.9 percent) of accepted submissions.
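One simple way to quantify over- or underrepresentation is the ratio of a country's share of accepted submissions to its share of initial submissions. A ratio above 1 indicates overrepresentation. The Python sketch below uses a hypothetical helper, `representation_ratio`, and applies it to the United States and China using only the shares quoted above.

```python
def representation_ratio(initial_share: float, accepted_share: float) -> float:
    """Share of accepted submissions divided by share of initial submissions."""
    return accepted_share / initial_share


# (percent of initial submissions, percent of accepted submissions), from the text
shares = {"United States": (35.8, 54.9), "China": (6.9, 2.4)}

for country, (initial, accepted) in shares.items():
    ratio = representation_ratio(initial, accepted)
    label = "overrepresented" if ratio > 1 else "underrepresented"
    print(f"{country}: {ratio:.2f} ({label})")
# United States: 1.53 (overrepresented)
# China: 0.35 (underrepresented)
```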
Each stage of review contributed to the disparity in national representation between initial, full, and accepted submissions, with manuscripts from the United States, United Kingdom, and Germany more often encouraged as initial submissions and accepted as full submissions. Fig 6 shows that initial submissions with a corresponding author from the United States were the most likely to be encouraged (39.2 percent), followed by the United Kingdom (31.7 percent) and Germany (29.3 percent). By contrast, manuscripts with corresponding authors from Japan, Spain, and China were the least likely of these eight to be encouraged (21.4, 16.7, and 12.6 percent, respectively). These differences narrowed somewhat for full submissions: the accept rate for full submissions with corresponding authors from the United States was the highest (57.6 percent), but its gap relative to the United Kingdom and France was smaller than for encourage rates. Full submissions from China, Spain, and Japan had the lowest acceptance rates of these eight countries.
Country homogeneity and peer review outcomes
We also investigated the relationship between peer review outcomes and national homogeneity between authors and reviewers. We defined national homogeneity as the condition in which at least one member of the reviewer team (Reviewing Editor and peer reviewers) listed the same national affiliation as the corresponding author. We considered only the nationality of the corresponding author, since it was identical to the nationality of the first and last author for 94.1 and 94.5 percent of full submissions, respectively. Outside of the United States, country homogeneity during review was rare. While 90.5 percent of full submissions with corresponding authors from the United States were reviewed by at least one gatekeeper from their country, homogeneity was present for only 29.8 percent of full submissions with corresponding authors from the United Kingdom and 25.3 percent of those with a corresponding author from Germany. The likelihood of reviewer homogeneity fell sharply for Japan and China, which had author-reviewer homogeneity for only 7.6 and 6.4 percent of full submissions, respectively. More extensive details on the rate of author/reviewer homogeneity for each country can be found in S3 Table.
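The homogeneity definition above amounts to a membership test over the reviewer team. Below is a minimal Python sketch, assuming country names have already been normalized. The function names are hypothetical, and treating a missing author country as a non-match is our assumption.

```python
from typing import Iterable, List, Optional, Tuple


def national_homogeneity(author_country: Optional[str],
                         team_countries: Iterable[Optional[str]]) -> bool:
    """True if at least one reviewer-team member (Reviewing Editor included)
    lists the same country of affiliation as the corresponding author."""
    if author_country is None:
        return False  # assumption: a missing author country never counts as a match
    return any(country == author_country for country in team_countries)


def homogeneity_rate(records: List[Tuple[Optional[str], List[Optional[str]]]]) -> float:
    """Share of (author_country, team_countries) records showing homogeneity."""
    hits = sum(national_homogeneity(author, team) for author, team in records)
    return hits / len(records)


print(national_homogeneity("Japan", ["United States", "Germany", "Japan"]))  # True
print(homogeneity_rate([("Japan", ["United States"]),
                        ("Japan", ["Japan", "France"])]))  # 0.5
```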
We examined whether author-reviewer homogeneity tended to result in reviewers favoring submissions from authors of their own country. We first pooled together all countries, as shown in Fig 7, and found that the presence of homogeneity during review was significantly associated with a higher accept rate, χ2 (1, n = 6,508) = 75.9, p < 0.0001. However, most cases of homogeneity occurred for authors from the United States, so this result could reflect the higher accept rate for these authors rather than homophily. Therefore, we repeated the test excluding all full submissions with corresponding authors from the United States, and we again found a significant homophilic effect, χ2 (1, n = 3,236) = 14.1, p < 0.001. We repeated the test once more, excluding full submissions with corresponding authors from the United States, United Kingdom, and Germany, and identified no homophilic effect, χ2 (1, n = 1,920) = 0.095, p > 0.1.
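Each of these tests compares accept rates with and without homogeneity in a 2×2 contingency table, which is why the statistics carry 1 degree of freedom. The sketch below computes Pearson's χ2 for such a table in plain Python and derives the p-value from the 1-df identity χ2 = Z². It assumes no Yates continuity correction (the text does not say whether one was applied), and the `homophily_test` helper with its `(country, homogeneous, accepted)` record layout is hypothetical; the `exclude` argument mimics dropping countries such as the United States before re-testing.

```python
import math
from collections import Counter


def chi2_p_value_1df(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df, chi2 = Z**2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Rows: homogeneity present / absent. Columns: accepted / not accepted.
    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, chi2_p_value_1df(chi2)


def homophily_test(records, exclude=frozenset()):
    """records: iterable of (country, homogeneous, accepted) tuples (hypothetical layout)."""
    counts = Counter(
        (homogeneous, accepted)
        for country, homogeneous, accepted in records
        if country not in exclude
    )
    return chi_square_2x2(
        counts[(True, True)], counts[(True, False)],
        counts[(False, True)], counts[(False, False)],
    )
```

Applied to the statistics reported above, the same tail formula gives p ≈ 0.0002 for χ2 = 14.1 and p ≈ 0.76 for χ2 = 0.095, consistent with the stated thresholds.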
We also examined the effects of homogeneity within individual nations and tested for the presence of homophilic effects. Fig 7 shows accept rates for the eight most prolific nations submitting to eLife. For the United States, there was a weak relationship between the percentage of accepted full submissions and the presence of national homogeneity, χ2 (1, n = 3,270) = 2.9, p < 0.1. We observed a similar weak relationship for the United Kingdom, χ2 (1, n = 739) = 3.3, p < 0.1. For China, we observed a statistically significant homophilic relationship between the acceptance rate of full submissions and national homogeneity, χ2 (1, n = 204) = 5.2, p < 0.05. We observed the inverse trend for France and Canada, where the presence of gatekeepers from the same country was associated with lower accept rates, though this trend was not statistically significant. In summary, we found that national homogeneity was rare unless an author was from the United States, but that author/reviewer homogeneity was often (though not always) associated with homophilic bias.
Discussion
We identified inequities in peer review outcomes at eLife based on the gender and nationality of the senior (last and corresponding) authors. We observed a significant disparity in the acceptance rates of submissions with male and female last authors, which favored men. Inequities were also observed by country of affiliation. In particular, submissions from highly developed countries with high scientific capacity tended to have higher success rates than others. These inequities in peer review outcomes could be attributed, at least in part, to an interaction between gatekeeper and author demographics, which can be described as homophily, or a preference based on shared characteristics: gatekeepers were more likely to recommend a manuscript for acceptance if they shared demographic characteristics with the authors. In particular, manuscripts with male senior (last or corresponding) authors were more likely to be accepted if reviewed by an all-male reviewer panel rather than a mixed-gender panel. Similarly, manuscripts were more likely to be accepted if at least one of the reviewers was from the same country as the last or corresponding author. The differential outcomes on the basis of homophily suggest that peer review at eLife is influenced by some form of bias, be it implicit bias [3,16], geographic or linguistic bias [24,69,70], or cognitive particularism [40]. Specifically, a homophilic interaction suggests that peer review outcomes may sometimes be based on more than the intrinsic quality of the manuscript; the composition of the review team is also related to outcomes in peer review.
The opportunity for homophilous interactions is determined by the demographics of the gatekeeper pool. We found that the demographics of the gatekeepers differed significantly from those of the authors, even for last authors, who tend to be more senior [65–68]. Women were underrepresented among eLife gatekeepers, and gatekeepers tended to come from a small number of highly developed countries. The underrepresentation of women at eLife mirrors global trends: women account for a minority of total authorships, yet constitute an even smaller proportion of gatekeepers across many domains [13,71–78]. Similarly, gatekeepers at eLife were less internationally diverse than eLife’s authors, reflecting the general underrepresentation of the “global south” in leadership positions of international journals [79].
The demographics of the reviewer pool made certain authors more likely than others to benefit from homophily in the review process. U.S. authors were much more likely than not (see S3 Table) to be reviewed by a panel with at least one reviewer from the U.S. However, the opposite was true for authors from other countries. Fewer opportunities for such homophily may disadvantage scientists from smaller and less scientifically prolific countries. For gender, male lead authors had a nearly 50 percent chance of being reviewed by a homophilous (all-male) rather than a mixed-gender team. In contrast, because all-female reviewer panels were so rare (accounting for only 81 of 6,509 full submission decisions), female authors were highly unlikely to benefit from homophily in the review process.
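Gender homophily in this sense depends on classifying the whole gatekeeper team as all-male, all-female, or mixed-gender. A minimal sketch, assuming each gatekeeper (Reviewing Editor and peer reviewers) carries an inferred gender label of "M" or "F" with no missing labels; the label values, `TeamComposition`, and both functions are hypothetical names for illustration. The last line simply restates the 81-of-6,509 figure as a share, roughly 1.2 percent.

```python
from collections import Counter
from enum import Enum


class TeamComposition(Enum):
    ALL_MALE = "all-male"
    ALL_FEMALE = "all-female"
    MIXED = "mixed-gender"


def classify_team(genders: list[str]) -> TeamComposition:
    """Classify a gatekeeper team from inferred gender labels ('M' or 'F')."""
    labels = set(genders)
    if labels == {"M"}:
        return TeamComposition.ALL_MALE
    if labels == {"F"}:
        return TeamComposition.ALL_FEMALE
    if labels == {"M", "F"}:
        return TeamComposition.MIXED
    # Empty teams or unknown labels are outside this simplified scheme.
    raise ValueError(f"unsupported gender labels: {genders!r}")


def is_gender_homophilous(author_gender: str, gatekeeper_genders: list[str]) -> bool:
    """True if every gatekeeper shares the author's inferred gender."""
    same_gender_team = {"M": TeamComposition.ALL_MALE, "F": TeamComposition.ALL_FEMALE}
    return classify_team(gatekeeper_genders) is same_gender_team[author_gender]


# Toy teams (not eLife data): tally compositions across decisions.
teams = [["M", "M", "M"], ["M", "F", "M"], ["F", "F"], ["M", "M"]]
composition_counts = Counter(classify_team(team) for team in teams)

# Share of all-female panels reported in the text: 81 of 6,509 decisions.
all_female_share = 81 / 6509  # about 0.012, i.e. roughly 1.2 percent
```

Because a female author can only be in a homophilous condition when the entire team is female, the rarity of all-female teams directly caps how often that condition can occur.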
Increasing eLife’s representation of women and of scientists from a more diverse set of nations among editors may lead to a more diverse reviewer pool and a more equitable peer review process. Editors often invite peer reviewers from their own professional networks, which likely reflect the characteristics of the editor [80–82]; this can lead editors, who tend to be men [13,71–78] and from scientifically advanced countries [79], to invite peer reviewers who are cognitively or demographically similar to themselves [44,83,84], inadvertently excluding certain groups from the gatekeeping process. Consistent with this, we found that male Reviewing Editors at eLife were less likely than female Reviewing Editors to create mixed-gender teams of gatekeepers (see S5 Fig). We observed a similar effect based on the nationality of the Reviewing Editor and invited peer reviewers (see S6 Fig).
The disparities we observe in peer review outcomes may seem modest; however, such small disparities can accumulate through each stage of the review process (initial submission, full submission, revisions) and potentially affect the outcomes of many submissions. For example, the overall acceptance rate (the rate at which initial submissions were eventually accepted) for male and female corresponding authors was 15.4 and 13.6 percent, respectively; in other words, manuscripts submitted to eLife with female lead authors were published at 88.3 percent the rate of those with male lead authors. Similarly, manuscripts submitted by lead authors from China were accepted at only 22.0 percent the rate of manuscripts submitted by a lead author from the United States (with overall acceptance rates of 4.9 and 22.3 percent, respectively). Success in peer review is vital for a researcher’s career because successful publication strengthens their professional reputation and makes it easier to attract funding, students, and postdocs, and hence to produce further publications. Even small advantages can compound over time and result in pronounced inequalities in science [85–88].
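The relative rates above are simple ratios of overall acceptance rates, and the accumulation argument follows from multiplying stage-wise success rates. The sketch below reproduces the two ratios from the quoted percentages and then, under the simplifying assumption that the overall rate is the product of stage rates, shows how a hypothetical 5 percent per-stage gap compounds over three stages; the stage rates are made-up numbers, not eLife data.

```python
import math


def relative_rate(group_rate: float, reference_rate: float) -> float:
    """A group's acceptance rate expressed as a fraction of a reference group's rate."""
    return group_rate / reference_rate


# Overall acceptance rates (percent) quoted in the text.
female_vs_male = relative_rate(13.6, 15.4)  # about 0.883
china_vs_us = relative_rate(4.9, 22.3)      # about 0.220

# Hypothetical three-stage pipeline (initial submission, full submission, revision);
# illustrative numbers only.
reference_stages = [0.30, 0.55, 0.90]
# A group that succeeds at 95 percent of the reference rate at every stage.
group_stages = [0.95 * r for r in reference_stages]

overall_ratio = math.prod(group_stages) / math.prod(reference_stages)
# overall_ratio equals 0.95 ** 3 (about 0.857) up to floating-point rounding:
# a 5 percent gap per stage becomes a roughly 14 percent gap overall.
```

The multiplicative model ignores withdrawals and other paths out of the pipeline, so it illustrates the direction of the compounding effect rather than reproducing eLife’s exact figures.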
Our finding that the gender of the last author is associated with a significant difference in the rate at which full submissions were accepted at eLife stands in contrast with a number of previous studies of journal peer review, which found no significant difference in outcomes of papers submitted by male and female authors [55,89,90], or in reviewers’ evaluations based on the author’s apparent gender [91]. This discrepancy may be explained in part by eLife’s unique context and policies, or by the relative selectivity of eLife compared to venues where previous studies found gender equity. In addition, our results point to a key feature of study design that may account for some of the differences across studies: the consideration of multiple authorship roles. This is especially important for the biosciences, where authorship order is strongly associated with contribution [67,68,92]. Whereas our study examines the gender of the first, last, and corresponding authors, most previous studies have focused on the gender of the first author (e.g., [2,89,93]) or of the corresponding author (e.g., [21,94]). Consistent with previous studies, we observed no strong relationship between first author gender and review outcomes at eLife. Only when considering lead authorship roles (last authorship and, to a lesser extent, corresponding authorship) did we observe such an effect. Our results may be better compared with studies of grant peer review, where leadership roles are more explicitly defined; many of these studies have identified significant disparities in outcomes favoring men [17,18,95–98], although many others have found no evidence of gender disparity [20,22,23,99–101].
Given that science has grown increasingly collaborative and that average authorship per paper has expanded [102,103], future studies of disparities would benefit from explicitly accounting for multiple authorship roles and signaling among various leadership positions on the byline [65,104].
The interaction we found between the gender and nationality of the gatekeepers and peer review outcomes also stands in contrast to the findings from a number of previous studies. One study [105] identified a homophilous relationship between female reviewers and female authors. However, most previous analyses found only procedural differences based on the gender of the gatekeeper [21,90,91,106] and identified no difference in outcomes based on the interaction of author and gatekeeper gender in journal submissions [90,107,108] or grant review [22]. Studies of gatekeeper nationality have found no difference in peer review outcomes based on the nationality of the reviewer [107,109], though there is little research on the correspondence between author and reviewer nationality. One past study examined the interaction between U.S. and non-U.S. authors and gatekeepers but found an effect opposite to what we observed: U.S. reviewers tended to rate submissions of U.S. authors more harshly than those of non-U.S. authors [43]. Our results also contrast with the study most similar to our own, which found no evidence of bias related to gender and only modest evidence of bias related to geographic region [2]. These discrepancies may result from our analysis of multiple author roles. Alternatively, they may result from the unique nature of eLife’s consultative peer review; the direct communication among peer reviewers, which is absent from traditional peer review, may render the social characteristics of reviewers more influential.
Limitations
There are limitations of our methodology that must be considered. First, we have no objective measure of the intrinsic quality of manuscripts. Therefore, it is not clear which review condition (homophilic or non-homophilic) more closely approximates the ideal of merit-based peer review outcomes. Second, measuring the interaction between reviewer and author demographics on peer review outcomes cannot readily detect biases that are shared by all reviewers/gatekeepers (e.g., if all reviewers, regardless of gender, favored manuscripts from male authors); hence, our approach could underestimate the influence of bias. Third, our analysis is observational, so we cannot establish causal relationships between success rates and author or gatekeeper demographics. Along these lines, reliance on statistical tests with arbitrary significance thresholds may produce misleading results (see [110]) or obscure statistically weak but potentially important relationships. Fourth, our gender-assignment algorithm is only a proxy for author gender, and its reliability varies by continent.
Further studies will be required to determine the extent to which the effects we observe generalize to other peer review contexts. Specific policies at eLife, such as its consultative peer review process, may contribute to the effects we observed. Other characteristics of eLife may also be relevant, including its level of prestige [12] and its disciplinary specialization in the biological sciences, whose culture may differ from that of other scientific and academic disciplines. Future work is necessary to confirm and expand upon our findings, assess the extent to which they can be generalized, establish causal relationships, and mitigate the effects of these methodological limitations. To aid in this effort, we have made as much of the data and analysis as possible publicly available at https://github.com/murrayds/elife-analysis.
Conclusion and recommendations
Many factors can contribute to gender, national, and other inequities in scientific publishing, including a variety of factors entirely external to peer review [46,50,111–114] that can affect the quantity and perceived quality of submitted manuscripts. However, these structural factors do not readily account for the observed interaction between gatekeeper and author demographics associated with peer review outcomes at eLife; rather, biases related to the personal characteristics of the authors and gatekeepers are likely to play some role in peer review outcomes.
Our results suggest that it is not only the form of peer review that matters, but also the composition of reviewers. Homophilous preferences in evaluation are a potential mechanism underpinning the Matthew Effect [1] in academia. This effect entrenches privileged groups while potentially limiting diversity, which could hinder scientific production, since diversity may lead to better working groups [115] and promote high-quality science [116,117]. Increasing gender and international representation among scientific gatekeepers may improve fairness and equity in peer review outcomes, and accelerate scientific progress. However, this must be carefully balanced to avoid overburdening scholars from minority groups with disproportionate service obligations.
Although some journals, such as eLife and Frontiers Media, have begun providing peer review data to researchers (see [44,118]), data on equity in peer review outcomes are currently available for only a small fraction of journals and funders. While many journals collect these data internally, the data are not usually standardized or shared publicly. One group, PEERE, authored a protocol for open sharing of peer review data [119,120], though this protocol is recent and the extent to which it will be adopted remains uncertain. To provide better benchmarks and to incentivize better practices, journals should make analyses of author and reviewer demographics publicly available. These data would include, but need not be limited to, characteristics such as gender, race, sexual orientation, seniority, and institution and country of affiliation. Privacy concerns and issues relating to confidentiality will likely limit the full availability of the data, but analyses that are sensitive to the vulnerabilities of smaller populations should be conducted and made available as benchmarking data.
Some high-profile journals, including Nature [121] and eNeuro [11], have experimented with double-blind peer review as a potential solution to inequities in publishing, though in some cases with mixed results [60]. In addition, journals are analyzing the demographics of their published authors and editorial staff in order to identify key problem areas, focus initiatives, and track progress toward diversity goals [13,83,89]. Alternatives to traditional peer review have also been proposed, including open peer review, study pre-registration, consultative peer review, and hybrid processes (e.g., [64,122–126]), as well as alternative forms of dissemination, such as preprint servers (e.g., arXiv, bioRxiv). Currently, there is little empirical evidence to determine whether these formats constitute less biased or more equitable alternatives [3].
More work should be done to study and understand the issues facing peer review and scientific gatekeeping in all its forms, and to promote fair, efficient, and meritocratic scientific cultures and practices. Editorial bodies should craft policies and implement practices that diminish disparities in peer review; they should also continue to be innovative and reflective about their practices to ensure that papers are accepted on scientific merit, rather than particularistic characteristics of the authors.
Competing interests
Wei Mun Chan and Andrew M. Collings are employed by eLife. Jennifer Raymond and Cassidy R. Sugimoto are Reviewing Editors at eLife. Andrew M. Collings was employed by PLOS between 2005 and 2012.
Acknowledgments
We are grateful for the editing and feedback provided by Susanna Richmond (Senior Editorial Assistant at eLife), Mark Patterson (Executive Director at eLife), Eve Marder, Anna Akhmanova, and Detlef Weigel (Deputy Editors at eLife). We are also grateful for the work of James Gilbert (Production Editor at eLife) for extracting the data used in this analysis. This work was partially supported by a grant from the National Science Foundation (SciSIP #1561299).
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.
- 115.
- 116.
- 117.
- 118.
- 119.
- 120.
- 121.
- 122.
- 123.
- 124.
- 125.
- 126.