Abstract
The fairness of scholarly peer review has been challenged by evidence of disparities in publication outcomes based on author demographic characteristics. To assess this, we conducted an exploratory analysis of peer review outcomes of 23,876 initial submissions and 7,192 full submissions that were submitted to the biosciences journal eLife between 2012 and 2017. Women and authors from nations outside of North America and Europe were underrepresented both as gatekeepers (editors and peer reviewers) and authors. We found evidence of a homophilic relationship between the demographics of the gatekeepers and authors and the outcome of peer review; that is, there were higher rates of acceptance in the case of gender and country homophily. The acceptance rate for manuscripts with male last authors was seven percent, or 3.5 percentage points, greater than for female last authors (95% CI = [0.5, 6.4]); this gender inequity was greatest, at nine percent or about 4.8 percentage points (95% CI = [0.3, 9.1]), when the team of reviewers was all male; this difference was smaller and not statistically significant for mixed-gender reviewer teams. Homogeny between countries of the gatekeeper and the corresponding author was also associated with higher acceptance rates for many countries. To test for the persistence of these effects after controlling for potentially confounding variables, we conducted a logistic regression including document and author metadata. Disparities in acceptance rates associated with gender and country of affiliation and the homophilic associations remained. We conclude with a discussion of mechanisms that could contribute to this effect, directions for future research, and policy implications. Code and anonymized data have been made available at https://github.com/murrayds/elife-analysis
Author summary
Peer review, the primary method by which scientific work is evaluated, is ideally a fair and equitable process in which scientific work is judged solely on its own merit. However, the integrity of peer review has been called into question based on evidence that outcomes often differ between male and female authors, and for authors in different countries. We investigated such disparities at the biosciences journal eLife by analyzing the demographics of authors and gatekeepers (editors and peer reviewers), and peer review outcomes of all submissions between 2012 and 2017. Outcomes were more favorable for male authors and those affiliated with institutions in North America and Europe; these groups were also over-represented among gatekeepers. There was evidence that peer review outcomes were influenced by homophily—a preference of gatekeepers for manuscripts from authors with shared characteristics. We discuss mechanisms that could contribute to this effect, directions for future research, and policy implications.
Introduction
Peer review is foundational to the development, gatekeeping, and dissemination of research, while also underpinning the professional hierarchies of academia. Normatively, peer review is expected to follow the ideal of “universalism” [1], whereby scholarship is judged solely on its intellectual merit. However, confidence in the extent to which peer review promotes the best scholarship has been eroded by questions about whether social biases [2], based on or correlated with the demographic characteristics of the scholar, could also influence outcomes of peer review [3–5]. This challenge to the integrity of peer review has prompted increased interest in assessing disparities and the potential influence of bias in peer review processes.
Several terms are often conflated in the discussion of bias in peer review. We use the term disparities to refer to unequal composition between groups, inequities to characterize unequal outcomes, and bias to refer to the degree of impartiality in judgment or lack thereof. Disparities and inequities have been widely studied in scientific publishing, most notably with regard to gender and country of affiliation. Globally, women account for only about 30 percent of scientific authorships [6] and are underrepresented even when compared to their numbers in the scientific workforce [7, 8]. Underrepresentation of articles authored by women is most pronounced in the most prestigious and high-profile scientific journals [9–14]. Similar disparities are observed across countries, with developed countries dominating the production of highly-cited publications [15, 16].
The under-representation of authors from certain groups may reflect differences in submission rates, or it may reflect differences in success rates during peer review (percent of submissions accepted), or both. Analyses of success rates have yielded mixed results in terms of the presence and magnitude of such inequities. Some analyses have found lower success rates for female-authored papers [17, 18] and grant applications [19, 20], while other studies have found no gender differences in review outcomes (for examples, see [21–25]). Inequities in journal success rates based on authors’ nationalities or country of affiliation have also been documented, with reports that authors from English-speaking and scientifically-advanced countries have higher success rates [26, 27]; however, other studies found no evidence that the language or country of affiliation of an author influences peer review outcomes [27–29]. These inconsistencies could be explained by several factors, such as the contextual characteristics of the studies (e.g., country, discipline) and variations in research design and sample size. Another possible explanation is that these gender and national disparities emerge from bias in peer review.
The possibility that bias contributes to inequities in scientific publishing and the nature of any such bias is highly controversial. Implicit bias—the macro-level social and cultural stereotypes that can subtly influence everyday interpersonal judgments and thereby produce and perpetuate status inequalities and hierarchies [30, 31]—has been suggested as a possible mechanism to explain differences in peer review outcomes based on socio-demographic and professional characteristics [3]. When faced with uncertainty—which is quite common in peer review—people often weight the social status and other ascriptive characteristics of others to help make decisions [32]. Hence, scholars are more likely to consider particularistic characteristics (e.g., gender, institutional prestige) of an author under conditions of uncertainty [33, 34], such as at the frontier of new scientific knowledge [35]. However, given the demographic stratification of scholars within institutions and across countries, it can be difficult to pinpoint the nature of a potential bias. For example, women are underrepresented in prestigious educational institutions [36–38], which conflates gender and prestige biases. These institutional differences can be compounded by gendered differences in age, professional seniority, research topic, and access to top mentors [39]. Another potential source of bias is what is dubbed cognitive particularism [40], whereby scholars harbor preferences for work and ideas similar to their own [41]. Evidence of this process has been reported in peer review in the reciprocity (i.e., correspondences between patterns of recommendations received by authors and patterns of recommendations given by reviewers in the same social group) between authors and reviewers of the same race and gender [42] (see also [43, 44]). Reciprocity can exacerbate or mitigate existing inequalities in science. If the work and ideas favored by gatekeepers are unevenly distributed across author demographics, this could be conducive to Matthew Effects [1], whereby scholars accrue cumulative advantages via a priori status privileges. Consistent with this, inclusion of more female reviewers was reported to attenuate biases that favored men in the awarding of R01 grants at the National Institutes of Health [18]. However, an inverse relationship was found in the evaluation of candidates for professorships [45]: when female evaluators were present, male evaluators became less favorable toward female candidates. Thus the nature and potential impact of cognitive biases during peer review are multiple and complex.
Another challenge is to disentangle the contribution of bias during peer review from factors external to the review process that could influence success rates. For example, there are gendered differences in access to funding, domestic responsibilities, and cultural expectations of career preferences and ability [46, 47] that may adversely impact manuscript preparation and submission. On the other hand, women have been found to hold themselves to higher standards [48] and be less likely to compete [49], hence they may self-select a higher quality of work for submission to prestigious journals. At the country level, disparities in peer review outcomes could reflect structural factors related to a nation’s scientific investment [15, 50], publication incentives [51, 52], local challenges [53], and research culture [54], all of which could influence the actual and perceived quality of submissions from different nations. There are also several intersectional issues: there are, for example, differences in the socio-demographic characteristics of the scientific workforce across countries—e.g., more women in some countries and disproportionately fewer professionally-senior women in others [6]. Because multiple factors external to the peer review process can influence peer review outcomes, unequal success rates for authors with particular characteristics do not necessarily reflect bias in the peer review process itself; conversely, equal success rates do not necessarily reflect a lack of bias.
Here, we assessed the extent to which bias contributes to gender and country disparities in peer review outcomes by analyzing how the magnitude of these disparities varies across different gender and country compositions of gatekeeper teams. In particular, we focused on the notion of homophily between the reviewers and authors. This analysis examined the outcomes of peer review at eLife—an open-access journal in the life and biomedical sciences. Peer review at eLife differs from other traditional forms of peer review used in the life sciences in that it is done through deliberation between reviewers (usually three in total) on an online platform. Previous studies have shown that deliberative scientific evaluation is influenced by social dynamics between evaluators [55, 56]. We examine how such social dynamics manifest in eLife’s deliberative peer review by assessing the extent to which the composition of reviewer teams correlates with peer review outcomes. Using all research papers (Research Articles, Short Reports, and Tools and Resources) submitted between 2012 and 2017 (n=23,876), we investigate the extent to which a relationship emerges between the gender and country of affiliation of authors (first, last, and corresponding) and gatekeepers (editors and invited peer reviewers), extending the approach used in previous work [2].
Consultative peer review and eLife
Founded in 2012 by the Howard Hughes Medical Institute (United States), the Max Planck Society (Germany), and the Wellcome Trust (United Kingdom), eLife is an open-access journal that publishes research in the life and biomedical sciences. Manuscripts submitted to eLife progress through several stages. In the first stage, the manuscript is assessed by a Senior Editor, who may confer with one or more Reviewing Editors and decide whether to reject the manuscript or encourage the authors to provide a full submission. When a full manuscript is submitted, the Reviewing Editor recruits a small number of peer reviewers (typically two or three) to write reports on the manuscript. The Reviewing Editor is encouraged to serve as one of the peer reviewers. When all individual reports have been submitted, both the Reviewing Editor and peer reviewers discuss the manuscript and their reports using a private online discussion system hosted by eLife. At this stage the identities of the Reviewing Editor and peer reviewers are known to one another. If the consensus of this group is to reject the manuscript, all the reports are usually sent to the authors. If the consensus is that the manuscript requires revision, the Reviewing Editor and additional peer reviewers agree on the essential points that need to be addressed before the paper can be accepted. In this case, a decision letter outlining these points is sent to the authors (the original reports are not usually released in their entirety to the authors). When a manuscript is accepted, the decision letter and the authors’ response are published along with the manuscript. The name of the Reviewing Editor is also published. Peer reviewers can also choose to have their name published. This process has been referred to as consultative peer review (see [57, 58] for a more in-depth description of the eLife peer-review process).
Data and methods
Data
Metadata for research papers submitted to eLife between its inception in 2012 and mid-September, 2017 (n=23,876) were provided to us by eLife for analysis. As such, these data were considered a convenience sample. Submissions fell into three main categories: 20,948 Research Articles (87.7 percent), 2,186 Short Reports (9.2 percent), and 742 Tools and Resources (3.1 percent). Not included in this total were six Scientific Correspondence articles, which were excluded because they followed a distinct and separate review process. Each record potentially listed four submissions—an initial submission, a full submission, and up to two revision submissions (though in some cases manuscripts remained in revision even after two revised submissions). Fig 1 depicts the flow of all 23,876 manuscripts through each review stage. The majority, 70.0 percent, of initial submissions for which a decision was made were rejected. Only 7,111 manuscripts were encouraged to submit a full submission. A total of 7,192 manuscripts were submitted as a full submission; this number was slightly larger than encouraged initial submissions due to appeals of initial decisions and other special circumstances. Most full submissions, 52.4 percent (n = 3,767), received a decision of revise, while 43.9 percent (n = 3,154) were rejected. A small number of full submissions (n = 54) were accepted without any revisions. On the date that data were collected (mid-September, 2017), a portion of initial submissions (n = 147) and full submissions (n = 602) remained in various stages of processing and deliberation (without final decisions). On average, full submissions that were ultimately accepted underwent 1.23 revisions and, within our dataset, 3,426 full submissions were eventually accepted to be published. A breakdown of the number of revisions requested before a final decision was made, by gender and country of affiliation of the last author, is provided in S1 Fig. For a portion of initial and full submissions (n = 619), the decision was appealed, causing some movement from decisions of “Reject” to decisions of “Accept” or “Revise”; counts of appeals by the gender of authors and gatekeepers are shown in S2 Fig.
The review process at eLife is highly selective, and it became more selective over time. Whereas eLife garnered only 307 submissions in 2012, it accrued 8,061 submissions in 2016. Fig 2 shows that while the total count of manuscripts submitted to eLife has rapidly increased since the journal’s inception, the count of encouraged initial submissions and accepted full submissions has grown more slowly. The encourage rate (percentage of initial submissions encouraged to submit full manuscripts) was 44.6 percent in 2012, and dropped to 26.6 percent in 2016. The acceptance rate (the percentage of accepted full submissions) was 62.4 percent in 2012 and decreased to 53.0 percent in 2016. The overall acceptance rate (percentage of initial submissions eventually accepted) began at 27.0 percent in 2012 and decreased to 14.0 percent by 2016.
In addition to authorship data, we obtained information about the gatekeepers involved in the processing of each submission. We defined gatekeepers as any Senior Editor or Reviewing Editor at eLife or invited peer reviewer involved in the review of at least one initial or full submission between 2012 and mid-September 2017. Gatekeepers at eLife often served in multiple roles; for example, acting as both a Reviewing Editor and peer reviewer on a given manuscript, or serving as a Senior Editor on one manuscript but as an invited peer reviewer on another. In our sample, the Reviewing Editor was listed as a peer reviewer for 58.9 percent of full submissions. For initial submissions, we had data on only the corresponding author of the manuscript and the Senior Editor tasked with making the decision. For full submissions we had data on the corresponding author, first author, last author, Senior Editor, Reviewing Editor, and members of the team of invited peer reviewers. Data for each individual included their stated name, institutional affiliation, and country of affiliation. A small number of submissions were removed, such as those that had a first but no last author (reflecting a compromised data record—even a single-authored manuscript should duplicate authors across all roles) and those that did not have a valid submission type. Country names were manually disambiguated (for example, by normalizing names such as “USA” to “United States” and “Viet Nam” to “Vietnam”). To simplify continent-level comparisons, we also excluded one submission for which the corresponding author listed their affiliation as Antarctica.
Full submissions included 6,669 distinct gatekeepers, 5,694 distinct corresponding authors, 6,691 distinct first authors, and 5,581 distinct last authors. Authors were also likely to appear on multiple manuscripts and may have held a different authorship role in each: whereas our data included 17,966 distinct combinations of author name and role, this number comprised only 12,059 distinct authors. For 26.5 percent of full submissions the corresponding author was also the first author, whereas for 71.2 percent of submissions the corresponding author was the last author. We did not have access to the full authorship list that included middle authors. Note that in the biosciences, the last author is typically the most senior researcher involved [59] and responsible for more conceptual work, whereas the first author is typically less senior and performs more of the scientific labor (such as lab work, analysis, etc.) to produce the study [60–62].
Gender assignment
Gender variables for authors and gatekeepers were coded using an updated version of the algorithm developed in [6]. This algorithm used a combination of the first name and country of affiliation to assign each author’s gender on the basis of several universal and country-specific name-gender lists (e.g., the United States Census). These name lists were complemented with an algorithm that searched Wikipedia for pronouns associated with names.
We validated this new list by applying it to a dataset of names with known gender. We used data collected from RateMyProfessor.com, a website containing anonymous student-submitted ratings and comments for professors, lecturers, and teachers at universities in the United States, United Kingdom, and Canada. We limited the dataset to only individuals with at least five comments, and counted the total number of gendered pronouns that appeared in the text of these comments; if the total of one gendered-pronoun type was at least the square of the other, then we assigned the gender of the majority pronoun to the individual. To compare with pronoun-based assignment, we assigned gender using the previously detailed first-name based algorithm. In total, there were 384,127 profiles on RateMyProfessor.com that had at least five comments and for whom pronouns indicated a gender. Our first name-based algorithm assigned a gender of male or female to 91.26 percent of these profiles. The raw match-rate between these two assignments was 88.6 percent. Of those that were assigned a gender, our first name-based assignment matched the pronoun assignment in 97.1 percent of cases, and for 90.3 percent of distinct first names. While RateMyProfessor.com and the authors submitting to eLife represent different populations (RateMyProfessor.com being biased towards teachers in the United States, United Kingdom, and Canada), the results of this validation provide some credibility to the first-name based gender assignment used here.
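For illustration, the pronoun rule described above can be written as a short R function; this is a minimal sketch rather than the validation code itself, and the strict-majority tie-break is our own assumption.

```r
# Assign a gender from pronoun counts in a profile's comments.
# Rule from the text: assign the majority pronoun's gender only if its count
# is at least the square of the other count; otherwise leave unassigned.
# Requiring a strict majority to break ties is an assumption of this sketch.
assign_gender_from_pronouns <- function(n_male, n_female) {
  if (n_male > n_female && n_male >= n_female^2) {
    "male"
  } else if (n_female > n_male && n_female >= n_male^2) {
    "female"
  } else {
    NA_character_
  }
}

assign_gender_from_pronouns(9, 2)  # "male": 9 >= 2^2
assign_gender_from_pronouns(5, 3)  # NA: 5 < 3^2, neither count dominates
```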
We also attempted to manually identify gender for all Senior Editors, Reviewing Editors, invited peer reviewers, and last authors for whom our algorithm did not assign a gender. We used Google to search for their name and institutional affiliation, and inspected the resulting photos and text in order to make a subjective judgment as to whether they were presenting as male or female.
Through the combination of manual efforts and our first-name based gender-assignment algorithm, we assigned a gender of male or female to 95.5 percent (n = 35,511) of the 37,198 name/role combinations that appeared in our dataset. A gender of female was assigned to 26.7 percent (n = 9,910) and a gender of male to 68.8 percent (n = 25,601), while a gender could not be assigned for the remaining 4.5 percent (n = 1,687). This gender distribution roughly matches the gender distribution observed globally across scientific publications [6]. A breakdown of these gender demographics by role can be found in S1 Table and S2 Table.
Gender composition of reviewers
To assess the relationship between author-gatekeeper gender homogeny and review outcomes, we analyzed the gender composition of the gatekeepers and authors of full submissions. Each manuscript was assigned a reviewer composition category of all-male, all-female, mixed, or uncertain. Reviewer teams labeled all-male and all-female were teams for which we could identify a gender for every member, and for which all genders were identified as either male or female, respectively. Teams labeled as mixed were those teams for which we could identify a gender for at least two members, and which had at least one male and at least one female peer reviewer. Teams labeled as uncertain were those teams for which we could not assign a gender to every member and which were not mixed. A full submission was typically reviewed by two to three peer reviewers, which may or may not have explicitly included the Reviewing Editor. However, the Reviewing Editor was always involved in the review process of a manuscript, and so we always considered the Reviewing Editor as a member of the reviewing team. Of 7,192 full submissions, a final decision of accept or reject was given for 6,590 during the dates analyzed; of these, 47.7 percent (n = 3,144) were reviewed by all-male teams, 1.4 percent (n = 93) by all-female teams, and 50.8 percent (n = 3,347) by mixed-gender teams; the remaining six manuscripts had reviewer teams classified as uncertain and were excluded from further analysis.
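A minimal R sketch of this classification rule follows; the function and its input format are illustrative, not the code used for the analysis.

```r
# Classify a reviewer team (Reviewing Editor plus peer reviewers) by gender
# composition; `genders` holds "male", "female", or NA for each member.
classify_team <- function(genders) {
  n_male    <- sum(genders == "male", na.rm = TRUE)
  n_female  <- sum(genders == "female", na.rm = TRUE)
  n_unknown <- sum(is.na(genders))
  if (n_male >= 1 && n_female >= 1) {
    "mixed"        # at least one identified man and one identified woman
  } else if (n_unknown == 0 && n_male > 0 && n_female == 0) {
    "all-male"     # every member identified, all male
  } else if (n_unknown == 0 && n_female > 0 && n_male == 0) {
    "all-female"   # every member identified, all female
  } else {
    "uncertain"    # some members unidentified and the team is not mixed
  }
}

classify_team(c("male", "female", NA))    # "mixed"
classify_team(c("male", "male", "male"))  # "all-male"
classify_team(c("female", NA))            # "uncertain"
```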
Institutional prestige
Institutional names for each author were entered manually by the submitting authors and were thus highly idiosyncratic. Many institutions appeared with multiple name variants (e.g., “UCLA”, “University of California, Los Angeles”, and “UC at Los Angeles”). In total, there were nearly 8,000 unique strings in the affiliation field. We performed several pre-processing steps on these names, including converting characters to lower case, removing stop words, removing punctuation, and reducing common words to abbreviated alternatives (e.g., “university” to “univ”). We used fuzzy-string matching with the Jaro-Winkler distance measure [63] to match institutional affiliations from eLife to institutional rankings in the 2016 Times Higher Education World Rankings. A match was established for 15,641 corresponding authors of initial submissions (around 66 percent). The match rate for last authors was higher: 5,118 (79 percent) were matched.
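The following R sketch illustrates this kind of normalization and Jaro-Winkler matching using the stringdist package; the similarity threshold, the omission of stop-word removal, and the column names are assumptions made for illustration, and the actual matching code may differ (see the repository).

```r
library(stringdist)

# Normalize an affiliation string before matching
normalize_affil <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)           # strip punctuation
  x <- gsub("\\buniversity\\b", "univ", x)   # abbreviate common words
  gsub("\\s+", " ", trimws(x))               # collapse whitespace
}

# Match each eLife affiliation to its closest Times Higher Education (THE)
# institution name; keep the match only if the Jaro-Winkler similarity
# clears a threshold (0.90 here is illustrative).
match_affiliation <- function(affils, the_names, threshold = 0.90) {
  sim <- 1 - stringdistmatrix(normalize_affil(affils),
                              normalize_affil(the_names),
                              method = "jw", p = 0.1)
  best     <- max.col(sim)                          # best THE candidate per affiliation
  best_sim <- sim[cbind(seq_along(affils), best)]
  data.frame(affiliation = affils,
             matched     = ifelse(best_sim >= threshold, the_names[best], NA_character_),
             similarity  = best_sim)
}

match_affiliation(c("University of Cambridge, UK", "MIT"),
                  c("University of Cambridge", "Massachusetts Institute of Technology"))
```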
Institutions were classed into two levels of prestige: “top” institutions were those within the top 50 universities as ranked by the global Times Higher Education rankings. Institutions that ranked below the top 50, that were otherwise unranked, or that were not matched to a Times Higher Education ranking were labeled as “non-top”. One limitation of the Times Higher Education ranking as a proxy for institutional prestige is that these rankings cover only universities, excluding many prestigious research institutes. To mitigate this limitation, we mapped a small number of well-known and prestigious biomedical research institutes to the “top” category, including the Max Planck Institutes, the National Institutes of Health, the UT Southwestern Medical Center, the Memorial Sloan Kettering Cancer Center, the Ragon Institute, and the Broad Institute.
Geographic distance
Latitude and longitude of country centroids were taken from Harvard WorldMap [64]; country names in the eLife and Harvard WorldMap datasets were manually disambiguated and then mapped to the country of affiliation listed for each author from eLife (for example, “Czech Republic” from the eLife data was mapped to “Czech Rep.” in the Harvard WorldMap data). For each initial submission, we calculated the geographic distance between the centroids of the countries of the corresponding author and Senior Editor; we call this the corresponding author-editor geographic distance. For each full submission, we calculated the sum of the geographic distances between the centroid of the last author’s country and the country of each of the reviewers. All distances were calculated in thousands of kilometers; we call this the last author-reviewers geographic distance.
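As an illustration, the distance between two country centroids can be computed in R as below; the use of the haversine (great-circle) formula and the example centroid coordinates are assumptions, since the exact distance computation is not specified here.

```r
# Great-circle (haversine) distance between two points given in decimal
# degrees, returned in thousands of kilometers.
centroid_distance <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- function(deg) deg * pi / 180
  dlat <- to_rad(lat2 - lat1)
  dlon <- to_rad(lon2 - lon1)
  a <- sin(dlat / 2)^2 +
    cos(to_rad(lat1)) * cos(to_rad(lat2)) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(pmin(1, a))) / 1000
}

# Approximate centroids for the United States and Denmark give a distance
# of roughly 7.4-7.5 thousand kilometers.
centroid_distance(39.8, -98.6, 56.0, 10.0)
```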
Analysis
We conducted a series of χ2 tests of equal proportion as well as several logistic regression models in order to assess the likelihood that an initial submission is encouraged and that a full submission is accepted, as a function of author and gatekeeper characteristics. We supply p-values and confidence intervals as a tool for interpretation; we generally maintain the convention of 0.05 as the threshold for statistical significance, though we also report and interpret values just outside of this range. When visualizing proportions, 95% confidence intervals are calculated as p ± 1.96√(p(1 − p)/n), where p is the proportion and n is the number of observations in the group. When conducting χ2 tests comparing groups based on gender, we excluded submissions for which no gender could be identified. When conducting tests for gender and country homogeny, we report 95% confidence intervals of the difference in proportions—we do not report confidence intervals for tests involving more than two groups. Odds ratios and associated 95% confidence intervals are reported for logistic regression models. Data processing, statistical testing, and visualization were performed using R version 3.4.2 and RStudio version 1.1.383.
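For example, a two-group χ2 test of equal proportions and the single-proportion confidence interval described above can be computed in base R as follows; the counts are hypothetical and chosen only to illustrate the procedure.

```r
# Hypothetical accepted/total counts for two author groups
accepted <- c(male = 1660, female = 680)
total    <- c(male = 3100, female = 1360)

# Chi-squared test of equal proportions with a 95% CI of the difference
prop.test(accepted, total)

# 95% CI for a single proportion: p +/- 1.96 * sqrt(p * (1 - p) / n)
p <- accepted["male"] / total["male"]
n <- total["male"]
p + c(-1, 1) * 1.96 * sqrt(p * (1 - p) / n)
```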
Having conducted an exploratory analysis of gender and country inequities in peer review with this univariate approach, we built a series of logistic regression models to investigate whether these differences could be explained by other factors. In each model, we used the submission’s outcome as the response variable, whether that be encouragement (for initial submissions) or acceptance (for full submissions). For both initial and full submissions, we added control variables for the year of submission (measured from 0 to 5, representing 2012 to 2017, accordingly), the type of the submission (Research Article, Short Report, or Tools and Resources), and the institutional prestige of the author (top vs non-top). For full submissions, we also controlled for the gender of the first author. Mirroring the univariate analysis, we constructed two sets of models. The first set of models investigates the extent of peer review inequities based on author characteristics. We considered predictor variables for the gender and continent of affiliation of the corresponding author (for initial submissions), and the last author (for full submissions). For the second set of models, we investigated whether these inequities differed based on gender or country homogeny between the author and the reviewer or editor. In addition to variables from the first model, we considered several approaches to capture the effect of gender-homogeny between the author and reviewers on peer review inequity (see below). We also included variables for the corresponding author-editor geographic distance (for initial submissions), and last author-reviewers geographic distance (for full submission), and a dummy variable indicating whether this distance was zero; these variables serve as proxies for the degree of country homogeny between the author and the editor or reviewers. There were a small number of Senior Editors in our data—in order to protect their identity we did not include their gender or specific continent of affiliation in any models; we maintained a variable for corresponding author-editor geographic distance.
Several approaches were considered for modeling the relationship between equity in peer review and the composition of the reviewer team using logistic regression. Two approaches—modeling equity using simple interaction terms and using a two-model approach—were considered but ultimately excluded due to methodological and interpretive constraints (see S1 Text and S2 Text for more discussion of these models and their results). A third approach modeled equity across groups as a categorical variable consisting of all six combinations of last author gender (male, female) and reviewer team composition (all-male, all-female, mixed); this approach provides a more interpretable means of testing the extent to which gender equity in success rates was related to the interaction between author and reviewer team demographics, and it was the focus of our analysis.
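A minimal sketch of this third approach in R is shown below; all column names and the reference level are hypothetical, and the actual model specification is in the repository linked above.

```r
# Combine last author gender and reviewer team composition into a single
# six-level factor, then fit a logistic regression on acceptance.
full <- full_submissions  # hypothetical data frame of full submissions
full$author_team <- interaction(factor(full$last_author_gender),
                                factor(full$team_composition),
                                drop = TRUE)
full$author_team <- relevel(full$author_team, ref = "male.mixed")  # assumed reference level

model <- glm(accepted ~ author_team + submission_year + submission_type +
               top_institution + first_author_gender +
               reviewer_distance + zero_distance,
             data = full, family = binomial(link = "logit"))

# Report odds ratios with Wald 95% confidence intervals
exp(cbind(OR = coef(model), confint.default(model)))
```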
Results
Gatekeeper representation
We first analyzed whether the gender and country of affiliation of the population of gatekeepers at eLife were similar to those of the authors of initial and full submissions. The population of gatekeepers was composed primarily of invited peer reviewers, as there were far fewer Senior and Reviewing Editors. A gender and country breakdown by gatekeeper type is provided in S2 Table and S3 Table.
Fig 3 illustrates the gender and country demographics of authors and gatekeepers. The population of gatekeepers at eLife was largely male. Only 21.6 percent (n = 1,440) of gatekeepers were identified as female, compared with 26.6 percent (n = 4,857) of corresponding authors (includes authors of initial submissions), 33.9 percent (n = 2,272) of first authors, and 24.0 percent (n = 1,341) of last authors. For initial submissions, we observed a strong difference between the gender composition of gatekeepers and corresponding authors, χ2(df = 1, n = 17,119) = 453.9, p ≤ 0.00001. The same held for full submissions, with a strong difference for first authorship, χ2(df = 1, n = 6,153) = 844.4, p ≤ 0.0001; corresponding authorship, χ2(df = 1, n = 6,647) = 330.04, p ≤ 0.0001; and last authorship, χ2(df = 1, n = 5,292) = 17.7, p ≤ 0.00003. Thus, the gender composition of gatekeepers at eLife was male-skewed in comparison to the authorship profile.
The population of gatekeepers at eLife was heavily dominated by those from North America, who constituted 59.9 percent (n = 3,992) of the total. Gatekeepers from Europe were the next most represented, constituting 32.4 percent (n = 2,162), followed by Asia with 5.7 percent (n = 378). Individuals from South America, Africa, and Oceania each made up less than two percent of the population of gatekeepers. As with gender, we observed differences between the country composition of gatekeepers and that of the authors. Gatekeepers from North America were over-represented whereas gatekeepers from Asia and Europe were under-represented for all authorship roles. For initial submissions, there was a significant difference in the distribution of corresponding authors compared to gatekeepers, χ2(df = 5, n = 18,195) = 6738.5, p ≤ 0.00001. The same held for full submissions, with a significant difference for first authors, χ2(df = 5, n = 6,674) = 473.3, p ≤ 0.00001; corresponding authors, χ2(df = 5, n = 6,669) = 330.04, p ≤ 0.00001; and last authors, χ2(df = 5, n = 5,595) = 417.2, p ≤ 0.0001. The international representation of gatekeepers was most similar to first and last authorship (full submissions), and least similar to corresponding authorship (initial submissions) due to country-level differences in acceptance rates (see Fig 4). We also note that the geographic composition of submissions to eLife has changed over time, attracting more submissions from authors in Asia in later years of analysis (see S4 Fig).
Peer review success rates by author gender and country of affiliation
Male authorship dominated eLife submissions: men accounted for 76.9 percent (n = 5,529) of gender-identified last authorships and 70.7 percent (n = 5,083) of gender-identified corresponding authorships of full submissions (see S3 Fig). First authorship of full submissions was closest to gender parity, although still skewed towards male authorship at 58.1 percent (n = 4,179).
We observed a gender inequity favoring men in the outcomes of each stage of the review process. The percentage of initial submissions encouraged was 2.1 percentage points higher for male corresponding authors than for female corresponding authors—30.83 versus 28.75 percent, χ2(df = 1, n = 22,319) = 8.95, 95% CI = [0.7, 3.4], p = 0.0028 (see S3 Fig). Likewise, the percentage of full submissions accepted was higher for male corresponding authors—53.7 versus 50.8 percent, χ2(df = 1, n = 6,188) = 3.95, 95% CI = [0.03, 5.8], p = 0.047. The gender inequity at each stage of the review process yielded higher overall acceptance rates (the percentage of initial submissions eventually accepted) for male corresponding authors (15.6 percent) compared with female corresponding authors (13.8 percent), χ2(df = 1, n = 21,670) = 10.96, 95% CI = [0.8, 2.9], p = 0.0009, for a male:female success ratio of 1.13 to 1.
Gender disparity was only apparent in the senior authorship roles. Fig 4.A shows the gendered acceptance rates of full submissions for corresponding, first, and last authors. There was little to no relationship between the gender of the first author and the percentage of full submissions accepted, χ2(df = 1, n = 5,971) = 0.34, 95% CI = [−1.8, 3.5], p = 0.56. There was, however, a significant gender inequity in full submission outcomes for last authors, as also observed for corresponding authors—the acceptance rate of full submissions was 3.5 percentage points higher for male as compared to female last authors—53.5 versus 50.0 percent, χ2(df = 1, n = 6,505) = 5.55, 95% CI = [0.5, 6.4], p = 0.018.
Fig 4.B shows the proportion of manuscripts submitted, encouraged, and accepted to eLife from corresponding authors originating from the eight most prolific countries (in terms of initial submissions). Manuscripts with corresponding authors from these eight countries accounted for a total of 73.9 percent of all initial submissions, 81.2 percent of all full submissions, and 86.5 percent of all accepted publications. Many countries were underrepresented in full and accepted manuscripts compared to their submissions. For example, whereas papers with Chinese corresponding authors accounted for 6.9 percent of initial submissions, they comprised only 3.0 percent of full and 2.4 percent of accepted submissions. The only countries that were over-represented—making up a greater portion of full and accepted submissions than expected given their initial submissions—were the United States, United Kingdom, and Germany. In particular, corresponding authors from the United States made up 35.8 percent of initial submissions, yet constituted 48.5 percent of full submissions and the majority (54.9 percent) of accepted submissions.
Each stage of review contributed to the disparity of country representation between initial, full, and accepted submissions, with manuscripts from the United States, United Kingdom, and Germany more often encouraged as initial submissions and accepted as full submissions. Fig 4.C shows that initial submissions with a corresponding author from the United States were the most likely to be encouraged (39.2 percent), followed by the United Kingdom (31.7 percent) and Germany (29.3 percent). By contrast, manuscripts with corresponding authors from Japan, Spain, and China were comparatively less likely to be encouraged (21.4, 16.7, and 12.6 percent, respectively). These differences narrowed somewhat for full submissions: the acceptance rate for full submissions with corresponding authors from the U.S. remained the highest (57.6 percent), but it was closer to the rates for other countries, such as the United Kingdom, Germany, and France, than was the case for encourage rates.
There were gendered differences in submissions by country of affiliation (S5 Fig), but there were insufficient data to test whether gender and country of affiliation interacted to affect the probability of acceptance.
The gender and country inequities evident in the univariate analyses were subsequently affirmed using a logistic regression model that controlled for a number of potential confounds (see Fig 5). We modeled peer review outcomes based not only on the gender and affiliated continent of the corresponding author (initial submissions) and last author (full submissions), but also on the prestige of the author’s institution, the year in which the manuscript was submitted, and the submission type (Research Article, Short Report, or Tools and Resources). For full submissions, we also controlled for the gender of the first author. The results of this regression for initial and full submissions are shown in Fig 5.
For both initial and full submissions, the prestige of the author’s institution was the strongest predictor of a positive peer review outcome (initial: β = 1.726, 95% CI = [1.663, 1.789], p ≤ 0.0001; full: β = 1.379, 95% CI = [1.272, 1.486], p ≤ 0.0001). A more recent year of submission was associated with lower odds of acceptance (initial: β = 0.918, 95% CI = [0.894, 0.942], p ≤ 0.0001; full: β = 0.888, 95% CI = [0.847, 0.929], p ≤ 0.0001), reflecting the increasing selectivity of eLife (see Fig 2). Compared to Research Articles, both Short Reports (β = 0.742, 95% CI = [0.638, 0.847], p ≤ 0.0001) and Tools and Resources (β = 0.740, 95% CI = [0.567, 0.913], p ≤ 0.0001) were less likely to have a positive review outcome at the initial submission stage.
Even when controlling for these variables, there were still inequities by the gender and country of affiliation of the author, affirming trends illustrated in Fig 4. Initial submissions with a male corresponding author were associated with a 1.12 times increased odds of being encouraged (95% CI = [1.048, 1.182], p = 0.0014), and full submissions with a male last author with a 1.14 times increased odds of being accepted (95% CI = [1.03, 1.26], p = 0.025), compared to submissions with female corresponding or last authors. In contrast, for the first author position, there was no significant difference in outcomes by gender. The logistic regression also provided evidence for the presence of geographic inequities, with lower odds of success for submissions with authors outside of North America. Compared to submissions with a corresponding author from North America, an initial submission with a corresponding author from Europe was 0.68 times as likely to be encouraged (95% CI = [0.3236, 0.783], p ≤ 0.0001), and one with a corresponding author from Oceania was 0.56 times as likely to be encouraged (95% CI = [0.34, 0.78], p ≤ 0.0001), followed by corresponding authors from Africa (β = 0.53, 95% CI = [−0.18, 1.088], p = 0.027), Asia (β = 0.40, 95% CI = [0.30, 0.49], p ≤ 0.0001), and South America (β = 0.21, 95% CI = [−0.269, 0.679], p ≤ 0.0001). Geographic disparities were also present, although less pronounced, for full submissions, with significantly lower odds of acceptance for submissions with a last author from Europe (β = 0.86, 95% CI = [0.75, 0.97], p = 0.008) or Asia (β = 0.59, 95% CI = [0.41, 0.76], p ≤ 0.0001) compared with North America.
Peer review outcomes by author-gatekeeper homogeny
The higher acceptance rates for male authors manifested largely in instances when the reviewer team was all male (Fig 6). When all reviewers were male, the acceptance rate of full submissions was about 4.7 percentage points higher for male compared to female last authors (χ2(df = 1, n = 3,110) = 4.48, 95% CI = [0.3, 9.1], p = 0.034) and about 4.4 points higher for male compared to female corresponding authors (S6 Fig; χ2(df = 1, n = 2,974) = 3.97, 95% CI = [0.1, 8.7], p = 0.046). For mixed-gender reviewer teams, the disparity in author success rates by gender was smaller and not statistically significant. All-female reviewer teams were too rare to draw firm conclusions (only 81 of 6,509 processed full submissions), but in the few cases of all-female reviewer teams, there was a higher acceptance rate for female last, corresponding, and first authors that did not reach statistical significance. There was no significant relationship between first authorship gender and acceptance rates, regardless of the gender composition of the reviewer team. In sum, greater parity in outcomes was observed when gatekeeper teams contained both men and women. Notably, the acceptance rate for female authors was not lower for all-male reviewer teams compared with mixed reviewer teams; rather, the gender disparity arose from a higher acceptance rate for submissions from male authors when they were reviewed by a team of all-male reviewers. We refer to this favoring by reviewers of authors who share their gender as homophily.
Homophily was also evident in the relationship between peer review outcomes and the presence of country homogeny between the last author and reviewer. We defined last author-reviewer country homogeny as a condition for which at least one member of the reviewer team (Reviewing Editor and peer reviewers) listed the same country of affiliation as the last author. We only considered the country of affiliation of the last author, since it was the same as that of the first and corresponding author for 98.4 and 94.9 percent of full submissions, respectively. Outside of the United States, the presence of country homogeny during review was rare. Whereas 88.4 percent of full submissions with last authors from the U.S. were reviewed by at least one gatekeeper from their country, country homogeny was present for only 29.3 percent of full submissions with last authors from the United Kingdom and 26.2 percent of those with a last author from Germany. The incidence of reviewer homogeny fell sharply for Japan and China which had geographic homogeny for only 10.3 and 9.9 percent of full submissions, respectively. More extensive details on the rate of author/reviewer homogeny for each country can be found in S5 Table.
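This definition amounts to a simple indicator, sketched here in R; the data layout is hypothetical.

```r
# TRUE if at least one member of the reviewer team (Reviewing Editor plus
# peer reviewers) lists the same country of affiliation as the last author.
has_country_homogeny <- function(last_author_country, reviewer_countries) {
  any(reviewer_countries == last_author_country, na.rm = TRUE)
}

has_country_homogeny("United States", c("United States", "Germany"))  # TRUE
has_country_homogeny("China", c("United States", "United Kingdom"))   # FALSE
```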
Last author-reviewer country homogeny tended to result in the favoring of submissions from authors of the same country as the reviewer. We first pooled together all authors from all countries (n = 6,508 for which there was a full submission and a final decision), and found that the presence of homogeny during review was associated with a 10.0 percentage point higher acceptance rate (Fig 6.B; χ2(df = 1, n = 6,508) = 65.07, 95% CI = [7.58, 12.47], p ≤ 0.00001). However, most cases of homogeny occurred for authors from the United States, so this result could potentially reflect the higher acceptance rate for these authors (see Fig 4), rather than homophily overall. Therefore we repeated the test, excluding all full submissions with last authors from the United States, and we again found a significant, though weaker, homophilic effect, χ2(df = 1, n = 3,236) = 4.74, 95% CI = [0.52, 10.1], p = 0.029. We repeated this procedure again, excluding authors from both the United States and United Kingdom (the two nations with the highest acceptance rates, see Fig 4), and we identified no homophilic effect, χ2(df = 1, n = 1,920) = 0.016, 95% CI = [−4.6, 7.7], p = 0.65. Thus, the effects of last author-reviewer country homophily were largely driven by the United States and United Kingdom.
For authors from outside the United States, not only was the presence of author-reviewer country homogeny rare, but the tendency for a homophilic effect on peer review outcomes appeared to vary, depending on the country. Fig 6.C shows acceptance rates for last authors affiliated with institutions in the eight most prolific nations submitting to eLife. For the United States, the presence of homogeny was associated with a 6.9 percentage point higher likelihood of acceptance compared to no homogeny, χ2(df = 1, n = 3,270) = 6.25, 95% CI = [1.4, 12.4], p = 0.0124. Similarly, papers from the United Kingdom were 8.0 percentage points more likely to be accepted if there was last author-reviewer homogeny, χ2(df = 1, n = 739) = 3.65, 95% CI = [−0.1, 16.2], p = 0.056. In contrast, submissions with last authors from France were 23 percentage points less likely to be accepted if there was country homogeny, χ2(df = 1, n = 204) = 4.34, 95% CI = [−42.8, −3.4], p = 0.037. There was a similar, though non-significant, effect for Canada and Switzerland (also partly French-speaking countries). Due to the rarity of country homogeny outside of the U.S., more data are needed to draw firm conclusions on a per-country basis.
To further assess the contribution of author-reviewer homogeny to inequity in peer review outcomes, we extended the logistic regression approach shown in Fig 5. For full submissions, we compared two logistic regression models, one that considered author-reviewer geographic homogeny but only main effects of reviewer team gender composition (Fig 7.A) and one that included terms to model the effects of author-reviewer geographic and gender homogeny (Fig 7.B). To model the extent to which gender equity differed based on the gender composition of the reviewer team, we modeled interactions using a variable combining factor levels for last author gender and reviewer team composition (Fig 7.B). To model the degree of country homogeny between the author and the reviewers, we included in the model the last author-reviewers geographic distance, defined as the sum of the geographic distances between the centroid of the last author’s country and the country of each of the peer reviewers. All distances were calculated in thousands of kilometers; for example, the geographic distance between the United States and Denmark is 7.53 thousand kilometers. We included a dummy variable indicating whether the distance was zero. A similar analysis was performed to assess the effect of author-editor homogeny on the outcomes of initial submissions (S8 Table); this excludes any analysis of homophily between the author and Senior Editor in order to protect the identity of the small number of Senior Editors.
Fig 7.A shows that there were similar main effects of author gender and country, in terms of direction and magnitude, as in Fig 5.B. Even after controlling for reviewer team composition, a full submission with a male last author was 1.14 times more likely to be accepted than a submission with a female last author (95% CI = [1.020, 1.256], p = 0.032). In addition, there were inequities based on author continent of affiliation, although smaller than in Fig 5.B. Affiliation within Asia was associated with a 0.779 times reduced odds of acceptance compared to North America (95% CI = [0.565, 0.992], p = 0.022)—a smaller effect size than the 0.585 times reduced odds observed in Fig 5.B. Submissions with a last author from Oceania were associated with a 1.494 times increased odds of acceptance compared to North America, though with wide confidence intervals (95% CI = [1.020, 1.968], p = 0.097); this diverges from the non-significant negative effect observed in Fig 5. The effects of the control variables—submission year, submission type, author institutional prestige, and first author gender—were also similar to those in Fig 5.
The extended model in Fig 7.A revealed a main effect of reviewer team gender composition. Compared to mixed-gender reviewer teams, submissions reviewed by all-male reviewers were 1.15 times more likely to be accepted (95% CI = [1.051, 1.252], p = 0.0059); there was no significant difference between all-female and mixed-gender teams. In addition, this model revealed an influence of author-reviewer geographic homogeny. Every 1,000 km of last author-reviewers geographic distance was associated with a 0.988 times lower odds of acceptance (95% CI = [0.982, 0.994], p ≤ 0.0001). This negative effect of last author-reviewers geographic distance provides additional evidence for the observations from Fig 6—that homogeny between the author and reviewers was associated with a greater odds of acceptance, even when controlling for the continent of affiliation of the author and other characteristics of the author and submission. A last author-reviewers geographic distance of zero (indicating that all reviewers were from the same country as the last author) was not associated with a strong effect beyond that predicted by distance.
Finally, we modeled interactions between last author gender and reviewer-team composition by combining them into a single categorical variable containing all six combinations of factor levels (Fig 7.B). Full submissions with a male last author that were reviewed by a team of all-male reviewers were associated with a 1.22 times higher odds of being accepted than full submissions with a female last author that were reviewed by an all-male team (95% CI = [1.044, 1.40], p = 0.027). No significant differences were observed for other combinations of author gender and reviewer gender composition. The absolute difference in parameter estimates between male and female authors among mixed-gender teams (0.084) was less than half that of all-male reviewer teams (0.198), suggesting greater equity among submissions reviewed by mixed-gender teams than by all-male teams. Taken together, these findings suggest that gender inequity in peer review outcomes tended to be smaller for mixed-gender reviewer teams, even when controlling for many potentially confounding factors. These results provide evidence affirming observations from the univariate analysis in Fig 6.
Discussion
We identified inequities in peer review outcomes at eLife, based on the gender and country of affiliation of the senior (last and corresponding) authors. Acceptance rates were higher for male than female last authors. In addition, submissions from developed countries with high scientific capacities tended to have higher success rates than others. These inequities in peer review outcomes could be attributed, at least in part, to a favorable interaction between gatekeeper and author demographics under the conditions of gender or country homogeny; we describe this favoring as homophily, a preference based on shared characteristics: gatekeepers were more likely to recommend a manuscript for acceptance if they shared demographic characteristics with the authors. In particular, manuscripts with male (last or corresponding) authors had a significantly higher chance of acceptance than those with female (last or corresponding) authors when reviewed by an all-male review team. Similarly, manuscripts tended to be accepted more often when at least one of the reviewers was from the same country as the corresponding author (for initial submissions) or the last author (for full submissions), though there may be exceptions on a per-country basis (such as France and Canada). We followed our univariate analysis with a regression analysis, and observed evidence that these inequities persisted even when controlling for potentially confounding variables. The differential outcomes on the basis of author-reviewer homogeny are consistent with the notion that peer review at eLife is influenced by some form of bias—be it implicit bias [3, 17], geographic or linguistic bias [26, 65, 66], or cognitive particularism [40]. Specifically, homophilic interaction suggests that peer review outcomes may sometimes be associated with factors other than the intrinsic quality of a manuscript, such as the composition of the review team.
The opportunity for homophilous interactions is determined by the demographics of the gatekeeper pool, and the demographics of the gatekeepers differed significantly from those of the authors, even for last authors, who tend to be more senior [59–62]. The underrepresentation of women at eLife mirrors global trends—women comprise a minority of total authorships, yet constitute an even smaller proportion of gatekeepers across many domains [14, 67–74]. Similarly, gatekeepers at eLife were less geographically diverse than their authorship, reflecting the general underrepresentation of the “global south” in leadership positions of international journals [75].
The demographics of the reviewer pool made certain authors more likely to benefit from homophily in the review process than others. Male lead authors had a nearly 50 percent chance of being reviewed by a homophilous (all-male), rather than a mixed-gender team. In contrast, because all-female reviewer panels were so rare (accounting for only 81 of 6,509 full submission decisions), female authors were highly unlikely to benefit from homophily in the review process. Similarly, U.S. authors were much more likely than not (see S5 Table) to be reviewed by a panel with at least one reviewer from their country. However, the opposite was true for authors from other countries. Fewer opportunities for such homophily may result in a disadvantage for scientists from smaller and less scientifically prolific countries.
Increasing the representation of women and of scientists from a more diverse set of nations among eLife’s editors may lead to a more diverse pool of peer reviewers and Reviewing Editors and to a more equitable peer review process. Editors often invite peer reviewers from their own professional networks, networks that likely reflect the characteristics of the editor [76–78]; this can lead editors—who tend to be men [14, 67–74] and to be from scientifically advanced countries [75]—to invite peer reviewers who are demographically similar to themselves [44, 79, 80], inadvertently excluding certain groups from the gatekeeping process. Accordingly, we found that male Reviewing Editors at eLife were less likely to create mixed-gender teams of gatekeepers than female Reviewing Editors (see S8 Fig). We observed a similar effect based on the country of affiliation of the Reviewing Editor and invited peer reviewers (see S9 Fig). Moreover, in S11 Table we present a regression analysis considering only the gender of the Reviewing Editor, rather than the composition of the reviewer team; we found similar homophilous relationships as in Fig 7, suggesting the importance of the Reviewing Editor to the peer review process at eLife.
The size of the disparities we observed in peer review outcomes may seem modest; however, these small disparities accumulate through each stage of the review process (initial submission, full submission, revisions). These cumulative effects yield an overall acceptance rate (the rate at which initial submissions were eventually accepted) for male and female corresponding authors of 15.6 and 13.8 percent, respectively; in other words, manuscripts submitted to eLife with male corresponding authors were published at 1.13 times the rate of those with female corresponding authors. Similarly, initial submissions by corresponding authors from China were accepted at only 22.0 percent the rate of manuscripts submitted by corresponding authors from the United States (with overall acceptance rates of 4.9 and 22.3 percent, respectively). Success in peer review is vital for a researcher’s career because successful publication strengthens their professional reputation and makes it easier to attract funding, students, postdocs, and hence further publications. Even small advantages can compound over time and result in pronounced inequalities in science [81–84].
Our finding that the gender of the last author was associated with a significant difference in the rate at which full submissions were accepted at eLife stands in contrast with a number of previous studies of journal peer review that reported no significant difference in outcomes of papers submitted by male and female authors [85–87], or in reviewers’ evaluations based on the author’s apparent gender [88]. This discrepancy may be explained in part by eLife’s unique context or policies, or by the relative selectivity of eLife compared to journals where previous studies found gender equity. In addition, our results point to a key feature of study design that may account for some of the differences across studies: the consideration of multiple authorship roles. This is especially important for the life sciences, for which authorship order is strongly associated with contribution [61, 62, 89]. Whereas our study examined the gender of the first, last, and corresponding authors, most previous studies have focused on the gender of the first author (e.g., [2, 90]) or of the corresponding author (e.g., [22, 91]). Consistent with previous studies, we observed no strong relationship between first author gender and review outcomes at eLife. Only when considering lead authorship roles—last authorship and, to a lesser extent, corresponding authorship—did we observe such an effect. Our results may be better compared with studies of grant peer review, where leadership roles are more explicitly defined, and many studies have identified significant disparities in outcomes favoring men [18, 92–95], although many other studies have found no evidence of gender disparity [21, 23, 24, 96–98]. Given that science has grown increasingly collaborative and that average authorship per paper has expanded [99, 100], future studies of disparities would benefit from explicitly accounting for multiple authorship roles and signaling among various leadership positions on the byline [59, 101].
The relationship we found between the gender and country of affiliation of gatekeepers and peer review outcomes also stands in contrast to the findings of a number of previous studies. Studies of gatekeeper country of affiliation have found no difference in peer review outcomes based on the country of affiliation of the reviewer [104, 106], while there is little research on the correspondence between author and reviewer gender. One study identified a homophilous relationship between female reviewers and female authors [102]. However, most previous analyses found only procedural differences based on the gender of the gatekeeper [22, 87, 88, 103] and identified no difference in outcomes based on the interaction of author and gatekeeper gender in journal submissions [87, 104, 105] or grant review [23]. One past study examined the interaction between U.S. and non-U.S. authors and gatekeepers, but found an effect opposite to what we observed: U.S. reviewers tended to rate submissions of U.S. authors more harshly than those of non-U.S. authors [43]. Our results also contrast with the study most similar to our own, which found no evidence of bias related to gender and only modest evidence of bias related to geographic region [2]. These discrepancies may result from our analysis of multiple author roles rather than only the characteristics of the first author. Alternatively, they may result from the unique nature of eLife's consultative peer review; the direct communication between peer reviewers, compared with traditional peer review, may render the social characteristics of reviewers more influential.
Limitations
There are limitations of our methodology that must be considered. First, we have no objective measure of the intrinsic quality of manuscripts; therefore, it is not clear which review condition (homophilic or non-homophilic) more closely approximates the ideal of merit-based peer review. Second, measuring the association between reviewer and author demographics and peer review outcomes cannot readily detect biases that are shared by all reviewers and gatekeepers (e.g., if all reviewers, regardless of gender, favored manuscripts from male authors); hence, our approach could underestimate the influence of bias. Third, our analysis is observational, so we cannot establish causal relationships between success rates and author or gatekeeper demographics; there remain potential confounding factors that we were unable to control for in the present analysis, such as the gender distribution of submissions by country (see S5 Fig). Along these lines, the reliance on statistical tests with arbitrary significance thresholds may provide misleading results (see [107]) or obscure statistically weak but potentially important relationships. Fourth, our gender-assignment algorithm is only a proxy for author gender, and its reliability varies by continent.
Further studies will be required to determine the extent to which the effects we observed generalize to other peer review contexts. Specific policies at eLife, such as its consultative peer review process, may contribute to the effects we observed. Other characteristics of eLife may also be relevant, including its level of prestige [13] and its disciplinary specialization in the biological sciences, whose culture may differ from that of other scientific and academic disciplines. Determining the extent to which our findings are particular to eLife or generalizable is necessary, and may also help in identifying explanatory models. Future work is needed to confirm and expand upon our findings, assess the extent to which they can be generalized, establish causal relationships, and mitigate the effects of these methodological limitations. To aid in this effort, we have made as much of the data and analysis code as possible publicly available at: https://github.com/murrayds/elife-analysis.
Conclusion and recommendations
Many factors can contribute to gender, country, and other inequities in scientific publishing [47, 50, 108–111], and these can affect the quantity and perceived quality of submitted manuscripts. However, such structural factors do not readily account for the observed association between gatekeeper-author demographic homogeny and peer review outcomes at eLife; rather, relationships between the personal characteristics of the authors and gatekeepers are likely to play some role in peer review outcomes.
Our results suggest that it is not only the form of peer review that matters, but also the composition of reviewers. Homophilous preferences in evaluation are a potential mechanism underpinning the Matthew Effect [1] in academia. This effect entrenches privileged groups while potentially limiting diversity, which could hinder scientific advances, since diversity may lead to better working groups [112] and promote high-quality science [113, 114]. Increasing gender and international representation among scientific gatekeepers may improve fairness and equity in peer review outcomes and accelerate scientific progress. However, this must be carefully balanced to avoid overburdening scholars from minority groups with disproportionate service obligations.
Although some journals and publishers, such as eLife and Frontiers Media, have begun providing peer review data to researchers (see [44, 115]), data on equity in peer review outcomes are currently available for only a small fraction of journals and funders. While many journals collect these data internally, they are not usually standardized or shared publicly. One group, PEERE, has authored a protocol for the open sharing of peer review data [116, 117], though this protocol is recent and the extent to which it will be adopted remains uncertain. Watchdog groups, such as BiasWatchNeuro, are now tracking and posting the representation of women authors in some journals. To provide better benchmarks and to incentivize better practices, journals should make analyses of author and reviewer demographics publicly available. These data would include, but need not be limited to, characteristics such as gender, race, sexual orientation, seniority, and institution and country of affiliation. Privacy and confidentiality concerns will likely limit the full availability of the data, but analyses that are sensitive to the vulnerabilities of smaller populations should be conducted and made available as benchmarking data. As these data become increasingly available, systematic reviews can help identify general patterns across disciplines and countries.
Some high-profile journals, including Nature [118] and eNeuro [12], have experimented with double-blind peer review as a potential solution to inequities in publishing, though in some cases with low uptake [119]. Our findings of homophilic effects suggest that single-blind review may not be the optimal form of peer review; however, our study did not directly test whether homophily persists under double-blind review. If homophily disappears under double-blind review, this would reinforce the interpretation of bias; if it persists, this would suggest that other underlying attributes of the manuscript may be contributing to the homophilic effects. Double-blind peer review is viewed positively by the scientific community [120, 121], and some studies have found evidence that it mitigates inequities favoring famous authors, elite institutions [85, 122, 123], and authors from high-income and English-speaking nations [28].
There may be a tension, however, in attempting to expand double-blind peer review while other aspects of the scientific system become more open. More than 20 percent of eLife papers that go out for review, for example, are already available as preprints, which complicates the possibility of truly blind review. To a lesser extent, several statements required for the responsible conduct of research (e.g., conflicts of interest, funding statements, and other ethical declarations) would require altered administrative treatment to implement double-blind review. Other options involve making peer review more open; one recent study found evidence that more open peer review did not compromise the integrity or logistics of the process, so long as reviewers could maintain anonymity [124].
Other alternatives to traditional peer review have also been proposed, including study pre-registration, consultative peer review, and hybrid processes (e.g., [58, 125–129]), as well as alternative forms of dissemination, such as preprint servers (e.g., arXiv, bioRxiv), which have grown increasingly popular in recent years [130]. Currently, there is little empirical evidence to determine whether these formats constitute more equitable alternatives [3]. In addition, some journals are analyzing the demographics of their published authorship and editorial staff in order to identify key problem areas, focus initiatives, and track progress toward diversity goals [14, 79, 86]. More work should be done to study and understand the issues facing peer review and scientific gatekeeping in all their forms and to promote fair, efficient, and meritocratic scientific cultures and practices. Editorial bodies should craft policies and implement practices to mitigate disparities in peer review; they should also continue to be innovative and reflective about their practices to ensure that papers are accepted on scientific merit rather than on particularistic characteristics of the authors.
Supporting information
S1 Text Modelling homogeny using main effects with an interaction term. We used logistic regression to model the degree to which gender equity in peer review outcomes differed with the composition of the reviewer team, in order to verify the inequity observed in Fig 6. Fig 7.A demonstrates that last author gender inequity persisted even when controlling for the gender composition of the reviewer team, but it does not address the degree to which this inequity manifests in submissions reviewed by all-male versus mixed-gender reviewer teams. Given that there is no established method for addressing this question, we considered several approaches. The first approach modelled the interaction between last author gender and the gender composition of the reviewer team (see S9 Table, column 2); however, this approach proved difficult to interpret. Adding the interaction term appeared to suppress the main effects of last author gender and reviewer team composition observed in Fig 7.A, though the corresponding ANOVA table demonstrated that these effects still accounted for a significant amount of deviance (see S11 Table). There was no significant interaction term, which conflicts with Fig 6; however, main effects are often made less interpretable by the addition of interaction terms, and the low sample size across interaction groups further complicates interpretation. Moreover, this approach modelled individual-level interactions between the author and reviewer composition on a per-submission basis, rather than differences in group-level estimates of inequity.
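A minimal sketch of this interaction-term approach is shown below, written in Python/statsmodels rather than with the code from our repository; the variable names (`accepted`, `last_author_gender`, `reviewer_team`, `year`, `sub_type`) and the data file are hypothetical. The contribution of the interaction term can be assessed with a likelihood-ratio test against the main-effects model, which parallels the deviance comparison discussed above.

```python
# Sketch of the interaction-term model; all names are hypothetical stand-ins.
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

full = pd.read_csv("full_submissions.csv")  # hypothetical data file

# Main-effects model: author gender and reviewer-team composition, plus controls
main = smf.logit(
    "accepted ~ C(last_author_gender) + C(reviewer_team) + C(year) + C(sub_type)",
    data=full,
).fit(disp=False)

# Same model with the author-gender x reviewer-team interaction added
interaction = smf.logit(
    "accepted ~ C(last_author_gender) * C(reviewer_team) + C(year) + C(sub_type)",
    data=full,
).fit(disp=False)

# Likelihood-ratio test: does the interaction explain additional deviance?
lr_stat = 2 * (interaction.llf - main.llf)
df_diff = interaction.df_model - main.df_model
print(f"LR = {lr_stat:.2f}, p = {chi2.sf(lr_stat, df_diff):.3f}")
```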
S2 Text Modelling homogeny using separately trained models. S9 Table (columns 3 and 4) shows the results of two logistic regression models of the acceptance of full submissions, constructed as in Fig 7.A, but each fit using only full submissions reviewed by either all-male or mixed-gender reviewer teams. In the all-male model, a male last author was associated with 1.23 times greater odds of acceptance (95% CI = [1.05, 1.41], p = 0.027) compared to a female last author; in contrast, a smaller, non-significant effect was observed between male and female last authors in the model containing only mixed-gender reviewer teams. This approach shows a larger positive effect favoring male last authors for all-male reviewer teams than for mixed-gender reviewer teams, affirming the results of the models in Fig 7.B, but it has several limitations that favor the approach used in Fig 7. The confidence intervals for the gender effect in the mixed-gender reviewer model are wide, making precise comparisons difficult. Interpretation of S9 Table is further complicated by possible population differences between the groups, as well as by the different amounts of data used to fit each model (n = 3,090 for the all-male reviewer model and n = 3,280 for the mixed-gender reviewer model).
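The separately-trained approach can be sketched as two subset models whose odds ratios are then compared side by side. Again, this is an illustrative Python sketch with hypothetical column names and labels, not the analysis code itself.

```python
# Sketch of separately trained models for all-male and mixed-gender reviewer teams;
# column names, labels, and the data file are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

full = pd.read_csv("full_submissions.csv")  # hypothetical data file
formula = "accepted ~ C(last_author_gender) + C(year) + C(sub_type)"

for team in ["all_male", "mixed"]:
    subset = full[full["reviewer_team"] == team]
    fit = smf.logit(formula, data=subset).fit(disp=False)
    odds_ratios = np.exp(fit.params)    # coefficients exponentiated to odds ratios
    conf_int = np.exp(fit.conf_int())   # 95% confidence intervals on the OR scale
    print(team, "n =", len(subset))
    print(odds_ratios.filter(like="last_author_gender"))
    print(conf_int.filter(like="last_author_gender", axis=0))
```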
S11 Table. Model coefficients of full submissions: author characteristics and reviewing-editor-only homogeny. Odds ratios, associated confidence intervals, and model diagnostics for a logistic regression model using the acceptance of full submissions as the response variable. Predictor variables include control variables for the submission year and type, and variables capturing author characteristics and homogeny between the author and the Reviewing Editor only. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (Research Article) was used as the reference level; the submission type "SR" denotes "Short Report" and "TR" denotes "Tools and Resources". This regression models gender equity using a new categorical variable containing all combinations of last author gender and Reviewing Editor gender; for this variable, "last author female - female rev. editor" was used as the reference level.
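As a hedged illustration of how such a combined categorical predictor and its reference level might be constructed, a short Python/statsmodels sketch follows; the column names, level labels, and data file are hypothetical, and the actual analysis is in the linked repository.

```python
# Sketch: combined last-author-gender x Reviewing-Editor-gender factor with an
# explicit reference level; all names and labels are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

full = pd.read_csv("full_submissions.csv")  # hypothetical data file

# Combine last author gender and Reviewing Editor gender into one categorical variable
full["author_editor"] = (
    "last author " + full["last_author_gender"]
    + " - " + full["editor_gender"] + " rev. editor"
)

# Treatment coding with "last author female - female rev. editor" as the reference level
model = smf.logit(
    "accepted ~ C(author_editor, Treatment(reference='last author female - female rev. editor'))"
    " + C(year) + C(sub_type)",
    data=full,
).fit(disp=False)
print(model.summary())
```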
Competing interests
Wei Mun Chan and Andrew M. Collings are employed by eLife. Jennifer Raymond and Cassidy R. Sugimoto are Reviewing Editors at eLife. Andrew M. Collings was employed by PLOS between 2005 and 2012.
Ethics statement
This research underwent expedited review by the Institutional Review Board at Indiana University Bloomington and was determined to be exempt (Protocol #: 1707327848).
Acknowledgments
We are grateful for the editing and feedback provided by Susanna Richmond (Senior Manager at eLife), Mark Patterson (Executive Director at eLife), Eve Marder, Anna Akhmanova, and Detlef Weigel (Deputy Editors at eLife). We are also grateful for the work of James Gilbert (Production Editor at eLife) for extracting the data used in this analysis. This work was partially supported by a grant from the National Science Foundation (SciSIP #1561299).
Footnotes
This revision largely consists of changes to the language and organization of the text, with a particular focus on the interpretation of the findings. We have attempted to remove causal language while also clarifying our results.
References