Gender and international diversity improves equity in peer review

The robustness of scholarly peer review has been challenged by evidence of disparities in publication outcomes based on authors' gender and nationality. To address this, we examine the peer review outcomes of 23,873 initial submissions and 7,192 full submissions that were submitted to the biosciences journal eLife between 2012 and 2017. Women and authors from nations outside of North America and Europe were underrepresented both as gatekeepers (editors and peer reviewers) and as last authors. We found a homophilic interaction between the demographics of the gatekeepers and authors in determining the outcome of peer review; that is, gatekeepers favor manuscripts from authors of the same gender and from the same country. The acceptance rate for manuscripts with male last authors was significantly higher than for female last authors, and this gender inequity was greatest when the team of reviewers was all male; mixed-gender gatekeeper teams led to more equitable peer review outcomes. Similarly, manuscripts were more likely to be accepted when reviewed by at least one gatekeeper with the same national affiliation as the corresponding author. Our results indicated that homogeneity between author and gatekeeper gender and nationality is associated with the outcomes of scientific peer review. We conclude with a discussion of mechanisms that could contribute to this effect, directions for future research, and policy implications. Code and anonymized data have been made available at https://github.com/murrayds/elife-analysis.

Author summary

Peer review, the primary method by which scientific work is evaluated and developed, is ideally a fair and equitable process in which scientific work is judged solely on its own merit. However, the integrity of peer review has been called into question based on evidence that outcomes often differ between male and female authors, and between authors in different countries.
We investigated such disparities at the biosciences journal eLife by analyzing the demographics of authors and gatekeepers (editors and peer reviewers) and the review outcomes of all submissions between 2012 and 2017. We found evidence of disparities in outcomes that disfavored women and those outside of North America and Europe, and that these groups were underrepresented among authors and gatekeepers. The gender disparity was greatest when reviewers were all male; mixed-gender reviewer teams led to more equitable outcomes. Similarly, manuscripts were more likely to be accepted when reviewed by at least one gatekeeper from the same country as the corresponding author. Our results indicated that gatekeeper characteristics are associated with the outcomes of scientific peer review. We discuss mechanisms that could contribute to this effect, directions for future research, and policy implications.

Peer review is foundational to the development, gatekeeping, and dissemination of research, while also underpinning the professional hierarchies of academia. Normatively, peer review is expected to follow the ideal of "universalism" [1], whereby scholarship is judged solely on its intellectual merit. However, confidence in the extent to which peer review accomplishes the goal of promoting the best scholarship has been eroded by questions about whether social biases [2], based on or correlated with the characteristics of the scholar, could also influence outcomes of peer review [3-5]. This challenge to the integrity of peer review has prompted an increasing number of funding agencies and journals to assess the disparities and potential influence of bias in their peer review processes.

Several terms are often conflated in the discussion of bias in peer review. We use the term disparities to refer to unequal composition between groups, inequities to characterize unequal outcomes, and bias to refer to departures from impartiality in judgment. Disparities and inequities have been widely studied in scientific publishing, most notably with regard to gender and country of affiliation. Globally, women account for only about 30 percent of scientific authorship [6] and are underrepresented in the scientific workforce, even when compared to the pool of earned degrees [7,8]. Articles authored by women are most underrepresented in the most prestigious and high-profile scientific journals [9-14]. Moreover, developed countries dominate the production of highly cited publications [15,16].

The under-representation of authors from certain groups may reflect differences in submission rates, or it may reflect differences in success rates during peer review (the percentage of submissions accepted). Analyses of success rates have yielded mixed results in terms of the presence and magnitude of such inequities.
Some analyses have found lower success rates for female-authored papers [17,18] and grant applications [19,20], while other studies have found no gender differences in review outcomes (for examples, see [21-25]). Inequities in journal success rates based on authors' nationalities have also been documented, with reports that authors from English-speaking and scientifically advanced countries have higher success rates [26,27]; however, other studies found no evidence that the language or country of affiliation of an author influences peer review outcomes [27-29]. These inconsistencies could be explained by several factors, such as the contextual characteristics of the studies (e.g., country, discipline) and variations in research design and sample size.

The nature of bias and its contribution to inequities in scientific publishing is highly controversial. Implicit bias (the macro-level social and cultural stereotypes that can subtly influence everyday interpersonal judgments and thereby produce and perpetuate status inequalities and hierarchies [30,31]) has been suggested as a possible mechanism to explain differences in peer review outcomes based on socio-demographic and professional characteristics [3]. When faced with uncertainty, which is quite common in peer review, people often weigh the social status and other ascriptive characteristics of others to help make decisions [32]. Hence, scholars are more likely to consider particularistic characteristics (e.g., gender, institutional prestige) of an author under conditions of uncertainty [33,34], such as at the frontier of new scientific knowledge [35]. However, given the demographic stratification of scholars within institutions and across countries, it can be difficult to pinpoint the nature of a potential bias.
For example, women are underrepresented in prestigious educational institutions [36-38], which conflates gender and prestige biases. These institutional differences can be compounded by gendered differences in age, professional

encouraged initial submissions due to appeals of initial decisions and other special

Starting from the left, an initial submission is first given an initial decision of encourage or reject, and if encouraged, continues through the first full review and subsequent rounds of revision. "Encouraged", "Accepted", "Rejected" and "Revision needed" represent the decisions made by eLife editors and reviewers at each submission stage. A portion of manuscripts remained in various stages of processing at the time of data collection; these manuscripts were labeled as "Decision pending". The status of manuscripts after the second revision is the final status that we consider in the present data. The dashed line delineates full submissions from rejected initial submissions.
The review process at eLife is highly selective, and became more selective over time. Fig 2 shows that while the total count of manuscripts submitted to eLife has rapidly increased since the journal's inception, the count of encouraged initial submissions and accepted full submissions has grown more slowly. The encourage rate (percentage of initial submissions

In addition to authorship data, we obtained information about the gatekeepers involved in the processing of each submission. In our study, we defined gatekeepers as any Senior Editor or Reviewing Editor at eLife, or any invited peer reviewer, involved in the review of at least one initial or full submission between 2012 and mid-September 2017. Gatekeepers at eLife often served in multiple roles; for example, acting as both a Reviewing Editor and peer reviewer on a given manuscript, or serving as a Senior Editor on one manuscript but an invited peer reviewer on another. In our sample, the Reviewing Editor was listed as a peer reviewer for 58.9 percent of full submissions. For initial submissions, we had data on only the corresponding author of the manuscript and the Senior Editor tasked with making the decision. For full submissions, we had data on the corresponding author, first author, last author, Senior Editor, Reviewing Editor, and members of the team of invited peer reviewers. Data for each individual included their stated name, institutional affiliation, and country of affiliation. A small number of submissions were removed, such as those that had a first but no last author and those that did not have a valid submission type. Country names were manually disambiguated (for example, "USA" was normalized to "United States" and "Viet Nam" to "Vietnam"). To simplify continent-level comparisons, we also excluded one submission for which the corresponding author listed their affiliation as Antarctica.
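The cleaning rules just described can be sketched as a small filter. This is an illustrative Python sketch, not the authors' pipeline (their analysis was done in R): the record layout (`first_author`, `last_author`, `type`, `ca_country`) is hypothetical, the alias table holds only the two normalizations named in the text, and treating `RA`, `SR`, and `TR` as the only valid submission types is an assumption based on the types named in the Fig 6 caption.

```python
# Hypothetical alias table: only the two normalizations named in the text.
COUNTRY_ALIASES = {"USA": "United States", "Viet Nam": "Vietnam"}
# Assumption: valid submission types are those named in the Fig 6 caption.
VALID_TYPES = {"RA", "SR", "TR"}


def normalize_country(name):
    """Map a raw country string to its canonical spelling."""
    name = name.strip()
    return COUNTRY_ALIASES.get(name, name)


def clean_submissions(submissions):
    """Apply the exclusion rules and return cleaned copies of the kept records."""
    kept = []
    for sub in submissions:
        # Drop records that list a first author but no last author.
        if sub.get("first_author") and not sub.get("last_author"):
            continue
        # Drop records without a valid submission type.
        if sub.get("type") not in VALID_TYPES:
            continue
        country = normalize_country(sub["ca_country"])
        # Drop the Antarctica affiliation to simplify continent-level comparisons.
        if country == "Antarctica":
            continue
        kept.append({**sub, "ca_country": country})
    return kept
```

Initial submissions, which carry only a corresponding author, pass the first check because they have no first author recorded.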
Full submissions included 6,669 distinct gatekeepers, 5,694 distinct corresponding authors, 6,691 distinct first authors, and 5,581 distinct last authors. Authors were also likely to appear on multiple manuscripts and may have held a different authorship role in each: whereas our data included 17,966 distinct combinations of author name and role, this number comprised only 12,059 distinct authors. For 26.5 percent of full submissions the corresponding author was also the first author, whereas for 71.2 percent of submissions the corresponding author was the last author. We did not have access to the full authorship list, including middle authors.

bioRxiv preprint doi: https://doi.org/10.1101/400515; this version posted April 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Note that in the biosciences, the last author is typically the most senior researcher involved [59] and responsible for more conceptual work, whereas the first author is typically less senior and performs more of the scientific labor (such as lab work and analysis) to produce the study [60-62].

Gender assignment

Gender variables for authors and gatekeepers were coded using an updated version of the algorithm developed in [6]. This algorithm used a combination of the first name and country of affiliation to assign each author's gender on the basis of several universal and country-specific name-gender lists (e.g., United States Census). This list of names was complemented with an algorithm that searched Wikipedia for pronouns associated with names.

We validated this new list by applying it to a dataset of names with known gender. We used data collected from RateMyProfessor.com, a website containing anonymous student-submitted ratings and comments for professors, lecturers, and teachers at universities in the United States, United Kingdom, and Canada. We limited the dataset to only individuals with at least five comments, and counted the total number of gendered pronouns that appeared in their text; if the count of one type of gendered pronoun was at least the square of the count of the other, then we assigned the gender of the majority pronoun to the individual. To compare with pronoun-based assignment, we assigned gender using the previously detailed first-name-based algorithm. In total, there were 384,127 profiles on RateMyProfessor.com that had at least five comments and for whom pronouns indicated a gender. Our first-name-based algorithm assigned a gender of male or female to 91.26 percent of these profiles. The raw match rate between these two assignments was 88.6 percent. Of those that were assigned a gender, our first-name-based assignment matched the pronoun assignment in 97.1 percent of cases, and for 90.3 percent of distinct first names.
While RateMyProfessor.com and the authors submitting to eLife represent different populations (RateMyProfessor.com being biased towards teachers in the United States, United Kingdom, and Canada), the results of this validation provide some credibility to the first-name-based gender assignment used here.

We also attempted to manually identify gender for all Senior Editors, Reviewing Editors, invited peer reviewers, and last authors for whom our algorithm did not assign a gender. We used Google to search for their name and institutional affiliation, and inspected the resulting photos and text in order to make a subjective judgment as to whether they were presenting as male or female.
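The pronoun rule and the two validation rates above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: the pronoun sets are illustrative (the text does not list the tokens counted), and requiring the winning count to be strictly larger, so that ties stay unassigned, is an assumption. The two rates are linked: the raw match rate is roughly the name-based coverage times the conditional match rate, and 0.9126 × 0.971 ≈ 0.886, consistent with the reported 88.6 percent.

```python
import re

# Assumption: illustrative pronoun sets; the text does not list the exact tokens.
MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}


def pronoun_gender(comments, min_comments=5):
    """Return 'male', 'female', or None from the pronouns in a profile's comments."""
    if len(comments) < min_comments:
        return None
    tokens = re.findall(r"[a-z]+", " ".join(comments).lower())
    m = sum(t in MALE for t in tokens)
    f = sum(t in FEMALE for t in tokens)
    # One count must be at least the square of the other; also requiring it to be
    # strictly larger (so ties stay unassigned) is an assumption.
    if m > f and m >= f ** 2:
        return "male"
    if f > m and f >= m ** 2:
        return "female"
    return None


def match_rates(pronoun_labels, name_labels):
    """Raw match rate over all pronoun-labelled profiles, and the match rate
    among the profiles that the name-based algorithm also labelled."""
    pairs = [(p, n) for p, n in zip(pronoun_labels, name_labels) if p is not None]
    assigned = [(p, n) for p, n in pairs if n is not None]
    matches = sum(p == n for p, n in assigned)
    return matches / len(pairs), matches / len(assigned)
```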

Through the combination of manual efforts and our first-name-based gender-assignment algorithm, we assigned a gender of male or female to 95.5 percent (n = 35,511) of the 37,198 name/role combinations that appeared in our dataset. Overall, 26.7 percent (n = 9,910) were assigned a gender of female and 68.8 percent (n = 25,601) were assigned a gender of male, while a gender could not be assigned for the remaining 4.5 percent (n = 1,687). This gender distribution roughly matches the gender distribution observed globally across scientific publications [6]. A breakdown of these gender demographics by role can be found in S1 Table and S2 Table.

Gender composition of reviewers

To examine the relationship between author-gatekeeper gender homogeny and review outcomes, we analyzed the gender composition of the gatekeepers and authors of full submissions. Each manuscript was assigned a reviewer composition category of all-male, all-female, mixed, or uncertain. Reviewer teams labeled all-male and all-female were teams for which we could identify a gender for every member, and for which all members were identified as male or female, respectively. Teams labeled as mixed were those for which we could identify a gender for at least two members, and which had at least one male and at least one female peer reviewer. Teams labeled as uncertain were those for which we could not assign a gender to every member and which were not mixed. A full submission was typically reviewed by two to three peer reviewers, which may or may not include the Reviewing Editor.
However, the Reviewing Editor was always involved in the review process of a manuscript, and so we always considered the Reviewing Editor as a member of the reviewing team. Of 7,912 full submissions, a final decision of accept or reject was given for 6,590 during the dates analyzed; of these, 47.7 percent (n = 3,144) were reviewed by all-male teams, 1.4 percent (n = 93) by all-female teams, and 50.8 percent (n = 3,347) by mixed-gender teams; the remaining six manuscripts had reviewer teams classified as uncertain and were excluded from further analysis.
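The composition categories above, together with the rule that the Reviewing Editor always counts as a team member, can be sketched as two small functions. This is an illustrative Python sketch, not the authors' code; the record fields (`id`, `gender`) are hypothetical, and keying members by a person id is an assumption used so that a Reviewing Editor who also served as a peer reviewer is counted once.

```python
def review_team(reviewing_editor, peer_reviewers):
    """Genders of the review team. The Reviewing Editor is always included and,
    because members are keyed by a hypothetical person id, counted only once."""
    team = {reviewing_editor["id"]: reviewing_editor["gender"]}
    for reviewer in peer_reviewers:
        team[reviewer["id"]] = reviewer["gender"]
    return list(team.values())


def team_composition(genders):
    """Classify a team from member genders ('male', 'female', or None if unassigned)."""
    known = [g for g in genders if g is not None]
    if "male" in known and "female" in known:
        return "mixed"  # at least two identified members, of both genders
    if not known or len(known) < len(genders):
        return "uncertain"  # someone could not be assigned and the team is not mixed
    return "all-male" if known[0] == "male" else "all-female"
```

Checking "mixed" first matters: a team with one unassigned member is still mixed if a man and a woman were both identified.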

Institutional names for each author were entered manually by eLife authors and were thus highly idiosyncratic. Many institutions appeared with multiple name variants (e.g., "UCLA", "University of California, Los Angeles", and "UC at Los Angeles"). In total, there were nearly 8,000 unique strings in the affiliation field. We performed several pre-processing steps on these names, including converting characters to lower case, removing stop words, removing punctuation, and reducing common words to abbreviated alternatives (e.g., "university" to "univ"). We used fuzzy string matching with the Jaro-Winkler distance measure [63] to match institutional affiliations from eLife to institutional rankings in the 2016 Times Higher Education World Rankings. A match was established for 15,641 corresponding authors of initial submissions (around 66 percent). Matches for last authors were higher: 5,118 (79 percent) were matched.
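The normalization and fuzzy-matching steps can be sketched with a from-scratch Jaro-Winkler similarity (the distance is one minus this similarity). This is a minimal Python sketch under stated assumptions, not the authors' implementation: the stop-word set, the abbreviation table beyond "university" to "univ", the standard Winkler parameters (prefix weight 0.1, prefix length capped at 4), and the 0.9 acceptance threshold are all illustrative choices.

```python
import re

# Assumptions: illustrative tables; only "university" -> "univ" is given in the text.
STOP_WORDS = {"of", "the", "at", "and", "for"}
ABBREVIATIONS = {"university": "univ"}


def normalize(name):
    """Lower-case, strip punctuation, drop stop words, abbreviate common words."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in name.split() if t not in STOP_WORDS)


def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; halve the count.
    k = transpositions = 0
    for i in range(len(s1)):
        if used1[i]:
            while not used2[k]:
                k += 1
            transpositions += s1[i] != s2[k]
            k += 1
    t = transpositions / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def best_match(affiliation, ranked_names, threshold=0.9):
    """Return the ranked institution most similar to the affiliation, or None."""
    query = normalize(affiliation)
    scored = [(jaro_winkler(query, normalize(n)), n) for n in ranked_names]
    score, name = max(scored, default=(0.0, None))
    return name if score >= threshold else None
```

With this scheme an acronym such as "UCLA" scores well below the full institution name, which is one plausible reason purely string-based matching leaves some affiliations unmatched.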

Institutions were classed into two levels of prestige: "top" institutions were those within the top 50 universities in the global Times Higher Education ranking. Institutions that ranked below the top 50, that were unranked, or that were not matched to a Times Higher Education ranking were labeled as "non-top". One limitation of the Times Higher Education ranking as a proxy for institutional prestige is that these rankings cover only universities, excluding many prestigious research institutes.

Latitude and longitude of country centroids were taken from Harvard WorldMap [64]; country names in the eLife and Harvard WorldMap datasets were manually disambiguated and then mapped to the country of affiliation listed for each author from eLife (for example, "Czech Republic" from the eLife data was mapped to "Czech Rep." in the Harvard WorldMap data). For each initial submission, we calculated the geographic distance between the centroids of the countries of the corresponding author and Senior Editor; we call this the corresponding author-editor geographic distance. For each full submission, we calculated the sum of the geographic distances between the centroid of the last author's country and the centroid of each reviewer's country; we call this the last author-reviewers geographic distance. All distances were calculated in thousands of kilometers.
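The distance measures above can be sketched as follows. The text does not say which distance formula was used, so this Python sketch assumes great-circle (haversine) distance on a spherical Earth with mean radius 6,371 km; the function names and the (latitude, longitude) tuple format are hypothetical. The sketch also returns the zero-distance indicator that the models use as a proxy for national homogeny.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def haversine_thousand_km(a, b):
    """Great-circle distance between two (lat, lon) centroids given in degrees,
    returned in thousands of kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h)) / 1000.0


def last_author_reviewers_distance(author_centroid, reviewer_centroids):
    """Sum of distances from the last author's country centroid to each
    reviewer's, plus a flag for whether that sum is exactly zero."""
    total = sum(haversine_thousand_km(author_centroid, c) for c in reviewer_centroids)
    return total, total == 0.0
```

Because the full-submission measure is a sum, it is zero only when every reviewer shares the last author's country.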

We conducted a series of χ² tests of equal proportion as well as multiple logistic regression models in order to assess the extent to which author and gatekeeper characteristics are associated with the likelihood that an initial submission is encouraged and that a full submission is accepted. We supply p-values and confidence intervals as a tool for interpretation; we generally maintain the convention of 0.05 as the threshold for statistical significance, though we also report and interpret values just outside of this range.

When visualizing proportions, 95% confidence intervals are calculated as $p \pm 1.96\sqrt{p(1-p)/n}$, where $p$ is the proportion and $n$ is the number of observations in the group. When conducting χ² tests comparing groups based on gender, we excluded submissions for which no gender could be identified. When conducting tests for gender and country homogeny, we report 95% confidence intervals of the difference in proportions; we do not report confidence intervals for tests involving more than two groups. Odds ratios and associated 95% confidence intervals are reported for logistic regression models. Data processing, statistical testing, and visualization were performed using R version 3.4.2 and RStudio version 1.1.383.
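The interval formula above is the normal-approximation (Wald) interval. The sketch below implements it together with a two-group χ² test of equal proportions and a Wald interval for the difference, using only the Python standard library. It is an illustration, not the authors' R code. For a 2×2 table, Pearson's χ² equals the squared pooled two-proportion z statistic, and with one degree of freedom its upper-tail p-value is $\operatorname{erfc}(\sqrt{\chi^2/2})$. R's `prop.test` applies Yates's continuity correction by default, and the text does not say which variant was used, so this uncorrected version may differ slightly from the reported statistics.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation interval p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def two_proportion_test(x1, n1, x2, n2, z=1.96):
    """Pearson chi-square test of equal proportions for two groups (df = 1, no
    continuity correction) and a Wald interval for the difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # For a 2x2 table, Pearson's chi-square equals the squared pooled z statistic.
    z_stat = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    chi2 = z_stat ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, df = 1
    diff = p1 - p2
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return chi2, p_value, (diff - z * se_diff, diff + z * se_diff)
```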

Having demonstrated gender and national inequities in peer review with this exploratory univariate analysis, we built a series of logistic regression models to investigate whether these differences could be explained by other factors. In each model, we used the submission's outcome as the response variable, whether that be encouragement (for initial submissions) or acceptance (for full submissions). For both initial and full submissions, we added control

inequities based on author characteristics. We considered predictor variables for the gender and continent of affiliation of the corresponding author (for initial submissions) and the last author (for full submissions). For the second set of models, we investigated whether these inequities differed based on gender or national homogeny between the author and the reviewer or editor. In addition to variables from the first model, we considered several approaches to capture the effect of gender homogeny between the author and reviewers on peer review inequity (see below). We also included variables for the corresponding author-editor geographic distance (for initial submissions) and the last author-reviewers geographic distance (for full submissions), and a dummy variable indicating whether this distance was zero; these variables serve as proxies for the degree of national homogeny between the author and the editor or reviewers.
There were a small number of Senior Editors in our data. To protect their identity, we did not include their gender or specific continent of affiliation in any model, although we retained a variable for the corresponding author-editor geographic distance.
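The predictors described above can be sketched as one encoded row per submission. This is a hypothetical Python encoding for illustration, not the authors' model specification. The six continents follow the df = 5 χ² tests in this section, North America is the reference level as stated in the Fig 6 caption, and treating female as the implicit gender reference (with a separate indicator for unassigned gender) is an assumption.

```python
CONTINENTS = ["North America", "Europe", "Asia", "Oceania", "South America", "Africa"]
REFERENCE_CONTINENT = "North America"  # reference level, per the Fig 6 caption


def encode_predictors(gender, continent, distance_thousand_km):
    """Hypothetical encoding of one submission's author and homogeny predictors."""
    row = {
        # Assumption: female is the implicit reference; None means unassigned.
        "male": 1 if gender == "male" else 0,
        "gender_unknown": 1 if gender is None else 0,
        # Distance in thousands of km, plus the zero-distance dummy.
        "distance": distance_thousand_km,
        "zero_distance": 1 if distance_thousand_km == 0 else 0,
    }
    # One indicator per non-reference continent; the reference is all zeros.
    for c in CONTINENTS:
        if c != REFERENCE_CONTINENT:
            row["continent_" + c] = 1 if continent == c else 0
    return row
```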

Several approaches were considered for modeling the relationship between equity in peer review and the composition of the reviewer team. The simplest approach, examining the interaction between author and reviewer characteristics, does not adequately address the research question, as it focuses on individual interactions rather than on compositional effects of the reviewer team. Collapsing these into interactions with team composition (e.g., all-male, mixed, all-female) also fails to address whether there is a difference between these various interactions: this would require a manual comparison and statistical test of the parameter estimates from each interaction. This does not provide a parsimonious interpretation of the model outcomes.

Therefore, we took two complementary approaches. The first involves the construction of two separate models: one including only submissions reviewed by all-male teams and another including only those reviewed by mixed-gender teams. We then compared the effect of last author gender between the two models. A model for all-female reviewer teams was excluded due to the small sample size (less than 2 percent of all submissions). This approach simplifies interpretation compared to a simple interaction model, but still fails to provide a universal test of the interaction between author demographics and reviewer team demographics. The full model contained a categorical variable which included all six combinations of last author gender (male, female) and reviewer team composition (all-male, all-female, mixed).
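Both approaches start from the same two labels, last author gender and reviewer team composition. A minimal Python sketch of building the six-level variable for the full model and the two subsets for the separate models follows. The `composition` field name and the `"gender:composition"` label format are hypothetical; the rule of setting all-female teams aside follows the text.

```python
from itertools import product

GENDERS = ["female", "male"]
COMPOSITIONS = ["all-male", "all-female", "mixed"]
# All six combinations of last author gender and reviewer team composition.
LEVELS = [f"{g}:{c}" for g, c in product(GENDERS, COMPOSITIONS)]


def combined_level(last_author_gender, composition):
    """Label used by the full model's single categorical variable."""
    level = f"{last_author_gender}:{composition}"
    if level not in LEVELS:
        raise ValueError(f"unsupported combination: {level}")
    return level


def stratify(submissions):
    """Split into the all-male and mixed-team subsets for the separate models;
    all-female teams are set aside because they are too rare to model."""
    subsets = {"all-male": [], "mixed": []}
    for s in submissions:
        if s["composition"] in subsets:
            subsets[s["composition"]].append(s)
    return subsets
```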

Gatekeeper representation

We first analyzed whether the gender and national affiliations of the population of gatekeepers at eLife were similar to those of the authors of initial and full submissions. The population of gatekeepers was primarily composed of invited peer reviewers, as there were far fewer Senior and Reviewing Editors. A gender and national breakdown by gatekeeper type is provided in S2 Table and S3 Table.

The population of gatekeepers at eLife was heavily dominated by those from North America, who constituted 59.9 percent (n = 3,992) of the total. Gatekeepers from Europe were the next most represented, constituting 32.4 percent (n = 2,162), followed by Asia with 5.7 percent (n = 378). Individuals from South America, Africa, and Oceania each made up less than two percent of the population of gatekeepers. As with gender, we observed differences between the

χ²(df = 5, n = 18,195) = 6738.5, p ≤ 0.00001. The same held for full submissions, with a significant difference for first authors, χ²(df = 5, n = 6,674) = 473.3, p ≤ 0.00001, corresponding authors, χ²(df = 5, n = 6,669) = 330.04, p ≤ 0.00001, and last authors, χ²(df = 5, n = 5,595) = 417.2, p ≤ 0.0001. The international representation of gatekeepers was most similar to first and last authorship (full submissions), and least similar to corresponding authorship (initial submissions), due to country-level differences in acceptance rates (see Fig 4). We also note that the geographic composition of submissions to eLife has changed over time, attracting more submissions from authors in Asia in later years of analysis (see S4 Fig).

Corresponding authorship is divided into those who were authors of initial submissions and those who were authors of full submissions. Black dashed lines overlaid on authorship graphs indicate the proportion of gatekeepers within that gendered or continental category. Precise values used in this graph can be found in S1 Table and S4 Table.

Each stage of review contributed to the disparity of national representation between initial, full, and accepted submissions, with manuscripts from the United States, United Kingdom, and

non-statistically-significant. All-female reviewer teams were rare (only 81 of 6,509 processed full submissions). In the few cases of all-female reviewer teams, there was a higher acceptance rate for female last, corresponding, and first authors; however, these differences were not statistically significant, and the number of observations was too small to draw firm conclusions. There was no significant relationship between first author gender and acceptance rates, regardless of the gender composition of the reviewer team. In summary, we found that full submissions with male corresponding and last authors were more often accepted when they were reviewed by a team of gatekeepers consisting only of men (that is, under the condition of gender homogeny); greater parity in outcomes was observed when gatekeeper teams contained both men and women. We refer to this tendency of reviewers to favor authors of their own gender as homophily.

We also investigated the relationship between peer review outcomes and the presence of

sharply for Japan and China, which had geographic homogeny for only 10.3 and 9.9 percent of full submissions, respectively. More extensive details on the rate of author/reviewer homogeny for each country can be found in S5 Table.

We examined whether last author-reviewer country homogeny tended to result in the favoring of submissions from authors of the same country as the reviewer.
We first pooled together all authors from all countries (n = 6,508 for which there was a full submission and a final decision), and found that the presence of homogeny during review was associated with a 10.0 percentage point higher acceptance rate, (

United States, so this result could potentially reflect the higher acceptance rate for these authors (see Fig 4), rather than homophily overall. Therefore, we repeated the test excluding all full submissions with last authors from the United States, and we again found a significant, though less statistically robust, homophilic effect, χ²(df = 1, n = 3,236) = 4.74, 95% CI =

non-significant effect for Canada and Switzerland (French-speaking countries). In summary, national homogeny was rare unless an author was from the United States, and last author-reviewer national homogeny was associated with heterogeneous outcomes, depending on the country. However, due to the rarity of national homogeny outside of the U.S., more data are needed to draw firm conclusions on a per-country basis.
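The pooled comparison and the robustness check that drops United States last authors can be sketched as follows. This is an illustrative Python sketch: the record fields (`last_author_country`, `reviewer_countries`, `accepted` as 0/1) are hypothetical, and homogeny is taken, as in the text, to mean that at least one reviewer shares the last author's country. The resulting two groups could then be compared with a two-group test of equal proportions.

```python
def has_country_homogeny(last_author_country, reviewer_countries):
    """True if at least one reviewer shares the last author's country."""
    return last_author_country in reviewer_countries


def acceptance_by_homogeny(submissions, exclude_country=None):
    """Acceptance rate with and without homogeny, optionally dropping one
    country to check that it alone does not drive the pooled result."""
    counts = {True: [0, 0], False: [0, 0]}  # homogeny -> [accepted, total]
    for s in submissions:
        if s["last_author_country"] == exclude_country:
            continue
        key = has_country_homogeny(s["last_author_country"], s["reviewer_countries"])
        counts[key][0] += s["accepted"]
        counts[key][1] += 1
    return {k: (a / n if n else None) for k, (a, n) in counts.items()}


# Usage: the pooled test and the version excluding U.S. last authors.
# pooled = acceptance_by_homogeny(full_submissions)
# non_us = acceptance_by_homogeny(full_submissions, exclude_country="United States")
```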

The same effects also held for full submissions (Fig 6.B), though with smaller effect sizes.

Institutional prestige again had a strong positive effect on the odds of a full submission being

Fig 6. Odds ratio estimates of logistic regression models (x-axis: odds ratio; legend: direction of effect). A: Odds ratio estimates of a logistic regression model of initial submissions, using whether the submission was encouraged as the response variable and available information on the corresponding author as predictors. B (Full Submissions): Odds ratio estimates of a logistic regression model of full submissions, using whether the submission was accepted as the response variable and available information about the first and last authors as predictors. For both initial and full submissions, control variables included the author's institutional prestige, the year of submission, and the submission type. For full submissions, there is also a control variable for the gender of the first author. For continent of affiliation, we held "North America" as the reference level. For submission type, "RA" (Research Article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". Grey points indicate that the effect is non-significant; blue and red points indicate significant positive and negative effects, respectively. The numbers above each point give the size of the effect, as an odds ratio. Bars extending from either side of each point indicate 95% confidence intervals. Asterisks next to each label indicate significance level: "***" = p ≤ 0.001; "**" = p ≤ 0.01; "*" = p ≤ 0.05; otherwise, p > 0.05. Tables detailing these effects are included in S6 Table and S7 Table.

accepted, reflecting that eLife's increasing selectivity also extended to full submissions. Unlike initial submissions, there were no significant differences between types of submissions. We also controlled for the gender of the first author, though we found no significant difference between submissions with male and female first authors, or between female first authors and those with unknown gender. Controlling for these variables, we used this model (Fig 6.B) to confirm the gender and national inequities in full submission outcomes observed in Fig 4. Full submissions with a male last author were associated with 1.14 times higher odds of being accepted compared to submissions with female last authors (95% CI = [1.03, 1.26], p = 0.025), an effect similar in magnitude to that of the corresponding author gender in initial submissions.
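Because these are logistic regression models, each reported odds ratio is an exponentiated coefficient, and a Wald interval that is symmetric on the log-odds scale becomes asymmetric on the odds-ratio scale. The sketch below shows that conversion in Python. The function names and the standard error in the example are hypothetical, and the text does not say whether the reported intervals are Wald or profile-likelihood intervals.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))


def log_odds(odds_ratio):
    """Coefficient on the log-odds scale corresponding to a reported odds ratio."""
    return math.log(odds_ratio)


# An odds ratio of 1.14 corresponds to a coefficient of about 0.131;
# the standard error of 0.05 here is hypothetical.
beta = log_odds(1.14)
point, (low, high) = odds_ratio_ci(beta, se=0.05)
```

This gives a useful sanity check: because exp is monotonic, every reported interval must contain its point estimate, and the point estimate is the geometric mean of the two bounds.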

Geographic inequities were present, though they were less pronounced than for initial submissions. A full submission with a last author from Africa was associated with higher odds of being accepted than a submission with a North American last author (β = 1.48, 95% CI = […]

[…] incorporating additional variables for reviewer characteristics and author-reviewer homogeny. We included the corresponding author-editor geographic distance (for initial submissions) and the last author-reviewers geographic distance (for full submissions). The former is the geographic distance between the centroids of the countries of affiliation of the corresponding author and the Senior Editor; the latter is the sum of the geographic distances between the centroid of the last author's country and the centroids of the countries of each of the peer reviewers. These variables are intended to model the degree of homogeny between the author and the editor or reviewers. All distances were calculated in thousands of kilometers; for example, the geographic distance between the United States and Denmark is 7.53 thousand kilometers. For both initial and full submissions, we included a dummy variable indicating whether the distance was zero. For full submissions, we considered three approaches to model the extent to which gender equity differed based on the gender composition of the reviewer team. One approach used interaction terms between last author gender and the composition of the reviewer team (S1 Text); another compared parameter estimates for last author gender between separate models (S2 […]

[…] variable, in terms of direction and magnitude, as in Fig 6. We did not consider the relationship between the gender of the corresponding author and the gender of the Senior Editor in order to protect the identity of the small number of Senior Editors.
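The distance predictors described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes great-circle (haversine) distances on a sphere of radius 6,371 km, and the `CENTROIDS` table holds placeholder coordinates rather than real country centroids. The text specifies only centroid-to-centroid distances in thousands of kilometers, summed over reviewers for full submissions, plus a zero-distance indicator.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Placeholder centroids (latitude, longitude in degrees); NOT real country data.
CENTROIDS = {
    "A": (0.0, 0.0),
    "B": (0.0, 90.0),
    "C": (45.0, 10.0),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def gatekeeper_distance(author_country, gatekeeper_countries, centroids=CENTROIDS):
    """Return (distance in thousands of km, zero-distance indicator).

    For initial submissions, pass the Senior Editor's country alone; for full
    submissions, pass every reviewer's country and the distances are summed.
    The indicator is 1 only when the total is exactly zero, i.e. every
    gatekeeper shares the author's country."""
    lat_a, lon_a = centroids[author_country]
    total_km = sum(haversine_km(lat_a, lon_a, *centroids[c]) for c in gatekeeper_countries)
    thousands_km = total_km / 1000.0
    return thousands_km, int(thousands_km == 0.0)


if __name__ == "__main__":
    print(gatekeeper_distance("A", ["B"]))       # one editor in another country
    print(gatekeeper_distance("A", ["A", "A"]))  # all reviewers share the author's country
```

One consequence of summing over reviewers is that the zero indicator for full submissions flags only panels in which every reviewer shares the author's country, which is a stricter condition than having at least one same-country reviewer.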
Controlling for other variables, zero distance between the corresponding author and Senior Editor (indicating that they were from the same country) was associated with 1.56 times higher odds of being encouraged (95% CI = [1.45, 1.67], p ≤ 0.0001). Controlling for whether the corresponding author-editor distance was zero, every additional 1,000 km of corresponding author-editor geographic distance was associated with a 1.02 times increase in the odds of being encouraged (95% CI = [1.01, 1.034], p = 0.0003). We note that these geographic effects may be confounded by the low number of Senior Editors and by the fact that the majority of Senior Editors were affiliated with institutions in North America and Europe.

For full submissions, we first modelled peer review outcomes as in Fig 6, but with additional variables for the gender composition of the reviewer team and the last author-reviewers geographic distance (see Fig 7.A). The effects of the control variables (submission year, submission type, author institutional prestige, and first author gender) were similar to those in Fig 6. A full submission with a male last author had 1.14 times higher odds of being accepted than a submission with a female last author (95% CI = [1.020, 1.256], p = 0.032), even after controlling for reviewer-team gender composition. Compared to mixed-gender reviewer teams, submissions reviewed by […]

To make use of all data in a single regression, we modelled global interactions between last author gender and reviewer-team composition by combining them into a single categorical variable containing all six combinations of factor levels (Fig 7.B). Full submissions with a male last author that were reviewed by an all-male team were associated with 1.22 times higher odds of being accepted than full submissions with a female last author that were reviewed by an all-male team (95% CI = [1.044, 1.40], p = 0.027). No significant differences were observed for other combinations of author gender and reviewer gender composition.
The absolute difference in parameter estimates between male and female authors among mixed-gender teams (0.084) was less than half that among all-male reviewer teams (0.198), suggesting greater equity among submissions reviewed by mixed-gender teams than by all-male teams. Taken together, these findings and those discussed in S1 Text and S2 Text suggest that gender inequity in peer review outcomes was in part mitigated by mixed-gender reviewer […]

A: Estimates of the logistic regression model of initial submissions, using whether the submission was encouraged as the response variable and available information on the editor and corresponding author as predictors. B: Estimates of the logistic regression model of full submissions, using whether the submission was accepted as the response variable and available information about the first and last authors and gatekeeper composition as predictors. For both initial and full submissions, control variables included author's institutional prestige, the year of submission, and the submission type. For full submissions, there is also a control variable for the gender of the first author. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". For the combination variable of last author gender and reviewer team composition, we held "last author female-all rev. male" as the reference level. Blue points indicate positive effects, whereas red points indicate negative effects. The numbers above each point label the size of the effect, as an odds ratio. Bars extending from either side of each point indicate 95% confidence intervals. Asterisks above each label indicate significance level: "***" = p < 0.001; "**" = p < 0.01; "*" = p < 0.05; otherwise, p > 0.05. A table detailing these effects is included in S9 Table.
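The combined-variable approach of Fig 7.B can be sketched as below. This is a minimal illustration, not the authors' code: the level labels other than the stated reference ("last author female, all rev. male") are assumed, and `HYPOTHETICAL_COEFS` holds made-up log-odds values chosen only so that the within-team male-female gaps equal the 0.198 and 0.084 reported above. As a consistency check, exp(0.198) is approximately 1.22, the odds ratio reported for male last authors reviewed by all-male teams.

```python
from itertools import product

AUTHOR_GENDERS = ("female", "male")
# "all rev. male" follows the text; the other two labels are assumed.
TEAM_COMPOSITIONS = ("all rev. male", "mixed", "all rev. female")
REFERENCE = ("female", "all rev. male")


def level_name(gender, team):
    return f"last author {gender}, {team}"


# All six combinations of last-author gender and reviewer-team composition.
LEVELS = [level_name(g, t) for g, t in product(AUTHOR_GENDERS, TEAM_COMPOSITIONS)]


def dummy_encode(gender, team):
    """Treatment coding: one 0/1 column per non-reference level.
    A submission in the reference level gets an all-zero row."""
    ref = level_name(*REFERENCE)
    this = level_name(gender, team)
    return {lvl: int(lvl == this) for lvl in LEVELS if lvl != ref}


def within_team_gap(coefs, team):
    """Male minus female last-author coefficient (log-odds) within one team type.
    Levels absent from `coefs` (including the reference) count as 0."""
    male = coefs.get(level_name("male", team), 0.0)
    female = coefs.get(level_name("female", team), 0.0)
    return male - female


# Hypothetical log-odds coefficients relative to the reference level.
HYPOTHETICAL_COEFS = {
    "last author male, all rev. male": 0.198,
    "last author female, mixed": 0.100,
    "last author male, mixed": 0.184,
}

if __name__ == "__main__":
    for team in ("all rev. male", "mixed"):
        print(team, round(within_team_gap(HYPOTHETICAL_COEFS, team), 3))
```

Because every coefficient is measured against the same reference level, male-female gaps within any team type are simple differences of coefficients, which is what makes the 0.084 vs. 0.198 comparison possible from a single regression.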
[…] were also observed by country of affiliation. In particular, submissions from developed countries with high scientific capacities tended to have higher success rates than others. These inequities in peer review outcomes could be attributed, at least in part, to a favorable interaction between gatekeeper and author demographics under conditions of gender or national homogeny; we describe this favoring as homophily, a preference based on shared characteristics. Gatekeepers were more likely to recommend a manuscript for acceptance if they shared demographic characteristics with the authors, demonstrating homophily. In particular, manuscripts with male senior (last or corresponding) authors were more likely to be accepted if reviewed by an all-male reviewer panel rather than a mixed-gender panel. Similarly, manuscripts were more likely to be accepted if at least one of the reviewers was from the same country as the last or corresponding author, though there were exceptions on a per-country basis (such as France and Canada). We followed our univariate analysis with a regression analysis and observed evidence that these inequities persisted even when controlling for potentially confounding variables. The differential outcomes on the basis of author-reviewer homogeny suggest that peer review at eLife is influenced by some form of bias, be it implicit bias [3,17], geographic or linguistic bias [26,65,66], or cognitive particularism [40]. Specifically, a homophilic interaction suggests that peer review outcomes may sometimes be based on more than the intrinsic quality of a manuscript; the composition of the review team is also related to outcomes in peer review.

The opportunity for homophilous interactions is determined by the demographics of the gatekeeper pool. We found that the demographics of the gatekeepers differed significantly from those of the authors, even for last authors, who tend to be more senior [59–62]. Women were underrepresented among eLife gatekeepers, and gatekeepers tended to come from a small number of highly developed countries. The underrepresentation of women at eLife mirrors global trends: women comprise a minority of total authorships, yet constitute an even smaller proportion of gatekeepers across many domains [14,67–74]. Similarly, gatekeepers at eLife were less internationally diverse than the journal's authors, reflecting the general underrepresentation of the "global south" in leadership positions of international journals [75].

The demographics of the reviewer pool made certain authors more likely than others to benefit from homophily in the review process. U.S. authors were much more likely than not (see S5 Table) to be reviewed by a panel with at least one reviewer from their own country; the opposite was true for authors from other countries. Fewer opportunities for such homophily may disadvantage scientists from smaller and less scientifically prolific countries. For gender, male lead authors had a nearly 50 percent chance of being reviewed by a homophilous (all-male) rather than a mixed-gender team. In contrast, because all-female reviewer panels were so rare (accounting for only 81 of 6,509 full submission decisions), female authors were highly unlikely to benefit from homophily in the review process.
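A back-of-the-envelope model shows why pool composition matters. Purely as an assumption for illustration (eLife editors choose reviewers; reviewers are not drawn at random), suppose each of n reviewers is drawn independently and a fraction s of the pool shares the author's country or gender. Then P(at least one match) = 1 - (1 - s)^n and P(all match) = s^n. The pool shares and the panel size of three below are hypothetical; only the 81 of 6,509 figure comes from the text.

```python
def p_at_least_one_match(share, n_reviewers):
    """P(at least one of n independent draws shares the author's trait)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return 1.0 - (1.0 - share) ** n_reviewers


def p_all_match(share, n_reviewers):
    """P(all n independent draws share the trait), e.g. an all-female panel."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return share ** n_reviewers


# Observed share of full-submission decisions made by all-female panels (from the text).
OBSERVED_ALL_FEMALE_SHARE = 81 / 6509

if __name__ == "__main__":
    # Hypothetical pool shares with three reviewers per panel.
    print("large-country author:", p_at_least_one_match(0.5, 3))
    print("small-country author:", p_at_least_one_match(0.01, 3))
    print("all-female panel if 20% of pool is female:", p_all_match(0.2, 3))
    print("observed all-female share:", round(OBSERVED_ALL_FEMALE_SHARE, 4))
```

Under these assumptions, an author whose country supplies half of the pool almost always gets a same-country reviewer, while an author whose country supplies 1% rarely does; likewise, a panel in which every member is from a minority group becomes rare very quickly as panel size grows.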

Increasing eLife's editorial representation of women and of scientists from a more diverse set of nations may lead to a more diverse pool of peer reviewers and Reviewing Editors, and to a more equitable peer review process. Editors often invite peer reviewers from their own professional networks, networks that likely reflect the characteristics of the editor [76–78]; this can lead editors, who tend to be men [14,67–74] and from scientifically advanced countries [75], to invite peer reviewers who are cognitively or demographically similar to themselves [44,79,80], inadvertently excluding certain groups from the gatekeeping process. Accordingly, we found that male Reviewing Editors at eLife were less likely to create mixed-gender teams of […]

The size of the disparities we observed in peer review outcomes may seem modest; however, these small disparities can accumulate through each stage of the review process (initial submission, full submission, revisions) and potentially affect the outcomes of many submissions. For example, the overall acceptance rate (the rate at which initial submissions were eventually […] [81–84].
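The compounding argument is simple arithmetic: if each stage's pass rate is conditional on passing the previous stage, the overall acceptance rate is the product of the stage rates, so relative gaps multiply. The rates below are hypothetical, chosen only to illustrate this, and are not eLife's figures.

```python
import math


def overall_rate(stage_rates):
    """Overall success rate as the product of conditional stage-wise pass rates
    (e.g. encourage rate at initial submission x acceptance rate of full submissions)."""
    for r in stage_rates:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each stage rate must be between 0 and 1")
    return math.prod(stage_rates)


def relative_gap(rates_a, rates_b):
    """Ratio of the overall rate of group B to that of group A."""
    return overall_rate(rates_b) / overall_rate(rates_a)


if __name__ == "__main__":
    group_a = (0.30, 0.50)    # hypothetical: encourage rate, full-submission acceptance rate
    group_b = (0.285, 0.475)  # hypothetical: 5% lower (relative) at each stage
    print(overall_rate(group_a), overall_rate(group_b))
    print(relative_gap(group_a, group_b))  # compare with 0.95 at a single stage
```

A 5% relative shortfall at each of two stages becomes roughly a 10% shortfall overall (0.95 × 0.95 ≈ 0.90), and adding a revision stage would widen it further.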

Our finding that the gender of the last author was associated with a significant difference in the rate at which full submissions were accepted at eLife stands in contrast with a number of previous studies of journal peer review; these studies found no significant difference in outcomes of papers submitted by male and female authors [85–87], or in reviewers' evaluations based on the author's apparent gender [88]. This discrepancy may be explained in part by eLife's unique context and policies, or by the relative selectivity of eLife compared to venues where previous studies found gender equity. In addition, our results point to a key feature of study design that may account for some of the differences across studies: the consideration of multiple authorship roles. This is especially important for the biosciences, in which authorship order is strongly associated with contribution [61,62,89]. Whereas our study examined the gender of the first, last, and corresponding authors, most previous studies have focused on the gender of the first author (e.g., [2,90]) or of the corresponding author (e.g., [22,91]). Like previous studies, we observed no strong relationship between first author gender and review outcomes at eLife. Only when considering lead authorship roles (last authorship and, to a lesser extent, corresponding authorship) did we observe such an effect. Our results may be better compared with studies of grant peer review, where leadership roles are more explicitly defined; many such studies have identified significant disparities in outcomes favoring men [18,92–95], although many others have found no evidence of gender disparity [21,23,24,96–98].
Given that science has grown increasingly collaborative and that the average number of authors per paper has grown [99,100], future studies of disparities would benefit from explicitly accounting for multiple authorship roles and for signaling among various leadership positions on the byline [59,101].

The relationship we found between the gender and nationality of the gatekeepers and peer review outcomes also stands in contrast to the findings of a number of previous studies. One study [102] identified a homophilous relationship between female reviewers and female authors. However, most previous analyses found only procedural differences based on the gender of the gatekeeper [22,87,88,103] and identified no difference in outcomes based on the interaction of author and gatekeeper gender in journal submissions [87,104,105] or grant review [23]. Studies of gatekeeper nationality have found no difference in peer review outcomes based on the nationality of the reviewer [104,106], though there is little research on the correspondence between author and reviewer nationality. One past study examined the interaction between U.S. and non-U.S. authors and gatekeepers, but found an effect opposite to what we observed: U.S. reviewers tended to rate submissions of U.S. authors more harshly than those of non-U.S. authors [43]. Our results also contrast with the study most similar to our own, which found no evidence of bias related to gender, and only modest evidence of bias related to geographic region [2]. These discrepancies may result from our analysis of multiple author roles. Alternatively, they may result from the unique nature of eLife's consultative peer review; the direct communication among peer reviewers, which traditional peer review lacks, may render the social characteristics of reviewers more influential.

There are limitations of our methodology that must be considered. First, we have no objective measure of the intrinsic quality of manuscripts. Therefore, it is not clear which review condition (homophilic or non-homophilic) more closely approximates the ideal of merit-based peer review outcomes. Second, measuring the relationship between reviewer and author demographics and peer review outcomes cannot readily detect biases that are shared by all reviewers and gatekeepers (e.g., if all reviewers, regardless of gender, favored manuscripts from male authors); hence, our approach could underestimate the influence of bias. Third, our analysis is observational, so we cannot establish causal relationships between success rates and author or gatekeeper demographics; there remain potential confounding factors that we were unable to control for in the present analysis, such as the gender distribution of submissions by country (see S5 Fig). Along these lines, the reliance on statistical tests with arbitrary significance thresholds may provide misleading results (see [107]) or obscure statistically weak but potentially important relationships. Fourth, our gender-assignment algorithm is only a proxy for author gender and varies in reliability by continent.

Further studies will be required to determine the extent to which the effects we observed generalize to other peer review contexts. Specific policies at eLife, such as its consultative peer review process, may contribute to the effects we observed. Other characteristics of eLife may also be relevant, including its level of prestige [13] and its disciplinary specialization in the biological sciences, whose culture may differ from that of other scientific and academic disciplines. It is necessary to establish the extent to which the findings here are particularistic or generalizable; doing so may also be useful in identifying explanatory models. Future work is necessary to confirm and expand upon our findings, assess the extent to which they can be generalized, establish causal relationships, and mitigate the effects of these methodological limitations. To aid in this effort, we have made as much as possible of the data and analysis publicly available at: […]

[…] publishing [47,50,108–111], which can affect the quantity and perceived quality of submitted manuscripts. However, these structural factors do not readily account for the observed relationship between gatekeeper and author demographics associated with peer review outcomes at eLife; rather, biases related to the personal characteristics of the authors and gatekeepers are likely to play some role in peer review outcomes.

Our results suggest that it is not only the form of peer review that matters, but also the composition of reviewers. Homophilous preferences in evaluation are a potential mechanism underpinning the Matthew Effect [1] in academia. This effect entrenches privileged groups while potentially limiting diversity, which could hinder scientific production, since diversity may lead to better working groups [112] and promote high-quality science [113,114]. Increasing gender and international representation among scientific gatekeepers may improve fairness and equity in peer review outcomes and accelerate scientific progress. However, this must be carefully balanced to avoid overburdening scholars from minority groups with disproportionate service obligations.

Although some journals and publishers, such as eLife and Frontiers Media, have begun providing peer review data to researchers (see [44,115]), data on equity in peer review outcomes are currently available for only a small fraction of journals and funders. While many journals collect these data internally, they are not usually standardized or shared publicly. One group, PEERE, authored a protocol for open sharing of peer review data [116,117], though this protocol is recent, and the extent to which it will be adopted remains uncertain. To provide better benchmarks and to incentivize better practices, journals should make analyses of author and reviewer demographics publicly available. Such data would include, but not be limited to, characteristics such as gender, race, sexual orientation, seniority, and institution and country of affiliation. Privacy and confidentiality concerns will likely limit the full availability of the data, but analyses that are sensitive to the vulnerabilities of smaller populations should be conducted and made available as benchmarking data. As these data become increasingly available, systematic reviews can be useful in identifying general patterns across disciplines and countries.

Some high-profile journals, including Nature [118] and eNeuro [12], have experimented with implementing double-blind peer review as a potential solution to inequities in publishing, though in some cases with low uptake [119]. Our findings of homophilic effects may suggest that single-blind review is not the optimal form of peer review; however, our study did not directly test whether homophily persists under double-blind review. If homophily disappears under double-blind review, this would reinforce the interpretation of bias; if it persists, it would suggest that other underlying attributes of the manuscript contribute to homophilic effects. Double-blind peer review is viewed positively by the scientific community [120,121], and some studies have found evidence that double-blind review mitigates inequities that favor famous authors, elite institutions [85,122,123], and those from high-income and English-speaking nations [28].

There may be a tension, however, in attempting to further blind peer review while other aspects of the scientific system become more open. More than 20 percent of eLife papers that go out for review, for example, are already available as preprints. Several statements required for the responsible conduct of research (e.g., conflicts of interest, funding statements, and other ethical declarations) complicate the possibility of truly blind review. Other options involve making peer review more open; one recent study showed evidence that more open peer review did not compromise the integrity or logistics of the process, so long as reviewers could maintain anonymity [124].

Other alternatives to traditional peer review have also been proposed, including study pre-registration, consultative peer review, and hybrid processes (e.g., [58,125–129]), as well as alternative forms of dissemination, such as preprint servers (e.g., arXiv, bioRxiv). Currently, there is little empirical evidence to determine whether these formats constitute less biased or more equitable alternatives [3]. In addition, journals are analyzing the demographics of their published authorship and editorial staff in order to identify key problem areas, focus initiatives, and track progress in achieving diversity goals [14,79,86]. More work should be done to study and understand the issues facing peer review and scientific gatekeeping in all its forms, and to promote fair, efficient, and meritocratic scientific cultures and practices. Editorial bodies should craft policies and implement practices to mitigate disparities in peer review; they should also continue to be innovative and reflective about their practices to ensure that papers are accepted on scientific merit rather than on particularistic characteristics of the authors.

Supporting information

S1 Text. Modelling homogeny using main effects with an interaction term. Using logistic regression, we attempted to model the degree to which gender equity in peer review outcomes differed based on the composition of the reviewer team, in order to verify the inequity observed in Fig 5. Fig 7.A demonstrates that last author gender inequity persisted even when controlling for the gender composition of the reviewer team, but does not address the degree to which this inequity manifests in submissions reviewed by all-male vs. mixed-gender reviewer teams. Given that there is no established method of addressing this question, we considered several approaches. The first approach modelled the interaction between last author gender and the gender composition of the reviewer team (see S9 Table); however, this approach proved difficult to interpret. Adding the interaction term appeared to suppress the main effects of last author gender and reviewer team composition observed in Fig 7.A, though the corresponding ANOVA table demonstrated that these effects still account for a significant amount of deviance (see S10 Table). None of the interaction terms was significant, which conflicts with Fig 5; however, we note the vastly different sample sizes between reviewer-team gender composition groups: half of the manuscripts were reviewed by mixed-gender teams and slightly less than half by all-male teams. All-female teams comprised less than two percent of all reviews. The low sample sizes in some interaction groups therefore further complicate interpretation.
Moreover, this approach modelled individual-level interactions between author gender and reviewer-team composition on a per-submission basis, not differences in group-level estimates of inequity.
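The relationship between an interaction term and group-level gaps can be made concrete in a simplified setting. Assuming a model with only last-author gender and reviewer-team composition (no covariates, unlike the models in S9 Table), the model is saturated, so its fitted log-odds equal the empirical log-odds in each cell, and the interaction coefficient is exactly the difference between within-team male-female gaps. The counts in `CELLS` are hypothetical. The sketch also computes the usual large-sample standard error of a cell's log-odds, sqrt(1/a + 1/b) for a accepted and b rejected submissions, which shows why sparse cells such as all-female teams yield imprecise estimates.

```python
import math


def log_odds(accepted, total):
    """Empirical log-odds of acceptance within one cell."""
    rejected = total - accepted
    if accepted <= 0 or rejected <= 0:
        raise ValueError("log-odds undefined when a cell has 0% or 100% acceptance")
    return math.log(accepted / rejected)


def log_odds_se(accepted, total):
    """Large-sample standard error of a cell's log-odds: sqrt(1/a + 1/b)."""
    return math.sqrt(1.0 / accepted + 1.0 / (total - accepted))


def gender_gap(cells, team):
    """Male minus female last-author log-odds within one reviewer-team type."""
    return log_odds(*cells[("male", team)]) - log_odds(*cells[("female", team)])


def interaction_estimate(cells, team, reference_team="all-male"):
    """In the saturated two-factor model (treatment coding, female and all-male as
    reference levels), the author-gender x team interaction coefficient equals
    this difference of within-team gaps."""
    return gender_gap(cells, team) - gender_gap(cells, reference_team)


# Hypothetical (accepted, total) counts per (last-author gender, team type) cell.
CELLS = {
    ("female", "all-male"): (30, 100),
    ("male", "all-male"): (40, 100),
    ("female", "mixed"): (35, 100),
    ("male", "mixed"): (38, 100),
    ("female", "all-female"): (2, 6),
    ("male", "all-female"): (3, 7),
}

if __name__ == "__main__":
    for team in ("all-male", "mixed", "all-female"):
        print(team, "gap:", round(gender_gap(CELLS, team), 3),
              "interaction:", round(interaction_estimate(CELLS, team), 3))
    # Sparse cells give much wider uncertainty than well-populated ones.
    print(log_odds_se(2, 6), log_odds_se(30, 100))
```

Once covariates are added, as in the actual models, the correspondence is no longer exact, but the intuition about sparse cells carries over.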

S2 Text. Modelling homogeny using separately trained models. S9 Table shows the results of two logistic regression models constructed as in Fig 7.A, each fitted using only full submissions reviewed by either all-male or mixed-gender reviewer teams. In the all-male model, a male last author was associated with 1.23 times higher odds of acceptance (95% CI = [1.05, 1.41], p = 0.027) compared to a female last author; in contrast, no significant difference was observed between male and female last authors in the model containing only mixed-gender reviewer teams. This approach, which addresses our research question more appropriately than the interaction model, affirms the findings of Fig 5. However, interpretation of S9 Table is complicated by possible population differences between groups, as well as by the different amounts of data used to fit each model: n = 3,090 for the all-male reviewer model and n = 3,280 for the mixed-gender reviewer model.

Submission and success rates by gender of corresponding, first, and last author.
Proportion of initial submissions, encourage rate, overall acceptance rate, and acceptance rate of full submissions by the gender of the corresponding author, first author, and last author.

[…] Table. Model coefficients of initial submissions: author characteristics. Odds ratio, associated confidence intervals, and model diagnostics for the logistic regression model using the encouragement of initial submissions as the response variable. Predictor variables include control variables of the submission year and type, and variables capturing author characteristics. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources".

[…] variables of the submission year and type, and variables capturing author characteristics. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". Notes: * p < .05; ** p < .01; *** p < .001

S9 Table. Model coefficients of regressions on full submissions. Odds ratio, associated confidence intervals, and model diagnostics for the logistic regression models using the acceptance of full submissions as the response variable. Control variables include the submission year, submission type, last author institutional prestige, and the gender of the first author. Other predictor variables include the gender of the last author, continent of affiliation of the last author, gender composition of the reviewers, the last author-reviewers geographic distance, and variables attempting to capture gender equity by reviewer-team composition group.

Five models are presented: (1) Main Effects, which includes only the main effects and is fitted on all full submissions without any additional manipulation or variables; (2) With interaction, which models the main effects as well as an interaction term between last author gender and the gender composition of the reviewer team (an ANOVA table for this model is provided in S10 Table); (3) and (4), which were trained separately on only submissions reviewed by all-male reviewer teams and only submissions reviewed by mixed-gender reviewer teams, respectively; and (5), which models gender equity between reviewer-composition groups using a new variable with all combinations of author and reviewer gender. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". For the combination variable of last author gender and reviewer team composition, we held "last author female, all rev. male" as the reference level. Missing cells indicate that the corresponding variable was not part of that model.

Notes: * p < .05; ** p < .01; *** p < .001

S10 Table. ANOVA table for […]

S11 Table. Model coefficients of full submissions: author characteristics and Reviewing-Editor-only homogeny. Odds ratio, associated confidence intervals, and model diagnostics for the logistic regression model using the acceptance of full submissions as the response variable. Predictor variables include control variables of the submission year and type, and variables capturing author characteristics and homogeny between the author and the Reviewing Editor only. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". This regression models gender equity between reviewer-composition groups using a new variable containing all combinations of last author gender and reviewer team composition; for this new categorical variable, we used "last author female, all male reviewers" as the reference level.

Notes: * p < .05; ** p < .01; *** p < .001

bioRxiv preprint doi: https://doi.org/10.1101/400515; this version posted April 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.