Author-Reviewer Homophily in Peer Review

The fairness of scholarly peer review has been challenged by evidence of disparities in publication outcomes based on author demographic characteristics. To assess this, we conducted an exploratory analysis of peer review outcomes of 23,876 initial submissions and 7,192 full submissions that were submitted to the biosciences journal eLife between 2012 and 2017. Women and authors from nations outside of North America and Europe were underrepresented both as gatekeepers (editors and peer reviewers) and authors. We found evidence of a homophilic relationship between the demographics of the gatekeepers and authors and the outcome of peer review; that is, there were higher rates of acceptance in the case of gender and country homophily. The acceptance rate for manuscripts with male last authors was seven percent, or 3.5 percentage points, greater than for female last authors (95% CI = [0.5, 6.4]). This gender inequity was greatest, at nine percent or about 4.8 percentage points (95% CI = [0.3, 9.1]), when the team of reviewers was all male; the difference was smaller and not statistically significant for mixed-gender reviewer teams. Homogeny between countries of the gatekeeper and the corresponding author was also associated with higher acceptance rates for many countries. These associations persisted in logistic regression models that controlled for many potentially confounding factors, affirming the observations from the univariate analysis.

Introduction

[…] A portion of initial submissions (n = 147) and full submissions (n = 602) remained in various stages of processing and deliberation (without final decisions). On average, full submissions that were ultimately accepted underwent 1.23 revisions and, within our dataset, 3,426 full submissions were eventually accepted for publication. A breakdown of the number of revisions requested before a final decision was made, by gender and country of affiliation of the last author, is provided in S1 Fig. A portion of initial and full submissions (n = 619) appealed their decision, causing some movement from decisions of "Reject" to decisions of "Accept" or "Revise"; counts of appeals by the gender of author and gatekeepers are shown in S2 Fig.

The review process at eLife is highly selective, and became more selective over time. […]

In addition to authorship data, we obtained information about the gatekeepers involved in the processing of each submission. We defined gatekeepers as any Senior Editor or Reviewing Editor at eLife or invited peer reviewer involved in the review of at least one initial or full submission between 2012 and mid-September 2017. Gatekeepers at eLife often served in multiple roles; for example, acting as both a Reviewing Editor and peer reviewer on a given manuscript, or serving as a Senior Editor on one manuscript but as an invited peer reviewer on another. In our sample, the Reviewing Editor was listed as a peer reviewer for 58.9 percent of full submissions. For initial submissions, we had data on only the corresponding author of the manuscript and the Senior Editor tasked with making the decision. For full submissions we had data on the corresponding author, first author, last author, Senior Editor, Reviewing Editor, and members of the team of invited peer reviewers. Data for each individual included their stated name, institutional affiliation, and country of affiliation.
A small number of submissions were removed, such as those that had a first but no last author (reflecting a compromised data record, since even a single-authored manuscript should duplicate the author across all roles) and those that did not have a valid submission type. Country names were manually disambiguated (for example, by normalizing names such as "USA" to "United States" and "Viet Nam" to "Vietnam"). To simplify continent-level comparisons, we also excluded one submission for which the corresponding author listed their affiliation as Antarctica.

Full submissions included 6,669 distinct gatekeepers, 5,694 distinct corresponding authors, 6,691 distinct first authors, and 5,581 distinct last authors. Authors were also likely to appear on multiple manuscripts and may have held a different authorship role in each: whereas our data included 17,966 distinct combinations of author name and role, this number comprised only 12,059 distinct authors. For 26.5 percent of full submissions the corresponding author was also the first author, whereas for 71.2 percent of submissions the corresponding author was the last author. We did not have access to the full authorship list that included middle authors. Note that in the biosciences, the last author is typically the most senior researcher involved [59] and responsible for more conceptual work, whereas the first author is typically less senior and performs more of the scientific labor (such as lab work and analysis) to produce the study [60][61][62].

Starting from the left, an initial submission is first given an initial decision of encourage or reject and, if encouraged, continues through the first full review and subsequent rounds of revision. "Encouraged", "Accepted", "Rejected" and "Revision needed" represent the decisions made by eLife editors and reviewers at each submission stage. A portion of manuscripts remained in various stages of processing at the time of data collection; these manuscripts were labeled as "Decision pending". The status of manuscripts after the second revision is the final status that we consider in the present data. The dashed line delineates full submissions from rejected initial submissions.

bioRxiv preprint doi: https://doi.org/10.1101/400515; this version posted August 4, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Gender assignment

Gender variables for authors and gatekeepers were coded using an updated version of the algorithm developed in [6]. This algorithm used a combination of the first name and country of affiliation to assign each author's gender on the basis of several universal and country-specific name-gender lists (e.g., United States Census). This list of names was complemented with an algorithm that searched Wikipedia for pronouns associated with names.

We validated this new list by applying it to a dataset of names with known gender. We used data collected from RateMyProfessor.com, a website containing anonymous student-submitted ratings and comments for professors, lecturers, and teachers at universities in the United States, United Kingdom, and Canada. We limited the dataset to individuals with at least five comments, and counted the total number of gendered pronouns that appeared in their text; if the total of one gendered-pronoun type was at least the square of the other, then we assigned the gender of the majority pronoun to the individual. To compare with pronoun-based assignment, we assigned gender using the previously detailed first-name based algorithm. In total, there were 384,127 profiles on RateMyProfessor.com that had at least five comments and for whom pronouns indicated a gender. Our first name-based algorithm assigned a gender of male or female to 91.26 percent of these profiles. The raw match rate between these two assignments was 88.6 percent. Of those that were assigned a gender, our first name-based assignment matched the pronoun assignment in 97.1 percent of cases, and 90.3 percent of distinct first names. While RateMyProfessor.com and the authors submitting to eLife represent different populations (RateMyProfessor.com being biased towards teachers in the United States, United Kingdom, and Canada), the results of this validation provide some credibility to the first-name based gender assignment used here.

We also attempted to manually identify gender for all Senior Editors, Reviewing Editors, invited peer reviewers, and last authors for whom our algorithm did not assign a gender. We used Google to search for their name and institutional affiliation, and inspected the resulting photos and text in order to make a subjective judgment as to whether they were presenting as male or female.

Through the combination of manual efforts and our first-name based gender-assignment algorithm, we assigned a gender of male or female to 95.5 percent (n = 35,511) of the 37,198 name/role combinations that appeared in our dataset. Of these combinations, 26.7 percent (n = 9,910) were assigned a gender of female and 68.8 percent (n = 25,601) were assigned a gender of male, while no gender could be assigned for the remaining 4.5 percent (n = 1,687).
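The pronoun-based validation rule described above (at least five comments, and one pronoun count at least the square of the other) can be sketched as follows. This is a minimal illustration, not the authors' code: the pronoun lists, the tokenizer, and the function name `pronoun_gender` are assumptions made for the example.

```python
import re

# Assumed pronoun lists; the source does not specify which pronouns were counted.
MALE_PRONOUNS = {"he", "him", "his"}
FEMALE_PRONOUNS = {"she", "her", "hers"}

def pronoun_gender(comments, min_comments=5):
    """Return 'male', 'female', or None when the rule gives no assignment."""
    if len(comments) < min_comments:
        return None
    tokens = re.findall(r"[a-z']+", " ".join(comments).lower())
    male = sum(token in MALE_PRONOUNS for token in tokens)
    female = sum(token in FEMALE_PRONOUNS for token in tokens)
    if male == female:
        return None  # no majority pronoun
    if male > female:
        majority, minority, label = male, female, "male"
    else:
        majority, minority, label = female, male, "female"
    # Square rule: the majority count must be at least the minority count squared.
    return label if majority >= minority ** 2 else None
```

Under this rule a profile with nine "he" and three "she" mentions is assigned male (9 ≥ 3²), whereas five against three is left unassigned (5 < 9).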
This gender distribution roughly matches the gender distribution observed globally across scientific publications [6]. A breakdown of these gender demographics by role can be found in S1 Table and S2 Table.

Gender composition of reviewers

To assess the relationship between author-gatekeeper gender homogeny and review outcomes, we analyzed the gender composition of the gatekeepers and authors of full submissions. Each manuscript was assigned a reviewer composition category of all-male, all-female, mixed, or uncertain. Reviewer teams labeled all-male and all-female were teams for which we could identify a gender for every member, and for which all genders were identified as either male or female, respectively. Teams labeled as mixed were those teams for which we could identify a gender for at least two members, and which had at least one male and at least one female peer reviewer. Teams labeled as uncertain were those teams for which we could not assign a gender to every member and which were not mixed. A full submission was typically reviewed by two to three peer reviewers, which may or may not explicitly include the Reviewing Editor. However, the Reviewing Editor was always involved in the review process of a manuscript, and so we always considered the Reviewing Editor as a member of the reviewing team. Of 7,192 full submissions, a final decision of accept or reject was given for 6,590 during the dates analyzed; of these, 47.7 percent (n = 3,144) were reviewed by all-male teams, 1.4 percent (n = 93) by all-female teams, and 50.8 percent (n = 3,347) by mixed-gender teams; the remaining six manuscripts had reviewer teams classified as uncertain and were excluded from further analysis.
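The four reviewer-team categories defined above can be expressed as a small classification function. The sketch below assumes each team is given as a list of gender labels (`"male"`, `"female"`, or `None` when no gender was assigned) that already includes the Reviewing Editor; the function name and data layout are hypothetical.

```python
def team_composition(genders):
    """Classify a reviewer team as all-male, all-female, mixed, or uncertain.

    `genders` holds one entry per team member (Reviewing Editor included):
    'male', 'female', or None when no gender could be assigned.
    """
    known = [g for g in genders if g is not None]
    # Mixed: at least one male and one female among members with a known gender.
    if "male" in known and "female" in known:
        return "mixed"
    # All-male / all-female require a known gender for every member.
    if genders and len(known) == len(genders):
        return "all-male" if known[0] == "male" else "all-female"
    return "uncertain"
```

Note that a team with one unassigned member can still be labeled mixed, but never all-male or all-female, which mirrors the definitions in the text.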

Institutional names for each author were added manually by eLife authors and were thus highly idiosyncratic. Many institutions appeared with multiple name variants (e.g., "UCLA", "University of California, Los Angeles", and "UC at Los Angeles"). In total, there were nearly […] names, including converting characters to lower case, removing stop words, removing punctuation, and reducing common words to abbreviated alternatives (e.g., "university" to "univ"). We used fuzzy-string matching with the Jaro-Winkler distance measure [63] to match institutional affiliations from eLife to institutional rankings in the 2016 Times Higher Education World Rankings. A match was established for 15,641 corresponding authors of initial submissions (around 66 percent). The match rate for last authors was higher: 5,118 (79 percent) were matched.
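The name-cleaning steps listed above (lower-casing, stop-word and punctuation removal, abbreviation) followed by fuzzy matching can be sketched as below. Two caveats: the stop-word and abbreviation lists are assumptions, and the similarity score uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, not the Jaro-Winkler measure used in the study. The function names and the 0.5 threshold are likewise illustrative.

```python
import difflib
import string

STOP_WORDS = {"of", "the", "at", "and", "for"}   # assumed stop-word list
ABBREVIATIONS = {"university": "univ"}           # example given in the text

def normalize(name):
    """Lower-case, strip punctuation, drop stop words, abbreviate common words."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def best_match(name, ranked_names, threshold=0.5):
    """Return the ranked institution most similar to `name`, or None."""
    query = normalize(name)
    best_score, best_name = 0.0, None
    for candidate in ranked_names:
        # Stand-in similarity; the study used Jaro-Winkler distance instead.
        score = difflib.SequenceMatcher(None, query, normalize(candidate)).ratio()
        if score > best_score:
            best_score, best_name = score, candidate
    return best_name if best_score >= threshold else None
```

Normalizing both sides before scoring is what lets variants such as "UC at Los Angeles" land on the same ranked entry as the full institution name.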

Institutions were classed into two levels of prestige: "top" institutions were those within the top 50 universities as ranked by the global Times Higher Education rankings. Institutions that ranked below the top 50, were otherwise unranked, or were not matched to a Times Higher Education ranking were labeled as "non-top". One limitation of the Times Higher Education ranking as a proxy for institutional prestige is that these rankings cover only universities, excluding many prestigious research institutes. […]

Latitude and longitude of country centroids were taken from Harvard WorldMap [64]; country names in the eLife and Harvard WorldMap datasets were manually disambiguated and then mapped to the country of affiliation listed for each author from eLife (for example, "Czech Republic" from the eLife data was mapped to "Czech Rep." in the Harvard WorldMap data). For each initial submission, we calculated the geographic distance between the centroids of the countries of the corresponding author and Senior Editor; we call this the corresponding author-editor geographic distance. For each full submission, we calculated the sum of the geographic distances between the centroid of the last author's country and the country of each of the reviewers; we call this the last author-reviewers geographic distance. All distances were calculated in thousands of kilometers.
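The two distance variables defined above can be sketched from centroid coordinates. The source does not state how distance between centroids was computed, so the great-circle (haversine) formula and the mean Earth radius of 6,371 km used here are assumptions, as are the function names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def distance_thousand_km(a, b):
    """Great-circle distance between two (lat, lon) centroids, in thousands of km."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h)) / 1000.0

def last_author_reviewers_distance(author_centroid, reviewer_centroids):
    """Sum of distances from the last author's country to each reviewer's country."""
    return sum(distance_thousand_km(author_centroid, r) for r in reviewer_centroids)
```

Because the full-submission variable is a sum over reviewers, it is zero only when every reviewer shares the last author's country centroid.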

We conducted a series of χ² tests of equal proportion as well as several logistic regression models in order to assess the likelihood that an initial submission is encouraged and that a full submission is accepted, as a function of author and gatekeeper characteristics. We supply p-values and confidence intervals as a tool for interpretation; we generally maintain the convention of 0.05 as the threshold for statistical significance, though we also report and interpret values just outside of this range. When visualizing proportions, 95% confidence intervals are calculated using the definition $p \pm 1.96\sqrt{p(1-p)/n}$, where p is the proportion and n is the number of observations in the group. When conducting χ² tests comparing groups based on gender, we excluded submissions for which no gender could be identified. When conducting tests for gender and country homogeny, we report 95% confidence intervals of the difference in proportion; we do not report confidence intervals for tests involving more than two groups. Odds ratios and associated 95% confidence intervals are reported for logistic regression models. Data processing, statistical testing, and visualization were performed using R version 3.4.2 and RStudio version 1.1.383.
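As an illustration of the interval definition above, the following sketch computes the normal-approximation interval for a single proportion. The second function, an unpooled interval for a difference of two proportions, is an assumption: the text reports such intervals but does not state the formula used. The analysis itself was run in R; Python is used here only for illustration.

```python
import math

def proportion_ci(p, n, z=1.96):
    """95% interval p ± z * sqrt(p(1 - p) / n) for a single proportion."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def difference_ci(p1, n1, p2, n2, z=1.96):
    """Assumed unpooled interval for the difference p1 - p2 of two proportions."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    difference = p1 - p2
    return difference - z * standard_error, difference + z * standard_error
```

For example, a proportion of 0.5 observed in a group of 100 gives an interval of roughly 0.402 to 0.598.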
Having conducted an exploratory analysis of gender and country inequities in peer review with this univariate approach, we built a series of logistic regression models to investigate whether these differences could be explained by other factors. In each model, we used the submission's outcome as the response variable, whether that be encouragement (for initial submissions) or acceptance (for full submissions). For both initial and full submissions, we added control variables for the year of submission (measured from 0 to 5, representing 2012 to 2017, accordingly), the type of the submission (Research Article, Short Report, or Tools and Resources), and the institutional prestige of the author (top vs non-top). For full submissions, we also controlled for the gender of the first author.

Mirroring the univariate analysis, we constructed two sets of models. The first set of models investigates the extent of peer review inequities based on author characteristics. We considered predictor variables for the gender and continent of affiliation of the corresponding author (for initial submissions), and the last author (for full submissions). For the second set of models, we investigated whether these inequities differed based on gender or country homogeny between the author and the reviewer or editor. In addition to variables from the first model, we considered several approaches to capture the effect of gender homogeny between the author and reviewers on peer review inequity (see below). We also included variables for the corresponding author-editor geographic distance (for initial submissions) and the last author-reviewers geographic distance (for full submissions), and a dummy variable indicating whether this distance was zero; these variables serve as proxies for the degree of country homogeny between the author and the editor or reviewers. There were a small number of Senior Editors in our data; in order to protect their identity we did not include their gender or specific continent of affiliation in any models, but we maintained a variable for corresponding author-editor geographic distance.

Several approaches were considered for modeling the relationship between equity in peer review and the composition of the reviewer team using logistic regression. Approaches such as modelling equity using simple interaction terms or with a two-model approach were also considered but were ultimately excluded due to methodological and interpretive constraints (see S1 Text and S2 Text for more discussion of these models and their results). A third approach modelled equity across groups as a categorical variable consisting of all six combinations of last author gender (male, female) and reviewer team composition (all-male, all-female, mixed). This approach provides a more interpretable means of testing the extent to which gender equity in success rates was related to the interaction between author and reviewer team demographics, and was the focus of our analysis.
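The six-level categorical variable described above can be illustrated by enumerating the combinations and dummy-coding them against a reference level. The reference level used here (female last author, all-male team) follows the figure caption for the interaction model; the column naming and helper function are hypothetical, and the actual models were fitted in R.

```python
from itertools import product

AUTHOR_GENDERS = ("female", "male")
TEAM_COMPOSITIONS = ("all-male", "all-female", "mixed")
REFERENCE = ("female", "all-male")  # reference level named in the figure caption

# All six combinations of last author gender and reviewer team composition.
LEVELS = list(product(AUTHOR_GENDERS, TEAM_COMPOSITIONS))

def dummy_code(author_gender, team):
    """Return indicator columns for the five non-reference levels."""
    if (author_gender, team) not in LEVELS:
        raise ValueError("unknown combination")
    return {
        f"author_{g}__team_{t}": int((g, t) == (author_gender, team))
        for g, t in LEVELS
        if (g, t) != REFERENCE
    }
```

With this coding, each fitted coefficient compares one author-team combination directly with female-authored submissions reviewed by all-male teams, which is what makes the estimates easier to read than separate interaction terms.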

Gatekeeper representation

We first analyzed whether the gender and countries of affiliation of the population of gatekeepers at eLife were similar to those of the authors of initial and full submissions. The population of gatekeepers was composed primarily of invited peer reviewers, as there were far fewer Senior and Reviewing Editors. A gender and country breakdown by gatekeeper type is provided in S2 Table and S3 Table. […] authors (includes authors of initial submissions), 33.9 percent (n = 2,272) of first authors, and 24.0 percent (n = 1,341) of last authors. For initial submissions, we observed a strong difference between the gender composition of gatekeepers and corresponding authors, χ²(df = 1, n = 17,119) = 453.9, p ≤ 0.00001. The same held for full submissions, with a strong difference for first authorship, χ²(df = 1, n = 6,153) = 844.4, p ≤ 0.0001; corresponding authorship, χ²(df = 1, n = 6,647) = 330.04, p ≤ 0.0001; and last authorship, χ²(df = 1, n = 5,292) = 17.7, p ≤ 0.00003. Thus, the gender composition of gatekeepers at eLife was male-skewed in comparison to the authorship profile.

The population of gatekeepers at eLife was heavily dominated by those from North America, who constituted 59.9 percent (n = 3,992) of the total. Gatekeepers from Europe were the next most represented, constituting 32.4 percent (n = 2,162), followed by Asia with 5.7 percent (n = 378). Individuals from South America, Africa, and Oceania each made up less than two percent of the population of gatekeepers. As with gender, we observed differences between the country composition of gatekeepers and that of the authors. Gatekeepers from North America were over-represented whereas gatekeepers from Asia and Europe were under-represented for all authorship roles. For initial submissions, there was a significant difference in the distribution of corresponding authors compared to gatekeepers, χ²(df = 5, n = 18,195) = 6738.5, p ≤ 0.00001. The same held for full submissions, with a significant difference for first authors, χ²(df = 5, n = 6,674) = 473.3, p ≤ 0.00001; corresponding authors, χ²(df = 5, n = 6,669) = 330.04, p ≤ 0.00001; and last authors, χ²(df = 5, n = 5,595) = 417.2, p ≤ 0.0001. The international representation of gatekeepers was most similar to first and last authorship (full submissions), and least similar to corresponding authorship (initial submissions) due to country-level differences in acceptance rates (see Fig 4). We also note that the geographic composition of submissions to eLife has changed over time, […]

Gender disparity was only apparent in the senior authorship roles. […]

[…] percentages exclude those for whom no gender was identified. B: proportion of people with countries of affiliation within each of six continents in the population of distinct gatekeepers, and for the population of distinct corresponding, first, and last authors. Black dashed lines overlaid on authorship graphs indicate the proportion of gatekeepers within that gendered or continental category. Values used in this graph can be found in S1 Table and S4 Table. Code to reproduce this figure can be found on the linked Github repository at the path figures/gatekeeper representation/gatekeeper representation.rmd.
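The χ² tests of equal proportion with one degree of freedom reported in this section compare two groups on a binary attribute. A minimal sketch of such a test on a 2×2 table of counts is given below; it omits any continuity correction, which is an assumption (the source does not say whether one was applied), and the function names and example counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]], no correction."""
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic

def p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))
```

As a plausibility check, `p_value_df1(5.55)` evaluates to roughly 0.018, the same order as the p-value reported with the statistic of 5.55 for last-author gender.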

[…] however, a significant gender inequity in full submission outcomes for last authors, as also observed for corresponding authors: the acceptance rate of full submissions was 3.5 percentage points higher for male as compared to female last authors (53.5 versus 50.0 percent), χ²(df = 1, n = 6,505) = 5.55, 95% CI = [0.5, 6.4], p = 0.018. […]

Each stage of review contributed to the disparity of country representation between initial, full, and accepted submissions, with manuscripts from the United States, United Kingdom, and Germany more often encouraged as initial submissions and accepted as full submissions. […]

A: Estimates of a logistic regression model of initial submissions using whether the submission was encouraged as the response variable, and available information on the corresponding author as predictors. B: Estimates of a logistic regression model of full submissions using whether the submission was accepted as the response variable, and available information about the first and last authors as predictors. For both initial and full submissions, control variables included author's institutional prestige, the year of submission, and the submission type. For full submissions, there is also a control variable for the gender of the first author. For continent of affiliation, we held "North America" as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". Blue, red, and grey points indicate positive, negative, and non-significant effects, respectively. The numbers above each point label the size of the effect, as an odds ratio. Bars extending from either side of each point indicate 95% confidence intervals. Asterisks next to each label indicate significance level: "***" = p ≤ 0.001; "**" = p ≤ 0.01; "*" = p ≤ 0.05; otherwise, p > 0.05. Some confidence intervals are cropped; tables detailing full effects are included in S6 Table and S7 Table. Code to reproduce this figure can be found on the linked Github repository at the path figures/regression analysis/regression analysis simple.rmd.

[…] For mixed-gender reviewer teams, the disparity in author success rates by gender was smaller and not statistically significant. All-female reviewer teams were too rare to draw firm conclusions (only 81 of 6,509 processed full submissions), but in the few cases of all-female reviewer teams, there was a higher acceptance rate for female last, corresponding, and first authors that did not reach statistical significance. There was no significant relationship between first authorship gender and acceptance rates, regardless of the gender composition of the reviewer team. In sum, greater parity in outcomes was observed when gatekeeper teams contained both men and women. Notably, the acceptance rate for female authors was not lower for all-male reviewer teams compared with mixed reviewer teams; rather, the gender disparity arose from a higher acceptance rate for submissions from male authors when they were reviewed by a team of all-male reviewers. We refer to this favoring by reviewers of authors sharing their same gender as homophily.

Homophily was also evident in the relationship between peer review outcomes and the presence of country homogeny between the last author and reviewer. We defined last author-reviewer country homogeny as a condition for which at least one member of the reviewer team (Reviewing Editor and peer reviewers) listed the same country of affiliation as the last author. We only considered the country of affiliation of the last author, since it was the same as that of the first and corresponding author for 98.4 and 94.9 percent of full submissions, respectively. Outside of the United States, the presence of country homogeny during review was […] respectively. More extensive details on the rate of author/reviewer homogeny for each country can be found in S5 Table.

Last author-reviewer country homogeny tended to result in the favoring of submissions from authors of the same country as the reviewer. We first pooled together all authors from all countries (n = 6,508 for which there was a full submission and a final decision), and found that the presence of homogeny during review was associated with a 10.0 percentage point higher […] However, most cases of homogeny occurred for authors from the United States, so this result could potentially reflect the higher acceptance rate for these authors (see Fig 4), rather than homophily overall. Therefore we repeated the test, excluding all full submissions with last authors from the United States, and we again found a significant, though statistically less […]

To further assess the contribution of author-reviewer homogeny to inequity in peer review outcomes, we extended the logistic regression approach shown in Fig 5. For full submissions, we compared two logistic regression models, one that considered author-reviewer geographic homogeny but only main effects of reviewer team gender composition (Fig 7.A) and one that included terms to model the effects of author-reviewer geographic and gender homogeny (Fig 7.B). To model the extent to which gender equity differed based on the gender composition of the reviewer team, we modelled interactions using a variable combining factor levels for last author gender and reviewer team composition (Fig 7.B).
To model the degree of country homogeny between the author and the reviewers, we included in the model the last author-reviewers geographic distance, defined as the sum of the geographic distances between the centroid of the last author's country and the country of each of the peer reviewers. All distances were calculated in thousands of kilometers; for example, the geographic distance between the United States and Denmark is 7.53 thousand kilometers. We included a dummy variable indicating whether the distance was zero. A similar analysis was performed to assess the effect of author-editor homogeny on the outcomes of initial submissions (S8 Table); this excludes any analysis of homophily between the author and Senior Editor in order to protect the identity of the small number of Senior Editors. […] submission. A last author-reviewers geographic distance of zero (indicating that all reviewers were from the same country as the last author) was not associated with a strong effect beyond that predicted by distance.

Finally, we modelled interactions between last author gender and reviewer-team composition by combining them into a single categorical variable containing all six combinations of factor levels (Fig 7.B). Full submissions with a male last author that were reviewed by a team of all-male reviewers were associated with 1.22 times higher odds of being accepted than full submissions with a female last author that were reviewed by an all-male team (95% CI = [1.044, 1.40], p = 0.027). No significant differences were observed for other combinations of author gender and reviewer gender composition.
The absolute difference in parameter estimates 543 between male and female authors among mixed-gender teams (0.084) was less than half that of 544 all-male reviewer teams (0.198), suggesting greater equity among submissions reviewed by 545 mixed-gender teams than by all-male teams. Taken together, these findings suggest that gender 546 inequity in peer review outcomes tended to be smaller for mixed-gender reviewer teams, even 547 controlling for many potentially confounding factors. These results provide evidence affirming 548 observations from the univariate analysis in  Estimates of logistic regression models of full submissions using whether the submission was accepted as the response variable. A: Includes as predictors the demographic and geographic characteristics of last author and gatekeepers, along with an indicator or the level of last authorreviewer geographic homogeny. B: Includes all predictors as in A but with the last author gender and reviewer gender composition combined into a single, six-level categorical variable. Control variables for both panels include author's institutional prestige, year of submission, submission type, and gender of the first author. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". For the combination variable of last author gender and reviewer team gender composition, we held "last author female-all rev. male" as the reference level. Blue and red points indicate positive and negative effects, respectively. The numbers above each point are the size of the effect as an odds ratio. Bars extending from either side of each point indicate 95% confidence intervals. Asterisks above each label indicate significance level: "***" = p < 0.001; "**" = p < 0.01; "*" = p < 0.05; otherwise, p > 0.05. 
Some confidence intervals are cropped; a table detailing full effects is included in S9 Table. Code to reproduce this figure can be found on the linked Github repository at the path figures/regression analysis/regression analysis interaction.rmd.
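As an illustration of the predictors described above, the following Python sketch builds the distance features (distance in thousands of kilometers plus a zero-distance dummy), enumerates the six-level combined variable, and converts a parameter estimate to an odds ratio. It is a minimal sketch under stated assumptions, not the code used for the analysis: the function names, the level labels other than the reference level quoted in the caption, the haversine formula, the Earth-radius constant, and the choice to sum distances over reviewers are all assumptions made for illustration. It also assumes that the reported parameter estimates are on the log-odds scale, which is consistent with exp(0.198) being approximately 1.22.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def distance_thousand_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in thousands of km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a))) / 1000.0


def distance_features(author_centroid, reviewer_centroids):
    """Return (total distance, zero-distance dummy) for one submission.

    Summing over reviewers is an assumption of this sketch; centroids are
    (latitude, longitude) pairs supplied by the caller.
    """
    total = sum(distance_thousand_km(*author_centroid, *c)
                for c in reviewer_centroids)
    return total, int(total == 0.0)


def combined_level(author_gender, team_composition):
    """Label for the single combined categorical variable (hypothetical naming)."""
    return f"last author {author_gender}-{team_composition}"


AUTHOR_GENDERS = ("female", "male")
TEAM_COMPOSITIONS = ("all rev. male", "mixed", "all rev. female")  # assumed levels
LEVELS = [combined_level(g, t) for g in AUTHOR_GENDERS for t in TEAM_COMPOSITIONS]
REFERENCE_LEVEL = "last author female-all rev. male"  # as named in the caption


def odds_ratio(log_odds_coefficient):
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(log_odds_coefficient)


print(len(LEVELS), REFERENCE_LEVEL in LEVELS)
print(round(odds_ratio(0.198), 2))  # 1.22
```

Combining the two factors into one variable makes every coefficient a direct contrast against the reference level, so the all-male-team gender gap can be read from a single estimate rather than from a sum of main and interaction effects.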

We identified inequities in peer review outcomes at eLife, based on the gender and country of affiliation of the senior (last and corresponding) authors. Acceptance rates were higher for male than female last authors. In addition, submissions from developed countries with high scientific capacities tended to have higher success rates than others. These inequities in peer review outcomes could be attributed, at least in part, to a favorable interaction between gatekeeper and author demographics under the conditions of gender or country homogeny; we describe this favoring as homophily, a preference based on shared characteristics. Gatekeepers were more likely to recommend a manuscript for acceptance if they shared demographic characteristics with the authors, demonstrating homophily. In particular, manuscripts with male (last or corresponding) authors had a significantly higher chance of acceptance than those with female (last or corresponding) authors when reviewed by an all-male review team. Similarly, manuscripts tended to be accepted more often when at least one of the reviewers was from the same country as the corresponding author (for initial submissions) or the last author (for full submissions), though there may be exceptions on a per-country basis (such as France and Canada). We followed our univariate analysis with a regression analysis, and observed evidence that these inequities persisted even when controlling for potentially confounding variables. The differential outcomes on the basis of author-reviewer homogeny are consistent with the notion that peer review at eLife is influenced by some form of bias, be it implicit bias [3,17], geographic or linguistic bias [26,65,66], or cognitive particularism [40].
Specifically, homophilic interaction suggests that peer review outcomes may sometimes be associated with factors other than the intrinsic quality of a manuscript, such as the composition of the review team.

The opportunity for homophilous interactions is determined by the demographics of the gatekeeper pool, and the demographics of the gatekeepers differed significantly from those of the authors, even for last authors, who tend to be more senior [59-62]. The underrepresentation of women at eLife mirrors global trends: women comprise a minority of total authorships, yet constitute an even smaller proportion of gatekeepers across many domains [14,67-74]. Similarly, gatekeepers at eLife were less geographically diverse than their authorship, reflecting the general underrepresentation of the "global south" in leadership positions of international journals [75].

The demographics of the reviewer pool made certain authors more likely to benefit from homophily in the review process than others. Male lead authors had a nearly 50 percent chance of being reviewed by a homophilous (all-male), rather than a mixed-gender, team. In contrast, because all-female reviewer panels were so rare (accounting for only 81 of 6,509 full submission decisions), female authors were highly unlikely to benefit from homophily in the review process. Similarly, U.S. authors were much more likely than not (see S5 Table) to be reviewed by a panel with at least one reviewer from their country. However, the opposite was true for authors from other countries. Fewer opportunities for such homophily may result in a disadvantage for scientists from smaller and less scientifically prolific countries.
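The notion of "opportunity for homophily" used above can be made concrete with a short Python sketch. It is illustrative only: the function name, the label strings, and the toy records are hypothetical and are not eLife data. The only figure taken from the text is the count of all-female panels (81 of 6,509 full submission decisions).

```python
from collections import Counter


def homophily_opportunity(records):
    """Share of submissions, per author gender, reviewed by a team composed
    entirely of reviewers of that gender.

    `records` is an iterable of (author_gender, team_composition) pairs;
    the label strings are hypothetical.
    """
    homophilous_team = {"male": "all_male", "female": "all_female"}
    totals, matched = Counter(), Counter()
    for author_gender, team in records:
        totals[author_gender] += 1
        if homophilous_team.get(author_gender) == team:
            matched[author_gender] += 1
    return {gender: matched[gender] / totals[gender] for gender in totals}


# Toy records invented purely for illustration (not eLife data).
toy_records = [
    ("male", "all_male"), ("male", "mixed"),
    ("female", "mixed"), ("female", "all_male"),
    ("female", "mixed"), ("female", "all_female"),
]
rates = homophily_opportunity(toy_records)

# Share of all-female panels reported in the text: 81 of 6,509 decisions.
all_female_share = 81 / 6509
print(rates)
print(f"{all_female_share:.2%}")
```

With the reported counts, all-female panels account for a little over one percent of full submission decisions, which is why female last authors had almost no opportunity for gender homophily.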

Increasing representation of women and scientists from a more diverse set of nations among eLife's editors may lead to a more diverse pool of peer reviewers and reviewing editors and a more equitable peer review process. Editors often invite peer reviewers from their own professional networks, networks that likely reflect the characteristics of the editor [76-78]; this can lead editors, who tend to be men [14,67-74] and from scientifically advanced countries [75], to invite peer reviewers who are demographically similar to themselves [44,79,80], inadvertently excluding certain groups from the gatekeeping process. Accordingly, we found that male Reviewing Editors at eLife were less likely to create mixed-gender teams of gatekeepers than female Reviewing Editors (see S8 Fig). We observed a similar effect based on the country of affiliation of the Reviewing Editor and invited peer reviewers (see S9 Fig). Moreover, in S11 Table we conducted a regression analysis considering only the gender of the Reviewing Editor, rather than the composition of the reviewer team; we found similar homophilous relationships as in Fig 7, suggesting the importance of the reviewing editor to the peer review process at eLife.

The size of disparities we observed in peer review outcomes may seem modest; however, these small disparities accumulate through each stage of the review process (initial submission, full submission, revisions). These cumulative effects yield an overall acceptance rate (the rate at

Our finding that the gender of the last authors was associated with a significant difference in the rate at which full submissions were accepted at eLife stands in contrast with a number of previous studies of journal peer review that reported no significant difference in outcomes of papers submitted by male and female authors [85-87], or differences in reviewers' evaluations based on the author's apparent gender [88]. This discrepancy may be explained in part by eLife's unique context, policies, or the relative selectivity of eLife compared to journals where previous studies found gender equity. In addition, our results point to a key feature of study design that may account for some of the differences across studies: the consideration of multiple authorship roles. This is especially important for the life sciences, for which authorship order is strongly associated with contribution [61,62,89]. Whereas our study examined the gender of the first, last, and corresponding authors, most previous studies have focused on the gender of the first author (e.g., [2,90]) or of the corresponding author (e.g., [22,91]). Consistent with previous studies, we observed no strong relationship between first author gender and review outcomes at eLife. Only when considering lead authorship roles (last authorship and, to a lesser extent, corresponding authorship) did we observe such an effect. Our results may be better compared with studies of grant peer review, where leadership roles are more explicitly defined, and many studies have identified significant disparities in outcomes favoring men [18,92-95], although many other studies have found no evidence of gender disparity [21,23,24,96-98].

Given that science has grown increasingly collaborative and that average authorship per paper has expanded [99,100], future studies of disparities would benefit from explicitly accounting for multiple authorship roles and signaling among various leadership positions on the byline [59,101].

The relationship we found between the gender and country of affiliation of gatekeepers and peer review outcomes also stands in contrast to the findings from a number of previous studies. Studies of gatekeeper country of affiliation have found no difference in peer review outcomes based on the country of affiliation of the reviewer [104,106], though there is little research on the correspondence between author and reviewer gender. One study identified a homophilous relationship between female reviewers and female authors [102]. However, most previous analyses found only procedural differences based on the gender of the gatekeeper [22,87,88,103] and identified no difference in outcomes based on the interaction of author and gatekeeper gender in journal submissions [87,104,105] or grant review [23]. One past study examined the interaction between U.S. and non-U.S. authors and gatekeepers, but found an effect opposite to what we observed, such that U.S. reviewers tended to rate submissions of U.S. authors more harshly than those of non-U.S. authors [43]. Our results also contrast with the study most similar to our own, which found no evidence of bias related to gender, and only modest evidence of bias related to geographic region [2]. These discrepancies may result from our analysis of multiple author roles rather than considering only the characteristics of the first author. Alternatively, they may result from the unique nature of eLife's consultative peer review; the direct communication between peer reviewers, compared to traditional peer review, may render the social characteristics of reviewers more influential.

There are limitations of our methodology that must be considered. First, we have no objective measure of the intrinsic quality of manuscripts. Therefore, it is not clear which review condition (homophilic or non-homophilic) more closely approximates the ideal of merit-based peer review outcomes. Second, measuring the relationship between reviewer and author demographics on peer review outcomes cannot readily detect biases that are shared by all reviewers/gatekeepers (e.g., if all reviewers, regardless of gender, favored manuscripts from male authors); hence, our approach could underestimate the influence of bias. Third, our analysis is observational, so we cannot establish causal relationships between success rates and authors or gatekeeper

Along these lines, the reliance on statistical tests with arbitrary significance thresholds may provide misleading results (see [107]), or obfuscate statistically weak but potentially important relationships. Fourth, our gender-assignment algorithm is only a proxy for author gender and varies in reliability by continent.

Further studies will be required to determine the extent to which the effects we observed generalize to other peer review contexts. Specific policies at eLife, such as their consultative peer review process, may contribute to the effects we observed. Other characteristics of eLife may also be relevant, including its level of prestige [13], and its disciplinary specialization in the biological sciences, whose culture may differ from other scientific and academic disciplines. It is necessary to determine the extent to which the findings here are particularistic or generalizable; it may also be useful in identifying explanatory models. Future work is necessary to confirm and expand upon our findings, assess the extent to which they can be generalized, establish causal relationships, and mitigate the effects of these methodological limitations. To aid in this effort, we have made as much as possible of the data and analysis publicly available at:

publishing [47,50,108-111], which can affect the quantity and perceived quality of submitted manuscripts. However, these structural factors do not readily account for the observed effect of gatekeeper-author demographic homogeny associated with peer review outcomes at eLife; rather, relationships between the personal characteristics of the authors and gatekeepers are likely to play some role in peer review outcomes.

Our results suggest that it is not only the form of peer review that matters, but also the composition of reviewers. Homophilous preferences in evaluation are a potential mechanism underpinning the Matthew Effect [1] in academia. This effect entrenches privileged groups while potentially limiting diversity, which could hinder scientific advances, since diversity may lead to better working groups [112] and promote high-quality science [113,114]. Increasing gender and international representation among scientific gatekeepers may improve fairness and equity in peer review outcomes and accelerate scientific progress. However, this must be carefully balanced to avoid overburdening scholars from minority groups with disproportionate service obligations.

Although some journals and publishers, such as eLife and Frontiers Media, have begun providing peer review data to researchers (see [44,115]), data on equity in peer review outcomes are currently available only for a small fraction of journals and funders. While many journals collect these data internally, they are not usually standardized or shared publicly. One group, PEERE, authored a protocol for open sharing of peer review data [116,117], though this protocol is recent, and the extent to which it will be adopted remains uncertain. Watchdog

available. These data include, but would not be limited to, characteristics such as gender, race, sexual orientation, seniority, and institution and country of affiliation. It is likely that privacy concerns and issues relating to confidentiality will limit the full availability of the data, but analyses that are sensitive to the vulnerabilities of smaller populations should be conducted and made available as benchmarking data. As these data become increasingly available, systematic reviews can be useful in identifying general patterns across disciplines and countries.

Some high-profile journals have experimented with implementing double-blind peer review as a potential solution to inequities in publishing, including Nature [118] and eNeuro [12], though in some cases with low uptake [119]. Our findings of homophilic effects may suggest that single-blind review is not the optimal form of peer review; however, our study did not directly test whether homophily persists in the case of double-blind review. If homophily is removed in double-blind review, it would reinforce the interpretation of bias; if it is maintained, it would suggest other underlying attributes of the manuscript that may be contributing to homophilic effects. Double-blind peer review is viewed positively by the scientific community [120,121], and some studies have found evidence that double-blind review mitigates inequities that favor famous authors, elite institutions [85,122,123], and those from high-income and English-speaking nations [28].

There may be a tension, however, in attempting to further double-blind peer review while other aspects of the scientific system become more open. More than 20 percent of eLife papers that go out for review, for example, are already available as preprints, which complicates the

more open peer review did not compromise the integrity or logistics of the process, so long as reviewers could maintain anonymity [124].

Other alternatives to traditional peer review have also been proposed, including study pre-registration, consultative peer review, and hybrid processes (e.g., [58,125-129]), as well as alternative forms of dissemination, such as preprint servers (e.g., arXiv, bioRxiv), which have in recent years grown increasingly popular [130]. Currently, there is little empirical evidence to determine whether these formats constitute more equitable alternatives [3]. In addition, some journals are analyzing the demographics of their published authorship and editorial staff in order to identify key problem areas, focus initiatives, and track progress in achieving diversity goals [14,79,86]. More work should be done to study and understand the issues facing peer review and scientific gatekeeping in all its forms and to promote fair, efficient, and meritocratic scientific cultures and practices. Editorial bodies should craft policies and implement practices to mitigate disparities in peer review; they should also continue to be innovative and reflective about their practices to ensure that papers are accepted on scientific merit, rather than particularistic characteristics of the authors.

Supporting information

S1 Text. Modelling homogeny using main effects with interaction term. We used logistic regression to model the degree to which gender equity in peer review outcomes differed based on the composition of the reviewer team in order to verify the inequity observed in Fig 6. Fig 7.A demonstrates that last author gender inequity persisted even when controlling for the gender composition of the reviewer team, but did not address the degree to which this equity manifests in submissions reviewed by all-male vs. mixed-gender reviewer teams. Given that there is no established method of addressing this question, we considered several approaches. The first approach modelled the interaction between last author gender and the gender-composition of the reviewer team (see S9 Table, column 2); however, this approach proved difficult to interpret: adding the interaction term appeared to suppress the main effects of last author gender and reviewer team composition observed in Fig 7.A, though the corresponding ANOVA table demonstrated these effects to still account for a significant amount of deviance (see S11 Table). There was no significant interaction term, conflicting with

[Figure: average number of revisions by country of last author (France, United States, Germany, United Kingdom, Switzerland, Canada, Japan, China).]

Proportion of initial submissions, encourage rate, overall acceptance rate, and acceptance rate of full submissions by the gender of the corresponding author, first author, and last author.

Gender data is unavailable for first and last authors of initial submissions that were never submitted as full submissions; therefore, these cells remain blank. Authors whose gender is

Table. Model coefficients of initial submissions-author characteristics: Odds ratio, associated confidence intervals, and model diagnostics for logistic regression model using the encouragement of initial submission as a response variable. Predictor variables include control variables of the submission year and type, and variables capturing author characteristics. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". This table contains the same values as visualized in Fig 5.A.

variables of the submission year and type, and variables capturing author characteristics. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". This table contains the same values as visualized in Fig 5.

year, submission type, last author institutional prestige, and the gender of the first author. Other predictor variables include the gender of the last author, continent of affiliation of the last author, gender-composition of the reviewers, the last author-reviewers geographic distance, and variables attempting to capture the gender equity by reviewer-team composition group.

Five models are presented: the first (1, Main Effects) shows only the main effects for the model including all full submissions without any additional manipulation or variables; the second model (2, Standard Interaction) models the main effects as well as an interaction term between last author gender and the gender composition of the reviewer team (an ANOVA table for this model has been provided in S11 Table); the next two models were separately trained on only submissions reviewed by all-male reviewer teams (3) and only submissions reviewed by mixed-gender reviewer teams (4), respectively; the last model (5) models gender equity between reviewer-composition groups using a new variable with all combinations of author and reviewer gender (see Fig 7). Columns (1) and (5) contain the same values as Fig 7.A and Fig 7.B, respectively. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". For the combination variable of last author gender and reviewer team composition, we held "last author female, all rev. male" as the reference level. Missing cells indicate that the corresponding variable was not part of that model.

Notes: * P < .05; ** P < .01; *** P < .001

S10

S11 Table. ANOVA table for

S11 Table. Model coefficients of full submissions-author characteristics and reviewing-editor only homogeny: Odds ratio, associated confidence intervals, and model diagnostics for logistic regression model using the encouragement of full submission as a response variable. Predictor variables include control variables of the submission year and type, and variables capturing author characteristics and homogeny between the author and reviewing editor only. For continent of affiliation, "North America" was used as the reference level. For submission type, "RA" (research article) was used as the reference level; the submission type "SR" means "Short Reports", and "TR" means "Tools and Resources". This regression models gender equity between reviewer composition groups using a new variable containing all combinations of last author gender and reviewer team composition; for this new categorical variable, we used "last author female -female rev. editor" as the reference level.

Notes: * P < .05; ** P < .01; *** P < .001
