PhD and postdoc training outcomes at EMBL: changing career paths for life scientists in Europe

The life sciences are training growing numbers of PhDs and postdocs, who increasingly engage in collaborative research. The impact of these changes on the careers of researchers is, however, unclear. Here, we report an analysis of the training outcomes for 2284 researchers who completed a PhD or postdoc at the European Molecular Biology Laboratory (EMBL) between 1997 and 2020. This is the first such study published from a European institute and first time-resolved analysis globally. The most common career outcomes were in academic research, service and teaching (1263 alumni, 55%), including 636 principal investigators (PIs). A broad spectrum of other career paths was also represented, including in industry research (332, 15%) and science-related professions (349, 15%). Our analysis indicates that, although there is increased competition for PI roles, life scientists continue to enter and excel in careers that drive research and innovation.

The career landscape in the life sciences has changed dramatically in the last decades for PhD students and postdocs (collectively referred to in this study as early career researchers (ECRs)). For example, in the US, the percentage of life science doctoral graduates holding a tenure track position 3 to 5 years after graduation more than halved from 18.1% to 8.1% between 1997 and 2015 (1). The principal cause appears to be that the numbers of PhD students and postdocs trained have increased relative to the number of faculty positions (2)(3)(4)(5)(6). This trend may stem from economic constraints US institutions faced due to the plateauing of the NIH budget in 2003 after a period of budget increases (7,8). In Europe, the number of doctorates awarded has also increased faster than the number of academic staff, with doctorates increasing by 44% from 82 416 in 2004 to 118 375 in 2012 (9), compared to an increase of only about 9% for full-time equivalent academic staff in the tertiary education sector (10). Economic factors include the stagnation in growth in private, philanthropic and public funding of research and university teaching following the 2008 financial crash (11,12). Additionally, in both the US and Europe, an increased proportion of funding is now allocated by project-based competitions rather than institutional funding (13)(14)(15).

Nevertheless, despite low percentages obtaining these positions, majorities of graduate students starting PhDs have reported aiming for an academic career, with research-intensive principal investigator (PI) roles particularly sought after (16)(17)(18)(19)(20). Several surveys have reported high levels of concern from ECRs around career progression, with some linking this to high rates of mental health problems within academia (21)(22)(23)(24)(25)(26)(27)(28).
Clarity around outcomes is important to help address these concerns and ensure that ECRs can make informed decisions about their career development. Data on training outcomes are also essential for policymakers planning funding and training programs that meet the needs of science, scientists and society as a whole.

In response to calls for increased transparency (29,30) some North American institutions have published analysis of PhD or postdoc career outcomes (31)(32)(33). Via the Coalition for Next Generation Life Science, many additional institutions have also committed to releasing career destinations data and other training outcomes, for example, completion rate and time to degree for PhD students (34). Similarly, in Europe, many universities collect limited information on the initial or current sector of employment (35); a small number of reports have also been released by funders or national agencies, some with analysis of trends in broad sectors of employment (36)(37)(38). These reports represent an important step towards transparency, but no detailed time-resolved analysis has yet been published from a university or research institute.

In addition to a changing job market, other changes in research culture may also impact the training of ECRs. For example, there has been an increase in international, interdisciplinary and team-based science (39), with more authors per paper and greater amounts of data included in individual publications (40,41).

To investigate the effect of these trends on life science training outcomes in Europe, we performed a time [...]

This study is, to our knowledge, the first analysis of training outcomes data published by a European research institute. Furthermore, our time-resolved data analysis enables a better understanding of how career outcomes have been changing with time, and how this may extrapolate to the future job market for current ECRs.
Finally, our study also links career outcomes to other data, including publication record, to better understand training outcomes.

Results

Data collection on career and publication outcomes was initially carried out in 2017 and updated in 2021.

Using manual Google searches, we located publicly available information identifying the current role of 89% (2035/2284) of the ECRs in the study (Figure 1A).

EMBL alumni contribute to research & innovation in academic and non-academic roles

The majority of alumni (55%) were found to be working in academic research, scientific services and teaching in 2021 (Table 1). An appreciable number of alumni were found employed in industry research (15%) and in science-related roles (15%) not directly carrying out or supervising research, but still closely linked to research, innovation or science education, for example science communication and patent law. Only 4% were found in professions not closely linked to science. A more detailed summary of the types of role, aligned to a published taxonomy (42), is provided in Supplementary Tables 2-3 [...] Figure 1C). Additionally, only 71.6% of those who entered a research position in industry remained in this type of role long-term; 9.9% returned to academia and 18.6% transitioned to non-research or non-science-related professions.

Gender differences in career outcomes

Many studies have reported that female ECRs are less likely to remain in an academic career (43,44). Consistent with these previous studies, we observed differences in PI career outcome by gender for both PhD and postdoc alumni. Female alumni were found less frequently in PI roles and more frequently in non-research science-related positions than their male colleagues (Figure 1B).

Time-resolved analysis of career outcomes confirms a changing career landscape

For further analysis, alumni were split into three 8-year cohorts. Some demographic differences were seen between cohorts including, at the PhD level, a higher percentage of female ECRs for more recent cohorts.
More recent cohorts were also larger ([...]

When comparing career outcomes of those with detailed career paths, differences in career outcomes by cohort at equivalent time-points after EMBL are evident (Supplementary Figure 2). For example, the proportion of alumni in PI roles 5 years after EMBL was markedly lower for PhD students and [...]

[Figure legend fragment] "Detailed career path" indicates alumni for whom we were able to reconstruct a career path from EMBL to their current role with a maximum of two 1-year CV gaps; "current position only" indicates that we were not able to construct a complete career path but were able to identify their current position; and "position unknown" indicates that we were not able to identify their current role.

The percentage of EMBL alumni becoming PIs is similar to data released by North American institutions for both older and more recent cohorts

To assess whether the changes in career outcomes are EMBL-specific, we compared our data with cohort-based data released by other institutions. We note that there are several limitations in comparing absolute career outcomes between individual studies. Most critically, individual studies often use different data collection methods and role classifications. Furthermore, career outcomes are influenced by the broader scientific ecosystem and the subject focuses of the institution's departments, which may attract ECRs with dissimilar career motivations. Nevertheless, comparing the outcomes with multiple institutions allows us to interrogate whether the changes we observe for the most frequent, well-defined and linear career path, the PhD-postdoc-PI track, reflect a general trend. For example, EMBL has strong scientific overlap with Stanford University bioscience departments. Stanford has reported that 34% of [...] institutes have a similar proportion of alumni entering PI roles for comparable cohorts.
This is consistent with our hypothesis that the differences between cohorts are not EMBL-specific, and reflect a widespread change in the number of PhDs and postdocs relative to available PI positions. We did not analyse the data for non-PI roles, as the smaller numbers of individuals entering these roles make it difficult to identify real trends. Additionally, for postdoc career outcomes, only a small number of institutions have released detailed data on the destinations of recent alumni and we are not aware of any long-term cohort-based data.

While most EMBL ECRs remain in science-related professions, the rate at which they become PIs has decreased with time

To estimate the probability of alumni from different cohorts entering each type of role each year after completing an EMBL PhD or postdoc, we used a statistical regression method, the Cox proportional hazards model. This model is commonly used to model time-to-event distributions from observational data with censoring (i.e., when not all study subjects are monitored until the event occurs, or the event never occurs for some of the subjects). In brief, we fitted the data to a univariate Cox proportional hazards model to calculate hazard ratios, which represent the relative chance of the event considered (here: entering a specific type of role) occurring in each cohort with respect to the oldest cohort. We also calculated Kaplan-Meier estimators, which estimate the probability of the event (entering a specific type of role) at different time points.

For entry of both PhD and postdoc alumni into PI roles, we observe hazard ratios of less than one in the Cox models when comparing the newer cohorts with the oldest cohort (Supplementary Table 5), indicating that the chances of becoming a PI have become lower for the newer cohorts. The Kaplan-Meier curves (Figure 2C [...] remained small in comparison to other types of roles.
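To illustrate how a Kaplan-Meier estimator handles censored follow-up, the following minimal sketch computes the estimate on a small invented data set. It is not the analysis code used in this study (which was carried out in R); the function name `kaplan_meier` and the toy durations are assumptions made purely for illustration, and the Cox model itself is not shown.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    durations: years of follow-up after leaving the institute
    events:    True if the person entered the role at that time,
               False if censored (not yet in the role when last observed)
    S(t) is the estimated probability of NOT yet having entered the role.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # everyone leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the fraction of those still at risk who had no event at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Invented toy data: 8 hypothetical alumni, 4 observed events, 4 censored
durations = [2, 3, 3, 5, 8, 8, 9, 12]
events = [False, True, True, False, True, False, True, False]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"year {t}: estimated P(entered role) = {1 - s:.3f}")
```

Censored individuals contribute to the number at risk up to their last observed year but never count as events, which is why recent alumni with short follow-up can still be included in such an estimate.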

A small increase in time between PhD conferral year and first PI position is observed
The increased percentage of ECRs entering non-PI careers may fully explain the change in career destinations we observe. However, an increase in total postdoc length could also contribute to the reduced percentage of PIs we observe when comparing career destinations at specific time points after an ECR's PhD defence. We therefore asked whether an increased duration between PhD and first PI role was observed in our data. In order to fairly compare alumni from different cohorts, we included only alumni for whom we had a detailed career path, who had defended their PhD at least nine years ago and who had become a PI within nine years of defending their PhD. The nine-year cut-off was [...] Figure 4C). This may contribute to the reduced rate at which EMBL ECRs become PIs in the first years after EMBL, but is too small to explain the large differences in career trajectories more than 10 years after EMBL, when the number of alumni becoming PIs begins to plateau.

Publication factors are highly predictive of early entry into a PI position

Publication metrics have been linked to the likelihood of obtaining (46,47) and succeeding (48) in a faculty position. In this study, alumni who became a PI had more favourable publication metrics from their EMBL work: for example, they published more papers, and papers that had a higher 'Category Normalized Citation Impact' (indicating higher numbers of citations compared to other publications in the same field and year; a CNCI of one indicates performance expected by the average paper in that field and year) (Figure 3A-B, Supplementary Table 6).

To understand the potential contribution of an ECR's publication record as well as other factors such as cohort, gender, nationality, and seniority of the supervising PI, we fitted multivariate Cox models to the data including these factors as potential predictors for entry into each type of role.
We included a range of metrics in our analysis, including journal impact factor, which has been shown to statistically correlate with becoming a PI in some studies (46) and continues to be used by some institutions in research evaluation (49). We, however, wish to note that EMBL is a signatory of the San Francisco Declaration on Research Assessment (DORA), and does not condone the use of journal impact factors for the evaluation of scientists' work, nor use them in its hiring or evaluation decisions.

[...] observed and predicted temporal order of the outcome (for example, becoming a PI). A C-index of 1 indicates complete concordance between observed and predicted order, and 0.5 is the baseline that would be achieved by guessing alone. Prediction is clearly limited by the fact that we could not explicitly encode some covariates that are certain to play an important role in career outcomes, such as career preferences and relevant skills. Nevertheless, the C-indexes for models containing all data were between 0.61 (entry into industry research) and 0.70 (entry into PI roles).

Consistent with previous studies, we found that statistics related to the ECR's own publications were highly predictive for entry into a PI role: a model containing only the publication statistics performs almost as well as the complete model, reaching a C-index of 0.69 (Figure 3C). Consistent with this, univariate Cox models also suggest that, for example, postdocs with one or more first-author publications from their EMBL work are 3.2 or 6.6 times more likely to become PIs than their colleagues with no first-author publication (Figure 3D, Supplementary Table 7). Group publications (the aggregated publication statistics for all ECRs who were trained within the same research group), cohort/time, gender and type of alumnus (PhD/postdoc) were also predictive for entry into a PI role, with C-indexes of 0.61, 0.59, 0.57 and 0.55, respectively.
Models containing only nationality or group leader seniority were not predictive. [...] Table 7). Alumni who published more first-author articles were, however, less likely to be found in non-research and non-science roles (Supplementary Figure 5, Supplementary Table 7). Average publication factors for those who enter each type of non-PI role are shown in Supplementary Tables 8-11.

For academic research, service and teaching positions, the factors that were most predictive were those related to the publications of all ECRs in the research group the fellow was trained in (Figure 4A). It is unclear why this might be, but we speculate that this could reflect publication characteristics specific to certain fields that have a high number of staff positions, or other factors such as the scientific reputation, breadth or collaborative nature of the research group and its supervisor. The group's publications were also predictive for other research and science-related career areas.

Time-related factors (i.e. cohort, PhD award year and EMBL contract start/end years) were the strongest prediction factors for non-academic career categories (industry research, non-research science-related and non-science-related professions, Figure 4B-D).

More collaborative publications

Previous reports suggest that more recent biology publications have increased amounts of data and more authors (40,50); a corresponding decrease in the number of first-author publications per PhD student has also been reported. For research articles linked to ECRs in this study, the average number of authors per research article has more than doubled between 1995 and 2020 (Figure 5A). No statistically significant difference was found for the number of research articles per ECR between cohorts (Figure 5B, average 3.6 per ECR); however, there was a difference in the number of first-author research articles between cohorts.
ECRs from the more recent cohorts published fewer first-author publications (Figure 5C, Supplementary Table 12).

Data from Clarivate Analytics InCites databases suggest that increasing collaboration may contribute to the increasing number of authors: 79% of EMBL's publications in 2020 involved international collaboration, compared to 47% in 1995 (51). More recent research articles with ECR authors from EMBL (Figure 5D) also had higher 'Category Normalised Citation Impact' values.

[Figure legend fragment] [...] research articles between 1995-2020 that were assigned to EMBL in the Web of Science InCites database and had at least one author who was included in this study (n=5413).
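The C-index used in the Results to compare models measures how often a model ranks two individuals in the same order as their observed event times. The sketch below is a simplified illustration of that idea on invented data, not the implementation used in this study; the function name `concordance_index` is hypothetical, and tied event times are simply skipped, which is an assumption made for brevity.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell-style C-index.

    A pair (i, j) is comparable when i had the event strictly before
    j's observed time. The pair is concordant when the model gave i
    the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored individual cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: years until entering a role (False = censored)
times = [2, 4, 6, 8]
events = [True, True, False, True]

perfect = concordance_index(times, events, [4, 3, 2, 1])
uninformative = concordance_index(times, events, [1, 1, 1, 1])
reversed_order = concordance_index(times, events, [1, 2, 3, 4])
print(perfect, uninformative, reversed_order)
```

A model that assigns everyone the same score lands at the 0.5 baseline described above, which is why values such as 0.55 indicate only weak predictive power.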

Our time-resolved analysis of career outcomes indicates that, overall, ECRs in the life sciences have strong job prospects. The career landscape for ECRs has been changing over the last 25 years. Nevertheless, the vast majority of former ECRs continue to pursue careers that contribute to the research and innovation landscape, many in managerial and leadership positions.

A key limitation of our study is that its retrospective, observational design restricts our ability to disentangle causation from correlation. The changes in career outcomes may, for example, also be influenced by a greater availability or awareness of non-PI career options. EMBL has held an annual career day highlighting non-academic career options since 2006, and many ECRs actively opt for a career in the private sector, attracted by perceptions of higher pay, more stable contracts, faster impact, and/or better work-life balance. ECRs driven by an interest in specific technologies may also consider roles in research infrastructures and scientific services to be as attractive as, or more attractive than, PI roles. Additionally, we cannot exclude that other factors may also affect the differences we see between cohorts. For example, variations in the number of alumni entering academic roles in countries that offer later scientific independence may have a small effect on outcome data.

To overcome the limitations of this study and better investigate multifactorial and complex issues such as the gender differences in career outcomes, future studies should include large-scale mixed-method longitudinal studies that better record pre-existing career motivations, skills development, and other factors. Multi-institutional studies using consistent data collection methods and a robust classification of roles would also provide a fuller picture of workforce trends.

Addressing ECR career challenges

ECRs make important contributions to research and develop skills that continue to make them highly employable in academia, industry and other sectors. Nevertheless, some challenges are associated with the diversified career landscape and increase in large-scale projects.

Many ECRs are employed for long time-periods on short-term contracts funded by project-based grants (49) and surveys suggest that ECRs are concerned about career progression (24)(25)(26)(27)(28). This may contribute to high levels of poor mental health amongst ECRs (21,23). To help alleviate current ECRs' career concerns and ensure that they can adapt to any further changes in the career landscape (for example, as the result of economic impacts from the coronavirus pandemic (52)), it is essential that ECRs are provided with opportunities to reflect on their strengths, understand the wide variety of valuable career options available to them, and develop skills that advance their research projects and employability in their preferred career areas. Efficient and effective support requires input from different stakeholders, including institutions, supervisors, funders, policymakers, and employers of former ECRs, and the engagement of ECRs themselves. At EMBL, a career service was launched in 2019 for all EMBL PhD students and postdocs, building on a successful EC-funded pilot project. In addition to ensuring that career development support is available for all ECRs, policymakers may need to reassess the sustainability of academic career paths, and review the proportion of funding allocated to project-based grants compared to mechanisms that can support PI and non-PI positions with longer-term stability. Providing career development support and more career stability will also support equality, diversity and inclusion in science.

Publication factors are highly predictive of entry into PI careers.
For ECRs aiming for a career as a PI, one challenge is balancing the quantity and subjective quality of publications. The observed trend towards fewer first-author papers likely reflects a global trend towards papers that have more authors, including more international collaborators, as well as EMBL's focus on ambitious interdisciplinary approaches. Ambitious interdisciplinary projects provide unrivalled opportunities to develop high-performance behaviours including team-work, leadership and creativity. They also allow researchers to tackle challenging biological questions from new angles and deliver publications that significantly advance the field; such publications are seen very positively by academic hiring committees (53)(54)(55). However, more complex projects require coordinated input from a large group of co-authors, and may have longer project timelines. Supervisors and ECRs should discuss the potential impact and challenges associated with an ECR's project, and what can be done to reduce risks.

Research assessment and availability of funding play an important role in determining the funding and career prospects of an academic. Therefore, it is also vital that factors that may affect the apparent productivity of ECRs, such as involvement in large-scale projects, career breaks, or time spent on teaching and service, are considered in research assessment. The impact of the coronavirus pandemic on the research productivity of researchers with caregiving responsibilities makes these actions imperative (56)(57)(58). Initiatives such as the San Francisco Declaration on Research Assessment (DORA) have been advocating for an increased focus on good practice in research assessment, and many funders have reviewed their practices. Cancer Research UK, for example, now asks applicants to its grants to describe three to five research achievements, which can include non-publication outputs (59).
Other initiatives include more transparent author contribution information in publications (60,61) and promotion of "FAIR" principles of data management (62). We also expect the increasing use of preprints (40,63) to have a positive effect on the careers of ECRs involved in ambitious projects with longer publication time-scales.

Conclusions

Our data highlight the many ways that early career researchers contribute to the research and innovation landscape, and suggest that early career life scientists continue to enter leadership roles in academia, industry and other sectors. ECR training therefore continues to be a valuable investment that creates a highly qualified workforce with strong job prospects.

Nevertheless, a number of challenges exist for ECRs in their careers, particularly increasing competition for mid-career research leadership roles in academia. These challenges may also increase in the coming years due to the impacts of the coronavirus pandemic. Adequate support and policies to address these challenges are therefore urgently needed. Continued monitoring of career outcomes will be essential for policymakers deciding how best to adapt funding and training programmes to support sustainable career paths in the life sciences, and to enable ECRs to make informed decisions about their own career development.

Methods

Data collection & analysis
The study includes individuals who graduated from the EMBL International PhD Programme (n=969) between 1997 and 2020 or who left the EMBL postdoc programme between 1997 and 2020 after spending at least one year as an EMBL postdoctoral fellow (n=1315). Each person is included only once in the study: where a PhD student remained at EMBL for a bridging or longer postdoc, they were included as PhD alumni only, with the postdoc position listed as a career outcome.

For each alumnus or alumna, we retrieved demographic information from our internal records and identified publicly available information about each person's career path (see Supplementary Methods). Where possible, this information was used to reconstruct a detailed career path for each ECR. An individual was classified as having a "detailed career path" if an online CV or biosketch was found that accounted for their time since EMBL excluding a maximum of two 1-calendar-year career breaks (which may, for example, reflect undisclosed sabbaticals or parental leave). Each position was classified using a detailed taxonomy, based on a published schema (42), and given a broad overall classification (see Supplementary Methods). The country of the position was also recorded. For the most recent position, we noted whether the job title was indicative of a senior or management level role (included "VP", "chief", "cso", "cto", "ceo", "head", "principal", "president", "manager", "leader" or "senior"), or if they appeared to be running a scientific service or core facility in academia.

We use calendar years for all outcome data: for example, for an ECR who left EMBL in 2012, the position one calendar year after EMBL would be the position held in 2013. If multiple positions were held in that year, we take the most recent position. We use calendar years because the available online information often only provides the start and end year of a position (rather than the exact date).
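The calendar-year rule and the job-title check described above can be sketched as follows. This is an illustrative reconstruction, not the study's code: the record layout, the function names `position_in_year` and `is_senior_title`, and the example career are invented, "most recent" is interpreted as the latest start year, and whole-word, case-insensitive keyword matching is an assumption (the text does not state how titles were matched).

```python
import re

# Keywords listed in the Methods as indicating a senior or management role
SENIOR_KEYWORDS = {"vp", "chief", "cso", "cto", "ceo", "head", "principal",
                   "president", "manager", "leader", "senior"}


def position_in_year(positions, year):
    """Return the position held in a calendar year, or None if none is known.

    positions: list of dicts with 'start', 'end' (None = ongoing) and 'title'.
    If several positions overlap the year, the one with the latest start
    year is taken as the most recent.
    """
    held = [p for p in positions
            if p["start"] <= year and (p["end"] is None or p["end"] >= year)]
    if not held:
        return None
    return max(held, key=lambda p: p["start"])


def is_senior_title(title):
    """Whole-word, case-insensitive match against the keyword list (assumption)."""
    words = re.findall(r"[a-z]+", title.lower())
    return any(w in SENIOR_KEYWORDS for w in words)


# Hypothetical career path for someone who left the institute in 2012
career = [
    {"start": 2012, "end": 2013, "title": "Postdoctoral Fellow"},
    {"start": 2013, "end": 2018, "title": "Staff Scientist"},
    {"start": 2018, "end": None, "title": "Head of Imaging Facility"},
]
left_embl = 2012

one_year_after = position_in_year(career, left_embl + 1)
latest = position_in_year(career, 2021)
print(one_year_after["title"])
print(latest["title"], is_senior_title(latest["title"]))
```

Whole-word matching is used here because plain substring matching would, for instance, find "cto" inside "director"; whether that behaviour was intended in the original classification is not stated.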
An EMBL publication record was also reconstituted for each person in the study. Each of their publications linked to EMBL in Clarivate Analytics' Web of Science and InCites databases in June 2021 was recorded. The data included publication year and, for those indexed in InCites, crude metrics such as category normalised citation impact and percentile in subject area (measures of citations) as well as journal impact factor. EMBL publications were assigned to individuals in the study based on matching name and publication year (see Supplementary Methods for full description). When an individual was the second author on a publication, we manually checked for declarations of co-first authorship. Aggregate publication statistics for ECRs with the same primary supervisor were also calculated.

The names and other demographic information that would allow easy identification of individuals in the case of a data breach were pseudonymised. A file with key data for analysis and visualisation in R was then generated. A description of this data table can be found in Supplementary Table 1, along with summary statistics.

Statistical model
A Cox proportional hazards regression model was fitted to the data in order to predict time-to-event probabilities for each type of career outcome based on different covariates including cohort, publication variables and gender. Multivariate Cox models were fitted using a ridge penalty, with the penalty parameter chosen by 10-fold cross-validation. To validate and compare the different models, Harrell's C-index was calculated for each fit in an outer 10-fold cross-validation scheme.

Data availability and data protection
The data were collated for the provision of statistics, and are stored in a manner compliant with EMBL's internal policy on data protection (https://www.embl.org/documents/document/internal-policy-no-68-on-general-data-protection/).
The nature of the data precludes sufficient anonymisation to provide a public data release. Summary statistics for the data