Measuring effects of trainee professional development on research productivity: A cross-institutional meta-analysis

PhD-trained scientists are essential contributors to the workforce in diverse employment sectors that include academia, industry, government, and non-profit organizations. Hence, best practices for training the future biomedical workforce are of national concern. Complementing coursework and laboratory research training, many institutions now offer professional training that enables career exploration and develops a broad set of skills critical to various career paths. The National Institutes of Health funded academic institutions to design innovative programming to enable this professional development through a mechanism known as Broadening Experiences in Scientific Training (BEST). Programming at the BEST awardee institutions included career panels, skill-building workshops, job-searching workshops, site visits, and internships. An initial concern was that, because doctoral training is lengthy and requires focused attention on dissertation research, having students participate in additional complementary training activities might lengthen time to degree and hamper student research productivity. To address this concern, metrics were analyzed from ten BEST awardee institutions, using time to degree and publication records as measures of efficiency and productivity. Comparing doctoral students who participated with those who did not, results revealed no differences in time to degree or manuscript output across these diverse academic institutions. Furthermore, a few institutions even demonstrated a positive correlation between participation in career and professional development activities and productivity. Our findings suggest that doctoral students should be encouraged to participate in career and professional development opportunities to ensure their preparedness for a variety of diverse and important careers in the workforce.
Significance Statement Our study is unique in that it compiled doctoral degree durations at ten different universities, recorded individual participation in career and professional development activities in terms of dosage, and tracked individual engagement in real-time rather than relying on surveys sent to trainees after graduation. Participation in career and professional development activities, including internships, did not decrease efficiency or productivity. Our findings suggest that doctoral students should be encouraged to participate in career and professional development opportunities to ensure their preparedness for a variety of diverse and important careers in the workforce.


Introduction
Scientific doctoral education provides technical and cognitive-skill training and enables students to establish a positive sense of personal identity while building professional networks (1). Importantly, doctoral training provides graduates with career value in the workforce as employers increasingly recognize that employees with PhDs have advanced knowledge and skills that can enhance the organization's productivity and reputation (1).
Three decades ago, one in three biomedical doctoral students could expect to join the academic tenure track; however, employment trends have since shifted (2)(3)(4). Both the National Institutes of Health (NIH) and the National Science Foundation (NSF) estimate that the percentage of PhD scientists in tenured or tenure-track positions has fallen to fewer than one in four (3,5,6). This relatively lower percentage of PhD scientists transitioning to tenure-track academic positions is ascribed to several factors. First, the number of doctoral students graduating in the biomedical sciences in the United States has steadily risen, almost quadrupling over the past fifty years (3,5,6). Second, the growth in employment of biomedical doctoral graduates during this same time period has occurred almost entirely in industrial sectors, with comparatively little growth in academic and government jobs (5)(6)(7)(8). Third, graduates are preferentially choosing research and research-related careers beyond academia, a shift that has only recently been widely recognized by the biomedical academic community (9).
Despite these concerns, many faculty recognize both the importance of career development to assist trainees and that their own knowledge in this area is lacking, such that supplemental programming is valuable (25). Moreover, initiatives that have promoted professional skills to complement scientific development have shown a benefit to graduate education and have not impacted time to degree or publication output, as highlighted by program evaluations (26)(27)(28)(29)(30). Initial data compiled from the baseline cohort of NIH BEST graduate trainees did not show a difference in average time in PhD programs over the first 3 years of data collection compared to average time before BEST implementation (8). Nevertheless, a robust empirical comparison is needed to fully examine the effects of participation in professional development on time to degree and publications.
Hence, ten NIH BEST awardee institutions tested whether participation in career development activities affected time to degree as well as productivity (measured by published manuscripts) of doctoral students. BEST was an NIH grant program that funded seventeen institutions across the country to develop programming that could bridge the gap between research training and the job market, a transformative effort to catalyze career development change nationally (31). Our study is unique in that it compiled doctoral degree durations at ten different universities, recorded individual participation in career and professional development activities in terms of dosage, and tracked individual engagement in real-time rather than relying on surveys sent to trainees after graduation. Each of these ten BEST institutions developed distinctive program formats and structures. Data collected from these unique programs show that there was no difference in publication output or time to degree for doctoral students who participated, even quite actively, in career and professional development activities during their academic training.
Program Activities. Each BEST institution developed its own program to achieve proposed program-specific goals. Program activities ranged from single events to multi-part workshop series or coursework, as well as experiential learning activities, such as site visits, internships, and individual training sessions. One-off workshops were the most common activity each year for all of the programs (8). Institutions also deployed a wide range of activities differently, allowing trainees to participate through specific phases, by sector, by career interests, ad hoc, or some combination thereof. Most institutions included experiential learning opportunities with partners outside the university. Many programs offered opportunities at their university by partnering with various professional schools, core facilities, or support offices within their institution. Another focus was on incorporating mentorship and connecting trainees to alumni and professionals in broad areas of biomedical research. Building on these internal and external institutional connections, a majority of the BEST institutions offered the possibility of internships, though internships were not required. The BEST institutions shared strategies, activities, and contacts among the BEST network of institutions during annual NIH BEST conferences, allowing programmatic offerings to evolve over time. A more complete description of the BEST institutions' programming can be found in Supplemental File 1 (S1).
Procedures. During the duration of BEST funding, institutions collected data about biomedical PhD trainee time to defense and level of participation in internships and BEST activities (e.g., career panels, skill-building workshops, job-searching workshops, site visits, internships). Data were submitted annually to NIH over a five-year period using common forms, standardized data-collection procedures, and compatible reporting methods to allow for cross-institutional comparison. Meetings to discuss evaluation of program design were held with all BEST consortium members, including a data summit to finalize common definitions and standardize BEST data collection methods (see ref. 8 for detailed collection methods, including baseline survey design and results). Cross-institutional definitions for methods of instruction/delivery and agreements on common criteria for data were instrumental in developing data collection methods.
The most straightforward comparison between participants and non-participants in BEST career and professional development programming was measurement of binary outcome differences. Hence, this was the most reliable effect-size measure to use, and it was employed for meta-analytic comparisons. For binary comparisons using a t-test, the no-participation group (control) was compared with any participation (i.e., the medium and high participation groups combined), giving a sense of effect size.
We were also interested in identifying potential dose-response effects based on level of participation. As each institution offered different events with variable length and scope, each was asked to define low and high participation levels independently (S1). Most institutions split their low and high dosage populations based on the observed median dosage level. These definitions were established so that the three groups could be compared by ANOVA, giving a sense of any dose-response effect. This additional level of analysis yielded a more nuanced ability to evaluate participation effects and query for potential negative effects on productivity when there were high levels of participation. Nonetheless, to retain the clarity of the control versus participant populations, all cross-institutional analyses were based upon bivariate comparisons.
For all binary analyses, with one exception, control groups were defined as non-participants; the exception was one program that did not have a true control group and hence divided participation in BEST events into an approximation of a control group (0-1 points) and a medium/high dose, rather than the null, low, and high doses used by the remaining institutions. For consistency, the comparison groups for ANOVA are referred to as control, low, and high (control* is used to denote the approximated control group). Post-hoc analysis showed no difference when this institution's data were excluded; hence, we chose to include the data to be comprehensive.
Institutions also collected and reported publication outcomes. These data were gathered independently by each institution, either by self-reported survey, by manual PubMed queries, or by automated queries to the PubMed API using a Python script developed for this purpose and freely provided by Daniel Arneman and Joshua Hall (see Supplemental File 4, S4; 47). For those institutions that used the Python script, results were manually spot-checked for potential errors (S4), including overcounts for common names, legal name changes, nickname use, or advisor switching. In addition, extreme publication counts identified by the automated script (e.g., 0 publications or >5 publications) were rechecked by hand.
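The Arneman and Hall script itself is provided in S4; as a rough illustration of the automated approach, a per-trainee query to NCBI's E-utilities esearch endpoint can be assembled as below. The function name and field choices here are hypothetical and are not taken from the actual S4 script.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_query(author, affiliation, start_year, end_year):
    """Assemble an E-utilities esearch URL for one trainee's publications.

    Restricting by affiliation and publication-date range reduces (but does
    not eliminate) overcounts for common names, which is why automated
    results were still spot-checked by hand.
    """
    term = (
        f"{author}[Author] AND {affiliation}[Affiliation] "
        f"AND {start_year}:{end_year}[Date - Publication]"
    )
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 200}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"
```

Fetching the resulting URL returns matching PubMed IDs as JSON, which can then be counted per trainee and reconciled against training records.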
Analyses. Binary participant/non-participant comparisons were evaluated using independent sample t-tests, whereas dose-response tests were run using a one-way analysis of variance (ANOVA) with a three-level professional development dose variable (control, low, high). All comparisons were analyzed using Prism GraphPad (v8.4.0) software, which was also used to generate plots throughout the manuscript. All p values are reported to two significant figures.
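The analyses above were run in Prism GraphPad; an equivalent sketch of the two tests in Python, using hypothetical months-to-degree data for illustration, might look like:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical months-to-degree values for three participation groups.
control = rng.normal(66, 8, size=60)   # no participation
low     = rng.normal(65, 8, size=50)   # low-dose participation
high    = rng.normal(66, 8, size=45)   # high-dose participation

# Binary comparison: control versus any participation (low + high pooled).
t_stat, p_binary = stats.ttest_ind(control, np.concatenate([low, high]))

# Dose-response: one-way ANOVA across the three dosage groups; post-hoc
# pairwise comparisons would follow only a significant omnibus F.
f_stat, p_omnibus = stats.f_oneway(control, low, high)
```

The same pattern applies unchanged to publication counts as the outcome variable.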
The use of meta-analysis allows extrapolation of an effect size and significance across different populations, multiple studies, or, in our case, different institutions and interventions. Hence, meta-analyses are preferred when comparing effects, especially when the variables of interest are measured differently across sites (e.g., hours, events, points). Although it is not uncommon for meta-analyses to include fewer than the optimal number of studies (32), such a situation is not ideal. Therefore, meta-analyses were conducted only when a large enough set of institutions provided data (9-10 studies per meta-analysis). In some cases, not enough institutions were able to provide data to allow for meta-analysis (e.g., only a subset of institutions supported internships). Meta-analyses were performed by entering effect sizes, p-values, and sample sizes for each institution's data on that variable into Jamovi (v1.2.16) to produce an overall analysis of whether there was a significant effect across the population.
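The pooling itself was done in Jamovi; for readers who prefer code, a minimal random-effects pooling of per-institution effect sizes can be sketched as follows, using the DerSimonian-Laird estimator (one common choice; the estimator Jamovi actually used may differ):

```python
import numpy as np

def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-institution effects.

    effects, variances: arrays of each institution's effect size and its
    sampling variance. Returns the pooled estimate and standard error
    plus the heterogeneity statistics (Q, tau^2, I^2) reported alongside.
    """
    e = np.asarray(effects, float)
    v = np.asarray(variances, float)
    w = 1.0 / v                                # fixed-effect weights
    fixed = np.sum(w * e) / np.sum(w)
    q = np.sum(w * (e - fixed) ** 2)           # Cochran's Q
    df = len(e) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)              # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = 1.0 / (v + tau2)                  # random-effects weights
    pooled = np.sum(w_star * e) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, q, tau2, i2
```

A z-test on pooled/se then gives the overall significance across institutions.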
Primary predictors included the amount of professional development participation (binary or control/low/high dosage). Primary outcome variables of interest included productivity as measured by time to degree and publications (total and first-author). Finally, all outcome measures were tested against internship participation, the highest dose of professional development implemented across sites for the subset of institutions able to provide this data.
Power calculations verified whether our sample sizes were sufficient to detect a small effect size across each type of meta-analysis (33,34). Post-hoc power analyses determined that >80% power was achieved for each meta-analysis, indicating that a sufficient number of subjects and studies were included. Meta-analytic power was calculated in accordance with recommendations by Harrer, Cuijpers, Furukawa, & Ebert (35). Meta-analysis results and power calculations are grouped with the other relevant analyses.
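As a sketch of the style of calculation recommended by Harrer et al. (35), the Hedges-and-Pigott-type approximation below estimates meta-analytic power from the average per-group sample sizes, the number of studies, and an assumed heterogeneity level. The variance-inflation factors and resulting numbers are illustrative and may not reproduce the paper's reported values exactly.

```python
from math import sqrt
from statistics import NormalDist

def meta_power(d, n1, n2, k, het="low", alpha=0.05):
    """Approximate power of a random-effects meta-analysis of k studies
    to detect a standardized mean difference d.

    het inflates the within-study variance to account for between-study
    heterogeneity: low ~1.33x, moderate ~1.67x, large ~2x.
    """
    inflate = {"low": 1.33, "moderate": 1.67, "large": 2.0}[het]
    # Sampling variance of d for one study with group sizes n1 and n2.
    v = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    v_pooled = inflate * v / k          # variance of the pooled effect
    lam = d / sqrt(v_pooled)            # noncentrality parameter
    z = NormalDist().inv_cdf(1 - alpha / 2)
    nd = NormalDist()
    return (1 - nd.cdf(z - lam)) + nd.cdf(-z - lam)
```

For example, `meta_power(0.20, 78, 95, 10, "low")` yields power well above the 80% threshold, consistent with the study's post-hoc conclusion.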

Participation & Efficiency: Time to Degree
Efficiency: Time to Degree versus Professional Development Participation
As BEST programs were implemented at each institution, some in the biomedical training community questioned whether participation in professional development programming would increase time to degree. Here, we tested this hypothesis using binary measurements (participants versus non-participants), as well as a dose-response analysis to determine whether higher levels of participation affect time to degree. t-tests were conducted for bivariate analyses and ANOVAs for multi-group comparisons; post-hoc multiple comparisons were conducted only if warranted by a significant omnibus finding.
One institution showed a statistically significant shorter time to degree for participants using either the binary or the dose-level analysis; the remaining institutions showed no significant difference in time to degree for participants in the binary condition or when accounting for level of participation (Figure 1a, 1b). Using the measure of months to defense, one additional institution (i.e., two total) showed that greater participation was associated with a statistically significant decrease in time to defense (SI Figure 2a). Overall, the data failed to support the hypothesis that participation in career and professional development at any level tested leads to a statistically significant increase in time in graduate training.

Meta-Analysis of Effects on Trainee Efficiency
A meta-analysis (Figure 2) was conducted to determine a weighted effect size and significance across all the institutions. This cross-site meta-analysis (including 1700 trainees' participation data) showed no difference in time to degree between participants and non-participants, with a point estimate of -0.04 (p=0.19, SE=0.03, z=-1.31, 95% CI [-0.09, 0.02]) and effect sizes ranging from r² = 0.01-0.04 (rs from -0.21 to +0.12). Power calculations suggest that with the average sample sizes and number of participating institutions' data available for this study (N1=78, N2=95; alpha=0.05, k=10, d=0.20), and an observed low heterogeneity (I² = 24.72%, below the 25% cutoff for low heterogeneity; τ² = 0.002, SE_τ = 0.004; Q=12.91, p=0.17) in a random-effects model (35,36), we had nearly 90% power to detect a small effect size (89%). Given that our study was cross-institutional and well exceeded the accepted threshold of 80% power (with an alpha of 0.05; 33,34), we can confidently say that we had the ability to detect an effect size of this magnitude or greater.
Furthermore, there were no cases in which the dose-response effects were significantly longer for those with the highest participation (omnibus F-tests were not significant); in fact, in the single case of significant difference, the directionality indicated a favorable association such that participants took less time to graduate than non-participants. ANOVAs show comparisons between no-dose, low-dose, and high-dose event participation (Figure 1b).
In sum, the analysis reveals that participating in career and professional development was not associated with an increased time to degree. This initial finding supports the notion that participation, even in high doses, is not associated with any delay.

Participation and Productivity: Total Publications
Next we evaluated the impact of career and professional development participation on productivity, measured by number of publications. We first evaluated total publications during the graduate training period. For participants versus non-participants, one institution showed significantly more publications for participants, and one showed significantly fewer publications for participants. The remaining seven institutions showed no significant difference between participants and non-participants with regard to total number of publications, and when accounting for different levels of participation, no institution showed any significant difference in the number of total publications between groups (Figure 3).

Participation & Productivity: First-author Publications
Professional scientists, faculty researchers, and doctoral training programs often place special significance on first-author publications because the bulk of trainees' efforts in the lab are usually directed at projects resulting in first-author publications. These efforts also typically form the underpinning for the students' theses. Due to the unique importance of first-author publications, we further examined whether there is a specific impact of participation in career and professional development on first-author publications.
Similar to the overall number of publications, there was no conclusive effect of BEST participation on increases or decreases in, specifically, first-author publications (Figure 5). In the binary condition for first-author publications, one institution's BEST participants produced significantly fewer first-author publications. In contrast, when level of participation was considered, one institution's "high dose" BEST participants produced significantly more first-author publications. In both the binary and dose-response analyses, the remaining eight institutions showed no significant difference between participants and non-participants in first-author publications. Accordingly, there was no overall trend of BEST participation reducing first-author publications, and the hypothesis that participation in professional development activities reduces publication rate was not supported by our data.
Meta-analyses were conducted to determine the weighted effect size and significance across all the institutions for total and first-author publications (Figures 4 and 6). The cross-site meta-analyses (including nearly 1500 trainees' publication data) showed no significant difference in total publications between participants and non-participants, with a point estimate of -0.04 (p=0.23, SE=0.03, z=-1.21, 95% CI [-0.10, 0.02]) and effect sizes ranging from r² < 0.01 to 0.02 (Figure 4). Similarly, a meta-analysis of first-author publications from the same institutions showed no significant difference between participants and non-participants, with a point estimate of -0.02 (p=0.64, SE=0.04, z=-0.47, 95% CI [-0.08, 0.05]) and effect sizes ranging from r² < 0.01 to 0.03 (Figure 6). Across a large multi-institutional sample, collectively there was a lack of evidence for reduced trainee productivity as measured by publication number.

Weighted Publication Metric (PubMetric): An alternative comprehensive publication measure
Both first-author publications and total publications capture different aspects of productivity. By choosing to report one or the other, some information is lost. Instead of limiting the accuracy of reporting by removing one or the other, we proposed creating a novel publication metric that could capture trainees' efforts on both types of contributions in a single metric. One concern that we anticipated was how to weight these different contributions. For instance, first-author research papers may be valued over other types of contributions (e.g., middle-author research paper contributions or review papers). To address this issue, UNC developed a weighted publication metric (see Supplemental File 4, S4) that incorporates the four primary types of peer-reviewed publications into a single number. Impact factor was not included as a variable in the publication metric because impact factor as a measure of paper quality or journal prestige can be inherently biased by field. The UNC weighted publication metric was designed as a broader and more objective measure of the amount and quality of author contributions by trainees as reflected by authorship order.
To create the weighted publication metric, active training faculty at UNC were asked to rank the relative value of (A) first-author peer-reviewed research articles, (B) first-author peer-reviewed review articles, (C) middle-author peer-reviewed research articles, and (D) middle-author peer-reviewed review articles (n=150 responses from 350 total contacted; see S4 for details). First-author and co-first-author publications were considered synonymous. When averaging all faculty rankings and normalizing middle-author reviews to a weighting of 1, we generated the following equation for the weighted publication metric (PubMetric).
Weighted Publication Metric (PubMetric) = 2.07 × (number of first-author research papers) + 1.54 × (number of first-author reviews) + 1.37 × (number of middle-author research papers) + 1.0 × (number of middle-author reviews)

Four BEST institutions were able to provide weighted PubMetric data from PubMed scripts. Using this metric, similar patterns emerged as for total publications and first-author publications (S4).
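The PubMetric equation translates directly into a one-line helper function (the function and argument names below are our own, for illustration):

```python
def pubmetric(first_research, first_reviews, mid_research, mid_reviews):
    """UNC weighted publication metric: faculty-derived weights applied to
    the four publication types, normalized so that a middle-author review
    counts as 1.0."""
    return (2.07 * first_research
            + 1.54 * first_reviews
            + 1.37 * mid_research
            + 1.00 * mid_reviews)
```

For example, a trainee with two first-author research papers and one middle-author research paper would score 2.07 × 2 + 1.37 = 5.51.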

Internships, Efficiency, & Productivity: Time to Degree, Total Publications, and First-author Publications
Internships are a form of career training with unique characteristics and formats, but all require a relatively large time commitment that one could predict would impact time to degree or productivity (for the definition of an internship, see Supplemental File 3, S3). Institutions that supported internship opportunities provided outcome data for trainees who participated in their internship programs, which differed in length and design and involved some variant of a competitive selection process (S3).
We did not detect a difference in time to degree between graduate students who completed an internship and those who did not (Figure 7a). Similarly, we found no evidence of a decrease in total publications or in the number of first-author publications among individuals who participated in an internship (Figure 7b, 7c). Internships were even associated with a favorable effect on first-author publications at some institutions. Additional data comparing internship participation against time to degree, total publications, first-author publications, and the weighted publication metric showed no effect of participation (S3).

Discussion
With concerns about productivity and length of doctoral education balanced with the need to provide adequate professional development, data from ten United States academic institutions were analyzed to determine if participation in career and professional development activities alters these outcomes. Here we discuss the impact of professional development on traditional metrics of academic success. Our study is unique in that it compiled doctoral degree durations at ten different universities, recorded individual participation in career and professional development activities in terms of dosage, and tracked individual engagement in real-time rather than relying on surveys sent to trainees after graduation.
The data show that even extensive participation did not result in a significant increase in time to degree or decrease in productivity of publications for doctoral graduate students in the life sciences. Overall, this is true for both low-dose and high-dose participants, although in our analyses, we found some significant changes in specific variables at some institutions.
Time to degree was chosen as a proxy for efficiency of completion because it was measured at all institutions and facilitated comparisons. Publications were chosen as a proxy for productivity because they are an objective measure and because publications are widely viewed as an important currency of graduate performance in life science higher education (37,38). The number of publications per graduate student in this study was in alignment with prior published work, in which the average number of publications per graduate is 2.9, with a range of 0 to 16 (39). Using our newly created weighted publication algorithm that considers all publications, the PubMetric, we found no difference in total number of publications between participants and controls at eight of the nine BEST institutions (S4).
Thus, across institutions nationwide, participating in career and professional development activities, including internships, did not negatively impact time to degree or manuscript publication. In fact, one institution even showed that participants with the highest dose (internships) had the most first-author publications. Although this observation could be partly explained by the fact that this program incorporated productivity into the selection process for internships, the same institution's requirement of first-author publications to graduate makes this explanation unlikely. Furthermore, other internship program institutions that recommended or required a first-author publication for a graduate intern to be selected also typically required one or more publications to graduate, reducing the likelihood that this explanation would fully account for the potentially beneficial effect.

Limitations
Overall, a potential selection bias exists for the data because individual participants were not randomly selected but instead self-selected to participate (40). Some of this effect might be due to self-selection via the a la carte model, or program selection bias via an application-based cohort model. It is possible that these selected individuals were highly organized multi-taskers before participating and became better informed and motivated at BEST events.
One limitation to our cross-institutional comparison is that each BEST program independently defined what it meant to be a 'participant' in their program; similarly, definitions of control, low-, or high-dose participation varied by program. Three institutions defined their dosage based on the number of hours of professional development; five institutions defined their dosage using the number of events attended; and two institutions grouped their participants by the number of credits or points assigned for attendance.
Just as the program offerings of each institution were unique, so too were the trainee populations that were eligible for programming (S1). Some BEST institutions required trainees to apply to the program and participate in activities as a cohort while other BEST institutions used an a la carte model so that trainees could choose from among professional development offerings.
Others used a combination of cohort and a la carte, and some gradually opened program activities to more participants due to demand. For this reason, a classic "control" population (i.e. zero participation in professional development activities) is difficult to define when evaluating the impact of BEST programs. In addition, even the "control" population may have participated in other professional development events sponsored by other campus offices or student groups, scientific societies, companies, or other external organizations.

Culture Change
Notably, the US government clarified that researchers in doctoral and postdoctoral training who are supported by any federal funds are expected to not only conduct their research, but are also allowed to devote time to career and professional development (41). These guidelines helped faculty to better accept the notion of doctoral students' participation in activities outside of dissertation research.
Studies published by BEST institutions have further reinforced this change in faculty attitudes (25). These studies showed that faculty's initial hesitation is evolving to an understanding that next-generation scientists will not only need to be excellent researchers, but also need to be equipped with professional skills that are more effectively learned outside the laboratory. This viewpoint is supported by a snapshot of current faculty perceptions, which was obtained using subgroup surveys launched by institutions receiving NIH BEST funding. Responses showed that faculty believe that BEST career development programming is beneficial to trainees in a number of different ways: no delayed time to degree, enhanced happiness, positive effects in the lab, and more confidence in directing trainees' own career development (25).
Our current study, based on quantitative data, supports that participation in BEST programming did not adversely affect time to degree or numbers of manuscripts published, and in select cases, even correlated with a shorter or more productive outcome. We predict that as more evidence-based support for professional development comes to light, more faculty members will feel confident in encouraging their students to participate in such programming. Although further studies are needed to extend these conclusions across disciplines, the authors hope that these data will assuage concerns of faculty and trainees alike. Historically, biomedical sciences faculty expressed similar hesitation toward time spent in first-year laboratory rotations when these were first instituted, yet doctoral time-to-degree tracking at Cornell University revealed no statistically significant lengthening across comparison groups before and after rotations were mandated in 2003 for three graduate fields (S1); rotations are now a widely accepted best practice within the biomedical sciences. We hope that, similarly, our data will provide encouragement for professional development during training to become equally commonplace as an accepted foundation of PhD training.
Over the past decade, academic institutions have increasingly recognized the breadth of careers pursued by doctoral students and the need for interventions and resources to support their future success (42). Many institutions have rapidly incorporated career and professional development training within doctoral programming (43,44). We hope that readers will share the results of our current study with their colleagues, and incorporate experiential learning activities into PhD training programs (3,46,48,50). Although this study focused on doctoral students from biomedical fields, we anticipate that the major conclusions of this study are likely applicable to graduate students and postdoctoral researchers in other STEM fields, as well as to other fields including the humanities, arts, and social sciences.

Table 1c Legend. Participation was recorded at each institution as hours, events, or points. All bivariate analyses contrast control with any dosage (low plus high dosages combined). All dose-response analyses use the grouped definitions for control, low, and high as noted.

Boston University BEST Program:
BU's BEST is based on a classic feedback loop: job market analysis informs program development, which in turn equips trainees with the skills required to join the workforce successfully. Using the Labor•Insight™ software tool developed by Burning Glass Technologies, jobs, job trends, job locations, and the so-called "hard" and "soft" skills required for various career pathways are revealed. With this information in hand, BU's BEST offers activities to equip trainees with skills needed in six broad biomedical career tracks. The programming is designed to enable trainees to reflect on their career interests, explore various career paths, and enhance their skills in preparation for a productive career. Trainees are encouraged to work on an IDP to gain insight into future career possibilities while participating in coursework, workshops, and panel discussions with local professionals to learn more about their options. Once trainees' interests are refined to a particular career track, they can participate in offerings to hone the skills identified by Labor•Insight™. Examples include workshops dedicated to grant writing, data analysis, entrepreneurship, and creating a successful resume, LinkedIn profile, or cover letter. Site visits and internships are also offered for more experiential learning. Finally, one-on-one career coaching is available, and trainees are encouraged to use the alumni mentor network for informational interviews. Taken together, these tools help trainees prepare to pursue their chosen path.

Cornell University Careers Beyond Academia/BEST:
Cornell University's Careers Beyond Academia/BEST provides flexible, experiential, and empowering personalized opportunities for doctoral students and postdocs in all disciplines to make informed choices about their careers. Exposure to career options for PhDs comes via seminars organized in collaboration with department series organizers, workshops, signature "Careers in…" panels, symposia, employer site visits, and courses providing hands-on experience, all with an underlying mentoring component. Partnerships with several trainee-run associations formed and supported include the Cornell Graduate Consulting Club (CGCC), Advancing Science and Policy (ASAP), the Technology & Entrepreneurship Club (TEC), the Engineering Graduate Student Association (EGSA), the Chemical Biology Interface (CBI training grant), and additional student-run organizations that address programming gaps based on iterative feedback. Altogether, we provide group and individual coaching/advising sessions, training, interactions with practitioners, case competitions, practice describing one's expertise in the language of a future employer (or funder) orally and online, and practical advice on researching and obtaining a job using the skills learned. Students and postdocs are awarded funds by application to attend conferences, join professional societies beyond their academic discipline, and create their own activities, which often involve alumni. Embedded in the Graduate School, we partner with the Office of Postdoctoral Studies, Entrepreneurship@Cornell, the Society for the Humanities, Career Services, CU-CIRTL, the Center for Teaching Innovation, and other on-campus groups to cover additional focus areas that we co-develop and co-advertise. Careers Beyond Academia/BEST enhances training opportunities for graduate students and postdoctoral scholars in all fields through an individualized, flexible program that empowers trainees to acquire the knowledge and hone the skills to become more credible candidates for an ideal career outcome.
KEYS TO SUCCESSES. Flexibility: a program that is ready when students and postdocs are, offering opportunities at the dose they are ready to receive, increases both faculty buy-in and trainee empowerment. Personalization: as no two careers are identical, neither will two sets of training needs be; we also encourage and support student/postdoc-initiated ideas. Experiential opportunities: being able to say "I've done that" and write it on a resume. Gaining professorial buy-in: this depends on how we have marketed the program; the skills learned will foster success in any field, including academia, and we are not pushing students to careers beyond academia but rather enabling an informed choice for future success. We offer a resource to alleviate faculty pain points (e.g., if they have no industry experience, or feel they cannot connect trainees with mentors or opportunities in science policy or intellectual property law) and to showcase their successful alumni. We also have a one-on-one mentoring program.

University of California, Irvine GPS-BIOMED Program:
Our program has also added value to research efforts at UCI, e.g., improved scores on training grants and fellowships and new partnerships with industry. The program has also helped with recruitment of the talent pool: 30% of incoming students mentioned that the presence of the GPS-BIOMED program helped them make a positive decision to join UCI. Additionally, based on the alumni survey, a large percentage of alumni attribute their career preparedness and success to the GPS-BIOMED program, increasing alumni engagement.

University of North Carolina at Chapel Hill TIBBS Program:
The Training Initiatives in Biological and Biomedical Sciences (TIBBS) program supplements our trainees' scientific training with the non-bench skills needed to be successful in a wide range of careers through regular professional development programming, student-led career cohorts, workshop series, site visits, and an immersive internship program. Professional development workshops (e.g., fellowship writing, career planning) supplement the career exposure available through the career cohorts and immersive experiences. TIBBS plans events and workshops targeted to doctoral and postdoctoral scientists at specific career stages. For example, TIBBS sponsors the Annual Career Blitz, which brings two dozen scientists from a wide variety of research and research-related careers, in and out of academia, to campus for an afternoon of instruction and networking (typically attended by nearly 200 scientists annually). Career cohorts focus on a wide variety of career pathways, including business and consulting; science policy and outreach; writing and communication; teaching-intensive careers; and academic and research-intensive careers. TIBBS provides structure and support for scientist-led career cohorts that meet monthly to network with invited professionals, share career resources, and report back on informational interviews. Doctoral and postdoctoral scientists gain leadership experience through their groups, and groups frequently collaborate to bring in external scientists whose job duties span interest areas. Workshop series take place two to three times per year, alternating among topics of interest that mirror the cohorts' popular interest areas (e.g., a policy series, communication series, teaching series, and research-intensive careers series), as well as topics spanning career areas (e.g., a leadership series and an industry skills series). Nationally known consultants and speakers anchor the workshop series, supplemented by expert local knowledge.
UNC's proximity to Research Triangle Park situates us to take advantage of multiple immersive learning opportunities, including site visits and internships. Doctoral and postdoctoral scientists go on monthly field trips to local companies, organizations, and non-profits representing nearly all available career options, co-organized with local institutional partners (e.g., the ELITE Consortium). TIBBS represents UNC's long-standing and continuing commitment to professional and career development for our 1,000 biological and biomedical graduate student and postdoctoral scientists.

University of Rochester URBEST Program:
UR's Broadening Experiences in Scientific Training (URBEST) program funds instruction in leadership and professionalism, creates new opportunities for experiential learning through internships and shadowing, and provides training pathways in (1)

Vanderbilt ASPIRE Program:
The ASPIRE Program is designed to empower Vanderbilt's biomedical sciences PhD students and postdoctoral scholars to make well-informed career decisions with confidence. ASPIRE provides PhD students and postdocs (collectively called "trainees") with programs for professional development, career exploration, and career enhancement. Except for a few professional development sessions that are required for all first-year PhD students, all other activities are optional, and trainees choose from among the offerings according to their training stage and career interests. Professional development opportunities include a twice-monthly ASPIRE Postdoctoral Café series for postdocs, an annual half-day ASPIRE to Connect workshop focused on the importance of building professional relationships, a series of career planning sessions for first-year students or more advanced grad students and postdocs, and several non-credit-bearing short courses in communication-related topics. Career exploration activities include a collection of nearly 100 Beyond the Lab interviews with Vanderbilt PhD and postdoctoral alumni discussing their careers (https://medschool.vanderbilt.edu/career-development/beyond-the-lab-see-listen/), a monthly PhD Career Stories seminar series, and an Annual Career Symposium. Career enhancement activities are intended for post-qualifying PhD students and postdoctoral fellows and include opportunities to participate in didactic and experiential modules, to gain hands-on experience through ASPIRE internships, and to gain deeper insight into specific industries through ASPIRE on the Road group field trips to cities with high concentrations of biotech or policy-related employers. For a description of the full range of ASPIRE program features, see the ASPIRE Annual Reports at https://medschool.vanderbilt.edu/careerdevelopment/annual-report/.

Virginia Tech BEST Program:
Virginia Tech's Broadening Experiences in Scientific Training (BEST) Program activities are open to pre- and postdoctoral scientists at any stage of their training, with some activities required by one or more graduate programs. Core offerings include: I. a 2-credit professional development course focused on self-assessment, skill-building, and career pathways for biomedical PhDs (example skills topics include grantsmanship and CV writing, improvisation for science communication, and budgeting); II. Individual Development Planning workshops; III. job simulation workshops delivered by outside professionals from a variety of careers, involving hands-on activities and case studies; and IV. a commercialization/shark-tank module and pitch competition run jointly with biomedical engineering and business faculty. VT-BEST also delivers one-off activities such as networking training preceding professional/scientific events (with VT's Career Center) and workshops on topics such as visualizing data or social media for scientists (with VT's Center for Communicating Science). Virginia Tech's recent partnership with the Roanoke RAMP accelerator provides training and shadowing opportunities in commercialization and start-ups. VT-BEST also facilitates a small number of internship opportunities through industry partnerships and travel awards. Lastly, VT-BEST staff work closely with the Roanoke Graduate Student Association, the Virginia Tech Carilion Student Outreach Program, and individual trainees on the implementation of trainee-driven professional development activities.

Wayne State BEST Program:
Wayne State BEST assists doctoral students in exploring and pursuing a variety of career options. One of the highlights of Wayne State BEST is that doctoral students outside the biomedical disciplines also participate, adding richness to the training experiences. Wayne State BEST has three successive phases: Phase I, exploratory seminars; Phase II, didactic workshops; and Phase III, career exploration/internships. Phase I acquaints students with multiple career options via 90-minute seminars, each exploring one of five career tracks with industry partners, faculty, and alumni whose work intersects with the biosciences and the following areas: undergraduate teaching, law, communication, business/industry, and government. Phase II comprises a series of daylong workshops on the career options identified in Phase I. These workshops serve as a bridge between the Phase I exploratory seminars and the Phase III career exploration/internship experiences. A team of community and industry leaders (including alumni) works with faculty facilitators to design a curriculum focusing on the necessary skill sets for each of the career tracks. Attendees gain additional knowledge through one-on-one exchanges with professionals in these domains. Students learn how their scientific training, problem-solving abilities, and analytical aptitude can be mobilized to successfully address the needs of their desired career. Phase III offers students experiential learning about these career paths through career explorations/internships with private industry, state agencies, nonprofit organizations, or primarily undergraduate institutions. Wayne State offers workshops on constructing an Individual Development Plan (IDP), which is required of all doctoral students.
In addition, the Wayne State Graduate School offers professional development seminars on basic employment skills such as conducting a job search, preparing for an interview, converting a CV to a resume, building a LinkedIn page, and writing a cover letter. Wayne State faculty lead specialized workshops on abstract writing, poster presentation, professional communication in the workplace, and strategies for presenting scientific ideas to non-specialist audiences. The Graduate School established a 1-credit course for graduate students interested in preparing for a career outside of academia. This course uses exercises and assignments to build the professional portfolio necessary for employment in highly skilled positions.

To measure graduate students' time in training, we considered whether to use time to degree or time to defense. One concern was that time to degree might not be a robust measure: it is a blunt instrument that can have delays built in between defending the dissertation and completing additional requirements, as well as delays due to official graduation dates. As a result, one would expect time to defense to be a more granular and sensitive measure for identifying any potential delays due to involvement in professional development activities.

Figure 1a. Months to defense versus binary professional development participation. Blue error bars represent the standard deviation of the mean. The mean is denoted by a red line.

NIH BEST Program:
Following the BEST Data Summit, an internship is defined as working in a professional setting for the purpose of receiving hands-on training. An internship assumes the trainee is able to develop skills during the experience, and it results in a deliverable. An externship, on the other hand, is defined as job-shadowing a professional at work for the purpose of observing and experiencing the work environment and learning about the expectations of a profession (O'Brien et al., in press). In both an internship and an externship, significant time is spent in the professional workplace environment and therefore out of the graduate student's own laboratory. For the purposes of this paper, internships and externships are collectively referred to as internships for consistency (exceptions are noted in figures as applicable).

Boston University BEST Program:
BU's BEST program established relationships with departments and programs within BU and with local employers and nonprofit organizations to develop internship opportunities in diverse career tracks such as business, administration, communication, policy, research, and teaching. Sites submitted a description of the internship, including intern responsibilities, the deliverable and its evaluation, professional development objectives, benefits to the intern, and the assigned mentor. Internships were offered on a rolling basis, varied in both length (from 1 month to 1 year) and time commitment (from 1 to 40 hours/week), and were paid where possible. Applicants must have passed their qualifying exams prior to the start of an internship, have a completed IDP on file, and have attended basic skill-building pre-internship workshops (e.g., Professionalism 101) and internship-specific workshops. Applicants submitted their resume, approval by the research advisor, sign-off on satisfactory academic performance, a personality assessment, and a pre-internship evaluation. Each applicant met individually with the internship director before and after applying to ensure that the trainee's career goals were aligned with the goals of the internship. BU's BEST presented qualified applicants to the internship site. Interns were selected by the employers and developed projects with deliverables set by the internship site. Both interns and sites met with the internship director to evaluate learning objectives at the midpoint and end of the internship.

Rutgers iJOBS Program:
Rutgers trainees who are interested in doing a deeper dive into a particular career track can apply to the Phase 2 cohort once they have completed at least 12 hours of Phase 1 events and completed their qualifying exams. About 20 trainees per year are admitted to Phase 2 and are matched by the iJOBS program directors with a professional in their area of interest for an externship/shadowing experience. The trainees spend time in the professional's workplace sitting in on meetings and observing their activities for a total of 72 hours spread out over the course of a semester.

University of Chicago myCHOICE Program:
A key goal of myCHOICE Experience programming is to provide real-world, practical experience in a specific career field. Internships, defined as "hands-on opportunities of limited duration (weeks to months)," are an important component of this training experience. myCHOICE collaborates with on- and off-campus partners to develop a diverse array of internships, ranging from scientific writing to investment banking to marketing and program development. All myCHOICE internships are unpaid and part-time (10 hours per week), lasting approximately 10 weeks. Trainees interested in internships must apply and receive permission from their PI. Interns who are graduate students formally register for the internship and receive academic credit.

University of North Carolina at Chapel Hill TIBBS Program:
UNC's ImPACT Internship program consists of a 160-hour internship, typically completed either in one month of full-time effort or over two to three months part-time, in the career field of choice (industry research and development, science policy, teaching, museums/outreach, startups, etc.). This capstone experience is available to UNC graduate students on a competitive basis. Approximately 25-30 interns per year are selected to participate based on training stage (comps/quals completed to reach candidacy, often the fourth or fifth year of training); research status (appropriate progress toward, or completion of, a first-author publication for the training stage); career exploration and professional development training; and selection of a career path with competitive skills appropriate to the field selected. Interns are paid at their current stipend or salary rate. Graduate students must have passed their qualifying exams, and all scientists must have the written support of their faculty mentor in order to apply.

University of Rochester URBEST Program:
URBEST developed a flexible experiential learning program that included long-term internships (full-time, up to three months), short-term internships (hours per week), shadowing experiences (a couple of days total), and volunteer opportunities (<4 hours per week). Internships took place within the University of Rochester at core facilities (e.g., the Office of Regulatory Support, the Upstate Stem Cell Good Manufacturing Practice (GMP) Facility, the Flow Cytometry Core) or within the city of Rochester (e.g., the Rochester Museum and Science Center, Litron Laboratories). The majority of URBEST internships, however, took place in other cities (e.g., Entasis Therapeutics, the US Food and Drug Administration (FDA), Pfizer Vaccines). To be eligible for an internship, the learner must have been officially enrolled in URBEST as a trainee for at least 6 months, have collected ~40-60 points through the program, have passed their qualifying exam, and have the permission of their PI. The graduate student was also required to have a first-author publication, or multiple publications if they were not first author. The URBEST program disseminated internship opportunities as they became available, posting them first to our URBEST LinkedIn Group as a benefit of enrolling in the program. While undertaking their internship, all trainees needed to be registered as PhD graduate students. If a graduate student was on some type of training grant or fellowship, they needed to discuss their stipend and training opportunity with their program officer to get approval for the internship. It is up to the program officer whether an internship contributes to graduate student training; during the URBEST program, all program officers approved graduate student internship requests. Most internships were set up by the trainee using a "cold email" technique to arrange informational interviews, which often led to experiential opportunities. A few trainees found well-established internships to which they could also apply (e.g., Scientific American, Bayer Global Regulatory Affairs, the White House Office of Science and Technology Policy).

Vanderbilt ASPIRE Program:
Vanderbilt University's ASPIRE program has established relationships with local employers and several national nonprofit organizations to develop part-time internship opportunities in a range of career areas, including data science, college teaching, nonprofit management, business development, marketing, science policy and advocacy, and science outreach. ASPIRE internships are paid where possible, part-time (usually 6-8 hours per week), and generally last 10-12 weeks. Internships are offered on a rolling basis according to employer need and desired timing. All PhD student interns must have passed their qualifying exams prior to the start of an internship, and each applicant meets individually with our office staff prior to applying to ensure that the trainee's career interests are aligned with the goals of the internship. Interns are selected by the employers, and interns are expected to contribute to one or more projects during their internship. Since the inception of the ASPIRE internship program in 2015, nearly 100 trainees have completed internships, about 65% of whom have been PhD students.

Supplemental File 4. Publication reporting and metric development
Publication data collection procedures

First-author and co-first-author publications were included in the first-author publication count.
Publications and their metadata were primarily collected through a Python script built to query the PubMed API (47), in combination with manual verification (see SI Table 4a). By automating the PubMed search process, the script allowed for replication and validation of publication data across multiple institutions and implementation of the Cross-Institutional Instructions. Manual checking was used for institutions that could not access PubMetric results for technical reasons (one of ten institutions), or existing data from a graduate school survey were used (one of ten institutions).
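The script itself is not reproduced here, but a minimal sketch of the kind of query such a tool might issue against NCBI's public E-utilities endpoints is shown below. The function names, query shape, and sample response are illustrative assumptions, not the authors' actual implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_esearch_url(author, affiliation, db="pubmed", retmax=200):
    """Build an ESearch URL for one trainee's publications.

    Combines an author-field term with an affiliation-field term, the
    kind of query a publication-tracking script might issue before
    manual verification. (Illustrative; not the authors' script.)
    """
    term = f"{author}[Author] AND {affiliation}[Affiliation]"
    params = urlencode({"db": db, "term": term, "retmax": retmax,
                        "retmode": "xml"})
    return f"{EUTILS}/esearch.fcgi?{params}"

def parse_esearch_result(xml_text):
    """Extract the hit count and PubMed IDs from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    pmids = [e.text for e in root.findall(".//IdList/Id")]
    return count, pmids

# Abridged example of an ESearch response, parsed without a network call:
sample = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>31452104</Id><Id>29298432</Id></IdList>
</eSearchResult>"""
count, pmids = parse_esearch_result(sample)
```

In practice, the returned PubMed IDs would then be fed to EFetch to retrieve full metadata (authors, journal, publication type) for classification, with ambiguous author matches resolved by the manual checking described above.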

Publication metric
The metric is a weighted calculation over different types of publications, with weights determined by polling 375 active training faculty at UNC about the relative value they place on different publication types. Respondents (n=120) were asked to assign relative values to each publication type. This allows for the use of a single publication metric rather than depending on multiple measures, and may be especially useful when simplicity or overall trends are of most interest. This method of evaluating productivity is an alternative to attempting to assign credit to various flagship journals by name (which can be difficult to capture across fields) or to impact factor measures (which are controversial); it provides an independent estimate of productivity based on role/contribution to each work while also accounting for type of publication.
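As a sketch of how such a metric collapses per-type publication counts into a single score, consider the following. The weights here are hypothetical placeholders; the actual values were derived from the faculty survey and are not reproduced in this supplement.

```python
# Hypothetical weights by publication type; the real weights came from
# the UNC faculty survey described above and are not shown here.
WEIGHTS = {
    "first_author_research": 1.0,
    "middle_author_research": 0.5,
    "first_author_review": 0.6,
    "middle_author_review": 0.3,
}

def weighted_publication_metric(pub_counts, weights=WEIGHTS):
    """Collapse a trainee's publication counts by type into one score.

    pub_counts maps publication type -> number of publications; types
    without a survey weight contribute nothing.
    """
    return sum(weights.get(kind, 0.0) * n for kind, n in pub_counts.items())

# Two first-author and three middle-author research papers:
score = weighted_publication_metric(
    {"first_author_research": 2, "middle_author_research": 3}
)  # 2*1.0 + 3*0.5 = 3.5
```

A single weighted score like this is what allows the bivariate and dose-response comparisons to use one productivity outcome per trainee.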
Sample survey

SURVEY: Faculty survey to create a publication productivity rating

As part of a project to examine graduate student productivity, we need your input. Your response will be used to develop a new metric to represent trainee publication "productivity" as a single quantitative measure.

Please rate the relative value you give to each publication type when evaluating trainee productivity.

Q: How would you value the contribution of a candidate with each of the following publication types? (1 - Not valuable ...)

Thank you for helping to develop a new metric for assessing graduate student productivity. We will share the survey results with all respondents after the survey closes on 12/6/17.

SI Figure 4a. Weighted publication metric versus binary professional development participation. Blue error bars represent the standard deviation of the mean. The mean is denoted by a red line. Significant p-values (<0.05) are denoted in red, whereas non-significant differences are denoted in black, for each independent samples t-test.