Unreviewed science in the news: The evolution of preprint media coverage from 2014-2021

It has been argued that preprint coverage during the COVID-19 pandemic constituted a paradigm shift in journalism norms and practices. This study examines whether, in what ways, and to what extent this is the case using a sample of 11,538 preprints posted on four preprint servers (bioRxiv, medRxiv, arXiv, and SSRN) that received coverage in 94 English-language media outlets between 2014 and 2021. We compared mentions of these preprints with mentions of a comparison sample of 397,446 peer reviewed research articles indexed in the Web of Science to identify changes in the share of media coverage that mentioned preprints before and during the pandemic. We found that preprint media coverage increased at a slow but steady rate pre-pandemic, then spiked dramatically. This increase applied only to COVID-19-related preprints, with minimal or no change in coverage of preprints on other topics. In addition, the rise in preprint coverage was most pronounced among health and medicine-focused media outlets, which barely covered preprints before the pandemic but mentioned more COVID-19 preprints than outlets focused on any other topic. These results suggest that the growth in coverage of preprints seen during the pandemic period may imply a shift in journalistic norms, including a changing outlook on reporting preliminary, unvetted research.

On January 10, 2020, the World Health Organization published its first set of guidelines for preventing and controlling a suspected "novel coronavirus (nCoV)" (WHO, 2020). Journalists soon found themselves plunged into an unexpected crisis, with an out-of-control, little understood infectious disease and an influx of new scientific information to sift through and report on. With little peer reviewed literature to go on, especially in the early stages of the pandemic, many turned to preprint servers for urgent new information to share with the public (Fraser et al., 2021). The ensuing media coverage of preprints has since been described as a complete rupture from previous reporting practices (e.g., Burke, 2021; Makri, 2021). Yet, empirical evidence supporting this assertion is lacking. As noted in previous research, there is currently an absence of longitudinal investigations that examine preprint coverage over time and assess the impact of COVID-19 on journalistic practices and norms (Fleerackers et al., 2023). This study fills this gap by examining how media coverage of preprints has evolved, both qualitatively and quantitatively, in the lead-up to and during the first year of the COVID-19 pandemic. Using Altmetric data, it examines changes in the volume and nature of media coverage of 11,538 preprints posted between 2013 and 2021 on bioRxiv, medRxiv, arXiv, and SSRN, four of the most actively used servers for sharing COVID-19-related research (Waltman et al., 2021).

2.0 Literature review
2.1 Preprint media coverage before and during the COVID-19 pandemic
Preprints have been used extensively in physics, math, and computational science since arXiv launched in 1991. However, scientists in the biological and medical fields have been more reluctant to adopt them, at least until recently (Puebla et al., 2021). The early months of the pandemic saw a sharp increase in the volume of available COVID-19-related preprints (Funk, 2023; Horbach, 2020), with … Vergoulis et al., 2021). One study (Kousha & Thelwall, 2020) found that preprints posted to arXiv, bioRxiv, medRxiv, and SSRN comprised 13.26% of the COVID-19 literature during March-April 2020, while an analysis by Fraser et al. (2021) found that preprints posted to 16 servers (including the four examined in this study) comprised almost 25% of the COVID-19-related research available from January-October 2020.
COVID-19-related preprints also gained traction within news media, receiving coverage in diverse media outlets around the world (Fleerackers et al., 2021; Massarani et al., 2021; Massarani & Neves, 2021; Simons & Schniedermann, 2023; van Schalkwyk & Dudek, 2022). One study found that more than a quarter of COVID-19-related bioRxiv and medRxiv preprints were mentioned in at least one media story during the pandemic, while only about 1% of those on other topics received media coverage (Fraser et al., 2021). Some journalists reported adopting novel practices to report on these unreviewed studies, something they said they had never done before (Fleerackers et al., 2022a; Massarani et al., 2021). This media coverage of preprints during the COVID-19 pandemic has been described by some journalists as a "paradigm shift" (Fleerackers et al., 2022a). Yet, while studies conducted during the pandemic provide important evidence of how journalists covered preprints during this evolving health crisis, little is known about whether journalists have covered preprints on other topics or in other communication contexts. For example, Fraser et al.'s (2021) widely cited study is often described as providing evidence that "During the pandemic, journalists…paid increased attention to preprints" (Kwon, 2021), but the authors did not compare pandemic preprint coverage to pre-pandemic levels. Instead, they provided evidence that COVID-19-related preprints received an outsized amount of media attention relative to those on other topics posted to bioRxiv and medRxiv during the same time period, but not relative to preprints posted during different time periods or on different servers (Fraser et al., 2021). One recent study begins to fill this gap by examining coverage of preprints in seven German newspapers from 2018 to 2021 (Simons & Schniedermann, 2023). The authors identified low and stable rates of coverage leading up to the pandemic, followed by a major surge in 2020 and 2021 that was driven by COVID-19-related preprints. However, it is unclear whether this trend is reflective of other media outlets (e.g., those outside of Germany) and whether there are disciplinary differences in coverage trends.
More broadly, although preprints made up a significant proportion of the COVID-19-related literature available within the first months of the pandemic, it is unclear how media coverage of preprints compares to coverage of peer reviewed research. One article found that the five COVID-19-related research articles that received the most media coverage were all peer reviewed publications; however, the analysis was descriptive and did not compare the volume of preprint coverage to that of peer reviewed papers (Kousha & Thelwall, 2020). Another small study found no significant difference in the amount of media coverage received by medRxiv preprints and peer reviewed publications about COVID-19-related therapies that were posted between February 1 and May 10, 2020 (Jung et al., 2021). Besançon et al. (2021) used Altmetric to examine news coverage of COVID-19-related preprints posted to arXiv, medRxiv, and bioRxiv between January and July 2020, finding that these preprints received more coverage than the non-COVID-19-related preprints posted to arXiv during the same time period. Again, coverage of preprints before the pandemic period was not considered. Fraser et al. (2020) found that bioRxiv preprints submitted between November 2013 and December 2017 received far less media coverage than either their peer reviewed versions or a control set of peer reviewed articles that were never deposited to bioRxiv. Finally, Waltman et al. (2021) found that, although some COVID-19-related preprints were highly reported on, news coverage of peer reviewed literature outstripped coverage of preprints overall. Unfortunately, Waltman et al. (2021) did not report the average attention received per preprint versus per peer reviewed article. However, the authors did examine news coverage received by a sample of high-profile preprints and their corresponding peer reviewed articles. For 45% of these preprint-article pairs, the preprint received more than 20% of the total news attention; for 11% of the pairs, the preprint received more than 80% of the coverage (Waltman et al., 2021). Again, the authors did not compare these findings to rates of coverage before the pandemic.
Collectively, these results provide some of the first evidence that preprints have historically received less media coverage than peer reviewed research and that this trend may have started to shift during the pandemic. However, given the mixed and incomplete body of evidence, several questions remain unanswered. In particular, it is unclear whether the volume of preprint media coverage increased, decreased, or remained relatively stable in the years leading up to the pandemic, information that could help shed light on whether preprint-based media coverage is likely to continue post-COVID-19. It is also unclear whether any changes in coverage seen during the pandemic apply only to COVID-19-related preprints or reflect a change in journalists' willingness to use preprints in general. As such, to examine whether the pandemic has truly introduced a "paradigm shift" in journalistic practice, this study uses a sample of preprints that received coverage in English-language media between 2014 and 2021 to examine the following research questions:
RQ1: Has the share of preprint coverage in the media increased during the COVID-19 pandemic?
RQ2: Do changes in media coverage of COVID-19-related preprints extend to coverage of preprints on other topics?

Preprint media coverage in an evolving media landscape
It is also unclear from previous research which types of media outlets have driven media coverage of preprints and whether this has changed as a result of the pandemic. Journalism has evolved in important ways in the years leading up to the COVID-19 crisis, with financial pressures, shrinking … (Saari et al., 1998; Schäfer, 2017). These declines have likely influenced the amount of media coverage that research articles, including preprints, receive; outlets specializing in science appear to cover more research than general interest publications (Wihbey, 2017). In addition, an array of actors who have historically been considered "peripheral," or outside of journalism, have entered the field, including bloggers, news aggregators, and other alternative outlets (Hermida, 2019; Schapals, 2022; Stocking, 2019). These peripheral actors may not always adhere to the established norms and practices that shape media coverage at traditional (or "legacy") outlets (e.g., Harrison et al., 2020; Hurley & Tewksbury, 2012), which may affect how or whether they cover preprints. For example, journalists working at peripheral outlets may not be expected to follow professional journalism resources, such as the AP Style Guide, which recommend avoiding research that has not been peer reviewed (Froke et al., 2020; Haelle, 2020). Yet, both peripheral and legacy outlets actively covered COVID-19-related preprints during the early months of the pandemic (Fleerackers et al., 2021). Similarly, outlets that publish content but are not considered journalism, such as university websites and press release distribution services, may also contribute to mobilizing preprint research. For example, the Science Media Centre in Germany, a non-journalistic outlet that provides science journalists with access to research and expert perspectives, began sharing roundups of newly posted preprints during the pandemic (Broer, 2020; Broer & Pröschel, 2022). Again, however, any evidence about the nature of non-journalistic outlets reporting on preprints is limited to the pandemic period. As such, our third research question asks:
RQ3: Have changes in media coverage of preprints occurred similarly across media outlets?

To identify media coverage, this study relies on data from Altmetric,¹ a company that tracks mentions of research outputs across a range of digital media, including news media. Research suggests that Altmetric's "Mainstream Media" category is a relatively reliable source of data, but only when working with a predefined list of English-language media outlets (Fleerackers et al., 2022a; Ortega, 2020a, 2020b). In addition, because Altmetric regularly updates both the list of media outlets² and the research outputs³ it tracks, the volume of media coverage it collects may vary over time in ways that are unrelated to actual changes in news reporting. For these reasons, we decided to gather two datasets:
1. A primary dataset comprising news mentions of bioRxiv, medRxiv, arXiv, and SSRN preprints.
2. A comparison dataset comprising news mentions of peer reviewed research indexed in the Web of Science (WoS).

Identifying and characterizing media outlets that frequently cover research
Data were queried from local snapshots of the Web of Science and Altmetric databases housed at the Observatoire des sciences et des technologies (OST).⁴ Data filtering and cleaning were performed using the Python pandas package (The Pandas Development Team, 2023). To identify our predefined list of media outlets, we queried a snapshot of the Altmetric database from June 3, 2021, for news mentions of all WoS research outputs associated with a digital object identifier (DOI). We restricted our search to mentions of research outputs that had been published in 2013 or later and that were mentioned in news stories between January 1, 2014, and June 3, 2021. We then filtered for outlets that consistently cover a high volume of research, defined for the purposes of this study as outlets that mentioned at least 100 WoS research items per year from 2014 to 2020. We manually checked the resulting 128 media outlets by visiting the URLs for their home pages provided by Altmetric. After excluding 25 outlets that were not written in English, five that were not tracked by Altmetric from 2021 to 2022 (e.g., because they had changed their domain names), three whose URLs did not resolve, and one with all misidentified mentions, we were left with a final sample of 94 outlets.

¹ altmetric.com
² https://help.altmetric.com/support/solutions/articles/6000235999-news-and-mainstream-media
³ https://www.altmetric.com/about-our-data/how-it-works-2/
⁴ https://www.ost.uqam.ca/
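The outlet-selection rule (at least 100 WoS items mentioned in every year from 2014 to 2020) can be sketched in pandas. This is a minimal illustration, not the authors' code: the column names `outlet`, `mention_date`, and `doi` and the function name `high_volume_outlets` are hypothetical, and we assume "items" means distinct DOIs rather than raw mention counts.

```python
import pandas as pd

def high_volume_outlets(mentions: pd.DataFrame, min_per_year: int = 100,
                        years=range(2014, 2021)) -> list:
    """Return outlets that mentioned at least `min_per_year` distinct WoS
    items (DOIs) in every year of `years` (2014-2020 inclusive by default).

    Hypothetical columns: outlet, mention_date, doi.
    """
    years = list(years)
    df = mentions.assign(year=pd.to_datetime(mentions["mention_date"]).dt.year)
    # Distinct research items mentioned per outlet per year; years with no
    # mentions at all are filled with 0 so they fail the threshold.
    counts = (df[df["year"].isin(years)]
              .groupby(["outlet", "year"])["doi"].nunique()
              .unstack(fill_value=0)
              .reindex(columns=years, fill_value=0))
    # Keep outlets that meet the threshold in every single year.
    keep = counts[(counts >= min_per_year).all(axis=1)]
    return sorted(keep.index)
```

Requiring the threshold in every year (rather than on average) is our reading of "consistently cover a high volume of research"; an average-based rule would admit outlets with bursty coverage.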
Next, we applied a coding protocol adapted from Hermida and Young (2019) to characterize the nature of these media outlets. We analyzed each outlet's main topical focus (e.g., science and technology, health and medicine, general news, etc.) and assessed whether it was best described as legacy journalism (i.e., staffed by professional journalists who adhere to traditional journalistic norms), peripheral journalism (i.e., staffed by individuals who have traditionally worked outside of journalism and who adhere to emerging or alternative norms), or non-journalism (i.e., organizations such as universities, press release services, or academic journals that do not produce journalism). A detailed version of the coding protocol, including examples, is available from Fleerackers and Fagan (2022).
Coding was performed by researchers with professional journalism experience: the lead author and a research assistant who was not aware of the study objectives (cf. Hermida & Young, 2019). The two coders independently explored the media outlets' websites, examining their content, Mission Statement, and, if available, other relevant pages (e.g., Masthead, Editorial Guidelines, Code of Conduct). The coders compared their coding and resolved any discrepancies through discussion and, if needed, by consulting an outside researcher (also a former journalist). Such double coding approaches are appropriate when data are not very numerous (Krippendorff, 2004), as in the present study. Results of the final coding are reported in aggregate in Table 1; coding for the full list of outlets is available in Alperin, Fleerackers, et al. (2023).

Gathering news mentions of preprint research
We queried Altmetric for mentions of bioRxiv, medRxiv, arXiv, and SSRN preprints in stories published by the 94 outlets since January 1, 2014. This yielded a total of 40,039 mentions of 15,041 preprints across 31,258 news stories. For each of these preprints, we gathered the publication dates from the arXiv and Crossref APIs using the Python arxiv and habanero packages (Chamberlain, 2020; Schwab, 2021).
Next, because previous research suggests publication date metadata can often be incorrect or incomplete (Haustein et al., 2015), we manually checked subsamples of our data and compared the publication dates provided by Crossref, the arXiv API, and Altmetric. The most reliable publication date for each server was retained for analysis. For bioRxiv and medRxiv, this was the DOI creation date (i.e., the date that the DOI for the preprint was deposited in Crossref); for arXiv, it was the date provided by the arXiv API; and for SSRN, it was either the "first posted on" date provided by Altmetric or the Crossref DOI creation date, whichever came first. We removed 3,619 mentions of preprints that were published before 2013, as these publication dates were particularly unreliable (perhaps because Altmetric started tracking mentions partway through 2012 and thus has incomplete data for previously published outputs).⁵ Even after excluding these preprints and selecting the most reliable publication date for each server, we noted that some publication dates were still off by a few days.

We made several further exclusions to ensure that the mentions in our dataset were mentions of true preprints (i.e., rather than postprints or published versions of preprints). First, we removed 165 mentions of postprints, which we defined as preprints that were posted on or after the date their published versions appeared. Because, as mentioned above, publication dates for preprints were often incorrect by a few days, we excluded an additional 332 mentions of preprints with a publication date within seven days of the published version's publication date (i.e., suspected postprints). We also removed 327 mentions of preprints in news stories that were published before the preprint was first posted, using a five-day cutoff to allow for the slight inconsistencies we identified in the publication metadata. Because Altmetric does not disambiguate between preprints and published versions for some preprint servers⁶,⁷ and may thus erroneously include some mentions of peer reviewed research, we removed 3,547 mentions in news stories published after the peer reviewed version of the preprint was published, again using a five-day margin. While this approach may have removed some true mentions of preprints, these false removals are likely limited, as it is relatively uncommon for news stories to mention research outputs more than a few weeks after initial publication (Maggio et al., 2017). Finally, we removed an additional 1,021 duplicate news mentions (where the same preprint was mentioned in the same story more than once). In total, filtering led to the exclusion of 9,011 mentions (22.5% of the original dataset). The code used for filtering has been made publicly available (Alperin, Shores, et al., 2023). Our final preprint sample comprised 31,028 mentions of 11,538 preprints by the 94 outlets in our sample (Alperin, Fleerackers, et al., 2023).

We downloaded all the mentions of WoS research from our 94 outlets, resulting in 1,657,202 mentions of 466,138 distinct research outputs. From these, we filtered 156,187 mentions of research articles that were published prior to 2013, 579 mentions that were already included in the preprint data, and 14,48… duplicate mentions (where an article was mentioned in the same news story more than once). In total, filtering led to the exclusion of 170,669 mentions (10.3% of the original dataset).
The final published research sample comprised 1,486,533 mentions of 397,446 distinct peer reviewed research outputs by the 94 outlets (Alperin, Fleerackers, et al., 2023).
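The mention-level exclusions for preprints can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the authors' filtering code (which is in Alperin, Shores, et al., 2023): the function name and the columns `preprint_id`, `story_id`, `preprint_date`, `published_date` (missing when no published version exists), and `story_date` are hypothetical, and the direction of each five-day margin is our reading of the text.

```python
import pandas as pd

def filter_preprint_mentions(m: pd.DataFrame, postprint_window_days: int = 7,
                             margin_days: int = 5) -> pd.DataFrame:
    """Drop mentions unlikely to refer to true preprints (hypothetical columns)."""
    pre = pd.to_datetime(m["preprint_date"])
    pub = pd.to_datetime(m["published_date"])  # NaT when never published
    story = pd.to_datetime(m["story_date"])
    window = pd.Timedelta(days=postprint_window_days)
    margin = pd.Timedelta(days=margin_days)

    # Postprints and suspected postprints: posted on/after the published version,
    # or within `window` days before it. Comparisons with NaT evaluate to False.
    postprint = pre >= pub - window
    # Assumed reading: story appeared more than `margin` days before posting.
    too_early = story < pre - margin
    # Assumed reading: story appeared more than `margin` days after publication.
    too_late = story > pub + margin

    kept = m[~(postprint | too_early | too_late)]
    # Same preprint mentioned more than once in the same story.
    return kept.drop_duplicates(subset=["preprint_id", "story_id"])
```

Folding the "on or after" postprint rule and the seven-day suspected-postprint rule into one comparison keeps the sketch short; counting each rule separately, as the text reports, would require evaluating the masks in sequence.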

Identifying news mentions of COVID-19 research
To identify COVID-19-related preprints and WoS outputs, we searched for the presence of the following COVID-19-related keywords in the outputs' titles using R version 4.3.0 (2023): coronavirus, covid-19, sars-cov, sars-cov-2, ncov-2019, 2019-ncov, hcov-19, sars-2, pandemic, covid, Severe Acute Respiratory Syndrome Coronavirus 2, 2019 ncov. These keywords were a combination of those used by Fraser et al. (2021) and those listed in the National Library of Medicine's search strategy for identifying COVID-19-related literature (Chen et al., 2020). We also added the term "pandemic," which was not included in either of these lists but is likely used in many COVID-19 titles. As some keywords (e.g., "pandemic") may have been used in non-COVID-19 contexts, we also restricted COVID-19-related research to outputs published in 2020 or later.
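The title-matching rule can be sketched in Python (the authors report using R). The helper name `is_covid_related` is hypothetical, and case-insensitive substring matching is an assumption about how the keywords were applied.

```python
import re

# Keywords as listed in the text; case-insensitive substring matching is assumed.
COVID_KEYWORDS = [
    "coronavirus", "covid-19", "sars-cov", "sars-cov-2", "ncov-2019",
    "2019-ncov", "hcov-19", "sars-2", "pandemic", "covid",
    "severe acute respiratory syndrome coronavirus 2", "2019 ncov",
]
_COVID_PATTERN = re.compile("|".join(re.escape(k) for k in COVID_KEYWORDS),
                            re.IGNORECASE)

def is_covid_related(title: str, pub_year: int) -> bool:
    """Flag an output as COVID-19-related if its title contains any keyword
    and it was published in 2020 or later (hypothetical helper)."""
    return pub_year >= 2020 and bool(_COVID_PATTERN.search(title or ""))
```

For example, `is_covid_related("SARS-CoV-2 spike protein dynamics", 2020)` returns `True`, while `is_covid_related("Influenza pandemic preparedness", 2019)` returns `False` because of the year filter. The filter is only a partial guard: `is_covid_related("Mortality during the 1918 pandemic", 2021)` also returns `True`.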

Statistical analyses
Statistical analysis was performed using Stata version 17 (StataCorp, 2021). The Stata script used for the following analysis has been made publicly available (Alperin, Shores, et al., 2023).
Throughout our analyses, we examined changes in preprint media coverage in terms of proportions, … identifying research mentions during the study period, rather than the result of changing journalistic practices. For ease of reading, we use the term "share of preprint mentions" to refer to the proportion of all research mentions that focused on preprints and "share of WoS mentions" to refer to the proportion that focused on WoS research.
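To make the "share of preprint mentions" concrete, the sketch below computes it per calendar year as the percentage of all research mentions (preprints plus WoS) that cite a preprint. The column names `story_date` and `is_preprint` and the function name are hypothetical; the study's own estimates come from the regression models, not from this simple tabulation.

```python
import pandas as pd

def share_of_preprint_mentions(mentions: pd.DataFrame) -> pd.Series:
    """Percentage of all research mentions in each year that cite a preprint.

    Hypothetical columns: story_date, is_preprint (True for preprints,
    False for WoS outputs).
    """
    year = pd.to_datetime(mentions["story_date"]).dt.year
    # The mean of a 0/1 indicator is the proportion; scale to percent.
    return mentions["is_preprint"].astype(float).groupby(year).mean() * 100
```

Working with a proportion rather than raw counts means that changes in how many outlets or outputs Altmetric tracks affect the numerator and denominator alike.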
To answer RQ1, we created a model (Equation (1)) to estimate the degree to which medRxiv and COVID-19 contributed to changes in the volume of media coverage of preprints after 2019.
Disentangling any change in preprint coverage due to the launch of the server from the onset of the pandemic was necessary, as the creation of medRxiv in 2019 (Kaiser, 2019) coincided closely with the start of the COVID-19 era. As such, in Equation (1), we regressed a binary indicator, coded as 1 if the media mention referenced a preprint and 0 otherwise, against time, encoded as linear days since January 1, 2014, and allowed to be identified with 3rd-order polynomial trends, with the vector of 3rd-order polynomial terms estimated in both the pre-COVID-19 era and the COVID-19 era (the COVID-19 indicator interacted with the vector of time trends). We differentiated pre-COVID-19 from COVID-19 era mentions through a binary indicator, coded as 1 if the preprint was mentioned in a media story published after January 10, 2020 (i.e., when the WHO first used the term "2019-nCoV" to describe the novel coronavirus; WHO, 2020), and 0 otherwise. We modeled the period between the first media mention of a medRxiv preprint (on July 23, 2019, about one month after the launch of the site in June 2019) and the WHO's statement as a linear intercept shift. In practice, this variable allowed us to differentiate the change in preprint mentions that occurred with the introduction of medRxiv before (but close to) the onset of COVID-19 from the effect of COVID-19 itself. Similarly, we modeled the mentions of preprints with titles that included COVID-19-related language (i.e., "sars…") …, where the key difference is that we identify pandemic-era changes separately for mentions without and with COVID-19-related titles, for the four preprint servers (j = 1 through J = 4), the four outlet topics (j = 1 through J = 4), and the three outlet types (j = 1 through J = 3).
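One way to write Equation (1) consistent with this description is shown below. The notation is assumed for illustration and may differ from the authors' own, and we read the model as a linear probability model (so coefficients are in percentage points when multiplied by 100), which is also an assumption:

```latex
y_i = \alpha
    + \sum_{k=1}^{3} \beta_k \, t_i^{k}
    + \delta \, C_i
    + \sum_{k=1}^{3} \gamma_k \, C_i \, t_i^{k}
    + \theta \, M_i
    + \varepsilon_i
```

Here y_i = 1 if mention i references a preprint (0 otherwise); t_i is days since January 1, 2014; C_i = 1 if the story was published after January 10, 2020; M_i = 1 for stories published between July 23, 2019, and January 10, 2020 (the medRxiv-only window); β_k are the pre-COVID-19 cubic trend terms, δ and γ_k the COVID-19-era shifts in level and trend, and θ the medRxiv intercept shift.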

Results
4.1 Has the share of preprint coverage in the media increased during the COVID-19 pandemic?
Our models suggest that the annual number and share of preprint mentions increased slowly from 2014 to 2019, then increased dramatically in 2020-2021 (Table 2, Figure 1). However, even during the pandemic period, preprint mentions made up only a small subset of media coverage of research, at less than 5% of all mentions of research. We also saw evidence of a shift in which servers received the most attention during the pandemic. Before the onset of COVID-19, most mentions of preprints cited preprints posted to arXiv or SSRN; yet during the pandemic, bioRxiv and medRxiv became the most frequently mentioned servers. With respect to medRxiv preprints, we found that the onset of COVID-19 increased the share of preprint mentions in the media beyond any increase due to the launch of the server in 2019.
Specifically, we estimate that, prior to the introduction of medRxiv and COVID-19, the share of preprints mentioned in the media was increasing at a glacial pace (an annual rate of 0.21 percentage points; p-value < 0.001; 95% CI [0.13, 0.29]; see solid gray line, Figure 1). When medRxiv was introduced, the share of preprint mentions did not change (estimated decrease = 0.005 percentage points; p-value = 0.957; 95% CI [-0.17, 0.16]). In contrast, the share of preprint mentions increased by an estimated 2.58 percentage points after the onset of the pandemic (p-value < 0.001; 95% CI [2.45, 2.70]; see solid blue line). This significant but modest increase applied to all preprint mentions, but it masks large differences in the proportion of preprint mentions between COVID-19-related and non-COVID-19-related research during the pandemic.
Indeed, our model strongly suggests that preprints played a far greater role in media coverage of COVID-19 specifically than in coverage of other topics. This can be seen from the "COVID-19" line (in fuchsia) in Figure 1, which represents the estimated share of preprint mentions among all mentions of COVID-19-related research (i.e., both preprints and WoS articles that included COVID-19-related language in the title). We estimated an increase in these COVID-19-related preprint mentions of 12.94 percentage points (p-value < 0.001; 95% CI […04]), a large increase relative to predicted preprint mentions based on pre-COVID-19 trends (gray dotted line). We explore coverage of non-COVID-19 preprints in more detail in Section 4.2.
We further tested whether any changes in the share of preprint mentions seen during the pandemic could be linked to changes in mentions of WoS research during this period. We implemented this test by comparing growth rates of media mentions for preprints and WoS research over time. Given that preprint mentions comprised only about 2 percent of all mentions in our sample, and to place preprint and WoS mentions on a common y-axis, we plotted preprint and WoS mentions as growth rates. Growth rates for preprint and WoS mentions were each calculated using the total number of mentions in the first 28 days of our data, beginning on Sunday, January 5, 2014. These mentions in the first 28 days comprised our "base rate," and the total number of mentions in each subsequent 28-day period was then scaled by that base rate.
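The base-rate scaling can be sketched in pandas: daily mention counts are summed into consecutive 28-day windows from the start date, and each window is divided by the first. The function name and the assumption that the input has one row per calendar day (with zeros on days without mentions) are ours.

```python
import pandas as pd

def growth_relative_to_base(daily_counts: pd.Series, start: str,
                            window_days: int = 28) -> pd.Series:
    """Sum daily counts (DatetimeIndex, one entry per day) into consecutive
    `window_days` windows from `start`, then divide by the first window."""
    s = daily_counts.sort_index()
    s = s[s.index >= pd.Timestamp(start)]
    # Window number for each day, counted from the start date.
    offset_days = (s.index - pd.Timestamp(start)).days
    window = pd.Series(offset_days // window_days, index=s.index)
    totals = s.groupby(window).sum()
    # Window 0 is the base period, so it always equals 1.0.
    return totals / totals.iloc[0]
```

For example, daily counts of 1, 2, and 3 over three consecutive 28-day windows starting January 5, 2014, yield growth values of 1.0, 2.0, and 3.0. Because preprints and WoS outputs are each scaled by their own base period, the two series can share a y-axis despite very different absolute volumes.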
Here, we find that the rise in the share of preprint mentions that took place during the pandemic was not simply an artifact of a decrease in WoS mentions. As can be seen from Figure 2, WoS mentions …

Figure 2. … subsequent monthly mentions are relative to this base period. For example, a 28-day media mention count of "2" means that media mentions in that 28-day period were two times larger than media mention counts from March 2, 2014, to March 29, 2014 (i.e., the initial 28-day period). The fuchsia and gray lines indicate 28-day preprint and WoS media mentions, respectively, in this relative metric.

4.2 Do changes in media coverage of COVID-19-related preprints extend to coverage of preprints on other topics?
Our results suggest that the onset of the pandemic not only increased media attention to COVID-19-related preprints but may have also decreased attention to preprints on other topics. Among all research that excluded COVID-19-related language (solid gray line, Figure 1), we found that the share of preprint mentions during the pandemic decreased by 0.18 percentage points, although this decrease was not significant (p-value = 0.129; 95% CI [-0.42, 0.05]). By June 3, 2021, if the pandemic had not occurred, we would have expected the share of preprint mentions to reach 2.58 percentage points (dashed gray line, Figure 1); yet the observed share of non-COVID-19-related preprint mentions comprised only 0.86 percentage points of all media mentions, a difference of 1.71 percentage points from what would have been expected (p-value < 0.001; 95% CI […]). This result suggests that the pandemic may have shifted media attention away from preprints about non-COVID-19-related topics by modest amounts. In effect, our results suggest that COVID-19-related preprint mentions eclipsed pre-pandemic preprint mentions.
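The "expected" share is a counterfactual: the pre-pandemic trend carried forward as if COVID-19 had not occurred. A simplified sketch of that idea fits a cubic to pre-pandemic shares with `numpy.polyfit` and evaluates it at a later date. This is an assumption-laden stand-in, not the study's method: the paper derives its projection from the mention-level regression, whereas this works on aggregated shares, and the function name is hypothetical. Time is best expressed in years so the cubic stays well conditioned.

```python
import numpy as np

def counterfactual_share(t_pre, share_pre, t_target, degree: int = 3) -> float:
    """Fit a polynomial trend (cubic by default) to pre-pandemic shares and
    extrapolate it to `t_target`, in the same time units as `t_pre`."""
    coefs = np.polyfit(np.asarray(t_pre, dtype=float),
                       np.asarray(share_pre, dtype=float), deg=degree)
    return float(np.polyval(coefs, t_target))
```

The gap between the observed share and this projection plays the role of the 1.71-percentage-point difference reported above; cubic extrapolation can swing widely far beyond the fitted range, which is one reason to read such projections cautiously.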
Looking at the number of preprint mentions by server, we observed that there was no increase in non-COVID-19-related preprint mentions in the pandemic for any server (Figure 3). All point estimates were trivially small (about 0.7 to 1.8 fewer mentions per day, on average) and not statistically significantly different from zero (p-values ranged from 0.217 to 0.699). For articles that included COVID-19-related language in the titles, there was an average increase in daily media mentions of bioRxiv and medRxiv preprints of 6.2 (p-value < 0.001; 95% CI [5.34, 6.95]) and 19.2 (p-value < 0.001; 95% CI […08]), respectively, and a significant decrease in average daily media mentions for arXiv and SSRN of 2.7 (p-value < 0.001; 95% CI [-3.75, -1.68]) and 1.6 (p-value < 0.001; 95% CI [-2.69, -0.51]) mentions per day, on average. In total, for the 511 days in the pandemic era in our sample, this amounted to an increase of about 9,800 total mentions of medRxiv preprints and 3,170 total mentions of bioRxiv preprints.
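The period totals follow from multiplying the average daily increase by the number of pandemic-era days. The sketch below reproduces that arithmetic, assuming the era runs from January 10, 2020, to June 3, 2021 (the Altmetric snapshot date), counting both endpoints; the function names are hypothetical.

```python
from datetime import date

def pandemic_era_days(start: date = date(2020, 1, 10),
                      end: date = date(2021, 6, 3)) -> int:
    """Days in the pandemic window, counting both endpoints (an assumption)."""
    return (end - start).days + 1

def implied_total(daily_increase: float, days: int) -> float:
    """Convert an average change in daily mentions into a period total."""
    return daily_increase * days

days = pandemic_era_days()                 # 511
medrxiv_total = implied_total(19.2, days)  # about 9,811 ("about 9,800")
biorxiv_total = implied_total(6.2, days)   # about 3,168 ("3,170")
```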
It is important to note that the declines in mentions of arXiv and SSRN preprints were only significant for preprints that included a COVID-19-related keyword in the title. That is, the media were less likely to mention preprints from these servers that were about COVID-19; instead, when communicating about pandemic research, they tended to mention bioRxiv or medRxiv preprints. These results suggest that the media drew on the servers they expected would house the research most relevant …

Finally, we tested how preprint mentions changed across media outlets with four different topical foci (i.e., General News, Science/Technology, Health/Medicine, Other) or of different types (i.e., legacy, peripheral, or non-journalism). For mentions of COVID-19-related research, we found that outlets in all four topic categories increased their preprint coverage dramatically during the pandemic, but to different extents. Increases ranged from 8.3 percentage points (Science/Technology) to 15.6 percentage points (Health/Medicine) and were all statistically significant (p-value < 0.001 for all coefficients) (Figure 4). Changes in the share of mentions for non-COVID-19-related preprints were trivial, with only the "Other" category seeing a small but statistically significant increase (0.9 percentage points).
Lastly, to provide a better sense of the nature of the outlets that frequently rely on preprints, we identified the 25 media outlets that accounted for the largest share of research mentions in general (i.e., mentions of preprints and WoS outputs) and calculated their share of preprint mentions both before and during the COVID-19 era (Table 6). The list represents about 75% of all research mentions in our sample and includes a mix of legacy media, such as BBC News and The New York Times, and peripheral outlets, such as Reason or The Conversation. Several non-journalism outlets also appear on the list, mostly services such as EurekAlert! and Newswise, which do not publish original articles but distribute science press releases (many of which include mentions of new research). Among outlets that tended to cover a high proportion of preprints in general, the US libertarian magazine Reason stood out, mentioning approximately one preprint for every three WoS outputs, far more than any other outlet in our sample prior to the COVID-19 pandemic. Interestingly, the outlet's share of preprint mentions actually decreased slightly during the pandemic, from 27% to 24%. Among the outlets that saw the largest increase in their share of preprint mentions, the peripheral Health/Medicine outlet News Medical topped the list, with essentially no preprint mentions before the pandemic but a share of 43% during the pandemic. Several major legacy General News outlets, such as BBC News, The Daily Mail, The New York Times, and The Guardian, also saw notable increases in preprint coverage, moving from minimal use of preprints to covering about one preprint for every four or five mentions of research.
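A table like Table 6 can be sketched in pandas: rank outlets by total research mentions, then compute each outlet's preprint share before and after January 10, 2020. The columns `outlet`, `story_date`, and `is_preprint` and the function name are hypothetical; this is an illustration of the tabulation, not the authors' code.

```python
import pandas as pd

# Stories published after this date count as COVID-19 era, per the text.
COVID_START = pd.Timestamp("2020-01-10")

def outlet_preprint_shares(mentions: pd.DataFrame, top_n: int = 25) -> pd.DataFrame:
    """Per-outlet share (%) of research mentions citing preprints, before and
    during the COVID-19 era, for the `top_n` outlets with the most mentions."""
    df = mentions.assign(
        era=(pd.to_datetime(mentions["story_date"]) > COVID_START).map(
            {False: "pre_covid", True: "covid"}),
        is_preprint=mentions["is_preprint"].astype(float),
    )
    # Rank outlets by their total number of research mentions (preprints + WoS).
    top = df["outlet"].value_counts().head(top_n).index
    shares = (df[df["outlet"].isin(top)]
              .groupby(["outlet", "era"])["is_preprint"].mean()
              .unstack("era")
              .reindex(columns=["pre_covid", "covid"]) * 100)
    shares["change"] = shares["covid"] - shares["pre_covid"]
    # Outlets with the largest increase in preprint share come first.
    return shares.sort_values("change", ascending=False)
```

Sorting by the change rather than by the pandemic-era share surfaces outlets like the one described above that moved from essentially no preprint mentions to a substantial share.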
Although some specialized Science/Technology outlets (e.g., Scientific American, Phys.org) increased

A key finding from our analysis is that the volume of preprint media coverage increased roughly fourfold in the pandemic period, a clear break from the slight but steady upward trend that preceded it. Virtually all of this increase was driven by coverage of COVID-19-related preprints, with little change in coverage of preprints on other topics. Although coverage of peer-reviewed research continued to exceed preprint coverage, even during the height of the crisis, the growth in coverage of preprints seen during this period may imply a shift in journalistic norms and practices, including a changing outlook on preliminary, unvetted research and its reporting.
At the same time, however, we observed a slight (but nonsignificant) decrease in coverage of non-COVID-19-related preprints during the pandemic. This lack of coverage of non-COVID-19-related preprints may simply be the result of outsized media attention to COVID-19 in general (i.e., not just COVID-19-related preprints), which may have come at the expense of coverage on other topics. Yet, it could also indicate that the surge in preprint coverage observed during the pandemic was a temporary change: a break from established norms that journalists made to cover a rapidly evolving crisis, rather than a true shift in practice. More research is needed to assess the degree to which increases in preprint

Interestingly, the sharp rise in preprint coverage seen during the pandemic was most pronounced for health and medical outlets, which appear to have been resistant to covering preprints until relatively recently. Outlets specializing in other topics, such as science and technology, covered preprints at least occasionally before the pandemic. For health and medical outlets, however, our findings suggest that the crisis seems to have created something closer to the "paradigm shift" described by journalists in previous research (Fleerackers et al., 2022a). Preprints were barely mentioned in health and medical outlets up until 2019, even after medRxiv was launched. After 2020, however, they became a frequent source of coverage in these outlets, particularly when reporting on COVID-19. Again, more research is needed to assess whether this trend will continue beyond the pandemic.
The factors that motivated health and medical journalists to adapt their practices during COVID-19 also remain unclear. While the medical nature of the crisis likely played a primary role, at least some of this shift may be linked to a parallel shift in preprint use among health and medical scholars themselves. Like journalists, researchers in these areas have historically been hesitant to post or cite unreviewed research (Flanagin et al., 2020; Maslove, 2018), but became active users of preprints during the pandemic (Fraser et al., 2021; Waltman et al., 2021). Since journalists who report on research rely heavily on interviews with scientific experts (Schultz, 2023), changing attitudes toward preprints among medical scientists would likely affect reporting practices on medical and related issues. It is possible, in other words, that the uptake of preprints by medical and health outlets reflects the growing acceptance of preprints within the medical and health sciences. This may also be true of preprint-based journalism more broadly, as preprint adoption also grew during the study period. Waltman et al. (2021) report that the number of preprints in 2020 was about 150% larger than the number of preprints in 2015, while Penfold and Polka (2020), working with data from PubMed and 10 preprint servers, found that the

However, just because more preprints are becoming available does not mean journalists will automatically cover them. By covering preprint science, journalists may potentially be adapting their own norms to follow those of scientists.
In terms of outlet types, we found that both traditional, legacy outlets (e.g., The New York Times) and peripheral media outlets (e.g., News Medical) were covering preprints to some extent before the pandemic, but greatly increased this coverage during the crisis. The similar pattern seen for the two outlet types is surprising. Peripheral media outlets are often conceptualized as following different norms, ethics, and practices than legacy media. They are also seen as less beholden to professional guidelines, such as those that urge journalists to avoid covering unreviewed research (Froke et al., 2020). Our findings thus align with previous scholarship suggesting that the boundaries between legacy and peripheral journalism are blurring and that categorizing outlets this way may no longer be meaningful (Deuze & Witschge, 2018; Witschge et al., 2019). While more research is needed, it is possible that such blurring boundaries are especially likely in contexts where professional norms are not yet well-established, such as when reporting on preprints. Future studies could explore whether the similarities we observed in preprint coverage among peripheral and legacy outlets also apply to larger and more diverse outlet samples, or to other situations in which journalistic practices are rapidly evolving.
Collectively, our findings provide some of the first evidence that journalists are increasingly using preprints, at least in some areas, and that the pandemic has greatly accelerated this use.
However, this conclusion should be considered alongside several limitations. First, there are known challenges of working with Altmetric data to identify media coverage of research, particularly in languages other than English (see Ortega, 2020a, for a review). We attempted to mitigate these challenges by working with a predefined set of English-language media outlets, as recommended by Fleerackers et al. (2022b). Yet, while the restricted nature of our sample of outlets is a strength of this study, it is also a limitation. The patterns we observed among these 94 outlets may not apply to outlets that report on research less frequently or in other languages. Replicating our findings with a larger set of outlets or through complementary data gathering methods would be a fruitful avenue for future research.
We also aimed to make our findings more robust by contextualizing any increases in preprint media coverage alongside changes in coverage of peer-reviewed research during the same time period. To do so, we relied on Web of Science data. These data are biased towards studies from scientific, technical, and medical disciplines and towards studies published in English-language journals from the Global North (Alperin et al., 2014; Mongeon & Paul-Hus, 2016). Given our study's focus on English-language media outlets, the impact of the language bias is likely minimal (i.e., it is relatively unlikely that a journalist working for an English-language outlet would cover non-English research). However, the disciplinary and geographic biases are limitations of our study that should be kept in mind when interpreting the results.

Finally, the nature of our data only enabled us to explore changes in preprint media coverage from 2014 through the first year and a half of the pandemic, leaving many questions unanswered about what the future will bring. We hope that scholars will build on our findings to provide further insight into the implications of the preprint coverage seen during COVID-19 and into whether it will persist long-term.
as medRxiv and bioRxiv becoming key disseminators of pandemic research (Els 2020;

and changes to the digital communication landscape contributing to declines in specialized science journalism around the world

for arXiv and SSRN sometimes differed from the dates visible on the server web page by a few days, a limitation that we kept in mind during data cleaning and analysis.
news mentions of peer-reviewed research counts. Specifically, we compared mentions of preprints against mentions of all research in our sample (i.e., mentions of preprints and WoS research). Doing so allowed us to control for any fluctuations in the volume of preprint mentions that were created by changes in Altmetric's approach

or a related term) as a linear intercept shift. This last variable is important, as it allowed us to differentiate two kinds of change: the change in preprint mentions for COVID-19-related topics in the media, and changes in mentions of preprints that were posted in the COVID-19 era but were not about COVID-19 topics. Lastly, to adjust for seasonality and periodicity effects, we controlled for week-of-year intercepts (e.g., first week of 2014) and day-of-month effects (e.g., Tuesdays in January). In practice, controlling for periodicity and seasonality had little effect on model parameters but allowed us to rule out correlations between period effects and the onset of COVID-19.

Equation (1)

Next, we estimated separate regressions that allowed us to test whether changes in preprint mentions varied across (a) preprint servers, (b) media outlets focused on different topics, and (c) media outlets of different types. Because server categories apply only to preprint mentions, we discarded mentions of articles from WoS for this analysis. We then collapsed the data so that we could observe counts of preprint mentions by day and identify any changes in these counts among the four servers (RQ2). To identify changes among the four media outlet topics and three outlet types, respectively, we kept the data as described previously, with each row representing a unique media mention of a preprint or WoS article. To identify changes in the share of preprint mentions across the four media outlet topics (RQ3), we focused on the three most prevalent topics in our sample (Health/Medicine, General News, and Science/Technology). We also included an "Other" category that covered a variety of other topics (e.g., Business, Lifestyle, Explicit Point-of-View).

Because we are exploring heterogeneity across servers, topics, and types, we simplified regression Equation (1) by replacing the linear, quadratic, and cubic time variables with month-by-year fixed effects (Equation (2) below). Like the time variables in Equation (1), these fixed effects control for time trends, but they do so non-parametrically, without the need to directly identify the time effects (i.e., these are partialled out of the regression equation as "nuisance parameters"). Our estimation equation for these heterogeneous changes therefore appears as follows:
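The idea of partialling month-by-year fixed effects out of the regression can be illustrated with the Frisch-Waugh-Lovell result. Demeaning the outcome and a regressor within each month-by-year cell, then regressing one on the other, recovers the same slope as including a dummy variable for every cell. The sketch below is a simplified, one-regressor illustration in plain Python. It is not our Equation (2); the function name, the single indicator regressor, and the toy data are all hypothetical.

```python
from collections import defaultdict


def within_slope(y, x, groups):
    """Slope of y on x after absorbing one fixed effect per group.

    Demeaning y and x within each group (here, a month-by-year cell) and
    regressing demeaned y on demeaned x gives the same slope as adding a
    dummy variable for every group (Frisch-Waugh-Lovell theorem).
    """
    # totals[g] = [sum of y, sum of x, number of observations]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, xi, g in zip(y, x, groups):
        t = totals[g]
        t[0] += yi
        t[1] += xi
        t[2] += 1

    num = den = 0.0
    for yi, xi, g in zip(y, x, groups):
        sum_y, sum_x, n = totals[g]
        y_dev = yi - sum_y / n  # deviation from the group mean of y
        x_dev = xi - sum_x / n  # deviation from the group mean of x
        num += x_dev * y_dev
        den += x_dev * x_dev
    if den == 0:
        raise ValueError("x does not vary within any group")
    return num / den


# Toy data: y = 1 if a mention cites a preprint; x = 1 for a hypothetical
# outlet category of interest; groups are month-by-year cells.
groups = ["2019-05"] * 4 + ["2020-05"] * 4
x = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 0, 0, 0, 0, 0, 1, 1]
print(within_slope(y, x, groups))  # 0.5
```

With one regressor and equally sized cells, the slope is an average of the within-cell differences. In the toy data, the difference is 0 in the first cell and 1 in the second, giving 0.5. Shifting every outcome in one cell by a constant leaves the slope unchanged, because the fixed effects absorb cell-level differences. A full specification with several interaction terms would normally be estimated with a regression library rather than by hand.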

Figure 1. Share of preprint mentions per day.
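The daily share plotted in Figure 1 compares preprint mentions with all research mentions on each day. As described in the methods, using this share helps control for fluctuations in the overall volume of mentions tracked by Altmetric. Below is a minimal Python sketch. It assumes each mention is available as a (day, is_preprint) pair; the function name and example data are hypothetical.

```python
from collections import Counter
from datetime import date


def daily_preprint_share(mentions):
    """Map each day to the share of its research mentions that cite preprints.

    `mentions` holds one (day, is_preprint) pair per media mention of a
    preprint or a Web of Science article.
    """
    preprints, totals = Counter(), Counter()
    for day, is_preprint in mentions:
        totals[day] += 1
        if is_preprint:
            preprints[day] += 1
    # Dividing by all research mentions, not just counting preprints, makes the
    # measure less sensitive to changes in the overall number of tracked mentions.
    return {day: preprints[day] / totals[day] for day in sorted(totals)}


# Hypothetical mentions on two days.
mentions = [
    (date(2020, 1, 10), True), (date(2020, 1, 10), False),
    (date(2020, 1, 10), False), (date(2020, 1, 10), False),
    (date(2020, 5, 1), True), (date(2020, 5, 1), True),
    (date(2020, 5, 1), False), (date(2020, 5, 1), False),
]
print(daily_preprint_share(mentions))
# {datetime.date(2020, 1, 10): 0.25, datetime.date(2020, 5, 1): 0.5}
```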
2.3 percentage points between May 2014 and September 2019, and this pace of growth remained relatively unchanged after COVID-19 began and started to garner media attention. In contrast, preprint mentions had increased about 5.7-fold by the time of the WHO's announcement about "2019-nCoV" in January 2020, but skyrocketed to a 30-fold increase at the height of the pandemic in May 2020. This figure thus shows that the increase in the proportion of preprint mentions during the pandemic era was driven almost entirely by an increased number of preprint mentions and not by a decrease in the number of WoS mentions.
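Statements such as "increased about 5.7-fold" express a period's mention count as a multiple of a baseline period's count. The sketch below shows that calculation, assuming a single baseline period. The function name, period labels, and counts are invented for illustration and do not reproduce our data.

```python
def fold_change(counts, baseline):
    """Express every period's count as a multiple of the baseline period's count."""
    base = counts[baseline]
    if base == 0:
        raise ValueError("baseline count must be non-zero")
    return {period: count / base for period, count in counts.items()}


# Invented counts of preprint mentions per period, for illustration only.
counts = {"baseline": 20, "later": 50, "peak": 200}
print(fold_change(counts, "baseline"))
# {'baseline': 1.0, 'later': 2.5, 'peak': 10.0}
```

A fold change describes growth in raw counts. A percentage-point change, by contrast, is a difference between two proportions, so the two kinds of figures are not directly comparable.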

Figure 2. Growth rates of mentions for preprints and Web of Science (WoS) articles over time.
of interest.It also suggests that COVID-19-related coverage tended to focus on medical aspects of the pandemic and less so on social or economic aspects.

Figure 3. Change in the average daily count of preprint mentions, by server.

Figure 4. Change in the average daily share of preprint mentions, by topic and outlet.
preprints during COVID-19, these increases tended to be less pronounced than those seen among the major General News outlets.
argued that preprint coverage during the pandemic constituted a break from journalism norms and a paradigm shift in how emergent research is reported on and shared with the public (Burke, 2021; Makri, 2021). Using longitudinal data from the Web of Science (WoS) and four preprint servers, this study sought to establish whether, in what ways, and to what extent this is the case. We identified how the volume and nature of preprint media coverage has changed over time and what role the pandemic has played in this change. In doing so, our study makes an important contribution to our understanding of journalists' use of preprints, a topic about which much has been written but very little is actually known.
in the coming years, as media outlets and scientists turn their attention away from COVID-19 and toward other issues.

Table 1. Nature of media outlets that frequently cover Web of Science research

Table 2. Number and share of preprint mentions

* Partial year.

Model-based estimates suggest that by Ju

Table 6. Largest 25 media outlets based on mentions and the share of mentions that include preprints

Outlet's Share of All Research Mentions Outlet's Share of Preprint Mentions
The top 25 outlets are shown based on their share of mentions of Web of Science outputs and preprints, representing about 75% of all mentions in our sample. Column 2 (Outlet's Share of All Research Mentions) shows each outlet's share of all mentions of Web of Science outputs and preprints. Columns 3 and 4 (Outlet's Share of Preprint Mentions) show the share of each outlet's research mentions that were mentions of preprints before COVID-19 and during COVID-19, respectively. Grayscale conditional formatting is based on column 2 alone and on columns 3 and 4 jointly.