From amazing work to I beg to differ - analysis of bioRxiv preprints that received one public comment till September 2019

While early commenting on studies is seen as one of the advantages of preprints, the nature of such comments, and the people who post them, have not been systematically explored. We analysed comments posted between 21 May 2015 and 9 September 2019 for 1,983 bioRxiv preprints that received only one comment. Sixty-nine percent of comments were posted by non-authors (n=1,366), and 31% by preprint authors (n=617). Twelve percent of non-author comments (n=168) were full review reports traditionally found during journal review, while the rest most commonly contained praises (n=577, 42%), suggestions (n=399, 29%), or criticisms (n=226, 17%). Authors’ comments most commonly contained publication status updates (n=354, 57%), additional study information (n=158, 26%), or solicited feedback for the preprints (n=65, 11%). Our study points to the value of preprint commenting, but further studies are needed to determine the role that comments play in shaping preprint versions and eventual journal publications.


Introduction
The practice of sharing preprint, authors' versions of non-peer reviewed manuscripts, is on the rise. Once almost exclusively limited to the fields of high energy physics and economics on arXiv, RePec and SSRN preprint servers, preprints have gained much ground across a wide range of disciplines. 1-5 Meta-research on preprints, however, remains scarce and mostly limited to the explorations of two servers: arXiv and bioRxiv. This limited research has shown that citation of preprints in scholarly literature had increased, and that articles first posted as preprints had higher citations rates and Altmetric scores than those not posted as preprints. [6][7][8][9][10] Additionally, only minimal changes were found between preprints and the versions (of record) published in journals. [11][12][13] In 2020, the COVID-19 pandemic led to a large increase in the posting of preprints, as well as scrutiny and the number of comments they receive on both social media platforms (e.g. Twitter) and comment sections of servers on which they are posted, with some comments prompting preprint retractions. 13,14 However, despite 70% of preprint servers allowing users to post comments on their platforms, 15 and researchers perceiving the possibility of receiving comments as one of the advantages of preprints compared to traditional publishing, 16 no research, to the best of our knowledge, has examined the nature of comments or actors involved in preprint commenting. In this study, which originated before the COVID-19 pandemic, we aimed to conduct an exploratory analysis of comments left on the bioRxiv servers. Furthermore, as at that time, the majority of preprints with comments, only a had single public comment, we decided to focus exclusively on such preprints.

Methods
We conducted a cross-sectional study of bioRxiv preprints that received a single comment on the bioRxiv platform between 21 May 2015 (the earliest date available through the bioRxiv comment API) and September 9, 2019 (study data collection date).

Data Collection
As part of our Preprint Observatory project, 17 MM collected all available comments and related metadata using the bioRxiv comment API. 18 Collected data included DOIs, links to the preprints, commenter username (i.e. proper name or created name), date and time the comment was posted, and the comment text. Data was stored and managed in Microsoft Excel.
Initial extraction covered 6,454 comments posted to 3,265 unique preprints (which represents 6% of 56,427 preprints deposited on or before 9 September 2019; however, as the bioRxiv comment API did not provide access to comments posted before May 2015 this percentage does not represent the exact prevalence of commenting till September 2019). Of the 3,265 preprints with comments, 1,983 (60%) received only a single public comment, and we decided to focus on them in our first exploratory study. We enriched the data of those 1,983 comments by adding preprint authors, subject area classification, word count for comments, and published date of preprints as reported in Crossref and extracted during our Preprints Uptake and Use Project. 19 Finally, we classified the commenters as authors or non-authors, and for authors we also captured their byline order (i.e. first, last or otherdefined as neither first nor last).

Data Analysis
Comments' content classification was inductively derived using an iterative process of open-coding and constant comparison. 20 Initial categories were devised by LAM based on a sample of 35 comments, and later expanded by MM using a sample of 200 comments. This initial categorization revealed distinct differences in content of comments left by authors and those left by non-authors.

Identity of the commenter
MM and JC first checked whether each comment had been posted by an author of the preprint. This was done by comparing if the posted username matched any of the names of the preprint authors (and was helped by a simple full username search with any of the authors' names -the simple search detected only 301 out of our later manually detected 617 cases as usernames often contained initials or symbols that were not an exact match with the names used in the preprint author byline). If the username was a pseudonym or a lab name, we classified the commenter as a non-author. During coding we amended our initial classification if the comments' contents provided identification of the commenter.

Content analysis
After grouping comments by the commenter type (author or non-author), MM, JC, and LM independently categorized all comments. Each comment could be classified to multiple categories. The only exception to this rule was if the comment was similar in structure and content to a full peer review report that is traditionally submitted as part of a journal peer review process. In those cases, we decided not to analyse the full contents of such review as they were often authored by multiple authors, contained multiple review reports, or included links to detailed reports posted on other websites. For all other comments, we classified the type of content they contained, but not the number of instances of each type they contained. For example, if the content type was a suggestion, we did not count the number of suggestions made in the comment, i.e. one suggestion for formatting a table, another for a figure, and additional suggestion for expanding the literature section. The three coders held weekly meetings online after coding batches of 200 to 300 comments. These meetings allowed for comparison of categorizations, resolving of differences, clarification of existing or introduction of new categories. Before each meeting, MM or JC would compare differences between the coders. If only one coder categorized a comment differently (e.g. did not mark a specific category) MM or JC re-read the comment, ruled on the found difference, and recorded the final categorization in the main database. When a single coder indicated a category the other two did not, or all coders disagreed on the categorization, the comment was marked and discussed at a weekly meeting until consensus was reached. We observed that our initial disagreement was most common for comments we categorised as suggestions or criticisms, and where tone, rather than content, dictated the categorisation (e.g. Comment 1: "Great to see more well assembled lizard genomes, but it would have been nice to cite the more recent assemblies of…"; Comment 2: The authors state in the introduction that [method] has not been yet been reported". I beg to differ… following models have been generated and published… [provides references to 3 studies]. We categorised comment 1 as suggestion, and comment 2 as criticism, based on their tone even though they both provided authors with additional references. As comments could have multiple categories, comment 1 was also classified as a praise).
While methods exist for calculating inter-rater reliability for data that could be classified as belonging to multiple categories, 21 after each weekly meeting we only stored our agreed upon classification, so we cannot reconstruct the initial disagreements to produce such rating. It was also not our goal to study the difficulty of classifying comments, but rather, using a consensus approach, to explore the different types of comments posted on bioRxiv (before the pandemic).
Our final classification tree and an example comment for each category are shown in Supplementary Table 1, and all comments and our assigned categories in our project's database. 17 Finally, to see if comments of preprints that received a single public comment, and that were the focus of our study, differed from first comments left for preprints which received more than one comment, we also randomly chose 200 of the latter preprints and analysed their first comments. This sub-analysis showed that all of these comments could be classified under our identified comment types.

Statistical Analysis
We report absolute numbers and percentages for types of comments, and medians and interquartile ranges (IQR) for number of words per comment, number of comments per preprint and days from posting of the preprint to the comment. As number of words and days are integers, when medians or 25 th and 75 th percentiles had decimals, we rounded them to ease readability. Note on word count: As the texts of the comments were retrieved in HTML syntax, we replaced the hyperlink syntax (e.g. <a….a>) with the word LINK and counted it as only one word. When references were written out as Author et al., year, or PMID: number, those were counted by as many words as were written. Differences in number of words and time to publication between author and non-author comments were tested with Mann-Whitney test. We did not use time-to-event analysis as information for comments posted before May 2015 was not available through the API. Analysed comments came from all 27 bioRxiv subject area classifications (assigned by the authors during preprint upload, Supplementary Table 2), but despite there being slight differences in the number of comments per category; due to our sample size, focus on identifying comment types, and the fact that perceived preprint impact, as well as authors' prestige, country and other factors might influence the posting of comments, we chose not to explore differences in commenting between subject areas. All analyses were conducted using JASP version 0.12.2.

Results
Between 21 May 2016 and 9 September 2019, 1,983 bioRxiv preprints received a single public comment on the bioRxiv website. More than two thirds of those comments were posted by non-authors (n=1,366, 69%) while the remainder were posted by the preprint's authors (n=617, 31%, Table 1). Overall, the non-author comments were longer than comments posted by the authors (median number of words was 38, IQR 17 to 83, for non-authors vs 18, IQR 11 to 32, for authors, Mann-Whitney test, P<0.001), and they were posted a median of 23 days (IQR 3 to 117) after the preprints. In comparison, authors' comments were posted after a median of 91 days (IQR 3 to 23, Mann-Whitney test, P<0.001). Differences between types of comments with regards to number of words and days between preprint and comment publication are shown in Table 1.
Twelve percent of non-author's comments (n=168) were full review reports resembling those traditionally submitted during the journal peer review process. They were authored by either single individuals (n=87, 52%) or group of authors (n=81, 48%, Supplementary Table 3).
The latter most commonly published their review following a journal club discussion (n=41, 51%). Comments not resembling full peer review reports most commonly praised the preprint (n=577, 42%), made suggestions on how to improve it (n=399, 29%), or criticized some aspect of the preprint (n=226, 17%, Table 1). Praise was most commonly found alongside suggestions (n=201; 50%) or comments asking for clarifications (n=101; 47%), and least commonly alongside comments that criticised the preprint (n=70, 31%), reported issues (n=38, 29%) or that inquired of the preprints publications status (n=9, 26%, Supplementary Lastly, we present some examples of the comments we classified as belonging to the "other" category (a full list of those comments available on our project website). There were three comments that raised research integrity issues (a possible figure duplication, an undeclared conflict of interest, and use of bots to inflate paper download numbers). There were also comments that raised personal issues. In one comment a parent requested more information on a rare disease (covered by the preprint) that was affecting their children, and in another case an individual inquired about possible PhD mentors for a topic related to the preprint. There were also comments that touched upon the culture of preprinting, with one comment asking authors to include brief summaries of what had changed between preprint versions, another expressing a view that preprints make traditional publishing redundant, and one praising authors for replying to questions they asked through email. Similarly, one comment we classified as full peer review report, also included a statement of hope "to get more comments on bioRxiv…prior to submission to a peer reviewed-journal" as they would "rather have a revised pre-print than a correction / retraction" in a journal.
Authors' comments most commonly contained updates about the preprint's publication status (n=354, 57%), additional information on the study (n=158, 26%), or solicited feedback for the preprint (n=65, 11%, Table 1). Of all authors' comments, 321 (52%) were posted by the first author of the preprint, 209 (34%) by the last, and 65 (11%) by other authors (we could not identify the byline order for four percent of comments, n=22, as the registered username was either a pseudonym, e.g. W1ndy, or a lab name, e.g. Lewy Body Lab). A small percentage (n=29, 5%) of author comments were replies to feedback authors received elsewhere, e.g. during peer review or through personal emails (Supplementary Table 4). Lastly, as above, we present few examples of authors' comments classified as belonging to the "other" category (with full list of those available on our project website). In five comments authors requested suggestions on where to publish their preprint, and in one comment authors mentioned that an editor saw their preprint and invited them to submit it to their journal. In one comment, an author alerted the readers of an error in a figure and also playfully chided (using a smiley emoticon) the co-author for hastily uploading the files before checking them. In another co-authors alerted readers that the preprint had been posted without the approval of the co-authors and urging the scientific community to ignore this version (to date the preprint in question has not been retracted). 22 Finally, in one example (of a comment classified as a publication status update), the author said they did not plan to submit the preprint to a journal, as publishing on bioRxiv makes it freely available to everyone.

Discussion
Our study of preprints posted on the bioRxiv server that received a single public comment found that more than two thirds of those comments were left by non-authors and most commonly praised, offered suggestions, or criticised the preprints. Additionally, almost a sixth of these non-author comments contained detailed peer review reports akin to those traditionally submitted during the journal peer review process. These findings support previous studies that showed the opportunity to receive feedback was perceived as one the benefits of preprints compared to traditional publishing. 23 However, we also found that less than ten percent of all bioRxiv preprints received public comments before the COVID-19 pandemic. This low prevalence of scholarly public commenting has been previously observed for post-publication commenting of biomedical articles, and was the reason for discontinuing PubMed Commons, the National Library of Medicine's commenting resource. 24 Similar low prevalence of postpublication commenting has also been found across disciplines on PubPeer. 25 Nevertheless, as has been stated for those services, 24, 25 some of the comments have been crucial for scholarly debates and even led to retractions of papers, a practice also observed for bioRxiv preprints. 26 In our study, we observed that eleven percent of authors' comments were actively inviting others to comment on their preprint, with one comment explicitly stating that they would rather make changes to the preprint than to a version published in a journal.
The lack of traditional peer-review is often perceived as the biggest criticism of preprinting, alongside cases of information misuse 27 and posting of low-quality studies. Thus, bioRxiv (alongside arXiv and medRxiv) have displayed clear disclaimers for COVID-19 preprints that state preprints are "preliminary reports that have not been peer-reviewed" and they should not be "reported in media as established information". 28 Related to this criticism and the benefits of preprint commenting, there has also been a rise of specialised preprint review services (e.g. PreReview, 29 Review Commons, 30 Peerage of Science) 31 or overlay journals (e.g. Rapid Reviews, 32 Discreet Analysis) 33 aimed at providing expert reviews for preprints, or endorsement of preprints (e.g. Plaudit). 34 On a similar note to emphasize the possible role that commenting has in the scientific discourse, reference software Zotero can display references that have PubPeer comments, 35 and a recently launched biomedical search engine PubliBee, 36 implemented (up)voting of comments.
Alongside posting of full peer review reports, our study also confirmed other known practices and potential benefits associated with the preprinting culture. For example, using preprints as final publication outputs, soliciting or being invited by editors to publish studies posted as preprints, calling out suspected research integrity issues, engaging in discussion or proposing collaborations, as well as publishing of peer review reports from those training on how to conduct peer review or from journal club discussions. These findings may provide authors encouragement to consider or continue depositing preprints.
Furthermore, we have shown that almost a third of the comments were left by the authors of the preprints, and their comments were mostly updates of preprints' publication status or additional information about the studies. Authors' comments were also in general left after a much longer period than those of the non-authors. This aligns with found median times of 166 to 182 days between posting a preprint on bioRxiv and publication of that study in a journal, 37,38 which were similar to the median time of 172 days we found for comments on publication status updates.

Limitations
We did not attempt to define if non-authors that posted comments were indeed peers, nor did we compare their expertise or publication records with those of the authors of the preprint on which they were commenting. We are also aware that some comments were left by patients, students and the individuals that stated a lack of expertise in the field. However, defining and soliciting feedback from a competent peer is known to be difficult, 39 with previous studies demonstrating minimal agreements between peers assessing the same study. 40 Furthermore, we did not attempt to define the quality of the comments, nor if the contents of comments (e.g. raised criticisms or suggestions) were indeed valid. We also did not check if comments led to changes or updates of the preprints or eventual published manuscripts, nor if the authors were even aware of them. Regarding the latter, as we analysed preprints that only had a single comment, none of the authors used the preprint platform to reply to them. We however did find that five percent of authors' comments were replies to comments or peer reviews they received elsewhere, and we did encounter an example of a non-author comment that indicated they communicated with the authors by email. The purpose of our research was not to provide external validity of the claims stated in the comments, but rather showcase, for the first time, the most common types of comments left on the platform (before the COVID-19 pandemic). Our study is also limited in that we did not analyse discourse that might occur in preprints which received multiple comments. However, we did analyse the first comments of a random sample of 200 of such preprints to confirm that they do fall within the categories analysed here. Finally, we acknowledge that our backgrounds are not in biology, and that this may have affected our ability to make a clear distinction between some comment types, especially in distinguishing between suggestions and criticisms. We however feel that the observed differences in the number of words between our identified comment types, as well as prevalence or praise which is more common for comments that contained suggestions than criticisms, provides support for our categorization.

Future Directions
In the last year, the landscape of scientific communication has been significantly altered by the COVID-19 pandemic. Looking ahead, in our future research, we will examine comments and exchanges that occur when multiple comments are posted for the same preprint, as well as the possible difference in types of comments during the COVID-19 pandemic. If warranted, additional research and discourse analysis should be conducted to better understand preprint commenting, as well as the factors associated with authors willingness to respond to comments or incorporate feedback into preprint updates or submitted manuscripts for publication.
Nevertheless, as we have previously advocated describing changes between preprint versions, as well as between preprints and published versions of record, 41 perhaps it is also time for commenting sections and social media platforms to implement categorization of comments and even quality evaluation or upvoting of comments. 42

Conclusions
Despite a small prevalence of commenting occurring on bioRxiv before the COVID-19 pandemic, the non-author comments contained many aspects of feedback seen in traditional peer review, while those by the authors most commonly addressed the publication status of the preprints or provided additional information on the study. Based on the types of comments we identified, bioRxiv commenting platform appears to have potential benefits for both the public and the scholarly community. Further research could measure the direct impact of these comments on later preprint versions or journal publications, as well as the feasibility and sustainability of maintaining and moderating commenting sections of bioRxiv or other preprint servers. Finally, we believe that user-friendly integration of comments from server platforms and those posted on social media (e.g. Twitter) and specialized review platforms would be beneficial for a wide variety of stakeholders, including the public authors, commenters, and researchers interested in analysis of comments.