ABSTRACT
This study uses statistical methods to analyze how the peer review period of papers published in a given journal differs according to the order in which authors post a preprint and submit the paper to the journal. For this purpose, we developed a web crawler that downloaded metadata from bioRχiv and PLOS ONE. Thus far, 40.67% of the papers posted on bioRχiv have been published, in 1,626 academic journals. The journal that published the most of these papers was PLOS ONE. An analysis of the Peer Review and Acceptance Time (PRAT) of papers published in PLOS ONE via preprints revealed that the timing of preprint posting is related to this interval. The median PRAT was 110 days when the preprint was first posted before the paper was submitted to the journal (f < s) and 139.50 days when it was first posted after submission (s < f). Thus, posting a preprint before submitting the paper to the journal tends to be associated with a shorter PRAT than posting it after submission.
1 Introduction
Peer review is an important process that ensures the quality of submitted papers [1][2][3][4]. However, it has been widely criticized for delaying the publication of new findings [5][6][7][8][9]. Preprints have attracted attention from researchers as a way of addressing this issue [10], and it seems entirely possible, and perhaps inevitable, that the peer review process will be shortened [11]. A preprint is a complete scientific manuscript uploaded by its authors to a public server before formal review [12]. By posting papers on preprint servers, authors can publish draft manuscripts very soon after obtaining research results and can receive comments and advice from scientists worldwide. Preprints gained popularity around 30 years ago with the advent of arXiv, an open preprint server widely used in physics and mathematics [13][14][15][16]. Over this period, authors have disseminated preprints to bypass lengthy journal publication schedules [17][18]. Although scientists in other fields have been cautious about adopting preprints as a mode of communication, the growth of preprint servers in chemistry and biology suggests that acceptance of preprints in these fields may be increasing [19][20]. In recent years, several subject-specific preprint servers have been established. For instance, bioRχiv, a preprint server for biology, launched in November 2013 and has become an increasingly popular platform for authors to share their work, with the number of postings increasing almost every month [21][22][23]. More than half of the works on the platform have been published in journals with high citation frequency in the Web of Science Core Collection [24]. The journal that has published the most preprints is PLOS ONE [25], a peer-reviewed open access megajournal published by the Public Library of Science since 2006 whose scope encompasses research from any scientific or medical discipline [26][27][28].
This study uses statistical methods to analyze how the peer review period of papers published in PLOS ONE differs according to the order in which authors posted their preprints and submitted their papers to the journal.
2 Methods
First, this study investigated papers posted on bioRχiv. A web crawler visited every page on bioRχiv and downloaded metadata on each paper's title, authors, digital object identifier (DOI), posted date, and versions. If the paper had been published, the crawler also downloaded the journal title and the published DOI. The metadata were then manually verified, and the papers were matched with PLOS ONE articles using their published DOIs as keys. Second, this study investigated the articles published in PLOS ONE. The web crawler visited the PLOS ONE articles identified via bioRχiv and downloaded their received, accepted, and published dates, after which the bioRχiv and PLOS ONE metadata were merged. Third, the Peer Review and Acceptance Time (PRAT) was defined as the number of days between the received date and the accepted date of each published article. Finally, the order in which authors posted and submitted their papers was classified into patterns.
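The DOI-based matching step can be sketched as an inner join on normalized DOIs. This is a minimal illustration, not the authors' crawler: the record classes, their field names, the `normalize_doi` helper, and the example DOIs (`10.0000/example-1`) are all hypothetical, and no real bioRχiv or PLOS ONE page structure is assumed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record types; field names are illustrative only and do not
# reflect the actual bioRxiv or PLOS ONE page schemas.
@dataclass
class PreprintRecord:
    title: str
    first_posted: date             # f: posted date of the first version
    last_posted: date              # l: posted date of the latest version
    journal: Optional[str]         # journal title, if published
    published_doi: Optional[str]   # DOI of the journal article, if published


@dataclass
class JournalRecord:
    doi: str
    received: date   # s
    accepted: date   # a
    published: date


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes so keys compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_by_doi(preprints, articles):
    """Inner-join preprint and journal records on the published DOI."""
    by_doi = {normalize_doi(a.doi): a for a in articles}
    merged = []
    for p in preprints:
        if p.published_doi is None:
            continue  # unpublished preprints have no journal counterpart
        article = by_doi.get(normalize_doi(p.published_doi))
        if article is not None:
            merged.append((p, article))
    return merged


# Hypothetical example records.
preprints = [
    PreprintRecord("Paper A", date(2017, 1, 10), date(2017, 3, 1),
                   "PLOS ONE", "https://doi.org/10.0000/EXAMPLE-1"),
    PreprintRecord("Paper B", date(2017, 2, 5), date(2017, 2, 5), None, None),
]
articles = [
    JournalRecord("10.0000/example-1", date(2017, 1, 20),
                  date(2017, 5, 2), date(2017, 5, 20)),
]
pairs = merge_by_doi(preprints, articles)
print(len(pairs))  # → 1
```

Normalizing case and resolver prefixes before joining is a design choice made here because DOIs are case-insensitive and are often stored with different prefixes; the manual verification described above would still be needed for mismatches this does not catch.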
Variable declaration
Four variables were declared as follows. Lowercase italic f is the date on which the paper first appeared on the preprint server. Lowercase italic l is the date on which the paper last appeared on the preprint server, that is, the posted date of its latest version. Lowercase italic s is the date on which the paper was submitted to the journal. Lowercase italic a is the date on which the journal accepted the paper.
Variable definition
The PRAT of a paper x is the number of days from its submission date s to its acceptance date a, as defined in equation (1):
$\mathrm{PRAT}(x) = a_x - s_x$   (1)
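Equation (1) is a simple date difference in days. The sketch below computes it with Python's standard `datetime` module; the function name `prat` and the example dates are hypothetical, and the check that the acceptance date does not precede the received date is an added sanity check rather than part of the original definition.

```python
from datetime import date


def prat(received: date, accepted: date) -> int:
    """PRAT(x) = a_x - s_x, in days (equation (1))."""
    # Illustrative sanity check, not part of the paper's definition.
    if accepted < received:
        raise ValueError("accepted date precedes received date")
    return (accepted - received).days


# Hypothetical dates for one paper x.
s = date(2018, 3, 1)   # received (submitted) date
a = date(2018, 6, 25)  # accepted date
print(prat(s, a))  # → 116
```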
3 Results and Discussion
3.1 Posted Papers
The number of papers posted on bioRχiv has increased rapidly month by month, reaching 43,812 between November 2013 and February 28, 2019.
Recently, more than 2,000 papers have been posted each month. Of all posted papers, 17,818 have been published, most of them within six months of posting. Figure 1 shows the status of published and unpublished papers posted on bioRχiv.
Figure 1. Published and Unpublished Papers Posted on bioRχiv
3.2 Published Journals
These papers were published in 1,626 academic journals. The journals that published the most of these papers (with the number and share of papers) were PLOS ONE (902, 5.05%), followed by Scientific Reports (Nature) (881, 4.94%), eLife (866, 4.86%), Nature Communications (611, 3.43%), Bioinformatics (509, 2.86%), PNAS (487, 2.73%), PLOS Computational Biology (375, 2.10%), PLOS Genetics (333, 1.87%), Nucleic Acids Research (285, 1.60%), and Genetics (283, 1.59%). Figure 2 shows the top 20 journals by number of papers from bioRχiv. This study focuses on PLOS ONE because it has published the most of these papers.
Figure 2. Top 20 Journals by Papers in bioRχiv
3.3 Analysis of Papers
This section analyzes papers published in PLOS ONE via bioRχiv. Thus far, the overall publication rate of papers posted on bioRχiv is 40.67%; however, for papers posted from 2013 to 2017, the rate was between 58% and 77%. PLOS ONE has published 902 papers from bioRχiv, two of which are corrections. The PRAT of the remaining 900 papers was calculated according to equation (1). The minimum PRAT was 7 days, the maximum was 562 days, the median was 116 days, the first quartile was 82 days, and the third quartile was 166 days. Figure 3 shows the distribution of these PRATs. Because the curve in Figure 3 does not resemble a normal distribution, the PRATs were tested for normality. The p-value of the Shapiro-Wilk test was approximately zero (<3.73e-25), below the 0.01 significance level. Consequently, the hypothesis of normality is rejected, and the data cannot be assumed to follow a normal distribution.
Figure 3. Peer Review and Acceptance Time of PLOS ONE
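The summary statistics reported above (minimum, quartiles, median, maximum) can be reproduced from a list of PRAT values with Python's standard `statistics` module. This is a sketch under an assumption: the paper does not state which quartile convention was used, so `method="inclusive"` (linear interpolation between order statistics, the same convention as NumPy's default) is chosen here. The Shapiro-Wilk test is not reimplemented; `scipy.stats.shapiro` is one existing library implementation. The sample values are made up for illustration.

```python
import statistics


def summarize(prats):
    """Five-number summary of PRAT values in days."""
    # Quartile convention is an assumption; other methods give slightly different Q1/Q3.
    q1, _, q3 = statistics.quantiles(prats, n=4, method="inclusive")
    return {
        "min": min(prats),
        "q1": q1,
        "median": statistics.median(prats),
        "q3": q3,
        "max": max(prats),
    }


sample = [10, 40, 60, 90, 200]  # hypothetical PRAT values
print(summarize(sample))
```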
3.4 Process Order
If the difference between date α and date β of paper x is 7 days or less, the two dates are treated as the same date, and this case is expressed as α & β.
For example, when this condition holds for the first posted date (f) and the submitted date (s) of paper x, it is defined as equation (2):
$f_x \,\&\, s_x \iff |s_x - f_x| \le 7$   (2)
If the difference between date α and date β of paper x is greater than 7 days, α and β are treated as different dates, and if α is earlier than β, the relation is expressed as α < β. For example, when this condition holds for the first posted date (f) and the submitted date (s), it is defined as equation (3):
$f_x < s_x \iff s_x - f_x > 7$   (3)
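The pairwise rule for comparing two dates with a 7-day tolerance can be sketched as a small function. The function name `order` is hypothetical; it returns `"&"` when the dates are treated as the same, `"<"` when α is earlier, and `">"` when α is later (which the paper writes with the operands swapped, as β < α).

```python
from datetime import date

SAME_DATE_TOLERANCE_DAYS = 7  # threshold stated in Section 3.4


def order(alpha: date, beta: date, tol: int = SAME_DATE_TOLERANCE_DAYS) -> str:
    """Relation between two dates: '&' within tol days, else '<' or '>'."""
    diff = (beta - alpha).days
    if abs(diff) <= tol:
        return "&"          # equation (2): treated as the same date
    return "<" if diff > 0 else ">"  # equation (3), or its mirror image


# Hypothetical first posted date f and submitted date s.
print(order(date(2018, 1, 1), date(2018, 1, 8)))  # → &
print(order(date(2018, 1, 1), date(2018, 1, 9)))  # → <
```

Note that a difference of exactly 7 days counts as the same date, matching the "less than or equal to 7 days" wording.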
The process orders of the 900 papers were determined using equations (2) and (3) and divided into 18 patterns based on the variables f, l, s, and a. These patterns were then merged into three groups. Table 1 shows the 18 patterns and the three groups.
Table 1. Process order of the three groups
For each six-month period from July 2017 to June 2019, the PRAT of PLOS ONE was 158, 171, 166, and 157 days. However, the PRATs in this study were considerably shorter than these values. The median PRAT of the f < s group was 110 days, and that of the s < f group was 139.50 days. Thus, the PRAT tends to be shorter when the preprint is first posted before the paper is submitted to the journal than when it is first posted after submission.
Statistical data of three groups
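The group medians reported above can be computed by labeling each paper with the relation between f and s and taking the median PRAT per label. This sketch assumes, as the group names suggest, that the three groups are determined by the relation between f and s alone; the full mapping of the 18 patterns in Table 1 is not reproduced. The function names and all dates are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date

TOL = 7  # days; same-date tolerance from equations (2) and (3)


def group_label(f: date, s: date) -> str:
    """Assign a paper to 'f & s', 'f < s', or 's < f' (assumed grouping)."""
    d = (s - f).days
    if abs(d) <= TOL:
        return "f & s"
    return "f < s" if d > 0 else "s < f"


def median_prat_by_group(papers):
    """papers: iterable of (f, s, a) date triples; returns median PRAT per group."""
    groups = defaultdict(list)
    for f, s, a in papers:
        groups[group_label(f, s)].append((a - s).days)  # PRAT = a - s
    return {label: statistics.median(v) for label, v in groups.items()}


# Hypothetical (f, s, a) triples.
papers = [
    (date(2018, 1, 1), date(2018, 2, 1), date(2018, 5, 1)),  # f < s, PRAT 89
    (date(2018, 1, 1), date(2018, 3, 1), date(2018, 7, 1)),  # f < s, PRAT 122
    (date(2018, 4, 1), date(2018, 1, 1), date(2018, 6, 1)),  # s < f, PRAT 151
    (date(2018, 1, 3), date(2018, 1, 1), date(2018, 4, 1)),  # f & s, PRAT 90
]
print(median_prat_by_group(papers))  # → {'f < s': 105.5, 's < f': 151, 'f & s': 90}
```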
Figure 4 shows the PRAT distributions of the three groups. Because they did not appear to be normally distributed, each group was tested for normality. The p-values of the Shapiro-Wilk test for f & s, f < s, and s < f were approximately zero (<1.94e-14, <2.12e-13, and <6.95e-12, respectively), below the 0.01 significance level, so none of the three groups can be assumed to be normally distributed.
Figure 4. Peer Review and Acceptance Time of PLOS ONE (days)
The f < s and s < f groups were compared because the authors in these two groups behaved differently. Authors in the f < s group can expect to receive comments and advice on their papers from scientists worldwide before submitting them to a journal, whereas authors in the s < f group cannot. Figure 5 shows the PRAT when the first posted date precedes the submitted date (f < s) and when the submitted date precedes the first posted date (s < f). Because neither group is normally distributed, the two groups were compared using the nonparametric Mann-Whitney U test rather than a parametric t-test. The p-value of the Mann-Whitney U test was approximately zero (<2.07e-7), below the 0.01 significance level. This indicates that the PRAT of the f < s group differs significantly from that of the s < f group.
Figure 5. First Posted Date and Received Date
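The Mann-Whitney U test used to compare the f < s and s < f groups can be sketched with the standard library alone. This is a minimal two-sided version using the large-sample normal approximation with a tie correction and no continuity correction; the paper does not state which software or options were used, and in practice `scipy.stats.mannwhitneyu` would be the usual choice. The function name `mann_whitney_u` is hypothetical.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided p-value). No continuity correction.
    """
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    tie_term = 0.0  # sum of t^3 - t over groups of tied values
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, min(p, 1.0)


# Hypothetical PRAT samples (days) for two groups.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, p)
```

For these fully separated toy samples, U is 0 and p is about 0.05; with samples as small as these, an exact test would be more appropriate than the normal approximation, which is better suited to group sizes like those in this study.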
4 Conclusion
This study analyzed the PRAT of papers published in a journal via preprints and found that the timing of preprint posting is related to this interval. When the preprint is first posted before the paper is submitted to the journal, the PRAT tends to be shorter than when the preprint is first posted after submission.
ACKNOWLEDGMENTS
This work was supported by JSPS KAKENHI Grant Numbers JP19K12707 and JP18K11597, and by ROIS NII Open Collaborative Research 2019-(19FS02).
Footnotes
tsunoda-h{at}tsurumi-u.ac.jp, yuan{at}nii.ac.jp, nisizawa{at}nii.ac.jp, liuxm{at}mail.las.ac.cn, AMANO.Kou{at}nims.go.jp