Abstract
While we often think of words as having a fixed meaning that we use to describe a changing world, words themselves are dynamic and change over time. Scientific research can also be remarkably fast-moving, with new concepts or approaches rapidly gaining mind share. We examined scientific writing, both preprint and pre-publication peer-reviewed text, to identify terms whose usage has changed and to examine how they are used. One particular challenge was that the shift from closed to open access publishing changed the size of the available corpora by over an order of magnitude in the last two decades. We developed an approach to evaluate semantic shift that accounts for both intra- and inter-year variability using multiple integrated models. Applying this strategy to year-by-year changes revealed thousands of change points in both corpora. We found change points for tokens including ‘cas9’, ‘pandemic’, and ‘sars’, among many others. The change points that were consistent between pre-publication peer-reviewed and preprinted text were largely related to the COVID-19 pandemic. We developed a web app (https://greenelab.github.io/word-lapse/) that enables users to explore individual terms. To our knowledge, this analysis is the first to examine semantic shift in biomedical preprints and pre-publication peer-reviewed text, and it lays the foundation for future work examining how terms acquire new meaning and the extent to which peer review encourages or discourages that process.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Funded by The Gordon and Betty Moore Foundation, GBMF4552; The National Human Genome Research Institute, R01 HG010067