Evidence provided by high-impact cell culture studies does not support the authors’ claims

Background: The reliability of preclinical research is of critical concern. Previous studies have demonstrated low reproducibility in research and recommend raising standards to improve reproducibility and robustness. One understudied aspect of this quality issue is the harmony between the hypotheses and the experimental design in published work.

Methods and findings: In this study we focused on highly cited cell culture studies and investigated whether the claims of each study are backed by sufficient experimental evidence. We created an open-access database containing all 282 claims asserted by 103 different high-impact articles, as well as the results of this study. Our findings revealed that only 64% of all claims were sufficiently supported by evidence, and that there were concerning misinterpretations, such as treating the results of tetrazolium salt reduction assays as indicators of cell death or apoptosis.

Conclusions: Our analysis revealed an alarming discordance between the actual experimental findings of highly cited cell culture studies and the way the manuscripts discuss them. To improve the quality of preclinical research, we need a clear nomenclature by which different cell culture claims are distinctly categorized, materials and methods sections written more meticulously, and cell culture techniques selected and used more carefully.


INTRODUCTION
There is an alarming concern regarding the reliability of published research findings [1]. This is particularly evident in preclinical studies, whose clinical translatability is minimal [2]. This low efficiency in research has been discussed extensively in recent years, and the lack of reproducibility and overall quality are agreed upon as the main culprits [3].
Reproducibility in preclinical research is estimated to be between 10% and 25% [4,5], and the cost of irreproducible research is calculated to be at least 28 billion USD per year in the USA alone [6]. Many factors contribute to this crisis, including lack of robustness, biased design, use of inadequate models (cell line and/or animal), underpowered studies (insufficient sample size), lack of proper controls (positive, negative), poor use of statistics, and the absence of replication/confirmation studies [7]. It is important to note that these design problems often expand into questionable research practices such as p-hacking and cherry picking. Scientists agree that the standards for publishing preclinical research must be raised in a way that encourages robustness and rigor [5,8]. Accordingly, many aspects of preclinical study design have been tackled by various studies over the years. However, the question of whether we can trust the results of published preclinical studies remains open. One aspect that has not been investigated before is the compatibility of the way a manuscript is written with the actual experimental design; more specifically, the relationship between the claims of a study and the evidence provided to support those claims. In this study, we focused on cell culture research and investigated whether the evidence provided by high-impact studies sufficiently supports the claims the authors asserted in their manuscripts. One of the first things we noticed during our investigation was the inconsistency in nomenclature. Many terms, such as cytotoxicity, viability, growth, and proliferation, were used interchangeably by the authors. Moreover, in several publications only one type of evidence (tetrazolium reduction assay results) was provided to assert various claims.
When we searched the literature for a consensus nomenclature, much to our disappointment, we could not find one. Apparently, many of these terms are not considered uncommon, unfamiliar, or vague enough to be defined in high-impact reviews or guidelines, or to be included in the glossary sections of molecular biology, biochemistry, or even cell culture textbooks. Therefore, we decided to define these terms ourselves, mostly based on different sections of the "Guidance Document on Good In Vitro Method Practices" by the OECD [9], which was the only document we found that might be considered a consensus nomenclature source. We then carried out our analysis accordingly. As a result, this study contains a nomenclature recommendation as well as an analysis of high-impact cell culture studies.

The study consisted of three phases. In phase one, we selected high-impact cell culture studies. In phase two, we identified the claims asserted by the authors as well as the evidence provided to support these claims. In the final phase, we analyzed the sufficiency of the evidence for each claim.
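As a minimal sketch of the final phase, the per-claim verdicts recorded in such a database can be tallied to give the proportion of sufficiently supported claims. The records, field names, and values below are hypothetical illustrations, not entries from the actual database:

```python
# Hypothetical per-claim records mimicking the three-phase workflow:
# each entry is one claim extracted from an article, together with the
# verdict on whether the supporting evidence was sufficient.
claims = [
    {"article_id": 1, "claim": "viability",     "sufficient": True},
    {"article_id": 1, "claim": "apoptosis",     "sufficient": False},
    {"article_id": 2, "claim": "cytotoxicity",  "sufficient": True},
    {"article_id": 3, "claim": "proliferation", "sufficient": False},
]

def sufficiency_rate(records):
    """Fraction of claims backed by sufficient evidence."""
    supported = sum(r["sufficient"] for r in records)
    return supported / len(records)

print(f"{sufficiency_rate(claims):.0%} of claims sufficiently supported")
```

With the real database (282 claims across 103 articles), the same tally yields the 64% figure reported in the abstract.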

Article Selection

We searched the Web of Science (WOS) database (Clarivate Analytics) for studies that contain at least one of these keywords: "cytotoxicity, viability, cell death, growth inhibition, proliferation, or anti-

controversial, as many of the articles might have used the term to represent a viability decrease. We addressed this in the results section. We decided to consider anti-cell (which was asserted just once) as referring to cell death as well.

Cell growth may indicate either proliferation rate or a change in cell size, depending on the definition embraced. Since there is already a term representing proliferation rate, as the name implies, we first considered accepting it as a measure of increased cell size. However, after looking at the articles in our list, we realized that the term was almost exclusively used to indicate proliferation rate, and we consequently embraced that definition in our analysis.

Database Construction

We constructed a database in Airtable to carry out the evidence analysis. Information from the WOS database such as "article name", "DOI", "citation count", and "journal name", as well as our parameters of interest such as "claim", "evidence", "method", "sufficiency of evidence", and "subject area", were entered for every article investigated.

Here, "method" refers to the scientific methods used in the study, whereas "evidence" is defined as a supergroup of methods measuring the same biological phenomenon. For example, two separate methods such as the lactate dehydrogenase (LDH) activity assay and propidium iodide (PI) staining, both of which measure membrane damage as an indicator of cell death, were classified into the "membrane integrity" evidence supergroup. Similarly, various tetrazolium and resazurin reduction assays were considered to provide "dehydrogenase activity" evidence, which is an indicator of cellular metabolic activity.

We also divided the studies into two notional "subject area" groups, the first being "Biochemistry, Molecular Biology, Genetics, and Medicine" and the second "Chemistry,

Figure 1C). We also analyzed whether the subject area or the journal in which the article was published might be an indicator of the evidence-claim relationship. According to our analysis, the subject area did not have a significant influence on evidence sufficiency (Figure 1D). Similarly, among the journals appearing most frequently in our database, none demonstrated a significant difference when compared on the overall proportion of claims supported by sufficient evidence.
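The method-to-evidence grouping described in the database construction above can be sketched as a simple lookup table. Only the LDH/PI ("membrane integrity") and tetrazolium/resazurin ("dehydrogenase activity") groupings come from the text; the concrete assay names used as keys are illustrative assumptions:

```python
# Illustrative lookup collapsing individual methods into "evidence" supergroups.
# The two supergroups are described in the text; the specific method names
# listed as keys here are assumptions for the sake of the example.
EVIDENCE_SUPERGROUP = {
    "LDH activity assay": "membrane integrity",
    "PI staining": "membrane integrity",
    "MTT assay": "dehydrogenase activity",           # tetrazolium reduction
    "WST-1 assay": "dehydrogenase activity",         # tetrazolium reduction
    "resazurin reduction assay": "dehydrogenase activity",
}

def evidence_types(methods):
    """Return the distinct evidence supergroups a study's methods provide."""
    return {EVIDENCE_SUPERGROUP[m] for m in methods if m in EVIDENCE_SUPERGROUP}

# A study combining a metabolic assay with a membrane-damage stain
# contributes two distinct evidence types, not one:
print(sorted(evidence_types(["MTT assay", "PI staining"])))
```

Under this grouping, a study that reports only MTT and WST-1 results still provides a single evidence type ("dehydrogenase activity"), which is why tetrazolium assays alone cannot support multiple distinct claims.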

Our findings reveal a discordance between the claims and the evidence of high-impact cell culture studies. This is especially evident in studies relying on the findings of a tetrazolium reduction assay alone to support various claims. Striking examples include article id#9, claiming viability, proliferation, cytotoxicity, and apoptosis changes, and article id#26, claiming viability, proliferation, and growth changes and anti-cell activity, all with results from this assay. This is partly because these assay kits are advertised by their manufacturers as tools to measure viability, cytotoxicity, proliferation, and growth. Combined with their being relatively easy to perform and affordable, this leads to these assays being perceived as a one-size-fits-all solution by research groups wishing to avoid more complicated cell culture techniques. However, this reductionist approach makes it difficult for the findings of a study to provide meaningful answers to its research questions. Even though a reduction in cellular metabolic activity is not a clear indicator of apoptosis or cell death, once such a statement exists in a high-impact article, there is a good chance that it will be cited as such.
In fact, the articles we analyzed had been cited more than 9,000 times as of March 2021 (within two to four years). Even though the claims without sufficient evidence may not be the reason for citation in most cases, the impact of such studies with unreliable findings on preclinical research is undeniably large. Many agree that we need strategies in place to improve the standards for preclinical research [5,8]. As a part of this process, we need clear and distinct definitions for the terms corresponding to the claims asserted in articles. In this work we offered