Survey on Open Science Practices in Functional Neuroimaging

Replicability and reproducibility of scientific findings are paramount for sustainable progress in neuroscience. Preregistration of the hypotheses and methods of an empirical study before analysis, the sharing of primary research data, and compliance with data standards such as the Brain Imaging Data Structure (BIDS) are considered effective practices to secure progress and to substantiate the quality of research. We investigated the current level of adoption of open science practices in neuroimaging and the difficulties that prevent researchers from using them. Email invitations to participate in the survey were sent to addresses obtained through a PubMed search of human functional magnetic resonance imaging studies published between 2010 and 2020. In total, 283 persons completed the questionnaire. Although half of the participants were experienced with preregistration, the willingness to preregister studies in the future was modest. The majority of participants had experience with the sharing of primary neuroimaging data. Most of the participants were interested in implementing a standardized data structure such as BIDS in their labs. Based on demographic variables, we compared participants on seven subscales, which had been generated through factor analysis. We found that experienced researchers at a lower career level had a higher fear of being transparent, researchers residing in the EU had a higher need for data governance, and researchers at medical faculties, as compared to other university faculties, reported a higher need for data governance and a more unsupportive environment. The results suggest growing adoption of open science practices but also highlight a number of important impediments.


Introduction
Neuroimaging, and in particular functional magnetic resonance imaging (fMRI), has contributed greatly to the generation and testing of neural models of brain function and dysfunction in mental disorders. Although the number of neuroimaging publications increases every year, a growing literature questions the replicability of many reported findings [1-4]. Assessing validity requires researchers to be fully transparent about the a priori hypotheses underlying a study, to report methods completely, and to make data available so that the findings can be reproduced. These conditions are often not met [5,6]. Open science practices can protect against such shortcomings, but they confront scientists with additional demands to learn and adopt new techniques. To accelerate the implementation of open science practices, it is necessary to better understand the obstacles that prevent researchers from adopting them. While survey data are available on researchers' preferences, barriers and fears related to data sharing in psychology [7], open science practices besides data sharing have not yet been surveyed in the behavioral sciences. Neuroimaging data are complex and hard to de-identify [8-12], confronting researchers in this field with intricate challenges when sharing data. We investigated the familiarity, adoption, experience, and obstacles concerning open science practices in neuroimaging research. We focused on three fundamental instruments of a reproducible science: preregistration, data sharing, and current standards of formatting and structuring data as implemented with the Brain Imaging Data Structure (BIDS) [13]. In a preregistration, authors provide an overview of the planned study and explain the a priori hypotheses along with the methods they plan to use to test the hypotheses [14]. The document is time-stamped and any changes made thereafter are documented for transparency.
Preregistrations are instrumental in avoiding confusion between a priori and a posteriori definitions of hypotheses and analysis methods, which can easily lead to flawed interpretation of a p-value from a statistical result and can create overconfidence in findings [15-17]. In the face of high flexibility in preprocessing and analysis methods [1,18], preregistration can dramatically enhance the transparency of a neuroimaging project. In contrast to basic science, registering clinical trials in a public registry before data acquisition is mandatory for publication in a renowned biomedical journal. In practice, leading clinical registries leave it to the discretion of the researchers how much detail they use to describe the analytic strategy for processing their neuroimaging data. One may register a neuroimaging endpoint in a form similar to "higher BOLD response in ROI (Region-of-Interest) X for the contrast of conditions A vs. B". There are many possible analysis strategies to assess this endpoint: the search space for significant voxels could be extended to the whole brain or reduced to a small volume defined by a ROI mask, the mask could be anatomically or functionally defined, and so on. For a confirmatory hypothesis test, the complete analysis plan should be defined a priori [2,19], but this is rarely the case in clinical trials with neuroimaging endpoints.
A growing literature is providing tools and guidelines to facilitate reproducible neuroimaging findings and data sharing [2,20-23]. Standards such as BIDS, which was introduced by Gorgolewski et al. in 2016 [13], present a well-documented scheme to structure data files in directories, provide agreed-upon terminology for naming these files, and explain how metadata should be reported. The sharing of primary research data is critical for a reproducible science and can save resources, as existing data can be re-used and aggregated with other data sets for future research projects. Still, researchers often eschew data sharing, e.g. because of a lack of incentives, the fear of misuse, and legal issues such as data protection and privacy [7,24-27]. In this respect, it is of interest how the General Data Protection Regulation (GDPR), which took effect in the European Union (EU) in May 2018, may affect the preference to share data among researchers inside vs. outside the EU. Moreover, the more complex the dataset, the more resources may be required to prepare a sharable dataset, thus taking up time that could be used to do new experiments [28]. Where scientist-practitioners must balance research and clinical work and where data are collected from vulnerable patient populations, the situation can be even more fraught. Therefore, we analyzed differences between researchers who indicated an affiliation with a medical faculty vs. a different, nonmedical faculty. Data and materials from this research are available online [29].

Participants
A PubMed search with the search terms ("fMRI" OR "functional magnetic resonance imaging" OR "functional Magnetic Resonance Imaging") was conducted to collect email addresses from corresponding authors of scientific articles published between 2010/01/01 and 2020/08/28. The "Humans" search filter was applied to exclude animal imaging work. An email was sent to 14,690 addresses on 2020/01/12 with an invitation to participate, including a personalized link to the survey. If the recipients did not click the link or did not complete the survey after 14 days, they received a single reminder email. Figure 1 Figure 3) and reported themselves in cognitive neuroscience (Figure 4). Participants from the European Union were overrepresented in the sample, while the USA and UK ranked second and third in number of participants (Figure 5). Half of the sample held a full or associate professorship or a comparable position (Figure 6).
By clicking the personalized link, participants were directed to an online form where they gave their informed consent before they could start the questionnaire. This research was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Medical Faculty Mannheim of the University of Heidelberg.

Materials
The questionnaire was composed of five blocks. Blocks 1-3 focused on three areas of open science practices: data structure, preregistration and data sharing. The fourth block asked about technical expertise with software (which was not analyzed for this publication) and the fifth block assessed sociodemographic data. At the beginning of each block, a brief introduction to the topic area with definitions of key terms was provided, followed by one or more questions on the subjective experience with the topic. Further, each block included one or more questions to assess the likelihood of adopting practices from this topic area in the future on a 5-point Likert scale (1 = "extremely unlikely", 2 = "somewhat unlikely", 3 = "neither likely nor unlikely", 4 = "somewhat likely", 5 = "extremely likely"). The items for the data structure block were created by the author team with the major goal of assessing knowledge and usage of BIDS in the fMRI community. Barriers and fears of adopting preregistration and data sharing practices were assessed by asking for agreement with statements on a 7-point Likert scale (1 = "strongly disagree", 2 = "disagree", 3 = "somewhat disagree", 4 = "neither agree nor disagree", 5 = "somewhat agree", 6 = "agree", 7 = "strongly agree"). For the data sharing block we used items from a previously published study on data sharing in psychology [7]. Due to the broader scope of our survey and to reduce the burden for participants, a selection of items and response options was drawn from Houtkoop et al. [7].
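To illustrate how ratings on the 7-point agreement scale relate to the percentages reported in the Results, the following minimal sketch codes the response labels as listed above and computes the share of respondents who "agreed at least somewhat". It assumes that this phrase corresponds to a score of 5 or higher. The function names and the example responses are hypothetical and are not taken from the survey data or the authors' analysis scripts.

```python
# Coding of the 7-point agreement scale as listed in the Materials section.
AGREEMENT_SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "somewhat disagree": 3,
    "neither agree nor disagree": 4,
    "somewhat agree": 5,
    "agree": 6,
    "strongly agree": 7,
}


def code_responses(labels):
    """Convert response labels to numeric scores (1-7)."""
    return [AGREEMENT_SCALE[label.strip().lower()] for label in labels]


def percent_agreeing(scores, threshold=5):
    """Percentage of scores at or above `threshold`.

    ASSUMPTION: "agreed at least somewhat" is read as a score >= 5.
    """
    if not scores:
        raise ValueError("no responses given")
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)


# Hypothetical responses to a single statement (not survey data).
example = ["agree", "somewhat agree", "disagree", "neither agree nor disagree"]
scores = code_responses(example)
print(scores, percent_agreeing(scores))
```

Keeping the label-to-score mapping in one place makes the cutoff for "agreement" explicit, which matters because the reported percentages depend on it.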

Data analysis
The statistics software R (version 4.0.5) was used to analyze the data. To analyze individual differences, we defined subgroups based on demographic variables of interest: 1) career level (full/associate professors vs. assistant professors or lower stage), 2) years of research experience, 3) EU residency (EU resident vs. non-EU resident) and 4) affiliation with a medical faculty (university hospital/medical faculty vs. other faculty). t-tests were used to assess individual differences, and Bayes factors (BayesFactor package, version 0.9.12-4.2 [31]) were computed to assess the relative evidence for the alternative hypothesis versus the null hypothesis (BF10).
We used the low-information Cauchy prior with a scale factor of 0.707, which is the default of the BayesFactor package that was used for this analysis and which has been suggested for psychological applications. The Bayes factor BF10 is the ratio of p(Data|H1) to p(Data|H0); values above 3 (or below ⅓) are commonly used as the minimum cutoff for claiming evidence in favour of one hypothesis over the other. To explore latent variables that may drive responses to items on both data sharing and preregistration, an exploratory factor analysis was performed using the R packages lavaan (version 0.6-7) and psych (version 2.0.12) [32]. An exploratory structural equation model was chosen to leverage the advantages of exploratory factor analysis and confirmatory factor analysis [33], allowing the evaluation of exploratory models with goodness-of-fit measures. In total, the 28 statements that related to barriers and fears of data sharing and preregistration, as well as to preferences on how to share data, were used for the analyses. Each statement was rated on a Likert scale ranging from 1 ("strongly disagree") to 7 ("strongly agree"). Factor analysis was performed using maximum likelihood estimation and oblique rotation (Oblimin), allowing factors to correlate with each other. The number of factors was determined using parallel analysis. Items with factor loadings > 0.4 were retained.
To investigate whether groups with different response patterns exist, we performed a data-driven cluster analysis on the seven factors obtained from exploratory structural equation modeling. The Euclidean distance was used to construct the dissimilarity matrix and clustering was performed using Ward's method. The optimal number of clusters was chosen based on the elbow and the silhouette method using the factoextra package (version 1.0.7) [34]. To explore whether any demographic variables could predict cluster belongingness, we performed a logistic regression with research experience, primary affiliation with a medical faculty, EU residency, and career level as predictors. Model accuracy was calculated using the caret package [35].
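The silhouette criterion mentioned above can be made concrete with a short sketch. The code below is a from-scratch illustration in Python, not the authors' R code and not the factoextra implementation: it computes the average silhouette width of a given cluster assignment from Euclidean distances. The function names and the toy points are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_silhouette(points, labels):
    """Average silhouette width for a given cluster assignment."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # compact clusters, high width
print(mean_silhouette(points, [0, 1, 0, 1]))  # mixed clusters, negative width
```

In the silhouette method, this average width is computed for each candidate number of clusters, and the solution with the largest value is preferred.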

Preregistration is facing challenges
42.4% of participants indicated they had never preregistered a study. Among the remaining participants, the most frequently used preregistration platform was the Open Science Framework (OSF, 32.5%), followed by ClinicalTrials.gov (25.1%) and AsPredicted (9.5%). 14.1% indicated they had submitted a registered report [36] to a scientific journal (Figure 7). The proportion of participants who indicated they were likely or extremely likely to preregister their next study online (55%) was about the same as the proportion who said they had preregistered a study before, while 26% considered this unlikely (Figure 8). Asked about potential barriers to preregistration, 64% agreed at least to some extent with the statement that their analyses were too complex to preregister. The statement "There is no sufficient reward for preregistration" ranked second (53%). 46% agreed that preparing a preregistration is too time-consuming for them, 41% agreed that they know too little about preregistration platforms, and 41% agreed that they have never learned to preregister a project. 74% disagreed with the statement that they had never thought about preregistering a project (14% agreed). 10% indicated that their supervisor does not support preregistration. Asked about potential fears of preregistration, 49% agreed that they were afraid that their preregistered methods may turn out to be suboptimal or inadequate. 23% agreed they were afraid that their preregistered hypotheses may turn out to be false. We also asked whether participants think that it is necessary to register studies with an explorative research question, and 48% agreed (Figure 9).

Sharing raw data is common practice for many
66% of all participants said they had shared raw neuroimaging data with other researchers outside their department before. Asked about the intention to share primary research data of their next neuroimaging paper in an online repository, 54% indicated they were likely or extremely likely to do this, while 25% were unlikely or extremely unlikely (Figure 10). Asked whether they were not allowed to share primary neuroimaging data due to legal constraints, 64% disagreed at least to some extent, while 9% agreed (27% neither agreed nor disagreed; Figure 11). If a participant did not disagree strongly with the above statement, a follow-up question was asked to investigate the reasons why the participant thought they were not allowed to share primary neuroimaging research data. Most participants endorsed the statement that anonymity cannot be guaranteed if the data are shared (45.2% agreed at least somewhat). 41% indicated their consent forms state that data will not be shared. 29.5% responded that their institutional review board does not allow them to share data. 14.8% reported stakeholder interests prohibiting data from being shared and 6.7% said that a funder, advisor or supervisor does not allow them to share data (Figure 12).

Europeans more hesitant to share raw data online in the future
To explore interindividual differences that may result from national data protection legislation, we compared participants who indicated a country of residence within the European Union (EU) with those residing outside the EU. The number of participants who indicated they had shared data outside their department in the past did not differ significantly between EU and non-EU researchers (χ²(1) = 0.287, p = 0.591). More participants from the EU agreed with the statement that they are not allowed to share primary neuroimaging data for legal reasons, t(251.94) = 2.84, p < 0.005, BF10 = 6.26, and fewer participants from the EU agreed that they would likely share primary research data from their next neuroimaging paper online, t(269.59) = 3.09, p < 0.002, BF10 = 10.75.
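The fractional degrees of freedom reported above, e.g. t(251.94), are consistent with Welch's unequal-variance t-test, which is also the default of R's t.test function. The text does not state this explicitly, so it should be read as an assumption. The following minimal sketch shows how the Welch statistic and its Welch-Satterthwaite degrees of freedom are computed. The function name and the example ratings are hypothetical and are not survey data.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors of the two group means (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    if vx + vy == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical 7-point ratings for two groups (not survey data).
group_a = [5, 6, 7, 6, 5, 7]
group_b = [3, 4, 4, 5, 3]
t, df = welch_t(group_a, group_b)
print(round(t, 2), round(df, 2))
```

Because the degrees of freedom are estimated from the two sample variances, they fall between the smaller group size minus one and the pooled value, which explains non-integer values such as those in the reported tests.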

Researchers appreciate data sharing agreements
To learn more about the preferred mode of data sharing, we let participants evaluate several options on how data can be shared with other researchers. The highest agreement was found for the option to share data under a data sharing agreement to be signed by the recipient (65%), directly followed by the option to share upon personal request, thereby bypassing a data repository (64%). With 58% agreement, sharing via a managed online repository with restricted access also found high approval. The option to share via an online repository with unrestricted access was preferred by 35% of participants, while 45% expressed disagreement with this item. 17% preferred that researchers with reasonable interest can work with their data, but that this work needs to be done on the server of their home institution (63% disagreed). Finally, 6% agreed they preferred not to give away raw data to other researchers, whereas 81% disagreed (Figure 13).

Lack of resources poses a high hurdle to data sharing
Asked about barriers to and fears of data sharing, 67% agreed at least somewhat that preparing data to make it suitable for online sharing is too time-consuming. The second-ranked statement, "I lack funding to make data suitable for online sharing", received 61% agreement. 47% of participants agreed they are afraid of being scooped, i.e., that other researchers may publish results obtained with their data set before they can. 41% agreed they knew too little about suitable data repositories and 40% agreed they never learned to share their research data online. 38% endorsed the statement that they are afraid of not getting proper recognition for sharing data. The concerns that data sets were too big (33%) or too complex (30%) to share ranked next. 25% expressed fears that other researchers could run alternative analyses on their data to rebut their own conclusions and 24% agreed they are afraid that other researchers will discover errors in their data. 11% agreed their supervisor does not support online data sharing. 11% agreed they have never thought about data sharing, whereas 81% disagreed (Figure 14).

High interest in using BIDS
72% of respondents indicated that they had heard about BIDS before. 35% said that they had used BIDS in the past and had been working with it for 2.27 (1.78) years on average. The vast majority, 91%, found it likely or extremely likely that they would use BIDS in the future (Figure 15). Participants who said that they had not used BIDS before were asked to report the reason. Most indicated they had not heard about BIDS before (41.5%), or that they had no time to implement it in the lab (36.1%) or to learn more about it (28.4%). 12.6% agreed they were lacking the technical expertise to get BIDS conversion running, 10.9% said they were currently implementing it, and 6% said they were using a data structure format other than BIDS. 5.5% deemed BIDS not relevant for their lab (Figure 16). Those preferring to operate software via a graphical user interface (GUI) used BIDS significantly less often compared with participants who prefer to interact via a command interface, χ²(1) = 18.72, p < 0.001. Those who indicated that they had used BIDS before were then asked about their experience with BIDS-compatible software: 32% of participants experienced with BIDS used custom code to convert raw neuroimaging data into the BIDS format, while 16% indicated that they had not used any conversion software (Figure 17). Several participants confirmed they had been using software that can operate on BIDS-formatted data sets, such as fMRIPrep [20] (44%), MRIQC [37] (23%), OpenNeuro [38] (18%) and other tools (<10%) (Figure 18).

Factors underlying barriers, fears, and preferences of preregistration and data sharing
We explored whether the answers of our participants could be reduced to a smaller set of interpretable latent variables. Bartlett's test confirmed that the items correlated sufficiently, for data governance, unsupportive environment, and lack of resources for data sharing.
We used the results from factor analysis to build seven subscales from our questionnaire. For each participant we calculated subscale scores by averaging the scores of the items assigned to each factor. The subscale scores were then used to explore individual differences by comparing participants based on demographic variables. The Bonferroni-corrected results of all comparisons performed can be found in Table 3. For the factor "fear of being transparent", we found that people with a lower career level were significantly more fearful than people with a higher career level. For "need for data governance", people having their primary affiliation with a medical faculty showed significantly higher scores than people having their primary affiliation with a psychological or other faculty. Respondents residing in the EU had a higher need for data governance than non-EU residents. Lastly, people affiliated with a medical faculty scored higher on "unsupportive environment", as did respondents with a lower career level compared to respondents with a higher career level.
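The two computational steps described in this paragraph, averaging item scores into subscale scores and applying a Bonferroni correction, can be sketched as follows. This is an illustration only: the item identifiers, the item-to-factor assignment, and the p-values are hypothetical, and the sketch assumes the correction was applied by multiplying each p-value by the number of comparisons, which the text does not specify.

```python
from statistics import mean


def subscale_scores(item_scores, factor_items):
    """Average one participant's item scores within each factor.

    item_scores:  dict mapping item id -> rating (1-7)
    factor_items: dict mapping factor name -> list of item ids
    """
    return {
        factor: mean(item_scores[item] for item in items)
        for factor, items in factor_items.items()
    }


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical item assignment and ratings (not the survey's item set).
factor_items = {
    "fear of being transparent": ["item_a", "item_b"],
    "need for data governance": ["item_c", "item_d", "item_e"],
}
ratings = {"item_a": 6, "item_b": 4, "item_c": 2, "item_d": 3, "item_e": 4}

print(subscale_scores(ratings, factor_items))
print(bonferroni([0.01, 0.04, 0.5]))
```

Averaging rather than summing keeps every subscale on the original 1-7 range, so subscales with different numbers of items remain directly comparable.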

Distinct subgroups of open science profiles
We explored whether there are groups of participants with distinct profiles, according to scores achieved on the subscales, which might serve as potential target groups for future actions on open science practices. The suggested optimal number of clusters was two, which was supported by the highest Dunn Index for the two-cluster solution (0.155), compared to the three- and four-cluster solutions (Figure 19). As visible in the profile plot (Figure 20), cluster 1 consisted of researchers with less experience, more complex datasets, and more concerns regarding data sharing and preregistration, as well as a less supportive environment and fewer resources for data sharing. Cluster 2 was composed of researchers who were more experienced with open science practices and who saw fewer barriers overall and had lower fears.
To find out whether cluster belongingness could be explained by demographic variables, we conducted a logistic regression analysis. Overall, the explanatory power of our regression model was marginally better than chance, χ²(4) = 10.09, p = 0.039 (Table 4), with an out-of-sample accuracy of 59.9% based on 10-fold cross-validation. The affiliation with a medical faculty and full/associate professorship predicted whether a participant belonged to cluster 1 at trend level, with p = 0.059 and p = 0.067, respectively.

Discussion
Preregistration of research questions, hypotheses and the analysis plan, as well as data sharing, have been proposed to improve the replicability, robustness and reproducibility of research [16,39]. This survey aimed to shed light on the experience with and attitude towards open science practices in human neuroimaging, namely with regard to preregistration, data sharing and data standards. We reached out to researchers who had published papers using human fMRI in the past, which was reflected in a sample mainly composed of researchers who were advanced in their careers. It can be assumed that most participants of this survey were heading their own labs and that they oversaw and exerted influence in their field of research.
Surprisingly, the interest in using preregistration was rather modest. About one half of participants had preregistered a study before, with OSF as the most commonly used platform.
There was no indication of a trend towards more widespread use of preregistration in the future. Still, two thirds had at least thought about preregistering their research. Besides the barriers and fears that we had asked for, some participants shared a critical perspective on the role of preregistration as a technique to promote the quality of science ( Further instruments to respond to the many barriers and fears of data sharing have been described elsewhere 7 . Explorative regression analysis showed that the demographic variables we had used to predict belongingness to the two clusters barely exceeded chance level and the out-of-sample accuracy was relatively low. None of the variables that were tested predicted cluster belongingness beyond trend level. Future research is necessary to confirm our findings and to explore more variables that may aid the prediction.

Limitations
Conclusions from this survey are limited by the low response rate to the survey invitation (2.4%), which was below the rates reported in previous investigations (4% 27

Conclusions
Limited time and insufficient education about tools to structure and share data were

Conflicts of interest
The authors declare no conflicts of interest.

Acknowledgement
Thanks are due to Gordon Feld for critical reading of an earlier version of this manuscript. We are thankful to our colleagues from the Department for Psychosomatic Medicine and Psychotherapy, CIMH, for their feedback on the questionnaire during development.

Table 4. Results from logistic regression with cluster as the dependent variable and the demographic variables "research experience", "career level", "EU residency" and "affiliation with medical faculty" as predictors.

Table 5. At the end of the survey, the participants were given the opportunity to write a free-text comment to the authors of the survey. 45 (17%) of the participants took advantage of this option. The table lists a selection of these comments that bring up aspects that were not properly covered by the survey questions, or that give constructive feedback on the questionnaire itself. Comments have been shortened or reworded at the discretion of the author (CP) to make them more concise.

Comments on further barriers in the way of open science:
· Preregistration constrains the creativity that is at the basis of progress in science
· Preregistration leads to terrible papers, where too much text is spent on explaining the preregistered content and the justifications for deviating from it
· Realistic standards for evaluating conformity to the preregistration are missing
· Pre-registration is only meaningful for purely confirmatory studies. Purely confirmatory studies are only meaningful when there is a strong hypothesis and the goal of the confirmatory study is to confirm this hypothesis.
· The benefits of pre-registration have not been thoroughly demonstrated in order to merit its adoption
Data sharing
· Data protection regulations from host institution incompatible with sharing
· Money to store and manage data repositories missing after grant terminates
· Neuroimaging data are intellectual property, rights of researchers acquiring data need to be protected
· No canonical interpretation of the laws/regulations available
· Practical guides on how to share clinical data online missing
· Whether the data will be used by anyone at all, and how long a given repository will last is unknown.

Comments expressing further fears of open science:
· Lose my job because of not complying with host institution's data protection regulation
· My worries about not being able to publish every last ounce of results from my data are very high.
· I unfortunately think that the open science movement has the capacity to really disadvantage jr. researchers in comparison to well-established labs
· Transparency is nice, but we seem to be willing to sacrifice part of our creativity through forced standardization
· My greatest fear is giving away your research ideas with preregistration

Feedback on the questionnaire:
· Don't think this survey captured my opinions very accurately. I am a strong supporter of Open Science, but have a number of concerns about data sharing and the potential for abuse
· A question was lacking about lack of confidence in how to interpret the juridical bases for data sharing
· In the survey it was a bit unclear if data sharing refers to neuroimaging data only or in general
· Many researchers will not reply, let alone reply honestly
· I think that analyses for individual papers can be prespecified, but it would be hard to prespecify analyses for large studies. I understood that you are referring to pre-registration of the entire large study, which I said I do not do
· There were insufficient opportunities to comment on the role of journals (static, laminated publications etc) in effectively prohibiting open science practices. Open science may obviate the need for journals.
· The question at the bottom of the page asking for legal issues yes/no was difficult to answer, because we have these issues for old data (not considering data sharing) but we always take care of these now in new projects (including data sharing).
· Many of your questions are difficult to answer / ambiguous since there are different hurdles to share data from healthy participants and patients

Other:
· Preregistration provides a way of claiming precedence for an idea, even if the results don't bear out the findings