PT - JOURNAL ARTICLE
AU - Xiao Chen
AU - Bin Lu
AU - Chao-Gan Yan
TI - A Comprehensive Assessment of Reproducibility of R-fMRI Metrics on the Impact of Different Strategies for Multiple Comparison Correction and Sample Sizes
AID - 10.1101/128645
DP - 2017 Jan 01
TA - bioRxiv
PG - 128645
4099 - http://biorxiv.org/content/early/2017/07/18/128645.short
4100 - http://biorxiv.org/content/early/2017/07/18/128645.full
AB - Concerns regarding the reproducibility of findings have been raised in the field of resting-state functional magnetic resonance imaging (R-fMRI). However, little is known about operationally defined R-fMRI reproducibility and the extent to which it is affected by multiple comparison correction strategies and sample sizes. We comprehensively assessed test-retest reliability and replicability, two aspects of reproducibility, on widely used R-fMRI metrics in both between-subject contrasts of sex differences and within-subject comparisons of eyes-open and eyes-closed (EOEC) conditions. We found that the permutation test with Threshold-Free Cluster Enhancement (TFCE), a strict multiple comparison correction strategy, reached the best balance between family-wise error rate (under 5%) and test-retest reliability / replicability (e.g., 0.68 for test-retest reliability and 0.25 for replicability of amplitude of low-frequency fluctuations (ALFF) for between-subject sex differences, 0.49 for replicability of ALFF for within-subject EOEC differences). Although the effects in R-fMRI metrics attained moderate reliability, they were poorly replicated in a distinct dataset (replicability < 0.3 for between-subject sex differences, < 0.5 for within-subject EOEC differences). By randomly drawing different sample sizes from a single site, we found that reliability, sensitivity and positive predictive value (PPV) rose as sample size increased.
Small sample sizes (e.g., < 80 participants, 40 in each group) not only minimized power (sensitivity < 2%) but also decreased the likelihood that significant results reflect “true” effects (PPV < 0.26). Our findings have implications for the selection of multiple comparison correction strategies and highlight the importance of sufficiently large sample sizes in future R-fMRI studies.