Abstract
Intraclass correlation (ICC) is a reliability metric that gauges similarity when, for example, entities are measured under similar, or even the same, well-controlled conditions, which in MRI applications include runs/sessions, twins, parent/child, scanners, sites, etc. The popular definition and interpretation of ICC are usually framed statistically under the conventional ANOVA platform. Here, we show that the ANOVA framework is often limited, rigid, and inflexible in modeling capabilities. Additionally, we provide a comprehensive overview of the prior usage of ICC analysis in neuroimaging. The intrinsic limitations of ANOVA motivate several novel improvements. Specifically, we start with the conventional ICC model under the ANOVA platform, and extend it along two dimensions: first, fixing the failure in ICC estimation when negative values occur under degenerative circumstances, and second, incorporating reliability information of effect estimates into the ICC model. These endeavors lead to four modeling strategies: linear mixed-effects (LME), Bayesian mixed-effects (BME), multilevel mixed-effects (MME), and Bayesian multilevel mixed-effects (BMME). Compared to ANOVA, each of these four models directly provides estimates for fixed effects as well as their statistical significance, in addition to the ICC estimate. These new modeling approaches can also accommodate missing data as well as fixed effects for confounding variables. More importantly, we show that the novel MME and BMME approaches offer more accurate characterization and decomposition among the variance components, leading to more robust ICC computation. Based on these theoretical considerations and model performance comparisons with a real experimental dataset, we offer the following general-purpose recommendations.
First, ICC estimation through MME or BMME is preferable when precision information is available for the effect estimate; precision information provides weights that more accurately allocate the variances in the data. When precision information is unavailable, ICC estimation through LME or the novel BME is the preferred option. Second, even though the absolute agreement version, ICC(2,1), is presently more popular in the field, the consistency version, ICC(3,1), is a practical and informative choice for whole-brain ICC analysis that achieves a well-balanced compromise when all potential fixed effects are accounted for. Third, approaches for clear, meaningful, and useful result reporting in ICC analysis are discussed. All models, ICC formulations, and related statistical testing methods have been implemented in an open-source program, 3dICC, which is publicly available as part of the AFNI suite. Even though our work here focuses on the whole-brain level, the modeling strategy and recommendations can be equivalently applied to other situations such as voxel, region, and network levels.
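
As a rough illustration of the conventional ANOVA formulation referred to above, the following minimal sketch computes ICC(2,1) and ICC(3,1) from the mean squares of a two-way layout (subjects by sessions, one observation per cell). It is not the 3dICC implementation, and the function name `anova_icc` and the toy data are assumptions introduced only for this example. It also shows how the ANOVA estimate can turn negative when the between-subject mean square falls below the residual mean square.

```python
def anova_icc(y):
    """Return (ICC(2,1), ICC(3,1)) from a complete subjects-by-sessions table.

    y is a list of rows, one row per subject and one column per session.
    Hypothetical helper for illustration; uses the standard mean-square formulas.
    """
    n = len(y)          # number of subjects
    k = len(y[0])       # number of sessions (repeated measurements)
    grand = sum(sum(row) for row in y) / (n * k)
    row_means = [sum(row) / k for row in y]
    col_means = [sum(y[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for subjects, sessions, and the residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    # Absolute agreement: session effect counts against reliability
    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    # Consistency: session effect is treated as fixed and excluded
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31


# Subjects keep their ranking, but session 2 is shifted up by 1:
# consistency is perfect while absolute agreement is penalized.
shifted = [[1, 2], [2, 3], [3, 4]]
print(anova_icc(shifted))

# Subjects share the same mean while sessions disagree within subject:
# the between-subject mean square is zero, so the ANOVA ICC is negative.
degenerate = [[1, 3], [3, 1], [2, 2]]
print(anova_icc(degenerate))
```

In the first toy table the systematic session shift lowers ICC(2,1) but leaves ICC(3,1) at 1, which is the distinction between absolute agreement and consistency. The second table gives a negative estimate, the kind of ANOVA failure that the mixed-effects formulations are meant to address.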