Abstract
Test-retest reliability indexes the repeatability, or consistency, of a measurement across time. High reliability is critical for any scientific study, and especially for the study of individual differences. Evidence of poor reliability of commonly used behavioral and functional neuroimaging tasks is mounting. These reports have called into question the adequacy of even the most common, well-characterized cognitive tasks with robust population-level effects for measuring individual differences. Here, we lay out a hierarchical framework that estimates reliability as a correlation divorced from trial-level variability, and we show how reliability tends to be underestimated under the conventional intraclass correlation framework. In addition, we examine how reliability estimation diverges between the two modeling frameworks and assess how different factors (e.g., trial and subject sample sizes, relative magnitude of cross-trial variability) affect reliability estimates across frameworks. This work highlights that a large number of trials (e.g., greater than 100) may be required to achieve reasonably precise reliability estimates. We reference the tools TRR and 3dLMEr for the community to apply trial-level models to behavioral and neuroimaging data.
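The attenuation the abstract describes can be illustrated with a minimal simulation (not from the paper; all parameter values here are hypothetical). Each subject has a perfectly stable trait, but each session's score is a mean over noisy trials, so a correlation of session means, as in a conventional ICC-style analysis, underestimates the true reliability of 1 unless the trial count is large relative to the cross-trial variability:

```python
import numpy as np

rng = np.random.default_rng(0)

n_subj = 200       # hypothetical subject count
sigma_subj = 1.0   # cross-subject SD of the true (stable) effect
sigma_trial = 3.0  # cross-trial SD, often much larger than sigma_subj

def observed_reliability(n_trials):
    """Correlate two sessions' trial-averaged scores for the same subjects.

    The underlying trait is identical across sessions (true reliability = 1),
    so any shortfall below 1 reflects attenuation by trial-level noise.
    """
    true = rng.normal(0.0, sigma_subj, n_subj)
    s1 = true + rng.normal(0.0, sigma_trial, (n_trials, n_subj)).mean(axis=0)
    s2 = true + rng.normal(0.0, sigma_trial, (n_trials, n_subj)).mean(axis=0)
    return np.corrcoef(s1, s2)[0, 1]

# Expected attenuation for T trials:
#   r ~= 1 / (1 + sigma_trial**2 / (T * sigma_subj**2))
# so T = 20 gives ~0.69 while T = 500 gives ~0.98 here.
for n_trials in (20, 100, 500):
    print(n_trials, round(observed_reliability(n_trials), 2))
```

A hierarchical (trial-level) model instead estimates the subject-level correlation directly, separating cross-trial variance from cross-subject variance rather than folding both into the session means.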
Competing Interest Statement
The authors have declared no competing interest.