Abstract
Data-driven discovery of image-derived phenotypes (IDPs) from large-scale multimodal brain imaging data has enormous potential for neuroscientific and clinical research, because IDPs can be linked to subjects’ demographic, behavioural, clinical and cognitive measures (i.e., non-imaging derived phenotypes, or nIDPs). However, current approaches are primarily unsupervised and do not use the information in nIDPs. In this paper, we propose a semi-supervised, multimodal, and multi-task fusion approach for IDP discovery, termed SuperBigFLICA, which simultaneously integrates information from multiple imaging modalities and multiple nIDPs. SuperBigFLICA is computationally efficient and largely avoids the need for parameter tuning. Using the UK Biobank brain imaging dataset, with around 40,000 subjects and 47 modalities along with more than 17,000 nIDPs, we show that SuperBigFLICA improves the prediction of nIDPs relative to IDPs derived by conventional expert-knowledge and unsupervised-learning approaches (with average nIDP prediction accuracy improvements of up to 46%). It also enables the learning of generic imaging features that can predict new nIDPs. Further empirical analysis of the SuperBigFLICA algorithm demonstrates its robustness across different prediction tasks and its ability to derive biologically meaningful IDPs when predicting health outcomes and cognitive nIDPs, such as hypertension and fluid intelligence.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Financial support was provided by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z, the Wellcome Trust Collaborative Award in Science 215573/Z/19/Z and further supported by the Netherlands Organization for Scientific Research Vici Grant No. 17854 and NWO-CAS Grant No. 012-200-013. We are grateful to UK Biobank and its participants (access application 8107). Computation used the Oxford Biomedical Research Computing (BMRC) facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the NIHR Oxford Biomedical Research Centre. The authors declare that they have no competing financial interests.
UK Biobank data is available from UK Biobank via their standard data access procedure (see http://www.ukbiobank.ac.uk/register-apply). SuperBigFLICA code is available at https://github.com/weikanggong/SuperBigFLICA.
S.S. and C.B. are co-senior authors.