Clustering by phenotype and genome-wide association study in autism

Akira Narita; Masato Nagai; Satoshi Mizuno; Soichi Ogishima; Gen Tamiya; Masao Ueki; Rieko Sakurai; Satoshi Makino; Taku Obara; Mami Ishikuro; Chizuru Yamanaka; Hiroko Matsubara; Yasutaka Kuniyoshi; Keiko Murakami; Tomoko Kobayashi; Mika Kobayashi; Takuma Usuzaki; Hisashi Ohseto; Atsushi Hozawa; Masahiro Kikuya; Hirohito Metoki; Shigeo Kure; Shinichi Kuriyama

doi:10.1101/614958

Abstract

Autism spectrum disorder (ASD) is clinically and genetically heterogeneous. Here, we present a two-step genome-wide association study (GWAS). In the first step, a GWAS of 597 cases and 370 controls yielded no significant associations. In the second step, we performed a k-means cluster analysis with 15 clusters based on Autism Diagnostic Interview-Revised (ADI-R) scores and history of vitamin treatment. We then conducted a GWAS comparing each subgroup of cases with all controls (cluster-based GWAS) and identified significant associations at 93 chromosomal loci that met the genome-wide significance threshold of P<5.0×10⁻⁸. These loci included previously reported candidate genes for ASD: CDH9, MED13L, SOX5, CADM2, CADM1, DAB1, SEMA5A, RORA, MED13, COBL, EPHA7, HIF1AN, ICE1, PML, and WNT7B. Cluster-based GWAS, despite the smaller sample size of each subgroup, revealed many significant associations. These findings suggest that clustering may successfully identify subgroups that are aetiologically more homogeneous.
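The cluster-based step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the phenotype matrix stands in for ADI-R scores and vitamin-treatment history, and the farthest-point initialisation of k-means, the per-SNP allelic chi-square test (1 df), and the 0/1/2 alternative-allele genotype coding are all assumptions, because the abstract does not state the initialisation, feature scaling, association test, or covariates used. The helper names kmeans, allelic_test_p, and cluster_based_gwas are hypothetical, and the toy data are synthetic.

```python
import math

import numpy as np

# Genome-wide significance threshold quoted in the abstract.
GWS_THRESHOLD = 5.0e-8


def kmeans(features, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation (an assumption)."""
    features = np.asarray(features, dtype=float)
    centroids = [features[0]]
    for _ in range(1, k):
        # Next centroid: the point farthest from all centroids chosen so far.
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(features[d.argmax()])
    centroids = np.array(centroids)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Assign each case to its nearest centroid (squared Euclidean distance).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


def allelic_test_p(case_genos, control_genos):
    """Allelic chi-square test (1 df) for one SNP; genotypes are 0/1/2 alt-allele counts."""
    alt_case, alt_ctrl = case_genos.sum(), control_genos.sum()
    n_case, n_ctrl = 2 * len(case_genos), 2 * len(control_genos)
    table = np.array([[alt_case, n_case - alt_case],
                      [alt_ctrl, n_ctrl - alt_ctrl]], dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    if np.any(expected == 0):
        return 1.0  # allele absent from both groups: nothing to test
    chi2 = ((table - expected) ** 2 / expected).sum()
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def cluster_based_gwas(case_features, case_genotypes, control_genotypes, k,
                       threshold=GWS_THRESHOLD):
    """Cluster cases by phenotype, then test each case subgroup against ALL controls."""
    labels = kmeans(case_features, k)
    hits = {}
    for j in range(k):
        subgroup = case_genotypes[labels == j]
        if len(subgroup) == 0:
            continue
        pvals = [allelic_test_p(subgroup[:, s], control_genotypes[:, s])
                 for s in range(case_genotypes.shape[1])]
        hits[j] = [s for s, p in enumerate(pvals) if p < threshold]
    return labels, hits


if __name__ == "__main__":
    # Synthetic toy data: two phenotype groups of 50 cases each, 100 controls, 2 SNPs.
    features = np.vstack([np.zeros((50, 2)), np.tile([10.0, 1.0], (50, 1))])
    # SNP 0: alt allele homozygous only in the second case group; SNP 1: null everywhere.
    cases = np.column_stack([np.r_[np.zeros(50), np.full(50, 2)], np.ones(100)]).astype(int)
    controls = np.column_stack([np.zeros(100), np.ones(100)]).astype(int)
    labels, hits = cluster_based_gwas(features, cases, controls, k=2)
    print(hits)  # → {0: [], 1: [0]}
```

In the toy data, the alternative allele at SNP index 0 is carried only by the second phenotype cluster, so testing that subgroup against all controls flags it, while the first subgroup yields no hits. The sketch applies the 5.0×10⁻⁸ threshold to each cluster-versus-controls test separately, as the abstract states it, and makes no further adjustment for the number of clusters tested.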