Abstract
Autism spectrum disorder (ASD) is clinically and genetically heterogeneous. Here, we report a two-step genome-wide association study (GWAS). In the first step, a GWAS of 597 cases and 370 controls yielded no significant associations. In the second step, we performed a k-means cluster analysis with 15 clusters, based on Autism Diagnostic Interview-Revised (ADI-R) scores and history of vitamin treatment. We then conducted a GWAS for each subgroup of cases versus all controls (cluster-based GWAS) and identified significant associations at 93 chromosomal loci that satisfied the genome-wide significance threshold of P < 5.0 × 10⁻⁸. These loci included previously reported candidate genes for ASD: CDH9, MED13L, SOX5, CADM2, CADM1, DAB1, SEMA5A, RORA, MED13, COBL, EPHA7, HIF1AN, ICE1, PML, and WNT7B. Cluster-based GWAS, despite the smaller sample size of each subgroup, revealed many significant associations. These findings suggest that clustering may successfully identify subgroups that are aetiologically more homogeneous.
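
The cluster-then-test design described above can be illustrated with a minimal sketch. It is a toy illustration under stated assumptions, not the pipeline used in this study: the function names, the plain Lloyd's k-means, the 0/1/2 genotype coding, and the 1-df allelic chi-square test are all hypothetical choices made for the example. Only the use of k-means, the comparison of each case subgroup against all controls, and the significance threshold come from the text.

```python
import math
import random

GENOME_WIDE_P = 5.0e-8  # significance threshold quoted in the abstract


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed variant); returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def allelic_p_value(case_genotypes, control_genotypes):
    """1-df chi-square test on allele counts (genotypes coded 0/1/2)."""
    a = sum(case_genotypes)                # alternate alleles in cases
    b = 2 * len(case_genotypes) - a        # reference alleles in cases
    c = sum(control_genotypes)             # alternate alleles in controls
    d = 2 * len(control_genotypes) - c     # reference alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                         # monomorphic SNP: nothing to test
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def cluster_based_gwas(phenotypes, case_genotypes, control_genotypes, k):
    """Test each case cluster against ALL controls, SNP by SNP.

    phenotypes: one feature vector per case (e.g. questionnaire scores)
    case_genotypes / control_genotypes: per-person lists of 0/1/2 calls
    Returns {cluster_label: [p-value per SNP]}.
    """
    labels = kmeans(phenotypes, k)
    n_snps = len(control_genotypes[0])
    results = {}
    for cluster in sorted(set(labels)):
        members = [g for g, lab in zip(case_genotypes, labels) if lab == cluster]
        results[cluster] = [
            allelic_p_value([g[s] for g in members],
                            [g[s] for g in control_genotypes])
            for s in range(n_snps)
        ]
    return results


def significant_snps(results, threshold=GENOME_WIDE_P):
    """Indices of SNPs passing the threshold, per cluster."""
    return {c: [s for s, p in enumerate(ps) if p < threshold]
            for c, ps in results.items()}
```

In this sketch every subgroup is compared with the same control set, so a variant enriched in only one phenotypic cluster is not diluted by the remaining cases, which is the intuition behind the approach. A real analysis would typically use a dedicated association tool and adjust for covariates such as population structure; those steps are omitted here.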