Abstract
Autism spectrum disorder (ASD) is phenotypically and genetically heterogeneous. Here, we report a two-step genome-wide association study (GWAS). We used two datasets: one genotyped with the Illumina Human Omni2.5 (Omni2.5) for the discovery stage, and the other genotyped with the Illumina BeadChip 1Mv3 (1Mv3) for the replication stage. In the first step of the discovery stage, a GWAS of 597 probands and 370 controls yielded no significant associations. In the second step, we performed k-means cluster analysis with 15 clusters on the combined Omni2.5 and 1Mv3 dataset of male probands, based on Autism Diagnostic Interview-Revised (ADI-R) scores and history of vitamin treatment, and then re-divided the dataset for the discovery and replication stages. We then conducted a GWAS in each subgroup of probands versus controls, excluding the brothers of the probands belonging to the subgroup being analysed (cluster-based GWAS). This identified 65 chromosomal loci, comprising 30 intragenic loci located in 21 genes and 35 intergenic loci, that satisfied the threshold of P < 5.0 × 10⁻⁸. Some of these loci were located within or near previously reported candidate genes for ASD: CDH5, CNTN5, CNTNAP5, DNAH17, DPP10, DSCAM, FOXK1, GABBR2, GRIN2A5, ITPR1, NTM, SDK1, SNCA and SRRM4. Although no gene reached genome-wide significance in both the Omni2.5 and 1Mv3 datasets, cluster-based GWAS revealed abundant significant associations even with small sample sizes. These findings suggest that clustering may successfully identify subgroups with relatively homogeneous disease aetiologies. Further studies are warranted to validate the clusters and to replicate our findings in larger cohorts.
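
The clustering step summarised above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the feature columns, the synthetic data and the helper names `kmeans` and `split_by_cluster` are assumptions made purely for illustration, and only the choice of k-means with 15 clusters is taken from the text. The sketch uses plain Lloyd's algorithm on unscaled features; whether and how the original features were scaled is not stated here.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def split_by_cluster(ids, labels):
    """Group proband IDs by cluster label, one subgroup per cluster."""
    groups = {}
    for pid, lab in zip(ids, labels):
        groups.setdefault(lab, []).append(pid)
    return groups


# Synthetic stand-in data (hypothetical): three ADI-R-like domain scores
# plus a 0/1 flag for history of vitamin treatment, for 300 probands.
data_rng = random.Random(42)
proband_ids = [f"P{i:03d}" for i in range(300)]
features = [
    (
        data_rng.randint(0, 30),
        data_rng.randint(0, 26),
        data_rng.randint(0, 12),
        data_rng.randint(0, 1),
    )
    for _ in proband_ids
]

labels, centroids = kmeans(features, k=15, seed=0)
groups = split_by_cluster(proband_ids, labels)

# Each subgroup would then be tested against controls in its own GWAS.
for cluster in sorted(groups):
    print(f"cluster {cluster}: {len(groups[cluster])} probands")
```

In a cluster-based design of this kind, each entry of `groups` defines one case set. The association test itself (for example with a dedicated GWAS tool) and the exclusion of brothers from the controls are outside the scope of this sketch.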