Abstract
Although genome-wide association studies (GWAS) have identified thousands of associations between variants and traits, their success rate in pinpointing causal genes has been disproportionately low. Here, we integrate biobank-scale phenotype data from carriers of a rare copy-number variant (CNV), Mendelian randomization, and animal modeling to identify causative genes in a GWAS locus for age at menarche (AaM). We show that the dosage of the 16p11.2 BP4-BP5 interval is positively correlated with AaM in the UK and Estonian biobanks and in 16p11.2 clinical cohorts, with a directionally consistent trend for pubertal onset in males. These correlations parallel an increase in reproductive tract disorders in both sexes. In support of these observations, 16p11.2 mouse models display perturbed pubertal onset and structurally altered reproductive organs that track with CNV dose. Further, we report a negative correlation between 16p11.2 dosage and relative hypothalamic volume in both humans and mice, suggesting a perturbation of the gonadotropin-releasing hormone (GnRH) axis. Two independent lines of evidence identified candidate causal genes for AaM: Mendelian randomization and agnostic dosage modulation of each 16p11.2 gene in zebrafish gnrh3:egfp models. ASPHD1, expressed predominantly in the brain and pituitary gland, emerged as a major phenotype driver, and it is subject to modulation by KCTD13, which exacerbates the GnRH neuron phenotype. Together, our data highlight the power of an interdisciplinary approach to elucidate disease etiologies underlying complex traits.