RT Journal Article SR Electronic T1 Protein-coding repeat polymorphisms strongly shape diverse human phenotypes JF bioRxiv FD Cold Spring Harbor Laboratory SP 2021.01.19.427332 DO 10.1101/2021.01.19.427332 A1 Ronen E. Mukamel A1 Robert E. Handsaker A1 Maxwell A. Sherman A1 Alison R. Barton A1 Yiming Zheng A1 Steven A. McCarroll A1 Po-Ru Loh YR 2021 UL http://biorxiv.org/content/early/2021/01/20/2021.01.19.427332.abstract AB Hundreds of the proteins encoded in human genomes contain domains that vary in size or copy number due to variable numbers of tandem repeats (VNTRs) in protein-coding exons. VNTRs have eluded analysis by the molecular methods (SNP arrays and high-throughput sequencing) used in large-scale human genetic studies to date; thus, the relationships of VNTRs to most human phenotypes are unknown. We developed ways to estimate VNTR lengths from whole-exome sequencing data, identify the SNP haplotypes on which VNTR alleles reside, and use imputation to project these haplotypes into abundant SNP data. We analyzed 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 791 phenotypes. Analysis revealed some of the strongest associations of common variants with human phenotypes including height, hair morphology, and biomarkers of human health; for example, a VNTR encoding 13-44 copies of a 19-amino-acid repeat in the chondroitin sulfate domain of aggrecan (ACAN) associated with height variation of 3.4 centimeters (s.e. 0.3 cm). Incorporating large-effect VNTRs into analysis also made it possible to map many additional effects at the same loci: for the blood biomarker lipoprotein(a), for example, analysis of the kringle IV-2 VNTR within the LPA gene revealed that 18 coding SNPs and the VNTR in LPA explained 90% of lipoprotein(a) heritability in Europeans, enabling insights about population differences and epidemiological significance of this clinical biomarker. 
These results point to strong, cryptic effects of highly polymorphic common structural variants that have largely eluded molecular analyses to date. Competing Interest Statement: The authors have declared no competing interest.