  • Letter

Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery

Abstract

The Greater Middle East (GME) has been a central hub of human migration and population admixture. The tradition of consanguinity, variably practiced in the Persian Gulf region, North Africa, and Central Asia (refs. 1–3), has resulted in an elevated burden of recessive disease (ref. 4). Here we generated a whole-exome GME variome from 1,111 unrelated subjects. We detected substantial diversity and admixture in continental and subregional populations, corresponding to several ancient founder populations with little evidence of bottlenecks. Measured consanguinity rates were an order of magnitude above those in other sampled populations, and the GME population exhibited an increased burden of runs of homozygosity (ROHs) but showed no evidence for reduced burden of deleterious variation due to classically theorized 'genetic purging'. Applying this database to unsolved recessive conditions in the GME population reduced the number of potential disease-causing variants by four- to sevenfold. These results show variegated genetic architecture in GME populations and support future human genetic discoveries in Mendelian and population genetics.

Figure 1: The Greater Middle East Variome as a hub of human genetics.
Figure 2: Wide diversity and high inbreeding coefficients in GME population substructure.
Figure 3: Distributions of short and long runs of homozygosity correlate with patterns of bottlenecks and recent consanguinity.
Figure 4: The GME Variome facilitates the discovery of genes associated with Mendelian disease.

References

  1. Anwar, W.A., Khyatti, M. & Hemminki, K. Consanguinity and genetic diseases in North Africa and immigrants to Europe. Eur. J. Public Health 24 (Suppl. 1), 57–63 (2014).

  2. Al-Gazali, L., Hamamy, H. & Al-Arrayad, S. Genetic disorders in the Arab world. Br. Med. J. 333, 831–834 (2006).

  3. Hussain, R. & Bittles, A.H. The prevalence and demographic characteristics of consanguineous marriages in Pakistan. J. Biosoc. Sci. 30, 261–275 (1998).

  4. Sheffield, V.C., Stone, E.M. & Carmi, R. Use of isolated inbred human populations for identification of disease genes. Trends Genet. 14, 391–396 (1998).

  5. Sharp, J.M. The Broader Middle East and North Africa Initiative: an overview. in CRS Report for Congress Congressional Research Service. The Library of Congress. US Government. Vol. RS22053 (2005).

  6. Hellenthal, G. et al. A genetic atlas of human admixture history. Science 343, 747–751 (2014).

  7. Ravindranath, V. et al. Regional research priorities in brain and nervous system disorders. Nature 527, S198–S206 (2015).

  8. Hunter-Zinck, H. et al. Population genetic structure of the people of Qatar. Am. J. Hum. Genet. 87, 17–25 (2010).

  9. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

  10. Moreno-Estrada, A. et al. Reconstructing the population genetic history of the Caribbean. PLoS Genet. 9, e1003925 (2013).

  11. Botigué, L.R. et al. Gene flow from North Africa contributes to differential human genetic diversity in southern Europe. Proc. Natl. Acad. Sci. USA 110, 11791–11796 (2013).

  12. Li, J.Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008).

  13. Henn, B.M. et al. Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genet. 8, e1002397 (2012).

  14. Gérard, N., Berriche, S., Aouizérate, A., Diéterlen, F. & Lucotte, G. North African Berber and Arab influences in the western Mediterranean revealed by Y-chromosome DNA haplotypes. Hum. Biol. 78, 307–316 (2006).

  15. Green, R.E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).

  16. Sankararaman, S. et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature 507, 354–357 (2014).

  17. SIGMA Type 2 Diabetes Consortium. Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature 506, 97–101 (2014).

  18. Pickrell, J.K. & Pritchard, J.K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).

  19. Tadmouri, G.O. et al. Consanguinity and reproductive health among Arabs. Reprod. Health 6, 17 (2009).

  20. Leutenegger, A.L., Sahbatou, M., Gazal, S., Cann, H. & Génin, E. Consanguinity around the world: what do the genomic data of the HGDP-CEPH diversity panel tell us? Eur. J. Hum. Genet. 19, 583–587 (2011).

  21. Pippucci, T., Magi, A., Gialluisi, A. & Romeo, G. Detection of runs of homozygosity from whole exome sequencing data: state of the art and perspectives for clinical, population and epidemiological studies. Hum. Hered. 77, 63–72 (2014).

  22. Pemberton, T.J. et al. Genomic patterns of homozygosity in worldwide human populations. Am. J. Hum. Genet. 91, 275–292 (2012).

  23. Szpiech, Z.A. et al. Long runs of homozygosity are enriched for deleterious variation. Am. J. Hum. Genet. 93, 90–102 (2013).

  24. Itan, Y. & Casanova, J.L. Can the impact of human genetic variations be predicted? Proc. Natl. Acad. Sci. USA 112, 11426–11427 (2015).

  25. MacArthur, D.G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012).

  26. Sulem, P. et al. Identification of a large set of rare complete human knockouts. Nat. Genet. 47, 448–452 (2015).

  27. Jones, S. The Darwin Archipelago (Yale University Press, 2011).

  28. Haldane, J.B.S. The effect of variation of fitness. Am. Nat. 71, 337–349 (1937).

  29. Overall, A.D., Ahmad, M. & Nichols, R.A. The effect of reproductive compensation on recessive disorders within consanguineous human populations. Heredity 88, 474–479 (2002).

  30. Neale, B.M. et al. Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242–245 (2012).

  31. Simons, Y.B., Turchin, M.C., Pritchard, J.K. & Sella, G. The deleterious mutation load is insensitive to recent population history. Nat. Genet. 46, 220–224 (2014).

  32. Casanova, J.L., Conley, M.E., Seligman, S.J., Abel, L. & Notarangelo, L.D. Guidelines for genetic studies in single patients: lessons from primary immunodeficiencies. J. Exp. Med. 211, 2137–2149 (2014).

  33. MacArthur, D.G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).

  34. Novarino, G. et al. Exome sequencing links corticospinal motor neuron disease to common neurodegenerative disorders. Science 343, 506–511 (2014).

  35. Blackstone, C., O'Kane, C.J. & Reid, E. Hereditary spastic paraplegias: membrane traffic and the motor pathway. Nat. Rev. Neurosci. 12, 31–42 (2011).

  36. Dixon-Salazar, T.J. et al. Exome sequencing can improve diagnosis and alter patient management. Sci. Transl. Med. 4, 138ra78 (2012).

  37. Okada, S. et al. Impairment of immunity to Candida and Mycobacterium in humans with bi-allelic RORC mutations. Science 349, 606–613 (2015).

  38. Alsalem, A.B., Halees, A.S., Anazi, S., Alshamekh, S. & Alkuraya, F.S. Autozygome sequencing expands the horizon of human knockout research and provides novel insights into human phenotypic variation. PLoS Genet. 9, e1004030 (2013).

  39. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  40. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).

  41. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).

  42. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).

  43. Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  44. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  45. Cann, H.M. et al. A human genome diversity cell line panel. Science 296, 261–262 (2002).

  46. Behar, D.M. et al. The genome-wide structure of the Jewish people. Nature 466, 238–242 (2010).

  47. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  48. Pruitt, K.D. et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42, D756–D763 (2014).

  49. Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  50. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  51. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer Science & Business Media, 2009).

  52. Polasek, O. et al. Comparative assessment of methods for estimating individual genome-wide homozygosity-by-descent from human genomic data. BMC Genomics 11, 139 (2010).

  53. Magi, A. et al. H3M2: detection of runs of homozygosity from whole-exome sequencing data. Bioinformatics 30, 2852–2859 (2014).

  54. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  55. Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).

  56. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).

  57. Davydov, E.V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).

  58. Erichsen, A.K., Koht, J., Stray-Pedersen, A., Abdelnoor, M. & Tallaksen, C.M. Prevalence of hereditary ataxia and spastic paraplegia in southeast Norway: a population-based study. Brain 132, 1577–1588 (2009).

  59. Stevanin, G. et al. Mutations in SPG11 are frequent in autosomal recessive spastic paraplegia with thin corpus callosum, cognitive decline and lower motor neuron degeneration. Brain 131, 772–784 (2008).

  60. Vardi-Saliternik, R., Friedlander, Y. & Cohen, T. Consanguinity in a population sample of Israeli Muslim Arabs, Christian Arabs and Druze. Ann. Hum. Biol. 29, 422–431 (2002).

  61. Shami, S.A., Qaisar, R. & Bittles, A.H. Consanguinity and adult morbidity in Pakistan. Lancet 338, 954 (1991).

  62. Stoltenberg, C., Magnus, P., Lie, R.T., Daltveit, A.K. & Irgens, L.M. Birth defects and parental consanguinity in Norway. Am. J. Epidemiol. 145, 439–448 (1997).

  63. Do, R. et al. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nat. Genet. 47, 126–131 (2015).

  64. SIGMA Type 2 Diabetes Consortium. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. J. Am. Med. Assoc. 311, 2305–2314 (2014).

  65. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).

  66. Huerta-Sánchez, E. et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197 (2014).

  67. Wang, S., Lachance, J., Tishkoff, S.A., Hey, J. & Xing, J. Apparent variation in Neanderthal admixture among African populations is consistent with gene flow from non-African populations. Genome Biol. Evol. 5, 2075–2081 (2013).

  68. Lowery, R.K. et al. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms. Gene 530, 83–94 (2013).

Acknowledgements

The authors thank S. Sunyaev and D. Reich for help with PolyPhen-2 and DAF corrections, M. Turchin for help with purging analysis, J. Pickrell for help with TreeMix, and V. Bafna, N. Schork, and S. Bonissone for suggestions. Work was supported by grants from the US National Institutes of Health (P01HD070494 and R01NS048453), the Qatari National Research Foundation (NPRP6-1463), the Simons Foundation Autism Research Initiative (175303 and 275275) to J.G.G., the Yale Center for Mendelian Disorders (U54HG006504), the Broad Institute (U54HG003067), the Rockefeller University CTSA (5UL1RR024143-04), the Howard Hughes Medical Institute (to J.G.G. and J.-L.C.), INSERM, the St. Giles Foundation, and the Candidoser Association and by grants R01AI088364, R37AI095983, P01AI061093, U01AI109697 (to J.-L.C.), U01AI088685 (to J.-L.C. and L.A.), R21AI107508 (to E. Jouanguy), the DHFMR Collaborative Research Grant, and KACST 13-BIO1113-20 (to F.S.A.).

Author information

Contributions

E.M.S. performed analysis and generated all figures. A.H., Y.I., Y.H., and M.A.A. consulted on analysis. E.G.S., A.B., B.B., L.A., F.S.A., J.-L.C., and J.G.G. contributed subjects and jointly wrote and edited the manuscript. S.B.G. oversaw sequencing. A.G.C. consulted on population studies. The GME Variome Consortium identified subjects for study.

Corresponding author

Correspondence to Joseph G Gleeson.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Country distribution of GME samples and designation of geographical subregions.

GME samples collected across 20 countries and territories from the GME. Pie size corresponds to the number of samples from each country, and each pie shows the proportion of samples filtered because of quality control and relationship status (Online Methods). Geographical subregions are colored to show the sets of grouped countries. Some non-uniformity of sampling was inevitable owing to the inaccessibility of some populations. Map downloaded from https://www.presentationmagazine.com/ then colored.

Supplementary Figure 2 Unbiased genetic clustering demonstrates shorter genetic distance between samples from proximal geographical subregions.

Dendrogram of unbiased genetic clustering correlated with geographical subregion designation. 2,497 samples underwent exome sequencing from the Greater Middle East Consortium, including 1,111 GME samples as well as samples from Africa, East Asia, Europe, the Americas, Oceania, and unknown regions. Calculated identity-by-state (IBS) distances between samples represent the number of non-identical positions. Concordance between recruitment location and IBS clustering for all GME subregions was observed. Some intermixing was evident, suggesting recent migration events.
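
The IBS distance described in this legend (the number of non-identical positions between two samples) can be illustrated with a minimal Python sketch. The genotype coding (alternate-allele counts 0, 1, 2), the handling of missing calls as `None`, and the sample names are assumptions made for illustration; they are not taken from the study's pipeline.

```python
from itertools import combinations


def ibs_distance(a, b):
    """Count positions where two genotype vectors differ.

    Positions where either call is missing (None) are skipped.
    This skipping rule is an assumption of the sketch.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def pairwise_ibs(samples):
    """Return {(name1, name2): distance} for every unordered pair of samples."""
    return {
        (n1, n2): ibs_distance(samples[n1], samples[n2])
        for n1, n2 in combinations(sorted(samples), 2)
    }


# Hypothetical toy genotypes, coded as alternate-allele counts.
toy = {
    "sample_A": [0, 1, 2, 0, 1],
    "sample_B": [0, 1, 2, 1, 1],
    "sample_C": [2, 0, 0, 1, None],
}
distances = pairwise_ibs(toy)
```

A pairwise distance table of this kind is the usual input to hierarchical clustering, which is one way a dendrogram like the one in this figure could be drawn.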

Supplementary Figure 3 ADMIXTURE cross-validation.

(a) Cross-validation errors for the ADMIXTURE results shown in Supplementary Figure 1. Analysis with k = 6 gave the lowest cross-validation error. (b) Cross-validation errors for GME and 1000 Genomes Project samples.

Supplementary Figure 4 Unsupervised ADMIXTURE analysis of GME populations shows genetic history.

Results of ADMIXTURE analysis for LD-filtered variants for 1,111 GME samples across the six geographical subregions. Eleven iterations of k were run, from 2 to 12, to optimize clustering. Each vertical bar represents a single individual. The y axis shows the estimated proportion of the genome assigned to each ancestral cluster. Samples are grouped by subregion and organized from west (left) to east (right), showing trends of overlap. Substantial substructure was apparent throughout much of the GME, but three apparent ‘sources’ of ancestral populations stem from the NWA (yellow), AP (red), and PP (green) subregions.

Supplementary Figure 5 Introgression analysis of GME and 1000 Genomes Project exome samples shows consistent Neanderthal introgression in all GME, European, and East Asian samples except for NWA.

(a) Individuals from the 1000 Genomes Project reference populations and GME subregions were projected onto the first two principal components calculated from Neanderthal, chimpanzee, and Denisovan genomes. PC1 separates ancient human populations from chimpanzee, and PC2 separates the Neanderthal and Denisovan populations. When human samples were projected onto these principal components, they clustered near the center of these three species. Arrows are drawn from the center of the sub-Saharan African populations to each of the ancestral human and chimpanzee points. The sub-Saharan African populations represent a control group, where only limited Neanderthal and Denisovan introgression should be present. (b) Magnified view of panel a showing the dispersal of human populations within these two principal components. Samples are colored on the basis of continental origin, and subpopulations are labeled to indicate the center of each population. African populations were found to be separate from the remaining populations, which were found from this adjusted origin along the Neanderthal vector. Most populations were found to be tightly clustered, with only the TP and NWA populations showing clear separation, suggesting a common time point of introgression among these clustered populations. The NWA samples had less introgression than the other GME populations.

Supplementary Figure 6 Heat map of pairwise FST values among all 1000 Genomes Project and GME populations identifies three clusters with a low degree of differentiation.

Top right, Wright's fixation index; bottom left, standard error values. Populations are ordered on the basis of geographical location. Three distinct clusters of close populations (shown as a blue gradient) are evident: 1000 Genomes Project Africa (LWK and YRI); 1000 Genomes Project Europe (FIN, CEU, and TSI) and GME subregions (NWA, NEA, AP, SD, TP, and PP); and 1000 Genomes Project East Asia (JPT, CHS, and CHB). Among global populations, the GME and European populations were more closely related than any other two continental regions. The greatest distance between any two populations was estimated as 0.212 for YRI and JPT. As populations became more distant, standard error values increased but remained small for all comparisons.

Supplementary Figure 7 Principal-component analysis on GME and 1000 Genomes Project populations showed that PC3 and PC4 explained inter-GME variance.

Plots comparing all combinations of PC1, PC2, PC3, and PC4 and percentages of variance explained. GME populations are color-coded by geographical regions. PC1 (39.03%) and PC2 (31.38%) together accounted for the majority of variation in the data and were associated with separating Africans and East Asians from other samples, respectively. PC3 and PC4 separated GME and European populations along north–south and east–west axes, respectively. AP was the most distant cluster from the 1000 Genomes Project reference populations, showing the greatest separation along PC3. Both of the North African populations tended to cluster closer to the sub-Saharan African cluster, whereas PP and TP trended toward the East Asian cluster.

Supplementary Figure 8 Reported consanguineous marriage rates many fold higher in GME than in other continental populations.

Clinical survey results aggregated to estimate regional averages of the consanguineous marriage rate. Weighted averages, taking sample size into account, were calculated across all studies falling within a given region. The highest rates of consanguineous marriage were documented in PP and AP.
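
The sample-size-weighted regional average mentioned in this legend can be sketched as follows. The function name and the survey values are hypothetical and serve only to show the arithmetic; they are not data from the surveys aggregated in the figure.

```python
def weighted_mean_rate(studies):
    """Sample-size-weighted mean of per-study rates.

    studies: list of (rate, sample_size) pairs, with rate given as a
    fraction between 0 and 1.
    """
    total_n = sum(n for _, n in studies)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(rate * n for rate, n in studies) / total_n


# Hypothetical surveys from one region: (consanguinity rate, sample size).
region_surveys = [(0.50, 200), (0.30, 800)]
regional_rate = weighted_mean_rate(region_surveys)
```

In this toy case the larger survey dominates: the weighted mean is 0.34, whereas the unweighted mean of the two rates would be 0.40.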

Supplementary Figure 9 GME samples carried longer and rarer runs of homozygosity than 1000 Genomes Project populations.

(a) Cumulative proportion of total ROH length by bin for African, East Asian, European, and GME populations. African populations had the shortest accumulation of ROH spans, whereas GME populations showed the longest despite the limited influence of bottlenecks. (b) Distribution of total ROH length (in Mb) for all 1000 Genomes Project and GME populations. Wider distributions were evident for the GME populations owing to heterogeneity in long ROHs. (c) The total number of exomic bases found in ROHs binned by frequency in each population. GME ROHs tended to be unique in comparison to 1000 Genomes Project populations.
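
To make the notion of total ROH length concrete, the sketch below scans one chromosome for runs of consecutive homozygous calls and sums their spans in Mb. It is a deliberately naive illustration, not the detection method used in the study; the thresholds `min_length_bp` and `min_sites`, and the toy coordinates, are assumptions.

```python
def find_roh(positions, is_homozygous, min_length_bp=1_000_000, min_sites=3):
    """Return (start, end) spans of consecutive homozygous sites.

    positions: sorted base-pair coordinates on a single chromosome.
    is_homozygous: one boolean per position.
    Both thresholds are illustrative assumptions.
    """
    runs = []
    start_idx = None
    # A trailing False acts as a sentinel that closes a run ending at the last site.
    for i, hom in enumerate(list(is_homozygous) + [False]):
        if hom and start_idx is None:
            start_idx = i
        elif not hom and start_idx is not None:
            end_idx = i - 1
            span = positions[end_idx] - positions[start_idx]
            n_sites = end_idx - start_idx + 1
            if span >= min_length_bp and n_sites >= min_sites:
                runs.append((positions[start_idx], positions[end_idx]))
            start_idx = None
    return runs


def total_roh_mb(runs):
    """Sum of run spans, converted from base pairs to megabases."""
    return sum(end - start for start, end in runs) / 1e6


# Hypothetical calls on one chromosome.
positions = [100_000, 600_000, 1_200_000, 1_900_000, 2_000_000, 2_500_000, 4_000_000]
homozygous = [True, True, True, False, True, True, True]
runs = find_roh(positions, homozygous)
total_mb = total_roh_mb(runs)
```

Exome data have uneven marker spacing, so a production tool would also need to account for gaps between targets and for genotyping error; this sketch ignores both.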

Supplementary Figure 10 Identity-by-state distance comparing human and chimpanzee reference genomes showed burden bias associated with hg19 corrected using estimated ancestral alleles.

(a) Homozygous and heterozygous variant counts shown for samples using hg19 (left) and PanTro2 (right) as the reference genomes. PanTro2 alleles demonstrated a linear relationship between populations, arguing for no burden difference. (b) IBS distance to the reference for chimpanzee genomes PanTro2 and PanTro4 (x axis) versus human hg19 (y axis). Human populations stratify by IBS distance using the hg19 reference genome. With chimpanzee ancestral variants, populations were equidistant from the chimpanzee reference genome.

Supplementary Figure 11 Correction of PolyPhen-2 predictions for derived variants resolved missense burden bias.

(a) The proportions of derived (Der) and ancestral (Anc) variants falling into each PolyPhen-2 class (B, benign; P, possibly damaging; D, probably damaging), across 14 allele frequency bins. The bias was apparent in the absence of possibly damaging and probably damaging calls for derived variants across nearly all bins. This bias can misrepresent results when comparing populations. (b) The same proportions after correction of derived variant PolyPhen-2 classes (Online Methods). Derived variant classes reflect the distributions of the ancestral variants. The x axis shows derived allele frequency bins, with parentheses and square brackets designating exclusion and inclusion, respectively.
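
The bin notation in this legend, where a parenthesis excludes and a square bracket includes the boundary, corresponds to half-open intervals of the form (lo, hi]. A minimal sketch of assigning a derived allele frequency to such a bin is shown below; the bin edges are hypothetical and do not reproduce the 14 bins used in the figure.

```python
from bisect import bisect_left


def bin_label(daf, edges):
    """Return the label of the half-open bin (lo, hi] that contains daf.

    edges: sorted bin boundaries; the lowest edge is excluded and the
    highest edge is included.
    """
    if not edges[0] < daf <= edges[-1]:
        raise ValueError("daf outside the binned range")
    # bisect_left finds the first edge >= daf, which is the inclusive upper bound.
    i = bisect_left(edges, daf)
    return f"({edges[i - 1]}, {edges[i]}]"


# Hypothetical bin edges for illustration only.
edges = [0.0, 0.01, 0.05, 0.5, 1.0]
label_at_boundary = bin_label(0.01, edges)
label_just_above = bin_label(0.011, edges)
```

With these edges, a frequency of exactly 0.01 falls in "(0.0, 0.01]", while 0.011 falls in "(0.01, 0.05]".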

Supplementary Figure 12 Mean derived allele frequencies for GME and 1000 Genomes Project populations across seven functional and deleteriousness variant classes suggested equivalent selective pressure.

(a) Calculated mean DAFs and standard errors for GME and 1000 Genomes Project populations. Variants were separated by functional class (noncoding, synonymous, nonsynonymous, and LOF) and corrected PolyPhen-2 deleteriousness class (benign, possibly damaging, and probably damaging). Populations are ordered as indicated on the right. No significant difference between populations was found for any variant class. (b) Mean DAF comparison for the X chromosome. Large error bars for some classes reflect limited ascertainment of variants within those classes.

Supplementary Figure 13 Comparison of allele frequency estimates from Exome Variant Server European-American and African-American populations showed poor correlation.

Comparison of the distribution of estimated allele frequencies for shared variants from two populations, EA and AA, showed poor correlation (Pearson's r = 0.1147). Hexagonal bins are colored according to the abundance of variants falling within each region. The linear regression line (blue) and identity line (black) are shown.
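
For readers who want to reproduce this kind of comparison on their own data, Pearson's r between two vectors of allele frequencies for shared variants can be computed as in the sketch below. The frequency values are hypothetical and are not drawn from the Exome Variant Server.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical allele frequencies for the same variants in two populations.
freq_pop1 = [0.01, 0.10, 0.25, 0.40]
freq_pop2 = [0.02, 0.20, 0.50, 0.80]
r = pearson_r(freq_pop1, freq_pop2)
```

In this toy example the second vector is exactly twice the first, so r equals 1 up to floating-point error; a value near 0.11, as reported in the legend, indicates that frequencies in one population say little about frequencies in the other.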

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13 and Supplementary Tables 1, 2 and 4. (PDF 2258 kb)

Supplementary Table 3

List of variants predicted to be potentially homozygous loss of function in verified healthy GME samples. (XLS 132 kb)

About this article

Cite this article

Scott, E., Halees, A., Itan, Y. et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat Genet 48, 1071–1076 (2016). https://doi.org/10.1038/ng.3592

  • DOI: https://doi.org/10.1038/ng.3592
