ABSTRACT
Pluripotent stem cells from diverse humans offer the potential to study human functional variation in controlled culture environments. A portion of this variation originates from ancient admixture between modern humans and Neandertals, which introduced alleles that left a phenotypic legacy on individual humans today. Here we show that a large repository of human induced pluripotent stem cells (iPSCs) harbors extensive Neandertal DNA, including most known functionally relevant Neandertal alleles present in modern humans. This resource contains Neandertal DNA that contributes to human phenotypes and diseases, encodes hundreds of amino acid changes, and alters gene expression in specific tissues. Human iPSCs thus provide an opportunity to experimentally explore the Neandertal contribution to present-day phenotypes, and potentially study Neandertal traits.
MAIN TEXT
Protocols have been developed to differentiate human embryonic and induced pluripotent stem cells (iPSCs) into many different cell types of the human body1. In addition, stem cells can self-organize into complex three-dimensional structures containing multiple cell types that resemble human tissues (such as the brain, liver, stomach, intestine, and kidney)2. These stem cell-derived systems can be used to explore how natural variation between human individuals impacts development and cell biology3. Some of the variation in present-day humans derives from admixture between modern and archaic hominins. Analyses of Neandertal genomes have shown that Neandertals and modern humans interbred approximately 55,000 years ago as the latter migrated out of Africa. As a consequence, around 2 percent of the genomes of all present-day non-Africans derive from Neandertal ancestors4–6. Because the segments of DNA inherited from Neandertals vary between individuals, cumulatively at least 40% of the Neandertal genome survives in people today7. Recent genome-wide association studies suggest that the DNA relics from this admixture left an extensive phenotypic legacy, influencing for example skin and hair color, immune response, lipid metabolism, bone morphology, blood coagulation, sleep patterns, and mood disorders7–15. In addition, Neandertal-derived DNA has a significant effect on gene expression in adult human tissues16,17. However, these associations have been observed in living people or in tissues, where there is limited opportunity for controlled experimentation. Further, there are few opportunities to study the impact of Neandertal-derived DNA on modern human development. Recently, the Human Induced Pluripotent Stem Cell Initiative (HipSci) published their work on generating and characterizing a large resource of human iPSCs with genome-wide genotype data18.
This large repository presents an unprecedented opportunity to identify carriers of Neandertal alleles of interest and explore the genetic mechanisms underlying Neandertal and modern human phenotypes.
We have analyzed the genome sequences from 173 of these individuals (mostly Europeans) and identified the modern-human and Neandertal component of each individual’s ancestry (Fig. 1A, S1). We used alleles in present-day humans that are shared with the Vindija Neandertal and absent in Yoruba individuals, along with a linkage disequilibrium-based test for incomplete lineage sorting (ILS), to identify haplotypes that are likely of Neandertal origin (Fig. S1B-C). We used the Vindija Neandertal genome to identify Neandertal haplotypes because it is genetically more similar to the introgressing Neandertals than the Altai individual5, thus providing additional power to detect haplotypes6. Based on these inferred haplotypes, we find that cumulatively 19.6% (661 Mb) of the Neandertal genome is represented in these cell lines, with between 18.7 and 30.9 Mb Neandertal DNA per individual (Table 1, S1–S3). We found that 98% of inferred haplotypes overlap previously identified introgressed sequence and that the cumulative amount of Neandertal DNA present in this resource approaches the total amount that has been identified in Europeans19 (21.3%; Fig. 1B). We note that the addition of non-European populations would extend this even further. For example, an additional 16.4% of the Neandertal genome has been identified in east and south Asians19, but is absent from the HipSci resource as individuals are largely of European ancestry.
We next analyzed the power of the HipSci resource to study functionally relevant Neandertal DNA. We collected recently published Neandertal-derived phenotype and disease associated alleles8,10–14 and found that most (22/24) of the alleles that reached genome-wide significance are present in the resource in more than 1 iPSC line (Table 2). These alleles are associated with a variety of processes including digestive function, nutrition, skin color, coagulatory protein production, and immune response (Fig. 1C). In addition, we identified hundreds of alleles that alter amino acids, are expression quantitative trait loci (eQTL)14 or show allele-specific expression16 (Fig. 1D; Table S1).
We find that 50 HipSci lines chosen at random will allow the interrogation of ~310 Neandertal-associated eQTLs with each site represented in at least 5 cell lines (Fig. 1E, S2). We note that each Neandertal allele present in the HipSci resource exists in a primarily modern human background. However, in each individual many Neandertal alleles co-occur (Fig. 1F). For example, in the HipSci resource, one of the Neandertal alleles at the OAS1 locus (chr12:113425154), a locus with the highest Neandertal frequencies in present-day humans, is paired with 90% of the other introgressed Neandertal alleles in at least one cell line. It may thus be possible to leverage such co-occurrences to study epistatic interactions among Neandertal alleles. Additionally, as iPSC resources continue to expand to include individuals from other populations, it will become possible to explore the phenotypic contribution of alleles derived from other archaic hominins such as Denisovans20, a distant Asian relative of Neandertals that made even larger genetic contributions to present-day people in parts of Oceania21–23.
Taken together, our analysis suggests that human-Neandertal hybrid iPSC resources can be used to systematically explore Neandertal allele function in diverse cell types differentiated in controlled culture environments, including the previously unexplored study of developmental processes.
METHODS
Detection of Neandertal haplotypes
To define Neandertal haplotypes, we first identified a set of SNPs where one allele is likely of Neandertal origin. These Neandertal SNPs (aSNPs) have one allele that is (i) present in the genome of the Vindija Neandertal6, (ii) present in 1,000 Genomes Project (phase III) Eurasian populations, but (iii) absent from the Yoruba, an African population with little to no Neandertal admixture19. To detect putative Neandertal haplotypes, we scanned the genomes of the cell lines for consecutive stretches of aSNPs at which the individual carries the Neandertal-shared allele, with neighboring aSNPs located no more than 20,000 bp from one another. To define a Neandertal haplotype, we required a consecutive stretch of at least three Neandertal alleles across successive aSNPs. Haplotypes were additionally tested for a length exceeding that expected for segments of incomplete lineage sorting, based on the algorithm presented by Huerta-Sanchez et al. 24, a divergence time to Neandertals of 465,000 years as used in ref. 12, a conservative estimate of the human mutation rate (mu = 1 × 10^-8 per site per generation), and two recombination maps 25,26. The resulting P values were corrected for multiple testing using the Benjamini-Hochberg approach. We included in our analyses haplotypes with an FDR < 0.05 for ILS for at least one of the recombination maps or, where no recombination map data were available, inferred haplotypes with a length greater than 50 kb or at least 10 consecutive aSNPs carrying a Neandertal allele. All inferred haplotypes for each cell line are available in supplementary data tables. We applied the method to the genotype data for 173 individuals of the HipSci resource and all non-Africans of the 1,000 Genomes Project (phase III).
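The scanning and multiple-testing steps described above can be sketched in Python as follows. This is a minimal illustration and not the code used for the analysis: the function names, the input layout (a position-sorted list of aSNPs for one chromosome of one individual) and the choice to end a run at any aSNP where the individual lacks the Neandertal allele are assumptions. The ILS length test itself is not reproduced here; its P values are taken as given.

```python
from typing import List, Tuple

MAX_GAP_BP = 20_000  # maximum distance between neighboring aSNPs (from the text)
MIN_ASNPS = 3        # minimum number of consecutive Neandertal alleles (from the text)


def scan_haplotypes(asnps: List[Tuple[int, bool]]) -> List[Tuple[int, int, int]]:
    """Find candidate Neandertal haplotypes on one chromosome of one individual.

    asnps: (position, carries_neandertal_allele) tuples sorted by position.
    Returns (start, end, n_asnps) for every run that meets MIN_ASNPS.
    Assumption: an aSNP without the Neandertal allele ends the current run.
    """
    haplotypes: List[Tuple[int, int, int]] = []
    run: List[int] = []

    def flush() -> None:
        # Keep the current run only if it is long enough, then reset it.
        if len(run) >= MIN_ASNPS:
            haplotypes.append((run[0], run[-1], len(run)))
        run.clear()

    for pos, is_neandertal in asnps:
        if not is_neandertal:
            flush()
            continue
        if run and pos - run[-1] > MAX_GAP_BP:
            flush()  # gap to the previous aSNP is too large: start a new run
        run.append(pos)
    flush()
    return haplotypes


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Haplotypes whose adjusted ILS P value falls below 0.05 for at least one recombination map would then be retained, with the length and aSNP-count fallback applied where no map is available.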
Detection of Neandertal missense, regulatory and pheWAS variants
We sought to identify putatively functional Neandertal alleles that overlap confidently inferred Neandertal haplotypes (see section “Detection of Neandertal haplotypes”) in the cell lines by detecting those that alter the protein or regulatory sequence of a gene. First, for the detection of Neandertal alleles that modify the protein sequence, we selected all Neandertal alleles within confidently inferred Neandertal haplotypes detected in any cell line and annotated them functionally using the variant effect predictor (VEP, human Ensembl version 73). We selected those alleles that were defined as ‘missense’ by VEP. Second, we annotated Neandertal alleles likely to be involved in gene regulation by overlapping them with three datasets: (i) enhancer and promoter regions provided by the Ensembl Regulatory Build 27, (ii) significant eQTLs in the GTEx dataset 17 and (iii) allele-specific expression 16. For (i) we identified aSNPs whose Neandertal alleles directly overlap a regulatory motif. For (ii) we selected the inferred Neandertal haplotypes with the top 20 most significant eQTLs in each of the 48 GTEx tissues with more than 50 individuals 28 and required at least one aSNP to be present in a given Neandertal haplotype and iPSC individual, resulting in a total of 409 such Neandertal haplotypes. For (iii) we selected all Neandertal alleles that have been identified to show allele-specific expression (FDR<0.1). Third, we queried the pheWAS database for associations of Neandertal alleles and specific phenotypes in modern humans. We selected all 925 aSNPs with significant phenotype associations detected by Simonti et al.8. We further selected additional significant phenotype associations for Neandertal alleles (Table 2) 8,10–14.
Power analysis
The ability to study a particular Neandertal variant depends both on its effect size and on its frequency within a given sample of individuals or cell lines. Investigators cannot control the effect size but can, within reason, control the number of samples they consider. Larger sample sizes allow more variants to be studied, but may offer diminishing returns. To determine the power of studies of certain sample sizes, we considered how many Neandertal variants would be present at particular frequency thresholds as a function of sample size. For each category of Neandertal allele (eQTL, amino acid change, etc.), we subsampled X cell lines and counted the number of Neandertal variants present at least Z times, for values of Z = (1, 5, 10, 15, 20). This subsampling was repeated 100 times for all values of X and Z. We plot the average number of Neandertal variants present at a particular rate on the Y axis. Each possible value of Z is given a different color, and the range of values over 100 resamplings is shown as colored confidence intervals. For example, given a sample of 50 random cell lines, 62% of the 501 Neandertal eQTLs in the HipSci resource are present in at least 5 cell lines, and 92% are present in at least 1 cell line.
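The subsampling procedure can be sketched in Python as follows. This is an illustrative sketch rather than the analysis code: the function name, the `carriers` mapping (variant identifier to the set of cell lines carrying the Neandertal allele) and the fixed random seed are assumptions made for the example.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def subsample_counts(
    carriers: Dict[str, Set[str]],
    lines: List[str],
    x: int,
    thresholds: Sequence[int] = (1, 5, 10, 15, 20),
    n_rep: int = 100,
    seed: int = 0,
) -> Dict[int, Tuple[float, int, int]]:
    """Count variants present in at least Z of X randomly chosen cell lines.

    carriers: variant id -> set of cell lines carrying the Neandertal allele
              (hypothetical input layout).
    Returns, for each threshold Z, (mean, min, max) over n_rep resamplings.
    """
    rng = random.Random(seed)
    counts: Dict[int, List[int]] = {z: [] for z in thresholds}
    for _ in range(n_rep):
        sample = set(rng.sample(lines, x))  # X lines drawn without replacement
        for z in thresholds:
            n_present = sum(1 for c in carriers.values() if len(c & sample) >= z)
            counts[z].append(n_present)
    return {z: (sum(v) / n_rep, min(v), max(v)) for z, v in counts.items()}
```

Calling the function over a grid of X values, once per allele category, gives the mean and the range per threshold Z, which correspond to the plotted line and the colored band.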
Principal Component Analysis on HipSci lines
To infer the genetic relationship between the HipSci individuals and present-day people, we performed a principal component analysis using polymorphic sites in 1000 Genomes Eurasians 19 that show large population differentiation between Europeans and Asians (Fst > 0.5). Population differentiation was calculated as Fst, using the Weir and Cockerham estimator implemented in vcftools 29 and 100 unrelated Asians and 100 unrelated Europeans from the 1000 Genomes panel. The principal components were computed using SNPs with Fst > 0.5 between Europeans and Asians in the 1000 Genomes. While the HipSci resource contains non-European individuals, almost all of the individuals with genotypes cluster with Europeans from the 1000 Genomes panel (Fig. 1B).
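A minimal Python sketch of the site filtering and projection is given below. It assumes that per-site Fst values have already been computed (for example with the vcftools `--weir-fst-pop` option) and that genotypes are coded as alternate-allele counts. The text does not state how the components were computed, so the use of a singular value decomposition of the centered genotype matrix, like the function name and input layout, is an assumption.

```python
import numpy as np


def pca_on_differentiated_sites(genotypes, fst, fst_min=0.5, n_components=2):
    """Project individuals onto principal components of highly differentiated SNPs.

    genotypes: (n_individuals, n_sites) array of alternate-allele counts (0/1/2).
    fst:       per-site Fst between the two reference populations (precomputed).
    Returns an (n_individuals, n_components) array of PC scores.
    """
    g = np.asarray(genotypes, dtype=float)
    keep = np.asarray(fst, dtype=float) > fst_min  # retain sites with Fst > 0.5
    g = g[:, keep]
    g = g - g.mean(axis=0)                         # center each site
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # scores = U * singular values
```

Plotting the first two columns for HipSci and 1000 Genomes individuals together would show which reference population each cell line donor clusters with.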
AUTHOR CONTRIBUTIONS
MD, BV, and JGC analyzed the data and wrote the manuscript with input from JK and SP.
AUTHOR INFORMATION
The authors declare no conflicts of interest. Correspondence and requests for materials should be addressed to gray_camp{at}eva.mpg.de.
ACKNOWLEDGEMENTS
This work was supported by the NOMIS Foundation and the Max Planck Society. The GTEx data used for the analyses described in this manuscript were obtained from dbGaP accession number phs000424.v6.p1.c1 on 05/23/2016.