bioRxiv
New Results

Population structure of UK Biobank and ancient Eurasians reveals adaptation at genes influencing blood pressure

Kevin J. Galinsky, Po-Ru Loh, Mallick Swapan, Nick J. Patterson, Alkes L. Price
doi: https://doi.org/10.1101/055855
Kevin J. Galinsky
1Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
2Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Po-Ru Loh
2Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
Mallick Swapan
2Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
4Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
Nick J. Patterson
2Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Alkes L. Price
1Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
2Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA

Abstract

Analyzing genetic differences between closely related populations can be a powerful way to detect recent adaptation. The very large sample size of the UK Biobank is ideal for detecting selection using population differentiation, and enables an analysis of UK population structure at fine resolution. In analyses of 113,851 UK Biobank samples, population structure in the UK is dominated by 5 principal components (PCs) spanning 6 clusters: Northern Ireland, Scotland, northern England, southern England, and two Welsh clusters. Analyses with ancient Eurasians show that populations in the northern UK have higher levels of Steppe ancestry, and that UK population structure cannot be explained as a simple mixture of Celts and Saxons. A scan for unusual population differentiation along top PCs identified a genome-wide significant signal of selection at the coding variant rs601338 in FUT2 (p = 9.16 × 10⁻⁹). In addition, by combining evidence of unusual differentiation within the UK with evidence from ancient Eurasians, we identified new genome-wide significant (p < 5 × 10⁻⁸) signals of recent selection at two additional loci: CYP1A2/CSK and F12. We detected strong associations with diastolic blood pressure in the UK Biobank for the variants with new selection signals at CYP1A2/CSK (p = 1.10 × 10⁻¹⁹) and for variants with ancient Eurasian selection signals in the ATXN2/SH2B3 locus (p = 8.00 × 10⁻³³), implicating recent adaptation related to blood pressure.

Introduction

Detecting signals of selection can provide biological insights into adaptations that have shaped human history1–4. Searching for genetic variants that are unusually differentiated between populations is a powerful way to detect recent selection5; this approach has been applied to detect signals of selection linked to lactase persistence6,7, fatty acid metabolism8, hypoxia response9–11, malaria resistance12–14, and other traits and diseases15–18.

Leveraging population differentiation to detect selection is particularly powerful when analyzing closely related subpopulations with large sample sizes19. Here, we analyze 113,851 samples of UK ancestry from the UK Biobank (see URLs) in conjunction with recently published People of the British Isles (PoBI)20 and ancient DNA21–24 data sets to draw inferences about population structure and recent selection. We employ a recently developed selection statistic that detects unusual population differentiation along continuous principal components (PCs) instead of between discrete subpopulations25, and combine our results with independent results from ancient Eurasians23. We detect three new signals of selection, and show that genetic variants with both new and previously reported23 signals of selection are strongly associated with diastolic blood pressure in UK Biobank samples.

Results

Population Structure in the UK Biobank

We restricted our analyses of population structure to 113,851 UK Biobank samples of UK ancestry and 202,486 SNPs after quality control (QC) filtering and linkage disequilibrium (LD) pruning (see Online Methods). We ran principal components analysis (PCA) on these data using our FastPCA implementation25 (see URLs). We determined that the top 5 PCs represent geographic population structure (Figure 1) by visually examining plots of the top 10 PCs (Supplementary Figure 1), by observing that the eigenvalues for the top 5 PCs were above background levels, and by confirming that the eigenvectors were correlated with birth coordinates (Supplementary Table 1). The eigenvalue for PC1 was 20.99, which corresponds to the eigenvalue expected at this sample size for two discrete subpopulations of equal size with an FST of 1.76 × 10⁻⁴ (Supplementary Table 1).
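The correspondence between the PC1 eigenvalue and FST can be checked with a one-line calculation. The sketch below assumes the approximation λ ≈ 1 + N·FST for the top eigenvalue of a GRM computed from two equal-sized discrete subpopulations, with background eigenvalues close to 1; the text does not state this formula, so treat it as an assumption. Inverting it with the reported N and eigenvalue reproduces the quoted FST.

```python
def fst_from_top_eigenvalue(eigenvalue: float, n_samples: int) -> float:
    """Invert the assumed approximation eigenvalue ~= 1 + N * F_ST.

    Assumes two discrete subpopulations of equal size and a GRM scaled so
    that background (noise) eigenvalues are close to 1.
    """
    return (eigenvalue - 1.0) / n_samples


# Values reported for PC1 in the UK Biobank analysis.
fst = fst_from_top_eigenvalue(20.99, 113_851)
print(f"{fst:.2e}")  # → 1.76e-04
```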

Figure 1 Results of PCA with k-means clustering

The top 5 PCs in UK Biobank data are displayed. Samples were clustered using these PCs into 6 clusters with k-means clustering (see Table 1). PC5 is plotted against PC2, because PC5 primarily separated the orange and red clusters, which were separated from the other clusters by PC2.

We ran k-means clustering on these 5 PCs to partition the samples into 6 clusters, since K PCs can differentiate K + 1 populations (Figure 1, Table 1, Supplementary Figure 2). To identify the populations underlying the 6 clusters, we projected the PoBI dataset20, comprising 2,039 samples from 30 regions of the UK, onto the UK Biobank PCs (Figure 2, Supplementary Figure 3). The individuals in the PoBI study were from rural areas of the UK and had all four grandparents born within 80 km of each other, allowing a glimpse into the genetics of the UK before the increase in mobility of the 20th century. We selected representative PoBI sample regions that best aligned with the 6 UK Biobank clusters by visually comparing the centroid of each projected PoBI region with the centroids of the UK Biobank clusters (see Online Methods, Table 1). The largest cluster represented southern England, three clusters represented different regions in the northern UK (northern England, Northern Ireland and Scotland), and two clusters represented north and south Wales. The PCs separated the six UK clusters along two general geographical axes: a north-south axis and a Welsh-specific axis. PC1 and PC3 both separated individuals along north-south axes of variation, with southern England at one end and one of the northern UK clusters at the other. PC2 separated the Welsh clusters from the rest of the UK. PC4 separated the Scotland cluster from the Northern Ireland cluster. PC5 separated the north Wales and south Wales (also known as Pembrokeshire) clusters from each other.
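The clustering step can be illustrated with Lloyd's k-means algorithm. This is a minimal, dependency-free sketch, not the code used in the study: the function names are hypothetical, and the deterministic farthest-point initialization is our assumption (the text does not say how clusters were initialized; production analyses typically use random restarts). On the real data one would pass the 5-dimensional PC scores with k=6.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points in PC space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid update until the assignments stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 2-D "PC scores": three well-separated groups of five samples each.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
labels, centroids = kmeans(points, k=3)
print(labels)  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```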

Figure 2 Results of PCA with projection of PoBI samples

The top 5 PCs in UK Biobank data are displayed with PoBI samples projected onto these PCs. PoBI populations which visually best matched the clusters from k-means clustering were used to assign names to the six clusters (Table 1).

Table 1 Correspondence between UK Biobank clusters and PoBI populations

We report the PoBI population that most closely corresponds to each UK Biobank cluster (see main text).

We next analyzed UK Biobank population structure in conjunction with ancient DNA samples. Modern European populations are known to have descended from three ancestral populations: Steppe, Mesolithic Europeans and Neolithic farmers21,22. We projected ancient samples from these three populations as well as ancient Saxon samples24 onto the UK Biobank PCs (Figure 3, Supplementary Figure 4; see Online Methods). These populations were primarily differentiated along PC1 and PC3, indicating higher levels of Steppe ancestry in northern UK populations.

Figure 3 Results of PCA with projection of ancient samples

The top 5 PCs in UK Biobank data are displayed with ancient samples projected onto these PCs.

Additionally, the fact that no ancient population is differentiated along PC2 suggests that Welsh populations are not differentially admixed with any ancient population in our data set, and that they likely underwent Welsh-specific genetic drift. We confirmed these findings by projecting pan-European POPRES26 samples onto the UK Biobank PCs (see Online Methods, Supplementary Figure 5): among the continental European populations, Russians (who have the most Steppe ancestry) lie at one end and Spanish and Italians (who have the least)22 lie at the other end along PC1 and PC3, and none of the continental European populations projected onto the same regions as the Welsh on PC2 and PC5.

In addition to the impact of ancient Eurasian populations, the genetics of the UK is known to have been strongly affected by Anglo-Saxon migrations since the Iron Age24, with the Angles arriving in eastern England and the Saxons in southern England. The Anglo-Saxons interbred with the native Celts, which explains much of the genetic landscape in the UK. We analyzed a variety of samples from Celtic (Scotland and Wales) and Anglo-Saxon (southern and eastern England) populations from modern Britain in conjunction with the PoBI samples20 and 10 ancient Saxon samples from eastern England24 in order to assess relative amounts of Steppe ancestry. We computed f4 statistics27 of the form f4(Steppe, Neolithic Farmer; Pop1, Pop2), where the Steppe and Neolithic Farmer populations are from refs. 21,22, Pop1 is either a modern Celtic or an ancient Saxon population, and Pop2 is a modern Anglo-Saxon population (Table 2, Supplementary Table 2). This statistic is sensitive to Steppe ancestry, with positive values indicating more Steppe ancestry in Pop1 than in Pop2. We consistently obtained significantly positive f4 statistics, implying that both the modern Celtic samples and the ancient Saxon samples have more Steppe ancestry than the modern Anglo-Saxon samples from southern and eastern England. This indicates that the population of southern and eastern England is not exclusively a genetic mix of Celts and Saxons. There are a variety of possible explanations, but one is that the present genetic structure of Britain, while subtle, is quite old, and that southern England in Roman times already had less Steppe ancestry than Wales and Scotland.
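To see why the sign of f4(Steppe, Neolithic Farmer; Pop1, Pop2) tracks Steppe ancestry, suppose under a simple two-way admixture model (ignoring drift) that Pop1 and Pop2 have Steppe fractions α and β. Then pPop1 − pPop2 = (α − β)(pSteppe − pFarmer) at every SNP, so f4 = (α − β)·E[(pSteppe − pFarmer)²], which is positive exactly when α > β. The sketch below computes f4 from allele frequencies with a simple unweighted delete-one-block jackknife Z-score. It is illustrative only: the ADMIXTOOLS software described in ref. 27 uses a weighted block jackknife, the function names are hypothetical, and the simulated frequencies are made up.

```python
import random


def f4_terms(pa, pb, pc, pd):
    """Per-SNP f4 terms (pA - pB)(pC - pD) from population allele frequencies."""
    return [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]


def f4_jackknife(pa, pb, pc, pd, n_blocks=20):
    """f4 estimate with a delete-one-block jackknife standard error and Z-score.

    SNPs are assumed to be ordered along the genome so that contiguous
    blocks approximate LD-independent chunks.
    """
    terms = f4_terms(pa, pb, pc, pd)
    n = len(terms)
    total = sum(terms)
    estimate = total / n
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    loo = []  # leave-one-block-out estimates
    for start, end in zip(bounds, bounds[1:]):
        loo.append((total - sum(terms[start:end])) / (n - (end - start)))
    mean_loo = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in loo)
    se = var ** 0.5
    return estimate, se, estimate / se


def clip(p):
    """Keep simulated frequencies inside [0, 1]."""
    return min(max(p, 0.0), 1.0)


rng = random.Random(1)
n_snps = 2000
steppe = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
farmer = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
# Hypothetical Pop1: half Steppe ancestry; Pop2: almost entirely farmer-like.
pop1 = [clip(0.5 * s + 0.5 * f + rng.gauss(0, 0.02)) for s, f in zip(steppe, farmer)]
pop2 = [clip(f + rng.gauss(0, 0.02)) for f in farmer]
est, se, z = f4_jackknife(steppe, farmer, pop1, pop2)
print(f"f4 = {est:.4f}, Z = {z:.1f}")
```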

Table 2 Results of f4 statistics in ancient and modern British samples

We report f4 statistics of the form f4(Steppe, Neolithic Farmer; Pop1, Pop2) as z-scores, with positive values indicating more Steppe ancestry in Pop1 than in Pop2. Samples for Pop1 were either modern Celtic (Scotland and Wales) or ancient Saxon. Samples for Pop2 were modern Anglo-Saxon (southern and eastern England).

Signals of Natural Selection

We searched for signals of selection using a recently developed selection statistic that detects unusual population differentiation along continuous PCs25. Notably, this statistic is able to detect selection signals at genome-wide significance. We analyzed the top 5 UK Biobank PCs (which were computed using LD-pruned SNPs) and computed selection statistics at 510,665 SNPs, the set of SNPs remaining after QC but before LD pruning (see Online Methods). The Manhattan plot for PC1 is reported in Figure 4, with additional plots in Supplementary Figure 6. We detected genome-wide significant signals of selection at FUT2 and at several loci with widely known signals of selection (Table 3). Loci with suggestive signals of selection (p < 10⁻⁶) are reported in Supplementary Table 3. FUT2 has previously been reported as a target of natural selection28,29, although those results focused on frequency differences between highly diverged continental populations, whereas our results implicate much more recent selection. FUT2 encodes fucosyltransferase 2, an enzyme that affects the Lewis blood group. The SNP with the most significant p-value, rs601338, is a coding variant: rs601338*G encodes the secretor allele, and rs601338*A encodes the nonsecretor allele, which protects against Norwalk norovirus infection30,31. This SNP also affects the progression of HIV infection32, and is associated with vitamin B12 levels33, Crohn’s disease34, celiac disease and inflammatory bowel disease35, possibly due to changes in gut microbiome energy metabolism36. rs601338*A is more common in northern UK samples (Supplementary Table 4). Similar allele frequency patterns were also observed in GERA37 and PoBI20 samples at rs492602 and rs676388 (Supplementary Table 4), two linked SNPs in FUT2 whose allele frequencies vary along a north-south axis in UK Biobank data.
rs492602 and rs676388 were suggestively significant (p < 1.00 × 10⁻⁶) but not genome-wide significant in tests for selection using the GERA data set (Supplementary Table 5), emphasizing the advantage of analyzing closely related subpopulations with the very large sample size of the UK Biobank data set. These three SNPs were also significant when analyzing the 6 UK Biobank clusters described above using a test for selection based on unusual differentiation between discrete subpopulations (Supplementary Table 6).

Figure 4 Selection statistics for UK Biobank along PC1

A Manhattan plot of −log10(p) values is displayed. Values above the significance threshold (dotted line, p = 1.96 × 10⁻⁸, corresponding to α = 0.05 after Bonferroni correction for 5 PCs and 510,665 SNPs) are displayed as larger points and are labeled with the locus they correspond to (see Table 3). −log10(p) values larger than 10 are truncated at 10 for easier visualization and are displayed as even larger points.

Table 3 Top signals of selection for UK Biobank along PC1-PC5

We report the top signal of natural selection for each locus reaching genome-wide significance (p < 1.96 × 10⁻⁸) along any of the top five PCs. Neighboring SNPs <1Mb apart with genome-wide significant signals were grouped together into a single locus.

To detect additional signals of selection, we combined our PC-based selection statistics from the UK Biobank data with a previously described selection statistic that detects unusual allele frequency differences after the admixture of ancient Eurasian populations by identifying SNPs whose allele frequencies are inconsistent with admixture proportions inferred from genome-wide data23. For each of PC1-PC5 in UK Biobank, we summed our chi-square (1 d.o.f.) selection statistics for that PC with the chi-square (4 d.o.f.) selection statistics from ref. 23 to produce chi-square (5 d.o.f.) statistics that combine these independent signals (see Online Methods). We confirmed the independence of the two selection statistics by checking that the combined statistics were not inflated, as well as by examining the correlations between the two selection statistics (Supplementary Table 7). We looked for signals that were genome-wide significant in the combined selection statistic but not in either of the constituent UK Biobank or ancient Eurasian selection statistics. Results are reported in Table 4.

Table 4 Top signals of selection for combined selection statistics

We report the top selection statistic for each locus reaching genome-wide significance, restricting to loci that were not genome-wide significant in either the UK Biobank selection statistic or the ancient Eurasian selection statistics. Neighboring SNPs <1Mb apart with genome-wide significant signals were grouped together into a single locus.

We detected new genome-wide significant signals of selection at the F12 and CYP1A2/CSK loci. We are not aware of previous evidence of selection at F12. F12 codes for coagulation factor XII, a protein involved in blood clotting38. The SNP at the F12 locus, rs2545801, was suggestively significant in the ancient Eurasian analysis (p = 5.35 × 10⁻⁸), and combining it with the UK Biobank selection statistic on PC2 produced a genome-wide significant signal. This SNP has been associated with activated partial thromboplastin time, a measure of blood clotting speed in which a shorter time is a risk factor for stroke39. An additional significant SNP at F12, rs2731672, affects expression of F12 in liver40 and is associated with plasma levels of factor XII41. The CYP1A2/CSK locus has previously been reported as a target of natural selection in comparisons of inter-continental allele and haplotype frequencies42,43, but our results implicate much more recent selection. The two detected SNPs at this locus are in strong LD (r² = 0.858). The top SNP, rs1378942, is in an intron of the CSK gene. This SNP has greatly varying allele frequencies across continents43 and is associated with blood pressure44,45 and systemic sclerosis (an autoimmune disease affecting connective tissue)46. The second SNP, rs2472304 in CYP1A2, is associated with esophageal cancer47 and caffeine consumption48, and may mediate the protective effect of caffeine on Parkinson’s disease49.

We tested SNPs with genome-wide significant signals of selection in the constituent UK Biobank or ancient Eurasian scans, or in the combined scan, for association with 15 phenotypes in the UK Biobank data set, using the top 5 PCs as covariates (Supplementary Table 8, see Online Methods). The top SNP at F12 (rs2545801) was associated with height (p = 4.8 × 10⁻¹¹), and the top SNP at CYP1A2/CSK (rs1378942) was associated with diastolic blood pressure (DBP) (p = 3.6 × 10⁻¹⁹) and hypertension (p = 4.8 × 10⁻⁹), consistent with previous findings50. We detected additional associations with DBP (p = 8.00 × 10⁻³³) and hypertension (p = 1.30 × 10⁻⁹) at the ATXN2/SH2B3 locus, which was reported to be under selection in the ancient Eurasian scan. The top SNP in ATXN2/SH2B3, rs3184504, is known to be associated with blood pressure51. We note that PC1 and PC3 were strongly associated with height in the UK Biobank data set, and PC3 and PC4 were associated with DBP (Supplementary Table 9). GRK4 (ref. 52), AGT (ref. 52) and ATP1A1 (ref. 14) have also been reported to be under selection and to be associated with DBP or hypertension. None of the SNPs in GRK4 or ATP1A1 were found to be under selection or associated with DBP or hypertension in our analyses. The AGT SNP rs699 was associated with DBP (p = 7.2 × 10⁻¹⁰) and nominally associated with hypertension (p = 4.8 × 10⁻⁴), although it did not produce a significant signal of selection in our analyses.

Discussion

In this study, we used PCA to analyze the population structure of a large UK cohort (N = 113,851). We detected 5 PCs representing geographic population structure that partitioned this cohort into six subpopulation clusters. Projecting ancient samples onto these PCs revealed greater Steppe ancestry in northern UK samples. No ancient samples were found to vary along the Welsh-specific axis, suggesting that the Welsh populations differ from the rest of the UK due to drift and not different levels of admixture. We also determined that UK population structure cannot be explained as a simple mixture of Celts and Saxons.

We leveraged the subtle population structure and large sample size of the UK Biobank data set to detect signals of natural selection. We determined that the rs601338*A allele of FUT2 was more common in northern UK samples, suggesting that pathogens may have exerted selective pressure in those populations. By combining a selection statistic that detects selection via population differentiation within the UK with a separate statistic that detects selection since ancient population admixture in Europe, we were able to detect selection at two additional loci, F12 and CYP1A2/CSK. We additionally found associations with diastolic blood pressure at CYP1A2/CSK and at the ATXN2/SH2B3 locus implicated in a previous selection scan.

We conclude by noting three limitations in our work. First, we employed PCA, a widely used method for analyzing population structure25,53,54, but haplotype-based methods such as fineSTRUCTURE may be more powerful20,55,56; recent advances in computationally efficient phasing57,58 increase the prospects for applying such methods to biobank scale data. Second, we employed methods designed to detect selection at individual loci, but did not employ methods to detect polygenic selection59–63; our observation that top PCs were correlated with height and DBP in the UK Biobank data set, which could potentially be consistent with the action of polygenic selection on these traits, motivates further analyses of possible polygenic selection. Finally, the PC-based test for selection that we employed assumes that allele frequencies vary linearly along a PC. The spatial ancestry analysis (SPA) method64–66 allows for a logistic relationship between allele frequency and ancestry, and is not constrained by this limitation. However, the advantage of the PC-based test for selection is that it allows for the detection of genome-wide significant signals, a key consideration in genome scans for selection.

Online Methods

UK Biobank data set

The UK Biobank phase 1 data release contains 847,131 SNPs and 152,729 samples. We removed SNPs that were multi-allelic, had a genotyping rate less than 99%, or had minor allele frequency (MAF) less than 1%. We also removed samples with non-British ancestry as well as samples with a genotyping rate less than 98%. This left 510,665 SNPs and 118,650 samples, a data set that we call “QC*.” Using PLINK267 (see URLs), we removed SNPs not in Hardy-Weinberg equilibrium (p < 10⁻⁶), and we LD-pruned SNPs to r² < 0.2. We then generated a genetic relationship matrix (GRM) and removed one sample from each pair of samples with relatedness greater than 0.05. This data set, which we call “LD,” contained 210,113 SNPs and 113,851 samples. Taking the full set of SNPs from the QC* data set and the set of unrelated samples from the LD data set produces the final “QC” data set.
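The SNP-level filters and the r² pruning criterion above can be sketched in plain Python. This is a toy illustration, not the pipeline that was run: genotypes are assumed to be coded as 0/1/2 allele counts with None for a missing call, the function names are hypothetical, Hardy-Weinberg filtering and relatedness pruning are omitted, and the greedy pruning is a simplified stand-in for PLINK's sliding-window pruning (e.g. --indep-pairwise), whose window settings for this analysis are not stated.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def passes_snp_qc(genotypes, min_call_rate=0.99, min_maf=0.01):
    """SNP filters from the text: genotyping rate >= 99% and MAF >= 1%."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


def r_squared(g1, g2):
    """Squared Pearson correlation of dosages over samples called at both SNPs."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs)
    v1 = sum((a - m1) ** 2 for a, _ in pairs)
    v2 = sum((b - m2) ** 2 for _, b in pairs)
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov * cov / (v1 * v2)


def greedy_ld_prune(snps, max_r2=0.2, window=50):
    """Keep a SNP only if its r^2 with every kept SNP among the previous
    `window` SNPs is below max_r2. Returns indices of kept SNPs."""
    kept = []
    for i, g in enumerate(snps):
        nearby = [j for j in kept if i - j <= window]
        if all(r_squared(g, snps[j]) < max_r2 for j in nearby):
            kept.append(i)
    return kept


snp_a = [0, 1, 2, 0, 1, 2]
snp_c = [1, 0, 1, 1, 0, 1]  # uncorrelated with snp_a (r^2 = 0)
print(greedy_ld_prune([snp_a, list(snp_a), snp_c]))  # → [0, 2]
print(passes_snp_qc([0] * 100))  # → False (monomorphic, MAF = 0)
```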

PoBI and POPRES data sets

The 2,039 UK PoBI samples were a subset of the 4,371 samples collected as part of the PoBI project20. The 2,039 samples were a subset of the 2,886 samples genotyped on the Illumina Human 1.2M-Duo genotyping chip, with 2,510 samples passing QC procedures and 2,039 samples with all four grandparents born within 80 km of each other. We also examined 2,988 European POPRES samples from the LOLIPOP and CoLaus collections26. These samples were genotyped on the Affymetrix GeneChip 500K Array.

Ancient DNA data sets

Ancient DNA was gathered from several regions. 9 Steppe (Yamnaya) samples were collected from the Samara oblast in Russia22, 7 west-European hunter-gatherers from Loschbour21, 26 Neolithic farmer samples from the Anatolian region22, and 10 Saxon samples from three sites in the UK24. DNA was extracted from bone tissue, PCR amplified and then enriched using a hybrid capture approach22–24. The resulting DNA was sequenced on Illumina MiSeq, HiSeq or NextSeq platforms. Sequenced reads were aligned to the human genome using BWA, and called SNPs were intersected with the SNPs found on the Human Origins Array27.

PCA

We ran PCA on the UK Biobank LD data set using the FastPCA software in EIGENSOFT25 (see URLs). We identified several artifactual PCs that were dominated by regions of long-range LD (Supplementary Figure 7). Removing loci with significant or suggestive selection signals (Supplementary Table 10), along with their flanking 1 Mb regions, from the LD data set and rerunning PCA eliminated these artifactual PCs (Supplementary Figure 1). We refer to the resulting data set with 202,486 SNPs and 113,851 samples as the “PC” data set.

PC Projection

We projected PoBI20 (642,288 SNPs, 2,039 samples from 30 populations), POPRES26 (453,442 SNPs, 4,079 samples from 60 populations) and ancient DNA22,23 (159,588 SNPs, 52 samples from 4 populations) samples onto the UK Biobank PCs via PC projection53. The SNPs in the UK Biobank QC data set were intersected with those in each projected data set, and A/T and C/G SNPs were removed due to strand ambiguity (leaving 75,254, 37,593 and 24,467 SNPs for PoBI, POPRES and ancient DNA, respectively). The intersected set of SNPs was stringently LD-pruned to r² < 0.05 using PLINK267 (see URLs) (leaving 27,769, 20,914 and 15,722 SNPs, respectively). SNP weights were computed for the intersected set of SNPs, and these weights were then used to project the new samples onto the UK Biobank PCs53.
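Projection with SNP weights follows from the SVD X = UΣVᵀ of the standardized SNPs × samples matrix: since V = XᵀUΣ⁻¹, a new sample's standardized genotype vector y can be placed on the reference PCs as yᵀU_kΣ_k⁻¹. The numpy sketch below is a plain illustration of that idea on simulated genotypes. The function names are hypothetical, the SNP intersection, strand filtering and LD pruning described above are omitted, and the method of ref. 53 may differ in its details.

```python
import numpy as np


def standardize(genotypes, means, sds):
    """Center and scale each SNP (row) with reference-panel statistics."""
    return (genotypes - means[:, None]) / sds[:, None]


def fit_snp_weights(ref_genotypes, k):
    """SVD of the standardized M x N reference matrix X = U S V^T.

    Returns per-SNP normalization, SNP weights U_k / s_k, and the
    reference sample PCs V_k.
    """
    means = ref_genotypes.mean(axis=1)
    sds = ref_genotypes.std(axis=1)
    x = standardize(ref_genotypes, means, sds)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    weights = u[:, :k] / s[:k]
    return means, sds, weights, vt[:k].T


def project(new_genotypes, means, sds, weights):
    """Project new samples (M x N_new genotypes) onto the reference PCs."""
    return standardize(new_genotypes, means, sds).T @ weights


rng = np.random.default_rng(0)
m, n = 300, 60
freqs = rng.uniform(0.2, 0.8, size=m)
ref = rng.binomial(2, freqs[:, None], size=(m, n)).astype(float)
means, sds, weights, ref_pcs = fit_snp_weights(ref, k=3)
# Sanity check: projecting the reference samples recovers their own PCs.
print(np.allclose(project(ref, means, sds, weights), ref_pcs))
```

In practice, new samples would be genotyped at the same (intersected) SNPs and standardized with the reference means and standard deviations, exactly as `project` does.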

PCA-based selection statistic

PCA is equivalent to the singular value decomposition $X = U \Sigma V^T$, where $X$ is the normalized $M \times N$ genotype matrix (SNPs × samples), $U$ is the matrix of left singular vectors, $V$ is the matrix of right singular vectors, and $\Sigma$ is a diagonal matrix of singular values. The singular values are related to the eigenvalues of the genetic relationship matrix (GRM) $X^T X / M$ by $\Lambda = \Sigma^2 / M$, where $M$ is the number of SNPs used to compute the GRM. The matrix $U$ satisfies $U^T U = I$ and $U = X V \Sigma^{-1}$. By the central limit theorem, the elements of $U$ follow a normal distribution, and after rescaling by $M$ their squares follow a chi-square (1 d.o.f.) distribution. In other words, the statistic $\chi^2_{ik} = M U_{ik}^2$ for the $i$th SNP at the $k$th PC follows a chi-square (1 d.o.f.) distribution25. One benefit of this statistic is that the PCs can be generated on one set of SNPs (here we used the PC data set described earlier) and the selection statistic can be calculated on another set of SNPs (we used the QC data set).
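A minimal numpy sketch of the statistic, assuming that it is $M U_{ik}^2$ and that loadings for SNPs outside the PC set are obtained as $U = X V \Sigma^{-1}$ with the same rescaling by $M$, the number of SNPs used to compute the PCs. It is not the EIGENSOFT implementation; the function names and simulated data are hypothetical. When the tested SNPs are the PC SNPs themselves, each column of $U$ has unit norm, so the statistics average exactly 1 per PC, matching a chi-square (1 d.o.f.) mean.

```python
import math

import numpy as np


def selection_statistics(x_pc, x_test, k):
    """Chi-square (1 d.o.f.) PC-based selection statistics.

    x_pc:   standardized M x N matrix used to compute the PCs.
    x_test: standardized M' x N matrix of SNPs to test (same samples).
    Returns an M' x k array of statistics M * U_ik^2, where the test-SNP
    loadings are U_test = X_test V Sigma^{-1}.
    """
    m = x_pc.shape[0]
    _, s, vt = np.linalg.svd(x_pc, full_matrices=False)
    u_test = x_test @ vt[:k].T / s[:k]
    return m * u_test ** 2


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square (1 d.o.f.) statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


rng = np.random.default_rng(1)
geno = rng.binomial(2, 0.4, size=(500, 100)).astype(float)
x = (geno - geno.mean(axis=1, keepdims=True)) / geno.std(axis=1, keepdims=True)
stats = selection_statistics(x, x, k=5)
print(stats.mean(axis=0))  # each column averages 1 (unit-norm columns of U)
print(chi2_1df_pvalue(float(stats[0, 0])))
```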

Signals of selection were clustered by considering all SNPs for which the p-value along at least one PC was less than an initial threshold (which we set at 10⁻⁶) and clustering together SNPs within 1 Mb. We defined genome-wide significant loci as clusters that contained at least one SNP with a p-value smaller than the genome-wide significance threshold. Since we analyzed 5 PCs and 510,665 SNPs, the genome-wide significance threshold was 0.05/(5 × 510,665) = 1.96 × 10⁻⁸. We defined suggestive loci as clusters with at least two SNPs crossing the initial threshold (but none crossing the genome-wide significance threshold).
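The threshold arithmetic and locus grouping can be sketched as follows. Grouping hits by single linkage (a hit joins the current locus if it lies on the same chromosome and less than 1 Mb from the previous hit) is our reading of "clustering together SNPs within 1 Mb"; the hit list and the "not reported" label are hypothetical.

```python
def genome_wide_threshold(alpha=0.05, n_pcs=5, n_snps=510_665):
    """Bonferroni threshold over all PC-by-SNP tests."""
    return alpha / (n_pcs * n_snps)


def cluster_hits(hits, max_gap=1_000_000):
    """Group (chrom, pos, p) hits into loci: a hit joins the current locus
    if it is on the same chromosome and < max_gap bp from the previous hit."""
    loci = []
    for chrom, pos, p in sorted(hits):
        prev = loci[-1][-1] if loci else None
        if prev is not None and prev[0] == chrom and pos - prev[1] < max_gap:
            loci[-1].append((chrom, pos, p))
        else:
            loci.append([(chrom, pos, p)])
    return loci


def classify_locus(locus, gw_threshold):
    """Genome-wide significant if any hit passes the threshold; suggestive
    if at least two hits pass only the initial threshold."""
    if any(p < gw_threshold for _, _, p in locus):
        return "genome-wide significant"
    if len(locus) >= 2:
        return "suggestive"
    return "not reported"


threshold = genome_wide_threshold()
print(f"{threshold:.2e}")  # → 1.96e-08
# Hypothetical hits: smallest p-value across PCs, already filtered at p < 1e-6.
hits = [(2, 100, 1e-7), (2, 500_000, 1e-9), (2, 3_000_000, 5e-7),
        (5, 100, 2e-7), (5, 900_000, 3e-7)]
for locus in cluster_hits(hits):
    print(locus[0][:2], len(locus), classify_locus(locus, threshold))
```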

Combined selection statistic

We intersected the chi-square (4 d.o.f.) ancient Eurasian selection statistics for 1,004,613 SNPs from Mathieson et al.23 with the PC-based chi-square (1 d.o.f.) UK Biobank selection statistics for 510,665 QC SNPs, producing a list of 115,066 SNPs. For each SNP and each PC, we added the ancient Eurasian selection statistics to the UK Biobank selection statistics for that PC, producing chi-square (5 d.o.f.) statistics which we corrected using genomic control.
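Because the two statistics are independent, their sum is chi-square with 1 + 4 = 5 d.o.f. The plain-Python sketch below uses the standard closed-form survival function for odd degrees of freedom and applies genomic control by dividing by λGC, the ratio of the observed median to the chi-square (5 d.o.f.) median. The text states only that genomic control was applied, so the choice not to deflate statistics when λGC < 1 is our assumption; the function names and input statistics are hypothetical.

```python
import math
import statistics


def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square (5 d.o.f.) variable.

    Closed form for odd degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3).
    """
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


def chi2_median_5df(tol=1e-12):
    """Median of chi-square (5 d.o.f.) by bisection on the survival function."""
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_sf_5df(mid) > 0.5:
            lo = mid  # more than half the mass lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def combined_statistics(uk_1df, ancient_4df):
    """Sum independent 1- and 4-d.o.f. statistics, then apply genomic control."""
    combined = [a + b for a, b in zip(uk_1df, ancient_4df)]
    lambda_gc = statistics.median(combined) / chi2_median_5df()
    # Assumption: only deflate inflated statistics (never scale them up).
    lambda_gc = max(lambda_gc, 1.0)
    corrected = [c / lambda_gc for c in combined]
    return corrected, lambda_gc, [chi2_sf_5df(c) for c in corrected]


print(round(chi2_median_5df(), 3))  # ≈ 4.35
corrected, lam, pvals = combined_statistics([1.0, 2.0, 0.5], [4.0, 3.0, 5.0])
print(round(lam, 3), [round(p, 3) for p in pvals])
```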

Association tests

Association analyses were performed using PLINK267 with the top 5 PCs as covariates, using the “--linear” or “--logistic” flags.
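For a quantitative trait such as DBP, the model tested is phenotype ~ intercept + genotype + PC1–PC5. The numpy sketch below fits that ordinary-least-squares model on simulated data to show what the covariate adjustment does. It is not PLINK2's implementation, the function name is hypothetical, and binary traits such as hypertension would instead use logistic regression (not shown).

```python
import numpy as np


def linear_association(genotype, phenotype, covariates):
    """OLS test of phenotype ~ intercept + genotype + covariates.

    Returns the genotype effect size, its standard error and t-statistic.
    """
    n = len(phenotype)
    design = np.column_stack([np.ones(n), genotype, covariates])
    beta, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    residuals = phenotype - design @ beta
    dof = n - design.shape[1]
    sigma2 = residuals @ residuals / dof
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(cov_beta[1, 1])
    return beta[1], se, beta[1] / se


rng = np.random.default_rng(2)
n = 2000
pcs = rng.normal(size=(n, 5))  # stand-ins for the top 5 PCs
genotype = rng.binomial(2, 0.3, size=n).astype(float)
pc_effects = np.array([0.3, -0.2, 0.1, 0.0, 0.2])
phenotype = 0.5 * genotype + pcs @ pc_effects + rng.normal(size=n)
beta, se, t = linear_association(genotype, phenotype, pcs)
print(f"beta = {beta:.3f}, se = {se:.3f}, t = {t:.1f}")
```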

URLs

UK Biobank: http://www.ukbiobank.ac.uk/

EIGENSOFT v6.1.1 (FastPCA and PC-based selection statistic): http://www.hsph.harvard.edu/alkes-price/software/

PLINK2: https://www.cog-genomics.org/plink2

Acknowledgments

We thank Iain Mathieson and David Reich for helpful discussions and Stephan Schiffels for technical assistance with Saxon samples. This research was conducted using the UK Biobank Resource and was funded by NIH grant R01 HG006399.

References

  1. 1.↵
    Sabeti, P. C. et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832–837 (2002).
    OpenUrlCrossRefPubMedWeb of Science
  2. 2.
    Nielsen, R., Hellmann, I., Hubisz, M., Bustamante, C. & Clark, A. G. Recent and ongoing selection in the human genome. Nat. Rev. Genet. 8, 857–868 (2007).
    OpenUrlCrossRefPubMed
  3. 3.
    Novembre, J. & Di Rienzo, A. Spatial patterns of variation due to natural selection in humans. Nat. Rev. Genet. 10, 745–755 (2009).
    OpenUrlCrossRefPubMedWeb of Science
  4. 4.↵
    Scheinfeldt, L. B. & Tishkoff, S. A. Recent human adaptation: genomic approaches, interpretation and insights. Nat. Rev. Genet. 14, 692–702 (2013).
    OpenUrlCrossRefPubMed
  5. 5.↵
    Shriver, M. D. et al. The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs. Hum. Genomics 1, 274 (2004).
    OpenUrlCrossRefPubMed
  6. 6.↵
    Bersaglieri, T. et al. Genetic Signatures of Strong Recent Positive Selection at the Lactase Gene. Am. J. Hum. Genet. 74, 1111–1120 (2004).
    OpenUrlCrossRefPubMedWeb of Science
  7. 7.↵
    Tishkoff, S. A. et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 39, 31–40 (2007).
    OpenUrlCrossRefPubMedWeb of Science
  8. 8.↵
    Fumagalli, M. et al. Greenlandic Inuit show genetic signatures of diet and climate adaptation. Science 349, 1343–1347 (2015).
    OpenUrlAbstract/FREE Full Text
  9. 9.↵
    Yi, X. et al. Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude. Science 329, 75–78 (2010).
    OpenUrlAbstract/FREE Full Text
  10. 10.
    Bigham, A. et al. Identifying Signatures of Natural Selection in Tibetan and Andean Populations Using Dense Genome Scan Data. PLoS Genet 6, e1001116 (2010).
    OpenUrlCrossRefPubMed
  11. Lorenzo, F. R. et al. A genetic mechanism for Tibetan high-altitude adaptation. Nat. Genet. 46, 951–956 (2014).
  12. Hamblin, M. T. & Di Rienzo, A. Detection of the Signature of Natural Selection in Humans: Evidence from the Duffy Blood Group Locus. Am. J. Hum. Genet. 66, 1669–1679 (2000).
  13. Ayodo, G. et al. Combining Evidence of Natural Selection with Association Analysis Increases Power to Detect Malaria-Resistance Variants. Am. J. Hum. Genet. 81, 234–242 (2007).
  14. Gurdasani, D. et al. The African Genome Variation Project shapes medical genetics in Africa. Nature 517, 327–332 (2015).
  15. Lamason, R. L. et al. SLC24A5, a Putative Cation Exchanger, Affects Pigmentation in Zebrafish and Humans. Science 310, 1782–1786 (2005).
  16. Perry, G. H. et al. Diet and the evolution of human amylase gene copy number variation. Nat. Genet. 39, 1256–1260 (2007).
  17. Hancock, A. M. et al. Adaptations to Climate-Mediated Selective Pressures in Humans. PLoS Genet. 7, e1001375 (2011).
  18. Ko, W.-Y. et al. Identifying Darwinian Selection Acting on Different Human APOL1 Variants among Diverse African Populations. Am. J. Hum. Genet. 93, 54–66 (2013).
  19. Bhatia, G. et al. Genome-wide Comparison of African-Ancestry Populations from CARe and Other Cohorts Reveals Signals of Natural Selection. Am. J. Hum. Genet. 89, 368–381 (2011).
  20. Leslie, S. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015).
  21. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413 (2014).
  22. Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).
  23. Mathieson, I. et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015).
  24. Schiffels, S. et al. Iron Age and Anglo-Saxon genomes from East England reveal British migration history. Nat. Commun. 7, 10408 (2016).
  25. Galinsky, K. J. et al. Fast Principal-Component Analysis Reveals Convergent Evolution of ADH1B in Europe and East Asia. Am. J. Hum. Genet. 98, 456–472 (2016).
  26. Nelson, M. R. et al. The Population Reference Sample, POPRES: A Resource for Population, Disease, and Pharmacological Genetics Research. Am. J. Hum. Genet. 83, 347–358 (2008).
  27. Patterson, N. et al. Ancient Admixture in Human History. Genetics 192, 1065–1093 (2012).
  28. Ferrer-Admetlla, A. et al. A Natural History of FUT2 Polymorphism in Humans. Mol. Biol. Evol. 26, 1993–2003 (2009).
  29. Fumagalli, M. et al. Widespread balancing selection and pathogen-driven selection at blood group antigen genes. Genome Res. 19, 199–212 (2009).
  30. Thorven, M. et al. A Homozygous Nonsense Mutation (428G→A) in the Human Secretor (FUT2) Gene Provides Resistance to Symptomatic Norovirus (GGII) Infections. J. Virol. 79, 15351–15355 (2005).
  31. Carlsson, B. et al. The G428A Nonsense Mutation in FUT2 Provides Strong but Not Absolute Protection against Symptomatic GII.4 Norovirus Infection. PLoS ONE 4, e5593 (2009).
  32. Kindberg, E. et al. A nonsense mutation (428G→A) in the fucosyltransferase FUT2 gene affects the progression of HIV-1 infection. AIDS 20, 685–689 (2006).
  33. Hazra, A. et al. Common variants of FUT2 are associated with plasma vitamin B12 levels. Nat. Genet. 40, 1160–1162 (2008).
  34. McGovern, D. P. B. et al. Fucosyltransferase 2 (FUT2) non-secretor status is associated with Crohn’s disease. Hum. Mol. Genet. 19, 3468–3476 (2010).
  35. Parmar, A. S. et al. Association study of FUT2 (rs601338) with celiac disease and inflammatory bowel disease in the Finnish population. Tissue Antigens 80, 488–493 (2012).
  36. Tong, M. et al. Reprograming of gut microbiome energy metabolism by the FUT2 Crohn’s disease risk polymorphism. ISME J. 8, 2193–2206 (2014).
  37. Banda, Y. et al. Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort. Genetics 200, 1285–1295 (2015).
  38. Renne, T., Schmaier, A. H., Nickel, K. F., Blombäck, M. & Maas, C. In vivo roles of factor XII. Blood 120, 4296–4303 (2012).
  39. Tang, W. et al. Genetic Associations for Activated Partial Thromboplastin Time and Prothrombin Time, their Gene Expression Profiles, and Risk of Coronary Artery Disease. Am. J. Hum. Genet. 91, 152–162 (2012).
  40. Innocenti, F. et al. Identification, Replication, and Functional Fine-Mapping of Expression Quantitative Trait Loci in Primary Human Liver Tissue. PLoS Genet. 7, (2011).
  41. Guerrero, J. A. et al. Novel loci involved in platelet function and platelet count identified by a genome-wide study performed in children. Haematologica 96, 1335–1343 (2011).
  42. Wooding, S. P. et al. DNA Sequence Variation in a 3.7-kb Noncoding Sequence 5′ of the CYP1A2 Gene: Implications for Human Population History and Natural Selection. Am. J. Hum. Genet. 71, 528–542 (2002).
  43. Ding, K. & Kullo, I. J. Geographic differences in allele frequencies of susceptibility SNPs for cardiovascular disease. BMC Med. Genet. 12, 55 (2011).
  44. Newton-Cheh, C. et al. Genome-wide association study identifies eight loci associated with blood pressure. Nat. Genet. 41, 666–676 (2009).
  45. Tabara, Y. et al. Common Variants in the ATP2B1 Gene Are Associated With Susceptibility to Hypertension: The Japanese Millennium Genome Project. Hypertension 56, 973–980 (2010).
  46. Martin, J.-E. et al. Identification of CSK as a systemic sclerosis genetic risk factor through Genome Wide Association Study follow-up. Hum. Mol. Genet. 21, 2825–2835 (2012).
  47. Xie, Q. et al. Decision Forest Analysis of 61 Single Nucleotide Polymorphisms in a Case-Control Study of Esophageal Cancer; a novel method. BMC Bioinformatics 6, 1–9 (2005).
  48. Cornelis, M. C. et al. Genome-Wide Meta-Analysis Identifies Regions on 7p21 (AHR) and 15q24 (CYP1A2) As Determinants of Habitual Caffeine Consumption. PLoS Genet. 7, (2011).
  49. Popat, R. A. et al. Coffee, ADORA2A, and CYP1A2: the caffeine connection in Parkinson’s disease. Eur. J. Neurol. 18, 756–765 (2011).
  50. Hong, K.-W. et al. Recapitulation of two genomewide association studies on blood pressure and essential hypertension in the Korean population. J. Hum. Genet. 55, 336–341 (2010).
  51. Genetic Variants in Novel Pathways Influence Blood Pressure and Cardiovascular Disease Risk. Nature 478, 103–109 (2011).
  52. Sabeti, P. C. et al. Positive Natural Selection in the Human Lineage. Science 312, 1614–1620 (2006).
  53. Patterson, N., Price, A. L. & Reich, D. Population Structure and Eigenanalysis. PLoS Genet. 2, e190 (2006).
  54. Novembre, J. et al. Genes mirror geography within Europe. Nature 456, 98–101 (2008).
  55. Lawson, D. J., Hellenthal, G., Myers, S. & Falush, D. Inference of Population Structure using Dense Haplotype Data. PLoS Genet. 8, e1002453 (2012).
  56. The UK10K Consortium. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
  57. Loh, P.-R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. in press. http://biorxiv.org/content/early/2015/10/04/028282
  58. O’Connell, J. R., Sharp, K., Delaneau, O. & Marchini, J. Haplotype estimation for biobank scale datasets. Nat. Genet. accepted in principle.
  59. Pritchard, J. K., Pickrell, J. K. & Coop, G. The Genetics of Human Adaptation: Hard Sweeps, Soft Sweeps, and Polygenic Adaptation. Curr. Biol. 20, R208–R215 (2010).
  60. Pritchard, J. K. & Di Rienzo, A. Adaptation - not by sweeps alone. Nat. Rev. Genet. 11, 665–667 (2010).
  61. Turchin, M. C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).
  62. Berg, J. J. & Coop, G. A Population Genetic Signal of Polygenic Adaptation. PLoS Genet. 10, e1004412 (2014).
  63. Robinson, M. R. et al. Population genetic differentiation of height and body mass index across Europe. Nat. Genet. 47, 1357–1362 (2015).
  64. Yang, W.-Y., Novembre, J., Eskin, E. & Halperin, E. A model-based approach for analysis of spatial structure in genetic data. Nat. Genet. 44, 725–731 (2012).
  65. Baran, Y., Quintela, I., Carracedo, Á., Pasaniuc, B. & Halperin, E. Enhanced Localization of Genetic Samples through Linkage-Disequilibrium Correction. Am. J. Hum. Genet. 92, 882–894 (2013).
  66. Baran, Y. & Halperin, E. A Note on the Relations Between Spatio-Genetic Models. J. Comput. Biol. 22, 905–917 (2015).
  67. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
  68. Heffelfinger, C. et al. Haplotype structure and positive selection at TLR1. Eur. J. Hum. Genet. 22, 551–557 (2014).
  69. Burton, P. R. et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
  70. Pickrell, J. K. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826–837 (2009).
  71. de Bakker, P. I. W. et al. A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat. Genet. 38, 1166–1172 (2006).
  72. Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. A Map of Recent Positive Selection in the Human Genome. PLoS Biol. 4, e72 (2006).
Posted May 27, 2016.

Population structure of UK Biobank and ancient Eurasians reveals adaptation at genes influencing blood pressure
Kevin J. Galinsky, Po-Ru Loh, Mallick Swapan, Nick J. Patterson, Alkes L. Price
bioRxiv 055855; doi: https://doi.org/10.1101/055855

Subject Area

  • Genetics