A flexible multi-species genome-wide 60K SNP chip developed from pooled resequencing of 240 Eucalyptus tree genomes across 12 species

New Phytol. 2015 Jun;206(4):1527-40. doi: 10.1111/nph.13322. Epub 2015 Feb 13.

Abstract

We used whole-genome resequencing of pooled individuals to develop a high-density single-nucleotide polymorphism (SNP) chip for Eucalyptus. Genomes of 240 trees of 12 species were sequenced at 3.5× each, and 46 997 586 raw SNP variants were subjected to multivariable filtering metrics to obtain multi-species chip content distributed genome-wide. Of the 60 904 SNPs on the chip, 59 222 were genotyped and 51 204 were polymorphic across 14 Eucalyptus species, providing 96% genome-wide coverage at 1 SNP per 12-20 kb, with 47 069 SNPs at ≤ 10 kb from 30 444 of the 33 917 genes in the Eucalyptus genome. Given the multi-species genotyping flexibility of the EUChip60K, we show that both the sample size and the taxonomic composition of cluster files affect the specificity and sensitivity of heterozygous calls, by benchmarking against 'gold standard' genotypes derived from deeply sequenced individual tree genomes. Thousands of SNPs were shared across species, likely representing ancient variants that arose before the split of these taxa and hinting at a recent eucalypt radiation. We show that the variable SNP filtering constraints allowed coverage of the entire site frequency spectrum, mitigating SNP ascertainment bias. The EUChip60K represents an outstanding tool with which to address population genomics questions in Eucalyptus and to empower genomic selection, GWAS and the broader study of complex trait variation in eucalypts.
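
The benchmarking of heterozygous calls against 'gold standard' genotypes can be illustrated with a minimal sketch. The abstract does not give the exact definitions used, so this assumes the conventional ones: sensitivity is the fraction of gold-standard heterozygotes that the chip also calls heterozygous, and specificity is the fraction of gold-standard homozygotes that the chip does not call heterozygous. The function name `het_call_metrics`, the `"AA"`/`"AB"`/`"BB"` genotype encoding, and the use of `None` for no-calls are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence, Tuple


def het_call_metrics(
    chip_calls: Sequence[Optional[str]],
    gold_calls: Sequence[Optional[str]],
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of heterozygous calls.

    Assumed encoding (not from the paper): "AA"/"BB" are homozygotes,
    "AB" is a heterozygote, and None marks a missing call. SNPs with a
    missing call in either data set are skipped.
    """
    if len(chip_calls) != len(gold_calls):
        raise ValueError("call lists must have the same length")

    tp = fp = tn = fn = 0
    for chip, gold in zip(chip_calls, gold_calls):
        if chip is None or gold is None:
            continue  # no-call: cannot be scored
        chip_het = chip == "AB"
        gold_het = gold == "AB"
        if gold_het and chip_het:
            tp += 1  # heterozygote correctly called
        elif gold_het and not chip_het:
            fn += 1  # heterozygote missed by the chip
        elif not gold_het and chip_het:
            fp += 1  # homozygote miscalled as heterozygote
        else:
            tn += 1  # homozygote not called heterozygous

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data for one tree at eight SNPs (made up for illustration).
chip = ["AB", "AB", "AA", "BB", "AB", None, "AA", "BB"]
gold = ["AB", "AA", "AA", "BB", "AB", "AB", "AB", "BB"]
sens, spec = het_call_metrics(chip, gold)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Under these assumptions, repeating the calculation with genotypes called from cluster files of different sample sizes or taxonomic compositions would show how each choice shifts the two metrics. Note that this sketch scores only heterozygous versus non-heterozygous status, so a chip call of `"AA"` against a gold-standard `"BB"` is not penalised.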

Keywords: Myrtaceae; pooled resequencing; population structure; single-nucleotide polymorphism (SNP) ascertainment bias; trans-species SNPs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Eucalyptus / genetics*
  • Genetics, Population
  • Genome, Plant*
  • Genotype
  • High-Throughput Nucleotide Sequencing / methods*
  • Molecular Sequence Annotation
  • Oligonucleotide Array Sequence Analysis*
  • Polymorphism, Single Nucleotide / genetics*
  • Species Specificity
  • Trees / genetics*