ABSTRACT
Objectives Accurate single-nucleotide polymorphism (SNP) calls are crucial for robust evolutionary and population genetic inferences in genomic analyses. Such inferences can reveal the time-scales and processes associated with the emergence and spread of pandemic plant pathogens, such as the rice blast fungus Magnaporthe oryzae (syn. Pyricularia oryzae). However, the specificity and sensitivity of SNP calls depend on the filtering parameters applied to the data. Here, we used a benchmarking approach to evaluate the impact of SNP calling parameters on different population genetic analyses of the rice blast fungus, namely genetic clustering, the topology of phylogenetic reconstructions and the estimation of evolutionary rates.
Results To benchmark SNP calling parameters, we generated a gold standard set of validated SNPs by sequencing nine M. oryzae genomes with both Illumina short reads and Oxford Nanopore Technologies (ONT) reads. We used the gold standard set of SNPs to identify the SNP calling parameter configuration that maximizes sensitivity and specificity. We found that the choice of parameter configuration can substantially change the number of ascertained SNPs, preferentially affecting SNPs segregating at low population frequency. However, SNP calling parameter configurations did not significantly affect the clustering of isolates into clonal lineages, the monophyly of each clonal lineage, or the estimation of evolutionary rates. We leveraged the evolutionary rates obtained from each SNP calling parameter configuration to generate divergence time estimates that take into account the uncertainty associated with both the estimation of evolutionary rates and SNP calling. Our analysis indicates that M. oryzae clonal lineage expansions took place ~300 years ago.
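As an illustration of the benchmarking step, the sketch below scores one call set against a gold standard. It is a minimal example under stated assumptions, not the pipeline used in this study: the function name `benchmark_calls`, the site representation, and the toy data are all hypothetical, and it assumes the gold standard provides both validated SNPs and candidate sites that validation refuted (needed to count true negatives for specificity).

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical site key: (chromosome, 1-based position, alternative allele).
Site = Tuple[str, int, str]


class Benchmark(NamedTuple):
    sensitivity: float
    specificity: float


def benchmark_calls(
    called: Set[Site],
    gold_snps: Set[Site],
    gold_non_snps: Set[Site],
) -> Benchmark:
    """Score a call set against validated SNPs and refuted candidate sites."""
    tp = len(called & gold_snps)        # validated SNPs that were called
    fn = len(gold_snps - called)        # validated SNPs that were missed
    fp = len(called & gold_non_snps)    # refuted sites that were called anyway
    tn = len(gold_non_snps - called)    # refuted sites correctly left out

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return Benchmark(sensitivity, specificity)


# Toy data (made up for illustration only).
gold_snps = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"), ("chr2", 90, "C")}
gold_non_snps = {("chr1", 300, "C"), ("chr2", 75, "A")}
called = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"), ("chr1", 300, "C")}

result = benchmark_calls(called, gold_snps, gold_non_snps)
print(result)
```

Running the same function once per filtering configuration would give one sensitivity/specificity pair per configuration, which is the kind of comparison the benchmarking relies on.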
Competing Interest Statement
The authors have declared no competing interest.
List of abbreviations
- M. oryzae: Magnaporthe oryzae
- ONT: Oxford Nanopore Technologies
- VQSR: Variant Quality Score Recalibration
- GATK: Genome Analysis Toolkit
- SNP: Single Nucleotide Polymorphism
- GSVD: Gold Standard Variants Dataset
- QD: Quality by Depth
- REF: Reference
- ALT: Alternative
- ReadPosRankSum: Rank sum test for the relative positioning of REF versus ALT alleles within reads
- MQRankSum: Rank sum test for mapping qualities of REF versus ALT reads
- BaseQRankSum: Rank sum test of REF versus ALT base quality scores
- FS: Strand bias estimated using Fisher’s exact test
- VCF: Variant Call Format
- PCA: Principal Component Analysis
- SFS: Site Frequency Spectrum
- MCMC: Markov Chain Monte Carlo
- MRCA: Most Recent Common Ancestor
- TMRCA: Time to the Most Recent Common Ancestor
- HPD: Highest Posterior Density
- ESS: Effective Sample Size