TY - JOUR
T1 - Fantastic beasts and how to sequence them: genomic approaches for obscure model organisms
JF - bioRxiv
DO - 10.1101/165928
SP - 165928
AU - Mikhail V. Matz
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/07/20/165928.abstract
N2 - Summary
Application of genomic approaches to “obscure model organisms” (OMOs), meaning species with few or no genomic resources, enables increasingly sophisticated studies of the genomic basis of evolution, acclimatization and adaptation in a real ecological context. Here, I aim to identify the sequencing solutions and data-handling techniques best suited for genomic analysis of OMOs.
Trends
- Adoption of allele frequency spectrum (AFS) analyses, based on RAD sequencing data, for studies of demography and population structure.
- Switch from RAD to whole-genome and exome sequencing in “genome scanning” studies.
- Adoption of cost-efficient gene expression quantification methods based on counting transcripts rather than on whole-transcriptome resequencing.
- Use of genetically driven gene expression signatures to examine genotype-phenotype associations and the genetic basis of adaptation.
- Studies of the ecological dynamics and inheritance of DNA methylation marks based on cost-efficient alternatives to bisulfite sequencing.
- Adoption of “third-generation” sequencing technologies (PacBio and Oxford Nanopore Technologies) for generating genome and transcriptome references.
Outstanding questions
- How are the power and accuracy of AFS analysis affected by the limited number of RAD loci? Each RAD locus may contain several SNPs, so the total SNP count can seem large, but SNPs from the same RAD locus are highly correlated, and the number of independent data points is therefore much lower.
- What are the limits to genotype imputation in natural populations? Which pilot experiments could help decide whether low-coverage whole-genome sequencing with imputation is a feasible strategy for a particular organism?
- How can genetically determined gene expression be profiled in non-clonal organisms? Cross-tissue analysis is promising, but more validation experiments are needed to develop guidelines on how many and which tissues should be profiled for the best cost-benefit balance.
- Can methylated DNA bases be reliably detected by single-molecule sequencing in complex genomes? Pilot data on bacterial DNA are very promising, but additional validation in complex genomes is required.
Glossary
- Allele Frequency Spectrum, AFS (same as Site Frequency Spectrum, SFS): histogram of the number of segregating variants as a function of their frequency in one or more populations.
- Restriction site-Associated DNA (RAD) sequencing: family of diverse genotyping methods that sequence short fragments of the genome adjacent to recognition site(s) for specific restriction endonuclease(s).
- Linkage Disequilibrium (LD): in this review, correlation of genotypes at a pair of markers across individuals.
- LD block: typical distance in the genome across which the genotypes of markers remain correlated.
- Genome scan: profiling of genotypes along the genome to look for unusual patterns, often signatures of natural selection or introgression.
- “Denser-than-LD” genotyping: genotyping of several polymorphic markers per LD block.
- Highly contiguous reference: genome or transcriptome reference sequence with minimal fragmentation.
- Phased data: data showing which SNP alleles belong to the same homologous chromosome copy.
- Cross-tissue gene expression analysis: looking for individual-specific shifts in gene expression that are detectable across multiple tissues. Such shifts are predominantly genetic in nature.
ER -
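The glossary entries for the allele frequency spectrum (AFS) and linkage disequilibrium (LD) in the record above are compact definitions. The Python sketch below makes them concrete. It is an illustration only and is not code from the review: it assumes biallelic SNPs, diploid individuals, genotypes coded as 0/1/2 copies of the derived allele, and no missing data. The function names `unfolded_afs`, `folded_afs` and `ld_r2` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical helpers illustrating the glossary definitions.
# Assumptions: biallelic SNPs, diploid individuals, genotypes coded as
# 0/1/2 copies of the derived allele, no missing data.


def unfolded_afs(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Unfolded AFS for one population.

    genotypes[s][i] is the genotype of individual i at SNP s. Entry k of the
    result counts SNPs whose derived allele occurs k times among the 2n
    sampled chromosomes; entries 0 and 2n hold monomorphic sites.
    """
    n_chrom = 2 * len(genotypes[0])
    counts = Counter(sum(site) for site in genotypes)
    return [counts.get(k, 0) for k in range(n_chrom + 1)]


def folded_afs(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Folded AFS: bins SNPs by minor-allele count (useful when the ancestral allele is unknown)."""
    unfolded = unfolded_afs(genotypes)
    n_chrom = len(unfolded) - 1
    folded = [0] * (n_chrom // 2 + 1)
    for k, n_sites in enumerate(unfolded):
        folded[min(k, n_chrom - k)] += n_sites
    return folded


def ld_r2(a: Sequence[int], b: Sequence[int]) -> float:
    """LD as the squared Pearson correlation of genotypes at two markers across individuals."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # r^2 is undefined for a monomorphic marker; return 0.0 by convention here.
        return 0.0
    return cov * cov / (var_a * var_b)


if __name__ == "__main__":
    # Four SNPs genotyped in three individuals (six chromosomes).
    genos = [[0, 1, 2], [1, 1, 0], [0, 0, 0], [2, 2, 1]]
    print(unfolded_afs(genos))  # [1, 0, 1, 1, 0, 1, 0]
    print(folded_afs(genos))    # [1, 1, 1, 1]
    print(ld_r2([0, 1, 2], [2, 1, 0]))  # 1.0
```

The same `ld_r2` calculation applied to two SNPs from one RAD locus would typically return a high value. This illustrates the outstanding question above: such SNPs add to the raw SNP count but contribute little independent information to the AFS.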