TY  - JOUR
T1  - Going down the rabbit hole: a review on how to link genome-wide data with ecology and evolution in natural populations
JF  - bioRxiv
DO  - 10.1101/052761
SP  - 052761
AU  - Yann X.C. Bourgeois
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/05/12/052761.abstract
N2  - Characterizing species history and assessing the nature and extent of local adaptation are crucial in conservation, agronomy, functional ecology and evolutionary biology. The ongoing improvement of next-generation sequencing (NGS) techniques has made it possible to produce an ever-growing number of genetic markers across the genomes of non-model species. The study of variation at these markers across natural populations has deepened our understanding of how population history and selection act on genomes. However, this improvement has come with a burst of analytical tools that can confuse naïve users, and this confusion can limit the amount of information effectively retrieved from complex genomic datasets. In addition, the lack of a unified analytical pipeline slows the spread of the most recent analytical tools into fields such as conservation biology. Accessible introductions to these methods are therefore needed. In this paper I describe possible analytical protocols and recent methods for analysing genome-scale datasets, clarify the strategies they use to infer demographic history and selection, and discuss some of their limitations. Glossary. SNP: single nucleotide polymorphism. Variant calling: confidently identifying genomic variants from alignment data (in SAM/BAM format, see Li et al., 2009). Classical SNP callers include the Genome Analysis Toolkit (GATK; McKenna et al., 2010), freebayes (Garrison and Marth, 2012), samtools (Li et al., 2009) and Platypus (Rimmer et al., 2014). Other tools call large-scale variants such as inversions, translocations or copy-number variation (see main text). Phasing: the process of identifying which alleles are co-located on the same chromosome copy. Pooled sequencing: a protocol in which tens or hundreds of samples are pooled in a single library prior to sequencing (Futschik and Schlötterer, 2010); as a result, individual samples cannot be identified.
ER  - 