RT Journal Article
SR Electronic
T1 Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.10.29.514266
DO 10.1101/2022.10.29.514266
A1 M. Elise Lauterbur
A1 Maria Izabel A. Cavassim
A1 Ariella L. Gladstein
A1 Graham Gower
A1 Nathaniel S. Pope
A1 Georgia Tsambos
A1 Jeff Adrion
A1 Saurabh Belsare
A1 Arjun Biddanda
A1 Victoria Caudill
A1 Jean Cury
A1 Ignacio Echevarria
A1 Benjamin C. Haller
A1 Ahmed R. Hasan
A1 Xin Huang
A1 Leonardo Nicola Martin Iasi
A1 Ekaterina Noskova
A1 Jana Obšteter
A1 Vitor Antonio Corrêa Pavinato
A1 Alice Pearson
A1 David Peede
A1 Manolo F. Perez
A1 Murillo F. Rodrigues
A1 Chris C. R. Smith
A1 Jeffrey P. Spence
A1 Anastasia Teterina
A1 Silas Tittes
A1 Per Unneberg
A1 Juan Manuel Vazquez
A1 Ryan K. Waples
A1 Anthony Wilder Wohns
A1 Yan Wong
A1 Franz Baumdicker
A1 Reed A. Cartwright
A1 Gregor Gorjanc
A1 Ryan N. Gutenkunst
A1 Jerome Kelleher
A1 Andrew D. Kern
A1 Aaron P. Ragsdale
A1 Peter L. Ralph
A1 Daniel R. Schrider
A1 Ilan Gronau
YR 2022
UL http://biorxiv.org/content/early/2022/10/31/2022.10.29.514266.abstract
AB Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic data sets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and to the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question.
The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than three-fold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone. Competing Interest Statement: The authors have declared no competing interest.