New Results
Building pangenome graphs
Erik Garrison, Andrea Guarracino, Simon Heumos, Flavia Villani, Zhigui Bao, Lorenzo Tattini, Jörg Hagmann, Sebastian Vorbrugg, Santiago Marco-Sola, Christian Kubica, David G. Ashbrook, Kaisa Thorell, Rachel L. Rusholme-Pilcher, Gianni Liti, Emilio Rudbeck, Sven Nahnsen, Zuyu Yang, Mwaniki N. Moses, Franklin L. Nobrega, Yi Wu, Hao Chen, Joep de Ligt, Peter H. Sudmant, Nicole Soranzo, Vincenza Colonna, Robert W. Williams, Pjotr Prins
doi: https://doi.org/10.1101/2023.04.05.535718
Erik Garrison
1Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S Manassas St, Memphis, 38163, Tennessee, USA
Andrea Guarracino
1Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S Manassas St, Memphis, 38163, Tennessee, USA
2Fondazione Human Technopole, Viale Rita Levi Montalcini, 20157 Milan, Italy
Simon Heumos
3Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany
4Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany
Flavia Villani
1Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S Manassas St, Memphis, 38163, Tennessee, USA
Zhigui Bao
5Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road 97, Shenzhen, 518120, Guangdong, China
Lorenzo Tattini
6Institute for Research on Cancer and Aging, Nice (IRCAN), Nice, France
7CNRS UMR 7284, INSERM U 1081, Université Côte d’Azur (UCA), Nice, France
Jörg Hagmann
8Computomics GmbH, Eisenbahnstr. 1, 72072 Tübingen, Baden-Württemberg, Germany
Sebastian Vorbrugg
9Department of Molecular Biology, Max Planck Institute for Biology, Max-Planck-Ring 9, 72076 Tübingen, Baden-Wuerttemberg, Germany
Santiago Marco-Sola
10Computer Sciences Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
11Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autonoma de Barcelona, Barcelona 08193, Spain
Christian Kubica
9Department of Molecular Biology, Max Planck Institute for Biology, Max-Planck-Ring 9, 72076 Tübingen, Baden-Wuerttemberg, Germany
David G. Ashbrook
1Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S Manassas St, Memphis, 38163, Tennessee, USA
Kaisa Thorell
12Chemistry and Molecular Biology, Faculty of Science, University of Gothenburg, Sweden
Rachel L. Rusholme-Pilcher
13Earlham Institute, Norwich Research Park, Colney Lane, Norwich, Norfolk, NR4 7UZ, UK
Gianni Liti
6Institute for Research on Cancer and Aging, Nice (IRCAN), Nice, France
7CNRS UMR 7284, INSERM U 1081, Université Côte d’Azur (UCA), Nice, France
Emilio Rudbeck
14Clinical Genomics Gothenburg, Bioinformatics and Data Centre, University of Gothenburg, Sweden
Sven Nahnsen
3Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany
4Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany
Zuyu Yang
15The Institute of Environmental Science and Research, New Zealand
Mwaniki N. Moses
16Department of Computer Science, University of Pisa
Franklin L. Nobrega
17School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
Yi Wu
17School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
Hao Chen
18Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science Center, Memphis, TN
Joep de Ligt
15The Institute of Environmental Science and Research, New Zealand
Peter H. Sudmant
19Department of Integrative Biology, University of California Berkeley, Berkeley, CA
Nicole Soranzo
20Wellcome Sanger Institute, Genome Campus, Hinxton CB10 1SA, UK
21National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge, UK
22Department of Haematology, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
23British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
2Fondazione Human Technopole, Viale Rita Levi Montalcini, 20157 Milan, Italy
Vincenza Colonna
1Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S Manassas St, Memphis, 38163, Tennessee, USA
24Institute of Genetics and Biophysics, National Research Council, Naples 80111, Italy
Robert W. Williams
1Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S Manassas St, Memphis, 38163, Tennessee, USA
Pjotr Prins
1Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S Manassas St, Memphis, 38163, Tennessee, USA
Abstract
Pangenome graphs can represent all variation between multiple genomes, but existing methods for constructing them are biased due to reference-guided approaches. In response, we have developed PanGenome Graph Builder (PGGB), a reference-free pipeline for constructing unbiased pangenome graphs. PGGB uses all-to-all whole-genome alignments and learned graph embeddings to build and iteratively refine a model in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.
Competing Interest Statement
Author J.H. is employed by Computomics GmbH.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted April 06, 2023.
bioRxiv 2023.04.05.535718; doi: https://doi.org/10.1101/2023.04.05.535718