New Results
Rapid and efficient analysis of 20,000 RNA-seq samples with Toil
John Vivian, Arjun Rao, Frank Austin Nothaft, Christopher Ketchum, Joel Armstrong, Adam Novak, Jacob Pfeil, Jake Narkizian, Alden D. Deran, Audrey Musselman-Brown, Hannes Schmidt, Peter Amstutz, Brian Craft, Mary Goldman, Kate Rosenbloom, Melissa Cline, Brian O’Connor, Megan Hanna, Chet Birger, W. James Kent, David A. Patterson, Anthony D. Joseph, Jingchun Zhu, Sasha Zaranek, Gad Getz, David Haussler, Benedict Paten
doi: https://doi.org/10.1101/062497
John Vivian
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Arjun Rao
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Frank Austin Nothaft
2AMP Lab, University of California Berkeley, Berkeley, CA, USA.
3UC Berkeley ASPIRE Lab, Berkeley, CA, USA.
Christopher Ketchum
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Joel Armstrong
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Adam Novak
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Jacob Pfeil
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Jake Narkizian
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Alden D. Deran
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Audrey Musselman-Brown
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Hannes Schmidt
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Peter Amstutz
4Curoverse, Somerville, MA, USA.
Brian Craft
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Mary Goldman
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Kate Rosenbloom
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Melissa Cline
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Brian O’Connor
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Megan Hanna
5Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA.
Chet Birger
5Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA.
W. James Kent
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
David A. Patterson
2AMP Lab, University of California Berkeley, Berkeley, CA, USA.
3UC Berkeley ASPIRE Lab, Berkeley, CA, USA.
Anthony D. Joseph
2AMP Lab, University of California Berkeley, Berkeley, CA, USA.
3UC Berkeley ASPIRE Lab, Berkeley, CA, USA.
Jingchun Zhu
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Sasha Zaranek
4Curoverse, Somerville, MA, USA.
Gad Getz
5Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA.
David Haussler
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
Benedict Paten
1Computational Genomics Lab, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
ABSTRACT
Toil is portable, open-source workflow software that supports contemporary workflow definition languages and can be used to run scientific workflows securely, reproducibly, and efficiently at large scale. To demonstrate Toil, we processed over 20,000 RNA-seq samples to create a consistent meta-analysis of five datasets, free of computational batch effects, which we make freely available. Nearly all the samples were analysed in under four days using a commercial cloud cluster of 32,000 preemptable cores.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted July 07, 2016.
bioRxiv 062497; doi: https://doi.org/10.1101/062497