bioRxiv
New Results

Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL)

Michael C. Schatz, Anthony A. Philippakis, Enis Afgan, Eric Banks, Vincent J. Carey, Robert J. Carroll, Alessandro Culotti, Kyle Ellrott, Jeremy Goecks, Robert L. Grossman, Ira M. Hall, Kasper D. Hansen, Jonathan Lawson, Jeffrey T. Leek, Anne O’Donnell Luria, Stephen Mosher, Martin Morgan, Anton Nekrutenko, Brian D. O’Connor, Kevin Osborn, Benedict Paten, Candace Patterson, Frederick J. Tan, Casey Overby Taylor, Jennifer Vessio, Levi Waldron, Ting Wang, Kristin Wuichet, AnVIL Team
doi: https://doi.org/10.1101/2021.04.22.436044
Michael C. Schatz
1 Department of Biology, Johns Hopkins University, Baltimore, MD
2 Department of Computer Science, Johns Hopkins University, Baltimore, MD
  • For correspondence: mschatz@cs.jhu.edu, aphilipp@broadinstitute.org
Anthony A. Philippakis
3 Broad Institute of MIT and Harvard, Cambridge, MA
  • For correspondence: mschatz@cs.jhu.edu, aphilipp@broadinstitute.org
Enis Afgan
1 Department of Biology, Johns Hopkins University, Baltimore, MD
Eric Banks
3 Broad Institute of MIT and Harvard, Cambridge, MA
Vincent J. Carey
4 Harvard Medical School, Harvard University, Cambridge, MA
Robert J. Carroll
5 Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
Alessandro Culotti
6 Center for Translational Data Science, University of Chicago, Chicago, IL
Kyle Ellrott
7 Biomedical Engineering, Oregon Health & Science University, Portland, OR
Jeremy Goecks
7 Biomedical Engineering, Oregon Health & Science University, Portland, OR
Robert L. Grossman
6 Center for Translational Data Science, University of Chicago, Chicago, IL
Ira M. Hall
8 Yale School of Medicine, Yale University, New Haven, CT
Kasper D. Hansen
9 Department of Biostatistics, Johns Hopkins University, Baltimore, MD
Jonathan Lawson
3 Broad Institute of MIT and Harvard, Cambridge, MA
Jeffrey T. Leek
9 Department of Biostatistics, Johns Hopkins University, Baltimore, MD
Anne O’Donnell Luria
3 Broad Institute of MIT and Harvard, Cambridge, MA
Stephen Mosher
1 Department of Biology, Johns Hopkins University, Baltimore, MD
Martin Morgan
10 Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY
Anton Nekrutenko
11 Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA
Brian D. O’Connor
3 Broad Institute of MIT and Harvard, Cambridge, MA
Kevin Osborn
12 UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA
Benedict Paten
12 UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA
Candace Patterson
3 Broad Institute of MIT and Harvard, Cambridge, MA
Frederick J. Tan
13 Department of Embryology, Carnegie Institution, Baltimore, MD
Casey Overby Taylor
14 Departments of Medicine and Biomedical Engineering, Johns Hopkins University, Baltimore, MD
Jennifer Vessio
1 Department of Biology, Johns Hopkins University, Baltimore, MD
Levi Waldron
15 Department of Epidemiology and Biostatistics, City University of New York Graduate School of Public Health and Health Policy, New York, NY
Ting Wang
16 Department of Genetics, Washington University in St. Louis, St. Louis, MO
Kristin Wuichet
5 Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
AnVIL Team

Abstract

The traditional model of genomic data analysis - downloading data from centralized warehouses for analysis with local computing resources - is increasingly unsustainable. Not only are transfers slow and cost-prohibitive, but this approach also leads to redundant and siloed compute infrastructure that makes it difficult to ensure the security and compliance of protected data. The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) inverts this model, providing a unified cloud computing environment for data storage, management, and analysis. AnVIL eliminates the need for data movement, allows for active threat detection and monitoring, and provides scalable, shared computing resources that researchers can acquire as needed. This presents many new opportunities for collaboration and data sharing that will ultimately lead to scientific discoveries at scales not previously possible.

Competing Interest Statement

A. Philippakis is a Venture Partner at GV and has received funding from Intel, IBM, Microsoft, Alphabet, and Bayer. D. Baker, E. Afgan, J. Goecks, J. Chilton, and A. Nekrutenko are founders of and hold equity in GalaxyWorks, LLC. The results of the study discussed in this publication could affect the value of GalaxyWorks, LLC. These arrangements have been reviewed and approved by the Johns Hopkins University, Oregon Health & Science University, and The Pennsylvania State University in accordance with their respective conflict of interest policies. V. Carey has a financial interest in Amazon, NVIDIA, and AMD.

Footnotes

  • ** AnVIL Team members and affiliations are listed in Table 1

  • http://anvilproject.org

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted April 23, 2021.
bioRxiv 2021.04.22.436044; doi: https://doi.org/10.1101/2021.04.22.436044

Subject Area

  • Bioinformatics