Abstract
Background With the advent of precision medicine, pharmacogenomics data are becoming increasingly critical to patient care. These data describe the relationship between a particular variant in the genome and a patient's response to a drug. As the use of such data becomes more integral to medical treatment decisions, appropriate storage and sharing will be essential. One potential way to store and share pharmacogenomics data securely is a smart contract on the Ethereum blockchain, an open-source blockchain platform for decentralized applications. Ethereum is a transaction-based state machine whose “world” maintains user accounts and storage in a network state. Immutable pieces of code called “smart contracts” may be deployed to the Ethereum network and run on the Ethereum Virtual Machine when called by a user or another contract. The 2019 iDASH (Integrating Data for Analysis, Anonymization, and Sharing) competition for Secure Genome Analysis challenged participants to develop time- and space-efficient smart contracts to log and query gene-drug relationship data on the Ethereum blockchain.
Methods We designed a smart contract to store and query pharmacogenomics data (gene-drug interaction data) in Ethereum using an index-based, multi-mapping approach that allows time- and space-efficient storage and querying. Our solution to the iDASH competition ranked in the top three at a workshop held in Bloomington, IN in October 2019. Although our solution performed well in the challenge, we wanted to improve its scalability and query efficiency. To that end, we developed an alternate “fastQuery” solution that stores pooled rather than raw data, allowing for significantly improved query time for 0-AND queries, and constant query time for 1- and 2-AND queries.
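The idea of storing pooled rather than raw data can be illustrated with a minimal off-chain sketch. This is not the authors' smart contract: the class `PooledStore`, its methods, and the field names (`gene`, `variant`, `drug`, `outcome`) are hypothetical and assumed only for illustration. It shows how keeping running aggregates under a combined key lets a fully specified query be answered with a single lookup, regardless of how many observations were inserted.

```python
from collections import defaultdict


class PooledStore:
    """Hypothetical off-chain sketch of pooled storage (not the authors' contract)."""

    def __init__(self):
        # Maps an assumed (gene, variant, drug) key to running counts per outcome.
        self.pooled = defaultdict(lambda: defaultdict(int))

    def insert(self, gene, variant, drug, outcome):
        # Update the aggregate in place instead of appending a raw observation.
        self.pooled[(gene, variant, drug)][outcome] += 1

    def query(self, gene, variant, drug):
        # A fully specified key is one dictionary lookup, so the cost does not
        # grow with the number of observations inserted.
        return dict(self.pooled.get((gene, variant, drug), {}))
```

In a store that kept raw observations instead, the same query would have to scan every matching entry, which is the kind of linear cost that pooling is meant to avoid.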
Results We tested the performance of both of our solutions in Truffle (v5.0.31) using datasets ranging from 100 to 1000 entries, inserting data in batches of 25, 50, 100, and 200 observations. On a private, proof-of-authority test network, our challenge solution requires approximately 70 seconds, 500 MB of memory, and 80 MB of disk space to insert 1000 entries (200 at a time), and 400 ms and 5 MB of memory to run a 2-AND query against 1000 entries. This solution exhibits constant memory usage for insertion and querying, and linear query time. Our alternate fastQuery solution requires approximately 60 seconds, 500 MB of memory, and 80 MB of disk space to insert 1000 entries (200 at a time), and 83 ms and 5 MB of memory to run a 2-AND query against 1000 entries. This solution exhibits constant memory usage for insertion and querying, linear query time for 0-AND queries, and constant query time for 1- and 2-AND queries in a database of up to 1000 entries.
Conclusion In this study, we showed that pharmacogenomics data can be stored and queried efficiently on the Ethereum blockchain. Our approach has the potential to be useful for a wide range of datasets in biomedical research: while we focused on gene-drug interaction data, our solution designs could be used to store a range of clinical trial data. Moreover, our solutions could be adapted to store and query data in any field where high-integrity data storage and efficient access are required.