Skip to main content
bioRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search
New Results

EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits

Matt Carland, Madhuchanda Bose, Biljana Novković, Haley Pedersen, Charles Manson, Shany Lahan, Alex Pavlenko, Puya G. Yazdi, View ORCID ProfileManfred G. Grabherr
doi: https://doi.org/10.1101/2021.10.18.464896
Matt Carland
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Madhuchanda Bose
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Biljana Novković
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Haley Pedersen
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Charles Manson
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Shany Lahan
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Alex Pavlenko
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Puya G. Yazdi
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Manfred G. Grabherr
1SelfDecode.com, 1031 Ives Dairy Road Suite 228 - 1047, Miami FL 33179, United States
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Manfred G. Grabherr
  • For correspondence: manfred.grabherr@imbim.uu.se
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Preview PDF
Loading

ABSTRACT

The vast majority of human traits, including many disease phenotypes, are affected by alleles at numerous genomic loci. With a continually increasing set of variants with published clinical disease or biomarker associations, an easy-to-use tool for non-programmers to rapidly screen VCF files for risk alleles is needed. We have developed EZTraits as a tool to quickly evaluate genotype data (e.g., from microarrays) against a set of rules defined by the user. These rules can be defined directly in the scripting language Lua, for genotype calls using variant ID (RS number) or chromosomal position. Alternatively, EZTraits can parse simple and intuitive text including concepts like ‘any’ or ‘all’. Thus, EZTraits is designed to support rapid genetic analysis and hypothesis-testing by researchers, regardless of programming experience or technical background. The software is implemented in C++ and compiles and runs on Linux and MacOS. The source code is available under the MIT license from https://github.com/selfdecode/rd-eztraits

Contact: manfred{at}selfdecode.com

INTRODUCTION

Many common health disorders are highly polygenic. This means that calculating an individual’s aggregate genetic risk is often unfeasible without the use of complex polygenic risk scores (PRS) [1]. However, there are a number of health-relevant traits for which smaller subsets of variants can account for disproportionately large amounts of phenotypic variance. These mono- or oligogenic traits are therefore amenable to simpler analytical approaches.

One illustrative example is the APOE gene, in which a two-SNP haplotype may modulate an individual’s risk of late-onset Alzheimer’s disease by approximately 15x [2]. Another example is the ability to digest lactose into adulthood, which can be fully predicted on the basis of just six SNPs in the MCM6 gene, among which a single heterozygous- or homozygous-derived genotype implies lactose tolerance [3]. Similarly, dietary tolerance to fructose can be predicted by the presence of a few different combinations of homozygous mutations in the ALDOB gene [4].

Furthermore, small numbers of variants may also be useful for characterizing individual variability within specific biological pathways. One example is the COMT gene, in which various four-SNP haplotypes have been associated with significant differences in the biological activity of the gene’s product enzyme [5, 6]. Even in the absence of a direct link to a clinical phenotype, such genetic markers may serve as a useful “jumping-off” point for further investigations into the etiological structure of clinically relevant phenotypes.

Here, we present EZTraits, a tool specifically designed for non-programmers. EZTraits is intended to assist with searching VCF files for the presence of mono- or oligogenic traits and return their trait associations to the user, based either on our library of variant-trait associations or new, user-added conditions and associations. Thus, EZTraits allows genomic researchers to quickly and easily analyze a wide variety of phenotypes of clinical and scientific interest, regardless of their level of programming ability.

METHODS

EZTraits evaluates variant combinations by internally building and interpreting Lua scripts. The Lua programming language [7] was designed with ease-of-use in mind and has been widely adopted by non-programmers for computer game modding and writing plug-ins, making it a natural choice for use by researchers both with or without a coding background.

There are two ways for users to build analyses with EZTraits: (a) by writing or modifying a Lua “snippet,” which contains pre-made variables for supplying key genotype and phenotype information; or (b) by writing a plaintext rule set that provides genotype and phenotype information by using simple concepts such as all and any, which allows for a more intuitive and compact representation. This conversion feature allows users to easily write in rsID-trait associations to use with EZTraits without any “coding” at all.

Using Scripts

Users can write Lua snippets directly by providing the appropriate genotype and phenotype information. For genotypes, SNPs can be referenced using either their rsID or chromosomal position (following the syntax ‘chr1:6658743’). Phenotype information is entered by modifying two return variables: the floating-point variable ‘risk’; and the string ‘comment’ — both of which can be manipulated directly in the Lua script snippet. These two variables allow the user to flexibly provide either quantitative or qualitative phenotype data (or both), depending on the trait being analyzed.

For example, the snippet:

Figure
  • Download figure
  • Open in new tab

is completed into a valid Lua function by adding variables that correspond to the RS identifiers or chromosomal positions. These variables are automatically initialized from a VCF or TSV file, and together with a small amount of bracketing code, the complete function is:

Figure
  • Download figure
  • Open in new tab

This function is then called directly from C++ by EZTraits, and the results are presented to the user.

Structured text entry

In addition, EZTraits can automatically convert text files into Lua by applying some simple-yet-intuitive concepts, such as any and all, i.e., any of the following conditions satisfy a trait, or all in combination do. This text ruleset is then automatically converted into fully-functional Lua code via the tool Txt2Lua.

For example, the rules to define fructose tolerance/intolerance using three common causal SNPs can be written as:

Figure
  • Download figure
  • Open in new tab

EZTraits accepts and interprets the keywords ‘All’, ‘Any’, and ‘else’, optionally followed by a colon. Acceptable genotype call formats include ‘CG’ and C/G (with optional single quotation marks), where the latter convention has to be used for sites that contain indels, e.g., ‘T/TGAT’.

The above text thus translates into the Lua snippet:

Figure
  • Download figure
  • Open in new tab
Figure
  • Download figure
  • Open in new tab

RESULTS

EZTraits is a command-line tool that compiles and runs on Linux and Mac operating systems. Inputs are VCF or space/tab-delimited TSV files. The Lua interpreter, version 5.4.2, is embedded so that EZTraits has no external dependencies. EZTraits has minimal requirements in terms of RAM, using less than 5KB on average. It takes about 2.4 minutes to parse a whole-genome VCF file from the 1000 Genomes Project [8].

Usage

EZTraits has two input parameters: (a) the VCF or TSV file; and (b) the Lua snippet. The usage for processing a VCF and TSV file is:

  • ./EZTraits -i data/sample.vcf -lua scripts/test.lua

  • ./EZTraitsCSV -i data/sample.csv -lua scripts/test.lua

The output is written to the console. To convert structured text, run e.g.:

  • ./Txt2Lua -i scripts/fructose.txt > fruct_test.lua

Funding

Financial support for this work was supplied by SelfDecode’s research and development budget.

Conflicts of interest

All authors are either employed by and/or hold stock or stock options in SelfDecode. In addition, PGY has equity in Systomic Health LLC and Ethobiotics LLC. There are no other relevant activities or financial relationships which have influenced this work.

REFERENCES

  1. 1.↵
    Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med 12, 44 (2020).
    OpenUrl
  2. 2.↵
    Husain MA et al. APOE and Alzheimer’s Disease: From Lipid Transport to Physiopathology and Therapeutics. Front Neurosci. 2021;15:630502. Published 2021 Feb 17.
    OpenUrl
  3. 3.↵
    Anguita-Ruiz A et al. Genetics of Lactose Intolerance: An Updated Review and Online Interactive World Maps of Phenotype and Genotype Frequencies. Nutrients 2020, 12(9), 2689;
    OpenUrl
  4. 4.↵
    Coffee EM et al. Increased prevalence of mutant null alleles that cause hereditary fructose intolerance in the American population. J Inherit Metab Dis. 2010 Feb; 33(1): 33–42.
    OpenUrlCrossRefPubMed
  5. 5.↵
    Nackley AG et al. Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science. 2006 Dec 22;314(5807):1930–3.
    OpenUrlAbstract/FREE Full Text
  6. 6.↵
    Nackley AG et al. Low enzymatic activity haplotypes of the human catechol-O-methyltransferase gene: enrichment for marker SNPs. PLoS One. 2009;4(4):e5237.
    OpenUrlCrossRefPubMed
  7. 7.↵
    Ierusalimschy R et al. (June 1996). Lua—An Extensible Extension Language. Software: Practice and Experience. 26 (6): 635–652.
    OpenUrl
  8. 8.↵
    Clarke L et al. A worldwide collection of genome variation incorporating the 1000 Genomes Project data. Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D854–D859
    OpenUrlCrossRefPubMed
Back to top
PreviousNext
Posted October 19, 2021.
Download PDF
Email

Thank you for your interest in spreading the word about bioRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits
(Your Name) has forwarded a page to you from bioRxiv
(Your Name) thought you would like to see this page from the bioRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits
Matt Carland, Madhuchanda Bose, Biljana Novković, Haley Pedersen, Charles Manson, Shany Lahan, Alex Pavlenko, Puya G. Yazdi, Manfred G. Grabherr
bioRxiv 2021.10.18.464896; doi: https://doi.org/10.1101/2021.10.18.464896
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
EZTraits: a Programmable Tool to Evaluate Multi-site Deterministic Traits
Matt Carland, Madhuchanda Bose, Biljana Novković, Haley Pedersen, Charles Manson, Shany Lahan, Alex Pavlenko, Puya G. Yazdi, Manfred G. Grabherr
bioRxiv 2021.10.18.464896; doi: https://doi.org/10.1101/2021.10.18.464896

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Bioinformatics
Subject Areas
All Articles
  • Animal Behavior and Cognition (3505)
  • Biochemistry (7346)
  • Bioengineering (5323)
  • Bioinformatics (20260)
  • Biophysics (10016)
  • Cancer Biology (7743)
  • Cell Biology (11300)
  • Clinical Trials (138)
  • Developmental Biology (6437)
  • Ecology (9951)
  • Epidemiology (2065)
  • Evolutionary Biology (13321)
  • Genetics (9361)
  • Genomics (12583)
  • Immunology (7701)
  • Microbiology (19021)
  • Molecular Biology (7441)
  • Neuroscience (41036)
  • Paleontology (300)
  • Pathology (1229)
  • Pharmacology and Toxicology (2137)
  • Physiology (3160)
  • Plant Biology (6860)
  • Scientific Communication and Education (1272)
  • Synthetic Biology (1896)
  • Systems Biology (5311)
  • Zoology (1089)