TY - JOUR
T1 - Using balances to engineer features for the classification of health biomarkers: a new approach to balance selection
JF - bioRxiv
DO - 10.1101/600122
SP - 600122
AU - Thomas P. Quinn
AU - Ionas Erb
Y1 - 2019/01/01
UR - http://biorxiv.org/content/early/2019/04/09/600122.abstract
N2 - Since the turn of the century, technological advances have made it possible to obtain a molecular profile of any tissue in a cost-effective manner. Among these advances are sophisticated high-throughput assays that measure the relative abundance of microorganisms, RNA molecules, and metabolites. While these data are most often collected to gain new insights into biological systems, they can also be used as biomarkers to create clinically useful diagnostic classifiers. How best to classify high-dimensional “-omics” data remains an area of active research. However, few methods explicitly model the relative nature of these data; most instead rely on cumbersome normalizations that often invoke untestable assumptions. This report (a) emphasizes the relative nature of health biomarkers, (b) discusses the literature surrounding the classification of relative data, and (c) benchmarks how different transformations perform across multiple biomarker types. In doing so, this report explores how balances can be used to engineer features prior to classification, and proposes a simple procedure, called discriminative balance analysis, to select discriminative 2- and 3-part balances.
ER -