RT Journal Article
SR Electronic
T1 Assessing key decisions for transcriptomic data integration in biochemical networks
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 301945
DO 10.1101/301945
A1 Anne Richelle
A1 Chintan Joshi
A1 Nathan E. Lewis
YR 2018
UL http://biorxiv.org/content/early/2018/04/16/301945.abstract
AB Genome-scale models of metabolism (GEMs) describe all metabolic reactions that may occur organism-wide. It is known that each tissue exhibits differential gene expression patterns and enzymatic activities. Therefore, transcriptomic data are commonly used to tailor GEMs and capture tissue-specific behavior. However, since measured gene expression levels span several orders of magnitude, and many reactions in GEMs involve multiple genes, decisions must be made on how to overlay the data onto the network. Referred to here as “preprocessing”, as they address the steps prior to context-specific model construction, these decisions include how to map gene expression levels to the gene-protein-reaction rules (i.e. gene mapping), the selection of thresholds on expression data to consider the associated gene as “active” (i.e. thresholding), and the order in which gene mapping and thresholding are imposed. Each of these decisions could impact the resulting expression values associated with each reaction, and therefore model construction and biological interpretation. However, the influence of these decisions has not been systematically tested, nor is it clear which combination of preprocessing decisions will capture the most appropriate biological description of the available data. To this end, we compared 20 different combinations of existing preprocessing decisions, each of which was imposed on a transcriptomic dataset across 32 tissues. Our analysis suggested that the thresholding approach has the greatest influence on the definition of which reactions may be considered active.
Finally, we compared tissue-specific active reaction lists based on their capacity to recapitulate groups of tissues at the organ-system level, and thereby identified optimal preprocessing decisions. These results provide guidelines that will facilitate the construction of more accurate context-specific metabolic models and analyses with biochemical networks.
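The three preprocessing decisions named in the abstract (gene mapping, thresholding, and their order) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the min-for-AND / max-for-OR mapping convention, the nested-tuple rule encoding, the gene names, the expression values, and both threshold choices are hypothetical.

```python
# Hypothetical sketch: gene mapping and thresholding applied in both orders.
# The min/max convention and all numbers are illustrative assumptions.

def map_gpr(rule, values):
    """Evaluate a gene-protein-reaction rule over per-gene values.

    A rule is a gene name or a tuple ("and" | "or", sub_rule, ...).
    AND (enzyme complex) takes the minimum, OR (isozymes) the maximum.
    Works on numbers and on booleans (False < True).
    """
    if isinstance(rule, str):
        return values[rule]
    op, *operands = rule
    scores = [map_gpr(r, values) for r in operands]
    if op == "and":
        return min(scores)
    if op == "or":
        return max(scores)
    raise ValueError(f"unknown operator: {op!r}")

# Hypothetical reaction catalysed by (g1 AND g2) OR g3.
rule = ("or", ("and", "g1", "g2"), "g3")
expression = {"g1": 50.0, "g2": 4.0, "g3": 12.0}

# Order 1: map expression onto the reaction, then apply one global threshold.
GLOBAL_THRESHOLD = 15.0
reaction_level = map_gpr(rule, expression)
active_map_first = reaction_level >= GLOBAL_THRESHOLD

# Order 2: threshold each gene with its own cutoff, then map the booleans.
gene_thresholds = {"g1": 10.0, "g2": 2.0, "g3": 20.0}
gene_active = {g: expression[g] >= gene_thresholds[g] for g in expression}
active_threshold_first = map_gpr(rule, gene_active)

print(reaction_level, active_map_first, active_threshold_first)
```

Under this min/max convention, a single shared threshold gives the same answer in either order, because thresholding is monotone. The two orders diverge in the sketch only because the second uses gene-specific thresholds.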