TY  - JOUR
T1  - DAGBagM: Learning directed acyclic graphs of mixed variables with an application to identify prognostic protein biomarkers in ovarian cancer
JF  - bioRxiv
DO  - 10.1101/2020.10.26.349076
SP  - 2020.10.26.349076
AU  - Shrabanti Chowdhury
AU  - Ru Wang
AU  - Qing Yu
AU  - Catherine J. Huntoon
AU  - Larry M. Karnitz
AU  - Scott H. Kaufmann
AU  - Steven P. Gygi
AU  - Michael J. Birrer
AU  - Amanda G. Paulovich
AU  - Jie Peng
AU  - Pei Wang
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/10/27/2020.10.26.349076.abstract
N2  - Motivation: Directed gene/protein regulatory networks inferred by applying directed acyclic graph (DAG) models to proteogenomic data have been shown to be effective for detecting causal biomarkers of clinical outcomes. However, there remain unsolved challenges in DAG learning when jointly modeling clinical outcome variables, which often take binary values, and biomarker measurements, which are usually continuous variables. Therefore, in this paper, we propose a new tool, DAGBagM, to learn DAGs with both continuous and binary nodes. By using appropriate models for continuous and binary variables, DAGBagM allows either type of node to be a parent or a child node in the learned graph. DAGBagM also employs a bootstrap aggregating strategy to reduce false positives and achieve better estimation accuracy. Moreover, the aggregation procedure provides a flexible framework to robustly incorporate prior information on edges for DAG reconstruction. Results: As shown by simulation studies, DAGBagM performs better in identifying edges between continuous and binary nodes than the commonly used strategies of either treating binary variables as continuous or discretizing continuous variables. Moreover, DAGBagM outperforms several popular DAG structure learning algorithms, including the score-based hill climbing (HC) algorithm, the constraint-based PC-algorithm (PC-alg), and the hybrid method max-min hill climbing (MMHC), even when constructing DAGs with only continuous nodes. The HC implementation in the R package DAGBagM is much faster than that in the widely used DAG learning R package bnlearn. When applying DAGBagM to proteomics datasets from ovarian cancer studies, we identify potential prognostic protein biomarkers in ovarian cancer. Availability and implementation: DAGBagM is made available as a GitHub repository at https://github.com/jie108/dagbagM. Competing Interest Statement: The authors have declared no competing interest.
ER  - 