RT Journal Article
SR Electronic
T1 ALaSCA: A novel in silico simulation platform to untangle biological pathway mechanisms, with a case study in Type 1 Diabetes progression
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2023.03.16.532913
DO 10.1101/2023.03.16.532913
A1 Louw, Carla
A1 Truter, Nina
A1 Bergh, Wikus
A1 van den Heever, Martine
A1 Horn, Shade
A1 Oudrhiri, Radouane
A1 van Niekerk, Dawie
A1 Loos, Ben
A1 Singh, Raminderpal
YR 2023
UL http://biorxiv.org/content/early/2023/03/20/2023.03.16.532913.abstract
AB Introduction The analysis of signaling pathways is a cornerstone in clarifying the biological mechanisms involved in complex genetic disorders. These pathways have intricate topologies, and the existing methods used to interpret them remain limited. We have therefore developed the Adaptable Large-Scale Causal Analysis (ALaSCA) computational platform, which uses causal analysis and counterfactual simulation techniques. ALaSCA can simulate the outcomes of multiple hypotheses to gain insight into the complex dynamics of biological mechanisms prior to, or even without, wet lab experimentation. ALaSCA is offered as a proprietary Python library for bioinformaticians and data scientists to use in their life sciences workflows. Here we demonstrate the ability of ALaSCA to untangle the pivots and redundancies within biological pathways of various drivers of a specific phenotypic process. This is achieved by studying a major disease of global relevance, namely Type 1 Diabetes (T1D), and quantifying causal relationships between antioxidant proteins and T1D progression. ALaSCA is also benchmarked against standard associative analysis methods.
Methods We use ALaSCA, our in silico simulation platform, to apply several machine learning (ML) and data imputation techniques and to perform causal inference and counterfactual simulation.
ALaSCA uses standard ML and causal analysis libraries as well as custom code developed for data imputation and counterfactual simulation. Counterfactual simulation is a method for simulating potential or hypothetical model outcomes in the field of causal analysis (Glymour, Pearl and Jewell, 2016). We apply ALaSCA to T1D using proteomic data from Liu et al. (2018), because the patients were selected based on the presence of T1D-susceptible HLA (human leukocyte antigen)-DR/DQ alleles through genotyping at birth and were followed prospectively. The genetic cause of T1D in this cohort is therefore known, and the mechanism and proteins through which it causes T1D are well characterized. This biological mechanism was converted into a directed acyclic graph (DAG) for the subsequent causal analyses. The dataset was used to benchmark the causal inference and counterfactual simulation capabilities of ALaSCA.
Results and discussion After data imputation of the Liu et al. (2018) dataset, causal inference and counterfactual simulation were completed. The causal inference output for the HLA, antioxidant, and non-causal proteins showed that the HLA proteins had the strongest overall causal effects on T1D, with antioxidant proteins having the second largest. The non-causal proteins showed negligibly small effects on T1D in comparison with the HLA and antioxidant proteins. With counterfactual simulation we were able to replicate evidence for, and gain insight into, the protective effect that antioxidant proteins, specifically Superoxide dismutase 1 (SOD1), have in T1D, a trend seen in the literature.
We were also able to replicate an unusual case from the literature in which antioxidant proteins, specifically Catalase, do not have a protective effect on T1D.
Conclusion By analyzing the disease mechanism using the inferred causal effects and counterfactual simulation, we identified the upstream HLA proteins, specifically the DR alpha chain and DR beta 4 chain proteins, as causes of the protective effect of the antioxidant proteins on T1D. In contrast, through counterfactual simulation of the unusual case, in which the DR alpha chain and DR beta 4 chain proteins are not present in the model, we saw that the adverse effect which the antioxidant proteins have on T1D is due to the HLA protein DQ beta 1 chain and not to the antioxidant proteins themselves. Future work would entail applying the ALaSCA platform to various other diseases and integrating it into wet lab experimental design in a number of different biological study areas and topics.
Competing Interest Statement Carla Louw, Nina Truter, Wikus Bergh, Martine van den Heever, Shade Horn, and Raminderpal Singh are employees of Incubate bio BV.