DRomics, a workflow to exploit dose-response 1 omics data in ecotoxicology 2

Omics technologies has opened new possibilities to assess environmental risks and to

considering the nature of the measured signal (e.g. counts of reads in RNAseq) and the way 23 data were collected (e.g. batch effect, situation with no experimental replicates). Another 24 important upgrade is the development of tools to ease the biological interpretation of results. 25 Various functions are proposed to visualize, summarize and compare the responses, for 26 different biological groups (defined from biological annotation), optionally at different 27 experimental levels (e.g. measurements at several omics level or in different experimental 28 conditions). A new shiny app named DRomicsInterpreter-shiny is dedicated to the biological 29 interpretation of results. The institutional web page https://lbbe.univ-lyon1.fr/fr/dromics 30 gathers links to all resources related to DRomics, including the two shiny applications.

Introduction 36
Dose-response (DR) modeling belongs to the toolkit of ecotoxicologists. The latter are used to this approach 37 when working on apical endpoints (e.g. reproduction, photosynthesis). The derived sensitivity thresholds, e.g. 38 Effective Concentrations (ECx), BenchMark Dose (BMD), are at the basis of regulatory risk assessment. 39 The recent years have seen the emergence of works using DR omics data (e.g. transcriptomic, proteomic, 40 metabolomic) in ecotoxicology (Zhang et al., 2018). Typical DR designs, with many doses (>6), ensure a good 41 description of the DR relationship and a robust and precise estimation of a sensitivity threshold (such as the 42 benchmark dose -BMD) that is useful to fix regulatory thresholds (Ewald et al., 2022). However, such designs 43 are less commonly used in omics studies, as tools classically used to analyse omics data are dedicated to 44 differential expression analysis between few conditions (limma, Ritchie et al., 2025, DESeq2, Love et al., 2014 EdgeR, Robinson et al., 2010). The analysis of such data typically starts with a pairwise differential analysis to 46 the control followed by an enrichment procedure to identify GO (Gene Ontology) terms or KEGG (Kyoto 47 Encyclopedia of Genes and Genomes) biological pathways of differentially expressed items (e.g. contigs, 48 proteins) ( organisms or communities (meta-omics), biological models commonly used in the field of ecotoxicology. The 94 main characteristics and functionalities of those five tools are summarized in Table 1.  DRomics is the only tool which is composed of both an R package and a free web application (see Table  101 1). We chose to develop it in the R language among others i) to facilitate the interoperability with the 102 Bioconductor packages that implement state-of-the-art bioinformatics methods and ii) to add a shiny 103 application (interactive web app. straight from R) that can be launched both from the package and from a 104 server freely accessible without registration. The companion DRomics-shiny application was thought for 105 users who do not want to work in the R environment, but also to help new users to take the package in 106 hand. For that purpose, the R code of the whole performed analysis is provided in the last page of the shiny 107 application. While developing DRomics, we were cautious to limit as much as possible the dependencies 108 to other statistical tools and to only depend on well-maintained R packages. Notice that the installation of 109 BMDx is currently impossible because of dependencies to non-maintained R packages. 110 DRomics was at the root designed to be able to analyze data from typical DR design, favoring the 111 number of doses over the number of replicates per dose, or even for datasets with no experimental 112 replicates. This situation of DR approach with no replicates is met in some field studies (one dose per 113 sample) and in some screening studies as illustrated in Rollin et al. (2023 also a characterization of the DR response in four classes (increasing, decreasing, U-shape, bell-shape) 147 which we thought may be of great interest in an AOP perspective, when the DR analysis not only aims at 148 defining BMD values but also at making sense of DR (multi-)omics data in environmental risk assessment. 149 New features in the DRomics DR modeling workflow 150 Figure 1 maps the DRomics workflow and the functionalities offered to explore the results. First 151 developed for microarray data, DRomics is now able to handle RNAseq data, metabolomic or other 152 continuous omic data (e.g. proteomics data) or even continuous non omic data (e.g. growth data) that 153 could be used for phenotypical anchoring (respectively imported using RNAseqdata(), 154 continuousomicdata() and continuousanchoringdata() functions). The same modeling workflow, choosing 155 the best fit models among our complete family of models (as described previously), was declined for each 156 type of data (e.g. intensities, counts) to ensure the comparability of results (e.g. transcriptomics vs. 157 metabolomics, transcriptomics vs. anchoring). Writing in the R language ensures the interoperability with 158 functions of the Bioconductor packages. Thus, functions of the DESeq2 and limma packages are internally 159 called within the package to normalize and/or transform omic data and to implement the trend tests for 160 selecting significantly responsive items. The call to additional R functions can be added for example to 161 correct omics data for a potential batch effect (see an example using ComBat-seq in the package vignette: 162 https://cran.r-project.org/web/packages/DRomics/vignettes/DRomics_vignette.html#batcheffect). 163 Various functions were also added to the package (highlighted in bold type in Figure 1) to respond to 164 user requests. Among them we can cite PCAdataplot() for a visualization of data after the importation step 165 and detection of potential outlier samples or batch effect, targetplot() to visualize the response of targeted 166 items whatever they are selected or not in the DRomics workflow, bmdboot() and bmdplot() to compute 167 and visualize confidence intervals on BMD values using bootstrap. 168 Moreover, we performed modifications in the modeling workflow to ensure a better robustness of 169 results on data with a low number of doses. For example, we changed the default information criterion 170 used for best model selection from the AIC to the AICc, as recommended by Burnham

New functions and the new shiny application to help biological interpretation 176
In toxicology, while working on sequenced and well-annotated organisms, items (e.g. genes) can be 177 functionally annotated prior to DR analysis. BMDExpress integrates this annotation step for 6 model species 178 on the basis of GO or reactome databases, and FastBMD for 14 model species on the basis of GeneID, GO, 179 Reactome or KEGG databases (see Table 1). Such an annotation of all the items whatever they respond or 180 not to the dose gradient exposition, enables the classical enrichment analysis (Wu et al., 2021). This 181 analysis consists in highlighting gene sets/pathways which are the most overrepresented among the 182 responding ones. 183 In ecotoxicology, one often works on non-model organisms or even on samples from environmental This implies the need for the user to retrieve and manage its own annotation, which is a challenging task, 186 especially for RNAseq experiments for which the number of measured contigs may be huge (several 187 millions of contigs). We thus considered that this annotation step could be done after the 188 selection/modeling workflow, to reduce the number of items to annotate, and so the difficulty of this task. 189 Due to the great diversity of annotation pipelines that can be developed for such non-model organisms, 190 we did not include an annotation step in DRomics. However, and to support the interpretation of the 191 workflow results in view of a biological annotation provided by the user, we recently developed new 192 functions and a second shiny application (named DRomicsInterpreter-shiny) (Figure 1). 193 After augmenting the DRomics output with information about the functional role of items, the graphical 194 representations offered by DRomics give a new insight into the results. Further, provided a common 195 annotation system is used at different biological scales under study, DRomics can be used to compare the 196 response at various levels. Figures 2 and 3 give some illustrations on an example with two molecular levels 197 from • the trends (increasing, decreasing, U-shape or bell-shape) of the DR curves (using the 202 trendplot() function, see a one-level example on Figure 1), 203 • the group/pathway-level sensitivity calculated as a quantile of BMD values of the group (using 204 the sensitivityplot() function, see an one-level example on Figure 1 and a multi-level example 205 on Figure 2), 206 • for selected groups/pathways, all the BMD values with their confidence intervals (using the 207 bmdplot() function, see a multilevel-level example on Figure 1, right part) 208 • for selected groups/pathways, all the BMD values with the corresponding DR curve signal 209 coded as a color gradient (using the bmdplotwithgradient() function, see a multi-level example 210 on Figure 3, left part). 211 • for selected groups/pathways, all the DR curves represented as curves (using the curvesplot() 212 function, see a multi-level example on Figure 3, right part), 213 Those functions can also be used to compare the response at one molecular level but measured under 214 different experimental conditions (different time points, different pre-exposure scenarios, in vitro/in vivo, 215 etc.). The selectgroups() function was also developed to help the user to focus its interpretation on the 216 most represented and/or the most sensitive biological groups (see an example of use in the package 217 vignette: https://cran.r-218 project.org/web/packages/DRomics/vignettes/DRomics_vignette.html#selectgroups). 219 220  we do not plan to implement model averaging in DRomics because the BMD estimation is not our only 236 purpose. We also want to characterize each response by its trend, which is itself dependent of the model 237 choice and not averageable. Instead, we plan to add an alternative to our current bootstrap procedure, 238 enabling the fit of a different model at each bootstrap iteration, to be able to pass the uncertainty due to 239 the model choice both on the BMD uncertainty and on the trend uncertainty. 240 Concerning the modeling workflow, so far, we added specific functionalities to analyse continuous 241 anchoring data, essentially to prevent comparisons of BMD values at different biological scales but with 242 BMD obtained from different analysis workflows, which could induce a bias. As anchoring data may be non-243 continuous, such as dichotomous survival data, or reproduction data reported as number of offspring per 244 individual-day (Delignette-Muller et al., 2014), we plan new developments in DRomics to be able to analyse 245 those data properly taking into account their nature. 246 Concerning our recent development of functions to help the biological interpretation of DRomics 247 results, we plan to enlarge the range of the methods proposed, by imaging new plots and summaries to 248 explore and characterize the responses and their diversity within a biological group/pathway. 249

Conclusion 250
The development of DRomics has been driven by the demands of ecotoxicologists to help make full 251 sense of their dose-response (multi-)omics studies. Hence, DRomics was augmented to provide a common 252 workflow to handle (meta-)transcriptomics, proteomics, metabolomics and/or anchoring data. This creates 253 the foundations for a proper comparison of responses at different omics levels (and anchoring endpoints) 254 and a mechanistic understanding in an AOP perspective. Along with functional annotations, DRomics 255 outputs (response trends, BMD, etc.) can now be processed using a series of graphical functions thought 256 to help their biological interpretation at the metabolic pathway level. The comparison is made easy (i) of 257 different measurements, for instance transcripto-and metabolomics, (ii) of different biological materials, 258 for instance communities with/without pre-exposure history, or (iii) of experimental settings, for instance 259 successive timepoints or different temperatures. Moreover, two special cases have been addressed: 260 experiments with a batch effect and designs with no replicates. DRomics future direction and evolutions 261 depend on upcoming challenges and needs brought by (eco)toxicologists. 262

Acknowledgements 263
The authors thank colleagues and students whose requests and/or discussions contributed to the 264 recent evolution of DRomics, especially Olivier Armant, Ellis Franklin, Sandrine Frelon, Sophie Prud'homme, 265 Mechthild Schmitt-Jansen and Philippe Veber. We also want to thank the referees for their helpful 266 suggestions that contributed to the improvement this manuscript. 267 Data, scripts, code, and supplementary information availability 268 Scripts and code are available online: DRomics is written in R, freely available from CRAN and 269 encompasses two shiny applications (https://cran.r-project.org/web/packages/DRomics/