DRomics, a workflow to exploit dose-response omics data in ecotoxicology

Omics technologies has opened new possibilities to assess environmental risks and to understand the mode(s) of action of pollutants. Coupled to dose-response experimental designs, they allow a non-targeted assessment of organism responses at the molecular level along an exposure gradient. However, describing the dose-response relationships on such high-throughput data is no easy task. In a first part, we review the software available for this purpose, and their main features. We set out arguments on some statistical and modeling choices we have made while developing the R package DRomics and its positioning compared to others tools. The DRomics main analysis workflow is made available through a web interface, namely a shiny app named DRomics-shiny. Next, we present the new functionalities recently implemented. DRomics has been augmented especially to be able to handle varied omics data considering the nature of the measured signal (e.g. counts of reads in RNAseq) and the way data were collected (e.g. batch effect, situation with no experimental replicates). Another important upgrade is the development of tools to ease the biological interpretation of results. Various functions are proposed to visualize, summarize and compare the responses, for different biological groups (defined from biological annotation), optionally at different experimental levels (e.g. measurements at several omics level or in different experimental conditions). A new shiny app named DRomicsInterpreter-shiny is dedicated to the biological interpretation of results. The institutional web page https://lbbe.univ-lyon1.fr/fr/dromics gathers links to all resources related to DRomics, including the two shiny applications.

designs, they allow a non-targeted assessment of organism responses at the molecular level 17 along an exposure gradient. However, describing the dose-response relationships on such high-18 throughput data is no easy task. In a first part, we review the software available for this 19 purpose, and their main features. We set out arguments on some statistical and modeling 20 choices we have made while developing the R package DRomics and its positioning compared 21 to others tools. The DRomics main analysis workflow is made available through a web interface, 22 namely a shiny app named DRomics-shiny. Next, we present the new functionalities recently 23 implemented. DRomics has been augmented especially to be able to handle varied omics data 24 considering the nature of the measured signal (e.g. counts of reads in RNAseq) and the way 25 data were collected (e.g. batch effect, situation with no experimental replicates). Another 26 important upgrade is the development of tools to ease the biological interpretation of results. 27 Various functions are proposed to visualize, summarize and compare the responses, for 28 different biological groups (defined from biological annotation), optionally at different 29 experimental levels (e.g. measurements at several omics level or in different experimental 30 conditions). A new shiny app named DRomicsInterpreter-shiny is dedicated to the biological 31 interpretation of results. The institutional web page https://lbbe.univ-lyon1.fr/fr/dromics 32 gathers links to all resources related to DRomics, including the two shiny applications.

Introduction 38
Dose-response (DR) modeling belongs to the toolkit of ecotoxicologists. The latter are used to this approach 39 when working on apical endpoints (e.g. reproduction, photosynthesis). The derived sensitivity thresholds, e.g. 40 Effective Concentrations (ECx), BenchMark Dose (BMD), are at the basis of regulatory risk assessment. 41 42 The recent years have seen the emergence of works using DR omics data (e.g. transcriptomic, proteomic, 43 metabolomic) in ecotoxicology (Zhang et al., 2018). Typical DR designs, with many doses (>6), ensure a good 44 description of the DR relationship and a robust and precise estimation of a sensitivity threshold (such as the 45 benchmark dose -BMD) that is useful to fix regulatory thresholds (Ewald et al., 2022). However, such designs 46 are less commonly used in omics studies, as tools classically used to analyse omics data are dedicated to 47 differential expression analysis between few conditions (limma, Ritchie et al., 2025, DESeq2, Love et al., 2014 EdgeR, Robinson et al., 2010). The analysis of such data typically starts with a pairwise differential analysis to 49 the control followed by an enrichment procedure to identify GO (Gene Ontology) terms or KEGG (Kyoto 50 Encyclopedia of Genes and Genomes) biological pathways of differentially expressed items (e.g. contigs, 51 proteins) ( organisms or communities (meta-omics), biological models commonly used in the field of ecotoxicology. The 94 main characteristics and functionalities of those five tools are summarized in Table 1.  DRomics is the only tool which is composed of both an R package and a free web application (see Table  101 1). We chose to develop it in the R language among others i) to facilitate the interoperability with the 102 Bioconductor packages that implement state-of-the-art bioinformatics methods and ii) to add a shiny 103 application (interactive web app. straight from R) that can be launched both from the package and from a 104 server freely accessible without registration. The companion DRomics-shiny application was thought for 105 users who do not want to work in the R environment, but also to help new users to take the package in 106 hand. For that purpose, the R code of the whole performed analysis is provided in the last page of the shiny 107 application. While developing DRomics, we were cautious to limit as much as possible the dependencies 108 to other statistical tools and to only depend on well-maintained R packages. Notice that the installation of 109 BMDx is currently impossible because of dependencies to non-maintained R packages. 110 111 DRomics was at the root designed to be able to analyze data from typical DR design, favoring the 112 number of doses over the number of replicates per dose, or even for datasets with no experimental 113 replicates. This situation of DR approach with no replicates is met in some field studies (one dose per 114 sample) and in some screening studies as illustrated in Rollin et al. (2023 Various functions were also added to the package (highlighted in bold type in Figure 1) to respond to 167 user requests. Among them we can cite PCAdataplot() for a visualization of data after the importation step 168 and detection of potential outlier samples or batch effect, targetplot() to visualize the response of targeted 169 items whatever they are selected or not in the DRomics workflow, bmdboot() and bmdplot() to compute 170 and visualize confidence intervals on BMD values using bootstrap. 171 172 Moreover, we performed modifications in the modeling workflow to ensure a better robustness of 173 results on data with a low number of doses. For example, we changed the default information criterion 174 used for best model selection from the AIC to the AICc, as recommended by Burnham and Anderson (2004), 175 and limited the set of models for weak designs with few doses (4 or 5). Despite this care one should favor 176 optimal dose-response designs with more doses (at least 6-7, and never less than 4) and less (or no) 177 replicates

New functions and the new shiny application to help biological interpretation 180
In toxicology, while working on sequenced and well-annotated organisms, items (e.g. genes) can be 181 functionally annotated prior to DR analysis. BMDExpress integrates this annotation step for 6 model species 182 on the basis of GO or reactome databases, and FastBMD for 14 model species on the basis of GeneID, GO, 183 Reactome or KEGG databases (see Table 1). Such an annotation of all the items whatever they respond or 184 not to the dose gradient exposition, enables the classical enrichment analysis (Wu et al., 2021). This 185 analysis consists in highlighting gene sets/pathways which are the most overrepresented among the 186 responding ones. 187 188 In This implies the need for the user to retrieve and manage its own annotation, which is a challenging task, 191 especially for RNAseq experiments for which the number of measured contigs may be huge (several 192 millions of contigs). We thus considered that this annotation step could be done after the 193 selection/modeling workflow, to reduce the number of items to annotate, and so the difficulty of this task. 194 Due to the great diversity of annotation pipelines that can be developed for such non-model organisms, 195 we did not include an annotation step in DRomics. However, and to support the interpretation of the 196 workflow results in view of a biological annotation provided by the user, we recently developed new 197 functions and a second shiny application (named DRomicsInterpreter-shiny) (Figure 1). 198 199 After augmenting the DRomics output with information about the functional role of items, the graphical 200 representations offered by DRomics give a new insight into the results. Further, provided a common 201 annotation system is used at different biological scales under study, DRomics can be used to compare the 202 response at various levels. Figures 2 and 3  • the trends (increasing, decreasing, U-shape or bell-shape) of the DR curves (using the 208 trendplot() function, see a one-level example on Figure 1), 209 • the group/pathway-level sensitivity calculated as a quantile of BMD values of the group (using 210 the sensitivityplot() function, see an one-level example on Figure 1 and a multi-level example 211 on Figure 2), 212 • for selected groups/pathways, all the BMD values with their confidence intervals (using the 213 bmdplot() function, see a multilevel-level example on Figure 1, right part) 214 • for selected groups/pathways, all the BMD values with the corresponding DR curve signal 215 coded as a color gradient (using the bmdplotwithgradient() function, see a multi-level example 216 on Figure 3, left part). 217 • for selected groups/pathways, all the DR curves represented as curves (using the curvesplot() 218 function, see a multi-level example on Figure 3, right part), 219 Those functions can also be used to compare the response at one molecular level but measured under 220 different experimental conditions (different time points, different pre-exposure scenarios, in vitro/in vivo, 221 etc.). The selectgroups() function was also developed to help the user to focus its interpretation on the 222 most represented and/or the most sensitive biological groups (see an example in the package vignette: 223 https://cran.r-project.org/web/packages/DRomics/vignettes/DRomics_vignette.html#selectgroups). 224 225

Conflict of interest disclosure 284
The authors declare that they comply with the PCI rule of having no financial conflicts of interest in 285 relation to the content of the article. 286

Funding 287
This work has been financially supported by the Graduate School H2O'Lyon (ANR-17-EURE-0018) of 288 Université de Lyon (UdL), within the program "Investissements d'Avenir" operated by the French National 289 Research Agency (ANR)", by the "Ecosphère continentale et côtière" (EC2CO) interdisciplinary program 290 from the Centre National de la Recherche Scientifique (CNRS, France), as parts of the DROMADERE and the 291 ECORA projects, and by the "Agence Nationale de la Recherche" as a part of the Chroco project (ANR-21-292 CE34-0003). 293