DeepMed: A unified, modular pipeline for end-to-end deep learning in computational pathology

The interpretation of digitized histopathology images has been transformed by artificial intelligence (AI). End-to-end AI algorithms can infer high-level features directly from raw image data, extending the capabilities of human experts. In particular, AI can predict tumor subtypes, genetic mutations and gene expression directly from hematoxylin and eosin (H&E) stained pathology slides. However, existing end-to-end AI workflows are poorly standardized and not easily adaptable to new tasks. Here, we introduce DeepMed, a Python library for predicting any high-level attribute directly from histopathological whole slide images alone, or from images coupled with additional metadata (https://github.com/KatherLab/deepmed). Unlike earlier computational pipelines, DeepMed is highly developer-friendly: its modular structure separates preprocessing, training, deployment, statistics, and visualization in such a way that any one of these processes can be altered without affecting the others. Also, DeepMed scales easily from local use on laptop computers to multi-GPU clusters in cloud computing services and can therefore be used for teaching, prototyping and large-scale applications. Finally, DeepMed is user-friendly and allows researchers to easily test multiple hypotheses in a single dataset (via cross-validation) or in multiple datasets (via external validation). Here, we demonstrate and document DeepMed’s abilities to predict molecular alterations, histopathological subtypes and molecular features from routine histopathology images, using a large benchmark dataset which we release publicly. In summary, DeepMed is a fully integrated and broadly applicable end-to-end AI pipeline for the biomedical research community.


Introduction

End-to-End Deep Learning in Computational Pathology
Histopathology slides stained with hematoxylin and eosin (H&E) are ubiquitously available for virtually every patient with a solid tumor. 1 H&E tissue slides are indispensable for making a disease diagnosis. Beyond that, they are broadly used to derive qualitative and quantitative biomarkers for translational and basic cancer research studies. 2 Artificial intelligence (AI), specifically deep learning (DL) with convolutional neural networks (CNNs), can be used to automatically analyze digitized whole slide images (WSIs) of H&E slides and can yield quantifiable information beyond the capabilities of human experts. In the last four years, multiple research groups have shown that DL methods can predict high-level concepts such as the presence of specific genetic mutations 3,4 , gene expression 5 , whole genome duplications 6 , patient survival 7 and treatment response 8 from H&E WSIs. Since the first publication in 2018 9 demonstrated a robust end-to-end workflow, more than one hundred academic studies have used similar approaches 2 . However, unlike in other areas of bioinformatics, there are currently no standard pipelines for end-to-end DL in computational pathology. Therefore, virtually all research teams active in this field have implemented their own pipelines with highly similar setups. For example, multiple analysis pipelines were developed between 2018 and 2021 to predict the mutations of oncogenic driver genes from H&E WSIs. 3,5,[9][10][11][12] The overall design of these pipelines is largely identical: they load a WSI, tessellate it into tiles, perform data augmentation and/or normalization, train a CNN, deploy the network on tiles from test patients and use an aggregation function to pool the tile-level predictions on a patient level. 9

Limitations of Previous Deep Learning Pipelines in Computational Pathology
Why do researchers re-implement essentially identical pipelines instead of re-using the source code of previous publications? A key reason is that published pipelines are not modular. Individual components of these pipelines are highly interconnected and cannot be easily changed without disrupting the overall workflow (Figure 1A). For example, many methods have been designed to train CNNs on a training set and test them on a designated test set. 9 Others have used stratified cross-validation 4,13 or Monte-Carlo cross-validation 14 on a patient level. Moving from one experimental design to another requires a multitude of upstream and downstream changes related to data preprocessing, statistical metrics, visualization and essentially every other component of the pipeline. Also, using the pipeline for different types of input data (for example, prediction of continuous instead of categorical values) disrupts the whole workflow, from data loading through training to visualization of the results. Finally, end-to-end DL pipelines are being run on different types of hardware, ranging from laptop computers with a single graphics processing unit (GPU) through workstations and in-house servers to commercial cloud computing services (Figure 1B), with Windows or Linux operating systems. Current processing pipelines cannot be easily deployed on these different types of hardware and operating systems. We aimed to address these issues and developed DeepMed, a modular, extensible, versatile, easily usable and powerful DL pipeline for end-to-end computational pathology in translational and basic research.

Development, Application and Validation of the Protocol
DeepMed is a pipeline that integrates a multitude of algorithms which were developed and evaluated in end-to-end computational pathology. Unlike previously published pipelines, DeepMed includes all commonly used variants of data loading, network training, statistics and visualization in a fully modular way (Figure 1C). DeepMed can be used for a wide range of problems, including simple classification and regression tasks on histological image data only 3,9,13,[15][16][17][18] , prediction of survival markers 19 (Figure 1D) and inclusion of additional non-image data in the training process in a multi-modal way 20 (Figure 1E). DeepMed enables researchers to conveniently use established methods and test dozens of hypotheses in a single patient cohort or in multiple cohorts with minimal data preprocessing. At the same time, it offers a high degree of flexibility for developers, who can use the robust backbone of DeepMed to try out new methods without the overhead of re-implementing a full end-to-end DL pipeline. Computational pathology is a fast-evolving field attracting much attention worldwide, as evidenced by the increasing number of publications. 2 Researchers must be able to iterate on new ideas rapidly and scalably in order to keep up with new advances. DeepMed enables computational researchers to quickly try out many different approaches on large datasets. The modular composition of this pipeline allows programmers to easily adapt our code to fit their workflows. Furthermore, as new technologies or approaches become available, individual sections can be updated without changing the user-facing parts of the pipeline. Here, we present a comprehensive overview of setting up, using, validating and extending DeepMed and provide two benchmark datasets which can be used for many common problems in end-to-end computational pathology. During training, the network's predictions are evaluated by calculating the objective (loss) function.
The typical loss function for categorical targets is cross-entropy (binary cross-entropy [BCE] for binary targets and cross-entropy [CE] for multi-class targets), and the common loss functions for continuous targets are the mean squared error (MSE) or the mean absolute error (MAE). By modifying the final layer of the DL neural network (one output node for continuous targets and N output nodes for N-class categorical targets) and selecting the best loss function for optimization, it is possible to train DL algorithms for a wide range of targets. Finally, and most importantly for the end user, a range of statistics and visualizations can be generated (Figure 2F), including highly predictive image tiles 6 , receiver operating characteristic curves (ROCs) 9 , whole-slide prediction heatmaps 3 and additional tile-level and patient-level statistics including AUROC, F1-score, p-values and others (shown in Suppl. Table 2).
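The rule described above, one output node with a regression loss for continuous targets and N output nodes with a cross-entropy loss for N-class targets, can be sketched in plain Python. The helper names below are illustrative and not part of the DeepMed API.

```python
import math

def head_config(target):
    """Pick the output-layer size and loss name for a target description.

    target: ("continuous",) or ("categorical", n_classes).
    Illustrative helper mirroring the text, not part of DeepMed.
    """
    if target[0] == "continuous":
        return 1, "MSE"  # one output node, regression loss
    n_classes = target[1]
    return n_classes, "BCE" if n_classes == 2 else "CE"

def mse(y_true, y_pred):
    # Mean squared error for continuous targets
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(p_true_class):
    # Negative log-likelihood of the probability assigned to the true class
    return -math.log(p_true_class)
```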

Prediction of molecular features from breast cancer histology images
Here, we present the results achieved for the main functionalities of DeepMed, summarized in Table 1, on benchmark datasets which are provided under an open access license at https://zenodo.org/record/5337009. As a benchmark task, we use the prediction of pathological and molecular features in breast cancer based on slide-level (WSI-level) labels. This problem has been widely investigated in dedicated studies 8,11,25,26 and as part of systematic pan-cancer studies [4][5][6] and represents a clearly defined weakly supervised prediction task. We applied DeepMed to predict estrogen receptor (ER) status, histological subtype (ductal or lobular), TCGA gene expression subtype (Basal, LumA and LumB) and the density of tumor infiltrating lymphocytes (TILs).
We refer to the scripts that perform a weakly supervised deep learning analysis with DeepMed as "experiments" and give a detailed description of how to construct DeepMed experiments in the section "Materials, Methods and Procedure". The first sample experiment, "train_and_deploy_multitarget.py" (Full Script 4), runs a DeepMed analysis that applies transfer learning with ResNet-18 on the benchmark dataset TCGA-BRCA-A2 (from Walter Reed National Military Medical Center, Bethesda, MD, USA). Subsequently, the resulting neural network model is deployed to make predictions for three targets on the second, independent test dataset, TCGA-BRCA-E2 (from Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA). When we applied this workflow and evaluated the model on the test dataset, the classification performance was high, ranging from a patient-level AUROC of 0.656 for ER status to 0.860 for histological subtype (Table 2). All targets reached statistical significance (p<0.05).
If an independent test dataset is not available, researchers can perform exploratory analyses with DeepMed using patient-level cross-validation. As a demonstration of this feature, we ran a DeepMed analysis, described in "crossvalidated_train_multitarget" (Full Script 5), evaluating the performance on unseen patients within the training dataset TCGA-BRCA-A2 only. In this within-cohort analysis, we found a moderate to high prediction performance for all targets, but statistical significance was only reached for the easiest task, the prediction of histological subtype (p=0.023) (Table 3).
The receiver operating characteristic (ROC) curves that are part of the DeepMed output plot the true positive rate (TPR) against the false positive rate (FPR); the curves for both the cross-validated within-cohort and the train-and-deploy analyses are shown together in Figure 4. A ROC curve is created for each class of the predicted targets. For cross-validated tasks, the curves of the individual folds for each class are drawn in the same graph together with the calculated mean curve to ease the interpretation of the results. Furthermore, DeepMed can produce a single output image comprising a collage of the tiles with the highest prediction scores for each class ("top tiles"). In these top-tile collages, each row holds the highest-scoring tiles from one of the highest-scoring patients; the size of the collage is therefore the requested number of patients times the requested number of tiles per patient, 4 by 4 by default, and both numbers can be changed in the experiment scripts. Figure 5 shows the top tiles predicted by the models trained on the dataset TCGA-BRCA-A2 and deployed on the dataset TCGA-BRCA-E2. This type of visualization supports the explainability of DL models, which is useful as a plausibility check as well as for the discovery of new morphological biomarkers. 27

Another strength of DeepMed is the subgroup training functionality, which enables users to apply the DL analysis to a subset of patients of the original dataset. The subsets are defined by the user in the experiment script. In "train_and_deploy_subgroup_based_TMB.py" (Full Script 9), we show how to train a model for the prediction of ER status on subgroups of patients based on their tumor mutational burden (TMB, binarized into low and high at the median). The results show that ER status was predictable with an AUROC of 0.687 and 0.768 for the TMB-high and TMB-low subgroups, respectively, with a p-value smaller than 0.05 in both (Table 5).
These findings demonstrate the utility of DeepMed even in extremely small datasets of only 100 patients or less. Real-world applications of weakly supervised deep learning usually train and validate models on thousands of cases 2,16,18,28,29 , and a continuous improvement of classifier performance has been demonstrated for higher patient numbers. 15,28

Prediction of molecular features from multi-modal data sources (multi-input mode)

In clinical decision making, healthcare providers rarely use only a single data type. Usually, different types of data, for example images and tabular data, are used. 30 DeepMed can emulate this multi-modal decision making by incorporating additional variables as inputs to the neural network. We repeated the analyses of the breast cancer dataset and additionally provided the model with progesterone receptor (PR) status, HER2 status and age as input variables. The experiment script is shown in "train_and_deploy_multitarget_multiinput.py" (Full Script 8). We found that this addition of non-image information to the training markedly improves classifier performance in an external validation experiment (Table 4), showing that DeepMed can leverage tabular information to boost image-based prediction performance.
The parameterization mode of DeepMed is also demonstrated, together with the multi-modality feature, in "train_and_deploy_parameterizing.py" (Full Script 10). Parameterization mode gives users the opportunity to run the same experiment separately with an unlimited number of different parameters, returning the results in the same project folder together with an overall statistics report. In the experiment described in Full Script 10, a model is trained to predict ER status on the external dataset, first using proliferation and then diagnosis age as an additional input. The results of the parameterized experiment, given in Table 6, again showed an increase in model performance, albeit weaker than in the previous experiment, and demonstrated DeepMed's ability to generate strong multi-input models.

Limitations
DeepMed has the capacity to build deep learning networks for patient-level feature prediction directly from histopathology slides. DeepMed is a re-implementation of algorithms that have already been published. 3,[15][16][17] The fundamental limitation of this method is that not all clinically significant traits can be predicted from histopathological slides. Two recent large-scale assessments consistently demonstrated that this method predicts approximately one third of all evaluated genetic changes in human cancer 4,6 , while the remaining two thirds of tested molecular alterations are not predictable. However, although this approach is not universally applicable, it has been shown to provide clinically relevant performance 15,28 and can be used to discover previously unknown biological mechanisms 27,31 . Another restriction is that many researchers without a computational background find obtaining, storing, preparing, and evaluating histology image data difficult. DeepMed intends to reduce the work required to employ deep learning in end-to-end computational pathology. However, some of these problems are caused by a lack of standardization in computational pathology (for example, the widespread use of numerous proprietary image file formats) or are inherent to the field (such as the large file size of digitized whole slide images).

Outlook
DeepMed is a straightforward, scalable, and powerful implementation of end-to-end weakly supervised deep learning in histology, as demonstrated here. New technologies such as multiple instance learning 32 and vision transformers 12 have recently been investigated for such weakly supervised prediction challenges. Although there is limited data on these strategies' real-world performance, some of them may become the de facto state of the art in the future. DeepMed's modular design makes it simple to include new technologies without affecting the user experience, performance metrics, or any other high-level aspects of the implementation. As a result, DeepMed can serve as a versatile and future-proof tool for academic computational pathology.

Software and hardware requirements and setup
DeepMed has been tested on Windows 10, Windows Server 2019 and Ubuntu 18.04. The first step is to install Python 3.8 on the computer; we recommend using Anaconda (https://www.anaconda.com/). The easiest way to get started is to download the latest version of DeepMed from GitHub (https://github.com/KatherLab/deepmed) and install it with pip: navigate to the directory containing the code, open a standard terminal or PowerShell, and run the pip installation command. This installs DeepMed and all its requirements for the current user so that it can be called by other scripts; on Windows, this may require administrative rights. Alternatively, the source code can be downloaded directly from the GitHub website. DeepMed runs on laptops with an Nvidia graphics processing unit (GPU), on desktop computers with one or multiple Nvidia GPUs, or on a computing cluster such as EC2 instances on Amazon Web Services (AWS). Throughout this manuscript, we refer to the DeepMed release "v0.8.7", which remains available at https://github.com/KatherLab/deepmed/releases/tag/v0.8.7.

Data requirements and example data
We have shown the functionalities of DeepMed on two benchmark datasets, TCGA-BRCA-A2 and TCGA-BRCA-E2, which are available at https://zenodo.org/record/5337009. Each dataset is accompanied by a clinical table and a slide table mapping patients to their whole slide images; splitting the data on the patient level ensures that a model is not trained and tested on images from the same patient. By default, DeepMed expects the patient and slide columns of the slide table to be named "PATIENT" and "FILENAME", respectively.

Basic workflow
DeepMed allows the user to perform common analyses with very little code. In general, a short Python script ("experiment script") is enough to run a DeepMed workflow on a dataset that has been prepared in a suitable format. Here, we give a guideline for constructing experiment scripts; beyond the functionalities presented here, all parameters of DeepMed are listed in the Supplementary Material. A training cohort, consisting of a set of one or more cohorts to train a neural network model on, is initialized at the beginning of the experiment script. When using Windows-style paths with backslashes, the path string ought to be prefixed with an r to prevent the backslashes from being interpreted as character escapes.

Defining the experiment structure: Next, the user has to define how to use the selected cohorts. This is done with TaskGetters, one of the distinguishing features of DeepMed: they define the steps to be performed in an experiment. Thanks to their composable structure, they allow the user to easily construct a wide variety of common experiment schemes, such as training and deployment of a single model, cross-validation, training models on different subgroups, or training with different hyperparameters. Internally, the TaskGetter generates a series of tasks, each of which describes one step of the experiment, such as preprocessing data, training or deploying a model, or evaluating a deployment result. Most of these features are implemented as adapters of other TaskGetters: they take a TaskGetter and modify it. A cross-validation TaskGetter, for example, takes another TaskGetter and invokes it multiple times, each time with a different training and testing set. Similarly, a subgroup TaskGetter applies another TaskGetter to one or multiple subsets of the dataset. This way, TaskGetters can be nested to build up increasingly complex experiment setups.
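The effect of the r prefix can be seen directly in plain Python; the path below is purely illustrative.

```python
# Without the r prefix, \t and \n inside the string are parsed as
# tab and newline characters, silently corrupting the path:
plain = "C:\tiles\new_cohort"

# The r prefix keeps every backslash literal, as required for Windows paths:
raw = r"C:\tiles\new_cohort"
```

Printing both strings makes the difference obvious: the first one spans two lines and contains a tab, while the second one is the intended path.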
In the following, we will construct a TaskGetter for a simple, single-target training. SimpleRun describes how to use the data; in this case, we want to train a simple, single-target model. The following lines describe how this training is to be performed.
• target_label dictates the label that the user selects to predict with the model. The clinical table is expected to have a column with that name.
• The cohorts parameter specifies the cohorts to be used for training.
• A further parameter allows the user to define values that mark a non-informational training sample. Patients with these labels are excluded from training.
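The composable TaskGetter idea described above can be modeled with plain closures. This is a conceptual sketch only: the function names below are illustrative and do not reproduce the actual deepmed API.

```python
def simple_get(cohort, target):
    # A "TaskGetter": yields the tasks making up one training run.
    yield f"preprocess {cohort}"
    yield f"train model for {target} on {cohort}"
    yield f"evaluate {target} on {cohort}"

def multi_target(get, targets):
    # An adapter: wraps another TaskGetter and invokes it once per target,
    # mirroring how DeepMed adapters take a TaskGetter and modify it.
    def adapted(cohort):
        for t in targets:
            yield from get(cohort, t)
    return adapted

# Nesting: the adapted TaskGetter generates tasks for every target.
tasks = list(multi_target(simple_get, ["ER status", "TCGA subtype"])("TCGA-BRCA-A2"))
```

Because adapters take and return TaskGetters, they can be nested arbitrarily, e.g. a cross-validation adapter around a multi-target adapter around a simple run.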
Training the model: For the training to start, the experiment function must be given the location for saving the training results.
• get=simple_train_get calls the previously constructed TaskGetter.

Deployment (Inference):
Deployment of the neural network model is necessary to ascertain its performance whenever a test dataset is available. In DeepMed, the deployment procedure is quite similar to that of training. After defining the test cohorts, a TaskGetter is constructed; next, the user has to specify how and from where to load the models to be deployed. Usually, the train parameter inside the TaskGetter further defines the modalities of a network's training; in the case of deployment, a pretrained model is loaded instead of training a model from scratch. The loaded model given to the simple TaskGetter is then deployed in the final step.

Training and deployment in a single script: With DeepMed, training and a consecutive deployment can be run in the same experiment script. An example of such an analysis is given in Full Script 1.

Defining evaluation metrics: While the above samples show how to train a model and deploy it on a test set, no statistics or visual output is produced unless metrics are defined alongside the basic parameters. To assess the performance of the model on the test set, the evaluators parameter, which holds a Python list, must be given to the run adapter. These metrics will, for instance, calculate the area under the receiver operating characteristic curve (AUROC) and the count of testing samples. However, they are calculated on a tile basis. It is often advantageous to calculate metrics on a per-patient basis instead; this can be done with the Grouped adapter, which modifies the AUROC and count metrics in such a way that they are calculated on a per-patient instead of a per-tile basis, meaning that instead of the overall tile count per class, the number of patients per class is reported.
Additionally, a p value on a per-patient basis is added to the evaluation metrics; it is calculated by applying a two-tailed t-test for differences in the metrics of the target classes. Measuring on a per-patient basis is the default behavior of the Grouped adapter; thus, the by option can be skipped when grouping on a per-patient basis is desired. The available evaluation metrics are shown in Suppl. Table 2.
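The core idea of per-patient grouping, pooling many tile-level scores into one score per patient before computing metrics, can be sketched as follows; the helper is illustrative and not the Grouped adapter itself.

```python
from collections import defaultdict
from statistics import mean

def group_by_patient(tile_preds):
    """Average tile-level prediction scores into one score per patient.

    tile_preds: list of (patient_id, score) tuples.
    Illustrative sketch of the idea behind per-patient grouping,
    not DeepMed's internal implementation.
    """
    scores = defaultdict(list)
    for patient, score in tile_preds:
        scores[patient].append(score)
    return {p: mean(s) for p, s in scores.items()}

# Four tile predictions from two patients collapse into two patient scores:
patient_scores = group_by_patient(
    [("P1", 0.9), ("P1", 0.7), ("P2", 0.2), ("P2", 0.4)]
)
```

Downstream metrics such as AUROC or class counts are then computed on these patient-level scores instead of the raw tile scores.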
If the deployment script is extended to make use of these evaluators, re-running the script yields a file called stats.csv containing the requested metrics in the project output directory. The whole experiment script extended with evaluator metrics is given below. Full Script 2: simple_train_and_deploy_w_evaluators.py: The experiment script for a simple training and deployment that has been extended with evaluator metrics.
The experiment presented here exemplifies the simple run mode of DeepMed; the training and deployment tasks, with all parameters defined, performed only a single class-level prediction. Besides the simple run, DeepMed has several training modes that can be used in any combination and order depending on the user's needs: multi-target, cross-validation, subgroup and parameterize. In the following sections, we introduce the remaining training modes as well as the data types and parameters that can be used with all training modes.
Multi-target training: DeepMed can also be run for more than one target, with the possibility of using multiple GPUs in the process. A sample script for constructing a multi-target training experiment is given below with step-by-step explanations.
The script starts with a simple TaskGetter as in the previous examples, except that the target_label is not specified this time. The reason is not to restrict the run's target label, but to automatically repeat the training with different target labels. To achieve this, a run adapter is used, which takes another TaskGetter and transforms it instead of generating runs by itself. In this example, a single-target TaskGetter is given and adapted into a multi-target one. Such a run adapter is constructed as follows:

MultiTarget(simple_train_get,
            target_labels=['ER Status By IHC', 'TCGA Subtype',
                           'Neoplasm Histologic Type Name',
                           'TIL Regional Fraction'])

The corresponding deployment script must be modified for the multi-target data. This is again done with a run adapter that takes the simple deploy TaskGetter as an argument. The deployment script again defines the test cohort and the project directory that will store the results. Assuming the model to be deployed is the one trained in the previous example, the training project directory, i.e. the output directory of the previously run project that contains the trained multi-target model, is assigned to the load variable.
The multi-target TaskGetter takes the simple deploy TaskGetter as an argument and runs it with its additional functionalities, mainly the multi-target initialization via the parameter target_labels. In Full Script 3, the targets are set to ER status, TCGA subtype, neoplasm histologic type and TIL regional fraction. The resulting output will be four subdirectories inside the project directory, one for each target, each saving the simple run's predictions and the metrics defined in the simple evaluators. The whole script to deploy a multi-target model is given in Full Script 3.

Full Script 3: deploy_multitarget.py: The sample experiment script to deploy the trained multi-target model on the benchmark datasets.
It should be noted that a single script can train a model on a training cohort and deploy it on a test cohort in any other experiment configuration as well. To perform the multi-target analysis discussed so far in this way, the user must modify the experiment script so that it defines both the training and the test cohorts and, instead of defining a load variable, simply passes the test cohort to the simple deploy TaskGetter. This is shown in Full Script 4.

Cross-validation for within-cohort experiments
In addition to simple training, users have the option to perform cross-validation analyses, in which the training data is randomly split into training and test sets a desired number of times (the number of folds), and a model is trained on each of the resulting training sets and evaluated on the corresponding test set. Cross-validation thereby gives users a better overview of model performance, since each fold's model is evaluated on data it has not seen during training.
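Crucially, such splits are made on the patient level, so that tiles from one patient never end up in both the training and the test set of a fold. The following self-contained sketch illustrates the idea (it is not DeepMed's internal splitting code):

```python
import random

def patient_level_folds(patients, n_folds=3, seed=0):
    """Split patients (not tiles!) into cross-validation folds so that
    no patient contributes images to both training and test data.
    Illustrative sketch of patient-level cross-validation."""
    patients = sorted(patients)
    random.Random(seed).shuffle(patients)
    folds = [patients[i::n_folds] for i in range(n_folds)]
    return [
        (sorted(p for f in folds[:i] + folds[i + 1:] for p in f),  # train
         sorted(folds[i]))                                         # test
        for i in range(n_folds)
    ]

# Nine patients split into three folds of train/test patient lists:
splits = patient_level_folds([f"P{i}" for i in range(9)], n_folds=3)
```

Each of the three (train, test) pairs is disjoint, and every patient appears in exactly one test set across the folds.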
A whole sample experiment script to perform a DeepMed analysis with cross-validation is given below in the Full Script 5.
Full Script 5: crossvalidated_train.py: The sample experiment script to run a within-cohort cross-validation analysis to predict ER status on the benchmark dataset TCGA-BRCA-A2.
In order to run a cross-validated deep learning analysis with DeepMed for a single target, an adapter called Crossval needs to be used. Crossval takes the target label and performs cross-validation for it. Once the cross-validation steps have been completed, it applies additional evaluators, which can then operate on data from all folds. While the evaluators in the single TaskGetter report the AUROC, p value and patient counts for each model, the evaluators defined in the cross-validation TaskGetter aggregate these statistics for each target over the different folds and yield the top tiles that most influenced the model's decisions as well as ROC curves on a patient and fold level.
When there is more than one target, as in the benchmark datasets used here, running a cross-validation analysis with DeepMed only requires an additional run adapter for multi-target training. This multi-target run adapter executes the cross-validation TaskGetter multiple times, each time with a different target label and output directory, allowing for additional evaluation over all targets' cross-validation results. A whole experiment script combining the cross-validation and multi-target training modes is given in Full Script 6.
Full Script 6: crossvalidated_train_multitarget.py: The sample experiment script to run a within-cohort cross-validation analysis on the benchmark dataset TCGA-BRCA-A2.

Full Script 6 introduces another novelty: the over option in AggregateStats. Without the over option, results from all classes of each target are reported for every fold in the statistics file. The over option instead instructs the pipeline to aggregate over the desired column, in this case the folds, resulting in a summary of the results that includes the mean AUROC, total patient counts and p values.
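The effect of aggregating over the fold column can be sketched with a small helper; the row layout and function name are illustrative, not the AggregateStats implementation.

```python
from statistics import mean

def aggregate_stats_over_folds(rows):
    """Collapse per-fold stats rows into one summary per (target, class),
    i.e. aggregate 'over' the fold column (illustrative sketch)."""
    groups = {}
    for r in rows:
        groups.setdefault((r["target"], r["class"]), []).append(r)
    return {key: {"mean_auroc": mean(r["auroc"] for r in g),
                  "total_count": sum(r["count"] for r in g)}
            for key, g in groups.items()}

# Two folds of the same target/class collapse into one summary row:
summary = aggregate_stats_over_folds([
    {"target": "ER", "class": "Positive", "fold": 0, "auroc": 0.80, "count": 30},
    {"target": "ER", "class": "Positive", "fold": 1, "auroc": 0.70, "count": 28},
])
```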

Categorical and continuous targets
In the above samples, the training has been applied to both categorical and continuous data. The categorical targets were "ER Status By IHC" (binary categorical) and "TCGA Subtype" (multi-class categorical). Categorical targets always take one of a finite number of non-overlapping classes: "Positive" and "Negative" for "ER Status By IHC", and "Basal", "LumA" and "LumB" for "TCGA Subtype". In addition to categorical data, DeepMed is also able to process and train on continuous data, where the targets can take any value in a continuous range.
There are two ways in which DeepMed handles continuous targets such as "TMB (nonsynonymous)" (tumor mutational burden): discretization and regression. If the parameter n_bins is initialized with a value greater than 0 when the experiment script is handed to the pipeline, the continuous values are transformed into discrete values by assigning them to the desired number of intervals, or bins, whose borders or cut-off points are determined by the discretization algorithm. The bins are then classified in the same way as categorical targets. The number of bins is 2 by default, meaning that if the n_bins parameter is omitted, continuous targets are subjected to a binarized classification. If n_bins is set to 0, DeepMed performs a regression task instead. To evaluate the performance of such tasks, the coefficient of determination, available as r2 in the evaluators parameter, is provided among the metrics. Full Script 7 gives an example script for running a regression analysis with a 3-fold cross-validation targeting the total number of mutations.
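The n_bins behavior described above can be sketched as follows. This is an illustrative quantile-style binning; the actual cut-off algorithm used by DeepMed may differ.

```python
def discretize(values, n_bins=2):
    """Bin continuous target values into n_bins intervals, or leave them
    untouched for regression (n_bins == 0). Illustrative sketch of the
    n_bins behavior, not DeepMed's exact discretization algorithm."""
    if n_bins == 0:
        return values  # regression: keep the continuous values
    ordered = sorted(values)
    # Cut-off points splitting the sorted values into n_bins groups:
    cuts = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]
    # Each value is labeled by how many cut-offs it reaches:
    return [sum(v >= c for c in cuts) for v in values]

# Default binarization (n_bins=2) splits at the upper median:
labels = discretize([1.0, 5.0, 2.0, 8.0], n_bins=2)
```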

Multi-Modality
DeepMed can also perform analyses on data from multiple inputs. These inputs can be of different modalities, for example image data and tabular clinical or genetic information. In DeepMed, both continuous and categorical variables can be provided to the model as additional inputs. Multiple studies 33,34 have shown that the performance of deep learning models can improve when such additional information is added to the model. A common practice for combining modalities is to compute a high-level embedding of each individual modality and fuse them by tensor concatenation. In DeepMed, the image input (tiles of whole slide images) is run through an ImageNet-trained ResNet-18, and the resulting feature vector is concatenated with the chosen tabular data, normalized by computing the standard score. The concatenated vector is then passed through two fully connected layers and connected to the output layer where the predictions are made. Full Script 8 shows a whole experiment script for running a multi-modal DeepMed analysis. Full Script 8: train_and_deploy_multitarget_multiinput.py: The sample experiment script to run training and deployment with an additional tabular input on the benchmark datasets for multiple targets.
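The fusion step, z-scoring a tabular variable across the cohort and concatenating it onto the image feature vector, can be sketched in a few lines; the functions below are illustrative, not DeepMed internals.

```python
from statistics import mean, pstdev

def standardize(column):
    """Z-score one tabular variable across the cohort (standard score)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

def fuse(image_features, tabular_values):
    """Concatenate a patient's image feature vector with that patient's
    standardized tabular values; the fused vector would then feed the
    fully connected head. Illustrative sketch of the fusion step."""
    return list(image_features) + list(tabular_values)

# One clinical variable (e.g. age) standardized across three patients,
# then fused with the first patient's (toy) image feature vector:
ages = standardize([50.0, 60.0, 70.0])
fused = fuse([0.1, 0.2], [ages[0]])
```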

Subgroup Training
One of the most important training modes of DeepMed is subgroup training, which allows users to train models on subsets of the original dataset based on user-defined characteristics of the data. This makes it possible to train different models on subsets of the data and compare the results. Subgroup training requires users to write a Python function that describes how to divide the dataset into subsets. Here, we show how to define a function that retrieves subgroups based on the TMB values from the clinical table. In the backend of DeepMed, the operations described in this function are applied to each row of the clinical table's pandas DataFrame, so the function must operate on a single pandas (a Python module for data analysis) Series.
Full Script 9 presents a whole DeepMed experiment script to run a simple train-and-deploy experiment predicting the ER status on TMB-low and TMB-high patients separately; in this case, the simple TaskGetter is passed to the subgroup run adapter.
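A subgroup function of the kind described above might look as follows. The median value and column name are hypothetical, and a plain dict stands in for the pandas Series that DeepMed passes for each patient row (both support the same .get/[] access used here).

```python
MEDIAN_TMB = 3.5  # hypothetical cohort median used for binarization

def tmb_subgroup(row):
    """Assign one patient row of the clinical table to a TMB subgroup.

    In DeepMed, row is a pandas Series; a dict behaves identically for
    this sketch. Column name and return values are illustrative.
    """
    tmb = row.get("TMB (nonsynonymous)")
    if tmb is None:
        return None  # patients without a TMB value are excluded
    return "TMB high" if tmb >= MEDIAN_TMB else "TMB low"

group = tmb_subgroup({"TMB (nonsynonymous)": 5.0})
```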

Parameterizing
DeepMed provides users with the opportunity to create and run experiments with different parameters from the same script. This process, called parameterizing, allows the user to run experiments with an unlimited number and combinations of parameters consecutively, thereby saving time and returning aggregated statistics in an orderly fashion.
A parameterization assigns values to arguments which can be given to a TaskGetter. Similarly to the Crossval and MultiTarget adapters, the Parameterize adapter allows the user to repeatedly invoke a TaskGetter with different parameterizations. To do so, it is supplied with a dictionary that maps the names of the result directories to their respective parameterizations.
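The dictionary-driven scheme can be sketched with a stand-in for the experiment run; folder and parameter names below are illustrative, not the exact deepmed API.

```python
def run_experiment(aux_variable):
    # Stand-in for a full DeepMed run; returns a toy stats summary.
    return {"auxiliary_input": aux_variable}

# Map result-directory names to their parameterizations, as described above:
parameterizations = {
    "with Proliferation": {"aux_variable": "Proliferation"},
    "with Diagnosis Age": {"aux_variable": "Diagnosis Age"},
}

# Each parameterization is run once; results land under its folder name.
results = {folder: run_experiment(**params)
           for folder, params in parameterizations.items()}
```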
In Full Script 10, we use parameterization to compare the effects of different auxiliary variables when training multi-modal models. The script will train and evaluate two models, one combining the image data with the proliferation status and one combining the image data with the patient age at diagnosis. The results of these models will be saved in the folders 'with Proliferation' and 'with Diagnosis Age', respectively. After training both models, the script aggregates the statistics of both results into a single file.