## Abstract

Recent failures of clinical trials in Alzheimer’s Disease underline the critical importance of identifying the optimal intervention time to maximize cognitive benefit. While several models of disease progression have been proposed, we still lack quantitative approaches for simulating the effect of treatment strategies on the clinical evolution. In this work, we present a data-driven method to model dynamical relationships between imaging and clinical biomarkers. Our approach allows simulating intervention at any stage of the pathology by modulating the progression speed of the biomarkers, and subsequently assessing the impact on disease evolution. When applied to multi-modal imaging and clinical data from the Alzheimer’s Disease Neuroimaging Initiative, our method enables the generation of hypothetical scenarios of amyloid lowering interventions. Our results show that in a study with 1000 individuals per arm, amyloid accumulation should be completely arrested at least 5 years before Alzheimer’s dementia diagnosis to lead to a statistically powered improvement of clinical endpoints.

## 1. Introduction

The number of people affected by Alzheimer’s Disease (AD) has recently exceeded 46 million and is expected to double every 20 years [1], posing significant healthcare challenges. Yet the disease mechanisms remain in large part unknown, and there are still no effective pharmacological treatments leading to tangible improvements in patients’ clinical progression. One of the main challenges in understanding AD is that its progression goes through a silent asymptomatic phase that can stretch over decades before a clinical diagnosis can be established based on cognitive and behavioral symptoms. To help design appropriate intervention strategies, hypothetical models of the disease history have been proposed, characterizing the progression by a cascade of morphological and molecular changes affecting the brain, ultimately leading to cognitive impairment [2, 3]. The dominant hypothesis is that disease dynamics along the asymptomatic period are driven by the deposition in the brain of the amyloid *β* peptide, triggering the so-called “amyloid cascade” [4, 5, 6, 7, 8]. Based on this rationale, clinical trials have focused on the development and testing of disease modifiers targeting amyloid *β* aggregates [9], for example by increasing its clearance or blocking its accumulation. Although the amyloid hypothesis has recently been invigorated by a post-hoc analysis of the aducanumab trial [10], clinical trials have so far failed to show the efficacy of this kind of treatment, either because the primary clinical endpoints were not met [11, 12, 13], or because of unacceptable adverse effects [14]. In the past years, a growing consensus has emerged about the critical importance of intervention time, and about the need to start anti-amyloid treatments during the pre-symptomatic stages of the disease [15]. 
Nevertheless, the design of optimal intervention strategies is currently not supported by quantitative analysis methods for modeling and assessing the effect of intervention time and dosing [16]. The availability of models of the pathophysiology of AD would entail great potential to test and analyze clinical hypotheses characterizing AD mechanisms, progression, and intervention scenarios.

Within this context, quantitative models of disease progression, referred to as Disease Progression Models (DPMs), have been proposed [17, 18, 19, 20, 21] to quantify the dynamics of the changes affecting the brain during the whole disease span. These models rely on the statistical analysis of large datasets of different data modalities, such as clinical scores, or brain imaging measures derived from Magnetic Resonance Imaging (MRI), Amyloid- and Fluorodeoxyglucose-Positron Emission Tomography (PET) [22, 23, 24]. In general, DPMs estimate a long-term disease evolution from the joint analysis of multivariate time-series acquired on a short-term time-scale. Due to the temporal delay between disease onset and the appearance of the first symptoms, DPMs rely on the identification of an appropriate temporal reference to describe the long-term disease evolution [25, 26]. These tools are promising approaches for the analysis of clinical trial data, as they represent the longitudinal evolution of multiple biomarkers through a global model of disease progression. Such a model can subsequently be used as a reference to stage subjects and quantify their relative progression speed [27, 28, 29]. However, these approaches remain purely descriptive, as they do not account for causal relationships among biomarkers. Therefore, they generally do not allow the simulation of progression scenarios based on hypothetical intervention strategies, and thus provide a limited interpretation of the pathological dynamics. This latter capability is of utmost importance for the planning and assessment of disease modifying treatments.

To fill this gap, recent works such as [30, 31] proposed to model AD progression based on specific assumptions about the biochemical processes of pathological protein propagation. These approaches explicitly define biomarker interactions through the specification of sets of Ordinary Differential Equations (ODEs), and are ideally suited to simulate the effect of drug interventions [32]. However, these methods are mostly based on arbitrary choices of pre-defined evolution models, which are not inferred from data. This issue was recently addressed by [33], where the authors proposed a hybrid modeling method combining traditional DPMs with dynamical models of AD progression. Still, since this approach requires designing suitable models of protein propagation across brain regions, extending it to jointly account for spatio-temporal interactions between several processes, such as amyloid propagation, glucose hypometabolism, and brain atrophy, is considerably more complex. Finally, these methods are usually designed to account for imaging data only, which prevents the joint simulation of heterogeneous measures [34], such as image-based biomarkers and clinical outcomes, the latter remaining the reference markers for patients and clinicians.

In this work we present a novel computational model of AD progression that allows simulating intervention strategies across the history of the disease. The model is here used to quantify the potential effect of amyloid modifiers on the progression of brain atrophy, glucose hypometabolism, and ultimately on the clinical outcomes for different intervention scenarios. To this end, we model the joint spatio-temporal variation of different modalities along the history of AD by identifying a system of ODEs governing the pathological progression. This latent ODE system is specified within an interpretable low-dimensional space relating multi-modal information, and combines clinically-inspired constraints with unknown interactions that we wish to estimate. The interpretability of the relationships in the latent space is ensured by mapping each data modality to a specific latent coordinate. The model is formulated within a Bayesian framework, where the latent representation and dynamics are efficiently estimated through stochastic variational inference. To generate hypothetical scenarios of amyloid lowering interventions, we apply our approach to multi-modal imaging and clinical data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Our results provide a meaningful quantification of different intervention strategies, compatible with findings previously reported in clinical studies. For example, we estimate that in a study with 100 individuals per arm, a statistically powered improvement of clinical endpoints can be obtained by completely arresting amyloid accumulation at least 8 years before Alzheimer’s dementia. The minimum intervention time decreases to 5 years for studies based on 1000 individuals per arm.

## 2. Results

In the following sections, healthy individuals will be denoted as NL stable, subjects with mild cognitive impairment as MCI stable, subjects diagnosed with Alzheimer’s dementia as AD, subjects progressing from NL to MCI as NL converters, and subjects progressing from MCI to AD as MCI converters. Amyloid concentration and glucose metabolism are respectively measured by ^{18}F-florbetapir Amyloid (AV45)-PET and ^{18}F-fluorodeoxyglucose (FDG)-PET imaging. Cognitive and functional abilities are assessed by the following neuro-psychological tests: Alzheimer’s Disease Assessment Scale (ADAS11), Mini-Mental State Examination (MMSE), Functional Assessment Questionnaire (FAQ), Rey Auditory Verbal Learning Test (RAVLT) immediate, and RAVLT forgetting.

### 2.1. Study cohort and biomarkers’ changes across clinical groups

Our study is based on a cohort of 311 amyloid positive individuals composed of 46 NL stable subjects, 10 NL converter subjects, 106 subjects diagnosed with MCI, 76 MCI converter subjects, and 73 AD patients. The term “amyloid positive” refers to subjects whose amyloid level in the cerebrospinal fluid (CSF) was below the nominal cutoff of 192 pg/ml [35] either at baseline or during any follow-up visit; conversion to AD was determined using the last available follow-up information. The length of follow-up varies between subjects and ranges from 0 to 6 years. Further information about the data is available at https://adni.bitbucket.io/reference/, while details on data acquisition and processing are provided in Section 4.1. We show in Table 1A the socio-demographic information for the training cohort across the different clinical groups. Table 1B shows baseline values and annual rates of change across clinical groups for amyloid burden (average normalized AV45 uptake in frontal cortex, anterior cingulate, precuneus and parietal cortex), glucose hypometabolism (average normalized FDG uptake in frontal cortex, anterior cingulate, precuneus and parietal cortex), for hippocampal and medial temporal lobe volumes, and for cognitive ability as measured by ADAS11. Consistent with previously reported results [36, 37], we observe that while regional atrophy, hypometabolism and cognition show an increasing rate of change when moving from healthy to pathological conditions, the change of AV45 is maximal in NL stable and MCI stable subjects. We also notice the increased magnitude of ADAS11 in AD as compared to the other clinical groups. Finally, the magnitude of change of FDG is generally milder than the atrophy rates.

The observations presented in Table 1 provide us with a glimpse into the biomarker trajectories characterising AD. The complexity of the dynamical changes we may infer is however limited, as the clinical stages only roughly approximate a temporal scale describing the disease history, and very little insight can be obtained about the biomarkers’ interactions. Within this context, our model allows the quantification of the fine-grained dynamical relationships across biomarkers at stake during the history of the disease. Investigation of intervention scenarios can subsequently be carried out by suitably modulating the estimated dynamics parameters according to specific intervention hypotheses (e.g. amyloid lowering at a certain time).

### 2.2. Model overview

We provide in Figure 1 an overview of the presented method. Baseline multi-modal imaging and clinical information for a given subject are transformed into a latent variable composed of four z-scores quantifying respectively the overall severity of atrophy, glucose hypometabolism, amyloid burden, and cognitive and functional assessment. The model estimates the dynamical relationships across these z-scores to optimally describe the temporal transitions between follow-up observations. These transition rules are here mathematically defined by the parameters of a system of ODEs, which is estimated from the data. This dynamical system allows computing the evolution of the z-scores over time from any baseline observation, and predicting the associated multi-modal imaging and clinical measures. The model thus enables simulation of the pathological progression of biomarkers across the entire history of the disease. Once the model is estimated, we can modify the ODE parameters to simulate different evolution scenarios according to specific hypotheses. For example, by reducing the parameters associated with the progression rate of amyloid, we can investigate the relative change in the evolution of the other biomarkers. This setup thus provides a data-driven system for exploring hypothetical intervention strategies and their effect on the pathological cascade.
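
This simulation mechanism can be illustrated with a toy dynamical system over the four z-scores. All rates and couplings below are hypothetical placeholders, not the values estimated from data (those are described in Section 4); the sketch only shows how scaling the amyloid rate propagates to the downstream scores.

```python
# Toy 4-dimensional z-score system (amy, hmet, atr, cli). Rates and
# couplings are illustrative placeholders, NOT estimates from ADNI data.
RATES = {"amy": 0.30, "hmet": 0.20, "atr": 0.20, "cli": 0.25}
# COUPLING[m][j]: contribution of score j to the growth of score m
COUPLING = {
    "amy":  {},
    "hmet": {"amy": 0.10},
    "atr":  {"amy": 0.08, "hmet": 0.05},
    "cli":  {"atr": 0.10, "hmet": 0.05},
}

def simulate(z0, years, dt=0.05, amy_scale=1.0):
    """Euler-integrate the toy system; amy_scale < 1 simulates amyloid lowering."""
    z = dict(z0)
    for _ in range(int(years / dt)):
        dz = {}
        for m in z:
            rate = RATES[m] * (amy_scale if m == "amy" else 1.0)
            growth = rate * z[m] * (1.0 - z[m])  # sigmoidal self-progression
            coupling = sum(a * z[j] for j, a in COUPLING[m].items()) * (1.0 - z[m])
            dz[m] = growth + coupling
        for m in z:
            z[m] = min(1.0, max(0.0, z[m] + dt * dz[m]))
    return z

baseline = {"amy": 0.4, "hmet": 0.1, "atr": 0.1, "cli": 0.05}
natural = simulate(baseline, years=10)                  # untreated progression
arrested = simulate(baseline, years=10, amy_scale=0.0)  # 100% amyloid blockage
```

With the amyloid rate set to zero, the downstream atrophy and clinical scores progress more slowly, mimicking the intervention scenarios explored in Section 2.4.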

In the following sections, MRI, FDG-PET, and AV45-PET images are processed in order to respectively extract regional gray matter density, glucose hypometabolism, and amyloid load from a brain parcellation. The z-scores of gray matter atrophy (*z*^{atr}), glucose hypometabolism (*z*^{hmet}), and amyloid burden (*z*^{amy}) are computed using the measures obtained by this pre-processing step. The clinical z-score *z*^{cli} is derived from neuro-psychological scores: ADAS11, MMSE, FAQ, RAVLT immediate, and RAVLT forgetting. Further details about experimental setup, method formulation, and data pre-processing are given in Section 4.

### 2.3. Progression model and latent relationships

We show in Figure 2 the dynamical relationships across the different z-scores estimated by the model, where direction and intensity of the arrows quantify the estimated increase of one variable with respect to the other. The scores being adimensional, they have been conveniently rescaled to the range [0,1], indicating increasing pathological levels. These relationships extend the summary statistics reported in Table 1 to a much finer temporal scale and a wider range of possible biomarker values. We observe in Figures 2A, 2B and 2C that large values of the amyloid score *z*^{amy} trigger the increase of the remaining ones: *z*^{hmet}, *z*^{atr}, and *z*^{cli}. Figure 2D shows that a large increase of the atrophy score *z*^{atr} is associated with higher hypometabolism, indicated by large values of *z*^{hmet}. Moreover, we note that high *z*^{hmet} values also contribute to an increase of *z*^{cli} (Figure 2E). Finally, Figure 2F shows that high atrophy values lead to an increase mostly along the clinical dimension *z*^{cli}. This chain of relationships is in agreement with the cascade hypothesis of AD [2, 3].

Relying on the dynamical relationships shown in Figure 2, starting from any initial set of biomarker values we can estimate the relative trajectories over time. Figure 3 (left) shows the evolution obtained by extrapolating backward and forward in time the trajectory associated with the z-scores of the AD group. The x-axis represents the years from conversion to AD, where the instant *t*=0 corresponds to the average time of diagnosis estimated for the group of MCI progressing to dementia. As observed in Figure 2 and Table 1, the amyloid score *z*^{amy} increases and saturates first, followed by the *z*^{hmet} and *z*^{atr} scores, whose progression slows down when reaching clinical conversion, while the clinical score exhibits strong acceleration in the latest progression stages. Figure 3 (right) shows the group-wise distribution of the disease severity estimated for each subject relative to the modelled long-term latent trajectories (Section 4.7). The difference in disease severity across groups is statistically significant and increases when going from healthy to pathological stages (Wilcoxon-Mann-Whitney test *p* < 0.001 for each comparison). The reliability of the estimation of disease severity was further assessed through testing on an independent cohort, and by comparison with a previously proposed disease progression modeling method from the state-of-the-art [25]. The results are provided in Section 1 of the Supplementary Material and show positive generalization results as well as a favourable comparison with the benchmark method.
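
The group comparison above relies on a Wilcoxon-Mann-Whitney test. As a minimal self-contained illustration, an exact permutation version can be computed for small toy samples; the severity values below are invented for illustration and are not the study's estimates.

```python
import itertools

def mann_whitney_u(a, b):
    """U statistic: number of pairs (x, y), x from a and y from b, with x < y
    (ties counted as 1/2)."""
    return sum(1.0 if x < y else (0.5 if x == y else 0.0)
               for x, y in itertools.product(a, b))

def exact_p_greater(a, b):
    """Exact one-sided permutation p-value for 'values in b tend to exceed a'."""
    observed = mann_whitney_u(a, b)
    pooled = list(a) + list(b)
    count = total = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        left = [pooled[i] for i in chosen]
        right = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mann_whitney_u(left, right) >= observed:
            count += 1
    return count / total

# Toy severity scores for two groups (invented values):
healthy = [0.12, 0.18, 0.25]
dementia = [0.81, 0.86, 0.95]
p = exact_p_greater(healthy, dementia)  # 1/20 = 0.05 with complete separation
```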

From the z-score trajectories of Figure 3 (left) we predict the progression of imaging and clinical measures shown in Figure 4. We observe that amyloid load globally increases and saturates early, consistent with the amyloid positive condition of the study cohort. Increases in glucose hypometabolism and gray matter atrophy are delayed with respect to amyloid, and tend to map predominantly to temporal and parietal regions. Finally, the clinical measures exhibit a non-linear pattern of change, accelerating during the latest progression stages. These dynamics are compatible with the summary measures on the raw data reported in Table 1.

### 2.4. Simulating clinical intervention

This experimental section is based on two intervention scenarios: a first one in which amyloid is lowered by 100%, and a second one in which it is reduced by 50% with respect to the estimated natural progression. In Figure 5 we show the latent z-score evolution resulting from either 100% or 50% amyloid lowering performed at time *t* = −12.5 years. According to these scenarios, intervention results in a substantial reduction of the pathological progression of atrophy, hypometabolism and clinical scores, albeit with a stronger effect in the case of total blockage.

We further estimated the resulting clinical endpoints associated with the two amyloid lowering scenarios, at increasing time points and for different sample sizes. Clinical endpoints consisted of the simulated ADAS11, MMSE, FAQ, RAVLT immediate, and RAVLT forgetting scores at the reference conversion time (*t*=0). The placebo case indicates the scenario where clinical values were computed at conversion time from the estimated natural progression shown in Figure 3. Figure 6 shows the change in statistical power depending on intervention time and sample size. For large sample sizes (1000 subjects per arm), a power greater than 0.8 can be obtained around 5 years before conversion, depending on the outcome score; in general, RAVLT forgetting exhibits a higher power than the other scores. When the sample size is lower than 100 subjects per arm, a power greater than 0.8 is reached only if intervention is performed no later than 8 years before conversion, with mild variability depending on the considered clinical score. We notice that in the case of a 50% amyloid lowering, intervention needs to be performed consistently earlier to reach the same power as a 100% amyloid lowering with the same sample size and clinical score. For instance, if we consider ADAS11 with a sample size of 100 subjects per arm, a power of 0.8 is obtained for a 100% amyloid lowering intervention performed 8 years before conversion, while in the case of a 50% amyloid lowering the equivalent effect would be obtained by intervening 10.5 years before conversion.
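
The kind of power calculation behind Figure 6 can be sketched with a standard normal approximation for a two-arm comparison. The effect sizes and standard deviations below are illustrative placeholders, not the paper's estimates; the sketch only shows why larger simulated endpoint improvements (i.e. earlier interventions) reach the 0.8 power threshold with fewer subjects.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_arm_power(effect, sd, n_per_arm, z_alpha=1.959964):
    """Approximate power of a two-sample comparison (normal approximation,
    two-sided 5% level) for a treatment-vs-placebo difference `effect` in an
    endpoint with standard deviation `sd`."""
    ncp = effect / (sd * math.sqrt(2.0 / n_per_arm))  # non-centrality
    return normal_cdf(ncp - z_alpha)

# Illustrative numbers: a larger simulated ADAS11 improvement at t=0 is
# detectable with far higher power at a fixed sample size.
p_large = two_arm_power(effect=1.5, sd=6.0, n_per_arm=1000)
p_small = two_arm_power(effect=0.5, sd=6.0, n_per_arm=1000)
```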

We provide in Table 2 the estimated improvement for each clinical score at conversion with a sample size of 100 subjects per arm for both 100% and 50% amyloid lowering depending on the intervention time. We observe that for the same intervention time, 100% amyloid lowering always results in a larger improvement of clinical endpoints compared to 50% amyloid lowering. We also note that in the case of 100% lowering, clinical endpoints obtained for intervention at *t*=-10 years correspond to typical cutoff values for inclusion into AD trials (ADAS11 = 13.4 ±6.2, MMSE= 25.8 ± 2.5, see Supplementary Table 2) [39, 40].

## 3. Discussion

We presented a framework to jointly model the progression of multi-modal imaging and clinical data, based on the estimation of the latent biomarker relationships governing AD progression. The model is designed to simulate intervention scenarios in clinical trials, and in this study we focused on assessing the effect of anti-amyloid drugs on biomarker evolution, by quantifying the effect of intervention time and drug efficacy on clinical outcomes. Our results underline the critical importance of intervention time, which should come sufficiently early in the pathological history for the effect of disease modifiers to be appreciable.

The results obtained with our model are compatible with findings reported in recent clinical studies [11, 12, 13]. For example, if we consider 500 patients per arm and perform a 100% amyloid lowering intervention for 2 years to reproduce the conditions of the recent trial of Verubecestat [12], the average improvement of MMSE predicted by our model is 0.02, falling within the 95% confidence interval measured during that study ([-0.5; 0.8]). While recent anti-amyloid trials such as [11, 12, 13] included between 500 and 1000 mild AD subjects per arm and were conducted over a period of two years at most, our analysis suggests that clinical trials performed with fewer than 1000 subjects with mild AD may be consistently under-powered. Indeed, we see in Figure 6 that with a sample size of 1000 subjects per arm and a total blockage of amyloid production, a power of 0.8 can be obtained only if intervention is performed at least 5 years before conversion.

These results quantify the crucial role of intervention time, and provide an experimental justification for testing amyloid modifying drugs in the pre-clinical stage [15, 41]. This is illustrated, for example, in Table 2, in which we notice that clinical endpoints are close to placebo even when the simulated intervention takes place up to 5 years before conversion, while stronger cognitive and functional changes happen when amyloid is lowered by 100% or 50% at least 10 years before conversion. These findings may be explained by considering that amyloid accumulates over more than a decade, and that by the time amyloid clearance occurs the pathological cascade is already entrenched [42]. Our results thus support the need to identify subjects at the pre-clinical stage, that is to say while still cognitively normal, which is a challenging task. Currently, one of the main criteria to enroll subjects into clinical trials is the presence of amyloid in the brain, and blood-based markers are considered potential candidates for identifying patients at risk for AD [43]. Moreover, recent works such as [44, 45] have proposed more complex entry criteria to constitute cohorts based on multi-modal measurements. Within this context, our model could also be used as an enrichment tool by quantifying the disease severity based on multi-modal data, as shown in Figure 3. Similarly, the method could be applied to predict the evolution of a single patient given their currently available measurements.

An additional critical aspect of anti-amyloid trials is the effect of dose exposure on the production of amyloid [16]. Currently, *β*-site amyloid precursor protein cleaving enzyme (BACE) inhibitors can suppress amyloid production by 50% to 90%. In this study we showed that lowering amyloid by 50% consistently decreases the treatment effect compared to a 100% lowering performed at the same time. For instance, if we consider a sample size of 1000 subjects per arm in the case of a 50% amyloid lowering intervention, an 80% power can be reached only if intervention is performed at least 6.5 years before conversion, instead of 5 years for a 100% amyloid lowering intervention. This ability of our model to control the rate of amyloid progression is fundamental to providing realistic simulations of anti-amyloid trials.

In Figure 2 we showed that amyloid triggers the pathological cascade affecting the other markers, confirming its dominant role in disease progression. Assuming that the data used to estimate the model are sufficient to completely depict the history of the pathology, our model can be interpreted from a causal perspective. However, we cannot exclude the existence of other mechanisms driving amyloid accumulation, which our model cannot infer from the existing data. Therefore, our findings should be considered with care, and the integration of additional biomarkers of interest will be necessary to account for multiple drivers of the disease. It is worth noting that recent works have ventured the idea of combining drugs targeting multiple mechanisms at the same time [46]. For instance, pathologists have shown tau deposition in brainstem nuclei in adolescents and children [47], and clinicians are currently investigating the pathological effect of early tau spreading on AD progression [48], raising crucial questions about its relationship with amyloid accumulation and its impact on cognitive impairment [49]. Our model could address these questions by including measures derived from Tau-PET images, and simulating scenarios of production blockage of both proteins at different rates or intervention times.

Lately, disappointing results of clinical studies have led to the hypothesis of specific treatments targeting AD sub-populations based on their genotype [50]. While in this work we describe a global progression of AD, in the future we will account for sub-trajectories due to genetic factors, such as the presence of the *ϵ*4 allele of apolipoprotein (APOE4), a major risk factor for developing AD that influences both disease onset and progression [51]. This could be done by estimating dynamical systems specific to the genetic condition of each patient. Simulating the dynamical relationships specific to genetic factors would allow evaluating the effect of APOE4 on intervention time or drug dosage. In addition, there exist numerous non-genetic aggravating factors that may also affect disease evolution, such as diabetes, obesity or smoking. Extending our model to account for panels of risk factors would ultimately allow testing personalized intervention strategies in silico. Moreover, a key aspect of clinical trials is their economic cost. Our model could be extended to help design clinical trials by optimizing intervention with respect to the available funding. Given a budget, we could simulate scenarios based on different sample sizes and trial durations, while estimating the expected cognitive outcome.

The results presented in this work are based on a model estimated solely from a subset of the ADNI cohort, and therefore they may not be fully representative of the general AD progression. Indeed, subjects included in this cohort were either amyloid-positive at baseline, or became amyloid-positive during their follow-up visits (see Section 2.1). They may therefore provide a limited representation of the pathological temporal window captured by the model. Applying the model to a cohort containing amyloid-negative subjects may provide additional insights into the overall disease history. However, this is a challenging task, as it would require identifying sub-trajectories dissociated from normal ageing [52, 53]. Beyond this specific characteristic of the cohort, there exist additional biases impacting the model estimation. For instance, the fact that gray matter atrophy becomes abnormal before glucose metabolism in Figure 4 can be explained by the generally high rate of atrophy change in some key regions in normal elders, such as the hippocampus, compared to the rate of change of FDG (see Table 1). We note that this stronger change of atrophy with respect to glucose hypometabolism can already be appreciated in the clinically healthy group. The existence of such biases can also be observed in Figure 5, in which we notice that atrophy is less affected by intervention, implying that its evolution is here largely decorrelated from amyloid burden.

## 4. Methods

### 4.1. Data acquisition and preprocessing

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. For up-to-date information, see www.adni-info.org.

We considered four types of biomarkers, related to clinical scores, gray matter atrophy, amyloid load and glucose hypometabolism, respectively denoted by *cli*, *atr*, *amy* and *hmet*. MRI images were processed following the longitudinal pipeline of Freesurfer [54] to obtain gray matter volumes in a standard anatomical space. AV45-PET and FDG-PET images were aligned to the closest MRI in time, and normalized to the cerebellum uptake. Regional gray matter density, amyloid load and glucose hypometabolism were extracted from the Desikan-Killiany parcellation [55]. We discarded white-matter, ventricular, and cerebellar regions, thus obtaining 82 regions that were averaged across hemispheres. Therefore, for a given subject, **x**^{atr}, **x**^{amy} and **x**^{hmet} are respectively 41-dimensional vectors. The variable **x**^{cli} is composed of the neuro-psychological scores ADAS11, MMSE, RAVLT immediate, RAVLT forgetting and FAQ. The dataset comprises a total of 2188 longitudinal data points. We note that the model requires all the measures to be available at baseline in order to obtain a latent representation, but it is able to handle missing data in the follow-up. Further details on the cohort are given in Section 2.1.
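
The normalization-and-averaging step described above can be sketched as follows; region names and uptake values are made-up placeholders standing in for the Desikan-Killiany measures.

```python
def regional_suvr(uptake, cerebellum_uptake):
    """uptake: {(region, hemisphere): mean PET uptake}. Returns one
    cerebellum-normalized value per region, averaged across hemispheres."""
    normalized = {key: value / cerebellum_uptake for key, value in uptake.items()}
    regions = {region for region, _ in normalized}
    return {
        region: (normalized[(region, "lh")] + normalized[(region, "rh")]) / 2.0
        for region in regions
    }

# Hypothetical uptake values for two regions:
example = {
    ("precuneus", "lh"): 1.8, ("precuneus", "rh"): 2.0,
    ("hippocampus", "lh"): 1.1, ("hippocampus", "rh"): 1.3,
}
features = regional_suvr(example, cerebellum_uptake=1.0)
# features["precuneus"] == 1.9, features["hippocampus"] == 1.2
```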

### 4.2. Data modelling

We consider observations **x**_{i}^{m}(*t*), which correspond to multivariate measures derived from *M* different modalities (e.g. clinical scores, MRI, AV45, or FDG measures) at time *t* for subject *i*. Each vector **x**_{i}^{m}(*t*) has dimension *D*_{m}. We postulate the following generative model, in which the modalities are assumed to be independently generated by a common latent representation of the data **z**_{i}(*t*):

**x**_{i}^{m}(*t*) = *μ*_{m}(**z**_{i}(*t*); *ψ*_{m}) + *ε*_{m},

where *ε*_{m} is measurement noise, while *ψ*_{m} are the parameters of the function *μ*_{m} which maps the latent state to the data space for the modality *m*. For simplicity of notation we denote **z**_{i}(*t*) by **z**(*t*). We assume that each coordinate of **z** is associated to a specific modality *m*, leading to an M-dimensional latent space. The Λ operator, which gives the value of the latent representation at a given time *t*, is defined by the solution of the following system of ODEs:

d*z*^{m}/d*t* = *k*_{m} *z*^{m}(1 − *z*^{m}) + Σ_{j≠m} *α*_{m,j} *z*^{j}.

For each coordinate, the first term of the equation enforces a sigmoidal evolution with a progression rate *k*_{m}, while the second term accounts for the relationship between modalities *m* and *j* through the parameters *α*_{m,j}. This system can be rewritten as:

d**z**/d*t* = *f*(**z**; *θ*_{ODE}) = (**Wz**) ⊙ (1 − **z**) + **Vz**,     (3)

where *θ*_{ODE} denotes the parameters of the system of ODEs, which correspond to the entries of the matrices **W** and **V**. According to Equation 3, for each initial condition **z**(0), the latent state at time *t* can be computed through integration, **z**(*t*) = Λ(**z**(0), *t*) = **z**(0) + ∫_{0}^{*t*} *f*(**z**(*τ*); *θ*_{ODE}) d*τ*.
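
A minimal numerical sketch of such a latent system, a logistic term per coordinate plus linear couplings, is given below. The integration uses explicit Euler for brevity (the paper uses a midpoint scheme, Section 4.4), and all parameter values are hypothetical, not those inferred from data.

```python
def f(z, W, V):
    """Right-hand side dz/dt = (W z) * (1 - z) + V z, component-wise."""
    M = len(z)
    Wz = [sum(W[m][j] * z[j] for j in range(M)) for m in range(M)]
    Vz = [sum(V[m][j] * z[j] for j in range(M)) for m in range(M)]
    return [Wz[m] * (1.0 - z[m]) + Vz[m] for m in range(M)]

def integrate(z0, W, V, t, dt=0.01):
    """Explicit Euler approximation of z(t) from the initial condition z(0)."""
    z = list(z0)
    for _ in range(int(round(t / dt))):
        dz = f(z, W, V)
        z = [z[m] + dt * dz[m] for m in range(len(z))]
    return z

# Two modalities: W = diag(k) holds the progression rates, V couples
# modality 0 into modality 1 (alpha_{1,0} = 0.2). Values are hypothetical.
W = [[0.5, 0.0], [0.0, 0.3]]
V = [[0.0, 0.0], [0.2, 0.0]]
z_t = integrate([0.2, 0.05], W, V, t=5.0)
```

With these toy values, modality 0 follows a logistic curve while modality 1 is accelerated by the coupling, reproducing the cascade-like behaviour described in Section 2.3.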

### 4.3. Variational inference

We rewrite *p*(**X**_{i}(*t*)|**z**_{i}(*t*), **σ**^{2}, **ψ**) as *p*(**X**_{i}(*t*)|**z**_{i}(*t*_{0}), *θ*_{ODE}, **σ**^{2}, **ψ**). Assuming independence between subjects, the marginal log-likelihood writes as:

log *p*(**X**|*θ*_{ODE}, **σ**^{2}, **ψ**) = Σ_{i} log ∫ *p*(**X**_{i}|**z**_{i}, *θ*_{ODE}, **σ**^{2}, **ψ**) *p*(**z**_{i}) d**z**_{i}.

For ease of notation, we drop the *i* index, and dependence on *t* and *t*_{0} is made implicit. Within a Bayesian framework, we wish to maximize this quantity in order to obtain a posterior distribution for the latent variable **z**. Since its derivation is generally not tractable, we resort to stochastic variational inference to tackle the optimization problem. We assume a prior for *p*(**z**), and introduce an approximate posterior distribution *q*(**z**|**X**) [56], in order to derive a lower-bound for the marginal log-likelihood:

log *p*(**X**) ≥ E_{q(**z**|**X**)}[log *p*(**X**|**z**)] − D_{KL}(*q*(**z**|**X**) ∥ *p*(**z**)),

where D_{KL} refers to the Kullback-Leibler (KL) divergence. We propose to factorize the distribution *q*(**z**|**X**) across modalities such that *q*(**z**|**X**) = ∏_{m} *q*(*z*^{m}|**x**^{m}), where each factor is a variational Gaussian approximation with moments parameterized by the functions *f* and *h*. This modality-wise encoding of the data enables the interpretation of each coordinate of **z** as a compressed representation of the corresponding modality. Moreover, the lower-bound simplifies as:

log *p*(**X**) ≥ Σ_{m} E_{q(z^{m}|**x**^{m})}[log *p*(**x**^{m}|*z*^{m})] − Σ_{m} D_{KL}(*q*(*z*^{m}|**x**^{m}) ∥ *p*(*z*^{m})).

Details about the ELBO derivation and the computation of the KL divergence are given in sections 3 and 4 of the Supplementary Material. A graphical model of the method is also provided in Supplementary Figure 3, while Supplementary Algorithm 1 details the steps to compute the ELBO.
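
The assembly of such a modality-wise ELBO could be sketched as below, with toy linear stand-ins for the encoder moments and the decoder, and a standard-normal prior for the closed-form KL term. This is an illustration of the variational structure only, not the paper's implementation.

```python
import math, random

def kl_gaussian(mu, var):
    """Closed-form KL( N(mu, var) || N(0, 1) )."""
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))

def elbo(x_by_modality, encode, decode, noise_var, n_samples=16):
    """Monte Carlo ELBO: sum over modalities of E_q[log p(x^m | z^m)] - KL."""
    total = 0.0
    for m, x in x_by_modality.items():
        mu, var = encode(m, x)  # moments of q(z^m | x^m)
        total -= kl_gaussian(mu, var)
        for _ in range(n_samples):
            z = mu + math.sqrt(var) * random.gauss(0.0, 1.0)  # reparameterization
            recon = decode(m, z)
            total += sum(-0.5 * ((xi - ri) ** 2 / noise_var
                                 + math.log(2.0 * math.pi * noise_var))
                         for xi, ri in zip(x, recon)) / n_samples
    return total

# Toy one-modality example with linear stand-ins for the encoder/decoder:
random.seed(0)
encode = lambda m, x: (sum(x) / len(x), 0.1)
decode = lambda m, z: [z, z]
fit = elbo({"amy": [0.5, 0.5]}, encode, decode, noise_var=1.0)
shifted = elbo({"amy": [5.0, 5.0]}, encode, decode, noise_var=1.0)  # large KL penalty
```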

### 4.4. Model optimization

Using the reparameterization trick [57], we can efficiently sample from the posterior distribution *q*(**z**(*t*_{0})|**X**(*t*_{0})) to approximate the expectation terms. Moreover, thanks to our choices of priors and approximations, the KL terms can be computed in closed form. In practice, we sample from *q*(**z**(*t*_{0})|**X**(*t*_{0})) to obtain a latent representation **z**(*t*_{0}) at baseline, while the follow-up points are estimated by decoding the latent time-series obtained through the integration of the ODEs of Equation 3. The model is trained by computing the total ELBO for all the subjects at all the available time points. The parameters **ψ**, *φ*^{1}, *φ*^{2}, *θ*_{ODE}, *σ* are optimized using gradient descent, which requires backpropagating through the integration operation.
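The reparameterization trick itself can be sketched in a few lines. This is a generic illustration of the idea, not the model's actual encoder: the moments below are arbitrary numbers, and the point is only that the sample is a deterministic, differentiable function of the moments plus an external noise draw.

```python
import math
import random

def reparameterized_sample(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1): the randomness is
    isolated in eps, so gradients can flow through mu and log_var [57]."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps

random.seed(0)
# Hypothetical posterior moments: mean 2.0, variance 0.25.
samples = [reparameterized_sample(2.0, math.log(0.25)) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Averaging many such samples recovers the target moments, which is what makes the Monte Carlo estimate of the expectation term in the ELBO unbiased.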

In order to enable backpropagation through the ODEs integration, we need to numerically solve the differential equation using only operations that can be differentiated. In this work, we used the Midpoint method, which follows a second-order Runge-Kutta scheme. The method consists in evaluating the derivative of the solution at (*t*_{i+1} + *t*_{i})/2, the midpoint between *t*_{i}, at which the current solution **z**(*t*_{i}) is evaluated, and the following time point *t*_{i+1}:

**z**(*t*_{i+1}) = **z**(*t*_{i}) + (*t*_{i+1} − *t*_{i}) *f*(**z**(*t*_{i}) + ((*t*_{i+1} − *t*_{i})/2) *f*(**z**(*t*_{i}))),

where *f* denotes the right-hand side of Equation 3.

Therefore, solving the system of Equation 3 on the interval [*t*_{0},…, *t*] only requires operations that can be differentiated, allowing us to compute the derivatives of the ELBO with respect to all the parameters, and to optimize them by gradient descent. Moreover, in order to control the variability of the estimated latent trajectory **z**(*t*) due to error propagation during integration, we initialized the weights of *φ*^{1} and *φ*^{2} such that the approximate posterior of the latent representation for each modality *m* at baseline followed a prescribed distribution. Finally, we also tested other ODE solvers such as Runge-Kutta 4, which gave results similar to the Midpoint method with a slower execution time, due to its more expensive approximation scheme.
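The Midpoint update can be written compactly as below. The scalar logistic equation is a hypothetical stand-in for the latent system, chosen because its exact solution is known, which lets us check the second-order behavior of the scheme.

```python
import math

def midpoint_step(f, z, dt):
    """One step of the Midpoint method (second-order Runge-Kutta):
    evaluate the derivative at the midpoint between t_i and t_{i+1}."""
    z_mid = z + 0.5 * dt * f(z)
    return z + dt * f(z_mid)

def solve(f, z0, t, dt):
    """Integrate the autonomous ODE z' = f(z) from z0 over [0, t]."""
    z = z0
    for _ in range(int(round(t / dt))):
        z = midpoint_step(f, z, dt)
    return z

# Logistic test case with known closed-form solution (illustrative).
f = lambda z: z * (1.0 - z)
exact = lambda t, z0: 1.0 / (1.0 + (1.0 / z0 - 1.0) * math.exp(-t))

err_coarse = abs(solve(f, 0.1, 5.0, 0.2) - exact(5.0, 0.1))
err_fine = abs(solve(f, 0.1, 5.0, 0.1) - exact(5.0, 0.1))
# Halving dt should shrink the error roughly fourfold (second-order scheme).
```

Every operation above is a composition of additions and multiplications of the state, which is what makes the integration differentiable end to end.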

Concerning the implementation, we trained the model using the ADAM optimizer [58] with a learning rate of 0.01. The functions *f*, *h* and *μ*_{m} were parameterized as linear transformations. The model was implemented in PyTorch [59], and we used the *torchdiffeq* package developed in [60] to backpropagate through the ODE solver.

### 4.5. Simulating the long-term progression of AD

To simulate the long-term progression of AD, we first project the AD cohort into the latent space via the encoding functions. We can subsequently follow the trajectories of these subjects backward and forward in time, in order to estimate the associated trajectory from the healthy condition to their respective pathological one. In practice, a Gaussian Mixture Model (GMM) is used to fit the empirical distribution of the AD subjects' latent projections. The number of components and the covariance type of the GMM are selected by relying on the Akaike information criterion [61]. The fitted GMM allows us to sample pathological latent representations **z**_{i}(*t*_{0}), which can be integrated forward and backward in time thanks to the estimated set of latent ODEs, to finally obtain a collection of latent trajectories **Z**(*t*) = [**z**_{1}(*t*),…, **z**_{N}(*t*)] summarising the distribution of the long-term AD evolution.
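The sampling step can be sketched as follows. This is a schematic, dependency-free version assuming the GMM has already been fitted: the weights, means, and (diagonal) standard deviations below are hypothetical placeholders, not fitted values, and each sampled point would then serve as an initial condition for the latent ODEs.

```python
import random

def sample_gmm(weights, means, stds, rng=random):
    """Draw one latent initial condition z(t0) from a fitted GMM with
    diagonal covariances (parameters here are purely illustrative)."""
    r = rng.random()
    acc = 0.0
    for w, mu, sd in zip(weights, means, stds):
        acc += w
        if r <= acc:
            return [rng.gauss(m, s) for m, s in zip(mu, sd)]
    # Guard against floating-point round-off in the cumulative weights.
    return [rng.gauss(m, s) for m, s in zip(means[-1], stds[-1])]

random.seed(1)
# Two hypothetical components over a 2-dimensional latent space.
weights = [0.6, 0.4]
means = [[0.7, 0.6], [0.4, 0.3]]
stds = [[0.05, 0.05], [0.05, 0.05]]
Z0 = [sample_gmm(weights, means, stds) for _ in range(1000)]
```

Each element of `Z0` plays the role of a **z**_{i}(*t*_{0}), to be integrated forward and backward with the estimated ODEs to build the collection **Z**(*t*).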

### 4.6. Simulating intervention

In this section we assume that we have computed the average latent progression of the disease **z**(*t*). Thanks to the modality-wise encoding of Section 4.3, each coordinate of the latent representation can be interpreted as representing a single data modality. Therefore, we propose to simulate the effect of a hypothetical intervention on the disease progression by modulating the derivative vector d**z**(*t*)/d*t* after each integration step, such that:

d*z*^{m}(*t*)/d*t* ← *γ*_{m} · d*z*^{m}(*t*)/d*t*.

The values *γ*_{m} are fixed between 0 and 1, allowing us to control the influence of the corresponding modalities on the system evolution, and to create hypothetical scenarios of evolution. For example, for a 100% (resp. 50%) amyloid lowering intervention we set *γ*_{amy} = 0 (resp. *γ*_{amy} = 0.5).
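A minimal sketch of this modulated integration is given below, under the assumption that the *γ*_{m} factor scales each coordinate's derivative at every step; the two-modality system and its rates are hypothetical, with coordinate 0 standing in for amyloid.

```python
def simulate_with_intervention(dzdt, z0, gamma, t, dt=0.01):
    """Integrate the latent ODEs while scaling each coordinate's
    derivative by gamma_m in [0, 1]; gamma_m = 0 arrests modality m."""
    z = list(z0)
    for _ in range(int(round(t / dt))):
        d = dzdt(z)
        z = [z[m] + dt * gamma[m] * d[m] for m in range(len(z))]
    return z

# Hypothetical two-modality system: amyloid (coord 0) drives coord 1.
def dzdt(z):
    return [1.0 * z[0] * (1 - z[0]),
            0.8 * z[1] * (1 - z[1]) + 0.05 * z[0]]

natural = simulate_with_intervention(dzdt, [0.1, 0.05], [1.0, 1.0], t=5.0)
arrested = simulate_with_intervention(dzdt, [0.1, 0.05], [0.0, 1.0], t=5.0)
```

Setting *γ*_{amy} = 0 freezes the amyloid coordinate at its current value, and the downstream marker in turn progresses less than under the natural evolution.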

### 4.7. Evaluating disease severity

Given an evolution **z**(*t*) describing the disease progression in the latent space, we propose to consider this trajectory as a reference, and to use it to quantify the individual disease severity of a subject **X**. This is done by estimating a time-shift *τ* defined as:

*τ* = argmin_{*t*} ‖**z**_{**X**} − **z**(*t*)‖^{2},

where **z**_{**X**} is the latent projection of subject **X** obtained through the encoding functions.

This time-shift quantifies the pathological stage of a subject with respect to the disease progression along the reference trajectory **z**(*t*). Moreover, the time-shift can still be estimated in the case of missing data modalities, by encoding only the available measures of the observed subject.
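The estimation can be sketched as a grid search, under the assumption that *τ* minimizes the squared distance between the subject's encoded coordinates and the reference trajectory, summed over the observed modalities only (missing modalities are simply skipped); the sigmoidal trajectory and time grid below are hypothetical.

```python
import math

def estimate_time_shift(z_subject, trajectory, times):
    """Return the time tau at which the reference latent state z(t) is
    closest (summed squared distance over observed modalities) to the
    subject's encoded coordinates; missing modalities are None."""
    best_tau, best_cost = None, float("inf")
    for t, z_ref in zip(times, trajectory):
        cost = sum((zs - zr) ** 2
                   for zs, zr in zip(z_subject, z_ref) if zs is not None)
        if cost < best_cost:
            best_tau, best_cost = t, cost
    return best_tau

# Hypothetical reference: two sigmoidal coordinates on a time grid 0..10.
times = [i * 0.1 for i in range(101)]
traj = [[1 / (1 + math.exp(-(t - 5))),
         1 / (1 + math.exp(-(t - 6)))] for t in times]

# Subject observed at (unknown) time 3.0, with modality 1 missing.
subject = [1 / (1 + math.exp(-(3.0 - 5))), None]
tau = estimate_time_shift(subject, traj, times)
```

Because the cost only sums over non-missing coordinates, the same routine handles fully and partially observed subjects, mirroring the property stated above.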

## Data availability

The data used in this study are available from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).

## Author information

Clément Abi Nader and Marco Lorenzi designed the method. Implementation was carried out by Clément Abi Nader. The manuscript was written by Clément Abi Nader with support from Marco Lorenzi, Nicholas Ayache, Giovanni Frisoni, and Philippe Robert.

## Competing interests

The authors declare no competing interests.

## Supplementary Material

### 1) Time-shift comparison and validation

We compared our estimated disease severity (Figure 3 in the manuscript) with the one obtained by applying the state-of-the-art monotonic Gaussian Process (GP) model of [25] (Figure 1A). While both methods estimate significant time differences when going from healthy to pathological stages, our approach captures a larger temporal variability at both earlier and later stages of the disease, as shown in Figure 1B, highlighting a stronger separability across clinical stages.

We also assessed the model on an independent testing cohort from ADNI composed of 130 NL stable, 10 NL converters, 125 MCI stable, 7 MCI converters, and 12 AD subjects, who were not necessarily amyloid positive. It is important to note that no PET-FDG data were available for these subjects. We provide in Table 1 socio-demographic and clinical information for the testing cohort across the different clinical groups. Although no FDG data were used to estimate the disease severity, we observe in Figure 2 that the method still separates the clinical stages well, coherently with the clinical status of the testing individuals.

### 2) Simulated clinical endpoints

We provide in Table 2 the estimated values for each clinical score at predicted conversion time for the normal progression case when performing the simulations presented in Section 2.4.

### 3) Lower bound

We provide here the detailed derivation to obtain the ELBO of Equation 6 in the main manuscript.

Given that:

We obtain:

### 4) KL divergence

We have that:

We use the closed-form formula to calculate the KL divergence between two normal distributions:

### 5) Graphical model

Figure 3 below provides the graphical model illustrating the method presented in Section 4.

### 6) ELBO computation

Algorithm 1 below details the steps to compute the ELBO for a given subject *i* at time *t*.

## Acknowledgments

This work has been supported by the French government, through the UCA^{JEDI} and 3IA Côte d’Azur Investments in the Future projects managed by the National Research Agency (ref. ANR-15-IDEX-01 and ANR-19-P3IA-0002), the grant AAP Santé 06 2017-260 DGA-DSH, and by the Inria Sophia-Antipolis-Méditerranée “NEF” computation cluster.

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and DOD ADNI. ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

## Footnotes

1. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].