Abstract
Brain metastases (BMs) are the most disabling metastatic site in non-small cell lung cancer, but they become visible only once sizeable. Individualized prediction of BM risk and extent is a major challenge for therapeutic decision-making. This study assesses mechanistic models of BM appearance and growth against clinical imaging data.
We implemented a quantitative computational method to confront biologically informed mathematical models with clinical data on BMs. Primary tumor growth parameters were estimated from size at diagnosis and histology. Metastatic dissemination and growth parameters were fitted to either population data of BM probability (n=183 patients) or longitudinal measurements of number and size of visible BMs (63 size measurements in two patients). Pre-clinical phases from first cancer cell to detection were estimated at 2.1-5.3 years. A model featuring dormancy was best able to describe the longitudinal data, as well as BM probability as a function of primary tumor size at diagnosis. It predicted first appearance of BMs at 14-19 months pre-diagnosis. Model-informed predictions of invisible cerebral disease burden could be used to inform therapeutic intervention.
Author summary
Management of brain metastasis in lung cancer is a major clinical challenge. This study reports on a quantitative modeling analysis of patient-specific time courses of the appearance and growth of brain metastases in lung cancer. Several biological theories were tested by confronting mechanistic mathematical models with clinical data. The best of these models, which features periods of stable metastatic size (dormancy), provides a valuable computational tool for personalized therapy, by informing on the extent of invisible cerebral metastases at various time points during therapeutic management. This information is critical for clinical interventions such as the use of whole brain radiation therapy.
Introduction
Lung cancer is the leading cause of cancer-related death worldwide1. Nearly 80% of lung cancers are non-small cell lung cancer (NSCLC), and 60% of them are diagnosed at the metastatic stage2. Brain metastases (BMs) affect more than 20% of patients with NSCLC3, 4. Despite recent advances in this field, BMs remain a major challenge as they are associated with a poor prognosis5. In addition, BMs are responsible for disabling symptoms that decrease patients’ quality of life. Lung cancer is known to be one of the most deleterious cancers in terms of BMs3, 6–9.
The purpose of the current study is to address the following biological and clinical problems using quantitative mathematical modeling: 1) What was the pre-diagnosis course of a patient presenting with NSCLC? When did the first cancer cell arise? 2) For a patient developing clinically overt BMs, when did cerebral invasion occur? Was it at an early or late stage of the disease (linear versus parallel model of metastasis10, 11)? 3) Were most of the BMs spread by the primary tumor (PT) or was metastasis-to-metastasis spread of significant importance12–14? 4) How did the BMs grow in comparison with the PT? At the same or a distinct rate? Were there dormancy periods15, 16? 5) For patients with no BM at diagnosis, what is the risk and extent (if any) of the occult disseminated disease in the brain? When will the BMs appear (if present)? 6) For patients with a limited number of BMs (typically, one to three), what therapeutic strategy should be followed, in particular regarding the use of whole-brain radiotherapy (WBRT)?
The last question is particularly relevant since, as of today, the utility of WBRT in the management of BMs from NSCLC is still controversial, due to important neuro-cognitive toxicities6, 17, 18. Several phase III trials were conducted but no firm conclusion applicable to the entire patient population could be drawn19, 20, in particular with regard to the epidermal growth factor receptor (EGFR) mutation status21, 22. This points to the need for rational tools to decide therapeutic action in a patient-specific way. Similarly, the clinical follow-up and planning of cerebral magnetic resonance imaging (MRI) could greatly benefit from individualized predictions of the probability of relapse.
To face these issues, quantitative mathematical modeling may be of considerable help, by providing new insights as well as useful numerical tools in the era of personalized medicine23, 24. However, despite numerous studies, most efforts have remained focused on mathematical models at the scale of cell populations25, and relatively few studies have focused on the metastatic process. Moreover, confrontation of the models with empirical data remains infrequent. Historically, modeling efforts in the field of metastasis began with statistical models phenomenologically describing relapse hazards26–29. On the mechanistic side, in the 1970s, Liotta et al. pioneered the development of a biologically based, low-parameterized and experimentally validated model for all the main steps of the metastatic process30. Since then, only a relatively small number of studies have addressed this topic31–40. Of specific relevance to the current work, the Iwata model31 introduced a size-structured population approach to capture the time development of a population of regional metastases from a hepatocellular carcinoma. It was further extended and studied from the mathematical and numerical points of view41–43 and confronted with animal data37–39.
Apart from notable exceptions31, 34, 44, 45, the use of mathematical modeling to interrogate clinical data remains very limited. Animal studies that capture the natural course of the disease37, 38 and the effect of therapeutic interventions in controlled environments39 provide valuable data for quantitative analysis. However, experimental procedures are tedious and often only provide access to the total metastatic burden, neglecting its size distribution into distinct metastatic tumors. The latter information requires access to imaging modalities available only on rare occasions in animal experiments38. Individual-level clinical data of metastatic development are also challenging to obtain because the precise number and size of existing lesions are not routinely reported in medical records. Moreover, patients usually receive treatment soon after diagnosis, which hampers access to the natural course of the disease. Apart from the landmark work of Iwata et al.31, no study has modeled longitudinal data of individual-level number and size of metastases. Here, we focused on BMs from NSCLC considering that: 1) they carry a worse prognosis than other metastatic locations5, 2) they are easily quantifiable using MRI, 3) due to the blood-brain barrier, BMs are often hardly reached by systemic treatments, thus calling for predictive tools and 4) integration of multiple metastatic sites would require substantial improvements of the model beyond the scope of this work.
Grounded on biological knowledge about the organism-scale dynamics of metastatic disease46, we present here a computational method for calibration (from clinical data) and simulation of: 1) the pre-diagnosis PT growth phase, 2) the BM probability as a function of diagnosis PT size and 3) different biological hypotheses of PT and BM dynamics. The resulting mechanistic model is further employed to infer clinically relevant parameters such as the time of BM appearance and number and size of invisible lesions.
Results
Pre-diagnosis natural course of lung primary tumors
We investigated two possible growth models for the natural course of lung PTs: exponential and Gompertz models. Exponential growth is the simplest model expressing uncontrolled proliferation and is often adapted to describe tumor growth kinetics during limited observation periods47. However, it has been shown that on longer timeframes (typically for volumes to increase 100 to 1000-fold), the specific growth rate of tumors decreases48, which is well captured by the Gompertz model48–50. For calibration of the models, we used primarily the data of the PT size and mean doubling time at diagnosis according to the histology of lesions, retrieved from a meta-analysis from the literature comprising a total of 388 adenocarcinomas and 377 squamous cell carcinomas47, 51 (see supplementary Table S1). For calibration of the Gompertz model (two parameters), we additionally assumed a carrying capacity of 10^12 cells10. For an adenocarcinoma with a median diameter of 35 mm at diagnosis52, we obtained a pre-diagnosis phase of 19 years in the exponential case versus 5.4 years in the Gompertz case (Figure 1). The first figure seems unrealistic in comparison with previous reports estimating the age of lung primary tumors to be 3-4 years old53, based on a different method using time to recurrence due to Collins11. The Gompertz estimate on the other hand is rather consistent with the literature range. Moreover, the resulting estimate of α0,p – which can be interpreted as the cellular proliferation rate – was realistic, generating an estimated length of the cell cycle of 24.4 hours54. We therefore concluded that this model was better adapted to describe the pre-diagnosis natural history of the PT.
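The two age estimates can be approximated with a short calculation. The sketch below is illustrative and is not the authors' code: it assumes a spherical tumor, the 10^6 cells per mm^3 conversion and the 10^12-cell carrying capacity stated in the text, the 201-day adenocarcinoma doubling time quoted in the Methods, and a closed-form Gompertz doubling-time relation derived here as an assumption. All function names are hypothetical.

```python
import math

CELLS_PER_MM3 = 1e6        # conversion stated in the Methods
CARRYING_CAPACITY = 1e12   # assumed maximal tumor size (cells)

def diameter_mm_to_cells(d_mm):
    """Number of cells in a sphere of diameter d_mm."""
    return math.pi / 6.0 * d_mm ** 3 * CELLS_PER_MM3

def exponential_age_days(s_diag, doubling_time):
    """Time to grow from 1 cell to s_diag with a constant doubling time."""
    return math.log2(s_diag) * doubling_time

def gompertz_parameters(s_diag, doubling_time, capacity=CARRYING_CAPACITY):
    """Assumed calibration: capacity = exp(alpha0/beta) and the time to go
    from s_diag to 2*s_diag equals the observed doubling time."""
    log_ratio = math.log(capacity / s_diag)
    beta = -math.log(1.0 - math.log(2.0) / log_ratio) / doubling_time
    alpha0 = beta * math.log(capacity)
    return alpha0, beta

def gompertz_age_days(s_diag, alpha0, beta):
    """Time for a Gompertz tumor started at 1 cell to reach s_diag."""
    return -math.log(1.0 - beta / alpha0 * math.log(s_diag)) / beta

s_diag = diameter_mm_to_cells(35.0)            # median adenocarcinoma diameter
alpha0, beta = gompertz_parameters(s_diag, 201.0)
age_exp_years = exponential_age_days(s_diag, 201.0) / 365.25
age_gomp_years = gompertz_age_days(s_diag, alpha0, beta) / 365.25
print(f"exponential age: {age_exp_years:.1f} years")
print(f"Gompertz age:    {age_gomp_years:.1f} years")
```

Under these assumptions the exponential age comes out near 19 years and the Gompertz age near 5.4 years, in line with the figures quoted above.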
Population-level probability of brain metastasis occurrence as a function of primary tumor size can be described by a mechanistic computational model of metastasis
PT size is a major predictive factor of BM in NSCLC52. We extracted data from the literature about the quantitative relationship between PT size at diagnosis and BM probability52, focusing on adenocarcinoma (n = 136) and squamous cell carcinoma (n = 47) because these are the two histological types of the patient data used below. Following a method previously employed for breast cancer metastatic relapse39, 55, the dissemination law we considered (a power law of the PT size, see equation (2)) – combined with the estimated preclinical Gompertz growth of the PT – was able to adequately fit the data, in both histological types (Figure 2). Of note, this result was obtained with a minimal number of parameters to describe the inter-patient heterogeneity in metastatic potential. Namely, apart from the PT size, a population distribution on only the parameter μ was sufficient to describe the data. Consequently, this parameter had a very large coefficient of variation (> 6,000%). The median value μpop gives a quantitative way to measure the reported higher BM aggressiveness of adenocarcinomas over squamous cell carcinomas52, and we found a difference of about two orders of magnitude.
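The structure of this population-level calculation can be sketched as follows. The snippet assumes that the probability of BM is the probability that the expected number of metastases at diagnosis, N(Td) = μ ∫ Sp(t)^γ dt, exceeds one, with ln μ normally distributed around ln μpop. The Gompertz parameters and the value of σ are placeholders (σ is chosen so that the lognormal coefficient of variation is about 6,000%), and the function names are hypothetical. The output is not meant to reproduce Figure 2.

```python
import math

ALPHA0_P, BETA_P = 0.0277, 0.001003  # placeholder PT Gompertz parameters (1/day)
MU_POP = 1.39e-11                    # median mu quoted in the text
SIGMA = 2.86                         # hypothetical std of ln(mu)
GAMMA = 1.0                          # dissemination exponent used in the text

def cells_from_diameter(d_mm):
    """Spherical tumor, 1 mm^3 = 1e6 cells."""
    return math.pi / 6.0 * d_mm ** 3 * 1e6

def gompertz_size(t):
    """PT size (cells) at time t (days) after the first cancer cell."""
    return math.exp(ALPHA0_P / BETA_P * (1.0 - math.exp(-BETA_P * t)))

def age_at_size(s):
    """Time needed to reach size s from one cell."""
    return -math.log(1.0 - BETA_P / ALPHA0_P * math.log(s)) / BETA_P

def dissemination_integral(s_diag, n_steps=5000):
    """Trapezoidal approximation of the integral of Sp(t)^gamma on [0, Td]."""
    t_diag = age_at_size(s_diag)
    h = t_diag / n_steps
    total = 0.5 * (gompertz_size(0.0) ** GAMMA + gompertz_size(t_diag) ** GAMMA)
    for i in range(1, n_steps):
        total += gompertz_size(i * h) ** GAMMA
    return total * h

def bm_probability(d_mm, mu_pop=MU_POP, sigma=SIGMA):
    """P(mu * integral > 1) when ln(mu) ~ Normal(ln(mu_pop), sigma)."""
    integral = dissemination_integral(cells_from_diameter(d_mm))
    z = (-math.log(integral) - math.log(mu_pop)) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the normal law

diameters = [10, 20, 30, 40, 50, 60, 70]
probabilities = [bm_probability(d) for d in diameters]
for d, p in zip(diameters, probabilities):
    print(f"PT diameter {d} mm: P(BM) = {p:.3f}")
```

Because only μ varies between patients in this sketch, the probability depends on PT size solely through the dissemination integral, which is why a single distribution parameter can shape the whole curve.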
A benchmark of biological scenarios against individual longitudinal data of metastatic number and size suggests dormancy
Growth kinetics
We further used our modeling framework – designed to simulate the dynamics of BM appearance and growth (see Methods) – to interrogate patient-specific longitudinal data retrieved from clinical imaging during post-diagnosis follow-up. These consisted of 10 and 11 PT measurements and 47 and 16 BM measurements in two patients (one with an adenocarcinoma, the other with a squamous cell carcinoma). We first analyzed the adenocarcinoma patient to develop the model and used the second patient to validate our findings.
The PT first responded to systemic therapy (EGFR tyrosine kinase inhibitors and chemotherapy) before slowly regrowing (Figure 3A). However, a first distant BM was detected 20 months after diagnosis, which kept growing uncontrolled (Figure 3B). Other BMs appeared during follow-up (Figure 3C), reaching a total of 20 BMs at 48 months, date of last examination (Figure 3D-E).
To model the effect of systemic treatment on the PT, we found that a tumor growth inhibition model56 (equation (1)) was able to adequately fit the data (Figure 3A).
Interestingly, for BM kinetics, we found that the Gompertz model with parameters calibrated by the method explained above (i.e. exploiting only the PT size at diagnosis and its histology), when started from the initial BM conditions, predicted growth curves that matched the observed data surprisingly well (Figure 3B). This indicated that: 1) for this patient, the BMs did not respond to PT therapy, at least during the observed phase, 2) it is reasonable to assume that all BMs grow at the same rate, 3) the BM growth rate might be similar to the PT growth rate, at least during the clinically overt phase and 4) this growth rate is reasonably well estimated by the method we proposed.
Quantitative assessment of five theories of metastatic dissemination and colonization
However, this mere description of BM growth is not satisfactory as a model of systemic disease, since the dissemination part is absent. In particular, no model is given for the appearance times of the BMs. To include the dissemination component of the metastatic process, we relied on a modeling framework first initiated by Iwata et al.31 (see Methods). It consists of describing the population of BMs by means of a time-dependent size-structured density. The relevant quantity to be compared with the model is then the BM cumulative size distribution (Figure 3E).
We first asked whether an elementary base model was able to reproduce the data. It consisted mostly in the assumptions of 1) same growth parameters for the PT and the BMs and 2) no secondary dissemination (i.e. BMs spread only by the PT). The best fit of this model was inaccurate (Figure S1), suggesting that more intricate phenomena were at play. Therefore, we improved the base model into four more intricate scenarios (Figure 4) to be tested against the data: 1) secondary dissemination, i.e. the ability of BMs to spread BMs themselves31, 2) a delay before initiation of metastatic ability of the PT (the so-called linear model in which dissemination occurs at a late stage, opposed to the parallel model where dissemination is an early event10), 3) differential growth, i.e. different growth parameters for the PT and the BMs and 4) dormancy, i.e. the ability of disseminated cells to survive as single cells or as a small size bulk for a given period before resuming growth15, 16, 57.
The models exhibited differential descriptive power, as quantified by the best-fit value of the objective function (Table 1). Interestingly, inclusion of secondary dissemination yielded results similar to those of the base model (Figure S2), suggesting that if this process does occur in reality, then it does not significantly affect the time course of visible BMs. Indeed, under this model, even at the last time point (48 months post-diagnosis), there were only 12 second-generation BMs representing a small total burden of 18,700 cells with the largest BM comprising 16,400 cells (≃ 0.32 mm).
Adding to the model a delay before initiation of BM spreading significantly improved the fit (Figure 5C), with a very large inferred value of the delay td (4.8 years after the first cancer cell, i.e. 6 months before diagnosis). However, the improvement was not as good as for the two remaining models, in particular on the cumulative size distribution (Figure S3A). Consideration of different growth parameters between the PT and the BMs led to a significant improvement of the goodness-of-fit for both the dynamics of the number of visible BMs and cumulative size distributions while not deteriorating the practical identifiability of the parameters (Figure S4). Of note, the estimated BM growth parameters remained close to the PT ones. The dormancy model also achieved accurate goodness-of-fit, with an estimated dormancy time of τ = 133 days (Figure 5 and Table 2). Together, these two models were the best to describe both the dynamics of BM appearance (Figure 5A-B) and their size distribution (Figure 5C). Given the previous observation that during the clinical phase the BMs grew at a rate consistent with the one predicted for the PT, combined with the facts that the optimal objective value was achieved for the dormancy model and that it is more parsimonious (one parameter fewer), we selected the latter for further predictions. It should be noted however that despite a good overall description of the time dynamics of cumulative size distributions, the size of the largest metastasis was underestimated (Figure 5C). Clinically relevant inferences about the actual time of appearance of the BMs were similar.
Collectively, our methods allowed a quantitative comparison of several theories of metastatic development, suggesting rejection of the base, secondary and delay models alone while the dormancy and different growth models were able to explain the data.
Clinically relevant simulations of the disease course reveal time of brain metastases onset
From the quantitative calibration of the dormancy model to the data, several inferences of clinical interest can be made. The value of γ that generated the best fit was 1, suggesting equal probability for all the cells within the PT to disseminate. In turn, this can be the sign of a well-vascularized tumor, which might be amenable to anti-angiogenic therapy. Interestingly, the value of μ inferred from this patient-level data was in the same range as the one inferred from the above population analysis (μ = 2 × 10^−12 versus μpop = 1.39 × 10^−11, see Table 2), giving further support to our population approach. All parameters were estimated with excellent precision (standard errors < 5%, Table 2).
Once calibrated from the data, our model allowed us to simulate the predicted natural history of the disease. The supplementary movie S1 shows a simulation of the PT growth together with the appearance of the entire population of BMs (visible + invisible). Stars indicate tumors that are present but invisible (< 5 mm), and the BM size distribution time course is also simulated. BMs represented in gray were born before the diagnosis, while BMs in white are the ones born after.
Apart from the age of the PT, birth times and invisible BM burden could be predicted at any time point. Interestingly, we found that the first BM – which clinically appeared 19 months after diagnosis – was already present 14 months before clinical detection of the PT (Figure 6A-B). In fact, at the time of diagnosis, our model predicted the presence of occult BMs, representing a total burden of 1,167 cells mostly concentrated in the largest (first) BM (size 1,046 cells ≃ 0.126 mm), see Figure 6C. Notably, when the PT was at the size of the visibility threshold, no metastasis was predicted to have occurred yet. This suggests that if the disease had been detected through systematic screening, BM occurrence might have been prevented provided the tumor had been operable at this time. Of interest to radiotherapists, the number of predicted BMs present at the time of appearance of the first BM was already 28 tumors, the largest one being 2.79 mm in diameter, with an overall BM burden of 1.7 × 10C cells. Therefore, provided that neuro-cognitive risks would be acceptable, the model would recommend pan-cerebral intervention rather than localized intervention only.
Together, these results demonstrate the clinical utility of the model for prediction of the invisible BM in order to inform therapeutic decision.
Validation in a second patient
To test whether the dormancy model was generalizable, we used data from a second patient, which was not employed during the model development phase. Given the different histological type of the lung PT (squamous cell carcinoma), we adapted the doubling time accordingly (Table S1) and found a younger age of the PT of 2.1 years. Estimation of the log-kill parameter from the TGI PT model during therapy suffered from lack of identifiability, due to an estimated short duration of treatment effectiveness (see parameter k in Table 2).
Response of the PT to therapy was indeed characterized by a faster relapse growth rate (α1) as compared to the first patient. The qualitative structure of the model confirmed its descriptive power by giving an accurate description of both the treated PT and BM dynamics (Figure S5-6), while the “base” and “secondary dissemination” models did not. Several parameters appeared to be patient-specific, such as the dormancy duration τ, estimated at 171 days. Due to the lower number of data points available for this patient, parameter identifiability was worse, but still acceptable for the dissemination and BM growth model (Table 2).
Resulting clinical predictions were distinct (Figure S6), emphasizing the patient-specific nature of BM dynamics. The first BM occurred 14 months after PT onset, but was predicted to have been disseminated 45 months prior to diagnosis. While for both patients cerebral dissemination had already occurred at the time of diagnosis, its extent was different, with a much larger mass (>1,000 cells) for the first patient than for the second (8 cells). This is due to the long period of dormancy for patient 2, resulting in all 8 BMs being still dormant at the time of diagnosis. Whereas the first BM appeared sooner in the second patient, our model was able to describe the lower mass of cells thanks to the dormancy parameter.
Discussion
Using both population- and individual-level data of BM development in NSCLC, we have developed a general method based on biologically grounded computational models that allows: 1) to infer the disease age from the PT size and histology, 2) to test multiple scenarios of metastatic dissemination and colonization against macroscopic data available in the clinic (which suggested the presence of prolonged periods of dormancy), and 3) to infer times of BM initiation and the number and size of invisible lesions.
Estimation of the duration of the preclinical PT growth has important implications in terms of prediction of BMs, since BM is more likely to have occurred for an old PT compared to a young one. Our results showed a significant difference depending on whether exponential or Gompertz growth was considered, which was consistent with previous findings estimating similar unrealistically long pre-diagnosis periods under exponential growth (up to 54 years for tumors detected during screening programs51). Of note, our age estimates of 5.3 and 2.1 years are in relative agreement with the 3-4 years range found by others relying on a different method53.
Importantly, we were then able to leverage this description of the natural PT growth into a mechanistic model able to describe the probability of BM occurrence. Our model then provided a quantitative theory for the reported differences in BM between adenocarcinomas and squamous cell carcinomas in terms of a difference in the cell-scale dissemination parameter μ.
More generally, we believe that our computational platform provides a way to translate biological findings into clinically useful numerical tools. However, in order to provide robust inference, the complexity of the model had to remain commensurate with the available data. Thus, several higher order phenomena relevant to the metastatic process were ignored or aggregated into mesoscopic parameters. The metastatic potential μ for instance is the product of several cell-scale probabilities relating to the multiple steps of the metastatic cascade58. Interestingly, the median values inferred from a population analysis based on probabilities of BM (μpop = 1.39 × 10^−11 and 1.76 × 10^−13) matched the ones that were found from analysis of patient-specific data (μ = 2 × 10^−12 and 1.02 × 10^−12), given their variability. The higher metastatic ability found in patient 1 could be due to histology or to the EGFR mutational status, known to impact BM aggressiveness3, 22, 59, 60. Interestingly, while relying on distinct modeling techniques, the value we obtained is in the biologically realistic range derived by others using stochastic evolutionary modeling34, 44.
The value of γ in the dissemination law (2) is an open debate in the literature. Some use γ = 2/3, arguing that cells able to leave are located at the surface of the PT31, 37. Others found that small values of γ close to zero were most appropriate, using this finding as support for the cancer stem cells theory (constant amount of metastatic susceptible cells within the PT regardless of its size)61. We have previously demonstrated that the value of γ was not identifiable from longitudinal data of total metastatic burden39. Unfortunately, we reached the same conclusion from the data available in the present study, and used γ = 1 based on a parsimony argument, as done by others who do not consider the spatial distribution of metastatically-able cells34, 44. Further insights about the spatial distribution of metastatic clones might emerge from bulk sequencing studies of combined PT and metastases12.
Second-order phenomena that were ignored here for the sake of identifiability but could nevertheless affect systemic dynamics include tumor-tumor interactions, either through soluble circulating factors62 or by exchange of tumor cells between established lesions12, 63. We have recently proposed a model for distant interactions that was validated in a two-tumor experimental system and could be incorporated into the current modeling platform64, 65.
While not problematic for the two patients that were investigated here since BMs did not respond to the systemic treatment – possibly because of the blood-brain barrier hampering delivery of the anti-cancer agents – a major limitation of our model is that the effect of therapy (other than on the PT or surgery) is not included. We intend to address this in future work, in particular for optimizing and personalizing the design of combination therapies24, 66, 67. Moreover, the current study needs to be extended to a larger number of patients to decipher general trends in BM patterns of NSCLC patients.
In order to translate our findings into a clinically usable tool, further methods need to be developed to calibrate the small number of parameters of the model from data already available at diagnosis or at the first BM occurrence. In this respect, in addition to routine clinico-demographic features, molecular gene expression signatures68 (for parameter μ, for instance) as well as radiomics features predictive of metastatic relapse69 (for parameter γ) might represent valuable resources.
Materials and methods
Patient data
The data used in this study concerned patients with non-small cell lung cancer (NSCLC) and were of two distinct natures: 1) population data of probability of BM as a function of PT size retrieved from the literature52 and 2) longitudinal measurements of PT and BM diameters in two patients with NSCLC retrieved from imaging data (CT scans for lung lesions, MRI for brain tumors). Both patients had unresectable PT at diagnosis. The first patient (used for model development) was drawn from an EGFR mutated cohort from Institut Bergonié (Bordeaux, France). The second patient (used for model validation) had an EGFR wild type squamous cell lung carcinoma and was drawn from routinely treated patients in the thoracic oncology service of the University Hospital of Marseille. The data comprised 10 PT sizes and 47 BM sizes (spanning 6 time points) for the first patient and 11 PT sizes and 16 BM sizes (spanning 4 time points) for the second patient.
Data use from the Marseille patient was approved by a national ethics committee (International Review Board of the French learned society for respiratory medicine, reference number 2015-041), according to French law. The data from the Bordeaux patient were collected during a retrospective study occurring prior to the 2016 Jardé act requiring formal approval by an ethics committee. Their use was nevertheless approved by the internal Bergonié research college. The data were analyzed anonymously.
Mathematical modeling of primary tumor growth and metastatic development
Primary tumor growth
The pre-diagnosis natural history of the primary tumor size Sp(t) for times t < Td (diagnosis time) – expressed in number of cells – was assumed to follow the Gompertz growth model49, 50, i.e.
$$S_p(t) = \exp\left(\frac{\alpha_{0,p}}{\beta_p}\left(1 - e^{-\beta_p t}\right)\right)$$
where time t = 0 corresponds to the first cancer cell, parameter α0,p is the specific growth rate at this time and βp is the exponential rate of decrease of the specific growth rate. Conversions from diameter measurements to number of cells were performed assuming spherical shape and the classical assumption 1 mm^3 = 10^6 cells70. After treatment start (Td), the primary tumor size was assumed to follow a tumor growth inhibition model56 consisting of: 1) exponential growth (rate α1), 2) log-kill effect of the therapy (efficacy parameter k)71 and 3) exponential decrease of the treatment effect due to resistance, with half-life tres. The equation is:
$$\frac{dS_p}{dt} = \left(\alpha_1 - k\, e^{-\frac{\ln 2}{t_{res}}(t - T_d)}\right) S_p, \quad t > T_d \tag{1}$$
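To illustrate how a tumor growth inhibition model of this kind behaves, the sketch below evaluates the closed-form solution of one ODE combining the three listed ingredients, dS/dt = (α1 − k·exp(−ln 2·(t − Td)/tres))·S. This specific form is an assumption made for illustration, the parameter values are hypothetical rather than fitted, and the function names are not from the original work.

```python
import math

def tgi_size(t, s_start, alpha1, k, t_res):
    """Tumor size t days after treatment start for the assumed model
    dS/dt = (alpha1 - k * exp(-ln2 * t / t_res)) * S, with S(0) = s_start."""
    decay = math.log(2.0) / t_res
    exponent = alpha1 * t - (k / decay) * (1.0 - math.exp(-decay * t))
    return s_start * math.exp(exponent)

def nadir_time(alpha1, k, t_res):
    """Time at which the net growth rate changes sign (requires k > alpha1)."""
    return t_res / math.log(2.0) * math.log(k / alpha1)

# Hypothetical parameter values, for illustration only
S_START = 2.2e10   # cells at treatment start
ALPHA1 = 0.002     # regrowth rate (1/day)
K_KILL = 0.02      # initial log-kill rate (1/day)
T_RES = 60.0       # half-life of the treatment effect (days)

t_min = nadir_time(ALPHA1, K_KILL, T_RES)
print(f"nadir reached about {t_min:.0f} days after treatment start")
for day in (0, 100, 200, 400, 800, 1600):
    size = tgi_size(day, S_START, ALPHA1, K_KILL, T_RES)
    print(f"day {day:5d}: {size:.3e} cells")
```

The trajectory first decreases while the kill term dominates, then regrows at a rate approaching α1 once the treatment effect has decayed, which is the response-then-relapse pattern described for the PT.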
Metastatic development
Base model
The general modeling framework we employed was derived from the work of Iwata et al.31. It consists of modeling the population of metastases by means of a size-structured density ρ(t,s), which makes it possible to distinguish between visible and invisible tumors. Metastatic development of the disease is reduced to two main phases: dissemination and colonization72. The multiple steps of the metastatic cascade58 are aggregated into a dissemination rate with expression:
$$d(S_p) = \mu S_p^{\gamma} \tag{2}$$
which corresponds to the number of successfully born BMs per unit of time. In this expression, the geometric parameter γ corresponds to the intra-tumor distribution of the cells susceptible to yield metastasis and μ is the per day per cell probability for such cells to overcome all the steps of the metastatic cascade (acquisition of metastatic-specific mutations, epithelial-to-mesenchymal transition, invasion of the surrounding tissue, intravasation, survival in transit, extravasation and survival in the brain). For γ = 1 all cells in the PT have equal probability to give a BM whereas a value of γ = 0 indicates a constant pool of cells having metastatic ability (cancer stem cells). Intermediate values 0 < γ < 1 can be interpreted as the geometric disposition of the metastatic-able cells, including the surface of the tumor (γ = 2/3) or a fractional dimension linked to the fractal nature of the tumor vasculature73. Assuming further that the growth of the metastases follows a Gompertzian growth rate
$$g(s) = \left(\alpha_0 - \beta \ln s\right) s$$
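with growth parameters α0 and β possibly equal (base model) or distinct (different growth model) compared to the PT ones, the density ρ satisfies the following transport partial differential equation31:
$$\begin{cases} \partial_t \rho(t,s) + \partial_s\big(\rho(t,s)\, g(s)\big) = 0, & t > 0,\ s > S_0 \\ g(S_0)\,\rho(t,S_0) = d(S_p(t)), & t > 0 \\ \rho(0,s) = 0, & s > S_0 \end{cases} \tag{3}$$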
where S0 is the size of a BM at birth (here assumed to be one cell). From the solution to this equation, the main quantity of interest for comparison to the empirical data is the number of metastases larger than a given size s (cumulative size distribution):
$$f(t,s) = \int_s^{+\infty} \rho(t,s')\, ds'$$
The total number of metastases – denoted N(t) – is obtained by using s = S0 above and its expression can be directly computed without solving the entire problem (3) as it is given by:
$$N(t) = \int_0^t d(S_p(t'))\, dt'$$
Using the method of characteristics, one can derive the following relationship between N and f:
$$f(t,s) = N(t - t(s)) \tag{4}$$
where t(s) is the time for a tumor growing at rate g to reach the size s. In the case of Gompertz growth one has:
$$t(s) = -\frac{1}{\beta} \ln\left(1 - \frac{\beta}{\alpha_0} \ln s\right)$$
Of particular interest is the number of visible BMs f(t, Svis), with Svis the minimal visible size at CT scan, taken here to be 5 mm in diameter.
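A minimal numerical sketch of the number of visible BMs under the base model is given below. It assumes the relationship f(t, s) = N(t − t(s)) with N the time integral of the dissemination rate μ·Sp^γ, an untreated Gompertz PT, and identical growth parameters for the PT and the BMs. The parameter values are placeholders (μ is the order of magnitude quoted for the first patient) and the function names are hypothetical.

```python
import math

ALPHA0_P, BETA_P = 0.0277, 0.001003  # placeholder PT Gompertz parameters (1/day)
ALPHA0, BETA = ALPHA0_P, BETA_P      # base model: BMs grow like the PT
MU, GAMMA = 2e-12, 1.0               # dissemination parameters (illustrative)
S_VIS = math.pi / 6.0 * 5.0 ** 3 * 1e6  # cells in a 5 mm sphere

def gompertz_size(t, alpha0, beta):
    """Size (cells) at time t of a Gompertz tumor started from one cell."""
    return math.exp(alpha0 / beta * (1.0 - math.exp(-beta * t)))

def time_to_size(s, alpha0=ALPHA0, beta=BETA):
    """Time t(s) needed by a BM born at one cell to reach size s."""
    return -math.log(1.0 - beta / alpha0 * math.log(s)) / beta

def total_metastases(t, n_steps=4000):
    """N(t): trapezoidal integral of mu * Sp(u)^gamma over [0, t]."""
    if t <= 0.0:
        return 0.0
    h = t / n_steps
    def rate(u):
        return MU * gompertz_size(u, ALPHA0_P, BETA_P) ** GAMMA
    total = 0.5 * (rate(0.0) + rate(t))
    total += sum(rate(i * h) for i in range(1, n_steps))
    return total * h

def cumulative_size_distribution(t, s):
    """f(t, s): expected number of BMs larger than s at time t."""
    return total_metastases(t - time_to_size(s))

for day in (1000, 2000, 2500, 3000):
    visible = cumulative_size_distribution(day, S_VIS)
    print(f"day {day}: total {total_metastases(day):8.2f}, visible {visible:8.4f}")
```

The gap between the total and visible counts at a given day is the occult burden: every BM born less than t(Svis) days earlier exists but is still below the detection threshold.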
Delay model
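A delay t0 before onset of metastatic dissemination can be taken into account in the model by remarking that, for t > t0, the total number of metastases becomes
$$N_{delay}(t) = \int_{t_0}^{t} d(S_p(t'))\, dt' = N(t) - N(t_0)$$
Thus
$$f(t,s) = \begin{cases} N(t - t(s)) - N(t_0), & t - t(s) > t_0 \\ 0, & \text{otherwise} \end{cases}$$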
Dormancy model
For inclusion of dormancy in the model – defined as a period of duration τ during which a newborn metastasis remains at size S0 – the time to reach any given size s > S0 becomes tdorm(s) = t(s) + τ. The cumulative size distribution is then given by:
$$f(t,s) = N(t - t(s) - \tau)$$
Secondary dissemination
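In the previous model formulations, all BMs were assumed to have been seeded by the primary tumor. When BMs are also allowed to spread metastases themselves, this results in a second term in the boundary condition of (3) and the model becomes31:
$$\begin{cases} \partial_t \rho(t,s) + \partial_s\big(\rho(t,s)\, g(s)\big) = 0, & t > 0,\ s > S_0 \\ g(S_0)\,\rho(t,S_0) = d(S_p(t)) + \int_{S_0}^{+\infty} d(s)\,\rho(t,s)\, ds, & t > 0 \\ \rho(0,s) = 0, & s > S_0 \end{cases}$$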
In this case, formula (4) is not valid anymore, which complicates substantially the computation of the cumulative size distribution. A dedicated scheme based on the method of characteristics was employed43.
Discrete versions of the models
Continuous versions of the models were used for fitting to the data because they keep computations tractable, whereas discrete versions were implemented for forward calculations because of the small number of BMs. Briefly, in the base model case, the appearance time Ti of the i-th BM is defined by
The size of the i-th BM si(t) is then defined, for t > Ti, by:
For links between stochastic (Poisson process) and continuous versions of the Iwata model, the reader is referred to74.
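A possible reading of the discrete base model is sketched below. It assumes that Ti solves N(Ti) = i, with N the cumulative number of BMs emitted by the PT, and that the i-th BM then follows Gompertz growth from one cell for t − Ti days. Both assumptions, as well as the function names and tolerances, are illustrative rather than taken from the original implementation.

```python
import math


def gompertz_size(t, alpha0, beta, s0=1.0):
    """Size (cells) after t days of Gompertz growth started from s0."""
    return s0 * math.exp(alpha0 / beta * (1.0 - math.exp(-beta * t)))


def total_number(t, mu, gamma, alpha0_p, beta_p, steps=400):
    """N(t): integral over [0, t] of mu * Sp(u)**gamma (composite Simpson rule)."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    acc = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        acc += weight * gompertz_size(k * h, alpha0_p, beta_p) ** gamma
    return mu * acc * h / 3.0


def appearance_time(i, t_max, mu, gamma, alpha0_p, beta_p, tol=1e-3):
    """Bisection for T_i with N(T_i) = i; None if the i-th BM is not born by t_max."""
    if total_number(t_max, mu, gamma, alpha0_p, beta_p) < i:
        return None
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_number(mid, mu, gamma, alpha0_p, beta_p) < i:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bm_sizes(t, mu, gamma, alpha0_p, beta_p, alpha0, beta):
    """Sizes (cells) at time t of all BMs born so far, oldest first."""
    sizes = []
    i = 1
    while True:
        t_i = appearance_time(i, t, mu, gamma, alpha0_p, beta_p)
        if t_i is None:
            break
        sizes.append(gompertz_size(t - t_i, alpha0, beta))
        i += 1
    return sizes
```

Because the appearance times are increasing in i, the returned sizes are sorted from largest to smallest, which matches the sorted sizes used when comparing the model to individual data.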
Models’ fits and parameters estimation
Parameters calibration for the primary tumor growth
To parameterize the Gompertz function defining the PT growth, two parameters need to be defined (α0,p and βp). In the absence of longitudinal measurements of the PT size without treatment, these two parameters were determined from two considerations: 1) the maximal reachable size of a human tumor (carrying capacity, equal to exp(α0,p/βp) for growth started from one cell) is 10^12 cells10, 29 and 2) the histology-dependent value of the doubling time at diagnosis, retrieved from a meta-analysis of published literature on the natural growth of lung PTs (see Table S1, extended from47, 51). The latter yielded values of 201 days for an adenocarcinoma and 104 days for an undifferentiated carcinoma. For the Gompertz model, the doubling time is size-dependent and its value at the PT diagnosis size, DT(Sd), is given by:
Using the formula linking α0,p and βp to the carrying capacity, this nonlinear equation was solved numerically.
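The calibration described above can be illustrated by the following sketch. It assumes a Gompertz law started from one cell, so that the carrying capacity is exp(α0,p/βp), and it takes the doubling time at diagnosis to be the time needed to grow from Sd to 2·Sd. The function names, the bisection bounds and the example diagnosis size are hypothetical.

```python
import math

K = 1e12  # carrying capacity in cells (value quoted in the text)


def doubling_time(s, alpha0, beta):
    """Days for a Gompertz tumor of size s (cells) to reach 2*s; requires 2*s < K."""
    log_k = alpha0 / beta  # log carrying capacity when growth starts from one cell
    return -math.log((log_k - math.log(2.0 * s)) / (log_k - math.log(s))) / beta


def calibrate_pt(s_diag, dt_diag, lo=1e-8, hi=1.0, iters=200):
    """Bisection on beta, with alpha0 = beta * ln(K), so that DT(s_diag) = dt_diag."""
    log_k = math.log(K)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if doubling_time(s_diag, mid * log_k, mid) > dt_diag:
            lo = mid  # doubling is too slow: beta must be larger
        else:
            hi = mid
    beta = math.sqrt(lo * hi)
    return beta * log_k, beta


# Example: adenocarcinoma doubling time (201 days) at a hypothetical
# diagnosis size of 1.4e10 cells.
alpha0_p, beta_p = calibrate_pt(1.4e10, 201.0)
```

Under these particular assumptions the equation also admits the closed-form solution βp = −ln(1 − ln 2 / ln(K/Sd)) / DT(Sd); the bisection is shown only to mirror the numerical approach mentioned in the text.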
Population level: probability of BM apparition
To fit the data from52 describing the probability of BM in a population of lung adenocarcinoma patients, we employed a previously described methodology39, 55. Briefly, we considered that the probability of developing BM after diagnosis was the probability of already having BM at diagnosis, i.e. ℙ(N(Td)>1). We fixed the values of the PT growth parameters as described above from the cohort histology (adenocarcinoma) and set γ = 1 as the simplest dissemination model. The inter-individual variability was then minimally modeled as resulting from a lognormal population distribution of the parameter μ (ln μ ~ N(ln(μpop), μσ)). Uniform distributions of the PT diameters were assumed within each interval (Si, Si+1) of the PT sizes S1, …, S6 given as data. The probability of developing a metastasis with a PT size S ∊ (Si, Si+1) reads:
The best-fit of these probabilities – evaluated by Monte Carlo simulations – to the empirical data was then determined by least squares minimization performed using the function fminsearch of Matlab (Nelder-Mead algorithm)75.
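A minimal Monte Carlo sketch of the probability evaluated in this procedure is given below. It assumes γ = 1, a Gompertz PT started from one cell with carrying capacity 10^12 cells, a doubling time at diagnosis of 201 days, a conversion of 10^6 cells per mm^3, and it interprets the second parameter of the lognormal distribution as a standard deviation. The diameter intervals, parameter values and function names are hypothetical, and the least-squares fit of (μpop, μσ) is not shown.

```python
import math
import random

LOG_K = math.log(1e12)  # log carrying capacity (cells), value quoted in the text


def cells_from_diameter(d_mm, cells_per_mm3=1e6):
    """Spherical volume converted to cells (assumed density: 1e6 cells per mm^3)."""
    return math.pi / 6.0 * d_mm ** 3 * cells_per_mm3


def pt_beta(s_diag, dt_diag):
    """Gompertz beta giving doubling time dt_diag (days) at size s_diag (cells)."""
    return -math.log(1.0 - math.log(2.0) / (LOG_K - math.log(s_diag))) / dt_diag


def integral_pt_size(s_diag, beta, steps=200):
    """Integral of Sp(t) dt from the first cell (t = 0) to diagnosis (Simpson rule)."""
    t_diag = -math.log(1.0 - math.log(s_diag) / LOG_K) / beta
    h = t_diag / steps
    acc = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        acc += weight * math.exp(LOG_K * (1.0 - math.exp(-beta * k * h)))
    return acc * h / 3.0


def bm_probability(d_lo, d_hi, mu_pop, sigma, dt_diag=201.0, n=2000, seed=0):
    """Monte Carlo estimate of P(N(Td) > 1) for PT diameters uniform in (d_lo, d_hi) mm."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        s_diag = cells_from_diameter(rng.uniform(d_lo, d_hi))
        beta = pt_beta(s_diag, dt_diag)
        mu = math.exp(rng.gauss(math.log(mu_pop), sigma))  # lognormal mu
        if mu * integral_pt_size(s_diag, beta) > 1.0:  # N(Td) with gamma = 1
            hits += 1
    return hits / n
```

Fitting would then consist of wrapping `bm_probability` over all size intervals in a sum of squared differences to the observed proportions and passing that objective to a derivative-free optimizer.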
Individual level: description of longitudinal data of number and size of BM growth
Maximum likelihood estimation
Due to the discrete nature of the data at the individual level (diameters of a small number of BMs at discrete time points), a direct comparison between the data and the size distribution ρ, solution of problem (3), was not possible. Instead, we compared the data to the model by means of the cumulative size distribution. Considering, at each observation time tj, the sorted BM sizes and the number of metastases larger than each of these sizes, we posed the following nonlinear regression problem:
where θ = (α0,p, βp, α0, β, μ, γ, t0, τ) groups all the parameters of the model. Note that, except for the “secondary dissemination” model, all models can be viewed as submodels of a general model including all these parameters (the “base model” corresponding, for instance, to the case α0,p = α0, βp = β, t0 = 0 and τ = 0). Classical maximum likelihood estimation then leads to the following estimate:
Parameters identifiability
Standard errors can be computed from the covariance matrix of this estimator, given by76:
where the Jacobian matrix of the model is taken with respect to the parameter vector θ at all time points and all sizes and evaluated at the optimal parameter, and the a posteriori estimate of σ is given by
with N the total number of data points and P the number of free parameters.
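The computation of standard errors can be sketched as follows, assuming the usual nonlinear least-squares expressions: a covariance matrix equal to σ̂²·(JᵀJ)⁻¹ and σ̂² equal to the residual sum of squares divided by N − P. The finite-difference Jacobian, the small matrix inversion routine and all names are illustrative choices, not the authors' code.

```python
import math


def invert(a):
    """Inverse of a small square matrix (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def standard_errors(model, theta, xs, ys, rel_step=1e-6):
    """Standard errors of theta (assumed to be the least-squares optimum)."""
    n_data, n_par = len(xs), len(theta)
    resid = [y - model(x, theta) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in resid) / (n_data - n_par)  # a posteriori variance
    jac = []  # forward finite-difference Jacobian, one row per data point
    for x in xs:
        base = model(x, theta)
        row = []
        for k in range(n_par):
            h = rel_step * max(abs(theta[k]), 1e-12)
            bumped = list(theta)
            bumped[k] += h
            row.append((model(x, bumped) - base) / h)
        jac.append(row)
    jtj = [[sum(r[a] * r[b] for r in jac) for b in range(n_par)]
           for a in range(n_par)]
    cov = invert(jtj)
    return [math.sqrt(sigma2 * cov[k][k]) for k in range(n_par)]
```

Relative standard errors, expressed as 100·se/|θ|, can then serve as the identifiability metric; a nearly singular JᵀJ signals parameters that cannot be estimated jointly.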
Using the standard errors as an identifiability metric, we repeatedly observed a lack of identifiability of parameters μ and γ when fitted together. Indeed, relative standard errors on γ were larger than 200% when fitting the base model to the data. Further investigation of the shape of the objective function confirmed this lack of identifiability (Figure S7). To address this issue, we only considered a finite set of relevant possible values for γ and only optimized the value of μ. These values were (0, 0.4, 0.5,, 1) and the corresponding initial conditions for μ were (10^−3, 10^−5, 10^−8, 10^−9, 10^−12). When more parameters were left free in the model (delay t0 or dormancy period τ), we generated 4 × 4 parameter grids for initial conditions (with t0 ∊ {0, 500, 1700, 2000}). Of these 16 optimization problems, the one with the minimal value of l at convergence was selected.
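The strategy of fixing γ on a finite set and optimizing μ alone can be illustrated as follows. This sketch makes several assumptions: a sum of squared errors stands in for the objective l, a golden-section search on log10(μ) replaces the Nelder-Mead algorithm, the base model is used with hypothetical Gompertz parameters shared by the PT and the BMs, and the data are synthetic. Only the γ values that are legible in the text are included in the default set.

```python
import math

ALPHA0, BETA = 0.0244, 8.84e-4  # hypothetical Gompertz parameters (per day)


def pt_size(t):
    """Gompertz size (cells) at time t, started from one cell."""
    return math.exp(ALPHA0 / BETA * (1.0 - math.exp(-BETA * t)))


def time_to_size(s):
    """Days needed to grow from one cell to s cells."""
    return -math.log(1.0 - BETA / ALPHA0 * math.log(s)) / BETA


def unit_count(t, s, gamma, steps=200):
    """f(t, s) / mu: integral of Sp(u)**gamma over [0, t - t(s)] (Simpson rule)."""
    upper = t - time_to_size(s)
    if upper <= 0.0:
        return 0.0
    h = upper / steps
    acc = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        acc += weight * pt_size(k * h) ** gamma
    return acc * h / 3.0


def fit_mu(gamma, data, log10_lo=-16.0, log10_hi=0.0, iters=80):
    """Golden-section search on log10(mu) for fixed gamma; returns (mu, sse)."""
    g = [unit_count(t, s, gamma) for t, s, _ in data]

    def sse(log10_mu):
        mu = 10.0 ** log10_mu
        return sum((row[2] - mu * gi) ** 2 for gi, row in zip(g, data))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log10_lo, log10_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = sse(c), sse(d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = sse(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = sse(d)
    x = 0.5 * (a + b)
    return 10.0 ** x, sse(x)


def select_gamma(data, gammas=(0.0, 0.4, 0.5, 1.0)):
    """Fit mu for each candidate gamma and keep the fit with the smallest objective."""
    fits = {gam: fit_mu(gam, data) for gam in gammas}
    best = min(fits, key=lambda gam: fits[gam][1])
    return best, fits[best][0], fits
```

Here `data` is a list of (time in days, size in cells, number of BMs larger than that size) triples. With free delay or dormancy parameters, the same selection rule would be applied over a grid of initial conditions rather than over γ alone.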
Footnotes
The authors declare no potential conflicts of interest