Abstract
Root system analysis is a complex task, often performed using fully automated image analysis pipelines. However, these pipelines are usually evaluated with a limited number of ground-truthed root images, most likely of limited size and complexity.
We have used a root model, ArchiSimple, to create a large and diverse library of ground-truthed root system images (10,000). This library was used to evaluate the accuracy and usefulness of several image descriptors classically used in root image analysis pipelines.
Our analysis highlighted that the accuracy of the different metrics is strongly linked to the type of root system analysed (e.g. dicot or monocot) as well as their size and complexity. Metrics that have been shown to be accurate for small dicot root systems might fail for large dicot root systems or small monocot root systems. Our study also demonstrated that the usefulness of the different metrics when trying to discriminate genotypes or experimental conditions may vary.
Overall, our analysis is a call to caution when automatically analysing root images. If a thorough calibration is not performed on the dataset of interest, unexpected errors might arise, especially for large and complex root images. To facilitate such calibration, both the image library and the different codes used in the study have been made available to the community.
Introduction
Roots are of utmost importance in the life of plants, and hence selection on root systems holds great promise for improving crop tolerance (as reviewed in Koevoets et al., 2016). As such, their quantification is a challenge in a multitude of research projects. This quantification is usually a two-step process. The first step consists in acquiring an image of the root system, either using classic imaging techniques (CCD cameras) or more specialized ones (microCT, X-Ray, fluorescence, …). The next step is to analyse the picture in order to extract meaningful descriptors of the root system.
To paraphrase the famous Belgian surrealist painter, René Magritte, figure 1A is not a root system. Figure 1A is an image of a root system and that distinction is important. Such an image is indeed a two-dimensional representation of a root system, which is usually a three-dimensional object. Until now, measurements have generally not been performed on the root systems themselves, but on the images, and this raises some issues.
A. Image of a 2-week old maize root system grown in rhizotron. B. Close-up showing overlapping roots. C. Close-up showing crossing roots.
Image analysis is, by definition, the extraction of metrics (or descriptors) describing the objects contained in a particular image. In a perfect situation, these descriptors would accurately represent the biological object of the image with negligible deviation from the biological truth (or data). However, in many cases, artefacts might be present in the images so that the representation of the biological object is no longer accurate. These artefacts might be due to the conditions in which the images were taken or to the object itself. Mature root systems, for instance, are complex branched structures, composed of thousands of overlapping (fig. 1B) and crossing linear segments (fig. 1C). These features are likely to impede image analysis and create a gap between the descriptors and the data.
Root image descriptors can be separated into two main categories: morphological and geometrical descriptors. Morphological descriptors refer to the shape of the different root segments forming the root system (table 1). They include, among others, the length and diameter of the different roots. For complex root system images, morphological descriptors are difficult to obtain and are prone to error as mentioned above.
Root system parameters used as ground-truth data
Geometrical descriptors give the position of the different root segments in space. They summarize the shape of the root system as a whole. The simplest geometrical descriptors are the width and depth of the root system. Since these descriptors are mostly defined by the outside envelope of the root system, crossing and overlapping roots have little impact on their estimation and they can be considered as relatively errorless. Geometrical descriptors are expected to be loosely linked to the actual root system topology, as identical shapes could be reached by different root systems (the opposite is true as well). They are usually used in genetic studies, to identify genetic bases of root system shape and soil exploration.
Several automated analysis tools were designed in the last few years to extract both types of descriptors from root images (Armengaud et al., 2009; Bucksch et al., 2014; Galkovskyi et al., 2012; Pierret et al., 2013). However, the validation of such tools is often incomplete and/or error prone. Indeed, for technical reasons, the validation is usually performed on a small number of ground-truthed images of young root systems, for which most analysis tools were actually designed. In the few cases where validation is performed on large and complex root systems, it is usually not on ground-truthed images, but in comparison with previously published tools (measurement of X with tool A compared with the same measurement with tool B). This might seem a reasonable approach given the scarcity of ground-truthed images of large root systems. However, the inherent limitations of these tools, such as scale or plant type (monocot, dicot), are often not known. Users might not even be aware that such limitations exist and apply the provided algorithm without further validation on their own images. This can lead to unsuspected errors in the final measurements.
One strategy to address the lack of in-depth validation of image analysis pipelines would be to use synthetic images generated by structural root models (models designed to recreate the physical structure and shape of root systems). Many structural root models have been developed, either to model specific plant species (Pagès et al., 1989), or to be generic (Pagès et al., 2004; 2013). These models have been repeatedly shown to faithfully represent the root system structure (Pagès and Pellerin, 1996). In addition, they can provide the ground-truth data for each synthetic root system generated, independently of its complexity. However, except for one recent tool designed for young seedlings with no lateral roots (Benoit et al., 2014), they have almost never been used for validation of image analysis tools (Rellán-Álvarez et al., 2015).
Here we (i) illustrate the use of a structural root model, ArchiSimple, to systematically analyse and evaluate an image analysis pipeline and (ii) evaluate the usefulness of different root metrics commonly used in plant root research.
Material and methods
Nomenclature used in the paper
Ground-truth data
The real (geometric and morphometric) properties of the root system as a biological object. Determined by either manual tracing of roots or by using the output of modelled root systems.
(Image) Descriptor
Property of the root image. Does not necessarily have a biological meaning.
Synthetype
For each simulation, a parameter set is defined randomly. Then, 10 root systems are created. Since the model has an intrinsic variability, each of these root systems is slightly different from the others, although similar, forming what we called a synthetic genotype, or synthetype.
Root axes
first order roots, directly attached to the shoot
Lateral root
second (or lower) order roots, attached to another root
Creation of a root system library
We used the model ArchiSimple, which was shown to generate a large diversity of root systems with a minimal number of parameters (Pagès et al., 2013). In order to produce a large library of root systems, we ran the model 10,000 times, each time with a random set of parameters.
The simulations were divided into two main groups: monocots and dicots. For the monocot simulations, the model generated a random number of first-order axes and secondary (radial) growth was disabled. For dicot simulations, only one primary axis was produced and secondary growth was enabled (the extent of which was determined by a random parameter). For all simulations, only first order laterals were created, to limit complexity.
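The simulation design described above can be sketched as follows. This is a minimal illustration and not the ArchiSimple interface: the parameter names (`n_primary_axes`, `secondary_growth`, `max_lateral_order`), their ranges, and the alternating assignment of parameter sets to monocots and dicots are all assumptions made for the example. Only the group rules (several axes and no secondary growth for monocots, one axis and random secondary growth for dicots, first order laterals only) and the 10 repetitions per parameter set come from the text.

```python
import random


def random_parameter_set(group, rng):
    """Draw one hypothetical parameter set for a 'monocot' or 'dicot' simulation."""
    if group == "monocot":
        return {
            "group": group,
            "n_primary_axes": rng.randint(2, 30),  # assumed range, random number of axes
            "secondary_growth": 0.0,               # radial growth disabled
            "max_lateral_order": 1,                # first order laterals only
        }
    if group == "dicot":
        return {
            "group": group,
            "n_primary_axes": 1,                         # single primary axis
            "secondary_growth": rng.uniform(0.0, 1.0),   # assumed range, random extent
            "max_lateral_order": 1,
        }
    raise ValueError(f"unknown group: {group}")


def build_library_plan(n_sets, repetitions, seed=0):
    """List every simulation to run: one entry per (parameter set, repetition)."""
    rng = random.Random(seed)
    plan = []
    for set_id in range(n_sets):
        # Assumption: parameter sets alternate between the two groups.
        group = "monocot" if set_id % 2 == 0 else "dicot"
        params = random_parameter_set(group, rng)
        for rep in range(repetitions):
            plan.append({"synthetype": set_id, "repetition": rep, **params})
    return plan


plan = build_library_plan(n_sets=1000, repetitions=10)
print(len(plan))  # 10000
```

Each entry of `plan` would then be passed to the root model, whose intrinsic variability makes the 10 repetitions of one parameter set similar but not identical.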
The root system created from each simulation was stored in an RSML file. Each RSML file was then read by the RSML Reader plugin from ImageJ to extract metrics and generate ground-truth data for the library (Lobet et al., 2015). These ground-truth data included geometrical, morphological and topological parameters (table 1). For each RSML data file, the RSML Reader plugin also created a PNG image (at a resolution of 300 DPI) of the root system.
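To illustrate how ground-truth values can be derived from a stored root system, the sketch below parses a simplified RSML-like fragment and computes a few values. It is not the RSML Reader plugin: the embedded fragment is a reduced, hypothetical example (real RSML files carry metadata and further properties), and the function names are chosen for the example only.

```python
import math
import xml.etree.ElementTree as ET

# Simplified, hypothetical RSML-like fragment: one primary root with one lateral.
RSML_EXAMPLE = """<rsml>
  <scene>
    <plant>
      <root ID="1" label="primary">
        <geometry>
          <polyline>
            <point x="0" y="0"/>
            <point x="0" y="3"/>
            <point x="0" y="7"/>
          </polyline>
        </geometry>
        <root ID="1.1" label="lateral">
          <geometry>
            <polyline>
              <point x="0" y="3"/>
              <point x="4" y="6"/>
            </polyline>
          </geometry>
        </root>
      </root>
    </plant>
  </scene>
</rsml>"""


def polyline_length(root_elem):
    """Length of one root, using only its own polyline (not its children's)."""
    points = [(float(p.get("x")), float(p.get("y")))
              for p in root_elem.findall("./geometry/polyline/point")]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def ground_truth(rsml_text):
    """Compute a few ground-truth values, in the coordinate units of the file."""
    tree = ET.fromstring(rsml_text)
    roots = list(tree.iter("root"))  # includes nested (lateral) roots
    xs = [float(p.get("x")) for p in tree.iter("point")]
    ys = [float(p.get("y")) for p in tree.iter("point")]
    return {
        "n_roots": len(roots),
        "total_length": sum(polyline_length(r) for r in roots),
        "width": max(xs) - min(xs),
        "depth": max(ys) - min(ys),
    }


gt = ground_truth(RSML_EXAMPLE)
print(gt)
```

Because the values are computed from the stored geometry rather than from pixels, they are unaffected by overlapping or crossing roots in the rendered image.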
Root image analysis
Each generated image was analysed using a custom-made ImageJ plugin, Root Image Analysis-J (or RIA-J). The source code of RIA-J, as well as a compiled version is available at the address: https://zenodo.org/record/61509.
For each image, we extracted a set of classical root image descriptors, such as the total root length, the projected area or the number of visible root tips. In addition, we included shape descriptors, such as pseudo-landmarks, or a-dimensional metrics such as the exploration ratio or the width proportion at 50% depth (see Supplemental file 1 for details about the shape descriptors). The metrics and algorithms used by our pipeline are listed in table 2.
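As an illustration of how such descriptors can be read from a thresholded image, the sketch below computes the projected area, width, depth and a width proportion at 50% depth from a small binary mask. It is not the RIA-J implementation. In particular, the definition used for the width proportion (extent of root pixels on the row halfway down the root system, divided by the overall width) is an assumption, since the exact definitions are given in the supplemental material.

```python
# Toy binary mask: 1 = root pixel, 0 = background.
MASK = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]


def shape_descriptors(mask):
    """Return simple pixel-based descriptors of a binary root image."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for line in mask for c, value in enumerate(line) if value]
    if not rows:
        raise ValueError("mask contains no root pixel")

    top, bottom = rows[0], rows[-1]
    left, right = min(cols), max(cols)
    depth = bottom - top + 1   # vertical extent, in pixels
    width = right - left + 1   # horizontal extent, in pixels
    area = sum(1 for line in mask for value in line if value)

    # Assumed definition: extent of the root pixels on the row at 50% depth.
    mid_row = mask[top + (bottom - top) // 2]
    mid_cols = [c for c, value in enumerate(mid_row) if value]
    mid_width = max(mid_cols) - min(mid_cols) + 1 if mid_cols else 0

    return {
        "area": area,
        "width": width,
        "depth": depth,
        "width_proportion_50": mid_width / width,
    }


descriptors = shape_descriptors(MASK)
print(descriptors)
```

Width and depth depend only on the outer envelope of the mask, whereas the area is a pixel count and is therefore sensitive to overlapping roots.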
Data analysis
Data analysis was performed in R (R Core Team). Morphometric analyses were performed using the momocs (Bonhomme et al., 2014) and shapes (Dryden, 2015) packages. Plots were created using ggplot2 (Wickham, 2009) and lattice (Sarkar, 2008).
The Relative Root Mean Square Errors (RRSME) were estimated using the equation:
where n is the number of observations,
is the mean and yi is the estimated mean.
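As a reading aid, one expression consistent with the symbols described above is sketched here. It assumes that, within a group of root systems with similar ground-truth values, $\bar{y}$ denotes the mean ground-truth value and $y_i$ the $i$-th estimated value. Both the symbol $\bar{y}$ and the normalisation by the mean are assumptions, and the equation used in the study may differ.

```latex
\mathrm{RRSME} = \frac{1}{\bar{y}} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this form, a value of 0 means that every estimate equals the group mean, and a value of 0.1 means that the typical deviation is 10% of that mean.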
The Linear Discriminant Analysis (LDA) was performed using the lda function from the MASS package (Venables and Ripley, 2002). For each analysis, we used the synthetype information as grouping factor. We used half of the samples (5) of each synthetype to build the model and the other half to assess the discriminant power of each class of metrics (morphology and shape).
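The split-half evaluation described above can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in for LDA. This is a deliberate simplification and not the method of the study, which used the lda function in R. The toy data and all function names are hypothetical.

```python
from collections import defaultdict


def split_half(samples):
    """Split (synthetype, features) pairs: first half of each group trains, rest tests."""
    by_group = defaultdict(list)
    for label, features in samples:
        by_group[label].append(features)
    train, test = [], []
    for label, rows in by_group.items():
        half = len(rows) // 2
        train += [(label, row) for row in rows[:half]]
        test += [(label, row) for row in rows[half:]]
    return train, test


def fit_centroids(train):
    """Mean feature vector per synthetype (stand-in for fitting an LDA model)."""
    grouped = defaultdict(list)
    for label, features in train:
        grouped[label].append(features)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()}


def predict(centroids, features):
    """Assign the synthetype whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((f - c) ** 2
                                     for f, c in zip(features, centroids[label])))


def prediction_accuracy(centroids, test):
    """Fraction of held-out root systems assigned to their own synthetype."""
    hits = sum(predict(centroids, features) == label for label, features in test)
    return hits / len(test)


# Toy data: two synthetypes, 10 root systems each, two descriptors per system.
samples = [("s1", [10.0 + 0.1 * i, 5.0]) for i in range(10)]
samples += [("s2", [20.0 + 0.1 * i, 8.0]) for i in range(10)]

train, test = split_half(samples)
centroids = fit_centroids(train)
acc = prediction_accuracy(centroids, test)
print(acc)  # 1.0
```

With 10 root systems per synthetype, `split_half` uses 5 to build the model and 5 to assess it, mirroring the design described in the text.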
Data availability
All data used in this paper (including the image and RSML libraries) are available at the address https://zenodo.org/record/61739
An archived version of the codes used in this paper is available at the address https://zenodo.org/record/152083
Results and discussions
Production of a large library of ground-truthed root system images
We combined existing tools into a single pipeline to produce a large library of ground-truthed root system images. The pipeline combines a root model (ArchiSimple (Pagès et al., 2013)), the Root System Markup Language (RSML) and the RSML Reader plugin from ImageJ (Lobet et al., 2015). In short, ArchiSimple was used to create a large number of root systems, based on random input parameter sets. Each output was stored as an RSML file (fig. 2A), which was then used by the RSML Reader plugin to create a graphical representation of the root system (as a .jpeg file) and a ground-truth dataset (fig. 2B). Details about the different steps are presented in the Materials and Methods section.
Overview of the workflow used in this study. A. Generation of root systems using ArchiSimple. B. Creation and analysis of root images.
We used the pipeline to create a library of 10,000 root system images, separated into monocots (multiple first order roots and no secondary growth) and dicots (one first order root and secondary growth). For each input parameter-set used for ArchiSimple (1,000 different ones), 10 repetitions were performed to create synthetic genotypes, or synthetypes (fig. 2A). The synthetype repetitions were done so that the structure of the final dataset would mimic the structure of a dataset containing phenotypic data of different genotypes. The ranges of the different ground-truth data are shown in table 2 and their distribution is shown in Supplemental Figure 1. The pipeline produced perfectly thresholded black and white images and hence the following analyses were focused on the characterisation of the root objects themselves.
We started by evaluating whether monocots and dicots should be separated during the analysis. We performed a Principal Component Analysis on the ground-truthed dataset to assess whether the species grouping had an effect on the overall dataset structure (fig. 3A). Monocots and dicots formed distinct groups (MANOVA p-value < 0.001), with only minimal overlap. The first principal component, which represented 33.2% of the variation within the dataset, was mostly influenced by the number of primary axes. The second principal component (19.6% of the variation) was influenced, in part, by the root diameters. These two effects were consistent with the clear grouping of monocots and dicots, since they expressed the main difference between the two species. Therefore, since the species grouping had such a strong effect on the overall structure, we decided to analyse the two groups separately in the following analyses.
A. Principal Component Analysis of the root ground-truth dataset. Images of the selected root systems have been added for illustration. B. Loadings of the Principal Component Analysis.
Ranges of the different ground-truth data from the root systems generated using ArchiSimple
Systematic evaluation of root image descriptors
In order to demonstrate the utility of a synthetic library of ground-truthed root systems, we analysed every image of the library using a custom-built root image analysis tool, RIA-J. We decided to do so because our purpose was to test the usefulness of the synthetic analysis and not to assess the accuracy of existing tools. Nonetheless, RIA-J was designed using known and published algorithms, often used in root system quantification. A detailed description of RIA-J can be found in the Materials and Methods section.
We extracted 16 descriptors from each root system image (Table 3) and compared them with their own ground-truth data. For each pair of descriptor/data, we performed a linear regression and computed its r-squared value. Figure 4 shows the results from the different combinations for both monocots and dicots. As a general rule, good correlations were rare, with only 3% of the combinations having an r-squared above 0.8. In addition, even a good correlation is not necessarily directly useful, as the relationship between the two variables might not follow a 1:1 rule (fig. 4B-C). In such cases, an additional validation might be needed to define the relationship between both variables.
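The point that a high r-squared does not imply a 1:1 relationship can be made concrete with a short ordinary least-squares sketch. The values below are invented for illustration and do not come from the library. The descriptor is constructed to be perfectly correlated with the ground truth while systematically underestimating it.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented example: the descriptor is exactly half the ground truth plus an offset.
ground_truth_values = [10.0, 20.0, 30.0, 40.0, 50.0]
descriptor_values = [0.5 * v + 2.0 for v in ground_truth_values]

slope, intercept, r_squared = linear_fit(ground_truth_values, descriptor_values)
print(slope, intercept, r_squared)
```

Here the r-squared is 1 but the slope is 0.5, so the descriptor could only be used as an estimate of the ground truth after calibration with the fitted slope and intercept.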
Root image descriptors extracted by RIA-J
A. Heatmap of the r-squared values between the different image descriptors and the ground-truth values. Black represents an r-squared value of 1; white represents a value of 0. Upper panel: dicot dataset. Lower panel: monocot dataset. B & C. Details of the regressions. The plain black line represents the fitted regression while the dotted line represents the 1:1 relationship.
It also has to be noted that the correlations differed between species. As an example, within the dicot dataset, no good correlation was found between the tip_count and diam_mean estimators, while a better correlation was found for the monocots. As a consequence, validation of the different image analysis algorithms should be performed, at least, for each group of species. An algorithm giving good results for a monocot might fail when applied to dicot root systems.
Errors from image descriptors are likely to be non linear
In addition to being related to the species of study, estimation errors are likely to increase with the root system size. As the root system grows and develops, the crossing and overlapping segments increase, making the subsequent image analysis potentially more difficult and prone to error. However, a systematic analysis of such error is seldom performed.
Figure 5 shows the relationship between the ground-truth and descriptor values for three parameters: the total root length (fig. 5A), the number of roots (fig. 5B) and the root system depth (fig. 5C). For each of these variables, we quantified the Relative Root Mean Square Error (see Materials and Methods for details) as a function of the total root length. For the estimation of both the total root length and the number of lateral roots, the Relative Root Mean Square Error increased with the size of the root system (fig. 5A-B). As stated above, such an increase of the error was somewhat expected with increasing complexity. For other metrics, such as the root system depth, no errors were expected (depth is supposedly an error-less variable) and the Relative Root Mean Square Error was close to 0 whatever the size of the root system.
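The idea of tracking a relative error as a function of root system size can be sketched as follows. The error expression used here (root mean square deviation of the estimates from the mean ground-truth value of a size group, divided by that mean) is an assumption made for the example and may differ from the equation used in the study. The bin width and the data are invented.

```python
import math


def relative_rmse(estimates, reference_mean):
    """Assumed form: RMS deviation of the estimates from a reference mean, over that mean."""
    n = len(estimates)
    rms = math.sqrt(sum((y - reference_mean) ** 2 for y in estimates) / n)
    return rms / reference_mean


def relative_rmse_by_size(truth, estimate, size, bin_width):
    """Discretise root systems by size, then compute the relative error per group."""
    bins = {}
    for t, e, s in zip(truth, estimate, size):
        key = int(s // bin_width)
        true_values, estimated_values = bins.setdefault(key, ([], []))
        true_values.append(t)
        estimated_values.append(e)
    return {
        key * bin_width: relative_rmse(est, sum(tru) / len(tru))
        for key, (tru, est) in sorted(bins.items())
    }


# Invented data: total root length, where large root systems are underestimated.
true_length = [100, 100, 1000, 1000]
estimated_length = [98, 102, 800, 900]

errors = relative_rmse_by_size(true_length, estimated_length,
                               size=true_length, bin_width=500)
print(errors)
```

In this toy case the relative error is 2% for the small group and about 16% for the large one, which is the kind of size-dependent pattern discussed for the total root length.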
Error estimation for three ground-truth parameters. Left panel shows the relationship between the descriptors and the corresponding ground-truth variables. Right panels show the evolution of the Relative Root Mean Square Error (RRSME) as a function of the ground-truth variable. For the RRSME calculations, the continuous variables were discretised in groups. A. Total root length. B. Number of lateral roots. C. Root system depth
Such results are a call to caution when analysing root images, as unexpected errors in descriptor estimation can arise. This is probably even more true with real images, which are likely to contain non-root objects (e.g. dirt) and lower order lateral roots (as stated above, simulations used here were limited to first order laterals).
Differentiation power differs between metrics
Finally, we wanted to evaluate which metrics were the most useful to discriminate between root systems of different genotypes or experimental series (control vs treatment). As explained above, for each parameter set used in the ArchiSimple run for library construction, we generated 10 root systems. Given the intrinsic variability existing in the model, each of these 10 root systems was similar although different, as could be expected from plants of the same genotype. These so-called synthetypes were then used to evaluate how efficiently the different metrics discriminated them.
To estimate the differentiation power of the image metrics, we used a Linear Discriminant Analysis (LDA) prediction model. For each synthetype, half of the plants were used to create the LDA model. The model was then used to predict a synthetype for the remaining half of the plants. This approach allowed us to evaluate the prediction accuracy, or differentiation power, of the different metrics. A prediction accuracy of 100% means that all plants were correctly assigned to their synthetype. To evaluate the differentiation power of single metrics, we used an approach in which each metric was iteratively added to the model, based on the model global prediction power (see Supplemental Figure 3 for details about the procedure). We performed the analysis either on the full dataset (fig. 6D-E), or on a dataset restricted to the smallest plants (fig. 6A), in order to test the influence of the underlying data structure.
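The iterative addition of metrics can be sketched as a greedy forward selection. This is an assumed reading of the procedure, whose details are in the supplemental material: at each step, the metric that gives the highest prediction accuracy when added to those already selected is kept. A nearest-centroid classifier again stands in for LDA, and the metric names and data are invented.

```python
def centroid_accuracy(train, test, metrics):
    """Held-out accuracy of a nearest-centroid classifier (stand-in for LDA)."""
    grouped = {}
    for label, row in train:
        grouped.setdefault(label, []).append([row[m] for m in metrics])
    centroids = {label: [sum(col) / len(col) for col in zip(*rows)]
                 for label, rows in grouped.items()}
    hits = 0
    for label, row in test:
        point = [row[m] for m in metrics]
        guess = min(centroids,
                    key=lambda g: sum((p - c) ** 2
                                      for p, c in zip(point, centroids[g])))
        hits += guess == label
    return hits / len(test)


def forward_selection(train, test, candidates):
    """Greedily add the metric that maximises accuracy; return (metric, accuracy) steps."""
    selected, history = [], []
    remaining = list(candidates)
    while remaining:
        scored = [(centroid_accuracy(train, test, selected + [m]), m)
                  for m in remaining]
        best_score = max(score for score, _ in scored)
        best = next(m for score, m in scored if score == best_score)  # first among ties
        selected.append(best)
        remaining.remove(best)
        history.append((best, best_score))
    return history


# Invented data: 'depth' separates the two synthetypes, 'noise' does not.
train_set = [("s1", {"depth": 10.0, "noise": 5.0}), ("s1", {"depth": 11.0, "noise": 1.0}),
             ("s2", {"depth": 20.0, "noise": 1.0}), ("s2", {"depth": 21.0, "noise": 5.0})]
test_set = [("s1", {"depth": 10.5, "noise": 1.0}), ("s2", {"depth": 20.5, "noise": 5.0})]

steps = forward_selection(train_set, test_set, ["noise", "depth"])
print(steps)  # [('depth', 1.0), ('noise', 1.0)]
```

The order of `steps` gives a ranking of the metrics, and the accuracy attached to each step shows when adding further metrics stops improving the discrimination.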
A. Distribution of the simulated root system depths. The dotted line represents the threshold used to generate a partial dataset. B-E. Evaluation of the differentiation power of the different image metrics. Each point corresponds to the prediction accuracy of the Linear Discriminant Analysis model, with an increasing number of metrics included. The abscissa represents the variables iteratively added to the model. B-C. Prediction accuracy for a partial dataset. D-E. Prediction accuracy for the full dataset. B, D: Dicots. C, E: Monocots. Empty dots represent morphological metrics. Plain dots represent geometrical metrics. The dotted line represents the 90% prediction accuracy threshold.
Two main observations can be made from figure 6. First, for three out of four scenarios, only five (or fewer) descriptors were needed to achieve a differentiation accuracy of 90%. Depth, area and length were the most important descriptors in almost all scenarios. The remaining descriptors did not significantly increase the accuracy (some might even decrease it). This suggests that a handful of variables were sufficient to distinguish synthetypes, and by extension genotypes or treated plants. However, we can also observe that the most important parameters changed depending on the underlying data structure (either due to species or the size of the dataset). This indicates that it is difficult to have an a priori evaluation of the important variables. Keeping as many variables as possible might therefore be the most efficient solution.
Conclusions
The automated analysis of root system images is routinely performed in many research projects. Here we used a library of 10,000 modelled images to estimate the accuracy and usefulness of different image descriptors extracted with a home-made root image analysis pipeline. The analysis highlighted some important limitations of the image analysis process.
Firstly, the general structure of the root system (e.g. monocot vs dicot) can have a strong influence on the accuracy of the descriptors. Descriptors that have been shown to be good predictors for one type of root system might fail for another type. In some cases, the calibration and the combination of different descriptors might improve the accuracy of the predictions, but this needs to be assessed for each analysis.
A second factor strongly influencing the accuracy of the analysis is the root system size and complexity. As a general rule, for morphological descriptors, the larger the root system, the larger the error. So far, a large proportion of root research has been focused on seedlings with small root systems and has de facto avoided such errors. However, as research questions are likely to focus more on mature root systems in the future, these limitations will become critical.
Finally, we have shown that not all metrics have the same benefit when comparing genotypes or treatments. Again, depending on the root system type or size, different metrics will have different differentiation powers.
It is important to highlight that the images used in our analysis were perfectly thresholded, without any degradation in image quality. Therefore, the errors computed in our analysis are likely under-estimated compared to real images (with additional background noise and lower quality). Since the quality of the images is dependent on the underlying experimental setup, artificial noise could be added to the generated images in order to mimic any experimentally induced artefact and to improve the analysis pipeline evaluation, as proposed by Benoit et al. (2014).
To conclude, our study is a reminder that thorough calibrations are needed for root image analysis pipelines. Here we have used a large library of simulated root images, which we hope will be helpful for the root research community to evaluate current and future image analysis pipelines.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Author Contributions
GL, LP, PT and CP designed the study. IK developed the image analysis pipeline RIA-J. GL generated the image library, did the image analysis and data analysis. LP developed the ArchiSimple model. All authors have participated in the writing of the manuscript.
Funding
This research was funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office, P7/29. GL is grateful to the F.R.S.-FNRS for a postdoctoral research grant (1.B.237.15F).