Cost-effective, high-throughput phenotyping system for 3D reconstruction of fruit form

Reliable phenotyping methods that are simple to operate and inexpensive to deploy are critical for studying quantitative traits in plants. Traditional fruit shape phenotyping relies on human raters or 2D analyses to assess form, e.g., size and shape. Systems for 3D imaging using multi-view stereo have been implemented, but frequently rely on commercial software and/or specialized hardware, which can limit accessibility and scalability. We present a complete system constructed of consumer-grade components for capturing, calibrating, and reconstructing the 3D form of small-to-moderate sized fruits and tubers. Each image capture session takes 9 seconds to acquire 60 images. The initial prototype cost was $1,600 USD. We measured accuracy by comparing reconstructed models of 3D printed ground-truth objects to the original digital files of those same objects. The R2 between the ground-truth objects and the reconstructed models for the lengths of the primary, secondary, and tertiary axes, volume, and surface area was > 0.97, and the root-mean-square error (RMSE) was < 3 mm for objects without locally concave regions. Measurements from 1 mm and 2 mm resolution reconstructions were consistent (R2 > 0.99). Qualitative assessments were performed on 48 fruit and tubers, including 18 strawberries, 12 potatoes, 5 grapes, 7 peppers, and 4 Bosc and 2 red Anjou pears. Our proposed phenotyping system is fast, relatively low cost, and demonstrably accurate for certain shape classes, and could be used for the 3D analysis of fruit form.


Introduction
Fruit appearance is a key trait for many crops and can condition the market viability of fruit products and the success of cultivars (1-3). Taken together, shape and color comprise the appearance of fruit. 2D image analysis has been successfully implemented to measure the shape and size of fruits such as strawberries (4, 12), apples (5), carrots (6, 14), mangoes (28), and many others. More recently, methodologies for 3D reconstruction of plant organs have been developed with approaches that vary in speed, scale, cost, and accuracy, including laser scanners, X-ray computed tomography, and reconstruction from sequences of 2D images from digital cameras (8, 29-41). Methods that rely on sequences of 2D images are numerous and variable, with their own complexities and nuances that provide different strengths and weaknesses (8, 27, 37, 40-44).

Modern technologies and analyses can be used to assess these physical characteristics and ultimately provide researchers with the tools necessary to support genetic inquiries and biological discoveries, expand what is known about modern germplasm, and enhance breeding practices in fruit and vegetable crops (4, 6, 8, 45-52). Multivariate and spatial statistics can be used to determine parameters that identify and quantify fruit defects (53), differentiate between marketable and non-marketable fruit (12, 50), and understand fruit phenotypes that impact markets requiring long shelf life and sustained fruit quality through harvesting, handling, and shipping.

This paper describes a rapid (9 s), low-cost ($1,600), turntable-type system for 3D reconstruction of fruit and tubers. Fruit rotates on an automated pedestal while a remote-controlled digital camera acquires images, as shown in Figure 1. We use a multi-camera calibration method (54) to compute the calibration parameters of the camera at every time step.
Fruit are segmented from non-fruit regions in the images. Finally, a reconstruction method using silhouettes as features (55) reconstructs the fruit or vegetable shape using the calibration and silhouette information (Figure 2).

Figure 1 caption: a stepper motor with a metal pedestal, ChArUco-tagged cubes, and a target object (a strawberry); the 80/20 T-slotted aluminum inverted-T (⊥) frame; a digital camera mounted on the vertical limb of the frame and attached to a PocketWizard MultiMax II radio transceiver; reverse-facing LED light sources; and an Arduino microcontroller connected to the stepper motor and power supply. Best viewed in color.

Cameras and controllers.
We used one Sony α6000 mirrorless digital camera for this project. The camera was set to medium-speed continuous image capture (7 frames per second), manual focus, and aperture priority mode with the aperture set to f/8. We controlled the camera with a PocketWizard MultiMax II transceiver unit. These units attach to the camera's multi-port, digitally control the camera's shutter button, and can be programmed to "hold" the shutter button for a variable duration. We used a 9-second hold to match the rotation rate of our stepper motor. Two PocketWizard MultiMax II units are required: one unit to transmit a signal and one unit to receive a control signal for multiple cameras. With these, the camera is controlled from a single source, which is triggered by the input of a barcode scanner.

Microcomputers and stepper motor. The data acquisition process consists of rotating the fruit on the pedestal and acquiring images of that fruit. To automate this process, we used a Raspberry Pi 3 microcomputer as well as an Arduino Uno Rev3 microcontroller. To rotate the objects on the pedestal, we used a Nema 17 stepper motor controlled by an Arduino Uno Rev3 and an Arduino Rev3 motor shield. The pedestal is a thin metal rod approximately 20 cm in length and 5 mm in diameter. The Nema 17 stepper motor has 200 steps per rotation (1.8° per step). The motor is programmed to take 1 step every 45 ms, which yields a full rotation every 9 seconds.
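The timing above determines how many frames the camera captures per rotation. A minimal sketch of that arithmetic (the constant names are ours, not from the system's firmware):

```python
# Rotation and capture timing implied by the components described above.
STEPS_PER_REV = 200       # Nema 17: 1.8 degrees per step
STEP_INTERVAL_S = 0.045   # one step every 45 ms
FPS = 7                   # camera continuous-capture rate

rotation_time_s = STEPS_PER_REV * STEP_INTERVAL_S     # 9 s per rotation
frames_per_rotation = round(rotation_time_s * FPS)    # ~63 frames
degrees_between_frames = 360 / frames_per_rotation    # ~5.7 degrees apart
```

At 7 frames per second this yields roughly 63 frames per 9-second rotation, consistent with the roughly 60 images per session reported above.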

Lighting. We used 4 LED lamps to illuminate the scene. These lights are all directed away from the object, towards a reflective white sheet, to reduce the intensity of the light on the scene. This enabled us to dramatically reduce, and in some instances eliminate, the glare on the surface of more reflective objects such as strawberries. The lights chosen do not have any temperature control and are likely not ideal for color-accurate measurements.

Calibration Targets. Calibration is performed on image data that also contains the data for reconstruction. To accomplish one-step calibration and data acquisition, the workspace is prepared with calibration targets, which are shown in Figure 1. The fruit or tuber is mounted on the pedestal. A pair of offset cubes are mounted on the pedestal directly below the fruit or tuber (56).

Segmentation. Background intensity distributions are estimated (Figure 6C and 6D): the distributions representing the dark intensities for the red, green, and blue channels, and the same for all three channels of the light intensities.

Each image pixel x is evaluated against the distributions as in Equation 3. We use a typical background subtraction technique in that we subtract the mean and compare with a threshold; here, the threshold is a constant multiplied by the standard deviation. The user provides constants k_d and k_l, and from the distributions Boolean values y_d and y_l are computed for each pixel x. The segmentation result, indicating whether the pixel represents the background (0) or not (1), is stored in z = y_l ∧ y_d.
In our experiments, k_d = 2.0 and k_l = 2.5 for all tests.
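A minimal sketch of this thresholding rule in Python with NumPy, assuming per-channel mean and standard deviation statistics for the dark and light background distributions (the function name, array layout, and the choice to flag a pixel when any channel deviates are our assumptions, not the paper's implementation):

```python
import numpy as np

def segment_foreground(image, dark_mean, dark_std, light_mean, light_std,
                       k_d=2.0, k_l=2.5):
    """Label each pixel of an (H, W, 3) image as background (0) or not (1).

    y_d / y_l are True when the pixel deviates from the dark / light
    background distribution by more than k * sigma in some channel.
    """
    # |x - mu| > k * sigma per channel, then OR across channels
    y_d = (np.abs(image - dark_mean) > k_d * dark_std).any(axis=-1)
    y_l = (np.abs(image - light_mean) > k_l * light_std).any(axis=-1)
    return (y_d & y_l).astype(np.uint8)  # z = y_l AND y_d
```

A pixel is kept as foreground only when it is far from both the dark and the light background distributions, matching z = y_l ∧ y_d.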

Reconstruction. We used a Shape from Inconsistent Silhouette (SfIS) method (55) for 3D reconstruction of the plant organs and ground-truth objects. With camera calibration and segmentation (silhouette) information provided, SfIS is a voxel-based method that searches for a labeling of voxels as occupied or empty such that the voxels match the provided segmentations. The match does not need to be exact, so some small camera calibration and segmentation errors can be tolerated.
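For intuition, the simpler classical relative of SfIS, shape-from-silhouette space carving, can be sketched as below; SfIS instead optimizes a labeling that tolerates inconsistent silhouettes rather than requiring every voxel to fall inside every silhouette. The projection-matrix and silhouette formats here are illustrative assumptions, not the paper's code:

```python
import numpy as np

def carve_voxels(centers, cameras, silhouettes):
    """Naive visual-hull carving: a voxel stays occupied only if its
    center projects inside every silhouette. `centers` is (n, 3);
    `cameras` holds 3x4 projection matrices; `silhouettes` holds
    binary (H, W) masks, one per camera."""
    occupied = np.ones(len(centers), dtype=bool)
    homogeneous = np.hstack([centers, np.ones((len(centers), 1))])
    for P, sil in zip(cameras, silhouettes):
        proj = homogeneous @ P.T                       # project voxel centers
        u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
        v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
        inside = (u >= 0) & (u < sil.shape[1]) & (v >= 0) & (v < sil.shape[0])
        hit = np.zeros(len(centers), dtype=bool)
        hit[inside] = sil[v[inside], u[inside]] > 0    # lands on foreground?
        occupied &= hit                                # carve away misses
    return occupied
```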

A key feature of the SfIS method is that it will not reconstruct concavities in 3D space. As examples of these types of shapes, the tetrahedron, sphere, and F ground-truth objects (Figure 7A-7C) can all be reconstructed because they do not contain concavities, while the 6-sided spherical die cannot (Figure 7D). The stem or calyx region of an apple is also an example of a locally concave region on a surface. The reason that the SfIS method is not able to reconstruct locally concave regions is its dependence on segmentations as features.

We use the extension to SfIS of hierarchical search described in (60), with final voxel sizes of 1 mm and 2 mm. An additional parameter is the factor by which the input image is resized down; that value is 4 for both experiments.

The initial image size is 6000 by 4000 pixels.

Ground truth objects. Reconstruction accuracy, e.g., of object dimensions, can be difficult to quantify. To assess the accuracy of our system, we selected shapes for which we had 3D model files, printed those files, and then reconstructed the models from image data with the phenotyping system. Through this process we can characterize the performance of our method on reconstructing different shape types with durable objects, versus individual fruit measurements, where the fruit decays quickly and human-made measurements cannot be precisely replicated.

A motivation for using 3D printed objects is to have a way to quantitatively assess the performance of the phenotyping system with a durable artifact that can be stored indefinitely and re-printed and/or scaled if needed. Since we have the originating 3D model file, we can compare the reconstruction and the ground-truth object in ways that human-made measurements cannot, by assessing differences in surface area and volume. This is in contrast to measurements on fruits or tubers, which will not persist past a single session and may suffer from measurement error.

We identified 4 digital objects from Thingiverse (https://www.thingiverse.com) that provide a good representation of many different shapes that are both common and uncommon in 3D biological structures such as fruit and tubers: convex regions, saddle regions, and locally concave regions, shown in Figure 7. We scaled these 4 objects prior to printing so that we would have different size representations. We 3D printed these 11 object × scale stereolithography (STL) format files (61-64) using a commercial-grade 3D printer. The 3D printed objects were then imaged in our system and reconstructed from the 2D images.

Axis lengths of the reconstruction were compared to those of the ground truth following ICP alignment; for the first axis, the difference is

ΔX = (X_G,max − X_G,min) − (X_R,max − X_R,min),

where X_G,min and X_G,max are the minimum and maximum values of the first dimension of the ground-truth object G, respectively, and X_R,min and X_R,max are the minimum and maximum values of the first axis of the reconstructed object R, respectively. The Morpho::meshDist() function was used to calculate and visualize distances between 3D objects. The distance of the reconstructed model from the ground truth is summarized using root mean square error (RMSE). RMSE is calculated as

RMSE = sqrt( (1/n) Σ_{i=1..n} δ_i² ),

where δ_i is the distance between the i-th pair of n corresponding points on the surfaces of the reconstruction and ground-truth objects.
The volume and surface area of models were extracted using Lithics3D::mesh_volume() and Lithics3D::mesh_area(), respectively. The rgl::shade3d() function was used to visually compare 3D objects. All regressions were performed using stats::lm().
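The two accuracy summaries above are straightforward to reproduce. A minimal Python sketch, assuming aligned (n, 3) arrays of surface points (the paper's analysis used the R packages named above, so these function names are ours):

```python
import numpy as np

def axis_length_difference(ground, recon, axis=0):
    """(X_G,max - X_G,min) - (X_R,max - X_R,min) along one axis,
    for ICP-aligned (n, 3) point arrays."""
    g_extent = ground[:, axis].max() - ground[:, axis].min()
    r_extent = recon[:, axis].max() - recon[:, axis].min()
    return g_extent - r_extent

def rmse(distances):
    """Root mean square of per-point surface distances delta_i."""
    d = np.asarray(distances, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))
```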

Sample Collection. We purchased fresh fruit and produce from a local grocery store in Davis, CA, USA for qualitative assessment. In total we purchased, scanned, and reconstructed 48 objects, including 18 strawberries, 12 potatoes, 5 grapes, 7 peppers, and 4 Bosc and 2 red Anjou pears. We wanted to test our approach for robustness, and so chose fruit and produce with different scales, colors, levels of glossiness, and other features.

We found a strong correlation (R2 ≈ 0.99) for most measurements between the reconstructed models and the ground-truth objects without concavities (Figure 9). The surface area (SA) R2 was 0.979 for the models without concavities, the lowest value among the traits we examined on such objects. We found that the SA of the reconstructed models was upwardly biased relative to the ground-truth objects (110-...).

Table 1 caption: Accuracy metrics, including RMSE, difference in major axis length, and ratios of volume and surface area, from two experiments with eleven ground-truth objects. Differences in mm and ratios are reported between model and ground-truth object. Die_3, Die_6, and Die_12 have local concavities, while the other objects do not.

The dice have egg-shaped depressions (Figure 7D). As is clearly shown in Figure 8D-F, our reconstructions are more similar to a 3D convex hull, yielding a flat surface over the large concavities in the true models. This is reflected in the rows corresponding to the three dice in Table 1. In these cases, the volume ratio is 115-130% relative to the ground-truth model. In general this is not an issue for types of fruit that do not have concavities.

Qualitative assessment. We found that our platform and approach to reconstruction is both quantitatively accurate (Table 1; Figure 9) and visually accurate in most cases (Figures 2 and 8). For the peppers, grapes, strawberries, and potatoes, we found no systematic errors in reconstruction. However, the Anjou pears were troublesome to segment, leading to the bottom half of those models being severely deformed. The reason for the segmentation error is the use of a general segmentation approach that worked without extensive tuning for the whole set of samples. However, for a large batch of objects with particular color features, fine-tuning the user/session-specific parameters for segmentation is important for yielding accurate models. Segmentation errors of this severity appeared in 3 out of 59 objects that we imaged, and the rest of the models appear to reflect the physical objects that were imaged.

Discussion
We have described a low-cost ($1,600 USD), high-throughput (9 s data acquisition per object), modular reconstruction system that can be used in lab settings or in the field on a table. We will discuss several design decisions that lead to flexibility.

The use of consumer-grade materials results in a relatively inexpensive system; multiple systems could be built to increase sample throughput during high-volume times of the year. This means that larger experiments can be executed, enabling more robust studies. Our system is modular, allowing users with different interests to experiment and explore different cameras, sensors, or lights. The system is easily repaired, and components are easily replaced, if any damage is incurred by the hardware.

The system has short session times: it takes only 9 seconds to acquire images of a single sample, regardless of the number of cameras. In fact, we found that we were frequently rate-limited by the write speed from the camera's on-board cache to the SD card. More often than not, the next object was prepped around the same time that the camera had cleared its on-board cache.

This system calibrates the camera from the image data acquired for the samples. The calibration is absolute (as opposed to relative, with an unknown scale factor), so the physical units of voxels are known.
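Because the voxel size is known in physical units, measurements follow directly from voxel counts. A toy sketch of that conversion (function and argument names are ours; the paper's measurements were made on meshes with the R packages named earlier):

```python
def voxel_measurements(occupied_voxels, exposed_faces, voxel_size_mm):
    """Convert voxel counts to physical quantities for an absolutely
    calibrated grid: volume from occupied voxels, and a crude
    surface-area estimate from exposed voxel faces."""
    volume_mm3 = occupied_voxels * voxel_size_mm ** 3
    surface_mm2 = exposed_faces * voxel_size_mm ** 2
    return volume_mm3, surface_mm2
```

With a relative calibration these quantities would only be known up to an unknown scale factor.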

Key assumptions and considerations. We highlight some key assumptions of the methodologies used in our system that are important for those considering it for a range of objects not treated in this paper.

Shape classes. Users who want to accurately represent locally concave shapes (shapes with egg-shaped depressions, as demonstrated by the 3-, 6-, and 12-sided dice in our calibration objects; Table 1, Figures 7D and 9) will need to substitute some portions of this system to recover such features. Shape from Silhouette is not able to recover locally concave regions. However, most of the types of objects we imaged do not contain such regions.

Similarly, if larger fruit are of interest, some additional modifications will be required to stabilize the pedestal during rotation. If these issues are a concern, it may be beneficial to construct a multi-camera network that surrounds the target.

Calibration target scale. The scale of the models depends on correctly specified calibration targets: if the targets are specified to be 10 mm when they are really 20 mm, all of the models will be 2x smaller than the real objects they measure. Users should print all calibration targets (ArUco or ChArUco) and ground-truth samples with a high-quality 2D or 3D printer to ensure sharp corners and well-defined edges. Further, users are encouraged to verify the proportions of the printed calibration targets with high-precision calipers prior to calibration.

Segmentation quality. Model quality is directly linked to segmentation quality (Figure 6A and 6B), as we have mentioned throughout this paper. If an object is only partially segmented, and this false-negative error happens in multiple frames, part of the model may end up distorted or completely missing (Figure 2F). It is vital, as in any system, that users examine the quality of reconstructions prior to measurement and go back to the calibration and segmentation outputs to identify the source of errors. In this work, we chose one set of segmentation parameters that performed reasonably well for all objects, but we recommend that users perform segmentation with parameters optimized for their research samples and imaging conditions.

In conclusion, we presented a phenotyping system for capturing, calibrating, and reconstructing 3D models of small-to-moderately sized fruit and tubers. The low cost and reliance on consumer-grade materials make it obtainable for almost any program; short session times allow researchers to increase the number of samples per hour; and high accuracy means that the digital representations will yield absolute measurements on objects that do not degrade over time.