A database of egg size and shape from more than 6,700 insect species

Offspring size is a fundamental trait in disparate biological fields of study. This trait can be measured as the size of plant seeds, animal eggs, or live young, and it influences ecological interactions, organism fitness, maternal investment, and embryonic development. Although multiple evolutionary processes have been predicted to drive the evolution of offspring size, the phylogenetic distribution of this trait remains poorly understood, due to the difficulty of reliably collecting and comparing offspring size data from many species. Here we present a database of 10,449 morphological descriptions of insect eggs, with records for 6,706 unique insect species and representatives from every extant hexapod order. The dataset includes eggs whose volumes span more than eight orders of magnitude. We created this database by partially automating the extraction of egg traits from the primary literature. In the process, we overcame challenges associated with large-scale phenotyping by designing and employing custom bioinformatic solutions to common problems. We matched the taxa in this database to the currently accepted scientific names in taxonomic and genetic databases, which will facilitate the use of this data for testing pressing evolutionary hypotheses in offspring size evolution.

Insect eggs come in an incredible diversity of shapes and sizes (refs 7,8). The thousands of egg descriptions in the entomological literature, however, have never to our knowledge been systematically compiled across insects. Without a comparison of egg sizes across insects, we cannot ascertain basic information such as the extant range of insect egg sizes, or the relationship between size and ecology or development. To address this problem, we created a database of quantitative parameters describing egg morphology from the entomological literature. All data were collected from published records, including both measurements reported in text descriptions of insect eggs and our own new measurements of published images. We developed custom software that allowed us to collect data from thousands of publications efficiently and reproducibly (Figure 1). We provide this software as a set of tools that can assist other scientists in collecting phenotypic data from the literature (see Methods).

Using this software we extracted egg descriptions from 1,756 publications from the past 250 years (Table 1). The database has 10,449 entries representing every extant order of insects, and 6,706 unique insect species. The insect egg database includes descriptions of egg size and shape (Table 2), and the scientific name of each entry has been [...]

[...]ately previewed on the Harvard library system, does it contain an egg measurement in the text or an egg image with a scale bar? [3] If the publication could not be immediately previewed, does the title or abstract refer to [...]

The egg traits in the database are listed in [...] we measured both the straight and curved length of the egg (for those eggs that are curved), but for all analyses and figures, we used the straight length of the egg to maximize consistency with published records.

Width and breadth: To resolve ambiguous cases, and when measuring egg features from images, we defined width as the widest diameter (mm) of the rotationally symmetric axis of the egg. For some insect groups this axis is referred to in the literature as diameter (ref. 17) or breadth (ref. 20). For eggs described in published records as having a length, width, and breadth or depth (i.e., the egg is a flattened ellipsoid; ref. 21), we considered width as the wider of the two diameters, and breadth as the diameter perpendicular to both width and length. For published images with a scale bar, we measured width as the widest of the three egg diameters at the first quartile, midpoint, and third quartile of the length axis. We did not measure breadth from published images.

Volume: Volume (mm$^3$) was calculated using the equation for the volume of an ellipsoid, following previous studies (refs 22,23). The formula is $\frac{1}{6}\pi l w b$, with $l$, $w$, and $b$ as length, width, and breadth, respectively. This simplifies to $\frac{1}{6}\pi l w^2$ when the egg is rotationally symmetric. For records in which the volume was reported but egg length and width were not, we used the reported volume. For all other entries, we recalculated volume from the measurements in the text and from measurements of images published with a scale bar.
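
The ellipsoid volume formula can be illustrated with a minimal sketch. The function name `egg_volume` and its signature are hypothetical and are not taken from the published software; the only assumption is that all diameters are given in millimeters.

```python
import math
from typing import Optional


def egg_volume(length: float, width: float, breadth: Optional[float] = None) -> float:
    """Return ellipsoid volume in mm^3 from egg diameters in mm.

    Hypothetical helper for illustration. When breadth is omitted the egg
    is treated as rotationally symmetric, so breadth is set equal to width
    and the formula reduces to (1/6) * pi * l * w^2.
    """
    if breadth is None:
        breadth = width
    if min(length, width, breadth) <= 0:
        raise ValueError("all diameters must be positive")
    return math.pi * length * width * breadth / 6.0


# Rotationally symmetric egg, 2.0 mm long and 1.0 mm wide
symmetric = egg_volume(2.0, 1.0)
# The same egg flattened to a breadth of 0.5 mm (triaxial ellipsoid)
flattened = egg_volume(2.0, 1.0, 0.5)
```

In this example the flattened egg has half the volume of the symmetric one, which shows how much volume can be misestimated when breadth is not reported.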

Aspect ratio: We calculated aspect ratio as the ratio of length to width. An aspect ratio of one corresponds to a spherical egg. An aspect ratio less than one corresponds to an egg that is wider than long (oblate ellipsoid). An aspect ratio greater than one corresponds to an egg that is longer than it is wide (prolate ellipsoid). Analyses testing the sensitivity of our measurement software (see "Assessing the accuracy of image measuring software" below) for egg images indicated that the variance in measured aspect ratio increases sharply when aspect ratio is much higher than typical (Table 3). Therefore we excluded the eggs in the top 0.1 percentile of aspect ratio from the final database. We recorded the aspect ratio from images published with or without a scale bar, as aspect ratio is a scale-free attribute.
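
As a rough sketch of the aspect ratio definition and the exclusion of extreme values, the following uses hypothetical function names. It assumes that "top 0.1 percentile" means the largest 0.1% of values; the exact rule used to build the database may differ.

```python
from typing import List


def aspect_ratio(length: float, width: float) -> float:
    """Ratio of egg length to egg width (scale-free)."""
    return length / width


def shape_class(ratio: float, tol: float = 1e-9) -> str:
    """Classify an egg by aspect ratio: spherical, oblate, or prolate."""
    if abs(ratio - 1.0) <= tol:
        return "spherical"
    return "prolate" if ratio > 1.0 else "oblate"


def drop_top_fraction(values: List[float], fraction: float = 0.001) -> List[float]:
    """Remove the largest `fraction` of values (assumed reading of 'top 0.1 percentile').

    Values tied with the cutoff are removed as well, so slightly more than
    `fraction` of the entries can be dropped when ties occur.
    """
    n_drop = int(len(values) * fraction)
    if n_drop == 0:
        return list(values)
    cutoff = sorted(values)[-n_drop]
    return [v for v in values if v < cutoff]
```

With 1,000 aspect ratio values and the default fraction, this removes only the single largest value.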

Asymmetry: We defined asymmetry as $\frac{\max(q_1, q_3)}{\min(q_1, q_3)} - 1$, where $q_1$ and $q_3$ are the egg diameters at the first and third quartile of the curved length axis. Therefore an egg with an asymmetry of zero has quartile diameters with equal length. Baker's λ value, used to measure asymmetry in bird eggs (ref. 24), can be converted to the asymmetry parameter used in the present study. Analyses testing the sensitivity of our image measuring software (see "Assessing the accuracy of image measuring software" below) indicated that the variance increases sharply near the extreme high values of asymmetry (Table 3). We therefore excluded the eggs in the top 0.1 percentile of asymmetry from the final database. Asymmetry was only recorded from published egg images.

[...] the curvature and aspect ratio are low (Table 3). We therefore did not calculate curvature for eggs with an aspect ratio of one or less. Angle of curvature was only recorded from published egg images.

[...] (Table 1). The user places points S1 and S2 at the ends of the scale bar. (F) Collected measurements from this image are as follows: Length is the distance from L1 to L2. Asymmetry is the ratio of the larger distance among q1 and q2 to the smaller. Angle of curvature is calculated as the angle formed by points L1, L2 and the midpoint of q2. Width is the longest distance between q1, q2, and q3. Aspect ratio is the ratio of length to width. See Table 2 for additional details.
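
The asymmetry and width definitions from the Methods text can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be diameters already measured at the first quartile (`q1`), midpoint (`q2`), and third quartile (`q3`) of the length axis.

```python
def asymmetry(q1: float, q3: float) -> float:
    """Asymmetry as max(q1, q3) / min(q1, q3) - 1.

    Returns zero when the two quartile diameters are equal, and the same
    value regardless of which end of the egg is wider.
    """
    if min(q1, q3) <= 0:
        raise ValueError("quartile diameters must be positive")
    return max(q1, q3) / min(q1, q3) - 1.0


def width_from_diameters(q1: float, q2: float, q3: float) -> float:
    """Width as the widest of the three measured diameters."""
    return max(q1, q2, q3)


# An egg whose wider end is 25% broader than its narrower end
example = asymmetry(0.4, 0.5)
```
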
[...] type (e.g. light micrograph, scanning electron micrograph, drawing). However, images of low quality were excluded by manually evaluating cases where landmarks could not be placed unambiguously. [...] (Table 3).

We assessed intraspecific variation in egg size descriptions using four methods:

First, for database entries that reported egg size variation (e.g. egg descriptions that included a range of egg length or an average egg length with deviation), the percent difference in egg size was calculated as follows: for egg descriptions recorded as ranges, percent difference was calculated as $100 \times \frac{\max l - \min l}{\operatorname{median} l}$; for egg descriptions recorded as average and deviations, percent difference was calculated as $100 \times \frac{2 \times \text{deviation}}{\operatorname{mean} l}$.

Second, independent observations of a single species were identified as two entries for the same species that differed in the calculated volume by more than $1.0 \times 10^{-5}$ mm$^3$. This excluded entries that were repeated publications of the same description, such as an observation repeated in a subsequent review (Table 1). The percent difference in egg length was calculated as $100 \times \frac{\max l - \min l}{\operatorname{median} l}$.

Third, for entries that had both a text description of egg length and a published image with a scale bar, the difference between the reported egg length and our re-measurement of the image was assessed. The percent difference between these two measurements was calculated as $100 \times \frac{\max l - \min l}{\operatorname{median} l}$.
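
The percent-difference calculations used in the first three methods can be sketched as follows. The function names are hypothetical. When only the two endpoints of a range are available, the median of those two values is their midpoint, which is what `statistics.median` returns for a two-element list.

```python
import statistics
from typing import Sequence


def percent_difference(lengths: Sequence[float]) -> float:
    """100 * (max - min) / median for a set of egg lengths in mm.

    Works for a reported range (two endpoints) or for several independent
    observations of the same species.
    """
    return 100.0 * (max(lengths) - min(lengths)) / statistics.median(lengths)


def percent_difference_from_deviation(mean_length: float, deviation: float) -> float:
    """100 * (2 * deviation) / mean for lengths reported as mean with deviation."""
    return 100.0 * (2.0 * deviation) / mean_length


# A description reporting a length range of 0.9 to 1.1 mm
from_range = percent_difference([0.9, 1.1])
# A description reporting 1.0 mm with a deviation of 0.05 mm
from_deviation = percent_difference_from_deviation(1.0, 0.05)
```
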

Fourth, for eggs that were measured as triaxial ellipsoids (length, width, and breadth all measured separately), the percent difference was calculated from the change in egg volume if the egg had been assumed to be a rotationally symmetric ellipsoid (volume = $\frac{1}{6}\pi l w b$ vs. volume = $\frac{1}{6}\pi l w^2$). Given that more eggs are likely triaxial ellipsoids than are reported in the egg database, this metric gives insight into the variation in egg volume that might be masked when only two dimensions are reported.

The distribution of precision in the insect egg database was assessed using two metrics. First, the number of decimal places used in the length measurement was calculated for each database entry from a base of millimeters (e.g. '1 mm' has 0 decimal places, while '1.00 mm' has 2 decimal places). Second, the relative precision of each measurement was calculated by dividing the smallest unit used to measure the egg by the total length of the egg, and multiplying this value by 100. This gives the percent of egg length captured by the unit of measurement (i.e. an egg measured as 1.00 mm was measured within 1% of egg length).

The accuracy of the image measuring software was assessed using an array of 24 simulated egg silhouettes with known combinations of parameter values (Figure 4). We found that as the actual angle of curvature increases, the difference between the actual and measured values increases (that is, the software underestimates the angle of curvature), and this difference is larger in eggs with lower aspect ratio and higher asymmetry (Table 3). As the actual asymmetry increases, the variance in measured asymmetry increases, and in eggs with low aspect ratio this results in an overestimation of asymmetry. As the actual aspect ratio increases, the software overestimates the total aspect ratio by up to 0.75 (12.5% of the total aspect ratio). Given these results, we removed eggs in the top 0.1 percentile of values for asymmetry and aspect ratio when creating the final database.
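
The two precision metrics can be sketched as follows. The function names are hypothetical, and the input is assumed to be the length as originally written, in millimeters, passed as a string so that trailing zeros are preserved. Relative precision is computed here as the smallest reported unit taken as a percentage of egg length, which reproduces the worked example in the text (an egg recorded as 1.00 mm gives 1%).

```python
from decimal import Decimal


def decimal_places(length_mm: str) -> int:
    """Number of decimal places in a length written in mm ('1' -> 0, '1.00' -> 2)."""
    exponent = Decimal(length_mm).as_tuple().exponent
    return max(0, -exponent)


def relative_precision_percent(length_mm: str) -> float:
    """Smallest reported unit as a percentage of the reported egg length."""
    length = Decimal(length_mm)
    if length <= 0:
        raise ValueError("length must be positive")
    # Smallest unit implied by the number of decimal places, e.g. 0.01 for '1.00'
    smallest_unit = Decimal(1).scaleb(-decimal_places(length_mm))
    return float(100 * smallest_unit / length)
```

For instance, a length written as '0.5' has one decimal place, so the smallest unit (0.1 mm) amounts to 20% of the egg length.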

Intraspecific variation in insect egg size was assessed using four metrics (see Methods section "Assessing intraspecific variation"). The first two describe the percent difference in egg size reported in the literature, either as variation recorded in an egg description (Figure 4A), or as variation recorded across multiple independent observations of eggs from the same species (Figure 4B). In both cases the percent difference in egg length averaged 10% and ranged from 1% to 100% (i.e., for an insect species with an average egg length of 1 mm, it was common to observe eggs from 0.9 to 1.1 mm, with occasional outliers at 0.5 and 2 mm).

Additionally, we re-measured published images of eggs and calculated the percent difference between our measurements and the text description (Figure 4C). The variation between observations of the same species was consistent with the reported intraspecific variation (average around 10%).

Although the majority of eggs in the database are described as rotationally symmetric ellipsoids (Table 1), for a few clades of insects it is common to measure eggs as triaxial ellipsoids, with length, width, and breadth measured separately (Table 2). Calculating the egg volume using two different methods (one taking into account breadth, and the other assuming rotational symmetry) showed that the percent difference in calculated volume ranges between 10% and 100% (Figure 4D). Eggs from additional clades might be more accurately modeled as triaxial ellipsoids than currently reported in the literature, but this percent difference likely represents the upper range of the error in volume, because the clades typically measured as triaxial ellipsoids are those that are most obviously flattened along one axis.

The text descriptions in the insect egg database were extracted from a diverse set of sources published over hundreds of years, and the precision used to measure eggs varies across these sources (Figure 4). Most entomologists measured eggs in tenths or hundredths of a millimeter (Figure 4E). In terms of the total length of the egg, most measurements in the database are precise to within 1% to 10% (Figure 4F). Given that intraspecific variation is also around 10% of total egg length, it is likely that some of this variation is due to measurement error.

The egg database contains descriptions of eggs from every insect order and from hundreds of insect families (Table 1). Given that the number of species varies greatly across taxonomic ranks, we assessed the phylogenetic coverage of the egg database (Figure 4G, H). We found that families and orders with the highest number of estimated species are represented by the greatest number of entries in the egg database. Additionally, most families in the egg database have more than 1 entry per 100 species.

There are several orders represented in the database by fewer than ten entries (Figure 4H). We suggest that this is [...]

Table 3: Results of image measurement software accuracy assessment. Mean discrepancy calculated as the average difference between the actual and measured values, n = 5.