Abstract
Comprising over 15,000 living species, decapods (crabs, shrimp, and lobsters) are the most instantly recognizable crustaceans, representing a considerable global food source. Although decapod systematics have received much study, limitations of morphological and Sanger sequence data have yet to produce a consensus for higher-level relationships. Here we introduce a new anchored hybrid enrichment kit for decapod phylogenetics designed from genomic and transcriptomic sequences that we used to capture new high-throughput sequence data from 94 species, including 58 of 179 extant decapod families, and 11 of 12 major lineages. The enrichment kit yields 410 loci (>86,000 bp) conserved across all lineages of Decapoda, eight times more molecular data than any prior study. Phylogenomic analyses recover a robust decapod tree of life strongly supporting the monophyly of all infraorders, and monophyly of each of the reptant, ‘lobster’, and ‘crab’ groups, with some results supporting pleocyemate monophyly. We show that crown decapods diverged in the Late Ordovician and most crown lineages diverged in the Triassic-Jurassic, highlighting a cryptic Paleozoic history, and post-extinction diversification. New insights into decapod relationships provide a phylogenomic window into morphology and behavior, and a basis to rapidly and cheaply expand sampling in this economically and ecologically significant invertebrate clade.
1. Introduction
Decapod crustaceans, broadly categorized into ‘shrimp’, ‘lobsters’, and ‘crabs’, are embedded in the public consciousness due to their importance as a global food source worth over $24 billion [1]. Several ornamental species are also popular in the pet trade [2,3], and some lobsters and crayfish may be promising models for cancer and aging research [4]. Furthermore, decapods are a major faunal component of a bewildering variety of global habitats, including the open ocean, seafloor vents and seeps, caves, coral reefs, mangroves and estuaries, intertidal mud and sand, freshwater streams and lakes, semi-terrestrial locations, and in symbiosis with other animals (Figure 1). Decapods have diversified over the course of 455 million years resulting in over 15,000 living and 3,000 fossil species recognized in approximately 233 families [5,6]. Despite the economic and ecological significance of the clade, higher-level phylogenetic relationships among decapods have proven recalcitrant.
Figure 1. Representatives of major decapod lineages. (a) Lucifer sp. (Southeast Florida, USA) (Dendrobranchiata); (b) Stenopus hispidus (Komodo, Indonesia) (Stenopodidea); (c) Procaris chacei (Bermuda) (Procarididea); (d) Arctides regalis (Maui, Hawaii, USA) (Achelata); (e) Cherax quadricarinatus (aquarium specimen) (Astacidea); (f) Thor amboinensis complex (Ternate, Maluku Islands, Indonesia) (Caridea); (g) Axiopsis serratifrons (Bali, Indonesia) (Axiidea); (h) Stereomastis sculpta (specimen ULLZ 8022) (Polychelida); (i) Upogebia cf. pusilla (Arcachon Bay, France) (Gebiidea); (j) Emerita talpoida (Westerly, Rhode Island, USA) (Anomura); (k) Pachygrapsus crassipes (Catalina Island, California, USA) (Brachyura). Photo credits: (a) L. Ianniello; (b) A. Vasenin, license CC-BY-SA; (c) T.M. Iliffe; (d, k) J. Scioli; (e) C. Lukhaup; (f) C.H.J.M. Fransen; (g) A. Ryanskiy; (h) D.L. Felder; (i) X. de Montaudouin; (j) J.M. Wolfe.
Most previous work is restricted to studies using morphology [7–9], up to nine targeted mitochondrial and nuclear genes [6,10–19], and more recently complete mitogenomes of 13 genes [20–24]. Mitogenomic data can be problematic for reconstructing ancient nodes [25], and indeed, deeper relationships receive poor support [24]. As part of a larger analysis, decapods were included in a recent transcriptomic study [26], but with limited taxon sampling within the order. These studies, several based on the same underlying data [25], have reported conflicting deep relationships among decapods. Without a robust phylogeny, comparative inferences about morphology, development, ecology, and behavior are limited.
Herein, phylogenomic sequencing of nuclear genes is leveraged for the first time in decapods, using anchored hybrid enrichment (AHE), a technique previously applied to vertebrates [27], plants [28], and clades of terrestrial arthropods that have diverged at least 100 Myr more recently than decapods [29–32]. AHE specifically targets conserved coding regions that are flanked by less conserved sequence regions, with the goal of optimizing phylogenetic informativeness at multiple levels of divergence [27]. Unlike popular transcriptomic approaches, AHE does not require fresh or specially preserved tissues, which is critical for sampling the diversity of decapods, since many lineages are rare, confined to the deep sea, and/or have complicated life histories. Instead, AHE allows the use of ethanol-preserved specimens; however, prior genomic and/or transcriptomic data are required to determine genomic target regions.
Here we combine new genomic and transcriptomic sequences to build AHE probes spanning all of Decapoda, ultimately sequencing 86 species and 7 outgroups. The enrichment kit we constructed can easily be used by the systematics community for future studies of decapod evolution. Ours is the first example of a strongly supported phylogenomic analysis including almost all major decapod lineages sequenced for over 400 loci, the largest dataset yet compiled for this group. With the inclusion of 19 vetted fossil calibration points, we also present the first divergence time analysis incorporating a well-supported topology for the entire decapod clade.
2. Methods
(a) Probe design
Target AHE loci were identified using our previous workflows (e.g. [29,31,32]; Figure S1) at the FSU Center for Anchored Phylogenomics (www.anchoredphylogeny.com). Targets were based on genomic resources from 23 decapod species (Table S1), including nine newly sequenced genomes (~6-31x coverage; Table S2) and four newly sequenced transcriptomes (Table S3-S4). Details of genome and transcriptome sequencing are given in Extended Methods 1a-c. Best-matching reads were identified in the two highest-recovery taxa (Table 1, RefsA), as well as reference sequences from the red flour beetle Tribolium castaneum [29,32], resulting in 823 preliminary AHE target sequences. As in Hamilton et al. [29], we screened exemplar transcriptomes from five major decapod lineages (Table 1, RefsB) for the best-matching transcript, then aligned them in MAFFT v7.023 [33], requiring representation in at least four of the lineages and resulting in 352 final targets. Additional details are given in Extended Methods 1d.
Table 1. Genomes and transcriptomes used for preliminary probe design.
We used additional genomic resources (Table S1) to build alignments from six major lineages representing the diversity of decapods (Achelata, Anomura, Astacidea, Brachyura, Caridea, Dendrobranchiata). Raw reads from these species were mapped to the references above and used to extend probes into flanking regions [29]. For each combination of locus and major lineage, an alignment containing all recovered sequences (and lineage-specific reference sequence) was created in MAFFT. As each alignment contained at least one sequence derived from a genome, we were able to identify and mask intronic and repetitive regions, the latter identified based on the best-matching genomic region in the published red cherry shrimp (Neocaridina denticulata) genome [34]. Probes were tiled at 4x density across all sequences in each alignment and divided into two Agilent SureSelect XT kits (Table S5).
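To make the tiling step concrete, the following Python sketch lays overlapping probes across a single target sequence. It is only an illustration under stated assumptions: the probe length of 120 bp is not given in the text, the function name `tile_probes` is hypothetical, and 4x density is interpreted here as a step of one quarter of the probe length, so that interior bases are covered by roughly four probes.

```python
def tile_probes(seq, probe_len=120, density=4):
    """Return (start, probe) pairs tiled so interior bases are covered ~density times.

    probe_len=120 is an illustrative assumption, not a value from the text.
    """
    if len(seq) < probe_len:
        return []  # sequence too short to hold a single probe
    step = max(1, probe_len // density)  # 4x density -> step of probe_len / 4
    starts = list(range(0, len(seq) - probe_len + 1, step))
    # Add a final probe flush with the end so the last bases are not missed.
    last = len(seq) - probe_len
    if starts[-1] != last:
        starts.append(last)
    return [(s, seq[s:s + probe_len]) for s in starts]


if __name__ == "__main__":
    target = "ACGT" * 75  # toy 300 bp target
    probes = tile_probes(target)
    print(len(probes), [s for s, _ in probes])
```

In a real design, the same routine would be applied to every sequence in each lineage-specific alignment after intronic and repetitive regions have been masked.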
(b) AHE sequencing and dataset assembly
From the Florida International Crustacean Collection (FICC), 89 species of decapods and seven additional crustaceans were selected for AHE sequencing. High molecular weight DNA was extracted from abdominal tissue, gills, or legs using the Qiagen DNeasy Blood and Tissue Kit following the manufacturer’s protocol. A post-extraction RNase treatment was performed on all samples to remove RNA contamination. AHE libraries were prepared from DNA extracts from 94 species (Table S6) at the FSU Center for Anchored Phylogenomics, following Lemmon et al. [27]. Libraries with 8 bp indices were combined in pools of 16 samples prior to enrichment with the Agilent SureSelect XT kits, then combined into two pools of 48 samples and sequenced in a single Illumina HiSeq2500 lane with 2x150 paired end reads, which were trimmed using Trim Galore! v0.4.0 [35]. Due to high divergence across Decapoda, we screened resulting AHE data for single-copy exons in the reference genome of the Chinese mitten crab, Eriocheir sinensis [36]. A total of 675 exons (L1-L675) were identified with ~40% coverage across AHE sequenced taxa. The E. sinensis amino acid sequences for these 675 loci were added to our data set and used as a reference locus set for Iterative Baited Assembly (IBA) and orthology screening following Breinholt et al. [31], except where noted in Extended Methods 1f-g.
(c) Phylogenomics
The main data matrix used for phylogenetic analysis comprised 410 loci with at least 60% of the taxa represented in each locus (Figure S2). We analyzed both amino acid and nucleotide datasets, as nucleotides have been shown to support shallow relationships [37] and may be robust to differences among optimality criteria [38]. Analyzed matrices are summarized in Table 2. We inferred phylogenetic relationships using several methods, fully detailed in Extended Methods 1i-j. Bayesian inference was conducted with PhyloBayes v3.3f [39] using the site-heterogeneous CAT-GTR + G substitution model (only used for amino acid matrices). Maximum-likelihood analyses used IQ-TREE v1.6.3 [40] on 149 best-fitting partitions identified by PartitionFinder [41]. Coalescent (‘species tree’) methods were applied to investigate the role of incomplete lineage sorting, with maximum-likelihood gene trees inferred in IQ-TREE as inputs to estimate the species tree in ASTRAL-III v5.6.1 [42].
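The occupancy criterion for the main matrix (each locus represented in at least 60% of taxa) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary layout, and the toy taxon labels are all assumptions introduced for the example.

```python
def filter_loci_by_occupancy(presence, n_taxa, min_fraction=0.6):
    """Keep loci recovered in at least min_fraction of the n_taxa sampled taxa.

    presence maps a locus name to the set of taxa in which it was recovered.
    """
    kept = []
    for locus, taxa in presence.items():
        if len(taxa) / n_taxa >= min_fraction:
            kept.append(locus)
    return kept


if __name__ == "__main__":
    # Toy data: five taxa, three loci (labels are invented for illustration).
    presence = {
        "L1": {"t1", "t2", "t3", "t4", "t5"},  # 100% occupancy, retained
        "L2": {"t1", "t2", "t3"},              # 60% occupancy, retained
        "L3": {"t1", "t2"},                    # 40% occupancy, dropped
    }
    print(filter_loci_by_occupancy(presence, n_taxa=5))  # ['L1', 'L2']
```

Raising `min_fraction` trades a smaller matrix for less missing data; the 60% value is the threshold stated above.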
Table 2. Data matrix statistics.
(d) Divergence time estimation
We identified 19 fossil calibrations across Decapoda, justified based on best practices ([43,44]; details in Extended Data 2 and Table S7). All internal calibrations used soft bounds with 5% of the probability distribution allowed outside of the input ages, defined by a birth-death tree prior. We applied a gamma-distributed root prior based on crown Eumalacostraca [44] with a mean age of 440 Ma and SD 20 Myr. Divergence times were estimated in PhyloBayes using a fixed topology from our preferred tree (Bayesian CAT-GTR + G; discussed below), the CAT-GTR + G substitution model, multiple clock models, and two runs of four MCMC chains each.
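As a worked example of the root prior, a gamma distribution with a given mean and standard deviation has shape (mean/SD)^2 and scale SD^2/mean. The sketch below converts the stated values (mean 440 Ma, SD 20 Myr) into these parameters and checks them by simulation. The function name is hypothetical, and the shape/scale parameterization is the standard textbook one; how PhyloBayes parameterizes the prior internally is not stated here and may differ.

```python
import random
import statistics


def gamma_from_mean_sd(mean, sd):
    """Convert a mean and standard deviation to gamma shape and scale.

    For a gamma distribution, mean = shape * scale and variance = shape * scale**2.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


if __name__ == "__main__":
    shape, scale = gamma_from_mean_sd(440.0, 20.0)
    print(f"shape = {shape:.1f}, scale = {scale:.4f}")
    # Sanity check by simulation with the standard-library sampler.
    rng = random.Random(1)
    draws = [rng.gammavariate(shape, scale) for _ in range(20000)]
    print(round(statistics.mean(draws)), round(statistics.stdev(draws)))
```

With these inputs the shape is 484 and the scale is about 0.909, which gives a nearly symmetric prior centered on 440 Ma.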
3. Results and Discussion
(a) Target capture success
We successfully sequenced targeted regions from 94 species representing 11 of 12 major decapod lineages (raw reads in NCBI BioProject XXX, assemblies in Dryad). We attempted to include Neoglyphea inopinata, one of only two living members of Glypheidea (deep sea lobsters with a diverse fossil record); however, multiple attempts to extract DNA from the limited tissue available to us did not yield high-quality genomic DNA, and the sample failed during probe capture (but see [23] for mitogenome data). All other taxa were sequenced successfully, yielding an average of 3,299,141 reads and 332 assembled loci per sample (range 57-405 loci; Figure S2, Table S8). The final 410 loci ranged from 66-1683 bp with a total alignment length of 86,322 bp (Table 2). Taxa represented by >350 loci came from all major decapod lineages except Procarididea, demonstrating the efficacy of our probes across the entire clade. Using our enrichment kit, it will thus be possible for the community to easily sequence the same loci for large-scale phylogenomics spanning any decapods of interest.
The majority of nodes were congruent across different analyses, albeit with different levels of support (Figure 2), demonstrating that our large dataset is mostly cohesive and can resolve deep splits. We use the results from Bayesian inference with the CAT-GTR + G amino acid substitution model as the ‘best’ topology (Figure 2, first support square). This topology does not precisely match any previous result from morphological, Sanger, and/or mitogenomic data [19,21–23,25]. We prefer this tree over the Bayesian recoded topology because it had more nodes resolved, with higher support; the ways in which model misspecification drives contradicting topologies between these methods are still not understood [45]. Nucleotide analyses were not preferred because of both saturation in our data (Figure S3) and disagreement among results of different analyses [38].
Figure 2. Phylogenetic hypothesis for Decapoda based on the topology from the Bayesian CAT-GTR + G analysis. Unlabeled nodes are considered strongly supported. Nodes where at least one analysis rejected the depicted topology are illustrated with rug plots showing the support values from each analysis. In rug plots, the illustrated topology is first row, first column. All alternative topologies are available in Dryad. Species used for probe design by shotgun whole genome sequencing in bold text. For clarity, the branch leading to the outgroup Branchinecta sp. (Anostraca) has been shortened, and the real length is indicated. Organism silhouettes are from PhyloPic (phylopic.org) or created by J.M. Wolfe.
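For readers unfamiliar with recoded amino acid matrices, the sketch below shows one common form of recoding, in which the 20 amino acids are collapsed into six Dayhoff categories of biochemically similar residues. The six-category scheme is an assumption made for illustration, since the text above does not specify which scheme was used, and the function name is hypothetical.

```python
# Six Dayhoff categories commonly used for amino acid recoding (assumed here).
DAYHOFF6 = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}
# Invert to a per-residue lookup table.
RECODE = {aa: code for code, group in DAYHOFF6.items() for aa in group}


def dayhoff_recode(seq):
    """Recode an aligned amino acid sequence; gaps and unknowns become '-'."""
    return "".join(RECODE.get(aa, "-") for aa in seq.upper())


if __name__ == "__main__":
    print(dayhoff_recode("MKC-LWDX"))  # prints 325-341-
```

Recoding reduces compositional heterogeneity and saturation at the cost of discarding substitutions within each category, which is one reason recoded and unrecoded analyses can resolve different numbers of nodes.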
(b) Deep evolutionary history of decapods
Monophyly of Decapoda is supported in amino acid analyses (Figure 2); some nucleotide results find Lucifer (an apomorphic, epipelagic dendrobranchiate shrimp; Figure 1a) experiencing long branch attraction toward various outgroups. The most classical division in decapods, between suborders Dendrobranchiata (most food shrimp/prawns) and Pleocyemata (all other decapods), is supported by the unrecoded amino acid matrices (pp = 0.97/bootstrap = 93%), and contradicted by all others. The alternative hypothesis recovered is the natant (shrimp-like) decapods, with Dendrobranchiata, Stenopodidea, Procarididea, and Caridea forming a (usually poorly supported, ~50%) clade. We tentatively agree with the results supporting Dendrobranchiata and Pleocyemata, similar to the transcriptomic results of Schwentner et al. [26]. The polarity of one of the major characters separating these two clades, the lecithotrophic free-living nauplius larva in dendrobranchiates (as opposed to the egg-nauplius in pleocyemates), depends on whether Euphausiacea (krill) are most closely related to decapods, which we did not test. If euphausiids are the sister group of decapods, then pleocyemates have lost the free-living nauplius [46,47]; otherwise, the free-living nauplius is convergent in euphausiids and dendrobranchiates [48].
Within Pleocyemata, all infraorders receive full support for their monophyly (Figure 2). A single origin of the reptant or ‘crawling/walking’ decapods (Achelata, Polychelida, Astacidea, Axiidea, Gebiidea, Anomura, and Brachyura) is strongly supported. Numerous morphological characters have previously suggested monophyly of the reptant clade, e.g. a dorsoventrally flattened pleon, calcified body, anterior articulation of the mandibles formed by an elongated process of the molar region extending dorsally from the palp, anteroposterior rotation of walking legs, a short first pleomere, and spermatozoa with at least three nuclear arms [7]. Monophyly of reptant decapods is concordant with the majority of previous results [25], and almost certainly includes Glypheidea in addition [23].
Our posterior age estimate for the root of crown Decapoda (mean in the Late Ordovician at 455 Ma, 95% CI 512-412 Ma; Figure 3) was substantially older than most previous estimates [6,10,12,14,49], which largely placed crown decapods in the Devonian. Our analysis includes non-decapod outgroups and a more crownward position of the Devonian calibration fossil Palaeopalaemon newberryi within Reptantia [44,50], resulting in older ages for deeper nodes.
Figure 3. Divergence time estimates for Decapoda based on the topology in Figure 2. Posterior ages were estimated in PhyloBayes using the CAT-GTR + G substitution model, the CIR clock model, and a gamma distributed root prior of 440 Ma ± 20 Myr. Horizontal shaded bars represent 95% confidence intervals. Numbered circles represent nodes with fossil calibrations.
(c) Evolutionary history of shrimp
Dendrobranchiate relationships are consistent (Figure 2), except the aforementioned long branch attraction of Lucifer to outgroups in the ASTRAL and nucleotide analyses. Amino acid results place this perplexing decapod with Sergestidae (pelagic shrimp), as suggested by morphological analysis, especially spermatophore morphology [51,52]. Crown dendrobranchiates diverged in the Late Devonian (Figure 3), with the two main clades Sergestoidea and Penaeoidea both diverging in the Pennsylvanian (about 73 Myr prior to the estimates of Ma et al. [53], and over 100 Myr older than the minimum ages suggested by Robalino et al. [54]). Although our mean posterior age of crown Penaeidae (134 Ma) is younger than the phylogenetically justified Late Jurassic crown fossil Antrimpos speciosus [54], which we did not use as a calibration, our 95% CI encompasses the fossil age of 151 Ma.
Among the pleocyemate shrimps, a sister group relationship between Stenopodidea (cleaner/boxer shrimp) and (Procarididea + Caridea) is strongly supported (Figure 2); similar topologies have been found in previous molecular and morphological analyses [8,11,26]. The sister group relationship of Procarididea (anchialine shrimp) and Caridea is supported by four-gene molecular analyses, the extended second pleurite overlapping the first and third somites, phyllobranchiate gills, and the form of the telson and uropods [6,55,56].
Within Caridea, we sampled eight families, compared with a maximum of 27 families in previous studies [18,57]. Nevertheless, our results are broadly concordant within the limits of the taxa we sequenced, producing a strongly supported backbone topology upon which future studies can build. The deepest split within carideans was between Atyidae (freshwater shrimp) and all others (Figure 2). The alternative deep split of Atyidae and Oplophoridae from all other carideans was weakly supported (full nucleotide analyses), or the node was recovered as a polytomy (recoded analyses). Our preferred topology agrees with previous work [18,57,58], although we are missing some deeper-diverging families. We support previous findings that the traditional concept of ‘Alpheoidea’ (snapping shrimp and allies) is not monophyletic and contains Palaemonidae (~200 genera; [18,57]). This larger Alpheoidea + Palaemonidae clade contains the majority of caridean diversity, possibly including Amphionidacea [19]. We inferred younger posterior ages (Figure 3) for the Procarididea-Caridea split (Pennsylvanian), and crown Caridea (Late Triassic), compared with previous analyses [6,58].
(d) Evolutionary history of lobsters
Few previous analyses (from six or fewer nuclear genes: [11,16,17]) have recovered monophyly of the overall lobster body plan, which receives full support in all our analyses (Figure 2). The relationships among lobster infraorders, however, are poorly resolved. Amino acid analyses suggest a sister group relationship between Polychelida (blind deep sea lobsters) and Astacidea (clawed lobsters and crayfish), but with <70% support, except the Dayhoff recoded Bayesian analysis (pp = 0.86). Recent mitogenomic work shows Glypheidea potentially intervening as the sister group of Polychelida [23,24] within a paraphyletic lobster grouping. It is difficult to predict how the addition of Glypheidea would affect our topology, as it is also possible that the mitochondrial grouping results from long branch attraction (both Glypheidea and Polychelida have species-poor crown groups and morphologically diverse stem groups that predate their crowns by >150 Myr; [15,59]).
Achelata, with the crown members united by their unique phyllosomal larval stage, are monophyletic (Figure 2). Of the two constituent families, Scyllaridae (slipper lobsters) are monophyletic in all analyses. Palinuridae (spiny lobsters), however, are paraphyletic in amino acid recoded analyses, and in some nucleotide analyses. Based on monophyletic Palinuridae (Figure 3), their crown divergence occurred in the Late Triassic (similar to previous estimates; [15]), and crown Scyllaridae in the Early Cretaceous (about 87 Myr younger than Bracken-Grissom et al. [15]). These age estimates predate the wealth of Jurassic and Cretaceous fossil achelatan larvae [60,61], implying bizarre stem-groups may have persisted throughout the Mesozoic alongside the crown. Our phylogenetic results also support the division of Palinuridae into distinct clades of Silentes and Stridentes, the latter bearing an enlarged antennular plate used in sound production in adults [15,62,63]. Indeed, the palinurid genera that are close to Scyllaridae in these alternative analyses are the included members of Silentes (Jasus and Sagmariasus), further supporting the auditory behavior of Stridentes as a clade synapomorphy, having diversified in the Jurassic.
Relationships within Astacidea were similar to results from a combined Sanger sequencing and taxonomic synthesis approach [64,65]. Crown Astacidea diverged in the Pennsylvanian, and crown Nephropidae (‘true’ lobsters) diverged in the Early Jurassic (Figure 3). The split between southern hemisphere crayfish (Parastacidae) and northern hemisphere crayfish occurred in the Middle Triassic around 241 Ma, prior to the breakup of Pangaea [15].
(e) Evolutionary history of mud/ghost shrimp
The mud/ghost shrimps Axiidea and Gebiidea (formerly Thalassinidea) are not monophyletic, as has been shown in previous molecular work [66]. Most of our amino acid analyses produce a paraphyletic mud shrimp group (Figure 2), with limited but clear support for Axiidea as the sister group to the Gebiidea and Meiura clade (i.e. the Monochélie of de Saint Laurent [67], which is strongly supported herein). There is precedent for our mud shrimp and crab clade based on morphology (where thalassinidean monophyly is assumed [7]), Sanger results [14], and on some mitogenomic analyses [20,22,23]. The alternative hypothesis we recovered (in most of the nucleotide analyses) is mud shrimp polyphyly, with Axiidea as sister group to the lobster clade. Loss of chelae on pereiopods posterior to the first was previously proposed as convergent in Meiura and members of Gebiidea [7]; our topology suggests this character is the eponymic synapomorphy of the Monochélie [66,67]. Our divergence time analysis suggests that the mud shrimp + crab clade diverged in the Late Devonian (Figure 3), with crown Monochélie diverging in the Mississippian, and both crown Axiidea and crown Gebiidea in the Late Triassic. These posterior age estimates are older than previous studies [14] or a literal interpretation of the fossil record [68].
(f) Evolutionary history of crabs
Meiura, the monophyletic relationship between Anomura (‘false’ crabs) and Brachyura (true crabs), is strongly supported (Figure 2). This is important because a number of Sanger analyses [6,10,13,16,19] purported to refute meiuran monophyly. Several synapomorphies have been proposed, such as short asymmetric flagella on the antennule, bent exopods on the maxillipeds, and fusion of ganglia borne on the first pleomere and thoracic mass [7]. Carcinization, the overall crab-like body plan including a flattened carapace with lateral margins, fused sternites, and strongly bent abdomen [69], has been suggested as a developmentally co-opted trait of Meiura with a ‘tendency’ to evolve repeatedly [70,71]. Our topology suggests at least four separately carcinized clades in Anomura, and one in Brachyura; however, increased taxon sampling will complicate character distribution [14,69] and introduce secondary losses, such as in frog crabs [72,73].
Within Anomura, we recover support for Hippidae (mole crabs) as the sister group of ‘Paguroidea’ (king crabs and most hermit crabs; Figure 2) rather than the outgroup to all other anomurans [14,74]. Precedent for a sister group relationship of Hippidae to Paguroidea comes from mitochondrial gene rearrangements [71], from gross adult morphology (e.g. shape of carapace and sternites [75]), and from characters of the foregut ossicles [76]. Note, as in past molecular-only results [14,74], Parapaguridae are more closely related to Eumunididae (squat lobsters), rendering hermit crabs polyphyletic with potentially convergent evolution of asymmetrical abdomens [74]. Recent mitogenome research [24] recovered dramatically different relationships among Anomura, including a different topology for hermit crab polyphyly, but mitochondrial data are best for relationships within families, and weaker for deep splits, as they represent a single locus, long established as a weak basis for phylogenetic estimation. Posterior divergence estimates (Figure 3) from crown Anomura to crown Paguroidea span a narrow interval of about 22 Myr in the Late Triassic, with each node about 20 Myr older than previous estimates [14]. Note that we recover these posterior ages based on only soft maximum priors (i.e. not minima) from the Late Triassic Platykotta akaina [14,77], as it may be placed outside the meiuran crown-group [78]. We also observe a conflicting split between Galatheoidea and all other anomurans (Figure 2), where preferred analyses support previous molecular-only results [14,74]. The alternative, a clade of monophyletic squat lobsters, porcelain crabs, and Parapaguridae, is supported in recoded amino acid, ASTRAL, and non-recoded nucleotide trees. These relationships could be clarified by sampling additional squat lobster and hermit crab groups.
Our analyses strongly support the traditional morphological divisions within Brachyura (Figure 2), with podotremes (represented by Dromiidae, or sponge crabs; gonopores located on the pereiopod coxa) as the deepest split in the Late Triassic (Figure 3), and eubrachyurans divided into Thoracotremata (gonopores located on the sternum) and Heterotremata (female gonopores located on the sternum, male gonopores on the coxa). Each of the two eubrachyuran branches diverged in the Early Cretaceous, with diversification among families mainly in the Late Cretaceous. Within Thoracotremata, all our results reject monophyletic Grapsoidea (Figure 2). The focal tree supports Sesarmidae as the outgroup to other families, but other analyses support either Plagusiidae or Varunidae. Within Heterotremata, we recover support for several clades that have been previously defined [12], at least within our taxon sampling: Majoidea (spider and decorator crabs: Epialtidae, Inachoididae, and Mithracidae), Xanthoidea (mud crabs: Panopeidae and Xanthidae), and Portunoidea (swimming crabs: Portunidae and Geryonidae). Within Majoidea, the family Epialtidae is paraphyletic with respect to Mithracidae, suggesting continued evaluation of larval morphology [12,79,80]. This is the best supported region of the heterotreme tree. As we only sampled 19 of 96 total brachyuran families, it is unsurprising that our analyses conflicted for remaining clades. Important AHE target taxa for improved resolution include Raninidae, Cyclodorippidae, and Homolidae (all podotremes), Gecarcinidae and Pinnotheridae (pea crabs, within thoracotremes), primary freshwater heterotremes [12], and xanthoid relatives [81].
(g) Divergence times
We present the posterior age results from divergence time analysis using the CIR autocorrelated clock model (Figure 3), as it is more biologically realistic [82]. Unlike in broader studies of arthropods [83] and myriapods [84], posterior credibility intervals were similar for many nodes regardless of which clock model we applied (Figure S4). Although we only used the top 50 loci as sequence data to estimate divergence times [85], the posteriors hewing close to the effective prior are not necessarily problematic (e.g. [86–88]). Similarities between effective prior and posterior distributions are also present for nodes we did not explicitly calibrate (Figure S4b), though they are free to vary according to the birth-death tree prior. This effect is less pronounced for non-reptantian nodes, which have scant fossil information and essentially uniform maxima as described by Brown & Smith [89].
Overall, our divergence time estimates imply a significant cryptic history for decapods, encompassing most of the Paleozoic (Figure 3). Perhaps our results will motivate revision of Paleozoic fossils that have been suggested as decapods, such as Imocaris spp. [78,90,91], angustidontids [92,93], or poorly constrained natantian shrimp [94], in a more explicit phylogenetic framework. We also infer a lack of cladogenesis among the deep lineages during the Permian, followed by diversification in most crown groups in the Triassic. Although we did not explicitly calibrate most nodes using Triassic fossils, and molecular data alone cannot accurately estimate diversification [95], it is striking that our divergence time analysis infers appearance of the modern decapod clades following the largest known mass extinction 251 Ma, replacing and innovating ecological roles as important members of the Modern evolutionary fauna [96–98]. Moreover, the most species-rich lineages, Caridea, Anomura, and Brachyura, each show deep divergences during the Jurassic and family-level diversification in the Cretaceous, concurrent with the radiation of modern reef-building corals [99], a major habitat and source of biodiversity for these crustaceans [100,101].
4. Conclusion
Our well-resolved dated phylogeny may inform comparative evolutionary topics, such as the evolution of visual systems in deep sea and cave environments [64,102], evolution of major body plan features [14,69,103], the role of symbiosis [104–106], evolution of behavior [107], macroevolutionary trends in physiology and habitat through time [100,108], conservation biology and vulnerability to climate change [49,109,110], and more. The new enrichment kit we have generated will permit an inexpensive expansion of taxon sampling across Decapoda, via our large-scale matrix of loci conserved across 450 Myr, to accelerate discoveries in a fascinating invertebrate clade.
Data accessibility
Extended Methods and Fossil Calibrations, Figures S1-S4, and Tables S1-S8 are available as Electronic Supplementary Material. Raw reads are available in the NCBI BioProject: XXX. Assemblies, matrices, and resulting tree files are available at the Dryad Digital Repository, provisional link: https://datadryad.org/review?doi=doi:10.5061/dryad.k7505mn. Scripts for this paper are available at https://github.com/jessebreinholt/proteinIBA.git.
Competing interests
We have no competing interests.
Authors’ contributions
H.D.B.G. and K.A.C. conceived the project, J.M.W., K.A.C., M.E.S., and H.D.B.G. acquired samples, L.E.T. and H.D.B.G. extracted DNA, J.M.W. and M.E.S. extracted RNA, A.R.L. and E.M.L. developed probes and conducted sequencing, J.W.B. developed, assembled, screened orthologs and processed AHE data into matrices, J.W.B. and J.M.W. conducted phylogenetic analysis, J.M.W. vetted fossil constraints, performed divergence time analyses, created figures, and wrote the manuscript with input from all authors.
Funding
We thank the following funding sources: AMNH Gerstner Scholarship and Lerner-Gray Fellowship (J.M.W.), NSF-EAR 1615426 (J.M.W.), NSF-DEB 1556059 (H.D.B.G.), and Florida International University.
Figure S2. Heat map of pairwise amino acid dataset completeness between species for all targeted AHE loci, in the unrecoded amino acid dataset. Numbers in parentheses are total captured loci per species. Low shared site coverage is shown in shades of red and high shared site coverage in shades of green.
Figure S3. Saturation plot for each codon position, with transversions (v) and transitions (s) plotted against F84 distance. The third codon position clearly deviates from expected values, indicating saturation.
Figure S4. Comparison of posterior probability distributions for divergence times assessed as in Figure 3 (posterior), and using the same analyses under the effective prior (removing sequence data). The posterior analyses are shaded; effective priors are superimposed on the same axes with a heavy line of the same color. Grey/black: analyses with the CIR autocorrelated clock model (depicted in Figure 3); orange: analyses with the UGAM uncorrelated clock model. (a) Selected nodes directly calibrated by fossils and their calibration number; (b) Selected nodes calibrated by only a birth-death tree prior.
All Extended Tables are available as .xlsx or .csv files attached.
Table S1. Details of all transcriptome and genome sequences used in probe design.
Table S2. Sample information for whole genome sequencing.
Table S3. Sample information for transcriptome sequencing.
Table S4. Assembly statistics for transcriptome sequencing.
Table S5. Brief description of enrichment kits for each of six selected major lineages (Achelata, Anomura, Astacidea, Brachyura, Caridea, and Dendrobranchiata).
Table S6. Sample information for AHE sequencing.
Table S7. Formatted list of node calibration priors.
Table S8. Assembly statistics for AHE sequencing, and loci sequenced for each species. For each locus, 1 represents presence and 0 represents absence in the main data matrix.
Acknowledgments
We thank Mercer Brugler, Jorge Perez-Moreno, Shaina Simon, and Juliet Wong for assistance with extractions, and Michelle Kortyna, Sean Holland, and Ameer Jalal at the FSU Center for Anchored Phylogenomics for assistance with probe design and data collection. We acknowledge use of the Engaging Cluster at MGHPCC. This is contribution #X for the Center for Coastal Oceans Research in the Institute for Water and Environment at Florida International University.