Abstract
New techniques for the species-level sorting of millions of specimens must be developed in order to answer the question of how many species live on Earth. These methods should be reliable, scalable, and cost-effective, as well as largely insensitive to the low-quality genomic DNA commonly obtained from museum specimens. Mini-barcodes seem to satisfy these criteria, but it is unclear whether they are sufficiently informative for species-level sorting. We test this here based on 20 datasets covering ca. 30,000 specimens of 5,500 species. All specimens were first sorted based on morphology before being barcoded with full-length cox1 barcodes. Mini-barcodes of different lengths and positions were then obtained in silico from the full-length barcodes using nine published mini-barcode primers (length: 94–407-bp) and a sliding window approach (3 windows: 100-bp, 200-bp, 300-bp). Afterwards, we determined whether barcode length and/or position reduces congruence between morphospecies and molecular Operational Taxonomic Units (mOTUs) that were obtained using three different species delimitation techniques (ABGD, PTP, objective clustering). We also evaluated how useful the published mini-barcodes are for species identification with the "best close match" algorithm. We find no significant difference in performance between full-length barcodes and mini-barcodes, for either species delimitation or identification, as long as the mini-barcodes are of moderate length (>200-bp). Only very short mini-barcodes (<200-bp) perform poorly, especially when they are located near the 5’ end of the Folmer region. Overall, congruence between morphospecies and mOTUs is ca. 80% for barcodes that are >200-bp. The congruent mOTUs contain ca. 75% of the specimens, and we estimate that most of the conflict is caused by ca. 10% of the specimens, which should be targeted for re-examination in order to resolve conflict efficiently.
Overall, barcode length (provided it is >200-bp) and the choice of species delimitation method have only minor effects on congruence. Our study suggests that large-scale species discovery, identification, and metabarcoding can utilize mini-barcodes without substantial loss of information compared to full-length barcodes. This is good news given that mini-barcodes can be obtained via cost-effective tagged amplicon sequencing using short-read sequencing platforms (Illumina: "NGS barcodes").
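As a rough illustration of how mini-barcodes can be cut in silico from a full-length barcode with a sliding window, consider the following minimal Python sketch. The function name `sliding_windows`, the 100-bp step size, and the 658-character toy sequence are assumptions made for this example only; the abstract specifies the window lengths (100-bp, 200-bp, 300-bp) but not the step size or any implementation details.

```python
def sliding_windows(seq, window, step):
    """Yield (start, fragment) pairs for every complete window in seq.

    `window` is the mini-barcode length and `step` is the shift between
    consecutive windows (hypothetical parameter; not given in the text).
    Windows that would run past the end of the sequence are skipped.
    """
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Toy stand-in for an aligned full-length barcode (assumed length: 658).
full_length = "ACGT" * 164 + "AC"

# The three window lengths named in the abstract.
for window in (100, 200, 300):
    fragments = list(sliding_windows(full_length, window, step=100))
    starts = [start for start, _ in fragments]
    print(f"{window}-bp windows start at positions {starts}")
```

Each fragment set could then be passed to the same species delimitation or identification procedure as the full-length barcodes, so that congruence can be compared across window lengths and positions.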
Footnotes
The revised version of the manuscript has a more concise title and includes a few text edits.