Abstract
Species identification using DNA sequences, known as DNA barcoding, has been widely used in many applied fields. Current barcoding methods are usually based on a single mitochondrial locus, such as cytochrome c oxidase subunit I (COI). This type of barcoding is not always effective when applied to species that are separated by short divergence times or that carry genes introgressed from closely related species. Herein we introduce a more effective multi-locus barcoding framework based on gene capture and “next-generation” sequencing, and provide both empirical and simulation tests of its efficacy. We examined genetic distinctness in two pairs of sister species of fishes, Siniperca chuatsi vs. S. kneri and Sicydium altum vs. S. adelum; COI barcoding failed to identify species in both pairs. Results revealed that distinctness between S. chuatsi and S. kneri increased as more independent loci were added. By contrast, S. altum and S. adelum could not be distinguished even with all loci. Analyses of population structure and gene flow suggested that the two species of Siniperca diverged from each other a long time ago but have unidirectional gene flow, whereas the two species of Sicydium are not separated from each other and have high bidirectional gene flow. Simulations demonstrate that under limited gene flow (< 0.00001 per gene per generation) and sufficient separation time (> 100,000 generations), species can be correctly identified using more than 90 loci. Finally, we selected 500 independent nuclear markers for ray-finned fishes and designed a three-step pipeline for multi-locus DNA barcoding.