bioRxiv
New Results

A chromosome level genome of Astyanax mexicanus surface fish for comparing population-specific genetic differences contributing to trait evolution

Wesley C. Warren, Tyler E. Boggs, Richard Borowsky, Brian M. Carlson, Estephany Ferrufino, Joshua B. Gross, LaDeana Hillier, Zhilian Hu, Alex C. Keene, Alexander Kenzior, Johanna E. Kowalko, Chad Tomlinson, Milinn Kremitzki, Madeleine E. Lemieux, Tina Graves-Lindsay, Suzanne E. McGaugh, Jeff T. Miller, Mathilda Mommersteeg, Rachel L. Moran, Robert Peuß, Edward Rice, Misty R. Riddle, Itzel Sifuentes-Romero, Bethany A. Stanhope, Clifford J. Tabin, Sunishka Thakur, Yamamoto Yoshiyuki, Nicolas Rohner
doi: https://doi.org/10.1101/2020.07.06.189654
Wesley C. Warren
1Department of Animal Sciences, Department of Surgery, Institute for Data Science and Informatics, University of Missouri, Bond Life Sciences Center, Columbia, MO
  • For correspondence: wwarren@genome.wustl.edu nro@stowers.org
Tyler E. Boggs
2Department of Biological Sciences, University of Cincinnati, Cincinnati, OH
Richard Borowsky
3Department of Biology, New York University, New York, NY
Brian M. Carlson
4Department of Biology, The College of Wooster, Wooster, OH
Estephany Ferrufino
5Harriet L. Wilkes Honors College, Florida Atlantic University, Jupiter FL
Joshua B. Gross
2Department of Biological Sciences, University of Cincinnati, Cincinnati, OH
LaDeana Hillier
6Department of Genome Sciences, University of Washington, Seattle, WA
Zhilian Hu
7Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
Alex C. Keene
8Department of Biological Sciences, Florida Atlantic University, Jupiter FL
Alexander Kenzior
9Stowers Institute for Medical Research, Kansas City, MO
Johanna E. Kowalko
5Harriet L. Wilkes Honors College, Florida Atlantic University, Jupiter FL
Chad Tomlinson
10McDonnell Genome Institute, Washington University, St Louis, MO
Milinn Kremitzki
10McDonnell Genome Institute, Washington University, St Louis, MO
Madeleine E. Lemieux
11Bioinfo, Plantagenet, ON K0B 1L0, Canada
Tina Graves-Lindsay
10McDonnell Genome Institute, Washington University, St Louis, MO
Suzanne E. McGaugh
12Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN
Jeff T. Miller
12Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN
Mathilda Mommersteeg
7Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
Rachel L. Moran
12Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN
Robert Peuß
9Stowers Institute for Medical Research, Kansas City, MO
Edward Rice
1Department of Animal Sciences, Department of Surgery, Institute for Data Science and Informatics, University of Missouri, Bond Life Sciences Center, Columbia, MO
Misty R. Riddle
13Genetics Department, Blavatnik Institute, Harvard Medical School, Boston, MA
Itzel Sifuentes-Romero
5Harriet L. Wilkes Honors College, Florida Atlantic University, Jupiter FL
Bethany A. Stanhope
5Harriet L. Wilkes Honors College, Florida Atlantic University, Jupiter FL
8Department of Biological Sciences, Florida Atlantic University, Jupiter FL
Clifford J. Tabin
13Genetics Department, Blavatnik Institute, Harvard Medical School, Boston, MA
Sunishka Thakur
5Harriet L. Wilkes Honors College, Florida Atlantic University, Jupiter FL
Yamamoto Yoshiyuki
14Department of Cell and Developmental Biology, University College London, London, United Kingdom
Nicolas Rohner
9Stowers Institute for Medical Research, Kansas City, MO
15Department of Molecular & Integrative Physiology, KU Medical Center, Kansas City, KS
  • For correspondence: wwarren@genome.wustl.edu nro@stowers.org

Abstract

Identifying the genetic factors that underlie complex traits is central to understanding the mechanistic underpinnings of evolution. In nature, adaptation to severe environmental change, such as encountered following colonization of caves, has dramatically altered genomes of species over varied time spans. Genomic sequencing approaches have identified mutations associated with troglomorphic trait evolution, but the functional impacts of these mutations remain poorly understood. The Mexican Tetra, Astyanax mexicanus, is abundant in the surface waters of northeastern Mexico, and also inhabits at least 30 different caves in the region. Cave-dwelling A. mexicanus morphs are well adapted to subterranean life and many populations appear to have evolved troglomorphic traits independently, while the surface-dwelling populations can be used as a proxy for the ancestral form. Here we present a high-resolution, chromosome-level surface fish genome, enabling the first genome-wide comparison between surface fish and cavefish populations. Using this resource, we performed quantitative trait locus (QTL) mapping analyses for pigmentation and eye size and found new candidate genes for eye loss such as dusp26. We used CRISPR gene editing in A. mexicanus to confirm the essential role of a gene within an eye size QTL, rx3, in eye formation. We also generated the first genome-wide evaluation of deletion variability that includes an analysis of the impact on protein-coding genes across cavefish populations to gain insight into this potential source of cave adaptation. The new surface fish genome reference now provides a more complete resource for comparative, functional, developmental and genetic studies of drastic trait differences within a species.

Main Text

Establishing the relationship between natural environmental factors and the genetic basis of trait evolution has been challenging. The ecological shift from surface to cave environments provides a tractable system to address this, as in this instance the polarity of evolutionary change is known. Across the globe, subterranean animals, including fish, salamanders, rodents, and myriads of invertebrate species have converged on reductions in metabolic rate, eye size, and pigmentation1–3. The robust phenotypic differences from surface relatives provide the opportunity to investigate the mechanistic underpinnings of evolution and to determine whether genotype-phenotype interactions are deeply conserved.

The Mexican cavefish, Astyanax mexicanus, has emerged as a powerful model to investigate complex trait evolution4. A. mexicanus comprises at least 30 cave-dwelling populations in the Sierra de El Abra, Sierra de Colmena and Sierra de Guatemala regions of northeastern Mexico, and surface-dwelling fish of the same species inhabit rivers and lakes throughout Mexico and southern Texas5. The ecology of surface and cave environments differs dramatically, allowing for functional and genomic comparisons between populations that have evolved in distinct environments. Dozens of evolved trait differences have been identified in cavefish, including changes in morphology, physiology, and behavior4, 6. Comparing cavefish to surface fish also reveals substantial differences in many traits of possible relevance to human disease, including sleep duration, circadian rhythmicity, anxiety, aggression, heart regeneration, eye and retina development, craniofacial structure, insulin resistance, appetite, and obesity7–17. Further, generation of fertile surface-cave hybrids in a laboratory setting has allowed for genetic mapping in these fish10, 18–25. Clear phenotypic differences, combined with the availability of genetic tools, position A. mexicanus as a natural model system for identifying the genetic basis of ecologically and evolutionarily relevant phenotypes17, 26, 27.

Recently, the genome of an individual from the Pachón cave population was sequenced28. While this work uncovered genomic intervals and candidate genes linked to cave traits, an important computational limitation was sequence fragmentation due to the use of short-read sequencing. In addition, the lack of a surface fish genome has prevented direct comparisons between surface and cavefish populations at the genomic level. Here, we address these two obstacles by presenting the first de novo genome assembly of the surface fish morph using long-read sequencing technology. This approach yielded a much more comprehensive genome that allows for direct genome-wide comparisons between surface fish and Pachón cavefish. As a proof-of-principle, we first confirmed known genetic mutations associated with pigmentation and eye loss, then discovered novel quantitative trait loci (QTL), and identified coding and deletion mutations that highlight putative contributions to cave trait biology.

Results

De novo assembly and curation of the surface fish genome

We set out to generate a robust reference genome for the surface morph of A. mexicanus from a single lab-reared female, descended from wild-caught individuals from known Mexico localities (Fig. 1a, Supplementary Figure 1). We sequenced and assembled the genome using Pacific Biosciences single molecule real time (SMRT) sequencing (∼73x genome coverage) and the wtdbg2 assembler29 to an ungapped size of 1.29 Gb. Initial scaffolding of assembled contigs was accomplished with the aid of an A. mexicanus surface fish physical map (BioNano) followed by manual assignment of 70% of the assembly scaffolds to 25 total chromosomes using the existing A. mexicanus genetic linkage map markers30. The final genome assembly, Astyanax mexicanus 2.0, comprises a total of 2,415 scaffolds (including single contig scaffolds) with N50 contig and scaffold lengths of 1.7 Mb and 35 Mb, respectively, which is comparable to other similarly sequenced and assembled teleost fishes (Supplementary Table 1). We were unable to assign 394 Mb of assembled sequence to chromosomes, mostly because 3.36% (2,235 markers total) of the genetic linkage markers did not align to the surface fish genome, because some contigs carried only a single marker so that their orientation could not be properly assigned, or because markers mapped to multiple places in the genome and thus could not be uniquely placed. The uniquely mapped markers exhibited few ordering discrepancies and significant synteny between the linkage map30 and the assembled scaffolds of the Astyanax mexicanus 2.0 genome, validating the order and orientation of a majority of the assembly (Fig. 1b; Supplementary Fig. 2). In Astyanax mexicanus 2.0, we assemble and identify 11% more total masked repeats than in the Astyanax mexicanus 1.0.2 assembly (Supplementary Table 1). Amongst assembled teleost genomes, A. mexicanus seems to be an intermediate with 41% masked interspersed repeats estimated using WindowMasker31, compared to Xiphophorus maculatus (27%) and Danio rerio (50%).
Possible mapping bias across cave population sequences to the cave versus surface fish genome references was also investigated by mapping population level resequencing reads to both genomes32. We found the number of unmapped reads is greater for all populations aligned to the Astyanax mexicanus 1.0.2 reference compared to Astyanax mexicanus 2.0 (Fig. 1c). Also, the percentage of properly paired reads, that is, pairs where both ends align to the same scaffold, is greater for all resequenced cave populations aligned to the Astyanax mexicanus 2.0 reference (Supplementary Fig. 3) and significantly more non-primary alignments with greater variation were observed (Supplementary Fig. 4). Both metrics indicate that the Astyanax mexicanus 2.0 reference has more resolved sequence regions than the Astyanax mexicanus 1.0.2 reference. Future application of phased assembly approaches will likely resolve a significantly higher proportion of chromosomal sequences in the Astyanax mexicanus genome33.
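
The N50 contig and scaffold lengths quoted above summarize assembly contiguity. As a minimal sketch of how that statistic is defined (not the pipeline used for this assembly), the Python function below returns the length L such that sequences of length at least L contain half or more of the assembled bases. The function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))  # -> 8
```

Applied to real contig and scaffold length lists, the same function would yield the two N50 values that are reported separately for contigs and scaffolds.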

Figure 1. Assembly metrics for the surface fish genome Astyanax mexicanus 2.0.

a) Adult Astyanax mexicanus surface fish. b) Substantial synteny is evident between a recombination-based linkage map30 and the draft surface fish genome. By mapping the relative positions of genotyping-by-sequencing (GBS) markers from a dense map constructed from a Pachón x surface fish F2 pedigree, we observed significant synteny based on 96.6% of our genotype markers. c) Proportion of unmapped reads for the same samples aligned to the cave (square points) and surface (triangle points) Astyanax reference genomes. Colors unite samples by population identity. Best-fit lines for each alignment (cave: blue, surface: yellow) have similar slopes, indicating that population identity has a similar effect on mapping rate to either genome.

Gene annotation

Two independent sets of protein-coding genes were generated using the NCBI34 and Ensembl35 automated pipelines with similar numbers of genes found by each: 25,293 and 26,698, respectively (Supplementary Table 2). Gene annotation was aided by the diversity of transcript data derived from whole adult fish, embryos, and 12 different tissues available from the NCBI short read archive. As a result, the total predicted protein-coding genes and transcripts (mRNA) were consistent with other annotated teleost species (Supplementary Table 2) and 1,665 new protein-coding genes were added compared to Astyanax mexicanus 1.0.2. Additionally, long non-coding gene (e.g. lncRNA) representation is significantly improved in the Astyanax mexicanus 2.0 reference compared to the Astyanax mexicanus 1.0.2 reference (5,314 vs 1,062), although targeted non-coding RNA sequencing will be required to achieve annotation comparable to zebrafish (Supplementary Table 2). We assessed completeness of gene annotation by applying BUSCO (benchmarking universal single-copy ortholog) scores, which measure gene completeness among vertebrate genes. We find 94.6% of genes are complete, 4% are missing and 4.2% are duplicated (Supplementary Table 3). In addition, using gene annotation metrics from NCBI's transcript aligner Splign (C++ Toolkit), same-species RefSeq or GenBank transcripts show 98.4% coverage when aligned to Astyanax mexicanus 2.0. In total, our measures of gene representation in the Astyanax mexicanus 2.0 reference show a high-quality resource for the study of A. mexicanus gene function.

Having a high-quality reference genome provides many benefits in exploiting Astyanax mexicanus as a model species. Among the more important uses, from the standpoint of utilizing the system to understand evolutionary mechanisms, are in mapping and identifying genes responsible for phenotypic change, and in unveiling genomic structural variation that provides a substrate for adaptive selection. Thus, to demonstrate the advantages Astyanax mexicanus 2.0 brings to the field, we have explored its use in each of these settings; in particular examining the genetic underpinnings of albinism and reduced eye size, and then looking at the contribution of genomic deletion to variation within Astyanax mexicanus populations.

First, with respect to the identification of evolutionarily important genes, we took advantage of a critical attribute of A. mexicanus, nearly unique among cave animals: the ability to interbreed cavefish and surface fish populations to generate fertile hybrids that can be intercrossed to perform QTL studies (for review see36, 37). The improved contig length of the surface fish genome and our syntenic analysis that unites the physical genome to a previously published linkage map based on genotyping-by-sequencing (GBS) markers30 should greatly aid the correct identification of genetic changes linked to cave traits. To this end, we demonstrate the power of the Astyanax mexicanus 2.0 reference in gaining deeper insight into an already mapped trait (albinism) and in carrying out new QTL analysis of eye reduction.

QTL analysis of albinism

Albinism has previously been mapped in QTL studies of both the Pachón and Molino populations20. This work was carried out before the existence of any reference genome for the species. However, these mapping studies showed that a single major QTL in each population colocalized with a candidate gene, oca2. Oca2 loss-of-function mutations are known to cause albinism in other organisms, including humans, mice and zebrafish, and deletions causing functional inactivation of Oca2 were identified in the albino fish from both the Molino and Pachón caves20. These compelling data indicating that oca2 mutations cause albinism in cavefish were subsequently confirmed by CRISPR-mediated mutagenesis in surface fish27. To build on these results, we first performed two separate de novo QTL analyses of surface/Pachón F2 hybrids (group A and B) in the context of the new reference genome (Fig. 2a). Both studies identified a single QTL for albinism, with a LOD score of 47.31 at the peak marker on linkage group 3 (69% variance explained) in group A, and with a LOD score of 22.56 at the peak marker on linkage group 21 (37.8% variance explained) in group B (Fig. 2c, d; 2g, h). At the peak QTL positions, F2 hybrids homozygous for the cave allele are albino (Fig. 2e, i). Mapping the markers associated with each of these linkage groups to the surface fish genome revealed that they are both located on surface fish chromosome 13 (Table 1, Fig. 2f, j). This demonstrates a significant improvement over mapping to the original cavefish genome: linkage group 3 from group A almost completely corresponds to surface fish chromosome 13 in Astyanax mexicanus 2.0, whereas it is split up into many contigs in Astyanax mexicanus 1.0.2 (Fig. 2f, j, Supplementary Fig. 5a).
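
The LOD scores and percentages of variance explained reported above are related quantities. One commonly used single-QTL approximation is 1 - 10^(-2 LOD / n) for n phenotyped individuals; using it here is an assumption, because this section does not state how the percentages were computed. The sketch below applies it with the group sizes given elsewhere in the text (n=188 for group A, n=219 for group B), and it is likewise assumed that every fish in each group was scored for albinism. The function name is hypothetical.

```python
def variance_explained_from_lod(lod, n_individuals):
    """Approximate fraction of phenotypic variance explained by one QTL.

    Uses the single-QTL relation 1 - 10 ** (-2 * LOD / n).
    Assumption: this is the relation behind the reported percentages.
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return 1.0 - 10.0 ** (-2.0 * lod / n_individuals)


# LOD scores are from the text; group sizes are assumed to equal the
# number of fish scored for albinism.
group_a = variance_explained_from_lod(47.31, 188)
group_b = variance_explained_from_lod(22.56, 219)
print(f"group A: {100 * group_a:.1f}%  group B: {100 * group_b:.1f}%")
```

Under these assumptions the approximation lands close to the reported 69% and 37.8%, which suggests the two sets of numbers are mutually consistent.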
The QTL identified on linkage group 21 in group B corresponds to a 2.8 Mb region on surface fish chromosome 13 as determined using the sequences flanking the 1.5-LOD support interval as input for Ensembl basic local alignment search tool (Table 1, Fig. 2j). In line with previous mapping studies, the gene oca2 is found within this region on surface fish chromosome 13. We further functionally verified that a change in the oca2 locus is responsible for albinism in the F2 hybrids by crossing an albino surface/Pachón F2 hybrid with a genetically engineered surface fish27 heterozygous for a deletion in oca2 exon 21. We found that 46.5% of the offspring completely lacked pigment (n=40/86, Fig. 2b). This confirms that a change in the oca2 locus is responsible for loss of pigment in the mapping population.
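
If the albino F2 parent is homozygous for a non-functional cave oca2 allele and the engineered surface fish is heterozygous, Mendelian segregation predicts that about half of the offspring are albino. The text reports 40 of 86 and no statistical test, so the exact two-sided binomial test below is purely illustrative: a minimal sketch, with a hypothetical function name, of how the observed count could be compared against the 1:1 expectation.

```python
from math import comb


def exact_binomial_two_sided(successes, trials, p_null=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null proportion p_null.
    """
    pmf = [
        comb(trials, k) * p_null**k * (1.0 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small tolerance so ties are not lost to floating-point rounding.
    total = sum(prob for prob in pmf if prob <= observed * (1.0 + 1e-9))
    return min(1.0, total)


# Counts from the text; the 0.5 null is the assumed Mendelian expectation.
albino, offspring = 40, 86
print(f"observed fraction: {albino / offspring:.3f}")
print(f"two-sided p-value: {exact_binomial_two_sided(albino, offspring):.2f}")
```

A large p-value here would simply mean that 40 of 86 is compatible with the 50% expectation for a failed complementation cross.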

Figure 2. Utilizing the surface fish genome for QTL mapping of albinism.

a) Representative image of F1 hybrid of surface/Pachón cross and albino F2 hybrid. b) Pigmented (top) and albino (bottom) progeny resulting from breeding a surface/Pachón albino F2 hybrid with a surface fish heterozygous for an engineered 4 base pair deletion in oca2 exon 21. c-j) QTL mapping of albinism in surface/Pachón mapping group A (c-f) and B (g-j). c, g) Results of genome wide LOD calculation for albinism using Haley-Knott regression and binary model (0 = pigmented, 1 = albino). Significance threshold of 5% (black dotted line) determined by calculating the 95th percentile of genome-wide maximum penalized LOD score using 1000 random permutations. d, h) LOD score for each marker on the linkage group with the peak marker. e, i) Plot highlighting the effect of the indicated genotype at the peak marker (S = surface allele, C = cave allele). f, j) Relative position of the markers that define the 1.5 LOD support interval on chromosome 13 of the surface fish genome.

Table 1.

QTL mapping of albinism and eye size in two groups of surface/Pachón F2 hybrids. Table comparing QTL marker positions on linkage maps, Pachón genome scaffolds, and surface fish genome chromosomes (1 Astyanax mexicanus 1.0.2; 2 Astyanax mexicanus 2.0).

New insights on albinism locus

While up until this point our reanalysis of albinism was largely confirmatory, an advantage of using the surface fish genome as a reference, compared to the previously available cavefish genome, is the ability to make comparisons between regions that are deleted in cavefish and as such are not available for sequence alignment. We utilized the A. mexicanus 2.0 reference genome to align sequencing data obtained from wild-caught fish from different surface and cave localities32 and analyzed the oca2 locus (Supplementary Figs. 6, 7). We found that 9 out of 10 Pachón cavefish carry the deletion in exon 24 that was previously reported in laboratory-raised fish20 (Supplementary Fig. 6). None of the individuals from other populations for which sequence information was available carried the same deletion. Consistent with the laboratory strains, exon 21 of oca2 is absent in all Molino cavefish samples (n=9, Supplementary Fig. 7). None of the other sequenced population samples harbor the same deletion of exon 21; however, we found smaller heterozygous or homozygous deletions in exon 21 in some wild samples of Pachón (n=5/9), Río Choy (n=1/9), and Tinaja (n=5/9) fish (Supplementary Fig. 7). In summary, we were able to detect deletions in oca2 that would not have been discovered with alignments to Astyanax mexicanus 1.0.2, since exon 24 is missing in Pachón and thus no reference-based alignments were produced in that region.

Surface fish genome reveals new candidate genes from prior QTL studies

A prominent feature of cavefish is the absence of external eyes. While a number of eye size QTL have been identified18, 19, there is an incomplete understanding of the genetic basis of eye regression. We uncovered additional information by remapping previously published QTL18–24 using the Astyanax mexicanus 2.0 reference. A total of 1,060 out of 1,124 markers (94.3%) mapped successfully with BLAST and were included in our surface fish QTL database. There were 77 markers that did not map to the cavefish genome28 but did map to the surface fish genome, 52 markers that mapped to the cavefish but not the Astyanax mexicanus 2.0 reference, and 12 markers that did not map to either reference. The improved contiguity of the chromosome-level surface fish assembly allowed us to identify several additional candidate genes associated with QTL markers that were not previously identified in the more fragmented cavefish genome (Supplementary Data 1). For example, the markers Am205D and Am208E mapped to an approximately 1 Mb region of chromosome 6 of the surface fish genome (46,516,926–47,425,713 bp) but did not map to the Astyanax mexicanus 1.0.2 reference. This region is associated with feeding angle21, eye size, vibration attraction behavior (VAB), suborbital neuromasts23, and maxillary tooth number19. Multiple strong candidate genes involved in eye development or morphology are contained in this region, including rhodopsin, mib2, and ubiad1, as well as GABA A receptor delta, which is associated with a variety of behaviors and could conceivably be involved in VAB. Notably, the scaffold containing these four genes in Astyanax mexicanus 1.0.2 (KB871939.1) was not linked to this QTL, demonstrating the utility of increased contiguity of Astyanax mexicanus 2.0.
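
The marker counts in the paragraph above can be cross-checked with simple arithmetic. The sketch below assumes the four categories (mapped to both references, surface only, cavefish only, neither) are disjoint and exhaustive; the number mapped to both and the cavefish total are derived here and are not stated in the text. The function name is hypothetical.

```python
def tally_marker_mapping(total, surface_only, cave_only, neither):
    """Derive per-reference mapping totals from disjoint marker categories."""
    both = total - surface_only - cave_only - neither
    if both < 0:
        raise ValueError("category counts exceed the total")
    mapped_surface = both + surface_only
    mapped_cave = both + cave_only
    return {
        "both": both,
        "mapped_surface": mapped_surface,
        "mapped_cave": mapped_cave,
        "surface_rate": mapped_surface / total,
    }


# Counts taken from the text.
tally = tally_marker_mapping(total=1124, surface_only=77, cave_only=52, neither=12)
print(tally["mapped_surface"])                 # -> 1060
print(round(100 * tally["surface_rate"], 1))   # -> 94.3
```

The derived surface total matches the 1,060 markers (94.3%) reported, so the category counts are internally consistent.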

Previous studies suggested that retinal homeobox gene 3 (rx3) lies within the QTL for the outer plexiform layer of the eye24. Another QTL for eye size, size of the third suborbital bone, and body condition18, 19 may also contain rx3 (the low marker density and low power of older studies result in a broad QTL critical region in this area). The increased contiguity of Astyanax mexicanus 2.0 revealed that rx3 is within the region encompassed by this QTL, whereas in Astyanax mexicanus 1.0.2 the marker for this QTL (Am55A) and rx3 were located on separate scaffolds; thus, we could not appreciate that this QTL and key gene for eye development were in relatively close genomic proximity. While no amino acid coding changes are apparent between cavefish and surface fish, expression of rx3 is reduced in Pachón cavefish relative to surface fish28. In zebrafish, rx3 is expressed in the eye field of the anterior neural plate during gastrulation and has an essential role in fate specification between eye and telencephalon38, 39. We compared rx3 expression at the end of gastrulation and confirmed that Pachón embryos have a reduced expression domain size (Fig. 3a, b). The expression area is significantly smaller in Pachón embryos compared to stage-matched surface fish embryos. The expression of rx3 is restored in F1 hybrids between cavefish and surface fish, indicating recessive inheritance in cavefish (Fig. 3a, b). To test for a putative role of rx3 in eye development in Astyanax mexicanus, we used CRISPR/Cas9 to mutate this gene in surface fish (Fig. 3c) and assessed injected CRISPant fish for eye phenotypes. Wild-type surface fish have large eyes (Fig. 3d). In contrast, externally visible eyes are completely absent in adult CRISPant surface fish (n=5, Fig. 3d). This is consistent with work from other species, in which mutations in rx3 (fish) or Rx (mice) result in a complete lack of eyes40–42.
Together these data suggest that the role of rx3 in eye development is conserved in A. mexicanus. Further, they support the hypothesis that regulatory changes in this gene may contribute to eye loss in cavefish through specification of a smaller eye field, and subsequently, production of a smaller eye.

Figure 3. Rx3 analysis.

a) In situ hybridizations for rx3 (arrows) and pax2 mRNA at tailbud stage on surface fish, Pachón cavefish, and F1 hybrid between surface fish male and Pachón cavefish female. Scale bar: 200 µm. b) Comparison of expression area of rx3 at tailbud stage on surface fish (n=17), Pachón (n=14), and F1 hybrid (n=11). Significance shown for ANOVA. p < 0.005 is denoted by * and ns means not significant. c) Diagram of rx3 gene. Boxes indicate exons and lines indicate introns. The empty boxes are UTR and the filled boxes are coding sequence (gene build from NCBI). A gRNA was designed targeting exon 2. The gRNA target site is in blue and the PAM site is in red. The arrow indicates the predicted Cas9 cut site. d) Left: Adult WT control (uninjected sibling) surface fish. Right: Adult rx3 CRISPR/Cas9-injected surface fish lacking eyes (CRISPant).

In addition to the compiled database of older QTL studies, a number of genomic intervals associated with previously described locomotor activity differences between cavefish and surface fish25 were also re-screened using the surface-anchored locations of markers from the high-density linkage map30. Within this Pachón/surface QTL map we confirmed the presence of 20 previously reported candidate genes, and identified 96 additional genes with relevant GO terms, including rx3, further demonstrating the power and utility of this new genomic resource (Supplementary Table 4). The new candidates include additional opsins (opn7a, opn8a, opn8b, tmtopsa, and two putative green-sensitive opsins), as well as several genes contributing to circadian rhythmicity (id2b, nfil3-5, cipcb, clocka, and npas2). While analyses of expression data and sequence variation are necessary to determine which of these candidates exhibit meaningful differences between morphs, the presence of clocka and npas2 in these intervals is of particular note, as the original analysis conducted using Astyanax mexicanus 1.0.2 did not provide any evidence of a potential role for members of the core circadian clockwork in mediating observed differences in locomotor activity patterns between Pachón and surface fish15, 25.

Genetic mapping with surface fish genome reveals new candidate genes for eye regression

Finally, to gain new insight into the eye reduction phenotype, we used the Astyanax mexicanus 2.0 reference to genetically map eye size de novo in the two surface/Pachón F2 groups that we used to map albinism. In surface/Pachón F2 mapping population A (n=188) we identified multiple QTL for normalized eye perimeter that were spread across four linkage groups (Fig. 4a). The QTL on linkage group 1 is significant above a threshold of p < 0.01 (Fig. 4b). The surface allele at the peak marker appears to be dominant since eye size in the heterozygous state is similar to the homozygous state (Fig. 4c). Mapping linkage group 1 to the Astyanax mexicanus 1.0.2 genome assembly results in markers under the QTL peak spread across different contigs, whereas these map to chromosome 3 in the new surface fish assembly, emphasizing how the improved quality of the surface fish genome allows the identification of candidate genes throughout the QTL region (Fig. 4d, Supplementary Fig. 5b, Table 1). This region has been identified previously18, 28 and contains genes such as shisa2a and shisa2b, as well as eya1. Analysis of the region between the markers with the highest LOD scores (3:9301997-9505868), however, revealed one gene that has not been linked to eye loss in A. mexicanus before, ENSAMXG00000005961 (Fig. 4d). Alignment of this novel gene sequence showed homology to orofacial cleft 1 (ofcc1), also called ojoplano (opo). In medaka, opo has been shown to be involved in eye development. Knockout of opo affects the morphogenesis of several epithelial tissues, including impaired optic cup folding, which results in abnormal morphology of both the lens and neural retina in the embryos43. Sequence comparisons of this gene in Pachón cavefish and surface fish revealed several coding changes; however, none affects evolutionarily conserved residues. Future studies are needed to investigate a putative role of opo in the eye loss of Pachón cavefish.

Figure 4. Utilizing the surface fish genome for QTL mapping of eye size in surface/Pachón hybrids.

Results from group A (a-d), results from group B (e-h). a, e) Genome wide LOD calculation for eye size using Haley-Knott regression and normal model. Significance threshold of 5% (black dotted line) determined by calculating the 95th percentile of genome-wide maximum penalized LOD score using 1000 random permutations. b, f) LOD score for each marker on the linkage group with the peak marker. c, g) Mean phenotype of the indicated genotype at the peak marker (S = surface allele, C = cave allele). d) Relative position of the markers that define the 1.5 LOD support interval for group A on the surface fish chromosome 3. h) Relative position of the markers that define the 1.5 LOD support interval on chromosome 20 of the surface fish genome.

In the surface/Pachón F2 mapping population B (n=219) we identified a single QTL for normalized left eye diameter on linkage group 13 with a LOD score of 11.98 at the peak marker that explains 24.3% of the variance in this trait (Fig. 4e, f). F2 hybrids homozygous for the cave allele at this position have the smallest eye size, and the heterozygous state is intermediate (Fig. 4g). The values for left and right eye diameter mapped to the same region and, notably, we obtained the same peak when including eyeless fish in the map and coding eye phenotype as a binary trait (i.e. eyed, eyeless). The QTL on linkage group 13 corresponds to a 633 kb region on surface fish chromosome 13, as determined using the sequences flanking the 1.5-LOD support interval as input for the Ensembl basic local alignment search tool (Fig. 4h, Table 1). There are 22 genes in this region (Supplementary Table 5). Of note, none of the previously mapped eye size QTL in Pachón cavefish map to the same region44. A promising candidate gene in this interval is dusp26. Morpholino knock-down of dusp26 in zebrafish results in small eyes with defective retina development and a less developed lens during embryogenesis45. We used whole genome sequencing data32 to compare the dusp26 coding region between surface, Tinaja, Pachón and Molino and found no coding changes. However, previously published embryonic transcriptome data indicate that expression of dusp26 is reduced in Pachón cavefish at 36 hpf (p<0.05) and 72 hpf (p<0.01) (Supplementary Fig. 8)46. These data suggest a potential role for dusp26 in eye degeneration in cavefish; however, we cannot exclude critical contributions of other genes or genomic regions in the identified interval.
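As a rough guide to how a LOD score relates to the proportion of variance explained, the relationship commonly used for a single-QTL normal model is 1 - 10^(-2·LOD/n). The sketch below is illustrative only: the function names are hypothetical, and the number of fish contributing to the scan after excluding eyeless individuals is not restated here, so the sample size in the usage note is an assumed value rather than the one used in the analysis.

```python
import math


def variance_explained_from_lod(lod: float, n: int) -> float:
    """Proportion of phenotypic variance explained by one QTL.

    Uses the single-QTL relationship PVE = 1 - 10^(-2 * LOD / n),
    where n is the number of phenotyped individuals in the scan.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of individuals")
    return 1.0 - 10.0 ** (-2.0 * lod / n)


def lod_from_variance_explained(pve: float, n: int) -> float:
    """Inverse of the relationship above: LOD = -(n / 2) * log10(1 - PVE)."""
    if not 0.0 <= pve < 1.0:
        raise ValueError("pve must lie in [0, 1)")
    return -(n / 2.0) * math.log10(1.0 - pve)
```

For example, `variance_explained_from_lod(11.98, 200)` uses an assumed n of 200 and gives a value in the same range as the variance reported for this QTL; the exact figure depends on the true sample size.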

Structural variation in cave populations

Another important advantage of having a robust reference genome is that it allows one to interrogate the genomic structural variation at a population level. Knowledge of population-specific A. mexicanus structural sequence variation is lacking.

Therefore, we aligned the population samples from Herman et al.32 against Astyanax mexicanus 2.0 to ascertain the comparative state of deletions. We used the SV callers Manta47 and LUMPY48 to count the numbers of deletions present in each sample compared to Astyanax mexicanus 2.0 (Supplementary Fig. 9). While LUMPY tended to call a larger number of short deletions and Manta a smaller number of long deletions, there was high correlation (R2=0.78) between the number of calls each made per sample (Supplementary Fig. 10), so we used the intersection of deletions called by both callers for further analysis. We then classified these deletions based on their effect (i.e. deletions of coding, intronic, regulatory, or intergenic sequence) (Supplementary Fig. 11). Among the cavefish populations measured for deletion events, 412 genes contained deletions with an allele frequency >5%. We found that the Molino population has the fewest heterozygous deletions, while Río Choy surface fish have the fewest homozygous deletions, mirroring the heterozygosity of single nucleotide polymorphisms32 (Supplementary Fig. 11). In addition, the Tinaja population showed the most individual variability in either allelic state (standard deviation of 427). Pachón and Tinaja contained the highest number of protein-coding genes altered by a deletion in at least one haplotype, while Río Choy had the fewest (Fig. 5a). In two examples, per3 and ephx2, we find that the deletions that presumably altered protein-coding gene function varied in population representation, number of bases affected, and haplotype state (Fig. 5b, c). Of the 412 genes that contained deletions, 109 have assigned gene ontology in cavefish (Supplementary Table 6). We tested these 109 genes for canonical pathway enrichment using WebGestalt49 and found genes significantly enriched (p<0.05) for AMPK and MAPK signaling, as well as metabolic and circadian clock function (Table 2).
In addition, some genes were linked to diseases consistent with cavefish phenotypes, including ephx2, which is linked to familial hypercholesterolemia, and hnf4a, which is linked to non-insulin dependent diabetes mellitus, based on the gene ontology of OMIM and DisGeNET50. Notably, these enrichment analyses recovered the deletion known to be the main cause of pigment degeneration in cavefish, oca227. Further validation of these disease phenotype or pathway inferences is warranted.
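The enrichment statistics reported here rest on a hypergeometric test followed by Benjamini and Hochberg adjustment (see Methods). The sketch below shows the arithmetic behind those two steps using only the Python standard library. It is not the WebGestalt implementation; the function names are hypothetical and any gene counts passed to it would be the user's own.

```python
from math import comb
from typing import List


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when `draws` genes are sampled without replacement from
    `population` reference genes, of which `successes` belong to the pathway."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A pathway would then be called enriched when its adjusted p-value falls below the chosen significance level (0.05 in this study).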

Figure 5. Sequence deletion events characterized by population origin.

a) Total deletion count per population for protein-coding exons, intron sequence, and regulatory sequence, the latter defined as 1,000 bp upstream or downstream of annotated protein-coding genes. b) Gene sequence deletion coordinates for the circadian rhythm gene per3 and the cholesterol regulatory gene ephx2 by population. c) The allelic state of each deletion distributed by individual sample per population.

Table 2.

Protein-coding genes altered by deletion events and their enrichment among canonical pathways and disease. Red text denotes significant tests for disease enrichment (see methods).

Coding mutations affecting hypocretin signaling

We next used the surface fish genome to compare the amino acid composition of key genes hypothesized to be involved in cave-specific adaptations. The wake-promoting hypothalamic neuropeptide Hypocretin is a critical regulator of sleep in animals ranging from zebrafish to mammals51–53. We previously found that expression of hypocretin was elevated in Pachón cavefish compared to surface fish and that pharmacological inhibitors of Hypocretin signaling restore sleep to cavefish, suggesting that enhanced Hypocretin signaling underlies the evolution of sleep loss in cavefish9. In teleost fish, Hypocretin signals through a single receptor, the Hypocretin Receptor 2 (Hcrtr2). We compared the sequence of hcrtr2 in surface fish and Pachón cavefish and identified two missense mutations that result in protein coding changes, S293V and E295K (Fig. 6a). To examine whether these were specific to Pachón cavefish or shared with other cavefish populations, we examined the hcrtr2 coding sequence in Tinaja and Molino cavefish. The mutation affecting amino acid 295 is shared between Pachón and Tinaja populations (Fig. 6a). Further, we identified a six base pair deletion that results in the loss of two amino acids (amino acids 140 and 141; SV) in Molino cavefish. The presence of these variants was validated by PCR and subsequent Sanger sequencing on DNA from individuals from laboratory populations of these fish. The E295K variant in Tinaja and Pachón, as well as the two-amino-acid deletion in Molino, affect evolutionarily conserved amino acids, suggesting a potential impact on protein function (Fig. 6b). We performed an in silico analysis to test whether the identified cavefish variants in hcrtr2 are predicted to affect the protein structure and stability of Hcrtr2. We used iStable54 to test for potential destabilization effects of the substitution mutations found in Tinaja (E295K) and Pachón (S293V and E295K).
iStable predicted a destabilization effect of E295K on the protein structure with a confidence score of 0.842. However, we found a stabilization effect of the S293V mutation in Pachón using iStable with a confidence score of 0.772. We repeated this analysis using MUpro55 and obtained similar results. These findings raise the possibility that the evolved changes differentially affect the function of the same receptor. Structural changes in proteins upon amino acid deletion are difficult to predict with common tools such as iStable and MUpro. To analyze whether the deletion of S140 and V141 in the Molino hcrtr2 could potentially influence the structural integrity of Hcrtr2, we performed a different analysis. We used the SWISS-MODEL protein structure prediction tool56 to identify potential differences in the protein structure between surface and Molino Hcrtr2. We modeled the surface and Molino Hcrtr2 protein using crystal data from the human HCRTR2 (5wqc;57). We then used the VMD visualization software to overlay the surface fish and Molino predicted structures. This analysis indicates that the deletion of S140 and V141 disrupts the structural integrity of an alpha helical structure in the transmembrane region of Hcrtr2 that could potentially affect the stability of this receptor (Fig. 6c). We also tested the Tinaja and Pachón hcrtr2 sequences using a similar approach and found only minor differences between the respective cavefish Hcrtr2 structure when overlaid with the surface fish structure (Fig. 6c). To confirm these potential structural changes in cavefish Hcrtr2, advanced in situ and in vivo protein analyses need to be performed in future studies; however, the identification of coding mutations in three different populations of cavefish supports the notion that hypocretin signaling is under selection at the level of the receptor.
This is in line with population sequencing data32, which identified hcrtr2 in the top 5% of FST outliers between surface fish (Rascon) and Pachón cavefish.

Figure 6. Hcrtr2 mutations in cavefish.

(a) Position of mutations in hcrtr2 in the different cavefish populations. (b) Sequence alignment showing that the mutations in Molino and Tinaja affect evolutionarily conserved amino acids. (c) 3D model based on the structure of human HCRTR2. The structure is displayed as a cartoon representation showing alpha-helices and beta sheets using VMD 1.9.3. A model of Hcrtr2 from surface fish and cavefish was rendered using SWISS-MODEL (https://swissmodel.expasy.org/) and X-ray crystal data from the RCSB Protein Data Bank: 5wqc. Purple arrows indicate the site of mutation and yellow arrows indicate sites where the secondary structure differs between surface fish and the respective cavefish. In the Molino projection the deletion (S140 and V141) in the Molino sequence causes a break in one of the alpha helix structures of the transmembrane domain (highlighted in purple box). These images were made with VMD. VMD is developed with NIH support by the Theoretical and Computational Biophysics group at the Beckman Institute, University of Illinois at Urbana-Champaign.

Discussion

The genomic and phenotypic surface-to-cave transitions in A. mexicanus serve as an indispensable model for the study of natural polygenic trait adaptation. Here, we present a high-quality “chromonome” of the surface form of A. mexicanus. Our surface fish genome far surpasses the assembled cavefish version of A. mexicanus28 in contiguity and completeness, owing to the use of updated long-read and mapping technology. This allowed for substantial increases in the detected sequence variation associated with cave phenotypes. A substantial level of heterozygosity within surface fish populations, coupled with our use of two surface populations for de novo assembly, however, prevented more complete sequence connectivity as compared to similarly assembled fish genomes such as Xiphophorus maculatus58, a laboratory-reared lineage (Supplementary Table 1). Nevertheless, a 40-fold reduction in assembled contigs (an indirect measure of gaps), a 71-fold improvement in N50 contig length, high-level synteny with prior linkage map measures of A. mexicanus chromosome order, and the addition of 1,665 protein-coding genes all offer researchers a resource for more complete genetic experimentation. In this study, we explored cavefish genomes of diverse origins relative to this new genomic proxy for the ancestral (surface) state. This yielded numerous discoveries when determining evolutionarily derived sequence features that may be driving cave phenotypes.

We have now refined our knowledge of the genetic basis of troglomorphic traits by identifying candidate genes within new and previously identified QTL. Trait-associated sequence markers, when placed on more contiguous chromosomes of the surface fish genome, delineate sequence structure, especially gene regulatory elements. As proof of this added discovery potential, we identified new candidate genes associated with eye loss relying on a single-QTL model. Previous studies using multi-QTL models have identified eight QTL associated with eye size in Pachón, showing that eye degeneration likely involves a number of genes19. Part of the single eye size QTL we mapped in one of our Pachón/surface populations had been previously identified, but our new analysis shifted the LOD peak to just outside the previously identified region, revealing a new candidate gene for eye loss, opo. Genetic mapping in a second surface/Pachón hybrid population revealed a new QTL for eye size. A gene within this region, dusp26, has lower expression in Pachón cavefish during critical times of eye development. New studies aimed at identifying regulatory changes linked to these candidates will further clarify the role of these genes in the evolution of eye size reduction.

Structural variation, such as genomic deletions, represents a significant source of standing variation for trait adaptation59, which we were unable to accurately measure previously due to numerous assembly gaps in the Astyanax mexicanus 1.0.2 genome assembly. A species’ use of structural variation, including deletions, insertions, inversions and other more complex events, can enable trait evolution, but the occurrence and importance of such events in phenotypic adaptation among wild populations are understudied. A small number of studies in teleost fish show that the standing pool of structural variations (SVs) outnumbers single nucleotide variations (SNVs) and that SVs are likely in use to alter the genotype to phenotype continuum of natural selection60, 61. For example, rampant copy number variation occurs during population differentiation in stickleback59. We conducted the first study of cavefish SVs, albeit only identifying deletions. We identified SVs possibly contributing to observed troglomorphic phenotypes, discovered previously unidentified population-unique deletions, and broadly classified their putative impact through gene function inference to zebrafish. Of the deletion-altered protein-coding genes where biological inference is possible, we find low and high frequency deletions in per3 and ephx2, respectively (Fig. 5b, c). Some of these gene disruptions fit expected cave phenotypes while others are of unknown significance (e.g. sms, a gene involved in beta-alanine metabolism). per3 plays a role in vertebrate circadian regulation and corticogenesis62, and sleep and circadian patterns are significantly disrupted in cavefish9. In all sampled Molino fish, per3 harbors a heterozygous deletion (Fig. 5b, c).

In addition, we identify multiple evolved coding changes within hcrtr2, a receptor that regulates sleep across vertebrate species63. This study uncovered unique changes in cave populations at the hcrtr2 locus. Future genome-wide analysis of G protein-coupled receptor (GPCR) differences between surface fish and cavefish populations, combined with functional studies using allele swapping between cavefish and surface fish, will further elucidate the relationship between GPCR function and trait evolution.

Our discovery of novel cave-specific features of several genes, and of many others not explored here, highlights insights that were obscured when using the Astyanax mexicanus 1.0.2 genome assembly as a reference. The first cave population-wide catalog of thousands of deletions that could potentially impact cave phenotypes awaits investigation and validation. Future assembly improvements to both the cave and surface forms of the A. mexicanus genome are expected and will be key in understanding the population-genetic processes that enabled cave colonization. The shared gene synteny of A. mexicanus with zebrafish, the gene editing success demonstrated in this study, and the interbreeding capability of the cave and surface forms promise to reveal not only genic contributions, but likely also regulatory regions associated with the molecular origins of troglomorphic trait differences.

Methods section

Genome sequencing and assembly curation

DNA sequencing

High molecular weight DNA was isolated from a single female surface fish using the MagAttract kit (Qiagen) according to the manufacturer’s protocol. The sample used for the assembly was a female fish (Asty152) from a lab reared cross of wild-caught Astyanax mexicanus surface populations from the Río Sabinas and the Río Valles surface localities (Supplementary Fig. 1a). Single molecule real-time (SMRT) sequencing was completed on a PacBio RSII instrument, yielding an average read length of ∼12 kb. SMRT sequence coverage of >50-fold was generated based on an estimated genome size of 1.3 Gb. All SMRT sequences are available under NCBI BioProject number PRJNA533584.

Assembly and error correction

For de novo assembly of all SMRT sequences, we used a fuzzy de Bruijn graph algorithm, wtdbg29 for contig graph construction, followed by collective raw read alignment for assembly base error correction. All error-prone reads were first used to generate an assembly graph with genomic k-mers unique to each read that results in primary contigs. Assembled primary contigs were corrected for random base error, predominately indels, using all raw reads mapped with minimap264. To further reduce consensus assembly base error, we corrected homozygous insertion, deletion and single base differences using default parameter settings in Pilon65 with ∼60x coverage of an Illumina PCR-free library (150bp read length) derived from Asty152 DNA.

Assembly scaffolding

To scaffold de novo assembled contigs, we generated a BioNano Irys restriction map of another surface fish (Asty168) that allowed sequence contigs to be ordered and oriented, and potential mis-assemblies to be identified. Asty168 is the offspring of two surface fish, one from the Río Valles (Asty02) and the other from the Río Sabinas locality (Asty04); both Asty02 and Asty04 were the offspring of wild-caught fish (Supplementary Fig. 1b). We prepared HMW-DNA in agar plugs using a previously established protocol for soft tissues66. Briefly, we followed a series of enzymatic reactions that (1) lysed cells, (2) degraded protein and RNA, and (3) added fluorescent labels to nicked sites using the IrysPrep Reagent Kit. The nicked DNA fragments were labeled with Alexa Fluor 546 dye, and the DNA molecules were counter-stained with YOYO-1 dye. The labeled DNA fragments were electrophoretically elongated and sized on a single IrysChip, and subsequent imaging and data processing determined the size of each DNA fragment. Finally, a BioNano proprietary algorithm performed a de novo assembly of all labeled fragments >150 kbp into a whole-genome optical map with defined overlap patterns. The individual map was clustered and scored for pairwise similarity, and Euclidean distance matrices were built. Manual refinements were then performed as previously described66.

Chromosome builds

Upon chimeric contig correction and completion of scaffold assembly steps using the BioNano map, we used Chromonomer67 to align all possible scaffolds to the A. mexicanus high-density linkage map30, then assigned chromosome coordinates. Using default parameter settings, Chromonomer attempts to find the best set of non-conflicting markers that maximizes the number of scaffolds in the map, while minimizing ordering discrepancies. The output is a FASTA-format file describing the location of scaffolds by chromosome: a “chromonome”.

Defining syntenic regions between cave and surface genomes

A total of 2,235 genotyping-by-sequencing (GBS) markers30 were mapped to both the cave (Astyanax mexicanus 1.0.2) and surface fish (Astyanax mexicanus 2.0) assemblies (Supplementary Fig. 2). These GBS markers were mapped to Astyanax mexicanus 1.0.2 using the Ensembl ‘BLAST/BLAT search’ web tool, and the resulting information from each individual query was transcribed into an Excel worksheet. To map to the surface fish genome, the NCBI ‘Magic-BLAST’ (version 1.3.0) command line mapping tool68 was used. The resulting output of the mapping was a single, tabular formatted spreadsheet, which was used to visualize syntenic regions from the constructed linkage map30. We used Circos software to visualize the positions of all markers that mapped to the Astyanax mexicanus 2.0 genome69. Any chromonome errors detected through these synteny alignments were investigated and manually corrected if orthologous data were available, such as Astyanax mexicanus 1.0.2 scaffold alignment.

Gene Annotation

The Astyanax mexicanus 2.0 assembly was annotated using the previously described NCBI70 and Ensembl35 pipelines, including masking of repeats prior to ab initio gene predictions, and RNA-seq evidence-supported gene model building. NCBI and Ensembl gene annotation relied on an extensive variety of publicly available RNA-seq data from both cave and surface fish tissues to improve gene model accuracy. The Astyanax mexicanus 2.0 RefSeq and Ensembl release 98 gene annotation reports each provide a full accounting of all methodology deployed and their output metrics within each respective browser.

Assaying genome quality using population genomic samples

To understand the impact that reference sequence bias and quality may have on downstream population genomic analyses for the cavefish and surface fish genomes, we utilized the population genomic re-sequenced individuals processed in Herman et al.32. In brief, 100 bp sequences were aligned to the reference genomes Astyanax mexicanus 1.0.2 and Astyanax mexicanus 2.0. The NCBI annotation pipeline of both assemblies included WindowMasker and RepeatMasker steps to delineate and exclude repetitive regions from gene model annotation. The positional coordinates for repeats identified by RepeatMasker are provided in the BED format at NCBI for each genome. WindowMasker’s ‘nmer’ files (counts) were used to re-generate repetitive region BED coordinates31. The BED coordinates for both maskers were intersected with BEDTools v2.27.171. We used SAMTools 1.9 with these coordinates and alignment quality scores to filter alignments and generate summary statistics for each sample aligned to the cave and surface reference genomes72. Summary plots were generated in R 3.6.
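To make the interval-intersection step concrete, the following minimal sketch intersects two sets of repeat intervals on a single sequence. It only illustrates the operation that BEDTools performs; it does not reproduce the BEDTools command line used here, and it assumes half-open, sorted, non-overlapping intervals. The function name and example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based and half-open as in BED


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval lists.

    Both inputs must be sorted by start and internally non-overlapping.
    """
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result
```

Given hypothetical RepeatMasker intervals `[(0, 100), (200, 300)]` and a WindowMasker interval `[(50, 250)]`, the shared repeat regions are the two sub-intervals where the inputs overlap.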

Genetic mapping in surface x Pachón crosses

To map albinism and eye size to the new surface genome and test robustness in this methodology across laboratories, two independent F2 mapping populations consisting of surface/Pachón hybrids were analyzed. First, we scored 188 surface/Pachón F2 hybrids for albinism and normalized eye perimeter that were used in a previously published genetic mapping study10. Phenotypes were assessed using macroscopic images of entire fish and measurements were obtained using ImageJ73. Normalized eye perimeter was determined by dividing eye perimeter by body length. Albinism was scored as absence of body and eye pigment.

The 25 linkage groups (LGs) constructed de novo in earlier studies of this population10 were scanned using the R (v.3.5.3) package R/qtl (v.1.44-9)74 using the scanone function for markers linked with albinism (binary model) or left eye perimeter relative to body length (normal model). The genome-wide LOD significance threshold was set at the 95th percentile of 1,000 permutations. All marker sequences were aligned to both the Astyanax mexicanus 1.0.2 and Astyanax mexicanus 2.0 references using Bowtie (v.2.2.6 in sensitive mode)75. Circos plots were generated using v.0.69-669.

The second F2 mapping population (n=219) consisted of three clutches produced from breeding paired F1 surface/Pachón hybrid siblings76. Albinism and eye size were assessed using macroscopic images of entire fish. Eye diameter and fish length were measured in ImageJ73 according to Hinaux et al.77. Fish lacking an eye (21/195 on the left side, 22/172 on the right side) were not included in the analysis of eye size in order to analyze eye size using a normal distribution model. We found that fish length was positively correlated with eye diameter. To eliminate the effect of fish length on potential QTL, we analyzed eye diameter normalized to standard length. We used R/qtl74 to scan the linkage groups (scanone function) for markers linked with albinism (binary model) or eye diameter relative to body length (normal model) and assessed statistical significance of the LOD scores by calculating the 95th percentile of the genome-wide maximum penalized LOD score using 1000 random permutations. We estimated confidence intervals for the QTL using the 1.5-LOD support interval (lodint function).
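The permutation threshold and support interval described above can be sketched in a few lines. This is not R/qtl: it is a simplified Python illustration that computes an unpenalized LOD at each marker from a one-way analysis of variance on genotype class (a stand-in for Haley-Knott regression, which it approximates only at fully genotyped markers), and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def marker_lod(phenotypes: Sequence[float], genotypes: Sequence[str]) -> float:
    """LOD = (n / 2) * log10(RSS_null / RSS_genotype) for one marker."""
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    rss_null = sum((y - mean) ** 2 for y in phenotypes)
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    rss_full = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss_full += sum((y - group_mean) ** 2 for y in ys)
    if rss_null == 0.0:
        return 0.0
    if rss_full == 0.0:
        return float("inf")
    return (n / 2.0) * math.log10(rss_null / rss_full)


def permutation_threshold(phenotypes: Sequence[float],
                          markers: Sequence[Sequence[str]],
                          n_perm: int = 1000,
                          alpha: float = 0.05,
                          seed: int = 0) -> float:
    """(1 - alpha) quantile of the genome-wide maximum LOD under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype association
        max_lods.append(max(marker_lod(shuffled, g) for g in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return max_lods[index]


def lod_support_interval(positions: Sequence[float], lods: Sequence[float],
                         drop: float = 1.5) -> Tuple[float, float]:
    """Walk outward from the peak to the first marker on each side whose
    LOD has fallen by more than `drop` (or to the end of the linkage group)."""
    peak = max(range(len(lods)), key=lambda i: lods[i])
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left] > cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right] > cutoff:
        right += 1
    return positions[left], positions[right]
```

Taking the maximum LOD across all markers within each permutation is what makes the threshold genome-wide: it controls the chance that any marker exceeds it under the null, rather than the per-marker error rate.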

Complementation analysis in albino surface-Pachón F2 fish

An albino surface-Pachón F2 fish was crossed to oca2Δ4bp/+ surface fish27. Larval offspring from this cross were scored at five days post-fertilization for pigmentation (presence or absence of melanin pigmentation) by routine observation.

Larvae were imaged under a dissecting microscope and the number of pigmented and albino progeny were assessed. Following this, DNA was extracted from 8 pigmented and 8 albino progeny, and these fish were then genotyped for the engineered 4 bp deletion by PCR followed by gel electrophoresis, using locus specific primers (forward primer: 5’-CCCAAAGCAGAGTGTTTGGTA-3’, reverse primer: 5’-TTTCCAAAGATCACATATCTTGACA-3’) and methods79, 80. Briefly, larval fish were euthanized in MS-222 and imaged, DNA was extracted from whole fish, and PCR was performed followed by gel electrophoresis. All genotyped albino embryos had two bands (indicating the presence of the engineered deletion), whereas all genotyped pigmented embryos had a single band, indicating inheritance of the wild-type allele from the surface fish parent.

Genetic mapping of previously mapped QTL studies to the surface fish genome

To identify any candidate genes potentially associated with cave phenotypes that were not evident in the more fragmented Astyanax mexicanus 1.0.2 genome, we generated a QTL database for the Astyanax mexicanus 2.0 genome to identify genomic regions containing groups of markers associated with cave-derived phenotypes. We followed the methods Herman et al.32 used to create a similar database for the cavefish genome. BLAST was used to identify locations in the surface fish genome for 1,156 markers from several previous QTL studies18, 19, 21–23, 81. The top BLAST hit for each marker was identified by ranking e-value, followed by bitscore, and then alignment length. Previously, 687 of these markers were mapped to the cavefish genome. BEDTools intersect and the surface fish genome annotation were used to identify all genes within the regions of interest. Genomic intervals associated with activity QTL previously identified using the existing high-density linkage map were similarly re-examined using the established locations of the 2,235 GBS markers within the Astyanax mexicanus 2.0 genome25. We then investigated whether genes identified near QTL had ontologies associated with cave phenotypes using the NCBI (https://www.ncbi.nlm.nih.gov/) or Ensembl genome browser (https://www.ensembl.org).
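The top-hit selection rule described above can be expressed as a single sort key. The sketch below assumes that "ranking" means lowest e-value first, then highest bitscore, then longest alignment; the record type, field names, and example markers are hypothetical and do not correspond to any file format used in this study.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    marker: str
    chrom: str
    start: int
    evalue: float
    bitscore: float
    aln_length: int


def rank_key(hit: Hit):
    # Lower e-value is better; higher bitscore and longer alignment are better.
    return (hit.evalue, -hit.bitscore, -hit.aln_length)


def top_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the single best-ranked hit per marker."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.marker)
        if current is None or rank_key(hit) < rank_key(current):
            best[hit.marker] = hit
    return best
```

The resulting one-location-per-marker table is the kind of input that an interval tool such as BEDTools intersect can then combine with a gene annotation.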

Generation of rx3 CRISPant fish

CRISPant surface fish were generated using CRISPR/Cas9. A gRNA targeting exon 2 was designed and generated as described previously82. Briefly, oligo A containing the gRNA sequence (5’-GTGTAGCTGAAACGTGGTGA-3’) between the sequence for the T7 promoter and a sequence overlapping with the second oligo, oligo B, was synthesized (IDT) (Supplementary Table 7). Following annealing with oligo B and amplification, the T7 MEGAscript Kit (Ambion) was used to transcribe the gRNA with several modifications, as in27, 80. The gRNA was cleaned up using the miRNeasy mini kit (Qiagen) following the manufacturer’s instructions and eluted into RNase-free water. nls-Cas9-nls83 mRNA was transcribed using the mMessage mMACHINE T3 kit (Life Technologies) following the manufacturer’s directions. Single cell embryos were injected with 25 pg of gRNA and 150 pg Cas9 mRNA (as in79, 80). Injected fish were screened to assess for mutagenesis by PCR using primers surrounding the gRNA target site (F primer: 5’-AGCCCGGACCGTAAGAAG-3’, R primer: 5’-GCTGTAAACGTCGGGGTAGT-3’). Genotyping was performed with DNA extracted from whole larval fish from the clutch, as described previously84. Gel electrophoresis was used to discriminate between alleles with indels (more than one band results in a smeary band) and wild-type alleles (single band: as in79). Injected fish (CRISPants) along with uninjected wild-type siblings were raised to adulthood. Eye size was assessed in both rx3 gRNA/Cas9 mRNA injected fish (CRISPants) and wild-type sibling adult fish (n=5 each). Fish were anesthetized in MS-222 and imaged under a dissecting microscope.

Structural variation detection

We first aligned Illumina reads from A. mexicanus population resequencing data described by Herman et al.32 to the Astyanax mexicanus 2.0 reference using bwa mem v0.7.1785 with default options. We then converted the output to BAM format, fixed mate pair information, sorted and removed duplicates using the SAMTools v1.9 view -bh, fixmate -m, sort, and markdup -r modules, respectively72. We ran two software packages for calling structural variants (SV) using these alignments: Manta47 and LUMPY48. We ran Manta v1.6.0 with default options as suggested in the package’s manual, and LUMPY using the Smoove v0.2.3 pipeline (https://github.com/brentp/smoove) using the commands given in the “Population calling” section. Nextflow86 workflows for our running of both SV callers can be found at https://github.com/esrice/workflows. Due to the low sequence coverage (average ∼9x) available per sample, we confined analysis to deletions called by both SV callers and within the size range 500 bp to 100 kb. We considered a deletion to be present in both sets of calls if there was a reciprocal overlap of 50% of the length of the deletion. We used the scripts merge_deletions.py to find the intersection of the two sets of deletions, annotate_vcf.py to group the deletions by their effect (i.e. deletion of coding, intronic, regulatory, or intergenic sequence), and count_variants_per_sample.py to count the numbers of variants called in each sample; all scripts can be found at https://github.com/esrice/misc-tools. All protein-coding genes with detected deletions among cavefish populations as defined in this study were used as input to test for significant enrichment among specific databases within WebGestalt49. Entrez Gene IDs were input as gene symbols, with organism of interest set to zebrafish using protein-coding genes as the reference set.
Enrichment among the KEGG and Panther canonical gene signaling pathway databases, as well as among the genes associated with diseases curated in OMIM and DisGeNET87, was assessed using a hypergeometric test, and the significance level was set at 0.05. We implemented the Benjamini and Hochberg multiple test adjustment88 to control for false discovery.
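The rule used in this section to decide whether a deletion was called by both SV callers (a reciprocal overlap of 50%, within a 500 bp to 100 kb size range) can be illustrated as follows. This sketch is not the merge_deletions.py script referenced above; the record type and function names are hypothetical, and two choices are assumptions made here: the size filter is applied to both call sets, and the coordinates of the first caller are reported for each shared deletion.

```python
from typing import List, NamedTuple


class Deletion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def length(d: Deletion) -> int:
    return d.end - d.start


def in_size_range(d: Deletion, min_len: int = 500, max_len: int = 100_000) -> bool:
    return min_len <= length(d) <= max_len


def reciprocal_overlap(a: Deletion, b: Deletion, min_fraction: float = 0.5) -> bool:
    """True if the shared span covers at least min_fraction of BOTH deletions."""
    if a.chrom != b.chrom:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * length(a)
            and overlap >= min_fraction * length(b))


def intersect_calls(calls_a: List[Deletion], calls_b: List[Deletion]) -> List[Deletion]:
    """Deletions from calls_a supported by a call in calls_b.

    Quadratic in the number of calls; adequate for a sketch, whereas a
    production script would index calls by chromosome and position.
    """
    kept = []
    for a in calls_a:
        if not in_size_range(a):
            continue
        if any(in_size_range(b) and reciprocal_overlap(a, b) for b in calls_b):
            kept.append(a)
    return kept
```

Requiring the overlap to be reciprocal prevents a short call from one caller being matched to a much longer call from the other simply because it lies inside it.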

Author contributions

WCW and NR conceived of the study. AA, TB, RB, BMC, FD, EF, JG, LH, ZH, ACK, AK, JK, CT, MK, MEL, TL, SEM, JTM, MM, RLM, RP, ER, MR, IS, BAS, CT, ST, and YY performed and analyzed the experiments. WCW, NR, RB, BMC, JG, ACK, JK, SEM, MM, MR, CT, and YY wrote the paper.

Competing interest statement

The authors declare no competing interests.

Data availability statement

Original data underlying this manuscript can be accessed from the Stowers Original Data Repository at http://www.stowers.org/research/publications/libpb-1528 or will be made available on request.

Supplementary Figures

Figure S1. Breeding scheme to cross two surface fish populations from different geographic locations in Mexico.

(a) A female Río Sabinas (Asty04) was crossed to a male Río Valles (Asty02), and the resulting female (Asty137) was crossed to the male Río Valles (Asty02). The resulting female fish (Asty152) was sequenced and assembled. (b) A female Río Valles (Asty02) was crossed to a male Río Sabinas (Asty04), and the resulting female was used for the BioNano Irys “restriction” map. (c) Locations of the rivers (blue) and caves (orange) discussed in the study. Maps from Google Maps.

Figure S2. Synteny between a recombination-based linkage map and the Astyanax mexicanus 2.0 reference.

By mapping the relative positions of GBS markers from a dense map constructed from a Pachón × surface fish F2 pedigree, we observed significant synteny based on 96.6% of our markers (a). Although a substantial portion of the genome remains unplaced (grey), we noted a sparse distribution of GBS markers beyond the first 6 Mb of the unplaced scaffold (represented in grey) (b). The majority of these alignments were present in the first 1 Mb of the unplaced scaffold (grey). This syntenic representation indicates substantial structural similarity between the cavefish and surface genomes and demonstrates the utility of the draft surface fish genome for identifying QTL intervals and nominating candidate genes associated therein.

Figure S3. Percentage of properly paired reads for each sample aligned to the surface (y-axis) and cave (x-axis) reference genomes.

Samples are colored by population identity. The line is the best fit of all samples. Properly paired reads are reads that align in the correct orientation to the same scaffold/chromosome in the assembly. The greater proportion of properly paired reads in the alignment to the surface genome likely reflects the greater contiguity of the surface assembly, e.g., N50 contig length.

Figure S4. Non-primary read alignment counts for samples aligned to the surface (y-axis) and cave (x-axis) reference genomes.

The reference alignment for each sample is indicated by shape (cave reference: square, surface reference: triangle). Colored lines are fit to samples aligned to each respective genome (cave: blue, surface: yellow). Each sample was aligned to both reference genomes, and the two alignments are united by a color indicating population membership. Inset boxplots show quartiles, median, and minimum/maximum whiskers, along with the results of a paired t-test of the number of non-primary alignments for each sample aligned to each genome. Non-primary alignments include reads that have multiple mapping positions and reads with chimeric alignments. Alignment to the cave reference results in significantly more non-primary alignments, with greater variation among the aligned samples.

Figure S5. Surface fish genome allows for long and continuous intervals in the QTL analysis.

Circos plots for the albino QTL (a) and the eye size QTL (b) showing the markers with significant LOD scores on both the cavefish (blue contigs) and surface fish (red/black contigs) genomes.

Figure S6. Alignment of oca2 exon 24 sequencing data from wild samples utilizing the Astyanax mexicanus 2.0 as a reference.

Portion of the oca2 locus on chromosome 13 of the annotated surface fish genome (top) aligned to sequencing data from wild-caught surface fish (Río Choy, Rascón) and cavefish (Molino, Pachón, Tinaja). Exon 24 is highlighted, and gaps in the horizontal lines indicate sequence deletions. Alignments were made using CLC sequence viewer.

Figure S7. Alignment of oca2 exon 21 sequencing data from wild samples utilizing the Astyanax mexicanus 2.0 as a reference.

Portion of the oca2 locus on chromosome 13 of the annotated surface fish genome (top) aligned to sequencing data from wild-caught surface fish (Río Choy, Rascón) and cavefish (Molino, Pachón, Tinaja). Exon 21 is highlighted, and gaps in the horizontal lines indicate sequence deletions. Alignments were made using CLC sequence viewer.

Figure S8. Pachón cavefish have reduced expression of dusp26.

Normalized expression count of dusp26 at the indicated stages of development. Significance codes: ns, p>0.05; *p<0.05; **p<0.005. Transcriptomics data are from Stahl and Gross46.

Figure S9. Comparative counts of deletions called per sample and grouped by the samples’ populations according to the SV caller.

These deletions are parsed by homozygous state, heterozygous state, or total number of haplotypes affected (rows), and by whether they were detected by lumpy, manta, or the intersection of the two sets (columns).

Figure S10. Comparison of the number of deletions called per sample by the lumpy and manta SV detection algorithms.

A high correlation between the results of the two callers is observed.

Figure S11. Counts of deletions called per sample, grouped by the samples’ population source.

(a) Homozygous alternate, (b) heterozygous, and (c) total alternate deletions compared to the Astyanax mexicanus 2.0 reference. Values on the y-axis represent total deletion counts. Across all allele types, deletions affecting coding, intronic, or regulatory sequence are labelled accordingly. Putative regulatory sequence was arbitrarily defined as 1 kb upstream of the start and stop codon.

Table S1.

Representative assembly metrics for sequenced teleost genomes1.

Table S2.

Representative gene annotation measures for assembled teleost genomes1.

Table S3.

A summary of gene representation using highly conserved gene orthologs from multiple species.

Table S4.

Candidate genes in surface fish genome intervals underlying previously reported activity QTL in Astyanax mexicanus1, 2. Candidate genes previously identified based on the Pachón cave genome (Astyanax mexicanus 1.0.2) are indicated with asterisks.

Table S5.

List of genes in the 1.5-LOD support interval for the eye size QTL in surface/Pachón F2 hybrid group B (chromosome 20:1168905-1802909).

Table S6.

Astyanax mexicanus protein-coding genes with detected deletions across all cave populations (see Methods). Entrez gene symbols derived from NCBI gene annotation are provided.

Table S7.

CRISPR Oligo design.

Acknowledgments

We thank Karin Zueckert-Gaudenz and Mihaela Sardiu for technical assistance and the Stowers aquatics group for fish care and help with shipments of fish between NYU and Stowers. AK, SEM, and NR are supported by NIH 1R01GM127872-01. This work was also supported by NIH R24OD011198 to WCW, LH, ESR, TL, and MK; NSF EDGE Award 1923372 to NR, JEK, and SEM; NSF DEB-1754231 to JEK and AK; and NSF IOS-1933428 to JEK, SEM, and NR. NR is further supported by institutional funding and by funding from the Edward Mallinckrodt Foundation and the JDRF. JBG is supported by NIDCR R01-DE025033 and NSF DEB-1457630. Some computation for this work was performed on the high-performance computing infrastructure provided by Research Computing Support Services at the University of Missouri, Columbia MO, supported in part by the National Science Foundation under grant number CNS-1429294. The Minnesota Supercomputing Institute (MSI) at the University of Minnesota provided resources that contributed to the research results reported within this paper.

References:

  1. Culver, D. & Pipan, T. The Biology of Caves and Other Subterranean Habitats. Second edn, (Oxford University Press, 2019).
  2. Protas, M. & Jeffery, W. R. Evolution and development in cave animals: from fish to crustaceans. Wiley Interdiscip Rev Dev Biol 1, 823–845, doi:10.1002/wdev.61 (2012).
  3. Emerling, C. A. & Springer, M. S. Eyes underground: regression of visual protein networks in subterranean mammals. Mol Phylogenet Evol 78, 260–270, doi:10.1016/j.ympev.2014.05.016 (2014).
  4. Keene, A., Yoshizawa, M. & McGaugh, S. Biology and Evolution of the Mexican Cavefish. (Academic Press, 2015).
  5. Gross, J. B. The complex origin of Astyanax cavefish. BMC Evol Biol 12, 105, doi:10.1186/1471-2148-12-105 (2012).
  6. Maldonado, E., Rangel-Huerta, E., Rodriguez-Salazar, E., Pereida-Jaramillo, E. & Martinez-Torres, A. Subterranean life: Behavior, metabolic, and some other adaptations of Astyanax cavefish. J Exp Zool B Mol Dev Evol, doi:10.1002/jez.b.22948 (2020).
  7. Krishnan, J. & Rohner, N. Cavefish and the basis for eye loss. Philos Trans R Soc Lond B Biol Sci 372, doi:10.1098/rstb.2015.0487 (2017).
  8. Aspiras, A. C., Rohner, N., Martineau, B., Borowsky, R. L. & Tabin, C. J. Melanocortin 4 receptor mutations contribute to the adaptation of cavefish to nutrient-poor conditions. Proc Natl Acad Sci U S A 112, 9668–9673, doi:10.1073/pnas.1510802112 (2015).
  9. Jaggard, J. B. et al. Hypocretin underlies the evolution of sleep loss in the Mexican cavefish. Elife 7, doi:10.7554/eLife.32637 (2018).
  10. Stockdale, W. T. et al. Heart Regeneration in the Mexican Cavefish. Cell Rep 25, 1997–2007 e1997, doi:10.1016/j.celrep.2018.10.072 (2018).
  11. Riddle, M. R. et al. Insulin resistance in cavefish as an adaptation to a nutrient-limited environment. Nature 555, 647–651, doi:10.1038/nature26136 (2018).
  12. Yoshizawa, M. et al. The evolution of a series of behavioral traits is associated with autism-risk genes in cavefish. BMC Evol Biol 18, 89, doi:10.1186/s12862-018-1199-9 (2018).
  13. Elipot, Y., Hinaux, H., Callebert, J. & Retaux, S. Evolutionary shift from fighting to foraging in blind cavefish through changes in the serotonin network. Curr Biol 23, 1–10, doi:10.1016/j.cub.2012.10.044 (2013).
  14. Gross, J. B. & Powers, A. K. A Natural Animal Model System of Craniofacial Anomalies: The Blind Mexican Cavefish. Anat Rec (Hoboken) 303, 24–29, doi:10.1002/ar.23998 (2020).
  15. Carlson, B. M. & Gross, J. B. Characterization and comparison of activity profiles exhibited by the cave and surface morphotypes of the blind Mexican tetra, Astyanax mexicanus. Comp Biochem Physiol C Toxicol Pharmacol 208, 114–129, doi:10.1016/j.cbpc.2017.08.002 (2018).
  16. Xiong, S., Krishnan, J., Peuss, R. & Rohner, N. Early adipogenesis contributes to excess fat accumulation in cave populations of Astyanax mexicanus. Dev Biol 441, 297–304, doi:10.1016/j.ydbio.2018.06.003 (2018).
  17. Rohner, N. Cavefish as an evolutionary mutant model system for human disease. Dev Biol 441, 355–357, doi:10.1016/j.ydbio.2018.04.013 (2018).
  18. Protas, M. et al. Multi-trait evolution in a cave fish, Astyanax mexicanus. Evol Dev 10, 196–209, doi:10.1111/j.1525-142X.2008.00227.x (2008).
  19. Protas, M., Conrad, M., Gross, J. B., Tabin, C. & Borowsky, R. Regressive evolution in the Mexican cave tetra, Astyanax mexicanus. Curr Biol 17, 452–454, doi:10.1016/j.cub.2007.01.051 (2007).
  20. Protas, M. E. et al. Genetic analysis of cavefish reveals molecular convergence in the evolution of albinism. Nat Genet 38, 107–111, doi:10.1038/ng1700 (2006).
  21. Kowalko, J. E. et al. Convergence in feeding posture occurs through different genetic loci in independently evolved cave populations of Astyanax mexicanus. Proc Natl Acad Sci U S A 110, 16933–16938, doi:10.1073/pnas.1317192110 (2013).
  22. Kowalko, J. E. et al. Loss of schooling behavior in cavefish through sight-dependent and sight-independent mechanisms. Curr Biol 23, 1874–1883, doi:10.1016/j.cub.2013.07.056 (2013).
  23. Yoshizawa, M., Yamamoto, Y., O’Quin, K. E. & Jeffery, W. R. Evolution of an adaptive behavior and its sensory receptors promotes eye regression in blind cavefish. BMC Biol 10, 108, doi:10.1186/1741-7007-10-108 (2012).
  24. O’Quin, K. E., Yoshizawa, M., Doshi, P. & Jeffery, W. R. Quantitative genetic analysis of retinal degeneration in the blind cavefish Astyanax mexicanus. PLoS One 8, e57281, doi:10.1371/journal.pone.0057281 (2013).
  25. Carlson, B. M., Klingler, I. B., Meyer, B. J. & Gross, J. B. Genetic analysis reveals candidate genes for activity QTL in the blind Mexican tetra, Astyanax mexicanus. PeerJ 6, e5189, doi:10.7717/peerj.5189 (2018).
  26. Stahl, B. A. et al. Stable transgenesis in Astyanax mexicanus using the Tol2 transposase system. Dev Dyn, doi:10.1002/dvdy.32 (2019).
  27. Klaassen, H., Wang, Y., Adamski, K., Rohner, N. & Kowalko, J. E. CRISPR mutagenesis confirms the role of oca2 in melanin pigmentation in Astyanax mexicanus. Dev Biol 441, 313–318, doi:10.1016/j.ydbio.2018.03.014 (2018).
  28. McGaugh, S. E. et al. The cavefish genome reveals candidate genes for eye loss. Nat Commun 5, 5307, doi:10.1038/ncomms6307 (2014).
  29. Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat Methods 17, 155–158, doi:10.1038/s41592-019-0669-3 (2020).
  30. Carlson, B. M., Onusko, S. W. & Gross, J. B. A high-density linkage map for Astyanax mexicanus using genotyping-by-sequencing technology. G3 (Bethesda) 5, 241–251, doi:10.1534/g3.114.015438 (2014).
  31. Morgulis, A., Gertz, E. M., Schaffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141, doi:10.1093/bioinformatics/bti774 (2006).
  32. Herman, A. et al. The role of gene flow in rapid and repeated evolution of cave-related traits in Mexican tetra, Astyanax mexicanus. Mol Ecol 27, 4397–4416, doi:10.1111/mec.14877 (2018).
  33. Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat Biotechnol, doi:10.1038/nbt.4277 (2018).
  34. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res, doi:10.1093/nar/gkz899 (2019).
  35. Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res, doi:10.1093/nar/gkz966 (2019).
  36. O’Quin, K. & McGaugh, S. The genetic bases of troglomorphy in Astyanax: How far we have come and where do we go from here? In Biology and Evolution of the Mexican Cavefish (eds Keene, A., Yoshizawa, M. & McGaugh, S. E.) (Elsevier, 2015).
  37. Casane, D. & Retaux, S. Evolutionary Genetics of the Cavefish Astyanax mexicanus. Adv Genet 95, 117–159, doi:10.1016/bs.adgen.2016.03.001 (2016).
  38. Cavodeassi, F., Ivanovitch, K. & Wilson, S. W. Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193–4202, doi:10.1242/dev.097048 (2013).
  39. Stigloher, C. et al. Segregation of telencephalic and eye-field identities inside the zebrafish forebrain territory is controlled by Rx3. Development 133, 2925–2935, doi:10.1242/dev.02450 (2006).
  40. Mathers, P. H., Grinberg, A., Mahon, K. A. & Jamrich, M. The Rx homeobox gene is essential for vertebrate eye development. Nature 387, 603–607, doi:10.1038/42475 (1997).
  41. Loosli, F. et al. Medaka eyeless is the key factor linking retinal determination and eye growth. Development 128, 4035–4044 (2001).
  42. Loosli, F. et al. Loss of eyes in zebrafish caused by mutation of chokh/rx3. EMBO Rep 4, 894–899, doi:10.1038/sj.embor.embor919 (2003).
  43. Martinez-Morales, J. R. et al. ojoplano-mediated basal constriction is essential for optic cup morphogenesis. Development 136, 2165–2175, doi:10.1242/dev.033563 (2009).
  44. Borowsky, R. Restoring sight in blind cavefish. Curr Biol 18, R23–24, doi:10.1016/j.cub.2007.11.023 (2008).
  45. Yang, C. H. et al. NEAP/DUSP26 suppresses receptor tyrosine kinases and regulates neuronal development in zebrafish. Sci Rep 7, 5241, doi:10.1038/s41598-017-05584-7 (2017).
  46. Stahl, B. A. & Gross, J. B. A Comparative Transcriptomic Analysis of Development in Two Astyanax Cavefish Populations. J Exp Zool B Mol Dev Evol 328, 515–532, doi:10.1002/jez.b.22749 (2017).
  47. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222, doi:10.1093/bioinformatics/btv710 (2016).
  48. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol 15, R84, doi:10.1186/gb-2014-15-6-r84 (2014).
  49. Wang, J., Vasaikar, S., Shi, Z., Greer, M. & Zhang, B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res 45, W130–W137, doi:10.1093/nar/gkx356 (2017).
  50. Pinero, J. et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res, doi:10.1093/nar/gkz1021 (2019).
  51. Prober, D. A., Rihel, J., Onah, A. A., Sung, R. J. & Schier, A. F. Hypocretin/orexin overexpression induces an insomnia-like phenotype in zebrafish. J Neurosci 26, 13400–13410, doi:10.1523/JNEUROSCI.4332-06.2006 (2006).
  52. Yokogawa, T. et al. Characterization of sleep in zebrafish and insomnia in hypocretin receptor mutants. PLoS Biol 5, e277, doi:10.1371/journal.pbio.0050277 (2007).
  53. Lin, L. et al. The sleep disorder canine narcolepsy is caused by a mutation in the hypocretin (orexin) receptor 2 gene. Cell 98, 365–376, doi:10.1016/s0092-8674(00)81965-0 (1999).
  54. Chen, C. W., Lin, J. & Chu, Y. W. iStable: off-the-shelf predictor integration for predicting protein stability changes. BMC Bioinformatics 14 Suppl 2, S5, doi:10.1186/1471-2105-14-S2-S5 (2013).
  55. Cheng, J., Randall, A. & Baldi, P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins 62, 1125–1132, doi:10.1002/prot.20810 (2006).
  56. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res 46, W296–W303, doi:10.1093/nar/gky427 (2018).
  57. Suno, R. et al. Crystal Structures of Human Orexin 2 Receptor Bound to the Subtype-Selective Antagonist EMPA. Structure 26, 7–19 e15, doi:10.1016/j.str.2017.11.005 (2018).
  58. Schartl, M. et al. The genome of the platyfish, Xiphophorus maculatus, provides insights into evolutionary adaptation and several complex traits. Nat Genet 45, 567–572, doi:10.1038/ng.2604 (2013).
  59. Chain, F. J. et al. Extensive copy-number variation of young genes across stickleback populations. PLoS Genet 10, e1004830, doi:10.1371/journal.pgen.1004830 (2014).
  60. Catanach, A. et al. The genomic pool of standing structural variation outnumbers single nucleotide polymorphism by threefold in the marine teleost Chrysophrys auratus. Mol Ecol 28, 1210–1223, doi:10.1111/mec.15051 (2019).
  61. Flagel, L. E., Willis, J. H. & Vision, T. J. The standing pool of genomic structural variation in a natural population of Mimulus guttatus. Genome Biol Evol 6, 53–64, doi:10.1093/gbe/evt199 (2014).
  62. Noda, M. et al. Role of Per3, a circadian clock gene, in embryonic development of mouse cerebral cortex. Sci Rep 9, 5874, doi:10.1038/s41598-019-42390-9 (2019).
  63. Li, S. B. & de Lecea, L. The hypocretin (orexin) system: from a neural circuitry perspective. Neuropharmacology 167, 107993, doi:10.1016/j.neuropharm.2020.107993 (2020).
  64. Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110, doi:10.1093/bioinformatics/btw152 (2016).
  65. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963, doi:10.1371/journal.pone.0112963 (2014).
  66. Lam, E. T. et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 30, 771–776, doi:10.1038/nbt.2303 (2012).
  67. Catchen, J., Amores, A. & Bassham, S. Chromonomer: a tool set for repairing and enhancing assembled genomes through integration of genetic maps and conserved synteny. bioRxiv 2020.02.04.934711 (2020).
  68. Boratyn, G. M., Thierry-Mieg, J., Thierry-Mieg, D., Busby, B. & Madden, T. L. Magic-BLAST, an accurate RNA-seq aligner for long and short reads. BMC Bioinformatics 20, 405, doi:10.1186/s12859-019-2996-x (2019).
  69. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res 19, 1639–1645, doi:10.1101/gr.092759.109 (2009).
  70. Pruitt, K. D. et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res 42, D756–763, doi:10.1093/nar/gkt1114 (2014).
  71. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842, doi:10.1093/bioinformatics/btq033 (2010).
  72. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079, doi:10.1093/bioinformatics/btp352 (2009).
  73. Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat Methods 9, 676–682, doi:10.1038/nmeth.2019 (2012).
  74. Broman, K. W., Wu, H., Sen, S. & Churchill, G. A. R/qtl: QTL mapping in experimental crosses. Bioinformatics 19, 889–890, doi:10.1093/bioinformatics/btg112 (2003).
  75. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359, doi:10.1038/nmeth.1923 (2012).
  76. Riddle, M. R. et al. Genetic architecture underlying changes in carotenoid accumulation during the evolution of the blind Mexican cavefish, Astyanax mexicanus. J Exp Zool B Mol Dev Evol, doi:10.1002/jez.b.22954 (2020).
  77. Hinaux, H. et al. A developmental staging table for Astyanax mexicanus surface fish and Pachon cavefish. Zebrafish 8, 155–165, doi:10.1089/zeb.2011.0713 (2011).
  78. Kavalco, K. F. & De Almeida-Toledo, L. F. Molecular cytogenetics of blind mexican tetra and comments on the karyotypic characteristics of genus Astyanax (Teleostei, Characidae). Zebrafish 4, 103–111, doi:10.1089/zeb.2007.0504 (2007).
  79. Kowalko, J. E., Ma, L. & Jeffery, W. R. Genome Editing in Astyanax mexicanus Using Transcription Activator-like Effector Nucleases (TALENs). J Vis Exp, doi:10.3791/54113 (2016).
  80. Stahl, B. A. et al. Manipulation of Gene Function in Mexican Cavefish. J Vis Exp, doi:10.3791/59093 (2019).
  81. Yoshizawa, M. et al. Distinct genetic architecture underlies the emergence of sleep loss and prey-seeking behavior in the Mexican cavefish. BMC Biol 13, 15, doi:10.1186/s12915-015-0119-3 (2015).
  82. Varshney, G. K. et al. High-throughput gene targeting and phenotyping in zebrafish using CRISPR/Cas9. Genome Res 25, 1030–1042, doi:10.1101/gr.186379.114 (2015).
  83. Jao, L. E., Wente, S. R. & Chen, W. Efficient multiplex biallelic zebrafish genome editing using a CRISPR nuclease system. Proc Natl Acad Sci U S A 110, 13904–13909, doi:10.1073/pnas.1308335110 (2013).
  84. Ma, L., Jeffery, W. R., Essner, J. J. & Kowalko, J. E. Genome editing using TALENs in blind Mexican Cavefish, Astyanax mexicanus. PLoS One 10, e0119370, doi:10.1371/journal.pone.0119370 (2015).
  85. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 1303.3997 (2013).
  86. Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat Biotechnol 35, 316–319, doi:10.1038/nbt.3820 (2017).
  87. Pinero, J. et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res 45, D833–D839, doi:10.1093/nar/gkw943 (2017).
  88. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57, 289–300 (1995).
Posted July 06, 2020.

Subject Area

  • Genomics