ABSTRACT
Background The fungus gnat, Bradysia (Sciara) coprophila, has compelling chromosome biology. Paternal chromosomes are eliminated during spermatogenesis, whereas both maternal X sister chromatids are retained. Embryos start with three copies of the X chromosome, but one or two copies are eliminated from somatic cells as part of sex determination, and one is eliminated in the germline to restore diploidy. These developmentally normal events present opportunities to study chromosome movements that are unusual in other systems. To support such studies, we previously generated a highly contiguous optical-map-scaffolded long-read assembly (Bcop_v1) of the male somatic genome. However, the scaffolds were not chromosome-scale, the majority of the assembly lacked chromosome assignments, and the order and orientation of the contigs along chromosomes remained unknown.
Findings Hi-C data from male pupae were used to correct, order, and orient the contigs from Bcop_v1 into chromosome-scale scaffolds, producing the updated assembly, Bcop_v2. Several orthogonal analyses allowed us to (i) identify the corresponding chromosome for each scaffold, (ii) orient the scaffolds with respect to polytene maps, and (iii) determine that they were highly concordant with the chromosomes they represent. Gene annotations produced for Bcop_v1 were lifted over to Bcop_v2. Chromosomal repeat distributions highlight a potential telomeric sequence. Finally, the Hi-C data shed new light on three “fold-back regions” seen to physically interact in images of polytene X chromosomes.
Conclusions Studies of the unusual chromosome movements in Bradysia coprophila will benefit from the updated assembly (Bcop_v2), in which each somatic chromosome is represented by a single scaffold.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
EMAILS: John M. Urban: jurban{at}carnegiescience.edu; Susan A. Gerbi: susan_gerbi{at}brown.edu; Allan C. Spradling: spradling{at}carnegiescience.edu
Version 2 has been substantially expanded compared to version 1. Version 1 was released early (November 2022) to share the genome (Bcop_v2) and the data described with other researchers, and to give them something to cite. Version 1 described (i) the basic process of obtaining chromosome-scale scaffolds with Hi-C scaffolding, (ii) using known sequences to anchor, orient, and QC the chromosome scaffolds, and (iii) lifting over gene annotations. Version 2 has been rearranged by separating methods and results into different sections. Version 2 has many new sections and analyses, including: (1) much more extensive QC of the Hi-C scaffolding results, (2) further validation of the chromosome identities of the scaffolds, (3) much richer integration into the historical literature on Bradysia coprophila, (4) comparative genomics with other Dipterans, (5) deeper scrutiny of genes in the chromosome scaffolds reported to have originated through horizontal gene transfer (HGT), (6) much broader characterization of the chromosomal context of repeats and transposons as well as centromeric, peri-centromeric, telomeric, and sub-telomeric regions, including a description of satellites and retrotransposon-related sequences at scaffold termini, and (7) a much richer exploration of long-range Hi-C interactions detected on the X chromosome, building evidence that they correspond to three regions seen to physically interact in polytene spreads, and characterizing those loci at the sequence level for the first time. We believe this version of the manuscript will be of major interest to those studying Bradysia coprophila as well as, more generally, to those interested in genome assembly, chromosome biology, Dipteran evolution, insect telomeres, and long-range chromosome interactions.
Abbreviations
- AGP: a golden path (file format)
- Bcop: Bradysia coprophila
- bp: base pairs
- BED: Browser Extensible Data (file format)
- BUSCO: Benchmarking Universal Single-Copy Orthologs
- CDS: coding sequence
- FBR: foldback region
- Gb: gigabase pairs
- GFF: General Feature Format (file format)
- GTF: General Transfer Format (file format)
- Hi-C: High dimensional Chromosome conformation capture
- i5k: initiative to sequence 5000 insect (and other arthropod) genomes (http://i5k.github.io/about)
- kb: kilobase pairs
- LCTTR: long complex terminal tandem repeats
- LINE: long interspersed nuclear element
- LTR: long terminal repeats
- Mb: megabase pairs
- MMNE: Min-Max Normalized Entropy
- NCBI: National Center for Biotechnology Information
- PacBio: Pacific Biosciences
- PCR: polymerase chain reaction
- RTE: retrotransposable element
- RTRS: retrotransposon related sequence
- Sccr: Sciara coprophila centromeric repeat
- ScRTE: Sciara coprophila retrotransposable element as defined in Escribá et al (2011) [45]
- SMRT: single-molecule real-time
- ONT: Oxford Nanopore Technologies