TY - JOUR
T1 - Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C
JF - bioRxiv
DO - 10.1101/2021.06.02.444084
SP - 2021.06.02.444084
AU - Stephanie H Chen
AU - Maurizio Rossetto
AU - Marlien van der Merwe
AU - Patricia Lu-Irving
AU - Jia-Yee S Yap
AU - Hervé Sauquet
AU - Greg Bourke
AU - Timothy G Amos
AU - Jason G Bragg
AU - Richard J Edwards
Y1 - 2021/01/01
UR - http://biorxiv.org/content/early/2021/11/10/2021.06.02.444084.abstract
N2 - Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Here, we report the first chromosome-level genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 97.8 % of Embryophyta BUSCOs complete. We present a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering scaffolds, which combines read depths, k-mer frequencies and BUSCO predictions. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single copy orthologues and estimate the genome size to be approximately 900 Mb. The largest 11 scaffolds contained 94.1 % of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. We investigated CYCLOIDEA (CYC) genes, which have a role in determination of floral symmetry, and confirm the presence of two copies in the genome.
Read depth analysis of 180 ‘Duplicated’ BUSCO genes suggests almost all are real duplications, increasing confidence in protein family analysis using annotated protein-coding genes, and highlighting a possible need to revise the BUSCO set for this lineage. The chromosome-level T. speciosissima reference genome (Tspe_v1) provides an important new genomic resource for Proteaceae to support the conservation of flora in Australia and further afield. Competing Interest Statement: The authors have declared no competing interest.
ER -