RT Journal Article
SR Electronic
T1 trio-sga: facilitating de novo assembly of highly heterozygous genomes with parent-child trios
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 051516
DO 10.1101/051516
A1 Malinsky, Milan
A1 Simpson, Jared T.
A1 Durbin, Richard
YR 2016
UL http://biorxiv.org/content/early/2016/05/03/051516.abstract
AB Motivation: Most DNA sequence in diploid organisms is found in two copies, one contributed by the mother and the other by the father. The high density of differences between the maternally and paternally contributed sequences (heterozygous sites) in some organisms makes de novo genome assembly very challenging, even for algorithms specifically designed to deal with these cases. Therefore, various approaches, most commonly inbreeding in the laboratory, are used to reduce heterozygosity in genomic data prior to assembly. However, many species are not amenable to these techniques. Results: We introduce trio-sga, a set of three algorithms designed to take advantage of mother-father-offspring trio sequencing to facilitate better quality genome assembly in organisms with moderate to high levels of heterozygosity. Two of the algorithms use haplotype phase information present in the trio data to eliminate the majority of heterozygous sites before the assembly commences. The third algorithm is designed to reduce sequencing costs by enabling the use of parents’ reads in the assembly of the genome of the offspring. We test these algorithms on a ‘simulated trio’ from four haploid datasets, and further demonstrate their performance by assembling three highly heterozygous Heliconius butterfly genomes. While the implementation of trio-sga is tuned towards Illumina-generated data, we note that the trio approach to reducing heterozygosity is likely to have cross-platform utility for de novo assembly.