RT Journal Article
SR Electronic
T1 Power and pitfalls of computational methods for inferring clone phylogenies and mutation orders from bulk sequencing data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 697318
DO 10.1101/697318
A1 Sayaka Miura
A1 Tracy Vu
A1 Jiamin Deng
A1 Tiffany Buturla
A1 Jiyeong Choi
A1 Sudhir Kumar
YR 2019
UL http://biorxiv.org/content/early/2019/07/19/697318.abstract
AB Background: Tumors harbor extensive genetic heterogeneity in the form of distinct clone genotypes that arise over time and across different tissues and regions of a cancer patient. Many computational methods produce clone phylogenies from population bulk sequencing data collected from multiple tumor samples. These clone phylogenies are used to infer mutation order and clone origin times during tumor progression, which makes the choice of clonal deconvolution method critical. Surprisingly, the absolute and relative accuracies of these methods in correctly inferring clone phylogenies have not been consistently assessed. Methods: We evaluated the performance of seven computational methods in producing clone phylogenies for simulated datasets in which clones were sampled from multiple sectors of a primary tumor (multi-region) or from primary and metastatic tumors in a patient (multi-site). We assessed the accuracy of the tested methods in determining the order of mutations and the branching pattern within the reconstructed clone phylogenies. Results: The accuracy of the reconstructed mutation order varied extensively among methods (9% – 44% error). Methods also varied significantly in reconstructing the topologies of clone phylogenies, as 24% – 58% of the inferred clone groupings were incorrect. All the tested methods showed limited ability to correctly identify ancestral clone sequences present in tumor samples.
The occurrence of multiple seeding events among tumor sites during metastatic tumor evolution hindered deconvolution of clones for all tested methods. Conclusions: CloneFinder, MACHINA, and LICHeE showed the highest overall accuracy, but none of the methods performed well for all simulated datasets and conditions. Abbreviations: SNV, single-nucleotide variant; VAF, variant allele frequency; CNV, copy number alteration.