RT Journal Article
SR Electronic
T1 De Novo Mutational Signature Discovery in Tumor Genomes using SparseSignatures
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 384834
DO 10.1101/384834
A1 Daniele Ramazzotti
A1 Avantika Lal
A1 Keli Liu
A1 Robert Tibshirani
A1 Arend Sidow
YR 2018
UL http://biorxiv.org/content/early/2018/08/04/384834.abstract
AB Cancer is the result of mutagenic processes that can be inferred from genome sequences by analysis of mutational signatures. Here we present SparseSignatures, a novel framework to extract mutational signatures from somatic point mutation data. Our approach incorporates DNA replication error as a background, enforces sparsity of non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to very large datasets. We apply SparseSignatures to whole genome sequences of 2827 tumors from 20 cancer types and show by standard metrics that our set of signatures is substantially more robust than previously reported ones, having eliminated redundancy and overfitting. Known mutagens (e.g., UV light, benzo(a)pyrene, APOBEC dysregulation) exhibit single signatures and occur in the expected tissues, a dominant signature with uncertain etiology is present in liver cancers, and other cancers exhibit a mixture of signatures or are dominated by background and CpG methylation signatures. Apart from cancers that are mostly due to environmental mutagens there is virtually no correlation between cancer types and signatures, highlighting the idea that any of several mutagenic pathways can be active in any solid tissue.