PT - JOURNAL ARTICLE
AU - Daniele Ramazzotti
AU - Avantika Lal
AU - Keli Liu
AU - Robert Tibshirani
AU - Arend Sidow
TI - De Novo Mutational Signature Discovery in Tumor Genomes using SparseSignatures
AID - 10.1101/384834
DP - 2019 Jan 01
TA - bioRxiv
PG - 384834
4099 - http://biorxiv.org/content/early/2019/07/11/384834.short
4100 - http://biorxiv.org/content/early/2019/07/11/384834.full
AB - Cancer is the result of mutagenic processes that can be inferred from tumor genome sequences by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates DNA replication error as a background, favors sparsity (signatures with few types of mutations) of non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to very large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using standard metrics. We then apply SparseSignatures to whole genome sequences of 4476 tumors from 33 cancer types, obtaining 13 signatures in addition to the background. Signatures of known mutagens (e.g., UV light, benzo(a)pyrene, APOBEC dysregulation) occur in the expected tissues and a dominant signature with uncertain etiology is present in liver cancers. Other cancers exhibit a mixture of signatures or are dominated by background and CpG methylation signatures. Apart from cancers that are mostly due to environmental mutagens, there is little correlation between cancer types and signatures, highlighting the idea that any of several mutagenic pathways can be active in any solid tissue.