PT - JOURNAL ARTICLE
AU - Shantao Li
AU - Forrest W. Crawford
AU - Mark B. Gerstein
TI - SigLASSO: a LASSO approach jointly optimizing sampling likelihood and cancer mutation signatures
AID - 10.1101/366740
DP - 2018 Jan 01
TA - bioRxiv
PG - 366740
4099 - http://biorxiv.org/content/early/2018/07/10/366740.short
4100 - http://biorxiv.org/content/early/2018/07/10/366740.full
AB - Multiple mutational processes drive carcinogenesis, leaving characteristic signatures on tumor genomes. Determining the active signatures from the full repertoire of potential ones can help elucidate the mechanisms underlying cancer initiation and development. This involves decomposing the frequency of cancer mutations, categorized according to their trinucleotide context, into a linear combination of known mutational signatures. We formulate this task as an optimization problem with L1 regularization and develop a software tool, sigLASSO, to carry it out efficiently. First, by explicitly adding multinomial sampling into the overall objective function, we jointly optimize the likelihood of sampling and signature fitting. This is especially important when mutation counts are low and sampling variance is high, as is the case in whole-exome sequencing. sigLASSO uses L1 regularization to parsimoniously assign signatures to mutation profiles, leading to sparse and more biologically interpretable solutions. Additionally, instead of hard thresholding and choosing a discrete subset of active signatures a priori, sigLASSO fine-tunes model complexity parameters, informed by the scale of the data and prior knowledge. Finally, because signature assignments are challenging to evaluate, we construct a set of criteria that can be applied consistently across assignments.
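
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the sigLASSO implementation: it leaves out the multinomial sampling likelihood and the data-informed tuning of model complexity, and instead fits a plain nonnegative L1-penalized least-squares problem by proximal gradient descent. The function name `fit_sparse_exposures`, the four-category toy signatures, and the penalty value `lam` are assumptions made for illustration only; real mutational signatures are defined over 96 trinucleotide contexts.

```python
def fit_sparse_exposures(signatures, counts, lam=0.001, n_iter=2000):
    """Toy nonnegative LASSO fit (illustrative, not the sigLASSO algorithm).

    signatures: list of signature vectors, each summing to 1 over categories
    counts:     observed mutation counts per category
    Minimises 0.5 * ||p - sum_j w_j * s_j||^2 + lam * sum_j w_j with w_j >= 0,
    where p is the observed mutation fraction per category.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one mutation")
    p = [c / total for c in counts]
    n_sig, n_cat = len(signatures), len(p)

    # The sum of squared entries of the signature matrix upper-bounds the
    # largest eigenvalue of its Gram matrix, so its inverse is a safe step size.
    step = 1.0 / sum(v * v for s in signatures for v in s)

    w = [0.0] * n_sig
    for _ in range(n_iter):
        fitted = [sum(w[j] * signatures[j][k] for j in range(n_sig))
                  for k in range(n_cat)]
        resid = [fitted[k] - p[k] for k in range(n_cat)]
        grad = [sum(signatures[j][k] * resid[k] for k in range(n_cat))
                for j in range(n_sig)]
        # Gradient step on the smooth part plus the L1 penalty, then
        # projection onto w >= 0; the projection is what zeroes out
        # signatures that do not help explain the profile.
        w = [max(0.0, w[j] - step * (grad[j] + lam)) for j in range(n_sig)]
    return w


# Hypothetical toy data: three signatures over four mutation categories.
toy_signatures = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
# Counts built as a 60/40 mixture of the first two signatures (100 mutations).
toy_counts = [46, 34, 10, 10]

exposures = fit_sparse_exposures(toy_signatures, toy_counts)
print([round(x, 3) for x in exposures])
```

In this toy setting the third signature does not contribute to the observed profile, and the penalty drives its weight to zero while slightly shrinking the other two. Choosing `lam` by hand is the simplification here; the abstract states that sigLASSO tunes model complexity from the scale of the data and prior knowledge.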