Abstract
Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data, with modes for whole-genome and exome sequencing, and able to run on tumor-normal, tumor-only, and with FFPE-prepared samples. To help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available a dataset of five matched tumor-normal cell line pairs sequenced with Illumina, PacBio HiFi, and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples and technologies (short-read and long-read), DeepSomatic consistently outperforms existing callers, particularly for indels.
Competing Interest Statement
K.S., D.E.C., P.C., A. Kolesnikov, L.B., J.C.M., and A.C. are employees of Google LLC and own Alphabet stock as part of the standard compensation package. M.S.F. is a part of the speakers bureau for Bayer and PacBio.