PT - JOURNAL ARTICLE
AU - Krithika Bhuvaneshwar
AU - Lei Song
AU - Subha Madhavan
AU - Yuriy Gusev
TI - Detection and quantification of viral RNA in human tumors using open source pipeline: viGEN
AID - 10.1101/099788
DP - 2017 Jan 01
TA - bioRxiv
PG - 099788
4099 - http://biorxiv.org/content/early/2017/01/11/099788.short
4100 - http://biorxiv.org/content/early/2017/01/11/099788.full
AB - Introduction: An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence/infection in actual tumor samples is generally unknown but could be measured using human transcriptome (RNA-seq) data from tumor samples. We present an open source bioinformatics pipeline, viGEN, that combines existing well-known and novel RNA-seq tools to detect and quantify viral RNA and also to call variants in the viral transcripts.
Methods: The pipeline includes 4 major modules. The first module aligns reads and filters out human RNA sequences; the second module maps and counts the remaining unaligned reads against reference genomes of all known and sequenced human viruses; the third module quantifies read counts at the individual viral gene level, allowing downstream differential expression analysis of viral genes between experimental and control groups; and the fourth module calls variants in these viruses. To the best of our knowledge, no publicly available pipeline or package provides this type of complete analysis in one open source package.
Results: In this paper, we use this pipeline in a case study to examine viruses present in RNA-seq data from 75 TCGA liver cancer patients. We were able to quantify viral transcriptomes at a viral-gene/CDS level, find differentially expressed viral transcripts between the groups of patients, extract variants, and connect them to clinical outcome.
The results corresponded with published literature in terms of rate of detection, viral gene expression patterns, and the impact of several known variants of the HBV genome. The results also reveal novel, distinct patterns of expression and co-expression in Hepatitis B, Hepatitis C, and Human Endogenous Retrovirus (HERV) K113 viruses.
Conclusion: This pipeline is generalizable and can be used to provide novel biological insights into the significance of viral and other microbial infections in complex diseases, tumorigenesis, and cancer immunology. The source code, with example data and a tutorial, is available at: https://github.com/ICBI/viGEN/.
Abbreviations: HBV, Hepatitis B virus; HCV, Hepatitis C virus; HERV K113, Human Endogenous Retrovirus K113; TCGA, The Cancer Genome Atlas; HCC, hepatocellular carcinoma; NAFLD, nonalcoholic fatty liver disease; Hep B, Hepatitis B; Hep C, Hepatitis C; HepB + HepC, coinfected with both Hepatitis B and C viruses; HBsAg, Hepatitis B surface antigen; HBeAg, Hepatitis B type e antigen; NGS, next-generation sequencing; RNA-seq, whole transcriptome sequencing; BAM, binary version of the Sequence Alignment/Map format; CDS, coding sequence; Cox PH, Cox proportional hazard; HBx, viral gene X; STS, sequence-tagged sites; NCBI, National Center for Biotechnology Information; GFF, general feature format.
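The idea behind the third module, assigning viral reads to individual genes/CDSs so that their counts can be compared between patient groups, can be sketched as follows. This is a minimal, hypothetical illustration in plain Python and not viGEN's implementation. It assumes reads have already been aligned to a viral reference and reduced to simple (reference, start, end) tuples with 1-based inclusive coordinates, and that CDS coordinates come from a GFF3 annotation. The function names, the toy reference `virus1`, and its coordinates are invented for illustration.

```python
"""Hypothetical sketch of CDS-level viral read counting (not viGEN's code).

Assumes reads were already aligned to a viral reference and reduced to
(seqid, start, end) tuples with 1-based inclusive coordinates, and that CDS
features come from GFF3 text (9 tab-separated columns).
"""
from collections import Counter
from typing import Iterable, List, Tuple

Feature = Tuple[str, int, int, str]  # (seqid, start, end, feature name)
Read = Tuple[str, int, int]          # (seqid, start, end)


def parse_gff_cds(lines: Iterable[str]) -> List[Feature]:
    """Extract CDS features from GFF3 lines, skipping comments and blanks."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds key=value attributes separated by semicolons.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        name = attrs.get("Name", attrs.get("ID", "unknown"))
        features.append((cols[0], int(cols[3]), int(cols[4]), name))
    return features


def count_reads_per_cds(reads: Iterable[Read], features: List[Feature]) -> Counter:
    """Count reads overlapping each CDS; a read overlapping two CDSs counts for both."""
    counts = Counter({name: 0 for _, _, _, name in features})
    for seqid, r_start, r_end in reads:
        for f_seqid, f_start, f_end, name in features:
            # Closed intervals overlap when each starts before the other ends.
            if seqid == f_seqid and r_start <= f_end and f_start <= r_end:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Toy annotation and reads for a hypothetical reference named "virus1".
    toy_gff = [
        "##gff-version 3",
        "virus1\ttoy\tCDS\t100\t400\t.\t+\t0\tID=cds1;Name=geneA",
        "virus1\ttoy\tCDS\t350\t900\t.\t+\t0\tID=cds2;Name=geneB",
    ]
    toy_reads = [("virus1", 120, 219), ("virus1", 360, 459), ("virus1", 950, 1049)]
    counts = count_reads_per_cds(toy_reads, parse_gff_cds(toy_gff))
    print(dict(counts))  # {'geneA': 2, 'geneB': 1}
```

In this sketch a read overlapping two CDSs is counted toward both, and features that wrap past the end of a circular genome (HBV has a circular genome) are not handled; dedicated counting tools offer explicit rules for such cases. A per-gene count table built this way, one column per sample, is the kind of input a downstream differential expression test would take.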