Abstract
Hepatitis B virus (HBV) represents a major health burden, affecting close to 290 million people worldwide. Although prophylactic vaccines are available, current therapeutic compounds rarely achieve HBV eradication because of the persistence of the covalently closed circular (ccc)DNA, which serves as the viral reservoir. Thus, novel biomarkers that reliably reflect intrahepatic cccDNA transcriptional activity would be highly relevant for monitoring infected individuals and for evaluating new treatments targeting HBV. In this context, the development of 5’ rapid amplification of complementary DNA ends (5’RACE) as a strategy to capture and amplify full-length HBV RNAs, coupled with long-read, full-length sequencing approaches (e.g., Oxford Nanopore Technologies), has recently enabled the detailed characterization of these molecules. Analyzing such data requires a dedicated bioinformatics pipeline because the HBV genome is highly compact and produces multiple transcripts and spliced variants that overlap each other. Here, we present Bolero, a computational method and built-in workflow designed to handle HBV sequencing data and evaluate the relative expression of viral RNAs and their spliced variants. The analysis of HBV-infected cell lines demonstrates that our bioinformatics pipeline efficiently identifies and quantifies individual HBV mRNAs. Thus, Bolero represents a useful tool to study cccDNA transcriptional activity and the heterogeneity of HBV RNA spliced variants.
Author summary
Transcriptomic analyses have brought comprehensive insights into the mechanisms controlling gene expression. Moreover, with recent advances in sequencing technologies and computational methods, researchers can now not only quantify gene expression but also study alternative splicing, polyadenylation, transcription initiation, and even rare events such as distant gene fusions. However, conventional analysis tools still rely heavily on the assumption of linear genomes with minimal overlap between open reading frames, which makes them ill-suited to studying complex viruses such as hepatitis B virus (HBV).
Unlike typical linear genomes, the HBV genome consists of a circular DNA molecule, which results in extensive sequence overlap between its transcripts. To tackle these challenges, we developed an innovative approach coupling 5’ rapid amplification of complementary DNA ends (5’RACE) and long-read sequencing to comprehensively explore the HBV transcriptome. We also developed Bolero, a computational method designed to handle the peculiarities of HBV sequencing data, which enables a detailed characterization of the HBV transcriptome.
Competing Interest Statement
The authors have declared no competing interest.