ABSTRACT
A whole-genome co-expression network was constructed from Mycobacterium tuberculosis transcriptomic data drawn from publicly available RNA-sequencing experiments covering a wide variety of experimental conditions. Alongside the protein-coding genes, the network includes expressed regions with no formal annotation, such as putative short RNAs and untranslated regions of expressed transcripts. These unannotated expressed transcripts were among the best-connected members of the module sub-networks, making up more than half of the ‘hub’ elements in modules that include protein-coding genes known to be part of regulatory systems involved in stress response and host adaptation. This dataset provides a valuable resource for investigating the role of non-coding RNA and conserved hypothetical proteins in transcriptomic remodelling. Predicted expressed transcripts can be screened as candidates for further experimental validation based on their connections to genes with known functional groupings and their correlations with replicated host conditions.
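As an illustration of what "best-connected" means in a weighted co-expression network, the following is a minimal sketch and not the authors' pipeline. It assumes an unsigned network in which adjacency is the absolute Pearson correlation raised to a soft-threshold power (`beta = 6` is an arbitrary choice here), and it ranks features by summed connectivity. The data are simulated, and the feature names (`putative_sRNA_1`, `cds_A`, and so on) are hypothetical.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta from a samples x features matrix.

    beta is an assumed soft-threshold power, not a value taken from the paper.
    """
    cor = np.corrcoef(expr, rowvar=False)  # feature-by-feature Pearson correlation
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)             # ignore self-connections
    return adj

def hub_ranking(adj, names):
    """Rank features by connectivity (row sum of adjacency), highest first."""
    connectivity = adj.sum(axis=1)
    order = np.argsort(connectivity)[::-1]
    return [(names[i], float(connectivity[i])) for i in order]

# Simulated toy data: three features follow a shared driver signal, one does not.
rng = np.random.default_rng(0)
n_samples = 40
driver = rng.normal(size=n_samples)
expr = np.column_stack([
    driver + 0.1 * rng.normal(size=n_samples),  # hypothetical unannotated transcript
    driver + 0.5 * rng.normal(size=n_samples),  # hypothetical protein-coding gene
    driver + 0.5 * rng.normal(size=n_samples),  # hypothetical protein-coding gene
    rng.normal(size=n_samples),                 # unrelated feature
])
names = ["putative_sRNA_1", "cds_A", "cds_B", "cds_C"]

adj = soft_threshold_adjacency(expr)
ranking = hub_ranking(adj, names)
for name, k in ranking:
    print(f"{name}\t{k:.3f}")
```

In this framing, a 'hub' is simply a feature near the top of the ranking within its module. A real analysis would first assign features to modules and compute connectivity within each module separately.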
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
updated supplemental table 2
Abbreviations
- CDS: coding sequence
- ME: module eigengene
- MM: module membership
- Mtb: Mycobacterium tuberculosis
- MTBC: Mycobacterium tuberculosis complex
- ncRNA: non-coding RNA
- ORF: open reading frame
- RNA-seq: RNA sequencing
- RNAP: RNA polymerase
- sORF: short open reading frame
- sRNA: short non-coding RNA
- TSS: transcription start site
- TTS: transcription termination site
- UTR: untranslated region
- WGCNA: weighted gene co-expression network analysis