Abstract
De novo transcriptome sequencing and analysis provide a way for researchers of non-model organisms to explore differences between conditions and species. These experiments are expensive and produce large-scale data. The results are typically not definitive but lead to new hypotheses for further study. Therefore, it is important that the results be reproducible, extensible, queryable, and easily available to all members of the team. To this end, the Transcriptome Computational Workbench (TCW) is a software package that performs the fundamental computations for transcriptome analysis (singleTCW) and comparative analysis (multiTCW). It is a Java-based desktop application that uses MySQL for the TCW database. The input to singleTCW consists of sequence files and optional count files; the computations are sequence similarity annotation, Gene Ontology assignment, open reading frame (ORF) finding using hit information and 5th-order Markov models, and differential expression (DE) analysis. For DE analysis, TCW interfaces with R scripts; scripts for edgeR and DESeq are provided, and users can supply their own. TCW supports searching UniProt taxonomic databases with the fast DIAMOND program, though users can request BLAST instead and provide other databases to search against. The input to multiTCW is multiple singleTCW databases; the computations are homologous pair assignment, pairwise analysis (e.g., Ka/Ks) from codon-based alignments, clustering (bidirectional best hit, Closure, OrthoMCL, or user-supplied), and cluster analysis and annotation. Both singleTCW and multiTCW provide a graphical interface for extensive querying and display of the data. Example results are presented from three datasets: (i) a rhizome plant with de novo assembled contigs, (ii) a rhizome plant with gene models from a draft genome sequence, and (iii) a non-rhizome plant with gene models from a finished genome sequence.
The two rhizome plants have replicate count data for rhizome, root, stem, and leaf samples. The software is freely available at https://github.com/csoderlund/TCW.
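Bidirectional best hit (BBH) is one of the multiTCW clustering options listed above. The following is a minimal, hypothetical Java sketch of the general BBH idea, not TCW's own code: the class name, method names, and example scores are assumptions chosen for illustration. Two sequences from different sets, A and B, are paired only when each is the other's highest-scoring hit in the two reciprocal similarity searches.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative bidirectional best hit (BBH) pairing between two sequence sets, A and B.
 * Hypothetical names throughout; this is a generic sketch, not TCW's implementation.
 */
public class BidirectionalBestHit {

    /** One similarity-search hit; a higher score is better (e.g., a bit score). */
    public static final class Hit {
        final String query;
        final String subject;
        final double score;

        public Hit(String query, String subject, double score) {
            this.query = query;
            this.subject = subject;
            this.score = score;
        }
    }

    /** Maps each query to the subject of its highest-scoring hit; on a tie, the first hit seen is kept. */
    static Map<String, String> bestHits(List<Hit> hits) {
        Map<String, Hit> best = new HashMap<>();
        for (Hit h : hits) {
            Hit current = best.get(h.query);
            if (current == null || h.score > current.score) {
                best.put(h.query, h);
            }
        }
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, Hit> e : best.entrySet()) {
            result.put(e.getKey(), e.getValue().subject);
        }
        return result;
    }

    /** Returns A-to-B pairs (sorted by A id) in which each member is the other's best hit. */
    public static Map<String, String> bbhPairs(List<Hit> aVsB, List<Hit> bVsA) {
        Map<String, String> bestAB = bestHits(aVsB);
        Map<String, String> bestBA = bestHits(bVsA);
        Map<String, String> pairs = new TreeMap<>();
        for (Map.Entry<String, String> e : bestAB.entrySet()) {
            String a = e.getKey();
            String b = e.getValue();
            // Keep the pair only if the reverse search also picks a as b's best hit.
            if (a.equals(bestBA.get(b))) {
                pairs.put(a, b);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical scores for A searched against B, and B searched against A.
        List<Hit> aVsB = List.of(
            new Hit("a1", "b1", 100), new Hit("a1", "b2", 50),
            new Hit("a2", "b1", 80), new Hit("a2", "b2", 60),
            new Hit("a3", "b2", 65));
        List<Hit> bVsA = List.of(
            new Hit("b1", "a1", 100), new Hit("b1", "a2", 80),
            new Hit("b2", "a1", 50), new Hit("b2", "a2", 60),
            new Hit("b2", "a3", 65));
        // a2's best hit is b1, but b1's best hit is a1, so a2 stays unpaired.
        System.out.println(bbhPairs(aVsB, bVsA)); // prints {a1=b1, a3=b2}
    }
}
```

In practice, the hits would come from DIAMOND or BLAST output and would usually be filtered (for example, by E-value or alignment coverage) before best hits are selected; those filters are omitted here to keep the reciprocal-best logic visible.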