‘Simple Tidy GeneCoEx’: a gene co-expression analysis workflow powered by tidyverse and graph-based clustering in R

Chenxin Li; C. Robin Buell

doi:10.1101/2022.11.11.516131

Abstract

Gene co-expression analysis is an effective method to detect groups (or modules) of genes that display similar expression patterns; genes within a module may function in the same biological processes. Here, we present ‘Simple Tidy GeneCoEx’, a gene co-expression analysis workflow written in the R programming language. The workflow is highly customizable at multiple stages of the pipeline, including gene selection, edge selection, clustering resolution, and data visualization. Powered by the tidyverse package ecosystem and network analysis functions provided by the igraph package, the workflow detects gene co-expression modules whose members are highly interconnected. Step-by-step instructions, two use case examples, and the source code are available at https://github.com/cxli233/SimpleTidy_GeneCoEx.

Core Ideas

An R-based workflow that performs gene co-expression analysis was developed.
The workflow is based on tidyverse packages and graph theory.
The workflow is highly customizable, detects tight gene co-expression modules, and generates publication-quality figures.
Two plant gene expression datasets were used to benchmark the workflow.

Competing Interest Statement

The authors have declared no competing interest.

Abbreviations

ANCOVA: analysis of covariance
ANOVA: analysis of variance
FPKM: fragments per kilobase of exon model per million mapped fragments
LCM: laser capture microdissection
msq: mean sum of squares
PCA: principal component analysis
sd: standard deviation
TPM: transcripts per million
WGCNA: weighted gene co-expression network analysis

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.