DESeq2 and edgeR

DESeq2 (Love et al. 2014) and edgeR (Robinson et al. 2010) are statistical packages in Bioconductor used to assess differential expression in RNA-Seq experiments.

DESeq2 or edgeR statistics for an assembly can be analyzed by opening the assembly in ArrayStar. For information about setting up an assembly suitable for analyzing DESeq2 or edgeR statistics in ArrayStar, see Create a SeqMan NGen Assembly Using DESeq2 or edgeR Statistics.

Both methods require a control group to be specified, and both require replicate samples for each experimental condition and for the control. When multiple experimental conditions are being considered, the same control group is used for each test. The original P-values from the statistical tests are then adjusted for multiple testing using the Benjamini-Hochberg (1995) procedure, which controls the false discovery rate.
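
As an illustration of the P-value adjustment step, the following is a minimal Python sketch of the Benjamini-Hochberg procedure. It is not the code used by DESeq2, edgeR, or ArrayStar; the function name `benjamini_hochberg` and the example P-values are hypothetical and chosen only for demonstration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values, sorted from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four genes (illustrative only).
raw = [0.001, 0.04, 0.03, 0.2]
adj = benjamini_hochberg(raw)
print(adj)
```

Each raw P-value is scaled by the number of tests divided by its rank, and a running minimum from the largest P-value downward keeps the adjusted values in the same order as the raw ones.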

 

Differences between DESeq2 and edgeR are shown in the table below:

| Calculation | DESeq2 | edgeR |
|---|---|---|
| Normalization method | Uses a median of ratios method to normalize read counts, accounting for sequencing depth and RNA composition. Also provides two transformations of the normalized counts: regularized logarithm (rlog) and variance stabilizing transformation (VST). DESeq2 does not attempt to account for transcript length, since it compares counts between samples for the same gene and assumes that the length does not change. This assumption holds except in rare cases where the dominant transcript length changes between samples, for example due to alternative splicing. | Uses the "trimmed mean of M-values" (TMM) method (Robinson & Oshlack, 2010). The TMM-normalized read counts can be viewed in the ArrayStar tables, where counts are represented as log2(counts-per-million-reads). Normalized counts generated by a different method, RLE (relative log expression), are also available within ArrayStar, but these values are not used for the actual statistical tests. RLE is similar to the median of ratios normalization method used by DESeq2. |
| Statistical tests for differential expression | DESeq2 uses raw counts rather than normalized count data, and incorporates the normalization into the model when fitting the counts with a generalized linear model (GLM) of the negative binomial family with a logarithmic link. Statistical tests are then performed to assess differential expression, if any. | Data are normalized to account for sample size differences and variance among samples. The normalized count data are used to estimate per-gene fold changes and to perform statistical tests of whether each gene is likely to be differentially expressed. edgeR uses an exact test under a negative binomial distribution (Robinson & Smyth, 2008). The test is related to Fisher's exact test, though Fisher's test is based on a different distribution (the hypergeometric). |
| Data reporting method | In ArrayStar, the rlog values are used by default in the scatter plot and for clustering. VST values are displayed as Gene Table data columns. | In ArrayStar, the log2(CPM) values calculated using TMM are used by default in the scatter plot. In the Gene Table, values for fold change compared to the control are represented as log(fold change). |
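
The median of ratios normalization listed for DESeq2 can be sketched in a few lines of Python. This is a simplified illustration of the general idea and not DESeq2's actual implementation. The function names `median` and `size_factors`, the rule of skipping genes with any zero count, and the toy count matrix are all assumptions made for this example.

```python
import math


def median(values):
    """Median of a non-empty list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def size_factors(counts):
    """counts: list of genes, each a list of raw counts, one per sample."""
    n_samples = len(counts[0])
    # Assumption: keep only genes with a nonzero count in every sample,
    # so that every logarithm below is defined.
    usable = [g for g in counts if all(c > 0 for c in g)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in g) / n_samples for g in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(g[j]) - lgm
                      for g, lgm in zip(usable, log_geo_means)]
        # The size factor is the median ratio for this sample.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (illustrative only): sample 2 was sequenced twice as deeply.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],
]
sf = size_factors(counts)
# Dividing each count by its sample's size factor gives normalized counts.
normalized = [[c / f for c, f in zip(g, sf)] for g in counts]
print(sf)
```

In this toy data the second sample's size factor comes out at twice the first, so the normalized counts for the first three genes agree across the two samples. Taking the median of the per-gene ratios keeps a few strongly changed genes from dominating the estimate.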