How should I choose a suitable normalization procedure?

Normalization accounts for systematic technical sources of variation so that biologically driven changes in gene expression can be detected more reliably between samples. All of the normalization methods available on ExpressAnalyst are well established in the field.

  • Different normalization methods are presented depending on the data type (microarray vs. RNA-seq).

  • If your gene expression data has already been normalized, please select “None”.

    • If you are not sure whether the data has already been log transformed, you can check by visualizing it (e.g., with a boxplot). For microarray data, log-transformed values are usually less than 16. For RNA-seq data with 1 million reads, even a gene that captured every read would have a value of log2(1,000,000), which is just under 20. Therefore, if all data values are below 20, it is reasonable to assume that the data has already been log transformed.
  • In general, a normalization method works best when paired with the differential expression analysis method it is associated with.

    • For RNA-seq data, Trimmed Mean of M-values (TMM) should be used with edgeR, while Relative Log Expression (RLE) should be used with DESeq2.
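
The log-transformation check described above can also be done numerically instead of by eye. The following is a minimal sketch in Python with NumPy, not part of ExpressAnalyst: the function name `looks_log_transformed`, the default threshold of 20, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np


def looks_log_transformed(values, threshold=20.0):
    """Heuristic only: return True if every finite value is below `threshold`.

    The threshold of 20 follows the rule of thumb above; it is an
    assumption, not a guarantee that the data is on a log scale.
    """
    arr = np.asarray(values, dtype=float)
    finite = arr[np.isfinite(arr)]  # ignore NaN / inf entries
    if finite.size == 0:
        raise ValueError("no finite values to inspect")
    return bool(finite.max() < threshold)


# Hypothetical toy data: rows are samples, columns are genes.
raw_counts = np.array([[0, 15, 1200], [3, 480, 52000]])
log_counts = np.log2(raw_counts + 1)  # log2 with a pseudocount of 1

print(looks_log_transformed(raw_counts))  # False
print(looks_log_transformed(log_counts))  # True
```

A result of `True` only suggests that the values are on a log scale. Raw data from a very shallow or heavily filtered experiment could also fall below the threshold, so a boxplot remains a useful second check.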