How should I choose a suitable normalization procedure for my data?

jess.ewald · June 9, 2022, 9:54pm

Normalizing the data accounts for systematic technical sources of variation so that biologically driven changes in gene expression can be better detected between samples. You should choose a gene expression normalization method unless the data has already been normalized, in which case you should select “None”.

All of the normalization methods available on FastBMD are well-established and have been used in many previous studies. They are based on slightly different assumptions about the underlying distributions, but should produce relatively similar results. If you are concerned about significant differences between normalization methods, you can try out more than one and visualize the results using the provided plots.

Tip: if you are not sure whether the data has already been log transformed, you can check by visualizing the data (e.g. with a boxplot). For microarray data, log-transformed values are usually less than 16. For RNA-seq data with 1 million reads, log2(1,000,000) is about 19.9, which is less than 20. Therefore, if all data values are below 20, it is reasonable to assume that the data has already been log transformed.
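
If you would rather check this numerically than by eye, the rule of thumb in the tip above can be sketched in a few lines of Python. This is only an illustration and not part of FastBMD: the function name `looks_log_transformed`, the default threshold of 20, and the example values are all assumptions made for this sketch. Note that a very low-count raw dataset could also fall below the threshold, so treat the result as a hint rather than proof.

```python
import math

def looks_log_transformed(values, threshold=20.0):
    """Heuristic from the tip: True if every finite value is below `threshold`.

    `looks_log_transformed` and `threshold` are hypothetical names for this
    sketch; they are not FastBMD functions or settings.
    """
    # Ignore missing and non-finite entries before checking the maximum.
    finite = [v for v in values if v is not None and math.isfinite(v)]
    if not finite:
        raise ValueError("no finite values to inspect")
    return max(finite) < threshold

# Made-up example values: raw read counts and their log2(count + 1) transform.
raw_counts = [0, 15, 230, 4812, 120034]
log_counts = [math.log2(c + 1) for c in raw_counts]

print(looks_log_transformed(raw_counts))  # raw counts exceed the threshold
print(looks_log_transformed(log_counts))  # log2 values stay below it
```

In practice you would pass in all values from your expression matrix (flattened into one list) instead of the made-up counts used here.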