What are the differences between sample normalization, transformation, and scaling?

The Data Quality Check page allows users to visualize the distributions of different omics datasets and to choose appropriate algorithms for data normalization.

Sample normalization aims to adjust for systematic differences among samples (e.g. differences in dilution, input volume, library size, or sequencing depth). Normalization by sum is more appropriate for discrete data, such as raw RNA-seq read counts, while normalization by median is often used for continuous data, such as MS-based metabolomics or proteomics data.
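
As a minimal sketch of these two options (assuming a samples-by-features matrix; the toy values below are illustrative only, not from any real dataset):

```python
import numpy as np

# Toy matrix: rows = samples, columns = features
X = np.array([[100., 200., 700.],
              [ 50., 150., 300.],
              [ 20.,  30.,  50.]])

# Normalization by sum: divide each sample by its total signal
# (suited to discrete data such as raw RNA-seq read counts)
X_sum = X / X.sum(axis=1, keepdims=True)

# Normalization by median: divide each sample by its median intensity
# (often used for continuous MS-based data such as metabolomics or proteomics)
X_median = X / np.median(X, axis=1, keepdims=True)
```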

Transformation: the simplest method is log transformation, as most molecular profiles are approximately log-normal. Many advanced, powerful data transformations have been developed for microarray, RNA-seq, or metabolomics data. Please choose an appropriate method based on your omics data type. Note that many of these methods can also adjust for systematic differences.
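
For example, a simple log transformation might look like the following sketch (the pseudo-count is an assumption here, used only to avoid taking the log of zero):

```python
import numpy as np

X = np.array([[100., 200., 700.],
              [ 50., 150., 300.],
              [ 20.,  30.,  50.]])

# Log2 transformation with a small pseudo-count to handle zero values
pseudo_count = 1.0
X_log = np.log2(X + pseudo_count)
```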

Scaling is applied to the feature dimension and refers to standardizing the data so that each feature has roughly the same distribution (e.g. mean zero and unit variance after auto-scaling, i.e. Z-score standardization). This is an important step for multi-omics data, since data measured on different platforms can have vastly different values. Scaling makes the omics types more comparable to each other, making it easier to identify consistent patterns across multiple data sets.
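
A minimal sketch of auto-scaling (Z-score standardization) along the feature dimension, assuming the log-transformed matrix from the previous example:

```python
import numpy as np

X_log = np.log2(np.array([[100., 200., 700.],
                          [ 50., 150., 300.],
                          [ 20.,  30.,  50.]]) + 1.0)

# Auto-scaling (Z-score): center each feature (column) to mean zero
# and scale it to unit standard deviation
X_scaled = (X_log - X_log.mean(axis=0)) / X_log.std(axis=0, ddof=1)
```

After this step, every feature contributes on a comparable scale, which is what makes features from different omics platforms directly comparable in downstream multi-omics analysis.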