Option for class-specific quantile normalization

jeff.xia · November 5, 2022, 9:18pm

Due to its potential interest to a larger audience, this post was adapted from our MetaboAnalystR GitHub post.

Some approaches used in transcriptomics may not be appropriate for metabolomics and require benchmarking studies before we can recommend them confidently. In terms of feature numbers, untargeted metabolomics data (thousands of features) are closer to transcriptomics than targeted metabolomics data (tens to hundreds of features) are. More specific considerations are given below:

Quantile normalization (QN) makes a very strong assumption (that all samples share the same underlying intensity distribution) and has a strong influence on the data distribution. In our experience, applying it to each class separately produces distinct class separations (very clear on PCA). This is of particular concern when the dataset is small. We have seen PCA plots change from no separation to clear separation, with many more significant features identified after this procedure. However, these could be artifacts (i.e., caused by the algorithm rather than by biology).
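To make the contrast concrete, here is a minimal NumPy sketch of global versus class-wise QN. It assumes a features × samples matrix (rows = features, columns = samples) of log-scale intensities; the function names `quantile_normalize` and `classwise_quantile_normalize` and the simulated data are hypothetical and for illustration only, not MetaboAnalystR code. It shows one possible mechanism behind the effect described above: class-wise QN gives each class its own reference distribution, so any difference between the classes' overall distributions (here a purely technical +0.5 offset on every class-B sample, with no feature truly changing) survives normalization and shows up as a shift in every feature, whereas global QN removes it.

```python
import numpy as np

def quantile_normalize(X):
    """Global QN of a features x samples matrix (rows = features, columns = samples).

    Every column is mapped onto one shared reference distribution: the mean of
    the k-th smallest values across columns. Ties are broken by position, not averaged.
    """
    X = np.asarray(X, dtype=float)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # within-column rank of each value
    reference = np.sort(X, axis=0).mean(axis=1)          # shared reference quantiles
    return reference[ranks]

def classwise_quantile_normalize(X, labels):
    """Hypothetical helper: run QN separately inside each class (one reference per class)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(X)
    for cls in np.unique(labels):
        cols = labels == cls
        out[:, cols] = quantile_normalize(X[:, cols])
    return out

# Simulated, hypothetical data: no feature truly differs between classes, but every
# class-B sample carries the same technical offset (+0.5 on a log scale).
rng = np.random.default_rng(0)
labels = np.array(["A"] * 4 + ["B"] * 4)
X = rng.normal(10.0, 2.0, size=(500, 1)) + rng.normal(0.0, 0.3, size=(500, 8))
X[:, labels == "B"] += 0.5

def mean_diff(M):
    """Per-feature mean of class B minus mean of class A."""
    return M[:, labels == "B"].mean(axis=1) - M[:, labels == "A"].mean(axis=1)

d_global = mean_diff(quantile_normalize(X))
d_class = mean_diff(classwise_quantile_normalize(X, labels))
print(f"global QN:     average B-A shift = {d_global.mean():.3f}")
print(f"class-wise QN: average B-A shift = {d_class.mean():.3f}")
```

Under these assumptions the global run gives an average shift of essentially zero (every sample ends up with the same set of values), while the class-wise run keeps roughly the full 0.5 offset across all features, which is the kind of class-level difference that a PCA plot would then pick up.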

A general assumption in differential analysis is that most omics features remain stable (“homeostasis”) and only a small percentage (say, < 20%) change. Under this assumption, it is reasonable to apply QN globally, which is the typical use case for QN.
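A sketch of the global case under the homeostasis assumption, again with simulated, hypothetical data and an illustrative `quantile_normalize` helper (features in rows, samples in columns; not MetaboAnalystR's implementation). Only 5% of features truly change, and every sample carries a random technical offset. Because global QN maps all samples onto one shared reference, the per-sample offsets disappear, the changed features keep most of their +2 shift, and the stable features move only slightly: QN forces equal sample means, so the up-shift of the changed features is balanced by a small down-shift of the rest.

```python
import numpy as np

def quantile_normalize(X):
    """Global QN of a features x samples matrix: every column gets the same distribution."""
    X = np.asarray(X, dtype=float)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # within-column ranks
    reference = np.sort(X, axis=0).mean(axis=1)          # shared reference quantiles
    return reference[ranks]

# Simulated, hypothetical data: 1000 features x 12 samples (6 per class)
rng = np.random.default_rng(1)
n_features, n_per_class = 1000, 6
labels = np.array(["A"] * n_per_class + ["B"] * n_per_class)
baseline = rng.normal(10.0, 2.0, size=(n_features, 1))        # feature-level log intensities
X = baseline + rng.normal(0.0, 0.3, size=(n_features, 2 * n_per_class))
X += rng.normal(0.0, 0.5, size=(1, 2 * n_per_class))          # technical per-sample offset

# "Homeostasis": only 5% of features truly change (+2 log units in class B)
changed = np.zeros(n_features, dtype=bool)
changed[rng.choice(n_features, size=50, replace=False)] = True
X[np.ix_(changed, labels == "B")] += 2.0

Xn = quantile_normalize(X)
diff = Xn[:, labels == "B"].mean(axis=1) - Xn[:, labels == "A"].mean(axis=1)
print(f"mean B-A, changed features: {diff[changed].mean():.2f}")
print(f"mean B-A, stable features:  {diff[~changed].mean():.2f}")
```

The small negative shift of the stable features grows with the fraction of features that change, which is one reason the assumption that only a small percentage of features change matters for global QN.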

In summary, without dedicated benchmarking, we only recommend QN for untargeted metabolomics, applied to the whole dataset rather than at the class level.