How to deal with multiple factors and interactions in metabolomics data analysis?

In general, conventional multi-factor statistical analysis is difficult to apply to omics data for two reasons: 1) such analyses require large sample sizes, which are usually not available in omics studies; 2) if such tests are performed across all omics features, there is no well-accepted method for multiple testing adjustment, and the risk of false positives is high.
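
To give a feel for the second point, here is a small arithmetic sketch. The feature count and the three model terms (two main effects plus their interaction) are made-up numbers for illustration, and the second function assumes independent tests, which real omics features rarely are.

```python
def expected_false_positives(n_features, n_terms, alpha=0.05):
    """Expected number of false positives when no feature truly differs
    and every test uses an uncorrected threshold of alpha."""
    return n_features * n_terms * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_features = 1000  # hypothetical number of metabolite features
n_terms = 3        # hypothetical: factor A, factor B, and the A x B interaction

print("tests performed:", n_features * n_terms)
print("expected false positives:", expected_false_positives(n_features, n_terms))
print("P(at least one false positive) for a single feature:",
      round(prob_at_least_one(n_terms), 3))
```

Every extra factor or interaction term multiplies the number of tests, which is why the suggestions below focus on keeping the design simple.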

Here are some general suggestions for omics analysis with multiple factors:

  1. Try to reduce the number of factors in the analysis. For instance, use PCA or heatmaps to visually explore the data with regard to the metadata or factors. When there are no clear patterns of separation, it is generally advisable to analyze the data with regard to each experimental factor separately, for simpler interpretation.

  2. Try to stratify the data by important factors. If strong patterns are detected for some factor, an intuitive approach is to split the data on that factor and analyze each subset separately. For instance, if gender is important, perform the analysis on M and F separately.

  3. In MetaboAnalyst, the Statistical Analysis [metadata table] module offers several carefully selected approaches for omics-scale analysis that take multiple factors and interactions into account, including:

  • Two-way ANOVA
  • ANOVA-simultaneous component analysis (ASCA)
  • Linear modelling (limma, supporting fixed and random effects models)
  • Random forest (naturally handles multiple factors, including interactions)
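
The stratification idea in suggestion 2 can be sketched outside MetaboAnalyst in a few lines of plain Python. This is only an illustration under stated assumptions, not how MetaboAnalyst implements anything: the sample layout, the feature names (met_A, met_B), the toy values, and the function names are all hypothetical, a simple permutation test stands in for whatever per-feature test you would normally use, and Benjamini-Hochberg adjustment is applied within each stratum only.

```python
import random
from collections import defaultdict


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def stratified_analysis(samples, stratum_key, group_key, features):
    """Split samples by one factor, then test each feature within each stratum."""
    strata = defaultdict(list)
    for sample in samples:
        strata[sample[stratum_key]].append(sample)
    results = {}
    for level, members in strata.items():
        groups = sorted({s[group_key] for s in members})
        if len(groups) != 2:
            raise ValueError("this sketch compares exactly two groups per stratum")
        pvals = []
        for feature in features:
            a = [s[feature] for s in members if s[group_key] == groups[0]]
            b = [s[feature] for s in members if s[group_key] == groups[1]]
            pvals.append(permutation_pvalue(a, b))
        results[level] = dict(zip(features, benjamini_hochberg(pvals)))
    return results


def make_samples(sex, group, met_a, met_b):
    """Build toy sample records (hypothetical layout: one dict per sample)."""
    return [{"sex": sex, "group": group, "met_A": a, "met_B": b}
            for a, b in zip(met_a, met_b)]


# Toy data: met_A responds to treatment in F only; met_B never responds.
low = [1.0, 1.1, 0.9, 1.2, 1.0]
high = [3.0, 3.1, 2.9, 3.2, 3.0]
flat = [5.0, 5.2, 4.8, 5.1, 4.9]
samples = (make_samples("F", "control", low, flat)
           + make_samples("F", "treated", high, flat[::-1])
           + make_samples("M", "control", low, flat)
           + make_samples("M", "treated", low[::-1], flat[::-1]))

results = stratified_analysis(samples, "sex", "group", ["met_A", "met_B"])
for level, adjusted in results.items():
    print(level, adjusted)
```

In this toy example the treatment effect on met_A exists only in the F stratum, which is the kind of pattern that a pooled analysis could dilute. Splitting the data also halves the sample size per test, so stratification is most useful when the stratifying factor clearly dominates the data.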