Adapted from a MetaboAnalystR GitHub post
In general, we discourage the use of traditional multi-factor statistical methods (such as 3-way ANOVA) for omics data analysis, because they are poorly suited to it. Some practical considerations from our perspective:
- Computationally expensive - when applied to 1000s of features …
- Statistically not well defined: there is no standard way to adjust for multiple testing across interaction terms and post-hoc analyses
- Interface design is very complex for supporting 3-way ANOVA (this is better done offline by an experienced statistician)
We recommend linear modelling (limma) for flexible, exploratory multi-factor analysis. It is available in the Statistical Analysis [metadata] module.
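To give a rough idea of what "one linear model per feature, with several factors in the design" means, here is a minimal Python sketch. It is not limma: it uses plain ordinary least squares with no empirical Bayes variance moderation, and the design matrix, the factor names (`treatment`, `gender`) and the intensity values are all made up for illustration.

```python
from typing import Dict, List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_feature(design: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical design, one row per sample: intercept, treatment (0/1), gender (0/1).
design = [
    [1, 0, 0], [1, 0, 0], [1, 0, 1], [1, 0, 1],
    [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1],
]

# Made-up, noise-free intensities so the fitted coefficients are easy to check.
features: Dict[str, List[float]] = {
    "feature_A": [5, 5, 6, 6, 7, 7, 8, 8],      # 5 + 2*treatment + 1*gender
    "feature_B": [10, 10, 7, 7, 10, 10, 7, 7],  # 10 + 0*treatment - 3*gender
}

# The same design is reused for every feature; only the response changes.
coefficients = {name: fit_feature(design, y) for name, y in features.items()}
```

With these toy values the fit recovers the coefficients used to build each feature. Real data would also need standard errors and test statistics, which is where limma's moderated statistics come in.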
For very complex study designs, try to reduce the number of factors included at the same time:
- Colour the PCA plot by each factor in turn to see whether any patterns are associated with that factor
- Stratify by the main study factors (for instance, run a separate analysis within each level of the gender factor)
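The stratification idea in the last bullet can be sketched as follows. This is only an illustration in Python: the function name, the labels (`case`, `control`, `F`, `M`) and the values are hypothetical, and a real analysis would apply a proper test within each stratum rather than a bare difference in means.

```python
from collections import defaultdict
from statistics import mean


def stratified_mean_difference(values, group, stratum):
    """Difference in means (case minus control) for one feature, per stratum."""
    by_cell = defaultdict(list)
    for value, g, s in zip(values, group, stratum):
        by_cell[(s, g)].append(value)
    return {
        s: mean(by_cell[(s, "case")]) - mean(by_cell[(s, "control")])
        for s in sorted(set(stratum))
    }


# Made-up intensities for a single feature across eight samples.
values = [4.0, 6.0, 9.0, 11.0, 5.0, 7.0, 6.0, 6.0]
group = ["control", "control", "case", "case",
         "control", "control", "case", "case"]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

effects = stratified_mean_difference(values, group, gender)
```

In this toy data the case/control difference appears in only one stratum, which a pooled analysis would dilute and a full interaction model would have to estimate explicitly.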
More advanced statistics may be applied at a later stage of the analysis (i.e. to the features already found to be significant), taking care not to inflate the number of false positives.
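One common way to keep false positives in check when many features are tested is the Benjamini-Hochberg adjustment. The post does not name a specific method, so treat this as an assumed example; the p-values are made up. Note that it shows only the adjustment step and does not by itself correct for having preselected features on the same data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

Features would then be prioritized by their adjusted values against a chosen false discovery rate threshold, rather than by the raw p-values.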
Finally, statistics (and p-values in particular) mainly serve to help you prioritize targets for further validation. The main focus should be robust patterns and their biological interpretation in the context of the study. In our experience, complex statistical modeling can obscure meaningful biological patterns in omics data.