How does the mummichog algorithm work?

Qiang · June 12, 2022, 9:18pm

The mummichog algorithm (Li et al.) was developed to perform pathway activity analysis directly from LC-MS untargeted metabolomics peak list data.

The algorithm is based on the assumption that even though the mapping of LC-MS peaks to compounds is often inaccurate, it is still possible to identify meaningful functional changes by performing enrichment analysis on those “fuzzy” annotations, as long as the annotation errors are random. Note that improving the annotation (e.g., with high-resolution MS or a better algorithm such as NetID) will directly improve the pathway analysis results.

Briefly, users provide a list of m/z features and their associated p-values (e.g., obtained from t-tests or ANOVA), together with a significance threshold (e.g., a p-value cutoff of 0.05).

Three lists are then derived from this initial list:

Lsig, the list containing only the significant m/z features (determined by the user-selected p-value cutoff);
Lref, the list of all m/z features;
Lperm, a list of m/z features randomly drawn from Lref, with the same length as Lsig.
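
As a rough illustration of how these three lists relate, here is a minimal Python sketch. The feature values, the `build_lists` helper, and the strict `p < cutoff` comparison are all assumptions made for the example; they are not taken from the mummichog or MetaboAnalyst source code.

```python
import random

# Hypothetical input: (m/z, p-value) pairs, e.g. from per-feature t-tests.
features = [
    (104.1070, 0.003),
    (132.1019, 0.210),
    (147.0764, 0.012),
    (166.0863, 0.640),
    (175.1190, 0.048),
    (205.0972, 0.330),
]
P_CUTOFF = 0.05  # user-selected significance threshold


def build_lists(features, cutoff, rng):
    """Return (l_sig, l_ref, l_perm) as lists of m/z values."""
    # Lref: every m/z feature in the input.
    l_ref = [mz for mz, _ in features]
    # Lsig: only features below the cutoff (strict "<" is an assumption).
    l_sig = [mz for mz, p in features if p < cutoff]
    # Lperm: sampled without replacement from Lref, same length as Lsig.
    l_perm = rng.sample(l_ref, len(l_sig))
    return l_sig, l_ref, l_perm


l_sig, l_ref, l_perm = build_lists(features, P_CUTOFF, random.Random(0))
print(len(l_sig), len(l_ref), len(l_perm))
```

In the actual procedure Lperm is redrawn on every permutation, so the sampling line would sit inside the permutation loop rather than being run once.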

The next steps are as follows:

1. m/z features are randomly drawn from Lref to create Lperm. These features are then mapped to potential metabolites, considering different adducts, protons, etc.
2. The list of potential compounds is then mapped to the user’s selected library of pathways, and a p-value is calculated per pathway.
3. Steps 1 and 2 are repeated many times to compute the null distribution of p-values (modeled as a gamma distribution).
4. Lsig is mapped to potential metabolites for pathway enrichment analysis, and the resulting per-pathway p-values (Fisher’s or hypergeometric, and EASE scores) for the Lsig compounds are then adjusted against the null distribution.
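
The steps above can be sketched in a few dozen lines of Python. This is a toy illustration under several assumptions, not the mummichog implementation: the compound masses, pathway memberships, two-entry adduct table, 10 ppm tolerance, and all function names are made up for the example. It also uses only the plain hypergeometric tail (no EASE score), and it replaces the gamma-distribution fit with a simpler empirical comparison against the pooled permutation p-values.

```python
import random
from math import comb

PROTON = 1.007276  # approximate proton mass in Da
# Toy positive-mode adducts as (name, mass shift); real tools use longer tables.
ADDUCTS = [("M+H", PROTON), ("M+Na", 22.989218)]

# Hypothetical compound masses and pathway memberships, for illustration only.
COMPOUNDS = {"C1": 100.0, "C2": 120.0, "C3": 150.0,
             "C4": 180.0, "C5": 200.0, "C6": 250.0}
PATHWAYS = {"pathA": {"C1", "C2", "C3"}, "pathB": {"C4", "C5", "C6"}}


def map_to_compounds(mz_list, ppm=10.0):
    """Return every compound with an adduct m/z within ppm of some feature."""
    hits = set()
    for mz in mz_list:
        for cid, mass in COMPOUNDS.items():
            for _, shift in ADDUCTS:
                target = mass + shift
                if abs(mz - target) / target * 1e6 <= ppm:
                    hits.add(cid)
    return hits


def hypergeom_p(k, n, K, N):
    """P(X >= k): n compounds drawn from N, of which K lie in the pathway."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def pathway_pvalues(mz_list, ref_hits):
    """Step 2: map features to compounds, then score each pathway."""
    hits = map_to_compounds(mz_list)
    pvals = {}
    for name, members in PATHWAYS.items():
        K = len(members & ref_hits)  # pathway compounds detectable in Lref
        k = len(members & hits)      # pathway compounds hit by this list
        pvals[name] = hypergeom_p(k, len(hits), K, len(ref_hits))
    return pvals


def adjusted_pvalues(l_sig, l_ref, n_perm=200, seed=0):
    """Steps 1-4, with an empirical null instead of a fitted gamma model."""
    rng = random.Random(seed)
    ref_hits = map_to_compounds(l_ref)
    null = []
    for _ in range(n_perm):                        # step 3: repeat steps 1-2
        l_perm = rng.sample(l_ref, len(l_sig))     # step 1: draw Lperm
        null.extend(pathway_pvalues(l_perm, ref_hits).values())
    observed = pathway_pvalues(l_sig, ref_hits)    # step 4: score Lsig
    # Share of null p-values at least as small as the observed one.
    return {name: (1 + sum(q <= p for q in null)) / (1 + len(null))
            for name, p in observed.items()}


# Toy data: one [M+H]+ feature per compound plus one unmatched feature.
l_ref = [mass + PROTON for mass in COMPOUNDS.values()] + [300.5]
l_sig = l_ref[:3]  # pretend the first three features are significant
print(adjusted_pvalues(l_sig, l_ref))
```

Pooling the null p-values across pathways is a simplification chosen to keep the sketch short. To follow the description above more closely, one would fit a gamma distribution to the permutation p-values and read the adjusted value from that fitted curve.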