In mummichog version 2.0, Empirical Compounds are intermediaries between m/z features and compounds. An Empirical Compound is a computational unit for a tentative metabolite, needed because the experimental measurement may not separate compounds of identical mass (isomers). Empirical Compounds are formed in the following steps:
- As in version 1, all m/z features are matched to potential compounds, considering different adducts. Then, for each compound, the matching m/z features are split into Empirical Compounds according to whether they fall within an expected retention time window. The retention time window (in seconds) is calculated as the maximum retention time * 0.02. This results in the initial Empirical Compounds list.
- Empirical Compounds are merged if they have the same m/z, matched form/ion, and retention time. This results in the merged Empirical Compounds list.
- Finally, if primary ions are enforced, only Empirical Compounds containing at least one primary ion are kept. The primary ions considered are ‘M+H[1+]’, ‘M+Na[1+]’, ‘M-H2O+H[1+]’, ‘M-H[-]’, ‘M-2H[2-]’, ‘M-H2O-H[-]’, ‘M+H [1+]’, ‘M+Na [1+]’, ‘M-H2O+H [1+]’, ‘M-H [1-]’, ‘M-2H [2-]’, and ‘M-H2O-H [1-]’. This results in the final Empirical Compounds list.
- Next, pathway libraries are converted from “Compound” space to “Empirical Compound” space by replacing each compound in a pathway with all of the Empirical Compounds it matches. The mummichog/GSEA algorithms then work as before to calculate pathway enrichment.
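
The steps above can be illustrated with a minimal Python sketch. It is not the mummichog or MetaboAnalyst implementation: the function names, the dictionary layout of a feature match (`mz`, `rt`, `ion`), the rule for joining a feature to a retention time group, and the merge key are all assumptions made for illustration. Only the window formula (maximum retention time * 0.02) and the primary ion labels come from the text.

```python
from collections import defaultdict

# Primary ion labels exactly as listed in the text.
PRIMARY_IONS = {
    "M+H[1+]", "M+Na[1+]", "M-H2O+H[1+]", "M-H[-]", "M-2H[2-]", "M-H2O-H[-]",
    "M+H [1+]", "M+Na [1+]", "M-H2O+H [1+]", "M-H [1-]", "M-2H [2-]", "M-H2O-H [1-]",
}

def rt_window(retention_times, fraction=0.02):
    """Retention time window in seconds: maximum retention time * 0.02."""
    return max(retention_times) * fraction

def split_by_rt(matches, window):
    """Split one compound's matched features into retention time groups.

    Assumption: a feature joins the current group if its retention time is
    within `window` of the group's first feature. The text does not state
    the exact grouping rule.
    """
    groups = []
    for m in sorted(matches, key=lambda m: m["rt"]):
        if groups and m["rt"] - groups[-1][0]["rt"] <= window:
            groups[-1].append(m)
        else:
            groups.append([m])
    return groups

def build_empirical_compounds(compound_matches, window, enforce_primary=True):
    """compound_matches: {compound_id: [{"mz": ..., "rt": ..., "ion": ...}, ...]}"""
    merged = {}
    for cpd, matches in compound_matches.items():
        for group in split_by_rt(matches, window):
            # Assumed merge key: groups with identical (m/z, ion, rt) triples
            # are treated as the same Empirical Compound.
            key = frozenset((m["mz"], m["ion"], m["rt"]) for m in group)
            merged.setdefault(key, {"features": group, "compounds": set()})
            merged[key]["compounds"].add(cpd)
    ecs = list(merged.values())
    if enforce_primary:
        # Keep only Empirical Compounds with at least one primary ion.
        ecs = [ec for ec in ecs
               if any(m["ion"] in PRIMARY_IONS for m in ec["features"])]
    return ecs

def pathways_to_ec_space(pathways, ecs):
    """Replace each compound in a pathway with the indices of all
    Empirical Compounds it matches."""
    cpd_to_ecs = defaultdict(set)
    for i, ec in enumerate(ecs):
        for cpd in ec["compounds"]:
            cpd_to_ecs[cpd].add(i)
    return {name: set().union(*(cpd_to_ecs.get(c, set()) for c in cpds))
            for name, cpds in pathways.items()}

# Hypothetical toy data: C1 and C2 are isomers matching the same features.
f1 = {"mz": 181.07, "rt": 100.0, "ion": "M+H[1+]"}
f2 = {"mz": 203.05, "rt": 105.0, "ion": "M+Na[1+]"}
f3 = {"mz": 181.07, "rt": 400.0, "ion": "M+H[1+]"}
f4 = {"mz": 250.00, "rt": 1000.0, "ion": "M+K[1+]"}  # not a primary ion
compound_matches = {"C1": [f1, f2, f3], "C2": [f1, f2, f3], "C3": [f4]}

window = rt_window([f["rt"] for f in (f1, f2, f3, f4)])
ecs = build_empirical_compounds(compound_matches, window)
pathway_ecs = pathways_to_ec_space({"pathway_A": ["C1", "C3"]}, ecs)
```

In this toy example the two isomers C1 and C2 collapse into the same two Empirical Compounds (one per retention time group), while the group matched only through a non-primary ion is dropped when primary ions are enforced, so C3 contributes nothing to `pathway_A` in Empirical Compound space.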