What are empirical compounds and how are they calculated (mummichog)?

In mummichog version 2.0, empirical compounds are intermediaries between m/z features and compounds. An empirical compound is a computational unit that represents a tentative metabolite. This unit is needed because the experimental measurement may not separate compounds of identical mass (isomers). Empirical compounds are formed as follows:

  1. As in version 1, all m/z features are matched to potential compounds, considering different adducts. Then, for each compound, its matching m/z features are split into Empirical Compounds according to whether they fall within an expected retention time window. The window (in seconds) is calculated as the maximum retention time × 0.02, i.e. 2% of the maximum retention time. This results in the initial Empirical Compounds list.

  2. Empirical Compounds are merged if they have the same m/z, matched form/ion, and retention time. This results in the merged Empirical Compounds list.

  3. If primary ions are enforced, only Empirical Compounds containing at least one primary ion are kept. The primary ions considered are ‘M+H[1+]’, ‘M+Na[1+]’, ‘M-H2O+H[1+]’, ‘M-H[-]’, ‘M-2H[2-]’, ‘M-H2O-H[-]’, ‘M+H [1+]’, ‘M+Na [1+]’, ‘M-H2O+H [1+]’, ‘M-H [1-]’, ‘M-2H [2-]’, and ‘M-H2O-H [1-]’. This results in the final Empirical Compounds list.

  4. Next, pathway libraries are converted from “Compound” space to “Empirical Compound” space. Each compound in a pathway is replaced by every Empirical Compound it matches. The mummichog and GSEA algorithms then calculate pathway enrichment as in version 1.
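The four steps above can be illustrated with a short, self-contained Python sketch. This is a minimal illustration under stated assumptions and not mummichog's own code. Only two details come from the text above: the 0.02 retention time factor and the primary-ion list. The rest is hypothetical: the positive-mode adduct table, the 10 ppm tolerance, the greedy retention time grouping rule, the toy data and all function names (`match_features`, `split_by_rt`, `build_empirical_compounds`, `pathways_to_empcpd_space`).

```python
from collections import defaultdict

# Hypothetical positive-mode adduct table: ion name -> (mass shift in Da, charge).
# m/z is computed as (neutral mass + shift) / charge.
ADDUCTS = {
    "M+H[1+]": (1.007276, 1),
    "M+Na[1+]": (22.989218, 1),
    "M-H2O+H[1+]": (-17.003289, 1),
    "M+K[1+]": (38.963158, 1),  # matched, but not a primary ion
}

# Primary ions as listed in step 3.
PRIMARY_IONS = {
    "M+H[1+]", "M+Na[1+]", "M-H2O+H[1+]", "M-H[-]", "M-2H[2-]", "M-H2O-H[-]",
    "M+H [1+]", "M+Na [1+]", "M-H2O+H [1+]", "M-H [1-]", "M-2H [2-]", "M-H2O-H [1-]",
}

PPM = 10.0  # hypothetical m/z matching tolerance


def match_features(features, compounds):
    """Step 1a: for each compound, collect (feature_id, mz, rt, ion) matches."""
    hits = defaultdict(list)
    for cpd, mass in compounds.items():
        for ion, (shift, charge) in ADDUCTS.items():
            expected = (mass + shift) / charge
            for fid, mz, rt in features:
                if abs(mz - expected) / expected * 1e6 <= PPM:
                    hits[cpd].append((fid, mz, rt, ion))
    return hits


def split_by_rt(matches, window):
    """Step 1b (assumed rule): greedily group matches, sorted by RT, so that each
    group spans at most `window` seconds from its earliest member."""
    groups = []
    for m in sorted(matches, key=lambda x: x[2]):
        if groups and m[2] - groups[-1][0][2] <= window:
            groups[-1].append(m)
        else:
            groups.append([m])
    return groups


def build_empirical_compounds(features, compounds, enforce_primary=True):
    window = 0.02 * max(rt for _, _, rt in features)  # RT window in seconds
    # Steps 1-2: key each RT group by its (m/z, ion, RT) content, so identical
    # groups found for different compounds (e.g. isomers) merge into one entry.
    merged = defaultdict(set)
    for cpd, matches in match_features(features, compounds).items():
        for group in split_by_rt(matches, window):
            key = frozenset((mz, ion, rt) for _, mz, rt, ion in group)
            merged[key].add(cpd)
    # Step 3: optionally keep only groups with at least one primary ion.
    empcpds = {}
    for key, cpds in merged.items():
        if enforce_primary and not any(ion in PRIMARY_IONS for _, ion, _ in key):
            continue
        empcpds[f"E{len(empcpds) + 1}"] = {"ions": key, "compounds": cpds}
    return empcpds


def pathways_to_empcpd_space(pathways, empcpds):
    """Step 4: replace each pathway's compounds with every empirical compound
    that contains at least one of them."""
    return {
        name: {eid for eid, e in empcpds.items() if e["compounds"] & set(cpds)}
        for name, cpds in pathways.items()
    }


# Toy data: glucose and fructose are isomers (same monoisotopic mass, C6H12O6).
features = [
    ("F1", 181.070664, 120.0),  # M+H of C6H12O6
    ("F2", 203.052606, 121.0),  # M+Na of C6H12O6, within the RT window of F1
    ("F3", 181.070664, 400.0),  # M+H again, at a distant RT
    ("F4", 238.963158, 300.0),  # M+K of the hypothetical cpd_x only
]
compounds = {"glucose": 180.063388, "fructose": 180.063388, "cpd_x": 200.0}

empcpds = build_empirical_compounds(features, compounds)
for eid, e in empcpds.items():
    print(eid, sorted(e["compounds"]), sorted(e["ions"]))
print(pathways_to_empcpd_space({"glycolysis": ["glucose"], "pathway_x": ["cpd_x"]}, empcpds))
```

In this toy run the maximum retention time is 400 s, so the window is 8 s. The C6H12O6 features therefore form two retention time groups, E1 and E2. Both groups are shared by glucose and fructose, as in the isomer case described above. `cpd_x` is dropped because its only match is a non-primary ion. Passing `enforce_primary=False` keeps it as a third empirical compound.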