Metabolites unrecognized by MetaboAnalyst 5.0 were excluded from the subsequent customized KEGG pathway analysis

GuoliangCui · June 26, 2022, 12:59pm

Hello,

I used MetaboAnalyst 5.0 to analyze data generated using the Biocrates MxP kit. We have 8 wildtype samples and 8 knockout samples.
I uploaded the .csv file containing ~630 metabolites quantified using the Biocrates MxP kit (for ease of reference in this post, this .csv file is called “file_1”). More than 100 of these metabolites were not recognized by the MetaboAnalyst 5.0 database. As a result, they were excluded by MetaboAnalyst 5.0 from the subsequent KEGG pathway analysis.
To reintroduce these unrecognized metabolites to MetaboAnalyst 5.0, I manually added them to the KEGG .csv file following the instructions of MetaboAnalyst 5.0 (i.e., add metabolite1; metabolite2; … to the 2nd column of this KEGG .csv file).
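For readers who want to reproduce this step, here is a minimal Python sketch of building such a file. It assumes the layout described above (members joined by semicolons in the 2nd column) and further assumes that the 1st column holds the set name. The set name and the "Metabolite X/Y" entries are made up for illustration, so please check the official template before uploading.

```python
import csv
import io

# Hypothetical custom set; only "Hydroxyglutaric acid" comes from this post.
custom_sets = {
    "Custom pathway A": ["Hydroxyglutaric acid", "Metabolite X", "Metabolite Y"],
}

def write_metabolite_sets(sets, handle):
    """Write one row per set: set name, then members joined by '; '."""
    writer = csv.writer(handle)
    for name, members in sets.items():
        # Assumed layout: column 1 = set name, column 2 = 'm1; m2; ...'
        writer.writerow([name, "; ".join(members)])

# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_metabolite_sets(custom_sets, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```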

Later, I found that if a metabolite in file_1 was unrecognized by the MetaboAnalyst 5.0 database, even if I manually added this metabolite to the KEGG dataset, this manual editing did not change the result of pathway enrichment.

For example, the metabolite “Hydroxyglutaric acid” in my file_1 was not recognized by the MetaboAnalyst 5.0 database. I added “Hydroxyglutaric acid” to the KEGG dataset (either to an already existing pathway, or a newly created pathway in the KEGG dataset.csv file). Adding it to an already existing pathway did not change the enrichment ratio or p value of this existing pathway. We also created a new pathway containing “Hydroxyglutaric acid” and several other metabolites unmapped to MetaboAnalyst 5.0. All the metabolites defined in this new pathway exist in our file_1, so we expected the enrichment ratio of this newly created pathway to be 100%, but it was 0%.

Besides “Hydroxyglutaric acid”, I have tried multiple metabolites that were neglected by the MetaboAnalyst 5.0 database and results were the same.

So my question is: is there a way to keep the unmapped metabolites in the final KEGG pathway enrichment analysis instead of excluding them?

Thanks!
Guoliang Cui

Qiang · July 4, 2022, 9:20pm

Hi Guoliang,

Thanks for describing your question. Let’s be brief. You are using the Enrichment Analysis module, right? Can you convert your compound names into KEGG IDs or HMDB IDs for better recognition? This is the recommended approach to increase coverage.
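As a rough illustration of that conversion step, the sketch below maps names to IDs through a lookup table and lists whatever stays unmapped. The table and its IDs are placeholders, not real KEGG or HMDB identifiers; in practice the table would come from your kit's annotation sheet or a name-mapping service.

```python
# Hypothetical lookup table; the IDs are placeholders, NOT real KEGG/HMDB IDs.
name_to_id = {
    "hydroxyglutaric acid": "ID_PLACEHOLDER_1",
    "metabolite x": "ID_PLACEHOLDER_2",
}

def map_names(names, lookup):
    """Return (mapped, unmapped) after trimming whitespace and ignoring case."""
    mapped, unmapped = {}, []
    for name in names:
        clean = name.strip()
        key = clean.lower()
        if key in lookup:
            mapped[clean] = lookup[key]
        else:
            unmapped.append(clean)
    return mapped, unmapped

mapped, unmapped = map_names(["Hydroxyglutaric acid ", "Metabolite Z"], name_to_id)
print("mapped:", mapped)
print("unmapped:", unmapped)
```

Checking the unmapped list before uploading shows how many compounds would be dropped by name matching alone.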

Based on your description, you have tried to use a self-defined database? Have you defined the database correctly according to the template format? Can you share the database and your data for further checking?

Cheers,
Zhiqiang Pang

Steffen · December 21, 2022, 1:40pm

Hi Zhiqiang and Guoliang,
we struggle with the same issue (we lose a lot of metabolites for pathway analyses because they are recognized by neither their name nor their HMDB ID). On the MetaboAnalyst website we found that the last update based on HMDB releases took place almost five years ago (v4.0). Is there any way of mapping the metabolites to the most recent HMDB release to achieve higher coverage of metabolites for pathway analyses?
THANKS so much,
Steffen

xia.lab · December 22, 2022, 4:44pm

Please note that the main factor here is not compound annotation coverage, it is pathway annotation coverage. MetaboAnalyst primarily annotates compounds covered by our pathways or metabolite sets (plus common metabolites in HMDB). During downstream analysis, the background “universe” is defined by those compounds in pathways or metabolite sets. Adding new compounds (recognized or not) will not affect the result.

In other words, even if compounds are in KEGG or HMDB, they may not be annotated in MetaboAnalyst (if they are not assigned to any pathways or metabolite sets). Those compounds will NOT affect the results (i.e. pathway analysis and their p-values).
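A toy over-representation test in Python illustrates the point. This is only a sketch under simplifying assumptions (a tiny made-up library, a plain upper-tail hypergeometric test), not MetaboAnalyst's actual code. The universe is the union of all compounds in the library, and any input compound outside it is dropped before the test.

```python
from math import comb

def ora_p_value(universe, pathway, hits):
    """Upper-tail hypergeometric p-value, restricted to the universe."""
    universe = set(universe)
    pathway = set(pathway) & universe
    hits = set(hits) & universe  # compounds outside the universe are dropped here
    n_universe, n_pathway, n_hits = len(universe), len(pathway), len(hits)
    overlap = len(pathway & hits)
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_pathway, i) * comb(n_universe - n_pathway, n_hits - i)
        for i in range(overlap, min(n_pathway, n_hits) + 1)
    )
    return tail / total

# Made-up pathway library; the universe is the union of its members.
library = {"P1": {"A", "B", "C"}, "P2": {"D", "E", "F", "G"}}
universe = set().union(*library.values())

hits = {"A", "B", "D"}
hits_plus_unmapped = hits | {"Hydroxyglutaric acid", "Unmapped Y"}

p_without = ora_p_value(universe, library["P1"], hits)
p_with = ora_p_value(universe, library["P1"], hits_plus_unmapped)
print(p_without, p_with)  # the two values are identical
```

Because the extra compounds never enter the universe, the hit count, the background size, and therefore the p-value are unchanged.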

If you are looking for general name mapping, please use RefMet.

Connor_Jankowski · December 27, 2022, 8:29pm

Hi Jeff,

I’m not quite sure I understand your answer in the context of the question, but perhaps I am misunderstanding the issue. It sounds like they have compounds which exist in the current HMDB that are not being recognized and are thus being excluded from the pathway analysis, e.g. it’s not recognizing hydroxyglutarate as an input and thus it’s not being considered during pathway analysis.

xia.lab · December 29, 2022, 8:10pm

Pathway/Enrichment analysis is NOT dependent on whether the compounds exist in the current HMDB. It depends on whether the compounds are included in our pathway libraries. The compound annotation in MetaboAnalyst is for pathway analysis, not for general-purpose HMDB compound annotation. All compounds defined in our pathway libraries will be annotated. If compounds are not recognized (even if they have valid HMDB IDs), there will be no effect on the results. This information is stated in the compound mapping result table (see below).

(Updated in 2025) We intend to add a permutation-based approach that takes into account all compounds detected. The results will be less biased but will also have less power (i.e. p-values will be less significant in general) - due to the signal dilution as well as the non-parametric nature of this approach.
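The details of that approach are not given here. Purely as an illustration of the general idea, and not as the planned implementation, a generic label-permutation test over all detected compounds could look like the Python sketch below. The function name, the parameters, and the example data are all hypothetical.

```python
import random

def permutation_p_value(detected, pathway, significant, n_perm=2000, seed=0):
    """Empirical p-value: how often a random relabelling of the detected
    compounds gives at least as many pathway hits as observed."""
    detected = list(dict.fromkeys(detected))  # de-duplicate, keep order
    pathway = set(pathway)
    sig = set(significant) & set(detected)
    observed = len(pathway & sig)
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        # Drawing len(sig) compounds at random is equivalent to shuffling
        # the "significant" labels across all detected compounds.
        sample = rng.sample(detected, len(sig))
        if len(pathway.intersection(sample)) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_perm + 1)

# Made-up data: 20 detected compounds, including ones in no pathway.
detected = [f"C{i}" for i in range(20)]
pathway = {"C0", "C1", "C2"}
p_enriched = permutation_p_value(detected, pathway, {"C0", "C1", "C5"})
p_no_hits = permutation_p_value(detected, pathway, {"C5", "C6", "C7"})
print(p_enriched, p_no_hits)
```

Here the background is every detected compound, whether or not it belongs to a pathway, which is the main difference from the library-defined universe described above.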

Steffen · January 11, 2023, 7:08pm

Dear Jeff and Connor,

thanks so much for your responses and for trying to help solve our problem. Please excuse my delayed response. Our original assumption was that the pathway annotations are linked to the HMDB annotation, which, following your response, Jeff, is not the case. Our “problem” of losing one third of the measured metabolites for the pathway analyses persists, but my understanding is that these are then hopefully not the most functionally important metabolites.

Thanks again and best wishes for the new year,
Steffen