I used MetaboAnalyst 5.0 to analyze data generated using the Biocrates MxP kit. We have 8 wildtype samples and 8 knockout samples.
I uploaded the .csv file containing ~630 metabolites quantified using the Biocrates MxP kit (for ease of reference in this post, this .csv file is called “file_1”). More than 100 of these metabolites were not recognized by the MetaboAnalyst 5.0 database and were therefore excluded by MetaboAnalyst 5.0 from the subsequent KEGG pathway analysis.
To reintroduce these metabolites, I manually added the unrecognized metabolites to the KEGG .csv file following the MetaboAnalyst 5.0 instructions (i.e., add metabolite1; metabolite2; … to the 2nd column of this KEGG .csv file).
Later, I found that if a metabolite in file_1 was unrecognized by the MetaboAnalyst 5.0 database, even if I manually added this metabolite to the KEGG dataset, this manual editing did not change the result of pathway enrichment.
For example, the metabolite “Hydroxyglutaric acid” in my file_1 was not recognized by the MetaboAnalyst 5.0 database. I added “Hydroxyglutaric acid” to the KEGG dataset (either to an already existing pathway, or a newly created pathway in the KEGG dataset.csv file). Adding it to an already existing pathway did not change the enrichment ratio or p value of this existing pathway. We also created a new pathway containing “Hydroxyglutaric acid” and several other metabolites unmapped to MetaboAnalyst 5.0. All the metabolites defined in this new pathway exist in our file_1, so we expected the enrichment ratio of this newly created pathway to be 100%, but it was 0%.
Besides “Hydroxyglutaric acid”, I have tried multiple other metabolites that were not recognized by the MetaboAnalyst 5.0 database, and the results were the same.
So my question is: is there a way to keep the unmapped metabolites in the final KEGG pathway enrichment analysis?
Thanks for describing your question. To be brief: you are using the Enrichment Analysis module, right? Can you convert your compound names into KEGG IDs or HMDB IDs for better recognition? This is the recommended approach to increase coverage.
Based on your description, you have tried to use a self-defined database? Have you defined the database correctly according to the template format? Can you share the database and your data for further checking?
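One quick check before sharing the files is whether the compound names in the self-defined library match the names in the data exactly. The Python sketch below assumes a two-column .csv without a header row (set name in column 1, compound names separated by ";" in column 2, as described earlier in this thread). The file content, the names "Custom pathway", "Compound A/B/C", and the helper functions are hypothetical and only illustrate the idea; this is not MetaboAnalyst code.

```python
import csv
import io

def load_sets(csv_text):
    """Parse a two-column library: set name, then ';'-separated compound names."""
    sets = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        name = row[0].strip()
        # Normalize whitespace and case so trivial differences do not hide a match
        sets[name] = {m.strip().lower() for m in row[1].split(";") if m.strip()}
    return sets

def coverage(sets, query_names):
    """For each set, list the members that also occur in the query names."""
    query = {q.strip().lower() for q in query_names}
    return {name: sorted(members & query) for name, members in sets.items()}

# Hypothetical library content and query names, for illustration only
library_text = 'Custom pathway,"Hydroxyglutaric acid; Compound A; Compound B"\n'
query_names = ["Hydroxyglutaric acid", "compound a ", "Compound C"]

matched = coverage(load_sets(library_text), query_names)
print(matched)
```

If a set shows fewer matches than expected, the names in the library and in the data probably differ (spelling, synonyms, or stray characters).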
Hi Zhiqiang and Guoliang,
we struggle with the same issue (we lose a lot of metabolites for pathway analyses because they are recognized by neither their name nor their HMDB ID). On the MetaboAnalyst website we found that the last update based on HMDB releases took place almost five years ago (v4.0). Is there any way of mapping the metabolites to the most recent HMDB release to achieve higher coverage of metabolites for pathway analyses?
THANKS so much,
Please be aware that compound annotation results have very little effect on the downstream pathway analysis in MetaboAnalyst. All compounds covered by our pathways or metabolite sets will be annotated. During analysis, the background “universe” is defined by those compounds in pathways or metabolite sets. Adding new compounds (recognized or not) will not affect the result. The main factor here is not compound annotation, it is pathway annotation. Our pathway libraries were updated in late 2021.
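To illustrate why compounds outside the pathway libraries do not change the result, here is a minimal Python sketch of an over-representation test in which the background universe is the union of all pathway members. It is a simplified illustration under that assumption, not MetaboAnalyst's actual code; the toy library, the compound IDs (C1 to C6, X1, X2), and the function names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from M, of which n belong to the pathway."""
    upper = min(n, N)
    total = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return total / comb(M, N)

def ora_pvalue(query, pathway, universe):
    """Over-representation p-value; query compounds outside the universe are dropped."""
    in_universe = set(query) & set(universe)
    members = set(pathway) & set(universe)
    hits = len(in_universe & members)
    return hypergeom_sf(hits, len(universe), len(members), len(in_universe))

# Toy pathway library (hypothetical); the universe is every compound in any pathway
library = {
    "pathway_A": {"C1", "C2", "C3"},
    "pathway_B": {"C3", "C4", "C5", "C6"},
}
universe = set().union(*library.values())

mapped_only = ["C1", "C2", "C5"]
with_unmapped = mapped_only + ["X1", "X2"]  # X1, X2 are in no pathway

p_mapped_only = ora_pvalue(mapped_only, library["pathway_A"], universe)
p_with_unmapped = ora_pvalue(with_unmapped, library["pathway_A"], universe)
print(p_mapped_only, p_with_unmapped)
```

Both calls give the same p-value because X1 and X2 are filtered out before the test, which is the behavior described above for compounds that are not part of any pathway or metabolite set.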
I’m not quite sure I understand your answer in the context of the question, but perhaps I am misunderstanding the issue. It sounds like they have compounds which exist in the current HMDB that are not being recognized and are thus being excluded from the pathway analysis, e.g. it’s not recognizing hydroxyglutarate as an input and thus it’s not being considered during pathway analysis.
Pathway/Enrichment analysis is NOT dependent on whether the compounds exist in the current HMDB. It depends on whether the compounds are included in our pathway libraries. The compound annotation in MetaboAnalyst is for pathway analysis, not for general-purpose HMDB compound annotation. All compounds defined in our pathway libraries will be annotated. If compounds are not recognized (even if they have valid HMDB IDs), there will be no effect on the results.
The information is stated in the compound mapping result table (see below)
Dear Jeff and Connor,
thanks so much for your responses and for trying to help solve our problem. Please excuse my delayed response. Our original assumption was that the pathway annotations are linked to the HMDB annotation, which, following your response, Jeff, is not the case. Our “problem” of losing one third of the measured metabolites for the pathway analyses persists, but my understanding is that these are then hopefully not the most functionally important metabolites.
Thanks again and best wishes for the new year,