Metaboanalyst Data Pre-processing

darkrider · April 1, 2025, 7:39pm

I’m in Metaboanalyst, processing MS peak list data (i.e. the pre-processing step). After the step of matching peaks across samples, peaks were grouped, and if there was more than one peak per group, they were replaced by their sum. My question is, how can one see which peaks (by rt and m/z) were grouped together and replaced by a sum? For example, I had ~12000 features and now it’s down to ~8000. I ask because, when looking at the PLS-DA results of top VIP scores, the metabolite m/z and rt values do not match my original list of m/z and rt values. Thank you so much for your insight!

jeff.xia · April 2, 2025, 12:34am

If I understand your case correctly, your input data are peak list files containing the original list of m/z, rt and intensities. Note that m/z and rt are not identical across different files, which makes the peaks impossible to compare directly.

The goal of the pre-processing step is to align these peaks based on their m/z and rt, so that peaks can be compared across different files/samples and groups. After alignment, peaks are identified by the median m/z and rt of their peak groups (as far as I recall). The sum is performed on their intensity values.

All downstream analyses (stats, compound identification, pathway analysis, etc.) are performed on the aligned peak table. You should refer only to the aligned peaks (i.e. the median values of m/z and rt for each peak group), not the original peaks, which are usually slightly different in each file.
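To make the idea concrete, here is a minimal Python sketch of tolerance-based grouping. It is an illustration only and not MetaboAnalyst's actual algorithm: the function names `group_peaks` and `aligned_table`, the greedy matching strategy, and the example peaks are all assumptions. Each group is labelled by the median m/z and rt of its members, intensities from the same sample are summed, and the members of each group are kept so the original peaks can be traced.

```python
from statistics import median

def group_peaks(peaks, mz_tol=0.005, rt_tol=18.0):
    """Greedy grouping of (sample, mz, rt_seconds, intensity) tuples.

    A peak joins the first group whose current median m/z and rt are
    within both tolerances; otherwise it starts a new group.
    """
    groups = []
    for peak in sorted(peaks, key=lambda p: (p[1], p[2])):
        _, mz, rt, _ = peak
        for g in groups:
            if abs(mz - g["mz"]) <= mz_tol and abs(rt - g["rt"]) <= rt_tol:
                g["members"].append(peak)
                # Re-label the group by the median of its members.
                g["mz"] = median(p[1] for p in g["members"])
                g["rt"] = median(p[2] for p in g["members"])
                break
        else:
            groups.append({"mz": mz, "rt": rt, "members": [peak]})
    return groups

def aligned_table(groups):
    """One row per group: (median mz, median rt, {sample: summed intensity})."""
    rows = []
    for g in groups:
        per_sample = {}
        for sample, _, _, intensity in g["members"]:
            per_sample[sample] = per_sample.get(sample, 0.0) + intensity
        rows.append((g["mz"], g["rt"], per_sample))
    return rows

# Hypothetical peaks: sample A has two peaks that fall into the same group.
peaks = [
    ("A", 180.0630, 120.0, 1000.0),
    ("A", 180.0650, 125.0, 200.0),
    ("B", 180.0640, 122.0, 900.0),
    ("B", 255.2320, 300.0, 50.0),
]
groups = group_peaks(peaks)
table = aligned_table(groups)
for g in groups:
    print(g["mz"], g["rt"], g["members"])
```

In this toy example the two sample A peaks end up in one group and their intensities are summed, which is the kind of merge that reduces the feature count and changes the reported m/z and rt labels.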

darkrider · April 2, 2025, 12:14pm

Thank you so much for your reply, Jeff! My input data are a zipped peak list file that contains the original list of m/z, rt and intensities. The m/z and avg rt are identical across the files. In other words, my zipped file contains .csv spreadsheets, and each .csv spreadsheet is data from 1 urine sample. Each .csv spreadsheet contains 3 columns: rt, mz and intensity. Each row is a metabolite, and these rows are identical across samples (i.e. the rt’s and mz’s are the same in each .csv spreadsheet). Only the intensities differ across the samples (for example, some intensities are zero because that metabolite was not detected in that sample). Also, many of these metabolites have already been identified by their rt’s and mz’s, so we have the names before uploading data into metaboanalyst. I ran into a problem in the downstream analysis because my PLS-DA results of top VIP scores listed metabolite mz’s and rt’s that do not match the data I uploaded into metaboanalyst.
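One possible explanation for the drop in feature count, offered as a guess rather than a diagnosis: if two rows of the same peak list lie within both tolerances of each other, a grouping step could place them in one group and sum them. The sketch below flags such pairs in a single list of features. The function name `find_close_features`, the default tolerances, and the example values are assumptions, and rt is assumed to be in seconds.

```python
def find_close_features(features, mz_tol=0.005, rt_tol=18.0):
    """Return index pairs (i, j) of features within both tolerances.

    features: list of (mz, rt_seconds) tuples taken from one peak list.
    """
    # Visit features in order of increasing m/z so the inner loop can stop early.
    order = sorted(range(len(features)), key=lambda i: features[i][0])
    pairs = []
    for pos, i in enumerate(order):
        mz_i, rt_i = features[i]
        for j in order[pos + 1:]:
            mz_j, rt_j = features[j]
            if mz_j - mz_i > mz_tol:
                break  # every later feature is even further away in m/z
            if abs(rt_j - rt_i) <= rt_tol:
                pairs.append((i, j))
    return pairs

# Hypothetical features: rows 0 and 1 are close in both m/z and rt.
features = [
    (180.0634, 120.0),
    (180.0651, 126.0),
    (180.0634, 400.0),
    (255.2319, 300.0),
]
close_pairs = find_close_features(features)
print(close_pairs)
```

Any pair reported this way is a candidate for having been merged, so comparing those rows against the aligned table is a reasonable first check.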

Perhaps I performed the data pre-processing step in metaboanalyst incorrectly. We used the Agilent 6540 Accurate Mass QToF (2 ppm accuracy for MS and 5 ppm accuracy for MS/MS). In the LC-QToF methods, mass selection range was 100-1000 m/z range of error and total run time was 15 minutes. So in the metaboanalyst pre-processing step, I used 0.005 mass tolerance (m/z) and a retention time tolerance of 18.0 seconds. And I did not filter any data. I also normalized by median, transformed by square root transformation and auto-scaled.
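For reference, a ppm accuracy can be converted to an absolute m/z window to compare it with the 0.005 tolerance used here. The helper names `ppm_to_da` and `da_to_ppm` below are illustrative assumptions; the conversion itself is just the definition of parts per million.

```python
def ppm_to_da(mz, ppm):
    """Absolute m/z window corresponding to a relative accuracy in ppm."""
    return mz * ppm / 1e6

def da_to_ppm(mz, delta):
    """Relative size, in ppm, of an absolute m/z window at a given m/z."""
    return delta / mz * 1e6

# Compare a 2 ppm accuracy with a fixed 0.005 window across the scan range.
for mz in (100.0, 500.0, 1000.0):
    print(
        f"m/z {mz:7.1f}: 2 ppm = {ppm_to_da(mz, 2.0):.4f}, "
        f"0.005 = {da_to_ppm(mz, 0.005):.1f} ppm"
    )
```

At 2 ppm the window is 0.0002 at m/z 100 and 0.002 at m/z 1000, so a fixed 0.005 tolerance is wider than the stated MS accuracy over the whole 100-1000 range. Whether that is appropriate for a given data set is a judgment call, but a wider window makes it more likely that neighbouring features are grouped together.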

If this all looks ok so far, as a follow-up question, do you know how I can access and view the aligned peak table (i.e. median values of the m/z and rt for the peak group)? I appreciate your insight and thank you!

jeff.xia · April 2, 2025, 1:52pm

Note that all the results from data processing and analysis are available on the “Download” page. I suggest reading our tutorials and recent publications to get familiar with the platform.
