Hi there. What is the cut-off rank that MetaboAnalyst uses to filter metabolites using MAD/RSD/non-parametric RSD? Within the description of the FilterVariable function, it’s not clear:
“The function applies a filtering method, ranks the variables within the dataset, and removes variables based on its rank.”
It seems clear from the MetaboAnalyst filter page that the broader filter parameters (mean, median, IQR) are based on the number of metabolites in the dataset. Is this the same for MAD/RSD/non-parametric RSD? Apologies if this is clearly described, but I haven’t been able to find it on the website or in previous publications.
Yes, the cutoff is mainly based on data size for those you mentioned (see default value below). The detailed documentation is on the Data Filtering web page. We strongly recommend data filtering for untargeted metabolomics (due to the high noise level). For targeted metabolomics, 5000 features should be sufficient.
Hi Jeff. Thanks for the quick response. Just to make sure I’m clear: if we used MAD/RSD/non-para RSD on a dataset of 950 metabolites, the lowest-ranked 25% of features (highest MAD) would be filtered out, regardless of the actual MAD values? So nothing like a traditional MAD exclusion with a hard threshold (MAD > 3 or something)? Basically, we want to be as clear and specific as possible about our exclusion criteria for a manuscript.
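To make the distinction in the question concrete, here is a minimal Python sketch contrasting the two approaches. It is not MetaboAnalyst's code: the function names (`mad`, `rank_filter`, `threshold_filter`) and the toy data are hypothetical, and it assumes the least variable features (lowest MAD) are the ones dropped; flip the sort if the tool ranks the other way.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of one feature across samples."""
    m = median(values)
    return median(abs(v - m) for v in values)

def rank_filter(features, drop_fraction):
    """Rank-based filter: drop a fixed fraction of features,
    whatever their actual MAD values are.
    Assumption: the lowest-MAD features are the ones removed."""
    scores = {name: mad(vals) for name, vals in features.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)  # highest MAD first
    n_drop = int(len(ordered) * drop_fraction)
    return ordered[:len(ordered) - n_drop]

def threshold_filter(features, cutoff):
    """Hard-threshold filter: keep only features with MAD >= cutoff."""
    return [name for name, vals in features.items() if mad(vals) >= cutoff]

# Hypothetical toy data: feature name -> intensities across four samples.
features = {
    "m1": [1.0, 1.1, 0.9, 1.0],
    "m2": [1.0, 5.0, 9.0, 3.0],
    "m3": [2.0, 2.5, 1.5, 2.0],
    "m4": [10.0, 30.0, 20.0, 40.0],
}

kept_by_rank = rank_filter(features, 0.25)           # always drops 1 of 4
kept_by_threshold = threshold_filter(features, 1.0)  # count depends on the data
```

With a rank-based filter the number of removed features is fixed by the dataset size, whereas with a hard threshold it depends on the MAD values themselves.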
I see your point. You are welcome to suggest these features (with related papers / links). We will consider adding support for them in a new release, together with the associated documentation.
At the moment, there are two possible solutions:
1. Use MetaboAnalyst web: if you do data filtering, the download page will include a file “data_prefilter_###.csv”, which contains both the data and the filter values (the first column). You can download this file and open it in a spreadsheet to sort and filter at whatever threshold you want. You can then upload the result (make sure to remove the filter column and add the class labels) and run the analysis, skipping the data filter step this time.
2. Use the MetaboAnalystR package: this task will be very simple if you know R.
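For anyone who would rather script the spreadsheet step, the hard-threshold filter can be sketched in a few lines. This is an illustration in Python rather than R, and it rests on an assumed file layout that should be checked against the real download: one header row, one row per feature, the filter value in the first column, then a feature name and the sample values. The function name `apply_threshold` and the example data are hypothetical.

```python
import csv
import io

def apply_threshold(csv_text, cutoff):
    """Keep rows whose first-column filter value is >= cutoff,
    then drop that filter column from the output.
    Assumed layout: filter value, feature name, sample values."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header[1:])          # header without the filter column
    for row in reader:
        if float(row[0]) >= cutoff:      # hard threshold on the filter value
            writer.writerow(row[1:])     # data without the filter column
    return out.getvalue()

# Hypothetical stand-in for the contents of the prefilter file.
example = """filter_value,feature,S1,S2,S3
0.05,m1,1.0,1.1,0.9
2.0,m2,1.0,5.0,9.0
10.0,m4,10.0,30.0,20.0
"""

filtered = apply_threshold(example, cutoff=1.0)
print(filtered)
```

The class labels still need to be added before re-uploading, as noted above.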