How does IQR filtering work?

As far as I understand, IQR filtering removes data that do not fall between the 25th and 75th percentiles, but I do not understand which values from the dataset are being ranked and need to be within this range?


IQR is a measure of the variability of a variable: the difference between the 75th and 25th percentiles of its concentrations across all samples. All variables are ranked by their IQRs, and those in the lower tail (i.e. the bottom 5%) are removed because they are near constant. Our data filtering page provides a detailed explanation of this procedure.
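To make the idea concrete, here is a minimal R sketch of an IQR-based filter. It is an illustration only, not the MetaboAnalyst source: the toy matrix `conc`, the names `iqr.val`, `remove.frac` and `n.remove`, and the use of `round()` to turn 5% into a count are all assumptions made for this example.

```r
# Hypothetical toy data: 30 samples (rows) by 20 variables (columns).
set.seed(1)
conc <- matrix(rnorm(30 * 20, mean = 10, sd = 2), nrow = 30,
               dimnames = list(NULL, paste0("m", 1:20)))
conc[, "m7"] <- 5   # a constant variable, so its IQR is exactly 0

# IQR of each variable across all samples (75th minus 25th percentile).
iqr.val <- apply(conc, 2, IQR, na.rm = TRUE)

# How many variables to drop; the 5% fraction and the rounding are assumptions.
remove.frac <- 0.05
n.remove <- round(ncol(conc) * remove.frac)

# Rank so that the largest IQR gets rank 1, then keep the top-ranked variables.
rk <- rank(-iqr.val, ties.method = "random")
keep <- rk <= (ncol(conc) - n.remove)
filtered <- conc[, keep, drop = FALSE]

colnames(filtered)
```

In this toy case one variable is removed, and it is the constant column `m7`, since it is the only one with an IQR of zero. Note that what gets ranked is one IQR value per variable, not the individual concentrations.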


Ah yes, of course. Thank you for your reply, it was helpful.

Thank you for this question and explanation, I was also wondering the same thing.

I have a follow-up question: how does MetaboAnalyst select which metabolites to remove if there are many with an IQR of zero? For example, say I have 137 metabolites, calculate the IQR for each one, rank them, and see that 40 of them share the lowest IQR of zero. A 5% filter will remove seven of those, but how are they selected by MetaboAnalyst in this case?

Any help would be much appreciated.

There is no magic here. The approach ranks the features and trims the specified portion. If two or more features have the same value, the ties are broken at random, so which of them end up being removed is arbitrary. If you know R, you can look up the R function in the R history to see the exact command used.

In this case, it is this R function:
"PerformFeatureFilter",
and this line of R command:
rk <- rank(-filter.val, ties.method='random');
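As a rough illustration of what that `rank()` call implies for the example in the question, the sketch below builds hypothetical IQR values for 137 metabolites, 40 of them zero, and picks seven to remove. Only the `rank(-filter.val, ties.method = "random")` call is taken from the command above; the simulated values, the helper `pick_removed`, and the way the cutoff is applied are assumptions for this example, not the actual body of `PerformFeatureFilter`.

```r
# Hypothetical IQR values: 40 metabolites with IQR 0, 97 with positive IQRs.
set.seed(42)
filter.val <- c(rep(0, 40), runif(97, min = 0.1, max = 5))
names(filter.val) <- paste0("met", seq_along(filter.val))

# 5% of 137 is 6.85, rounded here to 7 (the rounding rule is an assumption).
n.remove <- round(length(filter.val) * 0.05)

# Hypothetical helper: rank with random tie-breaking, return the bottom n names.
pick_removed <- function(vals, n) {
  rk <- rank(-vals, ties.method = "random")  # largest value gets rank 1
  names(vals)[rk > length(vals) - n]         # lowest-ranked n features
}

removed.a <- pick_removed(filter.val, n.remove)
removed.b <- pick_removed(filter.val, n.remove)

# Both sets come from the 40 zero-IQR metabolites (met1 to met40).
all(removed.a %in% names(filter.val)[1:40])
all(removed.b %in% names(filter.val)[1:40])
```

The two calls will usually return different sets of seven, because the 40 tied features receive ranks 98 to 137 in a random order on each call. If you need the same features removed every time in your own R session, calling `set.seed()` immediately before the ranking step should make the tie-breaking reproducible.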

Thanks for getting back, that is very helpful to know.
