How does LEfSe work with multi-class datasets?

Tymofij · August 17, 2022, 10:05pm

Hello!
I have a few questions about how LEfSe analysis performs on datasets with more than two classes (groups).

Does MicrobiomeAnalyst offer a strict version of LEfSe?
I noticed in my analyses that some features were identified as biomarkers of one class but did not differ statistically from all the other groups, only from one or a few of them. According to the original LEfSe paper ("Metagenomic biomarker discovery and explanation", PMC, nih.gov), this is the non-strict version of LEfSe, i.e., it identifies biomarkers that distinguish at least one individual class. How do I run the strict version of LEfSe in MicrobiomeAnalyst, i.e., identify bacterial features that differ statistically from all other classes in a multi-class dataset, and obtain a detailed report of the p-values, FDR-corrected p-values, LDA scores, etc.?
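To make the strict/non-strict distinction concrete, here is a minimal sketch of the decision rule only. It assumes the pairwise p-values between classes have already been computed (e.g., by the pairwise Wilcoxon step). The function names `is_biomarker` and `pairwise_p`, the input dictionary, and the 0.05 cutoff are hypothetical illustrations, not MicrobiomeAnalyst's or the LEfSe reference implementation's code. It also ignores the direction of the difference and the LDA step.

```python
from typing import Dict, List, Tuple

# Hypothetical input: pairwise p-values keyed by (class_a, class_b),
# e.g. from Wilcoxon rank-sum tests run earlier in the pipeline.
PairwiseP = Dict[Tuple[str, str], float]


def pairwise_p(pvals: PairwiseP, a: str, b: str) -> float:
    """Look up the p-value for a class pair regardless of key order."""
    if (a, b) in pvals:
        return pvals[(a, b)]
    return pvals[(b, a)]


def is_biomarker(pvals: PairwiseP, target: str, classes: List[str],
                 strict: bool, alpha: float = 0.05) -> bool:
    """Decide whether a feature separates `target` from the other classes.

    strict=True:  target must differ significantly from EVERY other class.
    strict=False: target must differ significantly from AT LEAST ONE other class.
    """
    others = [c for c in classes if c != target]
    significant = [pairwise_p(pvals, target, c) < alpha for c in others]
    return all(significant) if strict else any(significant)


# Example: the feature separates A from B, but not A from C.
classes = ["A", "B", "C"]
p = {("A", "B"): 0.01, ("A", "C"): 0.40, ("B", "C"): 0.03}
print(is_biomarker(p, "A", classes, strict=False))  # True
print(is_biomarker(p, "A", classes, strict=True))   # False
```

In this toy case the feature would be reported for class A under the non-strict rule but dropped under the strict rule, which matches the behavior described above.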
MicrobiomeAnalyst offers the Benjamini-Hochberg method for correcting p-values for multiple testing; however, the LEfSe algorithm is sequential: it runs several tests, each of which depends on the results of the previous one. Are FDR-corrected p-values used in the Kruskal-Wallis test, the pairwise Wilcoxon tests, and the final LDA step, or only in the final LDA step?
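For reference, here is a sketch of the standard Benjamini-Hochberg step-up procedure (the same adjustment R's `p.adjust(p, method = "BH")` performs). It is not MicrobiomeAnalyst's code, and the function name is hypothetical. The usage lines illustrate why the stage at which correction happens matters: the adjusted values depend on how many p-values (m) are corrected together, so correcting all features at the start gives different results than correcting only the features that survive to a later step.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity:
    # q_(k) = min over j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Correcting four features together vs. only two "surviving" ones.
all_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
subset_q = benjamini_hochberg([0.01, 0.03])
print([round(q, 4) for q in all_q])     # [0.04, 0.0533, 0.0533, 0.2]
print([round(q, 4) for q in subset_q])  # [0.02, 0.03]
```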
Sometimes the bacterial names in the LEfSe bar and dot plots are truncated; other times, either the heatmap to the right of the dot plot is cropped, or the bar plot shows only a few distinct colors when the dataset has many classes. Is there a way to adjust the plot parameters for visualizing LEfSe results?

Thank you!
I would appreciate any information on this topic that you can provide!
~ Tymofij

Yao · August 18, 2022, 1:46am

Hello,

Thanks for bringing up these questions.

For multi-class LEfSe, MicrobiomeAnalyst currently supports only the non-strict version. We can add the strict version to our updated website, which is coming soon.
The p-value correction is applied only in the final step.
At the moment, using the "Prepend higher taxa" and color palette options may help. However, if a name is very long, it may squeeze the space available for the main plot.

If you have more questions or suggestions, feel free to post them here.

Best

Tymofij · April 17, 2023, 5:04pm

Hello,

I just wanted to follow up on this topic.

The bar plots that LEfSe generates never cut off the taxa names; however, the dot plot does. Is there a way to improve the dot plots?

Also, is the MicrobiomeAnalyst team still considering implementing the strict version of multi-class LEfSe? That would be GREATLY appreciated.

Thank you!
Tymofij