The separation in PLS-DA is calculated by maximizing the covariance between the data matrix (X) and the class labels (Y). By default, the program first converts the class labels into ranks based on their numerical or alphabetical order. For instance, group labels “A, B, C, D” become 1, 2, 3, 4, while group labels “low, medium, high” become 2, 3, 1! The PLS regression is then performed between the data matrix X and the numerical Y. Renaming the labels can therefore change their order, and thus the separation pattern (note that the classification itself is not affected by this).
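To make the ordering pitfall concrete, here is a minimal Python sketch of rank encoding by sorted label order. It only illustrates the behaviour described above; `rank_encode` is a hypothetical helper, not MetaboAnalyst's actual code.

```python
def rank_encode(labels):
    """Map each label to its 1-based rank in sorted (alphabetical) order.

    Illustration only: this mimics the default conversion described above,
    it is not taken from MetaboAnalyst.
    """
    levels = sorted(set(labels))  # e.g. ["high", "low", "medium"]
    rank = {level: i + 1 for i, level in enumerate(levels)}
    return [rank[label] for label in labels]


print(rank_encode(["A", "B", "C", "D"]))        # [1, 2, 3, 4]
print(rank_encode(["low", "medium", "high"]))   # [2, 3, 1]
```

Because “high” sorts before “low” and “medium”, the numerical Y no longer follows the intended low < medium < high order. Prefixing the labels (for example “1_low, 2_medium, 3_high”) would be one way to force the intended order.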
For two-group data, this procedure does not affect the visualization pattern, as the comparison is always 1 vs. 2. For more than two groups, this default approach is meaningful when the group labels correspond to a time series, disease severity, or treatment dose. However, when the group labels do not reflect quantitative differences, users should uncheck the option “Class order matters” (located at the top of the page). In this case, PLS-DA is performed using a general linear model in which the group labels are coded using a model matrix rather than numerical values.
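As a rough illustration of what coding labels as a matrix means, the Python sketch below builds one indicator (0/1) column per group. This is an assumption about the general idea only: `indicator_matrix` is a hypothetical helper, and the exact model matrix MetaboAnalyst builds may use a different coding.

```python
def indicator_matrix(labels):
    """Return (levels, Y) where Y has one 0/1 column per group.

    Illustration only: one common way to code unordered groups,
    not necessarily the coding MetaboAnalyst uses internally.
    """
    levels = sorted(set(labels))
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return levels, rows


levels, Y = indicator_matrix(["low", "medium", "high"])
print(levels)  # ['high', 'low', 'medium']
print(Y)       # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

With this coding no group is treated as numerically larger or smaller than another, so the alphabetical order of the labels only changes the column order, not the distances between groups.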
If I am using PLS-DA to find important features, I understand that choosing whether group order matters will affect the results. I am unsure how the data are handled when each group label combines multiple variables. For instance, we have timepoints 12 and 17, physiological conditions diapause and non-diapause, and different sample types that we are comparing, namely larvae and bacteria. The bacteria are labelled 12_Dia_B, 17_Dia_B, 12_Non_B, 17_Non_B, and the larvae 12_Dia_L, 17_Dia_L, 12_Non_L, 17_Non_L. What would be the best approach to answer the question: what are the important features driving the differences between the bacteria and larvae during the diapause and non-diapause physiological states? Should I select that group order matters, since there are timepoints? Is it wrong if I do not indicate that group order matters? In summary, my question is: how does the group order option handle multiple variables in the group identifier?
Your question can be answered in the Statistics [Metadata Table] module => time series + one condition.
PLS-DA in MetaboAnalyst considers only one experimental factor at a time (timepoints OR physiological conditions). You can certainly merge multiple experimental factors into one (make sure there are enough replicates in each new group). Whether class order matters or not (for visualization only) depends on whether the time effect is larger than the physiological condition effect (you can estimate this from PCA).
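If it helps, merging factors and checking replicate counts can be sketched in Python before uploading. Everything here is hypothetical: the helper names, the choice to merge condition and sample type while dropping the timepoint, the sample list (one made-up sample per original label), and the threshold of 3, which is an arbitrary placeholder rather than a MetaboAnalyst rule.

```python
from collections import Counter


def merge_factors(label, keep=("condition", "sample_type")):
    """Rebuild a group label such as '12_Dia_B' from a subset of its factors."""
    time, condition, sample_type = label.split("_")
    parts = {"time": time, "condition": condition, "sample_type": sample_type}
    return "_".join(parts[name] for name in keep)


def small_groups(counts, min_replicates):
    """List merged groups that have fewer than min_replicates samples."""
    return [group for group, n in counts.items() if n < min_replicates]


# Hypothetical per-sample labels: one sample per original label, for brevity.
samples = ["12_Dia_B", "17_Dia_B", "12_Non_B", "17_Non_B",
           "12_Dia_L", "17_Dia_L", "12_Non_L", "17_Non_L"]

merged = [merge_factors(label) for label in samples]
counts = Counter(merged)
print(sorted(counts.items()))
# [('Dia_B', 2), ('Dia_L', 2), ('Non_B', 2), ('Non_L', 2)]

# Arbitrary placeholder threshold, not a MetaboAnalyst requirement.
print(sorted(small_groups(counts, min_replicates=3)))
# ['Dia_B', 'Dia_L', 'Non_B', 'Non_L']
```

Passing `keep=("time", "condition")` instead would merge timepoint and condition, which is the kind of grouping where class order is more likely to be meaningful.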