Why does the PLS-DA separation pattern change after I updated the class labels?

jeff.xia · July 25, 2022, 2:53pm

The separation in PLS-DA is calculated by maximizing the covariance between the data matrix (X) and the class labels (Y). By default, the program first converts the class labels into rankings based on their numerical or alphabetical order. For instance, group labels “A, B, C, D” become 1, 2, 3, 4, while group labels “low, medium, high” become 2, 3, 1! The PLS regression is then performed between the data matrix X and the numerical Y. When you update the labels, their order can change, and with it the separation pattern (note that the classification itself will not be affected by this).
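To see why renaming labels changes the numerical Y, here is a minimal Python sketch of the label-to-rank conversion. It is not the program's actual code: the function name `labels_to_ranks` is hypothetical, and it assumes the ranking follows a plain sorted order of the unique labels (the real sort may treat upper/lower case or locale differently).

```python
def labels_to_ranks(labels):
    """Map each class label to its 1-based rank in sorted order (assumed behavior)."""
    ordered = sorted(set(labels))                      # unique labels, sorted
    rank = {label: i + 1 for i, label in enumerate(ordered)}
    return [rank[label] for label in labels]

# Labels whose alphabetical order matches the intended order
print(labels_to_ranks(["A", "B", "C", "D"]))           # [1, 2, 3, 4]

# Alphabetical order is high < low < medium, so the intended order is lost
print(labels_to_ranks(["low", "medium", "high"]))      # [2, 3, 1]

# Prefixing the labels is one way to make the sorted order match the intended one
print(labels_to_ranks(["1_low", "2_medium", "3_high"]))  # [1, 2, 3]
```

The same samples get a different numerical Y depending only on how the labels are spelled, which is why the score plot can look different after a relabel.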

For two-group data, this procedure does not affect the visualization pattern, as the comparison is always 1 vs. 2. For multiple groups, this default approach is meaningful when the group labels correspond to time series, disease severity, or treatment dose. However, when group labels do not reflect quantitative differences, users should uncheck the option “Class order matters” (located at the top of the page). In this case, PLS-DA will be performed using a general linear model in which group labels are coded using a model matrix rather than numerical values.
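The difference between the two codings can be sketched in Python as follows. This is only an illustration under assumptions: the helper names are hypothetical, the group labels are made up, and the model matrix is shown as one indicator column per class, whereas the program's actual model matrix may use a different contrast scheme.

```python
def numeric_coding(labels):
    """Ordered coding: each label becomes its 1-based rank in sorted order."""
    ordered = sorted(set(labels))
    return [ordered.index(label) + 1 for label in labels]

def indicator_coding(labels):
    """Unordered coding: one 0/1 column per class (assumed model-matrix form)."""
    ordered = sorted(set(labels))
    return [[1 if label == level else 0 for level in ordered] for label in labels]

samples = ["control", "drugA", "drugB", "control"]
renamed = ["placebo", "drugA", "drugB", "placebo"]   # only "control" was renamed

# Numeric Y changes with the relabel: the first group moves from rank 1 to rank 3
print(numeric_coding(samples))   # [1, 2, 3, 1]
print(numeric_coding(renamed))   # [3, 1, 2, 3]

# Indicator Y keeps the same columns; the relabel only reorders them
print(indicator_coding(samples))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(indicator_coding(renamed))  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

With indicator columns, no class is treated as lying "between" two others, so the result does not depend on how the labels happen to sort.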