# What is the meaning of Components in PLS-DA?

Hello everyone,
I have a dataset of concentrations from various metabolites taken from different parts of the digestive system of pigs.
I’d like to use the PLS-DA to see which metabolites are mostly influenced by the different parts of the digestive system. Unfortunately I don’t quite understand the meaning of the components in the PLS-DA charts.
I understand the PLS-DA as follows (In simple words):
In the PLS-DA the concentrations are taken as input for a model which tries to connect those to the different groups (=the different parts of the digestive system).
The metabolites get weighted via the VIP score based on how well they drive the separation of groups in the model.

My question is regarding the components of the following plot (example plot):

I’m having a hard time describing and evaluating the separation of these bubbles in B, as I don’t understand the meaning of the components (x-axis / y-axis). Also, for graph A, I don’t get how it helps me to know that the first component explains 33% of the variability in the three groups.
Are the components correlated to the VIP score (C), so that component 1 is connected to the first metabolite of the VIP score (Tryptophan)? Or are the components just variables of the model? If yes, what kind of information can I get from this graph for my metabolite analysis?
Can someone help me understand the “components”?


It’s ok, PLS-DA is very useful, but also very difficult to understand! Here is my best attempt to explain the parts that are important for interpretation in simple terms:

Your dataset has some metabolites (e.g. metabolite A, B, C, D, E). There is some total amount of variability in your dataset (var(A) + var(B) + var(C) + var(D) + var(E)), with each variance being calculated from the concentrations across all samples.
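
To make "total variability" concrete, here is a minimal sketch in Python with NumPy. The data matrix is random and purely hypothetical (12 samples by 5 metabolites); only the arithmetic matters.

```python
import numpy as np

# Hypothetical data: 12 samples (rows) x 5 metabolites A..E (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))

# Variance of each metabolite, calculated across all samples.
per_metabolite_var = X.var(axis=0, ddof=1)

# Total variability = var(A) + var(B) + var(C) + var(D) + var(E).
total_variability = per_metabolite_var.sum()
print(per_metabolite_var, total_variability)
```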

If we have had a successful experiment, the metabolites with the largest variability should be the most interesting, i.e. the ones that differ between your experimental groups. We could make plot B with just two metabolites, e.g. x-axis = metabolite C conc, y-axis = metabolite E conc.

However, plotting component scores instead of individual metabolite concentrations can give a better overview because components capture information from many metabolites. A component score is just a set of coefficients, each multiplied by one metabolite concentration and then added together: coef1 × (A conc) + coef2 × (B conc) + coef3 × (C conc) + … = component score. Plugging in the concentrations from one sample produces the component score for that sample, so every sample gets its own score.
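
As a small illustration of that weighted sum, here is a sketch with made-up numbers. The concentrations and coefficients below are hypothetical and not taken from any real model; in practice the data are usually centered and scaled before the coefficients are applied.

```python
import numpy as np

# Hypothetical concentrations: 3 samples (rows) x 3 metabolites A, B, C (columns).
conc = np.array([
    [1.0, 2.0, 0.5],
    [0.8, 2.5, 0.7],
    [3.0, 0.4, 1.9],
])

# Hypothetical coefficients for one component (one per metabolite).
coef = np.array([0.6, -0.3, 0.7])

# coef1 * A + coef2 * B + coef3 * C, evaluated for every sample at once.
scores = conc @ coef
print(scores)  # approximately [0.35, 0.22, 3.01]
```

Each entry of `scores` is where that sample would sit along this component's axis in a scores plot.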

In PLS-DA, we first find the set of coefficients that produces component scores that have the biggest difference between your experimental factors (this is component 1). The % variability is: var(component 1 scores)/total variability. Remember that total variability is: (var(A) + var(B) + var(C) + var(D) + var(E)). Then, we find the second set of coefficients that explains the second most variability between experimental factors to calculate component 2 scores, and so on for each component. Each component must be orthogonal to all the others, which just means that it must describe new patterns in the data that were not captured by the previous components.
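
Here is a rough sketch of that first step, using only NumPy and simulated data. It assumes three groups and five metabolites, with metabolite C made to differ between groups on purpose. The first set of coefficients is taken as the dominant left singular vector of XᵀY (Y being the dummy-coded group labels), which is one standard way to characterize the first PLS component. The percentage uses the simplified formula from this post; PLS-DA software may compute "explained variance" somewhat differently.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (hypothetical): 3 groups x 10 samples, 5 metabolites A..E.
groups = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 5))
X[:, 2] += 2.0 * groups            # metabolite C differs between groups

# Center the metabolites and dummy-code the group labels.
Xc = X - X.mean(axis=0)
Y = np.eye(3)[groups]
Yc = Y - Y.mean(axis=0)

# Coefficients of component 1: the unit-length direction in metabolite
# space whose scores covary most strongly with group membership.
U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
w1 = U[:, 0]

# Component 1 scores (one per sample) and the share of total variability.
t1 = Xc @ w1
total_variability = Xc.var(axis=0, ddof=1).sum()
pct_explained = 100 * t1.var(ddof=1) / total_variability
print(np.round(w1, 2), round(pct_explained, 1))
```

In this toy setup the largest coefficient (in absolute value) lands on metabolite C, because it is the only metabolite that was built to differ between groups.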

So for interpretation:

• If there is no separation between sample bubbles, it means that there are no consistent patterns in the metabolites that explain differences between the treatment groups (sad!)
• If there is separation, we can tell some other things.
• First, sample groups that are closer together have more similar metabolic profiles to each other. E.g. if you had liver, small intestine, and large intestine, you would probably expect the two intestine groups to fall closer together on the plot compared to the liver.
• Second, once we see the components that separate our samples, we can try to understand which individual metabolites are driving this through the VIP score. VIP scores are related to the coefficients for each component score. Metabolites with higher VIP had bigger coefficients, meaning they influenced that score the most.
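
To show how VIP relates to the coefficients, here is a sketch of the commonly used VIP formula. The helper `vip_scores`, the weight matrix `W`, and the values in `ss_y` are all hypothetical and for illustration only; the software that produced your plot may differ in details.

```python
import numpy as np

def vip_scores(W, ss_y):
    """Commonly used VIP definition (illustrative helper, not a library API).

    W    : (n_metabolites, n_components) component weight vectors
    ss_y : (n_components,) amount of group (Y) variation explained
           by each component
    """
    W = np.asarray(W, dtype=float)
    ss_y = np.asarray(ss_y, dtype=float)
    n_metabolites = W.shape[0]
    # Rescale each component's weight vector to unit length.
    W_unit = W / np.linalg.norm(W, axis=0)
    # Squared weights, averaged over components in proportion to how much
    # group variation each component explains.
    return np.sqrt(n_metabolites * (W_unit ** 2 @ ss_y) / ss_y.sum())

# Hypothetical weights for 5 metabolites (rows) on 2 components (columns).
W = np.array([
    [0.1, 0.7],
    [0.2, 0.1],
    [0.9, 0.2],
    [0.1, 0.6],
    [0.3, 0.3],
])
ss_y = np.array([8.0, 2.0])  # component 1 explains most of the group variation

vip = vip_scores(W, ss_y)
print(np.round(vip, 2))
```

The third metabolite gets the highest VIP here because it has the largest weight on the component that explains most of the group differences. With this definition the squared VIP values average to 1 across metabolites.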

A typical interpretation: Plots A and B show separation of sample groups, great, there are some consistent metabolic patterns! I see that in component 1, clusters 2 & 3 are similar to each other, and different from cluster 1. I wonder why? Which metabolites are driving this? I see from the VIP scores for component 1 that it is mainly 4 essential amino acids.


Hello jess.ewald