Dear MetaboAnalyst team,

The VIP scores calculated by the MetaboAnalyst PLS-DA module differ from those obtained using other calculation modules, such as scikit-learn in Python and mixOmics in R. While the scores from scikit-learn and mixOmics are consistent, the MetaboAnalyst scores highlight influential features more prominently. Are your calculations different from those of other modules?

Here is a test dataset with 6 samples and 8 features, and the feature XXXX is manually added with a dramatic difference between classes T and C.

sample T1 T2 T3 C1 C2 C3

class T T T C C C

ALBU 8.27E9 1.08E10 4.63E9 1.45E10 2.37E10 9.37E9

TRFE 9.11E8 8.53E8 5.34E8 5.57E8 1.51E9 1.06E9

CO3 1.53E8 1.05E8 1.18E8 2.95E8 2.75E8 8.10E7

IGG1 2.08E9 1.06E9 2.22E9 2.61E8 2.66E8 1.87E9

A2MG 4.63E7 3.45E7 1.54E8 1.03E8 1.50E8 1.18E8

XXXX 1.00E5 8.00E4 9.00E4 8.00E8 1.00E9 9.00E7

FIBB 2.81E8 1.89E8 2.30E8 2.48E8 2.93E8 3.60E8

CO4B 1.77E5 3.97E5 1.36E5 2.19E6 1.15E6 3.51E5

The VIP scores calculated by scikit-learn/mixOmics and MetaboAnalyst are shown below. MetaboAnalyst highlights the feature XXXX with a VIP score of 2.74, while scikit-learn/mixOmics assigns a score of only 1.24. Additionally, the order of the features is different.

sklearn/mixOmics MetaboAnalyst

XXXX 1.242534 2.7451

CO4B 1.120459 0.38577

ALBU 1.086513 0.10519

FIBB 1.029846 0.041263

IGG1 1.001192 0.53945

CO3 0.907570 0.013373

A2MG 0.808401 0.099583

TRFE 0.692824 0.040896

# Scikit-learn (v1.3.2) python script

import numpy as np

from sklearn.cross_decomposition import PLSRegression, PLSCanonical

import pandas as pd

def vip_PLSR(model: PLSRegression):

t = model.x_scores_

w = model.x_rotations_

q = model.y_loadings_

p, h = w.shape

vips = np.zeros((p,))

s = np.diag(t.T @ t @ q.T @ q).reshape(h, -1)

total_s = np.sum(s)

for i in range(p):

weight = np.array([ (w[i,j] / np.linalg.norm(w[:,j]))**2 for j in range(h) ])
vips[i] = np.sqrt(p*(s.T @ weight)/total_s)

return vips

def main():

df = pd.read_csv(‘demodata.tsv’, sep=‘\t’, index_col=0)

X = np.array(df.T)

Y = np.array([0,0,0,1,1,1])

```
n_comp = 1
pls = PLSRegression(n_components=n_comp)
pls.fit(X, Y)
vips = vip_PLSR(pls)
coef = pls.coef_[0]
for i, feature in enumerate(df.index):
print(feature, vips[i], coef[i])
```

if **name** == ‘**main**’:

main()

# MixOmics R script:

if (!require(“BiocManager”, quietly = TRUE))

install.packages(“BiocManager”)

BiocManager::install(“mixOmics”)

library(“mixOmics”)

data ← read.table(“demodata.tsv”, header = TRUE, row.names = 1, sep = “\t”)

features ← rownames(data)

data ← as.data.frame(sapply(data, function(x) as.numeric(as.character(x))))

X ← t(data.matrix(data))

response_variable ← factor(c(rep(0, 3), rep(1, 3)))

Y = as.numeric(response_variable)

plsda_model ← mixOmics::plsda(X, Y, ncomp = 5)

vip_scores ← vip(plsda_model)

rownames(vip_scores) ← features

print(vip_scores)