The potential score is the predicted probability that a given taxon produces a given metabolite. TrpNet estimates these scores with logistic regression models, using genome-scale metabolic models (GEMs) as the knowledge base for training.
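As an illustration of how a logistic regression model turns predictors into a probability, the sketch below applies the logistic (inverse-logit) function to a linear predictor. The coefficient values and their names (`intercept`, `coef_taxon`, `coef_gut_origin`) are hypothetical and do not come from TrpNet.

```python
import math


def potential_score(eta: float) -> float:
    """Logistic (inverse-logit) function: maps a linear predictor to a probability in (0, 1).

    Written in a numerically stable form so large negative inputs do not overflow.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients for one metabolite model (illustration only, not TrpNet values).
intercept = -1.2
coef_taxon = 2.0       # effect of carrying a particular taxon label
coef_gut_origin = 0.5  # effect of being a human/mouse gut microbe

# Linear predictor for a taxon that has this label and is a gut microbe.
eta = intercept + coef_taxon * 1 + coef_gut_origin * 1
print(round(potential_score(eta), 3))
```

A linear predictor of 0 maps to a score of exactly 0.5; positive values map above 0.5 and negative values below it.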
For each tryptophan metabolite, a Bayesian logistic (logit) regression model is trained at each taxonomic level from phylum to strain, using the taxon labels at that level and the taxon's origin (whether or not it is a human/mouse gut microbe) as predictors.
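A minimal sketch of one per-metabolite, per-level model follows, under stated assumptions. It swaps full Bayesian inference for a maximum a posteriori (MAP) fit under independent Normal(0, `prior_sd`²) priors, which is equivalent to L2-regularised logistic regression; TrpNet's actual prior, fitting routine and software are not specified here. The predictors are a one-hot encoding of the taxon label at the chosen level plus a 0/1 gut-origin indicator. The genus names, training rows and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def encode(taxon: str, is_gut: bool, labels: Sequence[str]) -> List[float]:
    """Design row: intercept, one-hot taxon label at the chosen level, origin indicator."""
    return [1.0] + [1.0 if taxon == t else 0.0 for t in labels] + [1.0 if is_gut else 0.0]


def fit_map_logit(X: List[List[float]], y: List[int],
                  prior_sd: float = 2.5, lr: float = 0.5, n_iter: int = 2000) -> List[float]:
    """MAP logistic-regression weights under independent Normal(0, prior_sd**2) priors,
    found by gradient ascent on the log posterior (averaged over samples)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [-wj / prior_sd ** 2 for wj in w]  # gradient of the log prior
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad[j] += (yi - p) * xi[j]  # gradient of the log likelihood
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical GEM-derived training set for ONE metabolite at genus level:
# (genus label, is human/mouse gut microbe, GEM indicates production 1/0).
train: List[Tuple[str, bool, int]] = [
    ("Genus_A", True, 1), ("Genus_A", True, 1), ("Genus_A", False, 1),
    ("Genus_B", True, 0), ("Genus_B", False, 0), ("Genus_B", True, 0),
    ("Genus_C", True, 1), ("Genus_C", False, 0),
]
genera = sorted({g for g, _, _ in train})
X = [encode(g, gut, genera) for g, gut, _ in train]
y = [label for _, _, label in train]
weights = fit_map_logit(X, y)


def predict(taxon: str, is_gut: bool) -> float:
    """Potential score for a taxon under this single metabolite/level model."""
    x = encode(taxon, is_gut, genera)
    return sigmoid(sum(wj * xj for wj, xj in zip(weights, x)))


print(round(predict("Genus_A", True), 2), round(predict("Genus_B", True), 2))
```

In this sketch, repeating the fit once per metabolite and once per taxonomic level would give the full set of models; a label never seen in training contributes nothing from the one-hot block, so its score falls back to the intercept and origin terms.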
Applying the corresponding regression models to an input list of taxa yields a matrix whose rows are the input taxa and whose columns are the metabolites they can potentially produce; each entry is the predicted probability, termed the potential score. Based on 10-fold cross-validation and ROC analysis, models trained at higher-resolution taxonomic levels tend to predict better. A potential score above 0.5 indicates that the taxon is more likely than not to produce the metabolite, and a higher score indicates a greater likelihood of production.
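To make the output shape and the 0.5 cut-off concrete, the sketch below arranges potential scores as a taxa × metabolites matrix and lists, for each metabolite, the taxa scoring above 0.5, highest first. The taxon names and score values are hypothetical placeholders, not TrpNet output, and the helper names `to_matrix` and `likely_producers` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def to_matrix(scores: Dict[str, Dict[str, float]]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Arrange per-taxon scores as a matrix: rows = input taxa, columns = metabolites."""
    taxa = list(scores)
    metabolites = sorted({m for row in scores.values() for m in row})
    matrix = [[scores[t][m] for m in metabolites] for t in taxa]
    return taxa, metabolites, matrix


def likely_producers(taxa: Sequence[str], metabolites: Sequence[str],
                     matrix: List[List[float]], threshold: float = 0.5
                     ) -> Dict[str, List[Tuple[str, float]]]:
    """For each metabolite, the taxa whose score is strictly above the threshold, highest first."""
    result: Dict[str, List[Tuple[str, float]]] = {}
    for j, m in enumerate(metabolites):
        hits = [(taxa[i], matrix[i][j]) for i in range(len(taxa)) if matrix[i][j] > threshold]
        result[m] = sorted(hits, key=lambda pair: pair[1], reverse=True)
    return result


# Hypothetical potential scores (illustration only).
scores = {
    "Taxon_A": {"indole": 0.91, "tryptamine": 0.12, "indole-3-propionic acid": 0.05},
    "Taxon_B": {"indole": 0.34, "tryptamine": 0.71, "indole-3-propionic acid": 0.88},
    "Taxon_C": {"indole": 0.62, "tryptamine": 0.48, "indole-3-propionic acid": 0.51},
}
taxa, metabolites, matrix = to_matrix(scores)
producers = likely_producers(taxa, metabolites, matrix)
for metabolite, hits in producers.items():
    print(metabolite, hits)
```

The comparison is strict (`> 0.5`), matching "above 0.5"; ranking the hits by score reflects that a higher score indicates a greater likelihood of production.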