What is the Jaccard index, and how can it be used to identify pathways of interest?


The Jaccard index is a statistic used for measuring the similarity and diversity of sample sets. It is defined as the size of the intersection of two sets divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|, and ranges from 0 to 1:

  • 0: No overlap at all (the sets are completely disjoint).
  • 1: Perfect overlap (the sets are identical in terms of presence/absence).
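
As a minimal sketch of the definition above, the index can be computed with Python sets. The function name `jaccard_index` and the reaction IDs are hypothetical, chosen only for illustration; scoring two empty sets as 0.0 is an assumed convention (some tools return 1.0 instead).

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of hashable items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are scored 0.0; some tools use 1.0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical reaction IDs, for illustration only.
pathway_x = {"R1", "R2", "R3", "R4"}
pathway_y = {"R3", "R4", "R5"}

# 2 shared reactions out of 5 distinct reactions in total.
print(jaccard_index(pathway_x, pathway_y))  # 0.4
```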

A low Jaccard score indicates that two pathways are very different and therefore potentially complementary. Such pairs are the targets of interest for further manual inspection.
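
One way to shortlist such pairs is to score every pair of pathways and keep those at or below a cutoff. The sketch below assumes each pathway is represented as a set of reaction IDs; the pathway names, reaction IDs, and the `THRESHOLD` value are all hypothetical and should be replaced with values that suit your data.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B|; empty sets score 0.0 (assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pathways mapped to hypothetical reaction IDs.
pathways = {
    "pathway_1": {"R1", "R2", "R3"},
    "pathway_2": {"R1", "R2", "R3", "R4"},
    "pathway_3": {"R7", "R8"},
}

THRESHOLD = 0.2  # assumed cutoff, not a recommended value

# Score every unordered pair, sorted from least to most similar.
scores = sorted(
    (jaccard_index(pathways[p], pathways[q]), p, q)
    for p, q in combinations(sorted(pathways), 2)
)

# Keep only the low-similarity pairs for manual inspection.
candidates = [(p, q, score) for score, p, q in scores if score <= THRESHOLD]

for p, q, score in candidates:
    print(f"{p} vs {q}: {score:.2f}")
```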

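When comparing a pathway from a specific GEM (A) to the same pathway from its background, the community GEM (B), the Jaccard index reduces to a proportion. Assuming the community GEM contains every reaction present in the specific GEM, the intersection is A and the union is B, so the index equals |A| / |B|. A low value indicates that this microbe contains very few of the reactions in the more complete pathway defined by the community.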
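
The short check below illustrates why the index becomes a simple proportion in this case. It is a sketch under the assumption that the GEM's reactions are a subset of the community's; the variable names and reaction IDs are hypothetical.

```python
# Hypothetical reaction sets for one pathway.
community_pathway = {"R1", "R2", "R3", "R4", "R5"}  # B: community GEM
gem_pathway = {"R1", "R4"}                          # A: specific GEM

# Assumption: every reaction in the GEM also appears in the community GEM.
assert gem_pathway <= community_pathway

intersection = gem_pathway & community_pathway  # equals A
union = gem_pathway | community_pathway         # equals B

jaccard = len(intersection) / len(union)
proportion = len(gem_pathway) / len(community_pathway)

# Both values are 2 / 5.
print(jaccard, proportion)  # 0.4 0.4
```

If the subset assumption does not hold (the GEM has reactions missing from the community model), the two values diverge and the full Jaccard formula should be used instead.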