How to manually select proteins for biomarker analysis?

When manually building a biomarker panel, you should prioritize proteins that provide a balance between strong statistical performance and clear biological significance.

  • Consult Feature Importance Rankings: The most straightforward starting point is the set of rankings the platform generates using univariate or multivariate statistical methods. These rankings identify which proteins are the most statistically powerful “drivers” for distinguishing your experimental groups. Note that although this approach aims to surface the most promising biomarkers, the reported performance will be optimistic (a form of overfitting) because the biomarkers are selected on the same data used to evaluate them.
  • Avoid Redundancy: ProteoAnalyst computes k-means clustering as part of the feature ranking process to help researchers identify and mitigate redundancy when building biomarker panels. High-performing features that belong to the same cluster often provide overlapping information, so consider choosing a representative from each cluster rather than several proteins from one cluster.
  • Parsimony: While it is tempting to include many proteins, smaller panels are preferred to avoid overfitting and to ensure clinical or experimental feasibility. Use the Predicted Accuracy plot to find the “elbow” point where adding more proteins no longer yields significant performance gains.
  • Leverage Biological Context: Manual selection is your opportunity to include “hub” proteins from relevant pathways identified in the WGCNA or PPI Network modules, even if they have slightly lower statistical scores. Combining biomarkers from different biological processes often provides more complementary information.
  • Data Completeness: Proteins with a high percentage of missing values can introduce noise. For the most reliable predictions, prioritize proteins that were consistently detected across the majority of samples in your original quantification matrix.
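
If you export the ranking, cluster assignments, and quantification matrix, the statistical side of these criteria can be sketched in a few lines of Python. This is a minimal illustration, not the platform's own method: the function names, the input layout (plain lists and dictionaries), and the thresholds (80% completeness, one protein per cluster, a 0.01 accuracy gain) are all assumptions chosen for the example. Biological context still has to be applied by hand.

```python
from math import isnan

def completeness(values):
    """Fraction of samples in which the protein was quantified (not None/NaN)."""
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and isnan(v))]
    return len(present) / len(values)

def shortlist(ranking, clusters, matrix, min_completeness=0.8, max_per_cluster=1):
    """Walk the ranking from best to worst and keep a protein only if it is
    complete enough and its cluster is not already represented.
    All thresholds are illustrative assumptions."""
    taken = {}   # cluster id -> number of proteins already kept
    panel = []
    for protein in ranking:
        if completeness(matrix[protein]) < min_completeness:
            continue  # too many missing values
        cluster = clusters[protein]
        if taken.get(cluster, 0) >= max_per_cluster:
            continue  # likely redundant with a higher-ranked protein
        taken[cluster] = taken.get(cluster, 0) + 1
        panel.append(protein)
    return panel

def elbow_size(accuracy_by_size, min_gain=0.01):
    """Smallest panel size after which adding one more protein improves
    accuracy by less than min_gain. accuracy_by_size[i] is the accuracy
    of a panel with i + 1 proteins."""
    for i in range(len(accuracy_by_size) - 1):
        if accuracy_by_size[i + 1] - accuracy_by_size[i] < min_gain:
            return i + 1
    return len(accuracy_by_size)

# Hypothetical inputs: five ranked proteins, their cluster ids, and
# intensities across five samples (NaN marks a missing value).
nan = float("nan")
ranking = ["P1", "P2", "P3", "P4", "P5"]
clusters = {"P1": 0, "P2": 0, "P3": 1, "P4": 2, "P5": 1}
matrix = {
    "P1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "P2": [1.1, 2.1, 3.1, 4.1, 5.1],   # same cluster as P1
    "P3": [nan, nan, 1.0, 2.0, 3.0],   # only 60% complete
    "P4": [5.0, 4.0, 3.0, 2.0, 1.0],
    "P5": [2.0, nan, 1.0, 2.0, 3.0],   # 80% complete
}

candidates = shortlist(ranking, clusters, matrix)
size = elbow_size([0.70, 0.80, 0.86, 0.865, 0.87])
print(candidates[:size])
```

In this example P2 is skipped because P1 already represents its cluster, and P3 is skipped for missing values. Because the ranking itself was derived from the same data, accuracy estimated for the resulting panel should still be treated as optimistic until it is checked on independent samples.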