How does ProteoAnalyst handle missing values in proteomics data?

Missing values are a major challenge in proteomics, often accounting for 20-50% of the data matrix. ProteoAnalyst supports two missingness mechanisms:

  • MNAR (Missing Not at Random) - the most common mechanism in proteomics, where low-abundance proteins fall below the detection limit. The default option replaces these entries with one‑fifth of the minimum observed intensity, on the assumption that they fall below the limit of detection (LoD). For more advanced handling of such left-censored data, users can choose methods such as MinDet, MinProb, and quantile regression imputation of left-censored data (QRILC).

  • MAR (Missing at Random) - missingness may arise from stochastic technical glitches rather than from the actual concentration of the protein. Available methods include Mean/Median replacement, K-Nearest Neighbors (KNN), and Bayesian PCA.
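
To make the default MNAR replacement concrete, here is a minimal Python sketch of a one-fifth-of-minimum fill. It is not ProteoAnalyst's own code. The function name `impute_lod`, the samples-by-proteins layout, and the choice to take the minimum per protein (column) rather than over the whole matrix are all assumptions made for illustration.

```python
import numpy as np

def impute_lod(matrix, fraction=0.2):
    """Fill missing values (NaN) with a fraction of each column's minimum.

    Assumed layout: rows are samples, columns are proteins.
    Assumption: the minimum is taken per protein, not over the whole matrix.
    """
    data = np.asarray(matrix, dtype=float).copy()
    for j in range(data.shape[1]):
        col = data[:, j]                      # view into the copied array
        observed = col[~np.isnan(col)]
        if observed.size == 0:
            continue                          # nothing observed: leave as NaN
        col[np.isnan(col)] = fraction * observed.min()
    return data

# Toy intensity matrix: 3 samples x 3 proteins, NaN marks a missing value.
intensities = np.array([
    [100.0, np.nan, 50.0],
    [200.0, 40.0, np.nan],
    [np.nan, 80.0, 25.0],
])
imputed = impute_lod(intensities)
print(imputed)
```

Each missing entry ends up below every observed value for the same protein, which matches the left-censoring assumption behind MNAR. Proteins with no observed values at all are left as NaN in this sketch, since no minimum exists to scale.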