What are the data filtering options in ProteoAnalyst and how should I choose?

ProteoAnalyst offers two levels of data filtering:

Platform-Specific Filtering:

  • MaxQuant: Remove potential contaminants ('+' in the Potential contaminant column), remove decoys ('+' in the Reverse column), remove proteins only identified by site, set a minimum number of peptides per protein (recommended: 2)
  • DIA-NN: Keep entries with q-value < 0.01, filter by a posterior error probability (PEP) threshold, set a minimum number of peptides/precursors
  • FragPipe: Remove contaminants, set a minimum protein probability (default: 0.99), set a minimum number of combined total peptides
  • Spectronaut: Remove contaminants, keep proteins with a q-value below a chosen threshold, set a minimum number of peptides/stripped sequences
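As an illustration of the MaxQuant rules, here is a minimal Python sketch, not ProteoAnalyst's implementation. It assumes a tab-separated proteinGroups.txt from MaxQuant with the columns `Potential contaminant`, `Reverse`, `Only identified by site` and `Peptides` (older MaxQuant versions name the contaminant column `Contaminant`). The function name `filter_maxquant` and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical helper, not ProteoAnalyst's code. Column names assume
# MaxQuant's proteinGroups.txt; a '+' in any of them flags the row.
FLAG_COLUMNS = ("Potential contaminant", "Reverse", "Only identified by site")


def filter_maxquant(rows, min_peptides=2):
    """Keep protein groups with no '+' flag and at least `min_peptides` peptides."""
    kept = []
    for row in rows:
        # Skip contaminants, decoys (Reverse) and site-only identifications.
        if any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS):
            continue
        # Enforce the minimum peptide count per protein group.
        if int(row["Peptides"]) < min_peptides:
            continue
        kept.append(row)
    return kept


# Tiny made-up table in the tab-separated layout of proteinGroups.txt.
example = (
    "Protein IDs\tPeptides\tPotential contaminant\tReverse\tOnly identified by site\n"
    "P1\t5\t\t\t\n"
    "P2\t3\t+\t\t\n"
    "REV__P3\t4\t\t+\t\n"
    "P4\t2\t\t\t+\n"
    "P5\t1\t\t\t\n"
    "P6\t2\t\t\t\n"
)
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
print([r["Protein IDs"] for r in filter_maxquant(rows)])  # ['P1', 'P6']
```

The other platforms follow the same pattern with a numeric cutoff in place of a '+' flag, for example keeping DIA-NN entries whose q-value is below 0.01.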

Statistical Data Cleaning (available for all data types):

  • Drop unannotated entries: Remove proteins/peptides that cannot be mapped to known identifiers
  • Remove features with high missing values: Set a percentage threshold (e.g., remove features that are missing in more than 50% of samples)
  • Remove low-variance features: Filter out the bottom X% of features ranked by interquartile range (IQR), a robust measure of spread. Use low values (e.g., 5-10%) for differential expression analysis.
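The missing-value and IQR steps can be sketched in plain Python as below. This is a hypothetical illustration, not ProteoAnalyst's code, under these assumptions: the data is a dict mapping feature IDs to per-sample intensities with `None` for missing values, quartiles come from `statistics.quantiles(..., method="inclusive")` on observed values only, and the number of dropped features is rounded down. The function names and sample values are made up.

```python
import math
import statistics


def drop_high_missing(matrix, max_missing_pct=50.0):
    """Drop features whose share of missing values (None) exceeds the threshold."""
    kept = {}
    for feature, values in matrix.items():
        missing_pct = 100.0 * sum(v is None for v in values) / len(values)
        if missing_pct <= max_missing_pct:
            kept[feature] = values
    return kept


def iqr(values):
    """Interquartile range of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return 0.0
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return q3 - q1


def drop_low_iqr(matrix, bottom_pct=10.0):
    """Drop the bottom `bottom_pct` percent of features ranked by IQR (rounded down)."""
    ranked = sorted(matrix, key=lambda f: iqr(matrix[f]))
    n_drop = math.floor(len(ranked) * bottom_pct / 100.0)
    dropped = set(ranked[:n_drop])
    return {f: v for f, v in matrix.items() if f not in dropped}


# Made-up intensities: 4 samples per feature, None marks a missing value.
matrix = {
    "protA": [20.1, 22.3, 25.0, 27.4],
    "protB": [21.0, 21.1, 21.0, 21.2],  # nearly flat, lowest IQR
    "protC": [None, None, None, 24.0],  # 75% missing
    "protD": [18.0, None, 23.5, 26.0],  # 25% missing
    "protE": [19.0, 24.0, 20.0, 28.0],
}
cleaned = drop_low_iqr(drop_high_missing(matrix, 50.0), bottom_pct=25.0)
print(sorted(cleaned))  # ['protA', 'protD', 'protE']
```

The example uses a 25% cutoff only so that one of its four remaining features is dropped; with 10%, rounding down removes nothing from a table this small.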

How to choose:

Always apply platform-specific filtering first to remove known technical artifacts (contaminants, decoys, low-confidence IDs).

For statistical cleaning, a common starting point is removing features with >50% missing values and filtering out the bottom 10% by IQR. For differential expression studies, use a conservative IQR cutoff (toward the low end of the 5-10% range) to avoid removing potentially informative features.
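The recommended starting points can be captured as a small settings helper. The stage and parameter names, and the 5% value for differential expression (the low end of the 5-10% range given for that case), are illustrative assumptions rather than actual ProteoAnalyst settings.

```python
# Hypothetical settings sketch; ProteoAnalyst's real option names may differ.
# Stage order follows the guidance: platform-specific filtering always first.
PIPELINE_ORDER = ("platform_specific_filtering", "statistical_cleaning")


def starting_params(differential_expression=False):
    """Suggested starting thresholds for statistical cleaning (assumed names)."""
    return {
        # Remove features missing in more than 50% of samples.
        "max_missing_pct": 50.0,
        # Drop the bottom 10% by IQR; use a lower, more conservative cut for DE.
        "drop_bottom_iqr_pct": 5.0 if differential_expression else 10.0,
    }


print(PIPELINE_ORDER)  # ('platform_specific_filtering', 'statistical_cleaning')
print(starting_params())  # {'max_missing_pct': 50.0, 'drop_bottom_iqr_pct': 10.0}
print(starting_params(differential_expression=True))  # {'max_missing_pct': 50.0, 'drop_bottom_iqr_pct': 5.0}
```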