ProteoAnalyst offers two levels of data filtering:
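Platform-Specific Filtering:
- MaxQuant: Remove potential contaminants (rows marked "+" in the Potential contaminant column), remove decoys (rows marked "+" in the Reverse column), set a minimum number of peptides per protein (recommended: 2), remove proteins only identified by site (rows marked "+" in the Only identified by site column)
- DIA-NN: Filter by q-value (< 0.01), filter by posterior error probability (PEP) threshold, set minimum peptides/precursors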
- FragPipe: Remove contaminants, set minimum protein probability (default: 0.99), set minimum combined total peptides
- Spectronaut: Remove contaminants, keep proteins whose q-value is below a set threshold, set minimum peptides/stripped sequences
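As a rough illustration of the MaxQuant rules, the sketch below filters rows of a proteinGroups.txt-style table in plain Python. It assumes MaxQuant's `Potential contaminant`, `Reverse`, and `Only identified by site` flag columns and, as an illustrative choice, uses the `Peptides` column for the minimum-peptide rule. The function name `filter_maxquant` is hypothetical and this is not ProteoAnalyst's actual implementation; the sample rows are made up.

```python
import csv
import io

# Flag columns in MaxQuant's proteinGroups.txt; a "+" marks the row.
# Older MaxQuant versions name the first column "Contaminant" instead.
FLAG_COLUMNS = ("Potential contaminant", "Reverse", "Only identified by site")


def filter_maxquant(rows, min_peptides=2, peptide_column="Peptides"):
    """Keep protein groups that carry no '+' flag and have enough peptides.

    Assumption: peptide_column="Peptides" is an illustrative choice; a tool may
    count razor + unique or unique peptides instead.
    """
    kept = []
    for row in rows:
        # A missing flag column is treated as unflagged.
        if any(row.get(col, "").strip() == "+" for col in FLAG_COLUMNS):
            continue
        if int(row.get(peptide_column) or 0) < min_peptides:
            continue
        kept.append(row)
    return kept


# Tiny tab-separated excerpt with made-up values for illustration.
SAMPLE = (
    "Protein IDs\tPeptides\tPotential contaminant\tReverse\tOnly identified by site\n"
    "P1\t5\t\t\t\n"
    "CON__P2\t8\t+\t\t\n"
    "REV__P3\t3\t\t+\t\n"
    "P4\t1\t\t\t\n"
    "P5\t2\t\t\t+\n"
    "P6\t2\t\t\t\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
kept = filter_maxquant(rows)
print([r["Protein IDs"] for r in kept])  # ['P1', 'P6']
```

Here P2 is dropped as a contaminant, P3 as a decoy, P4 for having only one peptide, and P5 for being identified only by site. The same pattern (drop flagged rows, then apply numeric thresholds) carries over to the DIA-NN, FragPipe, and Spectronaut rules with their own column names.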
Statistical Data Cleaning (available for all data types):
- Drop unannotated entries: Remove proteins/peptides that cannot be mapped to known identifiers
- Remove features with many missing values: Set a maximum percentage of missing values (e.g., remove features with >50% missing values)
- Remove low-variance features: Filter out the bottom X% of features ranked by interquartile range (IQR). Use low values (e.g., 5-10%) for differential expression analysis.
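The missing-value and IQR filters can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: features are stored as a dict of per-sample intensity lists with `None` for missing values, the IQR is computed on observed values only using `statistics.quantiles(..., method="inclusive")`, and the helper names `drop_missing`, `iqr`, and `drop_low_iqr` are hypothetical. ProteoAnalyst may impute before ranking or use a different quantile definition, which would shift the numbers slightly.

```python
import math
import statistics


def drop_missing(features, max_missing_frac=0.5):
    """Drop features whose fraction of missing values exceeds max_missing_frac.

    `features` maps a feature ID to per-sample intensities, with None for missing.
    A feature with exactly max_missing_frac missing is kept (the rule is "> 50%").
    """
    kept = {}
    for fid, values in features.items():
        n_missing = sum(v is None for v in values)
        if n_missing / len(values) <= max_missing_frac:
            kept[fid] = values
    return kept


def iqr(values):
    """IQR of the observed (non-missing) values.

    Assumption: missing values are ignored rather than imputed, and quartiles use
    statistics.quantiles(method="inclusive").
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        # Too few observations to estimate spread; treated as zero spread (a design choice).
        return 0.0
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return q3 - q1


def drop_low_iqr(features, bottom_pct=10):
    """Drop the bottom `bottom_pct` percent of features ranked by IQR (lowest first)."""
    ranked = sorted(features, key=lambda fid: iqr(features[fid]))
    n_drop = math.floor(len(ranked) * bottom_pct / 100)
    dropped = set(ranked[:n_drop])
    return {fid: vals for fid, vals in features.items() if fid not in dropped}


# Made-up log2 intensities for four samples; None marks a missing value.
features = {
    "A": [20.0, 22.0, 24.0, 26.0],
    "B": [18.0, 18.5, 19.0, 19.5],
    "C": [None, None, None, 21.0],  # 75% missing -> dropped
    "D": [15.0, None, 17.0, 19.0],
    "E": [25.0, 25.1, 25.0, 25.1],  # nearly constant -> lowest IQR
    "F": [None, None, 10.0, 14.0],  # exactly 50% missing -> kept
}

filtered = drop_missing(features, max_missing_frac=0.5)
# Only five features remain, so 20% is used here to drop exactly one.
cleaned = drop_low_iqr(filtered, bottom_pct=20)
print(sorted(cleaned))  # ['A', 'B', 'D', 'F']
```

Ties at the cutoff are broken by input order in this sketch, because Python's sort is stable; a real filter might instead keep or drop all tied features together.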
How to choose:
Always apply platform-specific filtering first to remove known technical artifacts (contaminants, decoys, low-confidence IDs).
For statistical cleaning, a common starting point is to remove features with >50% missing values and to filter out the bottom 10% of features by IQR. For differential expression studies, keep the IQR filter conservative (e.g., 5-10%) to avoid removing potentially informative features.