How do I assess the effectiveness of different data processing options?

You can assess the impact and effectiveness of different data processing options (data filtering, normalization, and missing value imputation) using several visualizations, each designed to detect technical biases and confirm that biological signals are preserved.

  • Data Distribution (Boxplots and Density Plots): Effective normalization should result in samples having similar medians in the boxplot and overlapping bell-shaped curves in the density plot, indicating that systematic technical differences in total intensity have been removed.

  • PCA: helps you determine whether normalization has improved the grouping of biological replicates. If normalization is successful, biological replicates from the same experimental group should cluster more closely together, while different groups should show clearer separation along the first two principal components.

  • MA Plot: the minus–average plot is a scatter plot of log₂(fold change) (M) versus average log₂ expression (A). It is used to diagnose normalization quality and to visualize expression-dependent effects. Effective normalization should center the data around M = 0 across the entire intensity range.

  • Distance Dendrogram: allows you to verify that replicates remain highly correlated after processing. Check that the dendrogram branches samples according to their biological conditions rather than technical batches.
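
The boxplot and MA-plot checks above can also be expressed numerically. The following is a minimal sketch, not the tool's own implementation: it assumes simple median centering on log₂ intensities as the normalization method, and the sample names, toy intensities, and helper functions (log2_transform, median_center, ma_values) are all hypothetical. It compares per-sample medians and the median M value before and after normalization.

```python
import math
from statistics import median

# Hypothetical toy data: raw intensities for 5 features in 3 samples.
# sample_B and sample_C mimic a technical difference in total intensity.
raw = {
    "sample_A": [100.0, 200.0, 400.0, 800.0, 1600.0],
    "sample_B": [210.0, 390.0, 820.0, 1580.0, 3250.0],
    "sample_C": [50.0, 105.0, 195.0, 410.0, 790.0],
}

def log2_transform(values):
    """Convert raw intensities to the log2 scale."""
    return [math.log2(v) for v in values]

def median_center(log_values):
    """Assumed normalization: subtract the sample median from every value."""
    m = median(log_values)
    return [v - m for v in log_values]

def ma_values(log_x, log_y):
    """Return (A, M) pairs: A = average log2 intensity, M = log2 fold change."""
    return [((x + y) / 2, y - x) for x, y in zip(log_x, log_y)]

logged = {name: log2_transform(vals) for name, vals in raw.items()}
normalized = {name: median_center(vals) for name, vals in logged.items()}

# Boxplot-style check: sample medians should agree after normalization.
medians_before = {name: median(vals) for name, vals in logged.items()}
medians_after = {name: median(vals) for name, vals in normalized.items()}

# MA-style check: M values should be centered near zero after normalization.
ma_before = ma_values(logged["sample_A"], logged["sample_B"])
ma_after = ma_values(normalized["sample_A"], normalized["sample_B"])
median_m_before = median(m for _, m in ma_before)
median_m_after = median(m for _, m in ma_after)

print("Sample medians before:", medians_before)
print("Sample medians after: ", medians_after)
print("Median M before:", median_m_before)
print("Median M after: ", median_m_after)
```

In this toy case the sample medians differ by about two log₂ units before centering and coincide afterwards, and the median M between sample_A and sample_B moves from about 1 to close to 0. On real data, the same summaries would be read alongside the plots rather than in place of them, since a single median cannot reveal intensity-dependent trends that the MA plot shows.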