What is the rationale behind filtering gene expression data?

The purpose of filtering is to increase the statistical power of differential expression analysis by removing any genes that are less likely to be informative. There are three common strategies:

Low variance filter: genes whose expression values barely change across samples have very low variance and are unlikely to be differentially expressed. Genes are ranked by variance from low to high, and you can exclude a chosen percentile of the lowest-variance genes by adjusting the “Variance filter” slider.

Low abundance filter: genes with very low abundance are not measured reliably and may not be biologically important. You can exclude genes below a certain threshold by adjusting the “Low abundance” slider. The study referenced below suggests that 10% of genes can be removed based on their abundance with improved results.

Difficult-to-measure filter: some experiments include QC samples or technical replicates. Features that vary widely across those replicates cannot be measured reliably and should be excluded from downstream analysis.
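
The three filters above can be sketched in a few lines of plain Python. This is only an illustration, not the code behind the sliders. The function names, the use of the mean as the abundance measure, the percentile-based abundance cutoff, and the relative standard deviation (RSD) threshold of 0.25 for QC replicates are all assumptions made for the example.

```python
from statistics import mean, pstdev, pvariance


def drop_lowest(scores, percent):
    """Return gene names left after dropping the lowest `percent` % by score."""
    ranked = sorted(scores, key=scores.get)        # rank from low to high
    n_drop = int(len(ranked) * percent / 100)      # round down
    dropped = set(ranked[:n_drop])
    return [gene for gene in scores if gene not in dropped]


def variance_filter(expr, percent):
    """Low variance filter: rank genes by variance across samples."""
    return drop_lowest({g: pvariance(v) for g, v in expr.items()}, percent)


def abundance_filter(expr, percent):
    """Low abundance filter: rank genes by mean expression (an assumption)."""
    return drop_lowest({g: mean(v) for g, v in expr.items()}, percent)


def qc_rsd_filter(qc, max_rsd=0.25):
    """Difficult-to-measure filter: keep genes that are stable in QC replicates.

    RSD = standard deviation / mean; the 0.25 default is a hypothetical cutoff.
    """
    kept = []
    for gene, values in qc.items():
        m = mean(values)
        rsd = pstdev(values) / m if m > 0 else float("inf")
        if rsd <= max_rsd:
            kept.append(gene)
    return kept


# Toy data: gene -> expression values across four samples (made up).
expr = {
    "geneA": [5.0, 5.0, 5.0, 5.0],   # constant, zero variance
    "geneB": [1.0, 9.0, 2.0, 8.0],
    "geneC": [0.1, 0.2, 0.1, 0.2],   # very low abundance
    "geneD": [7.0, 3.0, 6.0, 4.0],
}
# Toy QC replicates: the same sample measured three times (made up).
qc = {
    "geneA": [5.0, 5.1, 4.9],        # stable across replicates
    "geneB": [2.0, 8.0, 5.0],        # unstable across replicates
}

print(variance_filter(expr, 25))     # ['geneB', 'geneC', 'geneD']
print(abundance_filter(expr, 25))    # ['geneA', 'geneB', 'geneD']
print(qc_rsd_filter(qc))             # ['geneA']
```

Note that the variance and abundance filters use only overall statistics for each gene and never look at the group labels, so they can be applied before the differential expression test itself.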

Please refer to the paper “Independent filtering increases detection power for high-throughput experiments” for a detailed discussion and benchmark tests. The study suggests that up to 50% of genes (!) can be removed based on their variance with improved results.