What is the rationale behind filtering gene expression data in FastBMD?

The purpose of filtering is to increase the statistical power of differential expression analysis by removing genes that are unlikely to be informative. Although the NTP recommendations for 'omics dose-response modeling do not include any variance- or abundance-based filtering steps, filtering is common in typical differential expression analysis, so we give users the option of incorporating it into their workflow.

  • Low variance filter: removes genes whose expression values change little across samples and therefore have very low variance. Genes are ranked by variance from low to high, and you can exclude a chosen percentile of the lowest-variance genes by adjusting the “Variance filter” slider. The study referenced below suggests that removing up to 50% of genes based on variance can improve results.

  • Low abundance filter: removes genes with very low abundance, which are not measured reliably and may not be biologically important. You can exclude genes below a chosen threshold by adjusting the “Low abundance” slider. The study referenced below suggests that removing 10% of genes based on abundance can improve results.
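
To make the two filters concrete, here is a minimal Python sketch of percentile-based filtering. It is an illustration only, not FastBMD's actual implementation. The function names (`keep_above_lowest`, `filter_genes`), the use of mean expression as the abundance measure, the treatment of the abundance threshold as a percentile, and the order of the two filters (abundance first, then variance) are all assumptions made for this example.

```python
from statistics import mean, variance


def keep_above_lowest(scores, pct):
    """Return the indices of scores that are NOT among the lowest pct percent."""
    n_drop = int(len(scores) * pct / 100)  # number of entries to discard
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])  # low to high
    return sorted(ranked[n_drop:])  # restore the original ordering


def filter_genes(expr, variance_pct=0, abundance_pct=0):
    """Filter a dict mapping gene ID -> list of per-sample expression values.

    Hypothetical helper: drops the lowest `abundance_pct` percent of genes by
    mean expression, then the lowest `variance_pct` percent of the remaining
    genes by sample variance. Returns the surviving gene IDs.
    """
    genes = list(expr)

    # Low abundance filter (assumption: abundance = mean across samples).
    means = [mean(expr[g]) for g in genes]
    genes = [genes[i] for i in keep_above_lowest(means, abundance_pct)]

    # Low variance filter, applied to the genes that passed the first step.
    variances = [variance(expr[g]) for g in genes]
    return [genes[i] for i in keep_above_lowest(variances, variance_pct)]


# Toy data: geneA has very low abundance, geneB barely varies across samples.
example = {
    "geneA": [0.1, 0.2, 0.1, 0.2],
    "geneB": [5.0, 5.0, 5.1, 5.0],
    "geneC": [4.0, 6.0, 8.0, 10.0],
    "geneD": [3.0, 9.0, 3.0, 9.0],
}
kept = filter_genes(example, variance_pct=50, abundance_pct=25)
print(kept)  # ['geneC', 'geneD']
```

In this toy example the abundance filter removes geneA and the variance filter then removes geneB. Neither filter uses the dose or group labels, so genes are ranked without reference to the outcome being tested.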

For a detailed discussion and benchmark tests, please refer to the paper “Independent filtering increases detection power for high-throughput experiments”.