Rarefying/Normalization

gdlesk · January 5, 2024, 6:31pm

Hi all,

I have really enjoyed using MicrobiomeAnalyst; it has been by far the easiest and most useful bioinformatics tool I have used so far. I have a few questions about rarefying/normalizing data inside MicrobiomeAnalyst versus outside of it.

I have Illumina 250 bp paired-end 16S sequencing data from three sample types within an urban river (crayfish, sediment, and water), and there is a large disparity in read counts between samples (~300 reads to ~160,000 reads). Because of this difference, I had planned to rarefy the data in MicrobiomeAnalyst and then continue the analysis. However, I am worried that rarefying to the minimum library size will discard too much of the data from the high-read samples, while removing the low-read samples instead will leave too few samples to answer our original research question.
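To put a rough number on that trade-off, here is a minimal Python sketch of rarefying to the minimum library size. The two-sample count table is made up to mirror the depths mentioned above (300 and 160,000 reads), and the `rarefy` helper is a hypothetical name, not part of MicrobiomeAnalyst or phyloseq. It subsamples reads without replacement using NumPy's `Generator.multivariate_hypergeometric`.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Subsample one sample's taxon counts to `depth` reads without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the target depth")
    return rng.multivariate_hypergeometric(counts, depth)

# Hypothetical samples-by-taxa count table (illustrative numbers only).
table = np.array([
    [120, 90, 60, 30],              # shallow sample: 300 reads
    [80000, 50000, 20000, 10000],   # deep sample: 160,000 reads
])

rng = np.random.default_rng(0)
depth = table.sum(axis=1).min()     # minimum library size
rarefied = np.array([rarefy(row, depth, rng) for row in table])

# Fraction of all reads thrown away by rarefying to the shallowest sample.
discarded = 1 - rarefied.sum() / table.sum()
print(f"rarefying depth: {depth}")
print(f"fraction of reads discarded: {discarded:.3f}")
```

With these made-up numbers, nearly all of the deep sample's reads are dropped, which is the concern described above. Choosing a higher depth keeps more reads but drops every sample below that depth.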

I was reading through some literature and the phyloseq documentation on normalizing microbiome data with uneven sequencing depth, and found that rarefying is not recommended because of the high chance of type II errors. McMurdie and Holmes (2014) recommend normalizing the data with the DESeq2, edgeR, or metagenomeSeq packages instead.

Can data normalized in any of those packages be used as input to MicrobiomeAnalyst, or is there another recommended way to deal with uneven sequencing depth in a data set without filtering out all of the low-read samples?

Thanks!
-Grant

jeff.xia · January 6, 2024, 9:54pm

This is certainly a valid concern. However, there is no good solution AFAIK. It really depends on your data and analysis goals (e.g. alpha diversity vs. comparative analysis). Many default approaches were designed for gut microbiome data, which are relatively dense. Environmental samples are more sparse and require different settings.

Note:

MicrobiomeAnalyst does not enforce rarefying.
If you choose DESeq2, edgeR, or metagenomeSeq, their built-in normalization methods will be used. The normalization page is mainly for other methods (e.g. t-tests) or for visualization purposes.
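As an illustration of what a built-in normalization can look like, here is a minimal Python sketch of the median-of-ratios idea behind DESeq2's default size factors. It is a simplified re-implementation for intuition only, not the code that DESeq2 or MicrobiomeAnalyst runs. The function name and the small count table are hypothetical.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Return one size factor per sample for a samples-by-taxa count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only taxa observed in every sample have a defined log geometric mean.
    usable = (counts > 0).all(axis=0)
    if not usable.any():
        raise ValueError("no taxon is present in every sample")
    log_counts = np.log(counts[:, usable])
    # Per-taxon geometric mean across samples, on the log scale.
    log_geo_mean = log_counts.mean(axis=0)
    # Size factor: median ratio of a sample's counts to the geometric means.
    return np.exp(np.median(log_counts - log_geo_mean, axis=1))

# Hypothetical table: the second sample is sequenced twice as deeply.
table = np.array([
    [10, 20, 30, 0],
    [20, 40, 60, 5],
])

size_factors = median_of_ratios_size_factors(table)
normalized = table / size_factors[:, None]   # no reads are discarded
print(size_factors)
```

Unlike rarefying, this rescales counts rather than discarding reads. The `usable` filter also shows why sparse data are harder: when few or no taxa are present in every sample, there is little or nothing left to estimate the size factors from.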
