I would like to understand better how the Data Integrity Check takes place.
I have been experiencing an issue where my datasets are cut significantly short by the feature-frequency filter applied during data upload ("OTUs with ≥ 2 counts" in the Data Inspection step). It seems like very straightforward filtering, but I cannot replicate it on my own raw input datasets.
Could you describe, step by step, the algorithm of the data integrity check performed upon data upload, please?
I would like to recreate this quality control on my own datasets to make sure there aren't any errors.
Hello,
For the data integrity check, we keep the features that occur in at least 2 samples and also filter out constant features (variance equal to 0) across the samples.
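Here is a minimal R sketch of that filtering so you can try to reproduce it yourself. It assumes your data is a feature-by-sample count matrix (rows = OTUs/features, columns = samples) and that "occur" means a non-zero count; the function name `integrity_filter` and the toy data are illustrative, not the actual internal implementation.

```r
# Minimal sketch of the described filtering (illustrative, not the
# actual MicrobiomeAnalyst internals).
integrity_filter <- function(counts) {
  # Step 1: keep features present (non-zero) in at least 2 samples
  keep_prevalence <- rowSums(counts > 0) >= 2
  counts <- counts[keep_prevalence, , drop = FALSE]

  # Step 2: remove constant features (variance across samples equals 0)
  keep_variable <- apply(counts, 1, var) > 0
  counts[keep_variable, , drop = FALSE]
}

# Example usage with a small toy matrix
set.seed(1)
toy <- matrix(rpois(30, lambda = 2), nrow = 6,
              dimnames = list(paste0("OTU", 1:6), paste0("S", 1:5)))
filtered <- integrity_filter(toy)
dim(filtered)
```

If your feature counts still differ from what the Data Inspection step reports, comparing the features dropped at each of the two steps above should show which rule is responsible.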
You can also check our R functions for details.
Hope this helps. Let me know if you have more questions.