Data Integrity Check inquiry


I would like to understand better how the Data Integrity Check takes place.

I have been experiencing an issue where my datasets are cut significantly short by the feature-frequency filter during data upload ("OTUs with ≥ 2 counts" in the Data Inspection step). It seems like very straightforward filtering, but I cannot replicate it on my own with my raw input datasets.

Could you describe, step by step, the algorithm of the data integrity check performed upon data upload, please?
I would like to recreate this quality control on my own with my datasets to make sure there aren’t any errors.

Thank you!

For the data integrity check, we keep the features that occur in at least 2 samples and also filter out constant features (variance equal to 0) across the samples.
You can also check our R functions for details.
Hope this helps. Let me know if you have more questions.
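
For illustration, the two filters described above can be sketched in a few lines of R. This is a minimal sketch assuming a feature-by-sample count matrix; it is not the actual MicrobiomeAnalyst implementation, whose details live in the source code.

```r
# Hypothetical sketch of the described filtering, NOT the actual
# SanityCheckData() implementation. Rows = features (OTUs), columns = samples.
otu <- matrix(c(5, 0, 0,    # feature A: present in only 1 sample -> dropped
                3, 2, 1,    # feature B: present in 3 samples    -> kept
                4, 4, 4),   # feature C: constant across samples -> dropped
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), paste0("S", 1:3)))

# 1. Keep features observed (count > 0) in at least 2 samples
prevalence <- rowSums(otu > 0)
otu <- otu[prevalence >= 2, , drop = FALSE]

# 2. Drop constant features (variance equals 0 across samples)
variances <- apply(otu, 1, var)
otu <- otu[variances > 0, , drop = FALSE]

rownames(otu)  # only feature "B" survives both filters
```

Running this on your own raw matrix and comparing the surviving feature count against the Data Inspection summary should show whether the same two rules explain the reduction you see.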

Thank you for your response!

The downloaded R command history only shows this:

mbSet <- SanityCheckData(mbSet, "text");
mbSet <- SanityCheckSampleData(mbSet);
mbSet <- SetMetaAttributes(mbSet, "1");

Is there a better way to analyze these R functions?

Yes, you can read the underlying R source code of these functions in the XiaLab GitHub repositories.