I would like to understand better how the Data Integrity Check takes place.
I have been experiencing an issue where my datasets are cut significantly short by the feature-frequency filter applied during data upload ("OTUs with ≥ 2 counts" in the Data Inspection step). It seems like very straightforward filtering, but I cannot replicate it on my own raw input datasets.
Could you describe, step by step, the algorithm of the data integrity check performed upon data upload, please?
I would like to recreate this quality control on my own datasets to make sure there aren't any errors.
Hello,
For the data integrity check, we keep the features that occur in at least 2 samples and also filter out constant features (variance equal to 0) across the samples.
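Here is a minimal R sketch of that filtering so you can try to reproduce it yourself. It assumes your data is a feature-by-sample count matrix (rows = OTUs/features, columns = samples) and that "occur" means a non-zero count; the function name `integrity_filter` and the toy data are illustrative, not the actual internal implementation.

```r
# Minimal sketch of the described filtering (illustrative, not the
# actual MicrobiomeAnalyst internals).
integrity_filter <- function(counts) {
  # Step 1: keep features present (non-zero) in at least 2 samples
  keep_prevalence <- rowSums(counts > 0) >= 2
  counts <- counts[keep_prevalence, , drop = FALSE]

  # Step 2: remove constant features (variance across samples equals 0)
  keep_variable <- apply(counts, 1, var) > 0
  counts[keep_variable, , drop = FALSE]
}

# Example usage with a small toy matrix
set.seed(1)
toy <- matrix(rpois(30, lambda = 2), nrow = 6,
              dimnames = list(paste0("OTU", 1:6), paste0("S", 1:5)))
filtered <- integrity_filter(toy)
dim(filtered)
```

If your feature counts still differ from what the Data Inspection step reports, comparing the features dropped at each of the two steps above should show which rule is responsible.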
You can also check our R functions for details.
Hope this helps. Let me know if you have more questions.