How should I interpret the results of permutation tests (Biomarker Analysis)?

Permutation tests are performed to see if the biomarker models obtained based on the original data are any better than null models (i.e. models created using shuffled group labels).

The procedure is as follows:

  1. Re-assign the group labels randomly to each sample;
  2. Perform model building and evaluation through cross validation (CV) using the shuffled labels, and record the performance;
    • Specific to the Biomarker module: perform balanced sub-sampling cross validation. Within each CV run, perform feature ranking and select the top x features to build a classifier on the 2/3 training data, which is then tested on the 1/3 hold-out data. Note that this procedure is repeated only 3 times to save computation time.
  3. Repeat steps 1 and 2 many times (say, 1000 times);
  4. Compare the performance obtained with the original phenotype labels against the performance obtained with the permuted labels. With a sufficient sample size, the performance measures from the permuted data will usually form an approximately normal distribution. If the performance score of the original data lies outside this distribution, the result is considered significant. An empirical p value is usually given as well. For instance, with 1000 permutations, if none of the permuted results is better than the original one, the p value is reported as p < 0.001.
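
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the module's actual implementation. The nearest-centroid classifier, the ranking of features by mean difference, the use of accuracy as the performance measure, the toy data, and the names `cv_accuracy`, `permutation_test` and `format_p` are all assumptions made for the example. Counting ties as "better" is likewise a conservative choice made here, not something taken from the procedure above.

```python
import random
from statistics import mean


def cv_accuracy(X, y, top_x=2, n_splits=3, rng=None):
    """Balanced sub-sampling CV: train on 2/3, test on the 1/3 hold-out."""
    rng = rng or random.Random(0)
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    n_per_class = min(len(idx0), len(idx1))  # balance the two groups
    n_train = (2 * n_per_class) // 3
    n_feat = len(X[0])
    scores = []
    for _ in range(n_splits):
        s0 = rng.sample(idx0, n_per_class)
        s1 = rng.sample(idx1, n_per_class)
        train0, test0 = s0[:n_train], s0[n_train:]
        train1, test1 = s1[:n_train], s1[n_train:]
        # Class centroids estimated on the training part only.
        c0 = [mean(X[i][j] for i in train0) for j in range(n_feat)]
        c1 = [mean(X[i][j] for i in train1) for j in range(n_feat)]
        # Feature ranking (assumed: absolute mean difference); keep the top x.
        top = sorted(range(n_feat), key=lambda j: abs(c0[j] - c1[j]),
                     reverse=True)[:top_x]
        held_out = [(i, 0) for i in test0] + [(i, 1) for i in test1]
        correct = 0
        for i, label in held_out:
            d0 = sum((X[i][j] - c0[j]) ** 2 for j in top)
            d1 = sum((X[i][j] - c1[j]) ** 2 for j in top)
            correct += int((0 if d0 <= d1 else 1) == label)
        scores.append(correct / len(held_out))
    return mean(scores)


def permutation_test(X, y, n_perm=200, seed=0, **cv_args):
    """Steps 1-4: score the original labels, then n_perm shuffled copies."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, rng=rng, **cv_args)
    null_scores = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)                                   # step 1
        null_scores.append(cv_accuracy(X, y_perm, rng=rng, **cv_args))  # step 2
    # Ties are counted as "better" here (a conservative assumption).
    n_better = sum(score >= observed for score in null_scores)
    return observed, null_scores, n_better


def format_p(n_better, n_perm):
    """Report the empirical p value; with no better result, only a bound."""
    if n_better == 0:
        return f"p < {1 / n_perm:g}"
    return f"p = {n_better / n_perm:g}"


# Toy data (hypothetical): 20 samples per group, 5 features,
# with the first 2 features shifted in group 1.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(3.0 if (label == 1 and j < 2) else 0.0, 1.0)
      for j in range(5)] for label in y]

n_perm = 200
observed, null_scores, n_better = permutation_test(X, y, n_perm=n_perm)
p_value = n_better / n_perm
print("observed accuracy:", round(observed, 3))
print("mean permuted accuracy:", round(mean(null_scores), 3))
print(format_p(n_better, n_perm))
```

Passing a different `seed` changes both the sub-samples and the shuffles, so the reported value can differ between runs. Some tools report the more conservative estimate (n_better + 1) / (n_perm + 1) instead of a bound.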

We have noticed that when the data do not contain a good signal (i.e. AUROC < 0.65), or when the sample size is small and outliers are present, the permutation results can be unstable, switching between significant and non-significant across runs. This is unavoidable given the random sub-sampling used in the procedure.

Related post: permutation tests in PLS-DA