Different input sample order creates different results

Hi there,
Using MetaboAnalyst, with the Statistical Analysis (metadata table) module and the data format set to time series + one factor, when I input my data in the original sample order (RBC_data.csv), I get lots of significant proteins (>1000). But when I re-order the samples in the original datasheet to group them by my one factor (RBC_datatest.csv), I get 0 proteins. I have put the two files through the exact same process (and confirmed they are the exact same file, just with a different sample order), so I am wondering: is this due to the normalization step? Or am I missing something?

My process:
-input data as time series + one factor, in columns
-check that metadata is matching samples properly under data editor
-default missing values replacement
-no extra filtering
-sample normalization by median
-data transformation cube root
-Pareto scaling
-Two-way ANOVA with interaction, same FDR
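
The preprocessing steps listed above can be checked for order dependence outside MetaboAnalyst. Below is a minimal sketch in Python, not MetaboAnalyst's own code. It assumes that median normalization divides each sample by its own median, that the cube root is taken per value, and that Pareto scaling mean-centres each feature and divides by the square root of its standard deviation. The sample names and values are made up for illustration.

```python
import math
import random
import statistics


def preprocess(data):
    """data maps sample name -> list of feature values (same feature order)."""
    # Sample normalization by median: divide each sample by its own median.
    normed = {}
    for name, values in data.items():
        med = statistics.median(values)
        normed[name] = [v / med for v in values]

    # Cube root transformation (sign-preserving).
    cubed = {
        name: [math.copysign(abs(v) ** (1.0 / 3.0), v) for v in values]
        for name, values in normed.items()
    }

    # Pareto scaling per feature: mean-centre, divide by sqrt of the SD.
    names = list(cubed)
    n_features = len(cubed[names[0]])
    scaled = {name: [] for name in names}
    for j in range(n_features):
        column = [cubed[name][j] for name in names]
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        for name in names:
            scaled[name].append((cubed[name][j] - mean) / math.sqrt(sd))
    return scaled


# Hypothetical toy data: 6 samples x 5 features, all positive intensities.
rng = random.Random(0)
original = {
    f"S{i}": [rng.uniform(1.0, 100.0) for _ in range(5)] for i in range(1, 7)
}

# Same samples in a different order (here simply reversed).
reordered = {name: original[name] for name in reversed(list(original))}

a = preprocess(original)
b = preprocess(reordered)

# Compare sample by sample, matching on name rather than on position.
order_invariant = all(
    math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12)
    for name in original
    for x, y in zip(a[name], b[name])
)
print("preprocessing is order-invariant:", order_invariant)
```

Under these assumptions each step depends only on which samples are present, not on their order, so normalization by itself would not be expected to explain a difference between the two files.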

More clearly: if the input data is the same except for the order of the samples, and I follow the exact same process, why do I get different results?

RBC_datatest.csv (1.3 MB)
RBC_data.csv (1.3 MB)
RBC_metadata_short.csv (618 Bytes)

Can you update your post based on this post?

I updated my post with the data attached. I also read the response to the recent question about group labels on PLS-DA. Does a small sample size also influence how the two-way ANOVA is calculated, so that a different sample order would influence the output?

Thanks for alerting us to this issue. There was indeed a problem with how we were handling the metadata, which is fixed now. In addition, while reviewing the code, we made some updates to the way the ANOVA tests are set up and are now using a new R package, so you can expect some interface changes.
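
The thread does not say what the metadata problem was. As an illustration only, one common way for sample order to change results is matching metadata to data columns by position instead of by sample name. The sketch below assumes that failure mode; the sample names, group labels, and helper functions are hypothetical and are not taken from the attached files or from MetaboAnalyst.

```python
# Hypothetical metadata: sample name -> concentration group (C, L, H).
metadata = {"S1": "C", "S2": "L", "S3": "H", "S4": "C", "S5": "L", "S6": "H"}
metadata_rows = list(metadata.items())  # row order of the metadata file

original_order = ["S1", "S2", "S3", "S4", "S5", "S6"]
grouped_order = ["S1", "S4", "S2", "S5", "S3", "S6"]  # same samples, grouped


def labels_by_position(sample_order):
    # Fragile (assumed failure mode): data column i gets metadata row i.
    return {s: metadata_rows[i][1] for i, s in enumerate(sample_order)}


def labels_by_name(sample_order):
    # Robust: look each sample up by its name.
    return {s: metadata[s] for s in sample_order}


by_position = labels_by_position(grouped_order)
by_name = labels_by_name(grouped_order)

# Samples whose group label is wrong under positional matching.
mislabeled = [s for s in grouped_order if by_position[s] != metadata[s]]
print("mislabeled with positional matching:", mislabeled)
print("name matching agrees with metadata:", by_name == metadata)
```

With name-based matching the group labels are identical for both column orders, so any downstream test sees the same design regardless of how the data file is sorted.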

Btw, while using your data, I noticed that you have the same subject IDs for C, L, H. I assume this is control, low, and high, and it seems unlikely that you have done repeated ‘exposures’ to the same subjects for multiple time points? Pls ignore if this is not the case!

We have made the changes internally - the online version will be updated very soon.

Hi @jess.ewald,
Thanks for the update! I will re-analyze my data afterwards.

And yes, I do have repeated measures across my two factors (time and concentration): it was an in vitro study using red blood cells, so they could be reused across many treatments.


Ok, in that case - the old version of the multi-factor ANOVA interface was not designed to handle this case, but I’ve added a dropdown to the updated version where you can specify whether the experimental factor is defined between or within (your case) subjects.
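
To illustrate why the between/within distinction matters for a repeated-measures design like this one, here is a minimal Python sketch with made-up numbers (not the RBC data). It is a two-condition simplification, not the ANOVA that MetaboAnalyst runs: it compares a t statistic that treats the two conditions as independent groups with one that pairs the measurements by subject. The function names are hypothetical.

```python
import math
import statistics

# Hypothetical values: four subjects, each measured at low and high concentration.
# Subjects differ a lot from each other; the treatment shift is small but consistent.
low = [10.0, 20.0, 30.0, 40.0]
high = [11.0, 21.5, 31.0, 41.5]


def between_subjects_t(x, y):
    # Treats x and y as independent groups (Welch-style standard error).
    se = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.fmean(y) - statistics.fmean(x)) / se


def within_subjects_t(x, y):
    # Pairs measurements by subject and tests the per-subject differences.
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / se


t_between = between_subjects_t(low, high)
t_within = within_subjects_t(low, high)
print("between-subjects t:", round(t_between, 3))
print("within-subjects t:", round(t_within, 3))
```

In this toy case the between-subjects statistic is swamped by subject-to-subject variation, while the within-subjects statistic removes it, which is why declaring the factor correctly can change which features come out as significant.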
