Hello, I am a new user of MetaboAnalystR. While attempting a workflow for multifactor metadata analysis with covariates there were a couple errors in the R directions from the MetaboAnalyst website. I have an example workflow below with fixes for these issue.
################################################################################
# MetaboAnalystR workflow
################################################################################
# Resest the workflow
mSet<-NULL
# Initialize object
mSet<-InitDataObjects("pktable", "mf", FALSE)
# metadata with covariates
mSet<-SetDesignType(mSet, "multi")
mSet<-Read.TextDataTs(mSetObj = mSet, filePath="feat.dlco.metab.txt", format="colmf");
mSet<-ReadMetaData(mSet, "class.dlco.txt");
# fix metadata types
# it seems MetaboAnalystR gets its datatypes from the metadata title.
# this section corrects all the metadata stuff, except for those labeled "orig"
mSet$dataSet$meta.types[1:3]<-c("cont","disc","cont")
mSet$dataSet$meta.info[,1]<-as.numeric(as.character(mSet$dataSet$meta.info[,1]))
mSet$dataSet$meta.info[,3]<-as.numeric(as.character(mSet$dataSet$meta.info[,3]))
mSet$dataSet$types.cls.lbl[1:3]<-c("cont","disc","cont")
mSet$dataSet$type.cls.lbl<-"cont"
mSet$dataSet$cls<-as.numeric(as.character(mSet$dataSet$cls))
# fixes some kind of missing internal variable
mSet$dataSet$cls.type<-"cont"
mSet<-SanityCheckData(mSet)
# replaces na's (but not 0's)
mSet<-ReplaceMin(mSet);
mSet<-SanityCheckMeta(mSet, 1)
mSet<-SetDataTypeOfMeta(mSet);
mSet<-SanityCheckData(mSet)
# Filter based on iqr, RSD, then logtransform then normalize
mSet<-FilterVariable(mSet, "iqr", 40, "F", 25, F)
mSet<-PreparePrenormData(mSet)
mSet<-Normalization(mSet, "QuantileNorm", "LogNorm", "NULL", ratio=FALSE, ratioNum=20)
# Covariate adjusted anova
# Directions in MetaboAnalystR indicate using square brackets instead of vector notation
# but that doesn't work
adj.vec <<- c("SmokingStatus", "AGE_YRS")
mSet<-CovariateScatter.Anal(mSet, "covariate_plot_0_dpi72.png", "png", "DLCO", "NA", "NA" , 0.05,"anova")
# Get plot for scatter plot
# Triple colon required for hidden functions
mSet <- MetaboAnalystR:::PlotCovariateMap(mSet, theme="default", imgName="", format="png", dpi=72)
# Get plot for individual metabolite
mSet <- MetaboAnalystR:::PlotMultiFacCmpdSummary(mSet, "134.8651018","DLCO", 0, "png", 72, width=NA)
This seems to get the intended analysis and figures (I am seeing similar issues for heatmap and pca, and will post solutions soon). However, while reading through the code on GitHub I found a couple lines of code that indicate a that for mixed models of continuous variables and factor variables the linear regression employed by MetaboAnalystR sets the intercept of the model to 0. The code that does this is shown below, and is in multifac_covariate.R at lines 503 through 507.
if(sum(types == "cont") == length(vars)){ #in case of single, continuous variable, must use different intercept or limma will give unreasonable results
design <- model.matrix(formula(paste0("~", paste0(" + ", vars, collapse = ""))), data = covariates);
} else {
design <- model.matrix(formula(paste0("~ 0", paste0(" + ", vars, collapse = ""))), data = covariates);
}
I would appreciate a explanation of the reasoning for this model adjustment, especially the comment “in case of single, continuous variable, must use different intercept or limma will give unreasonable results”. My understanding is that removing the intercept should generally have bad effects on the results, and I could not find a case where it was done in this way. I understand that this behavior can be avoided by setting metadata types to continuous, but as it stands I worry that people using the main website are using this intercept-less model without any awareness.