I have data for both lipid and metabolite concentrations, which are on different scales and in different units; for example, metabolite concentrations are in mmol/L and lipid concentrations are in mg/dL.
Is it better to run two separate analyses, one for the lipids alone and one for the metabolites alone, or can I run one analysis on all of my data? And why? Is the issue the units, or is it better in general to analyze these compounds separately?
Good question! Directly merging the data is usually not a good idea, as the difference in scale could make one dataset dominate the analysis. For instance, PCA is driven by variance, so features with large numeric values (and therefore large variances) tend to dominate the components.
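To make the scale problem concrete, here is a minimal sketch using simulated data (the values, feature counts, and the helper `pc1_loadings` are made up for illustration, not taken from your dataset). It merges a small-valued "metabolite" block with a large-valued "lipid" block without any scaling and checks how much of the first principal component is carried by the lipid columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 50

# Hypothetical data: 5 metabolites (mmol/L, small values) and
# 5 lipids (mg/dL, large values). Purely simulated for illustration.
metabolites = rng.normal(loc=1.0, scale=0.2, size=(n_samples, 5))
lipids = rng.normal(loc=150.0, scale=30.0, size=(n_samples, 5))
combined = np.hstack([metabolites, lipids])


def pc1_loadings(X):
    """Return the loadings of the first principal component.

    PCA is done by mean-centering the columns and taking the SVD;
    no scaling is applied, which is the point of this demo.
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


raw_pc1 = pc1_loadings(combined)

# Loadings have unit length, so the squared loadings sum to 1.
# This is the fraction of PC1 attributable to the lipid columns.
lipid_share_raw = float(np.sum(raw_pc1[5:] ** 2))
print(f"Lipid share of PC1 (unscaled data): {lipid_share_raw:.3f}")
```

With unscaled data like this, nearly all of PC1 comes from the lipid block simply because its numbers are larger, not because it is more informative.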
Here are some thoughts:
- If the main concern is the difference in units/scale, you can normalize/scale each dataset separately and then combine them
- For biomarker analysis, integrating features from both lipids and metabolites may improve performance
- Adding more features with the same number of samples can introduce more noise and increases the multiple-testing burden: after correction, statistical power drops; without correction, false positives rise
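The "scale each dataset, then combine" option could look roughly like the sketch below. It assumes samples are in rows and features in columns, uses simulated data, and the helper names `autoscale` and `combine_blocks` are hypothetical. The optional block weighting (dividing each block by the square root of its feature count) is an extra assumption on my part, not a requirement: it keeps a block with many features from outweighing a block with few.

```python
import numpy as np


def autoscale(block):
    """Z-score each column: mean 0, unit variance (sample std, ddof=1)."""
    mean = block.mean(axis=0)
    std = block.std(axis=0, ddof=1)
    return (block - mean) / std


def combine_blocks(blocks, block_weight=True):
    """Autoscale each block separately, then join them column-wise.

    If block_weight is True, each block is also divided by
    sqrt(number of features) so every block contributes the same
    total variance to the combined matrix (an optional choice).
    """
    scaled = []
    for block in blocks:
        z = autoscale(block)
        if block_weight:
            z = z / np.sqrt(block.shape[1])
        scaled.append(z)
    return np.hstack(scaled)


# Simulated example: 40 samples, 30 metabolites and 8 lipids.
rng = np.random.default_rng(0)
metabolites = rng.normal(1.0, 0.2, size=(40, 30))   # mmol/L-like values
lipids = rng.normal(150.0, 30.0, size=(40, 8))      # mg/dL-like values

X = combine_blocks([metabolites, lipids])
print(X.shape)
```

After this step the units no longer matter, and the combined matrix can go into PCA or a biomarker model. Depending on your data you may also want a log transform before scaling, but that is a separate decision.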
Maybe explore both ways (individually and then combined), and then let us know your findings!