OmicsAnalyst currently supports five commonly used methods:
- Multiple co-inertia analysis (MCIA)
- Consensus PCA (CPCA)
- Projection to latent structures (PLS)
- Procrustes Analysis
- Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO)
The key distinguishing features of the five dimension reduction algorithms are summarized below:
Algorithm | Symmetry | Orthogonality constraints | Supervision | Unique features |
MCIA | Symmetric | Individual and global | Unsupervised | Very similar to the more familiar canonical correlation analysis, but more robust to outliers and with fewer tunable parameters. It is performed in two steps. First, a one-table dimension reduction method is applied to each individual data set. Second, MCIA projects the two dimensionally reduced matrices into the same hyperspace while maximizing the covariance between them. |
CPCA | Symmetric | Individual only | Unsupervised | Because its orthogonality constraints are relaxed, this method can find overlapping components in the global score matrix. This is consistent with the idea that some features participate in multiple distinct biological processes, so it could recover more biologically realistic components. |
PLS | Asymmetric | Individual and global | Unsupervised | PLS deals efficiently with high dimensionality, collinearity, noise, and missing values, making it well suited to multi-omics data. OmicsAnalyst runs PLS in regression mode, meaning that components from one 'omics data set are chosen based on their ability to predict components in the other. This makes it asymmetric: the order in which the data sets are uploaded changes the results. |
Procrustes Analysis | Asymmetric | Individual only | Unsupervised | Procrustes analysis (PA) is a fast and simple visualization technique that superimposes the principal components of two data sets in a low-dimensional space. It first computes reduced dimensions for each data set using a method similar to PCA. Then one of the reduced-dimension matrices is rotated until it has maximum similarity with the other. Procrustes is one of the most widely used multivariate dimension reduction methods for multi-omics data. |
DIABLO | Symmetric | Individual and global | Supervised | The only supervised approach. DIABLO is a multi-block partial least squares discriminant analysis (multi PLS-DA), and thus finds components in the shared covariance space that maximally separate the sample groups specified in the metadata. |
Note: Symmetry refers to whether the order in which the data sets are analyzed/uploaded gives the same (symmetric) or different (asymmetric) results; Orthogonality constraints refers to which components must be orthogonal to each other: individual only means components must be orthogonal only within the set computed for a single 'omics data set, whereas individual and global means components must also be orthogonal within the shared covariance space; Supervision refers to whether the method takes the sample labels into consideration when computing the components.
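The two Procrustes steps described in the table (reduce each data set, then rotate one reduced matrix onto the other) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: plain PCA via SVD stands in for the "method similar to PCA", the superimposition is an orthogonal rotation with a single scaling factor, and the function names `pca_scores` and `procrustes_align` as well as the simulated data are hypothetical. It is not OmicsAnalyst's implementation.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def procrustes_align(A, B):
    """Superimpose B onto A; return standardized A, aligned B, and disparity."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 = A0 / np.linalg.norm(A0)  # scale both to unit Frobenius norm
    B0 = B0 / np.linalg.norm(B0)
    # Orthogonal matrix R (rotation, possibly with a reflection) that
    # minimizes ||B0 @ R - A0||, obtained from the SVD of B0.T @ A0.
    U, S, Vt = np.linalg.svd(B0.T @ A0)
    R = U @ Vt
    scale = S.sum()  # optimal isotropic scaling for unit-norm inputs
    B_aligned = scale * (B0 @ R)
    disparity = float(np.sum((A0 - B_aligned) ** 2))  # residual sum of squares
    return A0, B_aligned, disparity


# Simulated example (hypothetical data): two 'omics tables measured on the
# same 30 samples and driven by the same two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
omics1 = latent @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(30, 50))
omics2 = latent @ rng.normal(size=(2, 40)) + 0.1 * rng.normal(size=(30, 40))

scores1 = pca_scores(omics1)
scores2 = pca_scores(omics2)
reference, aligned, disparity = procrustes_align(scores1, scores2)
print(f"Procrustes disparity: {disparity:.3f}")
```

The disparity lies between 0 and 1 for standardized inputs, with 0 meaning the two sample configurations coincide after rotation and scaling. Only the second argument is rotated, so the choice of reference data set determines which configuration the plot is drawn in.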