Spectrum is a self-tuning spectral clustering method for multi-omics data. It combines the strengths of several other methods: an adaptive density-aware kernel strengthens connections in the graph based on common nearest neighbours, a tensor product graph diffusion method reduces noise while integrating different data sources, a generalized eigengap method automatically determines the optimal number of sample clusters, and spectral clustering defines the sample clusters because of its speed on large graphs. OmicsAnalyst uses the multi-modality mode in the Spectrum R package, which finds the optimal number of clusters for both Gaussian and non-Gaussian distributed data. More details …
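The idea of reading the number of clusters off the eigenvalue spectrum can be illustrated with a minimal Python sketch. It assumes a plain Gaussian kernel with a fixed bandwidth instead of Spectrum's adaptive density-aware kernel, and the classic eigengap heuristic instead of the generalized variant. The function name `eigengap_k` and the parameters `sigma` and `max_k` are hypothetical and are not part of the Spectrum R package.

```python
import numpy as np

def eigengap_k(X, sigma=1.0, max_k=10):
    """Estimate the number of clusters from the largest eigengap.

    Illustrative only: fixed-bandwidth Gaussian kernel, classic heuristic.
    """
    # Pairwise squared Euclidean distances between samples (rows of X)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity matrix with self-similarities removed
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetrically normalised affinity D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Eigenvalues sorted from largest to smallest
    evals = np.linalg.eigvalsh(L)[::-1]
    # Gap between consecutive eigenvalues; the largest gap marks k
    gaps = evals[:max_k] - evals[1:max_k + 1]
    return int(np.argmax(gaps)) + 1

# Toy data: three well-separated groups of 20 samples with 2 features each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.3, size=(20, 2)) for c in centers])

k = eigengap_k(X)
print(k)
```

On data with well-separated groups, the leading eigenvalues sit close to 1 (one per group) and then drop sharply, so the position of the drop gives the cluster count.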
Similarity Network Fusion (SNF) generates an integrated sample similarity matrix from multiple 'omics datasets by first computing similarity matrices for each dataset individually, and then fusing them together. Individual similarity matrices are computed using an exponential similarity kernel that scales the Euclidean distance between samples. These matrices are then fused together by an iterative approach that adjusts each matrix to make it more similar to the others. The SNF algorithm is iterated until the matrices converge. OmicsAnalyst then uses the clustering method from the Spectrum R package to define sample clusters in the SNF matrix (refer to the Spectrum method above). More details …
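The two steps described above (a scaled exponential kernel per dataset, then iterative fusion) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code OmicsAnalyst or any SNF package runs: the function names, the parameters `k`, `mu` and `n_iter`, the fixed iteration count used in place of a convergence check, and the re-normalisation after each step are all choices made for this example.

```python
import numpy as np

def affinity(X, k=5, mu=0.5):
    """Exponential similarity kernel on locally scaled Euclidean distances."""
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    # Mean distance from each sample to its k nearest neighbours (self excluded)
    knn_mean = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)
    # Local scale for each pair of samples
    eps = (knn_mean[:, None] + knn_mean[None, :] + dist) / 3.0
    return np.exp(-dist ** 2 / (mu * eps))

def full_kernel(W):
    """Row-normalise so off-diagonal entries sum to 1/2; diagonal is 1/2."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k=5):
    """Keep each sample's k strongest neighbours, then row-normalise."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, idx] = W[rows, idx]
    return S / S.sum(axis=1, keepdims=True)

def fuse(views, k=5, mu=0.5, n_iter=20):
    """Fuse per-dataset similarity matrices into one sample similarity matrix."""
    Ws = [affinity(X, k, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [local_kernel(W, k) for W in Ws]
    m = len(views)
    for _ in range(n_iter):
        updated = []
        for v in range(m):
            # Average of the other datasets' current matrices
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            # Diffuse that average through this dataset's local structure
            updated.append(full_kernel(Ss[v] @ others @ Ss[v].T))
        Ps = updated
    fused = sum(Ps) / m
    return (fused + fused.T) / 2.0  # symmetrise the final matrix

# Toy example: two datasets measured on the same 30 samples (two groups of 15)
rng = np.random.default_rng(1)
shift = np.repeat([0.0, 6.0], 15)[:, None]
view_a = rng.normal(0.0, 1.0, size=(30, 4)) + shift
view_b = rng.normal(0.0, 1.0, size=(30, 6)) + shift

fused = fuse([view_a, view_b])
print(fused.shape)
```

The fused matrix is what a downstream clustering step would take as input; samples from the same group end up with higher fused similarity than samples from different groups.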
Perturbation clustering methods implemented in the PINSPlus R package are founded on the idea that even truly homogeneous populations will have small differences in 'omics features due to measurement error and natural variability, and that clusters corresponding to true sub-populations should be robust to this variation. The approach is to repeatedly add small amounts of noise to each 'omics dataset, and then cluster the samples. The number of clusters that gives the most stable sample similarity matrix is retained, and the corresponding similarity matrices from each 'omics dataset are integrated to find connections that are stable across multiple datasets. Finally, k-means clustering on the integrated matrix is used to group samples into the optimal number of clusters found in the previous step. This method is not suggested for datasets with a large number of samples (>50). More details …
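The perturb-and-recluster step for a single dataset can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the PINSPlus implementation: the small deterministic k-means, the function names, the noise level `noise_sd`, and the stability score (the fraction of sample pairs whose co-clustering is unchanged by the noise) are all chosen for this example and may differ from what the package computes.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50):
    """Small k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every sample to its closest chosen centre
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def connectivity(labels):
    """Sample-by-sample matrix: 1 if two samples share a cluster, else 0."""
    return (labels[:, None] == labels[None, :]).astype(float)

def stability(X, k, n_perturb=20, noise_sd=0.3, seed=0):
    """Mean agreement between the original and the perturbed connectivity."""
    rng = np.random.default_rng(seed)
    base = connectivity(kmeans_labels(X, k))
    agreement = []
    for _ in range(n_perturb):
        # Add a small amount of Gaussian noise, then cluster again
        noisy = X + rng.normal(0.0, noise_sd, size=X.shape)
        agreement.append(np.mean(connectivity(kmeans_labels(noisy, k)) == base))
    return float(np.mean(agreement))

# Toy data: 45 samples in three well-separated groups
rng = np.random.default_rng(2)
group_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(15, 2)) for c in group_centers])

# Score each candidate number of clusters; ties go to the smaller k
scores = {k: stability(X, k) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A score of 1 means the noise never changed which samples cluster together. In a multi-omics setting, the connectivity matrices obtained at the chosen number of clusters would then be combined across datasets, as described above.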