It is critical to properly label your data so that they can be recognized and compared. The following common IDs are supported:
- Gene ID: Entrez ID, Ensembl Gene ID, GenBank Accession ID, RefSeq ID, Ensembl Transcript ID, and official Gene Symbol;
- Probe ID (for human, mouse and rat only): popular microarray platforms from Affymetrix, Agilent and Illumina.
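If you are not sure which ID type your table uses, a quick pattern check on the row IDs can help before you pick one. The sketch below is only an illustration, not the tool's own ID detection. It assumes the regular expressions shown (human/mouse/rat-style Ensembl prefixes, RefSeq NM_/NR_/XM_/XR_, common GenBank, Affymetrix, Agilent and Illumina formats) and a hypothetical `guess_id_type` helper. Official gene symbols have no fixed format, so they fall through to `None`.

```python
import re

# Hypothetical heuristics for common accession formats, checked in order.
# The first pattern matched by at least `min_fraction` of the IDs wins.
# This is an illustration, not the tool's own ID detection logic.
ID_PATTERNS = [
    ("Ensembl Gene ID", re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")),
    ("Ensembl Transcript ID", re.compile(r"^ENS[A-Z]*T\d{11}(\.\d+)?$")),
    ("RefSeq ID", re.compile(r"^[NX][MR]_\d+(\.\d+)?$")),
    ("Entrez ID", re.compile(r"^\d+$")),
    ("GenBank Accession ID",
     re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(\.\d+)?$")),
    ("Affymetrix probe ID", re.compile(r"^(\d+(_[a-z])?_at|AFFX-.+)$")),
    ("Agilent probe ID", re.compile(r"^A_\d+_P\d+$")),
    ("Illumina probe ID", re.compile(r"^ILMN_\d+$")),
]


def guess_id_type(ids, min_fraction=0.9):
    """Return the first ID type matched by at least `min_fraction` of the IDs.

    Returns None when no pattern fits, e.g. for official gene symbols
    (which have no fixed format) or for unsupported platforms.
    """
    ids = [i.strip() for i in ids if i.strip()]
    if not ids:
        return None
    for name, pattern in ID_PATTERNS:
        hits = sum(1 for i in ids if pattern.match(i))
        if hits / len(ids) >= min_fraction:
            return name
    return None


print(guess_id_type(["ENSG00000139618", "ENSG00000141510"]))
```

A `None` result does not mean the IDs are invalid; it only means none of these assumed patterns fit, so check whether they are gene symbols or a platform that is not listed above.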
The gene expression data should also contain the sample names in the first line (the line beginning with “#NAME”), and each sample name must be unique. The class labels of the experimental conditions go on a separate line beginning with “#CLASS”. To specify multiple class labels, append a colon and the label name (for example, “#CLASS:cancer_type” and “#CLASS:stage”). For meta-analysis, the same set of class labels must be used for ALL datasets.
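To check a file against these rules before uploading, you can parse it yourself. The following is a minimal sketch that assumes a tab-delimited file whose first field is “#NAME”, with empty cells as missing values; `parse_expression_table` and `check_meta_labels` are hypothetical helpers, not part of the tool. For the meta-analysis check, we read "the same set of labels" as the same label names and the same label values in every dataset.

```python
import math


def parse_expression_table(text):
    """Parse a tab-delimited expression table with #NAME and #CLASS rows.

    Returns (samples, classes, data):
      samples: list of sample names from the #NAME line
      classes: label name -> list of class labels ("CLASS" for a bare #CLASS)
      data:    gene/probe ID -> list of floats (math.nan for empty cells)
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if not rows or rows[0][0] != "#NAME":
        raise ValueError("the first line must start with #NAME")
    samples = rows[0][1:]
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must be unique")
    classes, data = {}, {}
    for row in rows[1:]:
        key, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{key}: expected {len(samples)} values, got {len(values)}")
        if key.startswith("#CLASS"):
            # "#CLASS" -> "CLASS", "#CLASS:stage" -> "stage"
            name = key.split(":", 1)[1] if ":" in key else "CLASS"
            classes[name] = values
        else:
            # An empty cell is treated as a missing value
            data[key] = [float(v) if v.strip() else math.nan for v in values]
    return samples, classes, data


def check_meta_labels(datasets):
    """Raise if parsed datasets do not share the same class-label names and values."""
    def label_sets(classes):
        return {name: set(labels) for name, labels in classes.items()}

    reference = label_sets(datasets[0][1])
    for index, (_, classes, _) in enumerate(datasets[1:], start=2):
        if label_sets(classes) != reference:
            raise ValueError(f"dataset {index} uses a different set of class labels")
```

Each dataset passed to `check_meta_labels` is the tuple returned by `parse_expression_table`, so a typical call is `check_meta_labels([parse_expression_table(t) for t in texts])`.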
How to format a gene expression table?
Here is a good tutorial on how to generate tab-delimited text files from Excel. When you open your data in any text editor (for example, WordPad), it should look like the following:
Sample name, one class label (one missing value)
#NAME Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
#CLASS case case case case control control control control
Gene1 -3.06 -2.25 -1.15 -6.64 0.4 1.08 1.22 1.02
Gene2 -1.36 -0.67 -0.17 -0.97 -2.32 -5.06 0.28 1.32
Gene3 1.61 -0.27 0.71 -0.62 0.14 0.11 0.98
Gene4 0.93 1.29 -0.23 -0.74 -2 -1.25 1.07 1.27
Sample name, two class labels (cancer and sex)
#NAME Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
#CLASS:CANCER case case case case control control control control
#CLASS:SEX F F M M F M F M
Gene1 -3.06 -2.25 -1.15 -6.64 0.4 1.08 1.22 1.02
Gene2 -1.36 -0.67 -0.17 -0.97 -2.32 -5.06 0.28 1.32
Gene3 1.61 -0.27 0.71 -0.62 0.14 0.11 0.98
Gene4 0.93 1.29 -0.23 -0.74 -2 -1.25 1.07 1.27
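If you prefer to generate the file programmatically rather than exporting it from Excel, Python's standard `csv` module can write it with a tab delimiter. This is a sketch under stated assumptions: `write_expression_table` is a hypothetical helper, class labels are passed as a dictionary (key `None` for a bare “#CLASS” line), and `None` in the values produces an empty cell for a missing value. The usage below writes the first two genes of the two-label example.

```python
import csv
import io


def write_expression_table(handle, samples, classes, data):
    """Write sample names, class labels and expression values as tab-delimited text.

    classes: label name -> list of labels; the key None writes a bare "#CLASS" line.
    data:    gene/probe ID -> list of values; None becomes an empty (missing) cell.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#NAME", *samples])
    for name, labels in classes.items():
        key = "#CLASS" if name is None else f"#CLASS:{name}"
        writer.writerow([key, *labels])
    for gene, values in data.items():
        writer.writerow([gene, *("" if v is None else v for v in values)])


# Usage: the first two genes of the two-label example, written to memory.
# Pass an open file (open("data.txt", "w", newline="")) to write to disk instead.
buf = io.StringIO()
write_expression_table(
    buf,
    [f"Sample{i}" for i in range(1, 9)],
    {
        "CANCER": ["case"] * 4 + ["control"] * 4,
        "SEX": ["F", "F", "M", "M", "F", "M", "F", "M"],
    },
    {
        "Gene1": [-3.06, -2.25, -1.15, -6.64, 0.4, 1.08, 1.22, 1.02],
        "Gene2": [-1.36, -0.67, -0.17, -0.97, -2.32, -5.06, 0.28, 1.32],
    },
)
print(buf.getvalue())
```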
What if my microarray platform or organism is not supported?
You have three options:
- Keep the ID Type as “Not Specified”. You can still perform statistical analyses (differential expression, meta-analysis, volcano plots, heatmaps, etc.);
- Use the microarray annotation file to map probes to one of the supported gene IDs (Entrez, RefSeq, Ensembl, etc.);
- Support for other model organisms or platforms can be added based on user requests, so feel free to send us your suggestions. Note that this may take a while, depending on available time.
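The second option, annotating probes to gene IDs, can be sketched as follows. This assumes you have already extracted a probe-to-gene mapping from the platform's annotation file; `collapse_probes` is a hypothetical helper. Probes without an annotation are dropped, and probes that map to the same gene are averaged (ignoring missing values). Averaging is only one common choice; keeping the most highly expressed probe per gene is another, and this is not necessarily what the tool does internally.

```python
import math
from collections import defaultdict


def collapse_probes(data, probe_to_gene):
    """Map probe IDs to gene IDs, averaging probes that hit the same gene.

    data:          probe ID -> list of floats (math.nan for missing values)
    probe_to_gene: probe ID -> gene ID, taken from the annotation file
    Probes with no annotation are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in data.items():
        gene = probe_to_gene.get(probe)
        if gene:
            grouped[gene].append(values)
    collapsed = {}
    for gene, rows in grouped.items():
        means = []
        for column in zip(*rows):
            # Average each sample column over this gene's probes, skipping NaNs
            present = [v for v in column if not math.isnan(v)]
            means.append(sum(present) / len(present) if present else math.nan)
        collapsed[gene] = means
    return collapsed


# Toy example with hypothetical probe and gene IDs
probes = {"probe1": [1.0, 2.0], "probe2": [3.0, math.nan], "probe3": [5.0, 6.0]}
mapping = {"probe1": "geneA", "probe2": "geneA", "probe3": "geneB"}
print(collapse_probes(probes, mapping))
```

After collapsing, choose the matching gene ID type (for example, Entrez ID) when you upload the resulting table.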