How to prepare a gene expression table for ExpressAnalyst?

It is critical to properly label your data so that they can be recognized and compared. The following common IDs are supported:

  1. Gene ID: Entrez ID, Ensembl Gene ID, GenBank Accession ID, RefSeq ID, Ensembl Transcript ID, and official Gene Symbol
  2. Probe ID (for human, mouse and rat only): popular microarray platforms from Affymetrix, Agilent and Illumina

The gene expression data should also contain sample names on the first line, which begins with “#NAME”. Each sample name must be unique. The class labels of the experimental conditions should be on a new line beginning with “#CLASS”. To provide multiple class labels, use one “#CLASS” line per label and add a colon followed by the label name (for example, “#CLASS:cancer_type” and “#CLASS:stage”). For meta-analysis, the same set of class labels must be used for ALL datasets.

  • How to format a gene expression table?

    Here is a good tutorial on how to generate tab-delimited text files from Excel. When you open your data in any text editor (for example, WordPad), it should look like the following:

    • Sample name, one class label (one missing value)

      #NAME	Sample1	Sample2	Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
      #CLASS	case	case	case	case	control	control	control	control
      Gene1	-3.06	-2.25	-1.15	-6.64	0.4	1.08	1.22	1.02
      Gene2	-1.36	-0.67	-0.17	-0.97	-2.32	-5.06	0.28	1.32
      Gene3	1.61	-0.27	0.71	-0.62	0.14		0.11	0.98
      Gene4	0.93	1.29	-0.23	-0.74	-2	-1.25	1.07	1.27
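
    A table like the one above can also be written programmatically instead of exported from a spreadsheet. The following is a minimal sketch using only the Python standard library. The function name `write_expression_table` and its arguments are hypothetical and not part of ExpressAnalyst, and the values are a shortened version of the example data.

```python
import csv
import io


def write_expression_table(handle, samples, classes, genes):
    """Write a tab-delimited table with #NAME and #CLASS header lines.

    samples: list of unique sample names
    classes: dict mapping a label name ("" for a single unnamed label)
             to one class value per sample
    genes:   dict mapping a gene ID to one value per sample (None = missing)
    """
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must be unique")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#NAME", *samples])
    for name, labels in classes.items():
        if len(labels) != len(samples):
            raise ValueError("one class label per sample is required")
        writer.writerow(["#CLASS:" + name if name else "#CLASS", *labels])
    for gene_id, values in genes.items():
        if len(values) != len(samples):
            raise ValueError(f"{gene_id}: one value per sample is required")
        # A missing value is written as an empty field between two tabs
        writer.writerow([gene_id, *("" if v is None else v for v in values)])


buffer = io.StringIO()
write_expression_table(
    buffer,
    samples=["Sample1", "Sample2", "Sample3", "Sample4"],
    classes={"": ["case", "case", "control", "control"]},
    genes={"Gene1": [-3.06, -2.25, 0.4, 1.08], "Gene3": [1.61, None, 0.14, 0.98]},
)
table_text = buffer.getvalue()
print(table_text)
```

    To write to disk, pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.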
      
    • Sample name, two class labels (cancer and sex)

      #NAME	Sample1	Sample2	Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
      #CLASS:CANCER	case	case	case	case	control	control	control	control
      #CLASS:SEX	F	F	M	M	F	M	F	M
      Gene1	-3.06	-2.25	-1.15	-6.64	0.4	1.08	1.22	1.02
      Gene2	-1.36	-0.67	-0.17	-0.97	-2.32	-5.06	0.28	1.32
      Gene3	1.61	-0.27	0.71	-0.62	0.14		0.11	0.98
      Gene4	0.93	1.29	-0.23	-0.74	-2	-1.25	1.07	1.27
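
    Before uploading, it can help to check a file against the rules described above. The sketch below is one possible check written with the Python standard library. It is not ExpressAnalyst's own validation, the function name `check_expression_table` is hypothetical, and the requirement of at least one “#CLASS” line is an assumption based on the examples.

```python
import csv
import io


def check_expression_table(text):
    """Parse a tab-delimited expression table and return a summary.

    Raises ValueError for a missing #NAME line, duplicate sample names,
    a missing #CLASS line, or rows whose length differs from the header.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows or rows[0][0] != "#NAME":
        raise ValueError("first line must start with #NAME")
    samples = rows[0][1:]
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must be unique")

    classes, missing, n_genes = {}, [], 0
    for row in rows[1:]:
        if len(row) != len(samples) + 1:
            raise ValueError(f"{row[0]}: expected {len(samples)} values")
        if row[0].startswith("#CLASS"):
            # "#CLASS:SEX" -> "SEX"; a bare "#CLASS" gets an empty name
            classes[row[0].partition(":")[2]] = row[1:]
        else:
            n_genes += 1
            # Record every empty field as a (gene, sample) pair
            missing += [(row[0], s) for s, v in zip(samples, row[1:]) if v == ""]
    if not classes:
        raise ValueError("at least one #CLASS line is required")  # assumption
    return {"samples": samples, "classes": classes,
            "genes": n_genes, "missing": missing}


example = (
    "#NAME\tSample1\tSample2\tSample3\tSample4\n"
    "#CLASS:CANCER\tcase\tcase\tcontrol\tcontrol\n"
    "#CLASS:SEX\tF\tM\tF\tM\n"
    "Gene1\t-3.06\t-2.25\t0.4\t1.08\n"
    "Gene3\t1.61\t\t0.14\t0.98\n"
)
summary = check_expression_table(example)
print(list(summary["classes"]), summary["missing"])
```

    A row whose first field is separated by spaces rather than a tab is parsed as a single field and fails the length check, which is a common cause of upload problems with hand-edited files.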
      
  • What if my microarray platform or organism is not supported?

    You have three options:

    1. Keep the ID Type as “Not Specified”. You can still perform statistical analysis (differential expression, meta-analysis, volcano plot, heatmap, etc.).
    2. Use the microarray annotation file to map probes to one of the supported gene IDs (Entrez, RefSeq, Ensembl, etc.).
    3. Support for other model organisms/platforms can be added based on user requests. Feel free to send us your suggestions. Note that this could take a while, depending on the time available.
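
    The second option, mapping probes to gene IDs with an annotation file, could look roughly like the sketch below. It assumes a simplified annotation file with two tab-delimited columns (probe ID, then gene ID). Real vendor annotation files have different layouts and need their own parsing. The function name `annotate_probes` and the probe names are made up for illustration. Probes that map to the same gene are kept as separate rows, so any merging is left to the reader.

```python
import csv
import io


def annotate_probes(table_text, annotation_text):
    """Replace probe IDs in the first column with gene IDs.

    annotation_text is assumed to be tab-delimited with two columns:
    probe ID and gene ID (a hypothetical layout; adapt to the real file).
    Probes without a mapping are dropped; header lines are kept as-is.
    """
    probe_to_gene = {
        row[0]: row[1]
        for row in csv.reader(io.StringIO(annotation_text), delimiter="\t")
        if len(row) >= 2 and row[1]
    }
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            writer.writerow(row)  # #NAME and #CLASS lines pass through
        elif row[0] in probe_to_gene:
            writer.writerow([probe_to_gene[row[0]], *row[1:]])
    return out.getvalue()


table = (
    "#NAME\tSample1\tSample2\n"
    "#CLASS\tcase\tcontrol\n"
    "probe_a\t-3.06\t0.4\n"
    "probe_b\t1.61\t0.14\n"
)
# Made-up annotation: probe_a has a gene ID, probe_b has none
annotation = "probe_a\t7157\nprobe_b\t\n"
annotated = annotate_probes(table, annotation)
print(annotated)
```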