How is the raw RNA-seq data processed in ExpressAnalyst?

ExpressAnalyst processes raw RNA-seq data for species without a reference genome in three main steps, all of which are performed by the Seq2Fun software.

  1. Raw read quality control, including adapter detection and removal, removal of low-quality reads and bases, and error correction. This step removes low-quality and too-short reads; trims low-quality bases; removes sequencing adapters, poly(A) tails, and low-complexity reads; performs error correction on the overlapping region of paired-end reads; and joins the overlapping paired-end reads.
  2. Clean read alignment via translated search against a protein ortholog database. Each clean read is translated into all possible amino acid sequences using the six reading frames, and the longest of these are used to identify its homologous sequence in the protein database.
  3. Summarization of results into gene abundance tables and figures. The gene/ortholog abundance table is generated by summarizing all reads that are mapped to the same ortholog group.
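
To make steps 2 and 3 more concrete, here is a minimal Python sketch of six-frame translation and per-ortholog read counting. It is an illustration only, not the Seq2Fun implementation. All function names are hypothetical, the standard genetic code is assumed, and both the `top_n` cutoff and the rule of splitting fragments at stop codons are assumptions, since the text above only says that the longest translations are used. The database search itself is not shown; `read_hits` stands in for its result.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(read):
    """Translate a read in three forward and three reverse-complement frames."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def longest_fragments(read, top_n=2):
    """Split each frame at stop codons ('*') and keep the top_n longest pieces.

    top_n and the stop-codon splitting are illustrative assumptions.
    """
    fragments = set()
    for protein in six_frame_translations(read):
        fragments.update(f for f in protein.split("*") if f)
    # Longest first; ties broken alphabetically so the result is deterministic.
    return sorted(fragments, key=lambda f: (-len(f), f))[:top_n]


def summarize_by_ortholog(read_hits):
    """Count reads per ortholog group; reads without a hit (None) are skipped."""
    return Counter(og for og in read_hits.values() if og is not None)


if __name__ == "__main__":
    print(longest_fragments("ATGGCCTAA", top_n=1))
    # Hypothetical search results: read ID -> ortholog group ID (or None).
    hits = {"r1": "OG1", "r2": "OG1", "r3": None, "r4": "OG2"}
    print(dict(summarize_by_ortholog(hits)))
```

Running the same counting step once per sample and joining the counters by ortholog ID would give a table with one row per ortholog group and one column per sample, which is the general shape of the abundance table described in step 3.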

For more details, please read our paper (Peng Liu, et al. (2021)).