ExpressAnalyst processes raw RNA-seq data for species without a reference genome in three main steps, all of which are performed by the Seq2Fun software:
- Raw read quality control. This step removes low-quality and too-short reads; trims low-quality bases; removes sequencing adapters, poly(A) tails, and low-complexity reads; corrects errors in the overlapping region of paired-end reads; and joins the overlapping paired-end reads.
- Clean read alignment via translated search against a protein ortholog database. Each clean read is translated into all possible amino acid sequences using the six reading frames, and the longest of these are used to identify homologous sequences in the protein database.
- Summarization of results into gene abundance tables and figures. The gene/ortholog abundance table is generated by summarizing all reads that are mapped to the same ortholog group.
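
To make the second and third steps more concrete, here is a minimal Python sketch of six-frame translation and per-ortholog read counting. It is an illustration only, not Seq2Fun's actual implementation: the function names, the `top_n` parameter, and the ortholog IDs are hypothetical. The database search itself is not shown, so the read-to-ortholog mapping is assumed to be given.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Translate a read in three forward and three reverse reading frames."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def longest_fragments(read, top_n=2):
    """Split each frame at stop codons ('*') and keep the longest pieces.

    top_n is a hypothetical parameter chosen for this sketch.
    """
    fragments = []
    for protein in six_frame_translations(read):
        fragments.extend(piece for piece in protein.split("*") if piece)
    return sorted(fragments, key=len, reverse=True)[:top_n]


def summarize_abundance(read_to_ortholog):
    """Count reads per ortholog group, skipping unmapped reads (None)."""
    return Counter(o for o in read_to_ortholog.values() if o is not None)


if __name__ == "__main__":
    print(six_frame_translations("ATGGCCTAAGGG"))
    print(longest_fragments("ATGGCCTAAGGG"))
    # Hypothetical mapping, standing in for the result of the protein search.
    print(summarize_abundance({"r1": "K00001", "r2": "K00001", "r3": None}))
```

For the read `ATGGCCTAAGGG`, the first forward frame translates to `MA*G`, where `*` marks a stop codon. Splitting at stop codons before ranking by length is one simple way to pick candidate peptide fragments for the search.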
For more details, please read our paper (Peng Liu et al., 2021).