Chun-Hsi Tso, Jen-Leih Wu, Ming-Wei Lu.
Abstract
BACKGROUND: Transcriptome analysis by next-generation sequencing has become a popular technique in recent years. The approach is well suited to the study of non-model organisms, because de novo assembly does not depend on prior genomic sequences of the organism. De novo sequencing has benefited many studies of commercially important fish species. To understand the functions of the assembled sequences, however, they still need to be annotated against existing sequence databases. By combining the Basic Local Alignment Search Tool (BLAST) with Gene Ontology analysis, we were able to identify homologs of the assembled sequences and describe their characteristics using pre-defined tags for each gene. For non-model assembled sequences, though, the results of this conventional annotation were still limited by a lack of pre-defined tags and by poorly documented records in the database.
Keywords: Functional enrichment analysis; Non-model organism; RNA-Seq; Transcriptome
Year: 2020 PMID: 32366294 PMCID: PMC7199347 DOI: 10.1186/s12859-020-3507-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Workflow outline of Blast2Fish. The illustration summarizes the Blast2Fish workflow. The dotted frame marks the Blast2Fish system. The pipeline starts from the gene sequence file. Blast2Fish outputs MeSH terms, taxonomic distribution, and reference resources
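The "Annotated queries" column in the tables that follow can be read as the number of distinct query sequences linked to each MeSH term. The sketch below illustrates that counting step only. It assumes three hypothetical lookup tables (query to BLAST hits, hit to PubMed IDs, PubMed ID to MeSH terms) and toy identifiers; it is not Blast2Fish's actual implementation, and it does not reproduce the Score column, whose formula is not given in this record.

```python
from collections import defaultdict


def count_annotated_queries(query_hits, hit_pmids, pmid_mesh):
    """Return {mesh_term: number of distinct queries linked to that term}.

    All three mappings are hypothetical inputs for illustration:
      query_hits: query ID -> list of BLAST hit IDs
      hit_pmids:  hit ID   -> list of PubMed IDs citing that sequence
      pmid_mesh:  PubMed ID -> list of MeSH terms indexed for that article
    """
    term_queries = defaultdict(set)
    for query, hits in query_hits.items():
        for hit in hits:
            for pmid in hit_pmids.get(hit, ()):
                for term in pmid_mesh.get(pmid, ()):
                    # A set ensures each query is counted once per term.
                    term_queries[term].add(query)
    return {term: len(queries) for term, queries in term_queries.items()}


# Toy data (invented identifiers, not from the paper).
query_hits = {
    "contig_1": ["hitA", "hitB"],
    "contig_2": ["hitB"],
    "contig_3": ["hitC"],  # hitC has no linked article, so it adds nothing
}
hit_pmids = {"hitA": ["pmid1"], "hitB": ["pmid2"]}
pmid_mesh = {
    "pmid1": ["Signal Transduction"],
    "pmid2": ["Signal Transduction", "Immunity, Innate"],
}

mesh_counts = count_annotated_queries(query_hits, hit_pmids, pmid_mesh)
for term, n in sorted(mesh_counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term}\t{n}")
```

Collecting queries in a set per term keeps a query from being counted twice when several of its hits lead to the same MeSH term.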
Top 20 MeSH major terms of the demonstration analysis
| MeSH terms | Score | Annotated queries |
|---|---|---|
| Signal Transduction | 16.80 | 127 |
| Genes, MHC Class I | 15.19 | 56 |
| Genes, MHC Class II | 9.31 | 54 |
| Immunity, Innate | 8.91 | 75 |
| Cell Movement | 6.01 | 38 |
| Genome, Viral | 5.50 | 6 |
| Gene Targeting | 5.44 | 10 |
| Host-Pathogen Interactions | 4.68 | 20 |
| Cell Differentiation | 4.26 | 34 |
| Body Patterning | 3.84 | 35 |
| Genetic Predisposition to Disease | 3.80 | 26 |
| Genes, Immunoglobulin | 3.68 | 31 |
| Gene Regulatory Networks | 3.55 | 24 |
| Gene Rearrangement, B-Lymphocyte, Light Chain | 3.38 | 17 |
| Flatfishes | 3.31 | 38 |
| Embryonic Development | 3.18 | 34 |
| Neovascularization, Physiologic | 3.10 | 22 |
| DNA Methylation | 2.69 | 17 |
| Wound Healing | 2.56 | 15 |
| Apoptosis | 2.51 | 23 |
Fig. 2 Distribution of the top 40 major MeSH terms in the demonstration analysis. Visualization of the annotated major MeSH term counts, as shown on the result page
Top 10 MeSH minor terms of the demonstration analysis
| MeSH terms | Score | Annotated queries |
|---|---|---|
| Signal Transduction | 76.74 | 385 |
| Fish Diseases | 72.10 | 309 |
| Brain | 37.89 | 241 |
| Liver | 37.82 | 241 |
| Transcription Factors | 34.39 | 216 |
| Cell Differentiation | 34.35 | 204 |
| Nerve Tissue Proteins | 29.67 | 162 |
| | 27.38 | 183 |
| Cell Movement | 26.97 | 183 |
| Carrier Proteins | 25.72 | 123 |
Top 10 immune system-specific major MeSH terms of the demonstration analysis
| MeSH terms | Score | Annotated queries |
|---|---|---|
| Genes, MHC Class I | 15.19 | 56 |
| Genes, MHC Class II | 9.31 | 54 |
| Immunity, Innate | 8.91 | 75 |
| Genes, Immunoglobulin | 3.68 | 31 |
| Gene Rearrangement, B-Lymphocyte, Light Chain | 3.38 | 17 |
| Spleen | 2.08 | 28 |
| Immune Evasion | 1.50 | 2 |
| Adaptive Immunity | 1.40 | 15 |
| Disease Resistance | 1.00 | 1 |
| Major Histocompatibility Complex | 0.84 | 15 |
Fig. 3 Effect of increasing the max_target_seqs parameter on annotation. a As max_target_seqs increased, the percentage of annotated hits rose and the average identity of BLAST hits declined. b The number of PubMed articles per annotated query increased with the maximum target sequence setting. c Using more hits yielded more MeSH terms and PubMed articles than the conventional method, which uses only the single best BLAST hit per query
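The trade-off described in Fig. 3a can be explored on any BLAST result table. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, default column order, with percent identity in column 3 and bit score in column 12) and a hypothetical set of subject IDs known to carry annotation. It keeps the top N hits per query ranked by bit score, which is an illustrative stand-in for rerunning BLAST with different `-max_target_seqs` values, and it reports the fraction of annotated queries rather than the exact metric used in the paper. All identifiers and numbers in the toy data are invented.

```python
import csv
import io
from collections import defaultdict


def summarize_top_hits(tabular_text, annotated_subjects, max_hits):
    """Return (mean percent identity, fraction of queries with an annotated hit)
    when only the `max_hits` best hits (by bit score) are kept per query."""
    hits = defaultdict(list)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid = row[0], row[1]
        pident, bitscore = float(row[2]), float(row[11])
        hits[qseqid].append((bitscore, pident, sseqid))

    kept = []
    annotated_queries = 0
    for rows in hits.values():
        top = sorted(rows, reverse=True)[:max_hits]  # best bit score first
        kept.extend(top)
        if any(sseqid in annotated_subjects for _, _, sseqid in top):
            annotated_queries += 1

    mean_identity = sum(p for _, p, _ in kept) / len(kept) if kept else 0.0
    annotated_fraction = annotated_queries / len(hits) if hits else 0.0
    return mean_identity, annotated_fraction


def _row(q, s, pident, bitscore):
    # Fill the remaining outfmt 6 columns with placeholder values.
    return "\t".join([q, s, str(pident), "100", "0", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])


# Toy data: each query's best hit is more similar, but its second hit
# may be the one that is annotated.
TOY_TABLE = "\n".join([
    _row("q1", "s1", 98.0, 200),
    _row("q1", "s2", 80.0, 150),
    _row("q2", "s3", 90.0, 180),
    _row("q2", "s4", 70.0, 100),
])
ANNOTATED = {"s2", "s3"}  # hypothetical subjects with linked literature

for n in (1, 2):
    identity, fraction = summarize_top_hits(TOY_TABLE, ANNOTATED, n)
    print(f"max_hits={n}: mean identity={identity:.1f}, annotated={fraction:.0%}")
```

On this toy table, keeping two hits per query instead of one raises the annotated fraction while lowering the mean identity, which is the same direction of change the caption describes.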
The requirements of the system for performing Blast2Fish annotation
| | nr DB | bonyfish_7898 DB |
|---|---|---|
| BLAST runtime (minutes) | 50,267 | |
| BLAST memory required (GB) | 57 | |
| Database size (GB) | 199 | |