Shaokui Yi, Xiaoyun Zhou, Jie Li, Manman Zhang, Shuangshuang Luo.
Abstract
Reconstruction and annotation of transcripts, particularly for a species without a reference genome, play a critical role in gene discovery, investigation of genomic signatures, and genome annotation in the pre-genomic era. This study generated 33,330 full-length transcripts of diploid M. anguillicaudatus using PacBio SMRT Sequencing. A total of 6,918 gene families with two or more isoforms were identified, and 26,683 complete ORFs with an average length of 1,497 bp were detected. In total, 1,208 high-confidence lncRNAs were identified, and most of these appeared to be precursor transcripts of miRNAs or snoRNAs. A phylogenetic tree of the Misgurnus species was inferred from 1,905 single-copy orthologous genes. The tetraploid and diploid M. anguillicaudatus grouped into one clade, and M. bipartitus showed a closer relationship with M. anguillicaudatus. The overall evolutionary rates of tetraploid M. anguillicaudatus were significantly higher than those of the other Misgurnus species. Meanwhile, 28 positively selected genes were identified in the M. anguillicaudatus clade. These positively selected genes may play critical roles in the adaptation of M. anguillicaudatus to various habitats. This study could facilitate further exploration of the genomic signatures of M. anguillicaudatus and provide insights into the evolutionary history of the tetraploid loach.
Year: 2018 PMID: 30076392 PMCID: PMC6076316 DOI: 10.1038/s41598-018-29991-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
The PacBio SMRT sequencing information of M. anguillicaudatus.
| Sample name | cDNA size (kb) | Reads of insert | Read bases of insert (bp) | Mean read length of insert (bp) | Mean read quality of insert | Mean number of passes | Full-length non-chimeric reads |
|---|---|---|---|---|---|---|---|
| 2n_MA | 1–5 | 627,338 | 2,192,131,499 | 3494 | 0.88 | 3 | 220,682 |
| 2n_MA | 4.5–10 | 521,829 | 1,795,489,735 | 3441 | 0.89 | 3 | 201,598 |
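The mean read length column can be cross-checked against the other columns, since it should equal read bases divided by reads of insert. The following is a minimal sketch of that arithmetic, using only the values in the table; the dictionary keys and the `summarize` helper are hypothetical names introduced for illustration, and the full-length non-chimeric (FLNC) fraction is a derived figure that the source does not report.

```python
# Per-library values copied from the sequencing table; the dictionary keys
# are hypothetical names chosen for this sketch.
LIBRARIES = [
    {"sample": "2n_MA", "cdna_size_kb": "1-5", "reads": 627_338,
     "bases": 2_192_131_499, "flnc_reads": 220_682},
    {"sample": "2n_MA", "cdna_size_kb": "4.5-10", "reads": 521_829,
     "bases": 1_795_489_735, "flnc_reads": 201_598},
]


def summarize(library):
    """Return (mean insert read length in bp, FLNC fraction) for one library."""
    mean_length = library["bases"] / library["reads"]
    # Derived here for illustration only; not a value stated in the source.
    flnc_fraction = library["flnc_reads"] / library["reads"]
    return mean_length, flnc_fraction


if __name__ == "__main__":
    for lib in LIBRARIES:
        mean_length, flnc_fraction = summarize(lib)
        print(f'{lib["sample"]} {lib["cdna_size_kb"]} kb: '
              f"mean length {mean_length:.0f} bp, "
              f"FLNC fraction {flnc_fraction:.1%}")
```

Rounded to the nearest base pair, the computed means agree with the tabulated 3494 bp and 3441 bp.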
Figure 1. The frequency distribution of the length (A) and GC contents (B) of the 76,787 consensus sequences.
Figure 2. Functional annotations of the non-redundant transcripts with the public databases. (A) Venn diagram of the annotation between NR, COG, KEGG, Swiss-Prot and InterPro databases. (B) The distribution of homologous species annotated in the NR database.
Figure 3. Length distribution of the complete ORFs predicted with TransDecoder.
Figure 4. Prediction of lncRNAs in M. anguillicaudatus. (A) Venn diagram of the prediction with three methods. (B) Length distribution of the 1,208 predicted lncRNAs.
Figure 5. (A) Phylogenetic tree of Misgurnus species inferred with RAxML. (B) Substitution rates of Misgurnus species calculated with the free-ratio model in the codeml program.
Figure 6. Distribution of GO classifications of positively selected genes in the diploid and tetraploid M. anguillicaudatus clade.
The positively selected sites identified in M. anguillicaudatus clade.
| Orthologous ID | Site position | BEB probability | Gene name | Gene description |
|---|---|---|---|---|
| OG0102 | 362 R | 0.991** | | WASH complex subunit 4 |
| OG0501 | 290 N | 0.984* | | nucleolin |
| OG0759 | 312 R | 0.985* | | replication protein A1 |
| OG0884 | 47 D | 0.962* | | lipase, endothelial |
| OG1052 | 72 V | 0.958* | | receptor (G protein-coupled) activity modifying protein 2 |
| OG1178 | 240 T | 0.971* | | zinc finger, CCCH-type with G patch domain |
| OG1413 | 7 T | 0.950* | | nuclear assembly factor 1 homolog (S. cerevisiae) |
| OG1652 | 113 V | 0.979* | | splicing factor 3b, subunit 4 |
| OG1728 | 87 S | 0.982* | | immediate early response 2a |