Xindan Li1,2, Jinming Wu1, Xinping Xiao1, Yifeng Rong1,2, Haile Yang1, Junyi Li1, Qiong Zhou1, Weiguo Zhou3, Jianquan Shi3, Hongfang Qi3, Hao Du1,2,3.
Abstract
The Tibetan Schizothoracinae fish Gymnocypris przewalskii can adapt to the extreme plateau environment, making it an ideal biological material for evolutionary biology research. However, the lack of a well-annotated reference genome has limited the study of the molecular genetics of G. przewalskii. To characterize its transcriptome features, we first used long-read sequencing technology in combination with RNA-seq for transcriptomic analysis. A total of 159,053 full-length (FL) transcripts were captured by Iso-Seq, with a mean length of 3,445 bp and an N50 of 4,348 bp. Of all FL transcripts, 145,169 were well annotated in public databases and 134,537 contained complete open reading frames. There were 4,149 pairs of alternative splicing events, of which three randomly selected events were validated by RT-PCR and sequencing, and 13,293 long non-coding RNAs were detected, based on all-vs.-all BLAST. A total of 118,185 perfect simple sequence repeats were identified from the FL transcripts. The FL transcriptome may provide a basis for further research on G. przewalskii.
Keywords: Gymnocypris przewalskii; alternative splicing; full-length transcriptome; gene expression; single-molecule sequencing
Year: 2021 PMID: 33989386 PMCID: PMC8320875 DOI: 10.1093/dnares/dsab005
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Sampling sites of G. przewalskii in Qinghai Province. The sampling map was created with ArcGIS v10.1 (ESRI, CA, USA); the sample site is represented by the black triangle.
Description of the transcriptome of G. przewalskii by PacBio Iso-Seq and Illumina RNA-seq
| Parameter | PacBio Iso-Seq | Illumina RNA-seq |
|---|---|---|
| Number of subreads or raw reads | 12,898,077 | 285,490,628 |
| Reads of CCS or clean reads | 568,004 | 281,915,120 |
| Number of FLNC | 508,704 | – |
| Full-length transcriptome | – | – |
| Number of transcripts | 159,053 | 164,142 |
| Mean length (bp) | 3,445 | 1,426 |
| Smallest length (bp) | 175 | 188 |
| Largest length (bp) | 14,106 | 67,560 |
| N50 length (bp) | 4,348 | 2,940 |
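The N50 values in the table can be read as follows: when sequences are sorted from longest to shortest, N50 is the length at which the running total first reaches half of all bases. The sketch below illustrates that definition only. The function name `n50` and the toy lengths are assumptions for illustration, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy transcript lengths (bp), not taken from the study.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("mean:", sum(toy_lengths) / len(toy_lengths))
print("N50:", n50(toy_lengths))
```

Because N50 weights long sequences more heavily than the mean does, it normally exceeds the mean, which matches the pattern in both columns of the table.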
Figure 2. The length distribution of FLNC reads obtained by Iso-Seq. The x-axis represents the FLNC length, and the y-axis represents the number of FLNC reads.
Comparison of isoforms before and after RNA-seq data correction
| Parameter | Before correction by RNA-seq data | After correction by RNA-seq data |
|---|---|---|
| Number of isoforms | 214,911 | 214,911 |
| Average length (bp) | 3,208 | 3,209 |
| Maximum length (bp) | 14,075 | 14,106 |
| Minimum length (bp) | 175 | 175 |
| N50 (bp) | 4,281 | 4,281 |
Figure 3. Percentage of FL transcripts annotated in the NR, GO, KOG and Swiss-Prot databases.
Figure 4. Homologous species annotation. Species were identified by homology search against the NCBI NR database. Only the top five species by number of transcripts are shown.
Figure 5. Determination of the cut-off for coding potential. Performance was evaluated using 10-fold cross-validation.
Figure 6. Overview of SSRs isolated from FL transcripts of G. przewalskii. (A) The number of SSRs with different repeats and motifs. (B) The dominant motifs of dinucleotide, trinucleotide and tetranucleotide SSRs.
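A perfect SSR is an uninterrupted tandem repeat of a short motif. The sketch below shows one way such repeats could be located with regular expressions. The minimum repeat counts in `MIN_REPEATS`, the function name `find_perfect_ssrs` and the example sequence are all hypothetical; the source does not state which tool or thresholds were used.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed, not from the source).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_rep - 1) exact copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" is "AT" x 2).
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Hypothetical example sequence: (AT)7 and (AGC)5 embedded in flanking bases.
example = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
for start, motif, count in find_perfect_ssrs(example):
    print(start, motif, count)
```

This simple scan uses non-overlapping matches and ignores mononucleotide and compound repeats, so it is an illustration of the concept rather than a substitute for a dedicated SSR finder.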
Primer sequences used in validation of AS events
| Primer | Sequences 5'-3' |
|---|---|
| Gym.prz_8755 F | AGGATGATGATGGCGAGGAT |
| Gym.prz_8755 R | CGGATTGCCGTTAGCACTAG |
| Gym.prz_151000 F | CAAGTTGAAGGAGCAAGAGTGC |
| Gym.prz_151000 R | CTTCATTAGGAATGGGCTGTGA |
| Gym.prz_131234 F | GGCTGCTCTGTTCGTTAGCC |
| Gym.prz_131234 R | CCTCCTCCTTTCTTTGCGTTAA |
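Basic properties of primers such as those listed above can be checked quickly in code. The sketch below computes length, GC fraction and a rough melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a coarse approximation for short oligonucleotides. The helper names are hypothetical, and the source does not say how the primers were designed or evaluated.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# First primer pair from the table above.
primers = {
    "Gym.prz_8755 F": "AGGATGATGATGGCGAGGAT",
    "Gym.prz_8755 R": "CGGATTGCCGTTAGCACTAG",
}

for name, sequence in primers.items():
    print(name, len(sequence), round(gc_fraction(sequence), 2), wallace_tm(sequence))
```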
Figure 7. Number of tissue-specific transcripts in each tissue. Brain showed the greatest number of tissue-specific transcripts, and kidney the fewest.
Figure 8. Quantitative real-time PCR confirmation of the transcript expression obtained by high-throughput sequencing. Expression of each gene in each tissue was normalized to the housekeeping gene, and expression in liver was set to 1 as the reference.
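The normalization described in the Figure 8 caption (housekeeping gene as internal control, liver set to 1) can be sketched as below. This assumes the widely used 2^-ΔΔCt calculation, which the source does not explicitly name. The Ct values and the function name `relative_expression` are hypothetical.

```python
def relative_expression(ct_target, ct_reference, calibrator="liver"):
    """Relative expression per tissue by the 2^-ddCt method (assumed here).

    ct_target and ct_reference map tissue name -> Ct value for the gene of
    interest and for the housekeeping gene, respectively.
    """
    # dCt: target Ct normalized to the housekeeping gene within each tissue.
    delta_ct = {tissue: ct_target[tissue] - ct_reference[tissue] for tissue in ct_target}
    # ddCt: each tissue's dCt relative to the calibrator tissue, whose value becomes 1.
    return {
        tissue: 2 ** -(delta_ct[tissue] - delta_ct[calibrator])
        for tissue in delta_ct
    }


# Hypothetical Ct values for illustration only.
target_ct = {"liver": 24.0, "brain": 22.0, "kidney": 26.0}
housekeeping_ct = {"liver": 18.0, "brain": 18.0, "kidney": 19.0}

for tissue, value in relative_expression(target_ct, housekeeping_ct).items():
    print(tissue, value)
```

Setting the calibrator tissue to 1 makes every other value a fold change relative to liver, which is how the caption describes the reference.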