Xinlong Xiao1, Jinbiao Ma2, Yufang Sun1, Yinan Yao2.
Abstract
RNA sequencing (RNA-seq) has been widely used to obtain high-throughput transcriptome sequences in various species, but assembling a full set of complete transcripts remains a significant challenge. Judging by the number of expected transcripts and assembled unigenes in a transcriptome library, we believe that some unigenes could be reassembled. In this study, using the nitrate transporter (NRT) and phosphate transporter (PHT) gene families in Salicornia europaea as examples, we introduce an approach to further assemble unigenes in transcriptome libraries previously generated by Trinity. To find unigenes belonging to a single transcript whose assembly contained gaps, we selected 16 NRT and 12 PHT candidate unigene pairs in which the two unigenes had the same annotation, the same expression pattern across RNA-seq samples, and different positions when their encoded proteins were mapped to a reference protein. To fill the gap between the two unigenes of a pair, PCR was performed using primers that mapped to the two unigenes, and the PCR products were sequenced. This demonstrated that 5 NRT unigene pairs and 3 PHT unigene pairs could be reassembled when the gaps were filled with the corresponding PCR product sequences. This fast and simple method will reduce the redundancy of targeted unigenes and allow acquisition of complete coding sequences (CDS).
Keywords: RNA-seq; RPKM; Salicornia europaea; expression pattern; nitrate transporter gene; phosphate transporter gene; unigene assembly
Year: 2015 PMID: 26528307 PMCID: PMC4604318 DOI: 10.3389/fpls.2015.00843
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Schematic diagram for detecting the expression of a transcript on different fragments by RT-qPCR and RNA-seq. Eq., expression quantity. Black arrows represent RT-qPCR primer pairs.
Figure 2. Flow chart for distinguishing and assembling potential unigenes of a transcript in a transcriptome library. The transcriptome library contained all of the unigenes assembled by Trinity from RNA-seq reads.
Quality assessment of Trinity assembly results.
| Sample | Total number | Total length (nt) | Mean length (nt) | N50 (nt) |
| Unigene_Se200S | 101751 | 44551677 | 438 | 547 |
| Unigene_SeCKS | 97865 | 41490134 | 424 | 523 |
| Unigene_Se200R | 140086 | 56587612 | 404 | 482 |
| Unigene_SeCKR | 122728 | 52358894 | 427 | 528 |
| Unigene_All | 142721 | 81719801 | 573 | 780 |
Unigenes shorter than 200 nt were excluded. N50 is the length such that 50% of the assembled sequence is contained in contigs of this size or larger.
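The N50 definition in the note above can be illustrated with a short sketch. The function name `n50`, the `min_length` parameter, and the example lengths are hypothetical and are not taken from the study's pipeline; the 200 nt default only mirrors the exclusion stated in the note.

```python
def n50(lengths, min_length=200):
    """Return the N50 of a set of sequence lengths (in nt).

    Sequences shorter than min_length are dropped first, mirroring the
    200 nt cutoff mentioned in the table note (an assumption about how
    the cutoff would be applied).
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences at or above min_length")
    half_total = sum(kept) / 2
    running = 0
    # Walk from the longest sequence down until half of the total
    # assembled length has been covered.
    for length in kept:
        running += length
        if running >= half_total:
            return length


# Hypothetical unigene lengths, not data from the study.
example_lengths = [200, 300, 400, 500, 600]
print(n50(example_lengths))
```

Because the longest sequences are counted first, a few long unigenes can raise N50 well above the mean length, which is the pattern seen in every row of the table.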
Summary of .
| 75 | 1023 | 65 | 53 | 521–636 | |
| 37 | 615 | 36 | 7 | 493–557 | |
| 6 | 989 | 1 | 2 | 209–210 |
Summary of .
| 24 | 832 | 21 | 9 | 516–542 | |
| 2 | 2543 | 2 | 1 | 613 | |
| 12 | 569 | 11 | 3 | 309–375 | |
| 9 | 1154 | 6 | 6 | 432–541 |
Primers and estimated PCR products for 16 pairs of selected NRT unigenes.
| No. | Unigene pair | Reference location (aa) | Gap | Primers (forward/reverse) | Estimated PCR product without gap (bp) | Estimated PCR product with gap (bp) |
| 1 | Unigene53952_All + Unigene31144_All | 12–101+208–3′ | 107 aa | GACTACCAAGGAAATCCAGTGG/CAGGAAGGGCAAGACAACG | 721 | 1042 |
| 2 | Unigene67667_All + Unigene54470_All | 5′–177+359–3′ | 182 aa | TTGGTCCTTTGCTTGGTGC/TTTTCGATGAGGGCGGC | 575 | 1121 |
| 3 | Unigene68619_All + Unigene91547_All | 5′–170+164–234 | none | CAAAGTGACAAATGGGAAGG/ACAATAGAGTCGTGGTGGAGAT | 700 | 700 |
| 4 | Unigene60049_All + Unigene54473_All | 4–145+136–3′ | none | CAGCAGTGGGAAACAACCT/ACCAGAAACCAAGCAAATCA | 599 | 599 |
| 5 | Unigene31738_All + Unigene31143_All | 5′–212+203–3′ | none | CGTTGTCTTGCCCTTCCTG/GCCATACTTCTCATATTCTCTGG | 695 | 695 |
| 6 | Unigene63092_All + Unigene71213_All | 5′–80+78–3′ | none | ATCACACCAGCAGAACACGT/CAAGAACACCCCAAAATCAA | 896 | 896 |
| 7 | Unigene49607_All + Unigene30293_All | 59–245+311–3′ | 66 aa | TGCCTTCCTTAGTGATTCCTAT/AGATTTCCACAACTCCTGCC | 768 | 966 |
| 8 | Unigene63092_All + Unigene34485_All | 5′–80+218–3′ | 138 aa | GTGGGGTGACCAAGAAGAGA/TAGGAGGATGCTGGCGATG | 768 | 1182 |
| 9 | Unigene85390_All + Unigene23477_All | 268–392+466–541 | 74 aa | ACTAGGGGGATTAGGCCTTT/TGTCGTCGTTTGAACTATGGA | 479 | 701 |
| 10 | Unigene71982_All + Unigene61016_All | 5′–321+482–3′ | 162 aa | TTGGTTTTTTGGCTTCTGC/CCTCCTCCTTTATCTTCTGTGA | 560 | 1046 |
| 11 | Unigene34465_All + Unigene61016_All | 5′–430+482–3′ | 52 aa | CACTTCCACTTACCGCCAC/CGTCCTCCTCCTTTATCTTCTG | 731 | 887 |
| 12 | Unigene49930_All + Unigene54471_All | 41–122+359–3′ | 237 aa | CACCTTGGGCTTCGCATA/AACGGCAGTCATCATTTCG | 563 | 1274 |
| 13 | Unigene54452_All + Unigene44258_All | 5′–243+423–463 | 180 aa | GGGTTTTGTTTCGGGGTG/GGATTAAAGGCTTGTCAGGC | 816 | 1356 |
| 14 | Unigene34113_All + Unigene5588_All | 5′–258+415–3′ | 157 aa | GTTCCTCATGCCCCTTGTC/TTGGTGCTTCTCTCGTTTCTAC | 465 | 936 |
| 15 | Unigene39295_All + Unigene80102_All | 165–252+381–413 | 129 aa | TGTTATCCTTAAAACAGGGG/TTGAGTTATGACAGCACCC | 437 | 824 |
| 16 | Unigene136150_All + Unigene132777_All | 5′–66+402–3′ | 336 aa | ATTATCAATAGCAAAGCCTC/CAACTGCTCCACTACCCT | 417 | 1425 |
Reference location indicates the positions on the reference protein, from the Arabidopsis thaliana protein database, to which the protein encoded by each unigene aligns. aa, amino acid; bp, base pair.
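In every row of the table above, the two product lengths differ by three times the gap length in amino acids (for example, 721 + 3 × 107 = 1042 bp). The sketch below reproduces that arithmetic. The function names are hypothetical, and the rule itself is inferred from the tabulated numbers rather than stated in the text.

```python
def parse_gap_aa(gap_field):
    """Convert a gap entry such as '107 aa' or 'none' to amino acids."""
    text = gap_field.strip().lower()
    if text == "none":
        return 0
    return int(text.split()[0])


def product_with_gap_bp(product_without_gap_bp, gap_field):
    """Estimate the PCR product length once the gap is included.

    Assumes each missing amino acid adds one codon (3 bp) between
    the two primer sites.
    """
    return product_without_gap_bp + 3 * parse_gap_aa(gap_field)


# Rows 1, 2 and 3 of the NRT table: (product without gap, gap).
for short_bp, gap in [(721, "107 aa"), (575, "182 aa"), (700, "none")]:
    print(short_bp, gap, product_with_gap_bp(short_bp, gap))
```

Comparing the observed band size on a gel with these two estimates gives a quick indication of whether a gap of roughly the predicted size lies between the two unigenes.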
Primers and estimated PCR products for 12 pairs of selected PHT unigenes.
| No. | Unigene pair | Reference location (aa) | Gap | Primers (forward/reverse) | Estimated PCR product without gap (bp) | Estimated PCR product with gap (bp) |
| 1 | Unigene91380_All + Unigene105912_All | 82–170+348–448 | 178 aa | TAGTTGTGGAGGAGAAATGG/AGGATGGAGAAGGTGACG | 558 | 1092 |
| 2 | Unigene5546_All + Unigene11489_All | 5′–157+229–319 | 72 aa | CTTTTGGGGCGTCTGTA/GAACCACGTGCTTGTGG | 553 | 769 |
| 3 | Unigene5546_All + Unigene47851_All | 5′–157+166–3′ | 9 aa | CTTTTGGGGCGTCTGTAC/TTCCCTTCGCTTTTTGTG | 677 | 704 |
| 4 | Unigene16022_All + Unigene50223_All | 5′–91+417–3′ | 326 aa | ATTTCATCACACACCCAG/ATTTACCCATAGACTCCG | 854 | 1832 |
| 5 | Unigene16022_All + Unigene90558_All | 5′–91+417–3′ | 326 aa | ATTTCATCACACACCCAGA/CAAGAGATTTACCCATAGACTC | 860 | 1838 |
| 6 | Unigene11489_All + Unigene63539_All | 229–319+412–493 | 93 aa | CAGATAGACGCCGATGAGG/GCAACCAAGAACAATAAGAGAG | 416 | 695 |
| 7 | Unigene48391_All + Unigene44133_All | 5′–63+404–3′ | 341 aa | CTTTCCACCTCAAGTCAT/GTGCAACCAAGAACAATAAGA | 587 | 1610 |
| 8 | Unigene53055_All + Unigene44133_All | 5′–163+404–3′ | 241 aa | TCTGTATCTCCCTCCTAAC/CCAAGAACAATAAGAGAGTT | 632 | 1355 |
| 9 | Unigene125621_All + Unigene61379_All | 5′–68+131–277 | 63 aa | CCCCATCTCCACTCCTGTT/AAGGTTGACGGTGGTGTTC | 566 | 755 |
| 10 | Unigene141416_All + Unigene129694_All | 5′–233+210–281 | none | CTGCTATCACCCCTCTTG/CCTTGGTTCAATTTTTATCC | 745 | 745 |
| 11 | Unigene141415_All + Unigene129694_All | 8–87+210–281 | 123 aa | AAAGTATAATGGTAGGAAGTCC/GTTCAATTTTTATCCTCAGG | 458 | 827 |
| 12 | Unigene29837_All + Unigene46396_All | 5′–230+292–3′ | 62 aa | GTGTTGCTTTGTGGTCCT/CCCATGTAATATTCCGGT | 962 | 1148 |
Reference location indicates the positions on the reference protein, from the Arabidopsis thaliana protein database, to which the protein encoded by each unigene aligns. aa, amino acid; bp, base pair.
Figure 3. Agarose gel electrophoresis of PCR products for unigene pairs. PCR reactions were performed with the corresponding primer pairs and a pooled cDNA template. Amplified fragments were separated by 1.2% agarose gel electrophoresis. (A) PCR products for NRT unigenes. (B) PCR products for PHT unigenes. M: DNA marker with bands of 2000 bp, 1000 bp, 750 bp, 500 bp, 250 bp, and 100 bp from high to low. The serial numbers identify the PCR reactions for the various unigene pairs and match the numbers in Tables 3A,B.
Information on assembled NRT unigenes.
| >Assembly 3 (Unigene68619_All + Unigene91547_All) | 762 | 215 | 953 | 293 | no | Nitrate transporter 1:2 ( |
| >Assembly 4 (Unigene60049_All + Unigene54473_All) | 429 | 1861 | 2254 | 591 | no | Probable peptide/nitrate transporter (Arabidopsis thaliana) |
| >Assembly 7 (Unigene49607_All + Unigene30293_All) | 561 | 958 | 1672 | 431 | no | Nitrate transporter 1.5 (Arabidopsis thaliana) |
| >Assembly 11 (Unigene34465_All + Unigene61016_All) | 1655 | 695 | 2498 | 626 | yes | Putative peptide/nitrate transporter (Arabidopsis thaliana) |
| >Assembly 14 (Unigene34113_All + Unigene5588_All) | 821 | 747 | 2030 | 504 | yes | Nitrate transporter 2.5 (Arabidopsis thaliana) |
Information on assembled PHT unigenes.
| >Assembly 6 (Unigene11489_All + Unigene63539_All) | 274 | 249 | 789 | 263 | no | Phosphate transporter (Arabidopsis thaliana) |
| >Assembly 8 (Unigene53055_All + Unigene44133_All) | 619 | 621 | 1959 | 534 | yes | Putative inorganic phosphate transporter 1-3 (Arabidopsis thaliana) |
| >Assembly 10 (Unigene141416_All + Unigene129694_All) | 722 | 238 | 1055 | 345 | no | Phosphate transporter 3;1 (Arabidopsis thaliana) |
Gene-specific primers for determining expression of unigenes using RT-qPCR.
| Unigene | Length (nt) | Gene family | Forward primer | Reverse primer |
| Unigene60049_All | 429 | NRT1/PTR | TCCCTCCTTGGTGGTTACCT | GTTGTGGTAGGTGTGCTTGC |
| Unigene54473_All | 1861 | NRT1/PTR | AAGTAAGCCCGGACGTGAAG | ATGCATGCCTTGTCCAAACAC |
| Unigene34113_All | 821 | NRT2 | CAGTTCCTCATGCCCCTTGT | TTGGCCGAAAAGCATGACTG |
| Unigene5588_All | 747 | NRT2 | AGTGGGGAGGTGCATTTTGT | TCGTTTCTACTGCCTTCAGCA |
Figure 4. RT-qPCR shows that unigenes from the same transcript have the same expression levels. The same letter indicates no significant difference. T1, T2: treatments under NaCl stress or NaCl-free conditions. (A) Unigene pair 4 (NRT). (B) Unigene pair 14 (NRT).
Figure 5. Expression patterns for 8 pairs of unigenes, each pair belonging to one transcript. (A–E): NRT unigene pairs. (F–H): PHT unigene pairs. RPKM (reads per kilobase of exon model per million mapped reads): a measure of the expression level of a unigene. Four transcriptome libraries (200S, CKS, 200R, CKR) provided samples for RNA-seq.
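For reference, RPKM as defined in the caption can be computed with the standard formula, RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the unigene, N is the total number of mapped reads in the library, and L is the unigene length in nt. The sketch below is a minimal illustration; the function name and the read counts are hypothetical and are not values from the study.

```python
def rpkm(mapped_reads, unigene_length_nt, total_mapped_reads):
    """Reads per kilobase per million mapped reads for one unigene."""
    if unigene_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return mapped_reads * 1e9 / (unigene_length_nt * total_mapped_reads)


# Hypothetical example: 500 reads on a 1000 nt unigene in a library
# with 10 million mapped reads.
print(rpkm(500, 1000, 10_000_000))
```

Because RPKM normalizes for both unigene length and library size, two fragments of the same transcript are expected to show similar values across libraries, which is the basis for comparing expression patterns within a unigene pair.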