Cheng-Yu Hou, Ming-Tsung Wu, Shin-Hua Lu, Yue-Ie Hsing, Ho-Ming Chen.
Abstract
BACKGROUND: Degradation is essential for RNA maturation, turnover, and quality control. RNA degradome sequencing, which integrates a modified 5'-rapid amplification of cDNA ends protocol with next-generation sequencing technologies, is a high-throughput approach for profiling the 5'-ends of uncapped RNA fragments on a genome-wide scale. The primary application of degradome sequencing has been to identify the truncated transcripts that result from endonucleolytic cleavage guided by microRNAs or small interfering RNAs. As many pathways are involved in RNA degradation, degradome data should contain other RNA species besides the cleavage remnants of small RNA targets. Nevertheless, no systematic approaches have been established to explore the hidden complexity of the plant degradome.
Year: 2014 PMID: 24405808 PMCID: PMC3898255 DOI: 10.1186/1471-2164-15-15
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Analysis schemas for motifs associated with uncapped 5′-ends. The analysis pipeline for identifying motifs associated with uncapped 5′-ends (A). Uncapped 5′-ends were first mapped with Bowtie and classified by the genomic region in which they were produced. Selected uncapped 5′-ends represented as major peaks were further filtered with the binomial test. After known miRNA targets were filtered out, all the uncapped 5′-ends that passed the threshold of the binomial test, or the top 1000 most abundant ends in each genomic region, were subjected to motif analysis with the MEME suite. The spatial relationship between motifs and uncapped reads on a genome-wide scale was further explored by MORPH (B). All loci containing a candidate motif in a specific genomic region were first identified and then clustered based on the distribution of normalized reads in a 20-nt region flanking the motif. With the first nucleotide of the motif set as 1, positions upstream of the motif are indicated as negative values and positions downstream are indicated as positive values. The position of an uncapped read was determined by its 5′-end, and the number of uncapped reads at each position was normalized to the total read number within the 20-nt region and represented as a heat map.
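The normalization described for panel B can be illustrated with a minimal sketch. It is not the MORPH implementation: the function name, the dictionary input, the plus-strand-only handling, and the reading of "20-nt region flanking the motif" as 20 nt on each side of the motif start are all assumptions made for illustration. The sketch also assumes that there is no position 0, so the nucleotide immediately upstream of the motif is labeled -1.

```python
def normalized_profile(end_counts, motif_start, upstream=20, downstream=20):
    """Return {relative position: fraction of reads} for one locus.

    end_counts  : dict mapping a plus-strand genomic coordinate to the number
                  of uncapped reads whose 5' end maps there (hypothetical input).
    motif_start : genomic coordinate of the first nucleotide of the motif.
    The window sizes are assumptions, not values confirmed by the source.
    """
    window = range(motif_start - upstream, motif_start + downstream)
    total = sum(end_counts.get(pos, 0) for pos in window)
    if total == 0:
        return {}  # no uncapped reads in the window, nothing to normalize

    profile = {}
    for pos in window:
        offset = pos - motif_start
        # First motif nucleotide is 1; upstream positions are negative.
        # Assumption: position 0 is skipped.
        relative = offset + 1 if offset >= 0 else offset
        profile[relative] = end_counts.get(pos, 0) / total
    return profile


# Toy locus: a motif starting at coordinate 97, with most 5' ends at 100.
example = normalized_profile({97: 1, 100: 3}, motif_start=97)
```

Each returned profile corresponds to one row of a heat map, and the fractions in a row sum to 1.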
Motifs identified from the analysis of predominant uncapped 5′-ends in Arabidopsis and rice degradome libraries
| Motif | Library (a) | Region (b) | Sequence (c) | E-value (d) | Position (e) | Sites (f) |
|---|---|---|---|---|---|---|
| 1 RTGATGA | TWF (At) | IGR | KRTGATGA | 7.60E-22 | 5 | 28(1000) |
| | Tx4F (At) | intron | RATGATGA | 2.00E-06 | 4 | 13(770) |
| | INF9311a (Os) | intron | RTGATGA | 7.70E-05 | 6 | 20(817) |
| | NPBs (Os) | IGR | DRTGATGA | 6.40E-24 | 5 | 37(1000) |
| | NPBs (Os) | intron | RTGATGAD | 2.00E-11 | 6 | 20(1000) |
| 2 TGTAHAKA | TWF (At) | 3′UTR | TGTAHATA | 2.00E-82 | 4 | 110(1000) |
| | Tx4F (At) | 3′UTR | TGTAHAKW | 4.40E-52 | 3 | 72(1000) |
| | INF9311a (Os) | intron | YTGTAMAK | 1.10E-21 | 3 | 55(817) |
| | INF9311a (Os) | CDS | TGTACAG | 1.20E-07 | 4 | 27(1000) |
| | INF9311a (Os) | 3′UTR | YTGTAHAK | 1.00E-376 | 3 | 320(1000) |
| | INF939 (Os) | 3′UTR | HTGTAMWK | 3.50E-135 | 3 | 119(1000) |
| | NPBs (Os) | 3′UTR | YTGTAMAK | 1.30E-164 | 3 | 174(1000) |
| | NPBs (Os) | IGR | TGTAHAKW | 5.70E-26 | 4 | 62(1000) |
| | NPBs (Os) | intron | TGTACAKA | 1.30E-22 | 4 | 55(1000) |
| 3 AATAAA | Tx4F (At) | 3′UTR | AAYAAARV | 2.30E-10 | 4 | 60(1000) |
| 4 CACACACA | INF939 (Os) | CDS | CACACACA | 1.10E-01 | -1 | 15(599) |
| | INF939 (Os) | 3′UTR | CACACACA | 2.70E-01 | -1 | 9(1000) |
| 5 ATGTATGT | Col-0 (At) | 3′UTR | ATGTATGT | 1.70E-38 | -1 | 103(499) |
| 6 GTCTRGTG | Tx4F (At) | IGR | GTCTRGTG | 6.10E-05 | 16 | 12(1000) |
| 7 CAGAC | NPBs (Os) | 3′UTR | MCAGAC | 5.60E-02 | 1 | 40(1000) |
| 8 AAAAAAAA | INF9311a (Os) | IGR | AAAAAAAA | 2.40E-07 | 12 | 16(1000) |
| 9 GTCCGAC | Tx4F (At) | CDS | AGTCCGAC | 9.20E-21 | -8 | 35(1000) |
| | INF9311a (Os) | CDS | AGYCCGAC | 1.50E-64 | -8 | 81(1000) |
| | INF939 (Os) | CDS | AGTCCGAC | 4.60E-31 | -8 | 60(599) |
| | INF939 (Os) | 3′UTR | RSYCCRAC | 1.30E-07 | -8 | 59(1000) |
| | NPBs (Os) | CDS | ASKCCGAC | 8.90E-258 | -8 | 298(1000) |
| | NPBs (Os) | 3′UTR | VBCCGACH | 8.90E-51 | -7 | 85(1000) |
| | NPBs (Os) | intron | SKCCGACH | 1.10E-09 | -7 | 30(1000) |
| 10 GATCCAAC | AxIDT (At) | 3′UTR | GATCCAAM | 4.50E-03 | -8 | 10(793) |
| | AxIRP (At) | CDS | GRTCCAAC | 1.00E-126 | -8 | 121(1000) |
| | AxIRP (At) | 5′UTR | RATCCAAC | 5.00E-19 | -8 | 49(1000) |
| | AxIRP (At) | intron | GRTCCAAC | 7.10E-01 | -8 | 18(1000) |
| | AxSRP (At) | CDS | GATCCAAC | 8.40E-40 | -8 | 45(1000) |
| | AxSRP (At) | 5′UTR | GATCCAAC | 9.80E-07 | -8 | 22(1000) |
| | AxSRP (At) | intron | GATCCAAC | 3.70E-01 | -8 | 15(1000) |
| 11 GACGATC | Col-0 (At) | 3′UTR | VMGACGAT | 3.40E-02 | -9 | 15(499) |
| | | 3′UTR | CGACGATY | 3.20E-06 | -8 | 23(153) |
| | | CDS | SGACGWTY | 1.50E-03 | -8 | 17(476) |
(a) At: Arabidopsis; Os: rice.
(b) IGR, UTR and CDS indicate the intergenic region, the untranslated region and the coding sequence, respectively.
(c) Syntax for multiple bases: B = C/G/T, D = A/G/T, H = A/C/T, K = G/T, M = A/C, R = A/G, S = G/C, V = A/C/G, W = A/T, Y = C/T.
(d) The E-value is the estimated number of motifs of equal or greater significance that one would expect to find by chance if the input sequences were shuffled.
(e) Position indicates the predominant position of the first nucleotide of the motif relative to the uncapped 5′-end revealed by deep sequencing, which was set to 1. Upstream positions are indicated as negative values and downstream positions are indicated as positive values.
(f) The numbers indicate the sites possessing the indicated motif at the specified position, out of the number of input sequences (in parentheses) used for MEME analysis.
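The degenerate-base syntax in the table notes can be expanded mechanically when scanning a sequence for a motif. The following is a minimal sketch using only the codes listed in the notes. The helper names are hypothetical, and this is not how MEME or MORPH locate sites; it only shows what a degenerate motif such as TGTAHAKA matches.

```python
import re

# Degenerate-base codes exactly as listed in the table notes,
# plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "K": "[GT]", "M": "[AC]",
    "R": "[AG]", "S": "[CG]", "V": "[ACG]", "W": "[AT]", "Y": "[CT]",
}


def motif_to_regex(motif):
    """Compile a degenerate motif into a regex; the lookahead permits overlaps."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(motif, sequence):
    """Return 1-based start positions of every motif occurrence in sequence."""
    pattern = motif_to_regex(motif)
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


puf_sites = find_sites("TGTAHAKA", "CCTGTAAATAGG")
repeat_sites = find_sites("CACACACA", "CACACACACA")
```

A code outside the listed set (for example N) raises a `KeyError` here, which is a deliberate choice to keep the sketch limited to the syntax the notes define.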
Figure 2. The 5′-ends of rice snoRNAs were precisely captured in degradome data. Motifs corresponding to the snoRNA C box are indicated in rice PARE reads and the schematic structure of a canonical C/D box snoRNA (A). Uncapped reads produced around the 5′-ends of rice C/D box snoRNAs (B), H/ACA box snoRNAs (C), and other ncRNAs (D) identified previously are visualized by cluster heat maps. The first nucleotide of a snoRNA or an ncRNA was set to 1. The numbers of snoRNAs and ncRNAs reported previously (indicated in parentheses) and possessing uncapped reads in the region inspected are indicated above the heat map. Loci were clustered based on the distribution of normalized read numbers across the 20-nt region by Ward’s method.
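Ward's method, used to order loci in the heat maps, repeatedly merges the two clusters whose union gives the smallest increase in within-cluster sum of squares. The sketch below is a plain, unoptimized illustration of that criterion on rows of normalized read fractions. The function names and the choice to stop at a fixed number of clusters are assumptions; the source does not state which implementation or parameters were used.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    squared_distance = sum(
        (x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"])
    )
    return a["size"] * b["size"] / (a["size"] + b["size"]) * squared_distance


def ward_clusters(rows, k):
    """Group row indices into k clusters by greedy Ward merging."""
    clusters = [
        {"members": [i], "centroid": list(row), "size": 1}
        for i, row in enumerate(rows)
    ]
    while len(clusters) > k:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        merged = {
            "members": a["members"] + b["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [
                (a["size"] * x + b["size"] * y) / size
                for x, y in zip(a["centroid"], b["centroid"])
            ],
            "size": size,
        }
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c["members"]) for c in clusters)


# Four toy loci, each with a three-position normalized read profile.
profiles = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
groups = ward_clusters(profiles, k=2)
```

Loci whose reads concentrate at the same position end up in the same group, which is what produces the banded appearance of a clustered heat map.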
Figure 3. Position-specific enrichment of uncapped 5′-ends surrounding putative PUF binding sites. Distribution of normalized reads around putative PUF binding sites in the 3′ UTR of Arabidopsis genes with deep sequencing data derived from the PARE method (A), degradome sequencing (C), and GMUCT method (D) and rice (B), soybean (E) and yeast (F) genes with PARE data. Motifs were boxed and the first nucleotide of motifs was set as 1. Loci containing the motif of interest were identified from the 3′ UTR of all annotated genes and the number is shown in parentheses above the heat map. For Arabidopsis and rice, only loci with a total read number greater than five in the 20-nt region are shown and the number of loci in each heat map is also indicated above the heat map. Loci were clustered based on the distribution of normalized read numbers across the 20-nt region by Ward’s method. Uncapped 5′-ends associated with putative PUF binding sites in Arabidopsis (G) and rice (H) were independently validated by the modified 5′ RACE protocol. The frequency of uncapped 5′-ends among clones sequenced at the position corresponding to the dominant termini supported by deep sequencing data is indicated with a filled arrow whereas at other positions it is indicated with an open arrow. Putative PUF binding sites are boxed.
Figure 4. Position-specific enrichment of uncapped 5′-ends surrounding a poly(A) signal-like element. Distribution of normalized reads around the AATAAA and AAAAAA sites in the 3′ UTR of Arabidopsis genes (A and C) and rice genes (B and D) is visualized by MORPH. Loci containing the motif of interest were identified from the 3′ UTR of all annotated genes and the number is shown in parentheses above the heat map. Only loci with a total read number greater than five in the 20-nt region are shown and the number of loci in each heat map is also indicated above the heat map. Motifs were boxed with the first nucleotide set as 1. Loci were clustered based on the distribution of normalized read numbers across the 20-nt region by Ward’s method. Base composition upstream and downstream of the AATAAA sites identified in the motif analysis of Tx4F library in the 3′ UTR of Arabidopsis genes (E).
Figure 5. Position-specific enrichment of uncapped 5′-ends surrounding three motifs recognized by RNA-binding proteins. Distribution of normalized reads around three motifs recognized by RNA-binding proteins in the 3′ UTR of Arabidopsis genes (A, C and E) and rice genes (B, D and F) is visualized by MORPH. ACHTT and TGGA are motifs bound by Physcomitrella patens Phpat.016g078400 (Pp_0206) and Phpat.012g050300 (Pp_0237), respectively. GAACA is a motif bound by Ostreococcus tauri Ot09g01160 (Ot_0263). Loci containing the motif of interest were identified from the 3′ UTR of all annotated genes and the number is shown in parentheses above the heat map. Only loci with a total read number greater than five in the 20-nt region are shown and the number of loci in each heat map is also indicated. Motifs were boxed with the first nucleotide set as 1. Loci were clustered based on the distribution of normalized read numbers across the 20-nt region by Ward’s method.
Figure 6. Position-specific enrichment of uncapped 5′-ends surrounding a CAGAC motif in the 3′ UTR. Distribution of normalized reads around a CAGAC motif in the 3′ UTR of rice genes (A), Arabidopsis genes (B and C) and soybean genes (D) is visualized by MORPH. Loci containing the motif of interest were identified from the 3′ UTR of all annotated genes and the number is shown in parentheses above the heat map. For rice and Arabidopsis, only loci with a total read number greater than five in the 20-nt region are shown and the number of loci in each heat map is also indicated. The motif was boxed with the first nucleotide set as 1. Loci were clustered based on the distribution of normalized read numbers across the 20-nt region by Ward’s method.
Figure 7. Potential false uncapped 5′-ends caused by non-specific PCR amplification. Schemas depict models of uncapped transcripts (A) and capped transcripts (B) captured by the PARE protocol. The 5′ adaptor primer was perfectly annealed to cDNA corresponding to the 5′ RNA adaptor ligated to uncapped transcripts whereas it was partially annealed at its 3′-end to the internal region of capped cDNA. Distribution of normalized uncapped reads around the GTCCGAC sites in the CDS of Arabidopsis (C) and rice (D) genes with PARE libraries is visualized by MORPH. Reads surrounding motifs corresponding to the 3′ end of the 5′ adaptors used in the degradome sequencing (E) and GMUCT (F) methods are also visualized by MORPH. Motifs were boxed with the first nucleotide set as 1. Loci containing the motif of interest were identified from the CDS of all annotated genes and the number is shown in parentheses above the heat map. Only loci with a total read number greater than five in the 20-nt region are shown and the number of loci in each heat map is also indicated. Loci were clustered based on the distribution of normalized read numbers across the 20-nt region by Ward’s method.