Joshua Thody, Leighton Folkes, Vincent Moulton.
Abstract
Natural antisense transcript-derived small interfering RNAs (nat-siRNAs) are a class of functional small RNA (sRNA) that have been found in both the plant and animal kingdoms. In plants, these sRNAs have been shown to suppress the translation of messenger RNAs (mRNAs) by directing the RNA-induced silencing complex (RISC) to their sequence-specific mRNA target(s). Current computational tools for classification of nat-siRNAs are limited in number and can be computationally infeasible to use. In addition, current methods do not provide any indication of the function of the predicted nat-siRNAs. Here, we present a new software pipeline, called NATpare, for prediction and functional analysis of nat-siRNAs using sRNA and degradome sequencing data. Based on our benchmarking in multiple plant species, NATpare substantially reduces the time required to perform prediction with minimal resource requirements, allowing for comprehensive analysis of nat-siRNAs in larger and more complex organisms for the first time. We then exemplify the use of NATpare by identifying tissue- and stress-specific nat-siRNAs in multiple Arabidopsis thaliana datasets.
Year: 2020 PMID: 32463462 PMCID: PMC7337908 DOI: 10.1093/nar/gkaa448
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The three types of cis-NAT orientation that can form dsRNA: 5′ overlap (head-to-head) (A), 3′ overlap (tail-to-tail) (B) and the complete enclosure of one transcript by the other (full overlap) (C). Transcripts are always transcribed in the 5′ to 3′ direction and are represented by arrows. Regions of complementarity between the two sequences are represented by dashed lines.
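The three orientations in Figure 1 can be told apart from genomic coordinates and strand alone. The following is a minimal sketch of that classification, not NATpare's own code: the `Transcript` class, the `classify_cis_nat` function and the 1-based inclusive coordinate convention are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record (not a NATpare type)."""
    start: int   # leftmost genomic coordinate, 1-based inclusive (assumed)
    end: int     # rightmost genomic coordinate, 1-based inclusive (assumed)
    strand: str  # "+" or "-"


def classify_cis_nat(a: Transcript, b: Transcript) -> Optional[str]:
    """Return the Figure 1 overlap type, or None if the pair cannot form a cis-NAT."""
    # A cis-NAT pair must lie on opposite strands of the same locus.
    if a.strand == b.strand:
        return None
    # Both transcripts must share at least one genomic position.
    if min(a.end, b.end) < max(a.start, b.start):
        return None
    # (C) Full overlap: one transcript completely encloses the other.
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "full overlap"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # On the + strand the 3' end is the rightmost coordinate; on the - strand
    # it is the leftmost. If the + transcript starts further left, the shared
    # region contains both 3' ends, so the pair is convergent.
    if plus.start < minus.start:
        return "3' overlap (tail-to-tail)"
    # Otherwise the shared region contains both 5' ends (divergent pair).
    return "5' overlap (head-to-head)"
```

For example, a plus-strand transcript at 100 to 500 and a minus-strand transcript at 400 to 800 share their 3′ ends and would be labelled tail-to-tail under these assumptions.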
Figure 2. A visual overview of the NATpare pipeline. Input and output data are represented by ovals and processes are represented by rectangles. Data input or processing steps surrounded by dashed lines are optional and dependent on the provided input data. NATpare takes as input HTS data (sRNA and degradome) along with a reference transcriptome and outputs a list of predicted nat-siRNAs. Additional annotation information, in the form of a GFF3 file, can be used to annotate the predicted NATs (cis or trans) by incorporating genomic origin.
The configurable parameters for NATpare. The values used during analysis can be changed by modifying the input configuration file or by using the command line when running the tool.
| Parameter | Default value | Description |
|---|---|---|
| Minimum overlap length | 100 | Minimum length of the annealed region between NATs |
| Minimum sRNA phases | 1 | Minimum number of sRNA alignment phases (shown in Figure |
| Minimum sRNA length | 19 | Minimum input sRNA length |
| Maximum sRNA length | 24 | Maximum input sRNA length |
| Minimum sRNA abundance | 1 | Minimum input sRNA abundance |
| Minimum tag length | 19 | Minimum length of degradome reads |
| Maximum tag length | 21 | Maximum length of degradome reads |
| | true | Only search for NATs that are perfectly complementary or originate from the same genomic location |
| Coverage ratio | 80% | The percentage of overlap required between the BLAST and RNAplex alignments |
| Largest bubble region | 10% | Largest non-complementary region in a |
| Low complexity filter | true | Discard input sequences based on their complexity |
| Genome alignment | true | If a genome is provided, discard any sRNAs that do not align |
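To make the length and abundance defaults concrete, the sketch below applies them as simple read filters. It is an illustration only: the `Defaults` class and the `keep_srna` and `keep_tag` functions are hypothetical names, the bounds are assumed to be inclusive, and none of this reflects NATpare's actual configuration file format or command-line options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defaults:
    """Default values copied from the parameter table (field names are hypothetical)."""
    min_srna_length: int = 19
    max_srna_length: int = 24
    min_srna_abundance: int = 1
    min_tag_length: int = 19
    max_tag_length: int = 21


def keep_srna(sequence: str, abundance: int, params: Defaults = Defaults()) -> bool:
    """Keep an sRNA read whose length and abundance fall within the bounds (assumed inclusive)."""
    length_ok = params.min_srna_length <= len(sequence) <= params.max_srna_length
    return length_ok and abundance >= params.min_srna_abundance


def keep_tag(sequence: str, params: Defaults = Defaults()) -> bool:
    """Keep a degradome read whose length falls within the bounds (assumed inclusive)."""
    return params.min_tag_length <= len(sequence) <= params.max_tag_length
```

Under these assumptions a 23 nt read passes the sRNA length filter but would be rejected as a degradome tag, because the default tag window is narrower (19 to 21 nt).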
Figure 3. The two types of adjacent sRNA alignment phases considered by NATpare. Adjacent sRNA phases without overlap (A) are when the first position at the 5′ end of an aligned sRNA is adjacent to the last position at the 3′ end of another aligned sRNA. Adjacent sRNA phases with overlap (B) are where sRNA sequences align contiguously to a given transcript.
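One way to read the two cases in Figure 3 is as a relation between pairs of alignment intervals on a transcript. The sketch below encodes that reading. It is an interpretation, not NATpare's implementation: alignments are assumed to be `(start, end)` tuples with 1-based inclusive coordinates, case (B) is assumed to mean that the second alignment starts inside the first, and the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Alignment = Tuple[int, int]  # (start, end) on the transcript, inclusive (assumed)


def phase_relation(a: Alignment, b: Alignment) -> Optional[str]:
    """Classify two sRNA alignments as one of the Figure 3 cases, or None."""
    # Order the pair so that `first` starts no later than `second`.
    first, second = sorted([a, b])
    # (A) The second alignment begins at the position right after the first ends.
    if second[0] == first[1] + 1:
        return "adjacent without overlap"
    # (B) The second alignment begins inside the first (assumed reading of
    # "align contiguously").
    if first[0] < second[0] <= first[1]:
        return "adjacent with overlap"
    return None


def count_adjacent_phases(alignments: Iterable[Alignment]) -> int:
    """Count neighbouring alignment pairs (sorted by start) that match either case."""
    ordered = sorted(set(alignments))
    return sum(
        1
        for left, right in zip(ordered, ordered[1:])
        if phase_relation(left, right) is not None
    )
```

A count like this could then be compared against the "Minimum sRNA phases" parameter, although how NATpare itself counts phases is not specified here.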
Computational performance comparison between NATpipe and the newly developed NATpare pipeline when evaluated on five simulated datasets. If a tool did not finish within 10 days, it was recorded as did not finish (DNF).
| Species | Annotation version | # Transcripts | NATpipe time | NATpare time |
|---|---|---|---|---|
| S. lycopersicum | SL3.0 | 33925 | DNF | 4 min 52 s |
| O. sativa | IRGSP-1.0 | 42378 | DNF | 5 min 38 s |
| A. thaliana | TAIR10 | 48359 | 1 day 18 h 34 min | 11 min 15 s |
| G. max | G. max v2.1 | 88412 | DNF | 1 h 5 min |
| T. aestivum | IWGSC | 133744 | DNF | 13 h 2 min |
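Only the TAIR10 row has a finite run time for both tools, so it is the only row from which a direct speed-up can be computed. The short sketch below converts the reported durations to seconds and takes their ratio. The `to_seconds` helper is hypothetical and only handles the unit spellings that appear in the table (`day`, `h`, `min`, `s`).

```python
import re

# Seconds per unit, for the unit spellings used in the table above.
UNIT_SECONDS = {"day": 86_400, "h": 3_600, "min": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a string such as '1 day 18 h 34 min' to a number of seconds."""
    parts = re.findall(r"(\d+)\s*(day|h|min|s)\b", duration)
    if not parts:
        raise ValueError(f"no duration found in {duration!r}")
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


natpipe = to_seconds("1 day 18 h 34 min")  # 153240 s
natpare = to_seconds("11 min 15 s")        # 675 s
speedup = natpipe / natpare                # roughly 227-fold
print(f"NATpare is about {speedup:.0f}x faster on the TAIR10 dataset")
```

For the four rows marked DNF, the 10-day cut-off only gives a lower bound on the NATpipe run time, so no exact ratio can be derived for them.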
Top 10 reported G. max cis-NATs with the highest number of unique reported nat-siRNAs by Zheng et al. (30) and the prediction results from NATpare and NATpipe.
| Gene A | Gene B | Overlap length | Zheng | NATpipe | NATpare |
|---|---|---|---|---|---|
| Glyma13g11940.1 | Glyma13g11970.1 | 542 | 1864 | 0 | 1802 |
| Glyma13g11820.1 | Glyma13g11830.1 | 428 | 1285 | 0 | 1406 |
| Glyma13g11940.1 | Glyma13g11950.1 | 147 | 724 | 0 | 576 |
| Glyma13g11940.1 | Glyma13g11960.1 | 118 | 509 | 0 | 487 |
| Glyma11g30060.1 | Glyma11g30070.1 | 392 | 244 | 209 | 237 |
| Glyma13g21780.1 | Glyma13g21790.1 | 355 | 28 | 0 | 28 |
| Glyma15g06490.1 | Glyma15g06500.1 | 156 | 26 | 0 | 26 |
| Glyma17g23860.1 | Glyma17g23870.1 | 174 | 18 | 11 | 11 |
| Glyma03g22390.1 | Glyma03g22400.1 | 276 | 17 | 16 | 17 |
| Glyma15g37470.1 | Glyma15g37480.1 | 764 | 15 | 0 | 15 |
The upregulated nat-siRNAs, as reported by iDEP, in the A. thaliana seedling salt-stress dataset. Ten of the 29 sequences originated from NAT pairs where one of the transcripts is annotated as a potential natural antisense gene. The transcript that gives rise to the largest number of nat-siRNAs is currently annotated as ‘unknown RNA’ and the corresponding NAT has an unknown function. Adjusted P-values were obtained using a false discovery rate of 0.1 and were expressed to three significant digits. Any extreme P-values (i.e. P < 0.001) were reported as P < 0.001.
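| Sequence | Originating gene | Originating gene annotation | Corresponding NAT | Corresponding NAT annotation | log2fc | Adjusted P-value |
|---|---|---|---|---|---|---|
| CAAAAACTGCTGAATCGTCGAGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 7.759025974 | |
| CCGGCGACTTTTCCGGCGATCGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 7.728742081 | |
| CAAAAACTGCTGAATCGTCGAGGA | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 6.425292703 | |
| AAAAACTGCTGAATCGTCGAGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 6.214796612 | |
| AAAAACTGCTGAATCGTCGAGGA | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 6.133011199 | |
| CCGGCGACTTTTCCGGCGATCGGT | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 5.961226293 | |
| CGGCGACTTTTCCGGCGATCGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 5.539642282 | |
| CCGGCCGCCGGGATTTTCGCCGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 5.283876137 | |
| AAAAACTGCTGAATCGTCGA | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 4.989198969 | |
| GGCGACTTTTCCGGCGATCGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 4.908960699 | |
| CCGGCCGCCGGGATTTTCGCCG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 4.22132342 | |
| GGCGACTTTTCCGGCGATCG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 4.117655813 | |
| AACTGCTGAATCGTCGAGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 3.689930846 | |
| TCCGGCGACTTTTCCGGCGATCGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 3.580577335 | |
| AAAACTGCTGAATCGTCGAGG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 3.080787251 | |
| CCGGCCGCCGGGATTTTCGCC | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 2.703061225 | |
| AAACTGCTGAATCGTCGAGGA | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 2.517183992 | |
| CAAAAACTGCTGAATCGTCGAG | AT3G41761.1 | other RNA | AT3G41762.1 | unknown protein | 2.435469953 | |
| TAAGAGAGAACAAGGATGGTT | AT1G05560.1 | UDP-glucosyltransferase 75B1 | AT1G05562.1 | Potential natural antisense gene | 4.458736007 | |
| GACAAGTAGAAAAAAAATGGCG | AT1G05560.1 | UDP-glucosyltransferase 75B1 | AT1G05562.1 | Potential natural antisense gene | 3.790780596 | |
| AGTAGAAAAAAAATGGCGCCA | AT1G05560.1 | UDP-glucosyltransferase 75B1 | AT1G05562.1 | Potential natural antisense gene | 3.258296457 | |
| CAAGTAGAAAAAAAATGGCGCC | AT1G05560.1 | UDP-glucosyltransferase 75B1 | AT1G05562.1 | Potential natural antisense gene | 3.16407171 | |
| AAGTAGAAAAAAAATGGCGCC | AT1G05560.1 | UDP-glucosyltransferase 75B1 | AT1G05562.1 | Potential natural antisense gene | 2.086703913 | |
| CAAGTAGAAAAAAAATGGCGC | AT1G05560.1 | UDP-glucosyltransferase 75B1 | AT1G05562.1 | Potential natural antisense gene | 1.98426758 | |
| TGAGAATTTTCGGTTTGGTTT | AT1G05562.1 | Potential natural antisense gene | AT1G05560.1 | UDP-glucosyltransferase 75B1 | 5.178982904 | |
| TTGTTTGTGTTGGAAGGTGTG | AT1G05562.1 | Potential natural antisense gene | AT1G05560.1 | UDP-glucosyltransferase 75B1 | 4.804480168 | |
| AGACAGATTAGGTAACTCGAA | AT1G05562.1 | Potential natural antisense gene | AT1G05560.1 | UDP-glucosyltransferase 75B1 | 2.199439073 | |
| GCGGCGGAGAAGTATGTGGATA | AT3G59068.1 | Potential natural antisense gene | AT3G59070.1 | Cytochrome b561/ferric reductase transmembrane with DOMON related domain | 4.908960699 | |
| GCCACTACTCCCTCACGGCTCTGC | AT5G01600.1 | ferretin 1 | AT5G01595.1 | other RNA | 6.220625993 | |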