Danilo Pellin, Paolo Miotto, Alessandro Ambrosi, Daniela Maria Cirillo, Clelia Di Serio.
Abstract
We propose a new method for small RNA (sRNA) identification. First, we build an effective target genome (ETG) by means of a strand-specific procedure. We then propose a new bioinformatic pipeline based mainly on the combination of two types of information: an expression map built from RNA-seq data (Reads Map) and a Conservation Map obtained by applying principles of comparative genomics. Superimposing these two maps yields a robust method for the search of sRNAs. We apply this methodology to investigate sRNAs in Mycobacterium tuberculosis H37Rv. The procedure leads to a list of 1948 candidate sRNAs. The size of the candidate list is strictly related to the aim of the study and to the technology used during the verification process. We provide performance measures of the algorithm in identifying annotated sRNAs reported in three recently published studies.
Year: 2012 PMID: 22470422 PMCID: PMC3314655 DOI: 10.1371/journal.pone.0032723
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Outline of the bioinformatic pipeline.
(a) Construction of the effective target genome (ETG), in terms of both sequences and coordinates. (b) Construction of the two strand-specific reads maps. (c) Construction of the two strand-specific conservation maps. (d) Combination of the reads and conservation maps to identify putative sRNA-encoding regions. (e) Annotation of putative sRNAs to assess their reliability.
Figure 2. Reads map construction.
The reads map (blue curve) is obtained by assembling all reads (sequences in red) that map uniquely and completely within the same IGR or AS region (sequence in black). The implemented BioPerl procedure merges the output of the NGS mappers with the T_IGRAScoord files.
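The reads map idea can be illustrated with a minimal Python sketch. It is not the BioPerl procedure mentioned in the caption: the function name `reads_map`, the 1-based inclusive coordinate convention, and the toy coordinates are all assumptions made for illustration.

```python
def reads_map(region, reads):
    """Per-base coverage of `region` from reads lying completely inside it.

    region: (start, end), 1-based inclusive coordinates (assumed convention).
    reads:  iterable of (start, end) for uniquely mapped reads on the same strand.
    """
    r_start, r_end = region
    coverage = [0] * (r_end - r_start + 1)
    for start, end in reads:
        # Keep only reads that map completely within the region.
        if start < r_start or end > r_end:
            continue
        for pos in range(start, end + 1):
            coverage[pos - r_start] += 1
    return coverage


# Toy example: a 10-nt region and three reads, one of which extends past the region.
cov = reads_map((101, 110), [(101, 104), (103, 106), (108, 112)])
print(cov)  # [1, 1, 2, 2, 1, 1, 0, 0, 0, 0]
```

The third read is discarded because it ends beyond the region boundary, which mirrors the "completely within" requirement in the caption.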
Figure 3. sRNA identification process.
For each IGR (sequence in black), the reads map (blue curve) and the conservation map (green curve) are superimposed. First, Type A candidates (highlighted in blue) are identified and extracted by testing the length constraints (conditions I and II) and requiring reads coverage above ExprT1 (dotted blue line). On the remaining portions of the IGRs, Type B candidates (highlighted in yellow) are identified and extracted by testing the length constraints (conditions I and II) and requiring, simultaneously, reads coverage above ExprT2 and conservation depth above ConsT2 (dotted and dashed yellow lines). Finally, Type C candidates (highlighted in green) are identified in the remaining IGRs on the basis of high sequence conservation (above the ConsT1 threshold, shown as a dotted green line).
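The three-step extraction described in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: conditions I and II are assumed here to be a minimum and a maximum candidate length, the helper names `runs` and `find_candidates` are hypothetical, and all threshold values in the toy example are invented.

```python
def runs(mask):
    """Yield (start, end) index pairs (end exclusive) of maximal True runs."""
    start = None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(mask)


def find_candidates(expr, cons, expr_t1, expr_t2, cons_t1, cons_t2,
                    min_len, max_len):
    """Extract Type A, then B, then C candidates from one region.

    expr, cons: per-base reads coverage and conservation depth.
    min_len, max_len: assumed stand-ins for length conditions I and II.
    """
    n = len(expr)
    free = [True] * n  # positions not yet assigned to a candidate
    rules = [
        ("A", lambda i: expr[i] > expr_t1),
        ("B", lambda i: expr[i] > expr_t2 and cons[i] > cons_t2),
        ("C", lambda i: cons[i] > cons_t1),
    ]
    found = []
    for label, passes in rules:
        mask = [free[i] and passes(i) for i in range(n)]
        for start, end in runs(mask):
            if min_len <= end - start <= max_len:
                found.append((label, start, end))
                for i in range(start, end):
                    free[i] = False  # later types only see remaining portions
    return found


# Toy region with one stretch per candidate type (all values invented).
expr = [50] * 4 + [0] * 2 + [8] * 4 + [0] * 2 + [0] * 4
cons = [0] * 4 + [0] * 2 + [3] * 4 + [0] * 2 + [5] * 4
result = find_candidates(expr, cons, expr_t1=20, expr_t2=5,
                         cons_t1=4, cons_t2=2, min_len=3, max_len=10)
print(result)  # [('A', 0, 4), ('B', 6, 10), ('C', 12, 16)]
```

Processing the rules in a fixed order and masking already-assigned positions is what makes each later type operate only on "the remaining portions" of the region.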
Complete list of weights w for conservation map calculation.
| Ref seq Accession | w |
| NC_010612 | 0.39919 |
| NC_002944 | 0.40764 |
| NC_005916 | 0.41261 |
| NC_008595 | 0.41444 |
| NC_002677 | 0.41792 |
| NC_011896 | 0.41802 |
| NC_008146 | 0.44789 |
| NC_008705 | 0.44793 |
| NC_009077 | 0.44918 |
| NC_014814 | 0.45108 |
| NC_009338 | 0.45114 |
| NC_008726 | 0.45129 |
| NC_008596 | 0.45333 |
| NC_010397 | 0.46 |
For each genome of the comparison set, the corresponding RefSeq accession number and the evolutionary distance from the MTB genome are reported. The sum of the weights w equals 6.22, which corresponds to the upper limit of the conservation value C.
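A short Python sketch shows how a weighted conservation value with this upper limit could behave. It assumes that C at a position is the sum of w over the genomes in which that position is conserved; that reading is inferred from the statement about the upper limit, and the function name `conservation_value` is hypothetical. Only the weights that survive in the table here are included.

```python
# Weights w for the accessions listed in the table (rows with lost content omitted).
WEIGHTS = {
    "NC_010612": 0.39919, "NC_002944": 0.40764, "NC_005916": 0.41261,
    "NC_008595": 0.41444, "NC_002677": 0.41792, "NC_011896": 0.41802,
    "NC_008146": 0.44789, "NC_008705": 0.44793, "NC_009077": 0.44918,
    "NC_014814": 0.45108, "NC_009338": 0.45114, "NC_008726": 0.45129,
    "NC_008596": 0.45333, "NC_010397": 0.46,
}


def conservation_value(conserved_in):
    """Assumed definition: sum of w over genomes where the position is conserved."""
    return sum(WEIGHTS[acc] for acc in conserved_in)


# A position conserved only in the two genomes with the smallest and largest w.
c_two = conservation_value(["NC_010612", "NC_010397"])

# A position conserved in every listed genome reaches the maximum for these rows.
c_max_listed = conservation_value(WEIGHTS)

print(round(c_two, 5), round(c_max_listed, 2))
```

The listed weights sum to about 6.08, slightly below the stated total of 6.22, so the remaining 0.14 would have to come from genomes whose rows are not reproduced here.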
Summary of reads coverage and conservation depth empirical distributions.
| Reads coverage | Conservation depth |
| 0 | |
| 0 | |
| 1 | |
| 6 | |
| 22 | |
| 50 | |
| 260 | |
| 79033 | |
Candidate classification based on the candidate definitions provided in Section 2.4.
| Coding region |
| 5′/3′ | |
| Strand + | |||
| type A | 329 | 190 | 82 |
| type B | 0 | 61 | 0 |
| type C | 0 | 229 (190+39) | 0 |
| Strand − | |||
| type A | 432 | 217 | 128 |
| type B | 0 | 53 | 0 |
| type C | 0 | 227 (186+41) | 0 |
| Total | 761 | 977 | 210 |
| Overall total | 1948 | | |
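The totals in this table can be checked with a few lines of Python. The counts are copied from the rows above; because the column headers are incomplete, the three columns are referred to by position only, which is a choice made for this sketch.

```python
# Candidate counts per (strand, type), one value per table column (left to right).
counts = {
    ("+", "A"): (329, 190, 82),
    ("+", "B"): (0, 61, 0),
    ("+", "C"): (0, 229, 0),
    ("-", "A"): (432, 217, 128),
    ("-", "B"): (0, 53, 0),
    ("-", "C"): (0, 227, 0),
}

# Column totals across both strands and all candidate types.
column_totals = [sum(row[i] for row in counts.values()) for i in range(3)]
grand_total = sum(column_totals)

print(column_totals, grand_total)  # [761, 977, 210] 1948
```

The result matches the totals row and the 1948 candidates reported in the abstract; the bracketed breakdowns for type C (190+39 and 186+41) also add up to 229 and 227.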
Comparison with the sRNAs annotated by Arnvig et al. [10].
| Arnvig et al. [10] | Candidates identified by our method | ||||||||
| sRNA | Start | End | Id | Type | Start | End | meanExpr | meanCons | mfePvalue |
| Trans-encoded sRNA | |||||||||
| B11 | 4099478 | 4099386 | candidate_1603 | A | 4099477 | 4099384 | 2564.23 | 4.12 | 0 |
| B55 | 704187 | 704247 | candidate_84 | A | 704187 | 704246 | 3713.08 | 0.13 | 0 |
| C8 | 4168281 | 4168154, 4168212, 4168224 | candidate_1621 | A | 4168281 | 4168193 | 970.13 | 3.68 | 0.08 |
| F6 | 293604 | 293641, 293661, 293705 | candidate_29 | A | 293604 | 293662 | 634.93 | 1.88 | 0 |
| G2 | 1915190, 1915028 | 1914962, 1914977 | candidate_1269 | A | 1915164 | 1915013 | 548.41 | 0.14 | 0.01 |
| Cis-encoded sRNA | |||||||||
| ASdes | 918264, 918350, 918365 | 918432, 918412, 918458 | candidate_121 | A | 918327 | 918360 | 256.03 | 0 | 0.47 |
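One way to read this table is in terms of coordinate overlap between each annotated sRNA and the matching candidate. The Python sketch below computes that overlap for two rows. The matching criterion actually used in the study is not stated here, so the function `overlap_length` and the idea of measuring overlap in nucleotides are assumptions; coordinates are treated as 1-based and inclusive, with start greater than end taken to indicate the reverse strand.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared positions between two intervals given in either orientation."""
    a_lo, a_hi = sorted((a_start, a_end))
    b_lo, b_hi = sorted((b_start, b_end))
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


# B11 (annotated) versus candidate_1603, both listed with start > end.
b11 = overlap_length(4099478, 4099386, 4099477, 4099384)

# B55 (annotated) versus candidate_84.
b55 = overlap_length(704187, 704247, 704187, 704246)

print(b11, b55)  # 92 60
```

For B11 the annotated interval spans 93 nt, of which 92 are covered by the candidate; for B55 the candidate covers 60 of 61 nt.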
Comparison with the sRNAs annotated in Mycobacterium bovis BCG by DiChiara et al. [11].
| sRNAs verified in M. bovis BCG | Candidates identified by our method | |||||||||
| Id | Start | End | Verified by Northern blot in MTB | Id | Type | Start | End | meanExpr | meanCons | e-value |
| Mcr3,Mpr7 | 1498201 | 1498256 | + | candidate_190 | A | 1471657 | 1471737 | 24422.29 | 3 | 1.00E-024 |
| Mcr4 | 2137103 | 2137148 | − | candidate_1314 | A | 2136173 | 2136126 | 10706 | 0.22 | 1.00E-021 |
| Mcr6 | 4141762 | 4141802 | + | candidate_1621 | A | 4168281 | 4168193 | 970.13 | 3.68 | 9.00E-019 |
| Mcr8 | 4073966 | 4073908 | + | candidate_1935 | C | 4100859 | 4100792 | 13.72 | 3.06 | 3.00E-029 |
| Mcr9,Mpr14 | 3317517 | 3317634 | − | candidate_1502 | A | 3363153 | 3363023 | 215.58 | 2.04 | 3.00E-064 |
| Mcr11 | 1439808 | 1439904 | + | candidate_1693 | B | 1413139 | 1413102 | 330.15 | 0.96 | 1.00E-016 |
| Mcr14 | 321693 | 321658 | + | candidate_1676 | B | 293659 | 293603 | 388.51 | 1.96 | 6.00E-013 |
| Mpr1 | 2813325 | 2813408 | − | candidate_801 | C | 2849576 | 2849542 | 0.83 | 2.58 | 8.00E-015 |
| Mpr3 | 935527 | 935628 | − | candidate_710 | C | 905089 | 905185 | 19.96 | 4.8 | 3.00E-042 |
| Mpr4 | 4073793 | 4074096 | + | candidate_561 | A | 4100684 | 4100816 | 167.03 | 2.37 | 1.00E-072 |
| Mpr5 | 1205934 | 1205651 | + | candidate_1142 | A | 1292095 | 1291823 | 272.63 | 0 | 0.09 |
| Mpr8 | 1857323 | 1857431 | − | candidate_1822 | C | 1852175 | 1852137 | 3.1 | 2.59 | 4.00E-017 |
| Mpr10 | 2300720 | 2300817 | − | candidate_330 | A | 2522164 | 2522218 | 190.9 | 2.01 | 1.00E-007 |
| Mpr15 | 3506470 | 3506359 | + | candidate_846 | C | 3551163 | 3551221 | 0 | 3.19 | 5.00E-026 |
| Mpr18 | 4066488 | 4066544 | + | candidate_1155 | A | 1321893 | 1321807 | 147.74 | 0 | 0.02 |
| Mpr19 | 4072633 | 4072493 | + | candidate_877 | C | 4099384 | 4099497 | 3.26 | 3.93 | 3.00E-046 |
| Mpr21 | 818357 | 818428 | − | candidate_1881 | C | 3215656 | 3215591 | 1.53 | 3.12 | 2.00E-008 |