Ranjit Kumar, Mark L Lawrence, James Watt, Amanda M Cooksey, Shane C Burgess, Bindu Nanduri.
Abstract
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems-level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole-genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single-nucleotide resolution transcriptome map of H. somni strain 2336 using the RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome, of which 82 had not been predicted or reported in earlier studies. We also identified 38 novel potential protein-coding open reading frames that were absent from the current genome annotation. The transcriptome map allowed the identification of 278 operon structures (730 genes in total) in the genome. When compared with the genome sequence of the non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic regions unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.
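The "disproportionate number" claim in the abstract can be made concrete with a simple one-sided binomial calculation. The sketch below is not the authors' analysis: it assumes that "∼30%" of 94 sRNAs corresponds to 28 sRNAs and that, under the null hypothesis, each sRNA falls in the strain-specific ∼18% of the genome with probability 0.18. The function name `binomial_upper_tail` is illustrative.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_srnas = 94                       # sRNAs reported in the abstract
unique_fraction = 0.18             # approximate genome share unique to strain 2336
observed = round(0.30 * n_srnas)   # assumption: "~30%" is read as 28 sRNAs

expected = n_srnas * unique_fraction          # count expected by chance
fold_enrichment = observed / expected
p_value = binomial_upper_tail(observed, n_srnas, unique_fraction)

print(f"expected={expected:.2f} observed={observed} "
      f"fold={fold_enrichment:.2f} p={p_value:.4g}")
```

Because both percentages are rounded in the abstract, the resulting fold enrichment and tail probability are only indicative; a proper test would use the exact counts and the actual intergenic sequence available in each region.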
Year: 2012 PMID: 22276113 PMCID: PMC3262788 DOI: 10.1371/journal.pone.0029435
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. RNA-Seq data analysis workflow for intergenic expression analysis.
The analysis workflow includes identification of novel protein-coding genes and sRNAs in the intergenic regions of the H. somni 2336 genome.
H. somni 2336 sRNAs: genome location, additional features, and comparative genomics.
| ID | Start | End | Length (nt) | Promoter | Rho-independent terminator | Flanking gene (left) | Flanking gene (right) | Rfam annotation | Conservation across other genomes |
| HS1 | 8109 | 8210 | 101 | - | Y | HSM0009 (+) | HSM0010 (+) | - | C |
| HS2 | 16119 | 16190 | 71 | Y | Y | HSM0018 (+) | HSM0019 (−) | - | B |
| HS3 | 27693 | 27843 | 150 | Y | - | HSM0034 (+) | HSM0035 (+) | - | B |
| HS4 | 28211 | 28327 | 116 | Y | - | HSM0035 (+) | HSM0036 (+) | - | B |
| HS5 | 29449 | 29733 | 284 | Y | - | HSM0037 (+) | HSM0038 (+) | - | C |
| HS6 | 30644 | 30884 | 240 | Y | Y | HSM0039 (+) | HSM0040 (−) | - | B |
| HS7 | 91913 | 92041 | 128 | - | Y | HSM0081 (−) | HSM0082 (+) | - | C |
| HS8 | 113922 | 114088 | 166 | Y | Y | HSM0102 (+) | HSM0103 (−) | - | C |
| HS9 | 161819 | 161939 | 120 | Y | - | HSM0149 (+) | HSM0150 (+) | - | B |
| HS10 | 183097 | 183180 | 83 | Y | - | HSM0171 (−) | HSM0172 (+) | - | B |
| HS11 | 197465 | 197654 | 189 | - | - | HSM0185 (−) | HSM0186 (−) | - | B |
| HS12 | 229676 | 229780 | 104 | Y | - | HSM0213 (+) | HSM0214 (−) | - | C |
| HS13 | 243592 | 243683 | 91 | - | Y | HSM0224 (+) | HSM0225 (−) | - | A |
| HS14 | 258140 | 258443 | 303 | Y | Y | HSM0242 (−) | HSM0243 (−) | - | A |
| HS15 | 258962 | 259092 | 130 | Y | Y | HSM0244 (−) | HSM0245 (+) | - | A |
| HS16 | 260385 | 260577 | 192 | Y | Y | HSM0245 (+) | HSM0246 (−) | - | A |
| HS17 | 261314 | 261511 | 197 | - | Y | HSM0246 (−) | HSM0247 (−) | - | A |
| HS18 | 261829 | 262158 | 329 | Y | Y | HSM0246 (−) | HSM0247 (−) | - | A |
| HS19 | 264263 | 264362 | 99 | Y | - | HSM0250 (+) | HSM0251 (+) | - | A |
| HS20 | 279541 | 279733 | 192 | - | - | HSM0266 (−) | HSM0267 (+) | - | B |
| HS21 | 306503 | 306957 | 454 | Y | - | HSM0284 (+) | HSM0285 (−) | tmRNA | D |
| HS22 | 318188 | 318443 | 255 | - | - | HSM0292 (+) | HSM0293 (−) | - | D |
| HS23 | 319335 | 319635 | 300 | - | - | HSM0295 (+) | HSM0296 (+) | - | B |
| HS24 | 341911 | 342015 | 104 | - | - | HSM0316 (−) | HSM0317 (−) | - | B |
| HS25 | 343637 | 343745 | 108 | - | - | HSM0318 (−) | HSM0319 (+) | - | B |
| HS26 | 377953 | 378037 | 84 | Y | - | HSM0345 (+) | HSM0346 (−) | - | B |
| HS27 | 383294 | 383421 | 127 | Y | - | HSM0346 (−) | HSM0347 (+) | - | A |
| HS28 | 385627 | 385733 | 106 | Y | Y | HSM0348 (−) | HSM0349 (+) | - | B |
| HS29 | 391757 | 392003 | 246 | - | - | HSM0353 (−) | HSM0354 (−) | - | C |
| HS30 | 412425 | 412522 | 97 | - | - | HSM0368 (−) | HSM0369 (−) | - | B |
| HS31 | 472256 | 472331 | 75 | Y | Y | HSM0407 (−) | HSM0408 (−) | - | B |
| HS32 | 524433 | 524805 | 372 | Y | - | HSM0448 (+) | HSM0450 (−) | - | A |
| HS33 | 599224 | 599416 | 192 | Y | Y | HSM0521 (−) | HSM0522 (+) | - | B |
| HS34 | 614906 | 615013 | 107 | - | - | HSM0538 (+) | HSM0539 (−) | intron_gpII | A |
| HS35 | 616791 | 617486 | 695 | Y | - | HSM0539 (−) | HSM0540 (−) | - | A |
| HS36 | 617726 | 618078 | 352 | Y | - | HSM0539 (−) | HSM0540 (−) | - | A |
| HS37 | 618122 | 618228 | 106 | Y | Y | HSM0539 (−) | HSM0540 (−) | - | A |
| HS38 | 637931 | 638076 | 145 | Y | Y | HSM0552 (+) | HSM0553 (+) | - | B |
| HS39 | 638222 | 638366 | 144 | - | - | HSM0552 (+) | HSM0553 (+) | - | B |
| HS40 | 653813 | 653962 | 149 | Y | - | HSM0561 (+) | HSM0562 (−) | - | A |
| HS41 | 694580 | 694680 | 100 | Y | - | HSM0594 (−) | HSM0595 (−) | - | A |
| HS42 | 703333 | 703423 | 90 | Y | Y | HSM0607 (+) | HSM0608 (+) | - | B |
| HS43 | 710363 | 710450 | 87 | Y | - | HSM0611 (+) | HSM0612 (+) | - | B |
| HS44 | 747233 | 747386 | 153 | Y | - | HSM0644 (−) | HSM0645 (−) | - | A |
| HS45 | 800181 | 800295 | 114 | Y | Y | HSM0704 (+) | HSM0705 (+) | - | B |
| HS46 | 851529 | 851662 | 133 | Y | Y | HSM0740 (−) | HSM0741 (+) | - | B |
| HS47 | 853988 | 854118 | 130 | Y | - | HSM0742 (−) | HSM0743 (+) | - | B |
| HS48 | 876355 | 876433 | 78 | - | - | HSM0758 (+) | HSM0759 (+) | glycine | C |
| HS49 | 979462 | 979681 | 219 | Y | - | HSM0844 (+) | HSM0845 (+) | - | B |
| HS50 | 981925 | 982225 | 300 | - | - | HSM0847 (+) | HSM0848 (+) | - | C |
| HS51 | 994234 | 994314 | 80 | Y | - | HSM0853 (+) | HSM0854 (+) | - | B |
| HS52 | 1007799 | 1008007 | 208 | Y | Y | HSM0868 (−) | HSM0869 (−) | - | C |
| HS53 | 1008086 | 1008580 | 494 | - | Y | HSM0868 (−) | HSM0869 (−) | - | A |
| HS54 | 1012617 | 1012823 | 206 | - | Y | HSM0874 (−) | HSM0875 (−) | - | B |
| HS55 | 1014425 | 1014768 | 343 | Y | - | HSM0875 (−) | HSM0876 (−) | - | A |
| HS56 | 1015189 | 1015390 | 201 | Y | - | HSM0877 (+) | HSM0878 (+) | - | A |
| HS57 | 1021919 | 1022474 | 555 | Y | - | HSM0888 (−) | HSM0889 (−) | - | A |
| HS58 | 1031980 | 1032132 | 152 | Y | Y | HSM0900 (+) | HSM0901 (+) | - | C |
| HS59 | 1032206 | 1032458 | 252 | Y | Y | HSM0900 (+) | HSM0901 (+) | - | D |
| HS60 | 1052587 | 1052754 | 167 | Y | - | HSM0920 (+) | HSM0921 (+) | - | A |
| HS61 | 1147201 | 1147290 | 89 | Y | - | HSM1005 (+) | HSM1006 (+) | - | B |
| HS62 | 1260621 | 1260860 | 239 | - | - | HSM1095 (+) | HSM1096 (+) | 6S | D |
| HS63 | 1292413 | 1292563 | 150 | Y | - | HSM1125 (−) | HSM1126 (+) | - | B |
| HS64 | 1307757 | 1307987 | 230 | Y | - | HSM1136 (+) | HSM1137 (−) | - | A |
| HS65 | 1312693 | 1312855 | 162 | Y | Y | HSM1143 (+) | HSM1144 (+) | - | A |
| HS66 | 1320228 | 1320349 | 121 | Y | Y | HSM1155 (−) | HSM1156 (−) | - | A |
| HS67 | 1337412 | 1337590 | 178 | Y | Y | HSM1172 (+) | HSM1173 (+) | - | B |
| HS68 | 1343583 | 1343659 | 76 | Y | Y | HSM1182 (+) | HSM1183 (+) | - | D |
| HS69 | 1377309 | 1377411 | 102 | Y | - | HSM1218 (+) | HSM1219 (+) | - | C |
| HS70 | 1413741 | 1413887 | 146 | - | - | HSM1254 (−) | HSM1255 (+) | lysine | B |
| HS71 | 1455529 | 1455708 | 179 | Y | - | HSM1275 (−) | HSM1276 (−) | MOCORNA | B |
| HS72 | 1513886 | 1513955 | 69 | Y | - | HSM1330 (+) | HSM1331 (+) | - | B |
| HS73 | 1537168 | 1537267 | 99 | - | - | HSM1355 (−) | HSM1356 (−) | LR-PK1 | B |
| HS74 | 1591107 | 1591187 | 80 | Y | Y | HSM1392 (−) | HSM1393 (−) | - | B |
| HS75 | 1593953 | 1594392 | 439 | Y | - | HSM1395 (−) | HSM1396 (+) | RNaseP_bact_a | D |
| HS76 | 1596011 | 1596138 | 127 | Y | Y | HSM1397 (+) | HSM1398 (+) | - | B |
| HS77 | 1748563 | 1748820 | 257 | Y | Y | HSM1521 (−) | HSM1522 (+) | - | D |
| HS78 | 1752653 | 1752795 | 142 | Y | - | HSM1525 (+) | HSM1526 (+) | - | A |
| HS79 | 1839524 | 1839616 | 92 | Y | Y | HSM1590 (−) | HSM1591 (+) | - | B |
| HS80 | 1859168 | 1859317 | 149 | Y | Y | HSM1612 (−) | HSM1613 (−) | - | B |
| HS81 | 1874398 | 1874609 | 211 | Y | Y | HSM1626 (+) | HSM1627 (+) | isrK | B |
| HS82 | 1925814 | 1925932 | 118 | - | - | HSM1675 (−) | HSM1676 (−) | - | B |
| HS83 | 1927797 | 1928029 | 232 | Y | - | HSM1676 (−) | HSM1677 (+) | - | D |
| HS84 | 1928157 | 1928331 | 174 | Y | Y | HSM1676 (−) | HSM1677 (+) | - | A |
| HS85 | 1942445 | 1942617 | 172 | Y | Y | HSM1692 (+) | HSM1693 (+) | - | A |
| HS86 | 1962487 | 1962618 | 131 | Y | Y | HSM1719 (−) | HSM1720 (−) | - | A |
| HS87 | 2020545 | 2020668 | 123 | - | - | HSM1776 (−) | HSM1777 (+) | gcvB | D |
| HS88 | 2124794 | 2124884 | 90 | Y | Y | HSM1868 (−) | HSM1869 (−) | - | A |
| HS89 | 2136245 | 2136324 | 79 | Y | - | HSM1881 (−) | HSM1882 (−) | - | A |
| HS90 | 2139563 | 2139823 | 260 | Y | Y | HSM1887 (+) | HSM1888 (+) | - | A |
| HS91 | 2146286 | 2146459 | 173 | - | - | HSMR0065 (+) | HSM1893 (+) | - | B |
| HS92 | 2210148 | 2210318 | 170 | - | - | HSM1950 (+) | HSM1951 (+) | - | B |
| HS93 | 2223802 | 2223946 | 144 | - | - | HSM1974 (+) | HSM1975 (+) | alpha_RBS | D |
| HS94 | 2229269 | 2229450 | 181 | - | - | HSM1982 (−) | HSM1983 (+) | FMN | D |
*sRNA sequence conservation: A, unique to H. somni 2336; B, shared with H. somni strain 129Pt only; C, conserved in phylogenetically closer bacterial genomes, especially members of the Pasteurellaceae family (M. haemolytica, P. multocida, H. influenzae, etc.); D, conserved across distant bacterial species.
The start and end represent the boundaries of the identified TAR (transcriptionally active region), which is a potential sRNA region.
Any cell with no predicted result is marked with '-'.
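For readers who want to work with the sRNA table programmatically, the following minimal sketch parses pipe-delimited rows, checks that each reported length equals End minus Start (the convention this table appears to follow), and tallies the conservation classes. Only three rows are copied in for brevity; the field names in `FIELDS` and the helper `parse_row` are illustrative choices, not part of the original data release.

```python
from collections import Counter

# Three rows copied from the sRNA table; a full run would include all 94.
rows = """\
| HS1 | 8109 | 8210 | 101 | - | Y | HSM0009 (+) | HSM0010 (+) | - | C |
| HS46 | 851529 | 851662 | 133 | Y | Y | HSM0740 (−) | HSM0741 (+) | - | B |
| HS87 | 2020545 | 2020668 | 123 | - | - | HSM1776 (−) | HSM1777 (+) | gcvB | D |
"""

# Hypothetical field names, one per table column.
FIELDS = ("id", "start", "end", "length", "promoter", "terminator",
          "left_gene", "right_gene", "rfam", "conservation")


def parse_row(line: str) -> dict:
    """Split one pipe-delimited row into a record with integer coordinates."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    record = dict(zip(FIELDS, cells))
    for key in ("start", "end", "length"):
        record[key] = int(record[key])
    return record


records = [parse_row(line) for line in rows.splitlines() if line.strip()]

# Rows whose reported length disagrees with End - Start.
mismatches = [r["id"] for r in records if r["end"] - r["start"] != r["length"]]

# Number of sRNAs in each conservation class (A, B, C, D).
by_class = Counter(r["conservation"] for r in records)

print("length mismatches:", mismatches)
print("conservation classes:", dict(by_class))
```

Running the same check over every row is a quick way to catch transcription errors in the coordinates.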
Figure 2. Identification of an sRNA annotated in Rfam.
The figure shows identification of the well-conserved sRNA “tmRNA” using the RNA-Seq based method. “tmRNA” was computationally predicted as an sRNA by Rfam using sequence similarity across other bacterial families.
Figure 3. Identification of a novel sRNA.
A highly expressed sRNA, “HS46”, found in the intergenic region of the H. somni 2336 genome.
Novel proteins identified in the H. somni 2336 genome, along with the closest matching homolog and its annotation.
| ID | Start | End | Strand | Length (nt) | Top BLASTX Hit | Annotation |
| HSP1 | 140988 | 141083 | + | 96 | ZP_04978675.1 | hypothetical protein MHA_2182 [ |
| HSP2 | 260019 | 260078 | − | 60 | YP_002791255.1 | toxic membrane protein [ |
| HSP3 | 260229 | 260408 | − | 180 | YP_001784202.1 | hypothetical protein HSM_0870 [ |
| HSP4 | 260707 | 260850 | − | 144 | YP_001784202.1 | hypothetical protein HSM_0870 [ |
| HSP5 | 260951 | 261007 | − | 57 | CBY77851.1 | predicted toxic peptide IbsB3 [ |
| HSP6 | 692328 | 692543 | − | 216 | ZP_04977604.1 | hypothetical protein MHA_1062 [ |
| HSP7 | 748664 | 748849 | + | 186 | ZP_06863963.1 | putative phage-related DNA-binding protein [ |
| HSP8 | 752366 | 752593 | + | 228 | ZP_01791588.1 | hypothetical protein CGSHiAA_00240 [ |
| HSP9 | 753226 | 753384 | + | 159 | ZP_05848096.1 | conserved hypothetical protein [ |
| HSP10 | 754234 | 754398 | + | 165 | NP_873053.1 | hypothetical protein HD0492 [ |
| HSP11 | 758474 | 758698 | + | 225 | ZP_05848108.1 | conserved hypothetical protein [ |
| HSP12 | 764501 | 764686 | + | 186 | ABX51978.1 | hypothetical protein [ |
| HSP13 | 771653 | 771787 | + | 135 | ZP_04464387.1 | hypothetical protein CGSHi6P18H1_07995 [ |
| HSP14 | 782712 | 782840 | + | 129 | YP_001344686.1 | hypothetical protein Asuc_1392 [ |
| HSP15 | 858416 | 858721 | + | 306 | ZP_04976950.1 | hypothetical protein MHA_0367 [ |
| HSP16 | 982362 | 982619 | + | 258 | NP_660225.1 | repressor-like protein [ |
| HSP17 | 1008333 | 1008518 | + | 186 | YP_001784474.1 | hypothetical protein HSM_1144 [ |
| HSP18 | 1014064 | 1014276 | + | 213 | ZP_02478185.1 | hypothetical protein HPS_04457 [ |
| HSP19 | 1023854 | 1024255 | + | 402 | YP_719605.1 | hypothetical protein HS_1393 [ |
| HSP20 | 1026199 | 1026414 | + | 216 | YP_002475212.1 | putative lytic protein Rz1, bacteriophage protein [ |
| HSP21 | 1031209 | 1031379 | + | 171 | YP_002475190.1 | hypothetical protein HAPS_0589 [ |
| HSP22 | 1031709 | 1031906 | − | 198 | ZP_05731317.1 | hypothetical protein Pat9bDRAFT_4634 [ |
| HSP23 | 1043942 | 1044073 | + | 132 | ZP_02479029.1 | hypothetical protein HPS_00455 [ |
| HSP24 | 1306383 | 1306649 | − | 267 | ZP_04976986.1 | hypothetical protein MHA_0405 [ |
| HSP25 | 1309667 | 1309807 | − | 141 | ZP_01787689.1 | hypothetical protein CGSHi22421_00792 [ |
| HSP26 | 1324541 | 1324765 | − | 225 | YP_002475146.1 | DnaK suppressor protein/C4-type zinc finger protein, DksA/TraR family [ |
| HSP27 | 1345868 | 1346077 | + | 210 | YP_001088372.1 | putative conjugative transposon regulatory protein [ |
| HSP28 | 1448209 | 1448292 | − | 84 | AAB96578.1 | TnaC [ |
| HSP29 | 1747221 | 1747361 | + | 141 | YP_002476351.1 | hypothetical protein HAPS_1915 [ |
| HSP30 | 1750020 | 1750235 | + | 216 | ZP_04752631.1 | hypothetical protein AM305_05314 [ |
| HSP31 | 1852968 | 1853201 | + | 234 | YP_718779.1 | hypothetical protein HS_0567a [ |
| HSP32 | 1959818 | 1959922 | − | 105 | ZP_04977712.1 | hypothetical protein MHA_1177 [ |
| HSP33 | 1962885 | 1963013 | + | 129 | ZP_05993368.1 | hypothetical protein COI_2717 [ |
| HSP34 | 1962985 | 1963176 | − | 192 | ZP_05993369.1 | hypothetical protein COI_2718 [ |
| HSP35 | 1966085 | 1966366 | − | 282 | ZP_04977704.1 | hypothetical protein MHA_1169 [ |
| HSP36 | 1977131 | 1977247 | + | 117 | ZP_07538596.1 | hypothetical protein appser10_8220 [ |
| HSP37 | 2071545 | 2071745 | − | 201 | YP_719865.1 | hypothetical protein HS_1660 [ |
| HSP38 | 2165733 | 2165906 | − | 174 | YP_718223.1 | hypothetical protein HS_0017a [ |
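Unlike the sRNA table, the lengths in this table are consistent with inclusive coordinates (End minus Start plus one), and each is a multiple of three, as expected for an open reading frame. The short sketch below verifies this for four rows copied from the table. Whether the reported interval includes the stop codon is not stated here, so the codon count is left as a raw division.

```python
# (id, start, end, strand, reported length in nt), copied from four table rows.
orfs = [
    ("HSP1", 140988, 141083, "+", 96),
    ("HSP5", 260951, 261007, "-", 57),
    ("HSP7", 748664, 748849, "+", 186),
    ("HSP19", 1023854, 1024255, "+", 402),
]


def inclusive_length(start: int, end: int) -> int:
    """Length of a 1-based interval that includes both end points."""
    return end - start + 1


checks = {}
for orf_id, start, end, strand, reported in orfs:
    length = inclusive_length(start, end)
    checks[orf_id] = {
        "length_matches": length == reported,
        "in_frame": length % 3 == 0,   # whole number of codons
        "codons": length // 3,         # may or may not count the stop codon
    }

for orf_id, result in checks.items():
    print(orf_id, result)
```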
Figure 4. Identification of a novel protein-coding gene.
The novel protein-coding gene “HSP7”, identified using transcriptome analysis, shows homology (similarity 74%, sequence coverage 100%) to a phage-related DNA-binding protein from Neisseria polysaccharea.
Genes with revised coordinate information based on transcriptome map.
| Gene id | Previous annotation (Start-End) | New corrected annotation (Start-End) |
| HSM_0031 | 24651–24929 | 24597–24929 |
| HSM_0525 | 602547–603416 | 602547–603602 |
| HSM_0789 | 909036–911534 | 909036–911642 |
| HSM_1019 | 1164444–1165163 | 1164444–1165244 |
| HSM_1729 | 1972283–1972600 | 1972283–1972765 |
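The size of each coordinate revision can be read directly from the table. The sketch below computes how many nucleotides were added at the lower and higher coordinate of each gene and checks whether the extension preserves the reading frame. Strand is not given in this table, so the code deliberately reports "lower" and "higher" coordinate rather than 5' and 3' end.

```python
# gene id -> ((previous start, previous end), (revised start, revised end))
revisions = {
    "HSM_0031": ((24651, 24929), (24597, 24929)),
    "HSM_0525": ((602547, 603416), (602547, 603602)),
    "HSM_0789": ((909036, 911534), (909036, 911642)),
    "HSM_1019": ((1164444, 1165163), (1164444, 1165244)),
    "HSM_1729": ((1972283, 1972600), (1972283, 1972765)),
}


def extension(previous: tuple, revised: tuple) -> tuple:
    """Return (nt added at the lower coordinate, nt added at the higher coordinate)."""
    return previous[0] - revised[0], revised[1] - previous[1]


extensions = {gene: extension(old, new) for gene, (old, new) in revisions.items()}

for gene, (lower, higher) in extensions.items():
    added = lower + higher
    frame = "in frame" if added % 3 == 0 else "frame shift"
    print(f"{gene}: +{lower} nt (lower), +{higher} nt (higher), {frame}")
```

All five revisions extend the gene by a multiple of three nucleotides, which is consistent with an in-frame change to the start or stop position.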
Figure 5. Identification of a novel operon structure comprising three genes: HSM_1354, HSM_1355, and HSM_1356.
The RNA-Seq coverage shows three genes annotated as ribosomal proteins (IF3, L35, and L20) being expressed as a single transcription unit.
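One simple way to turn "expressed as a transcription unit" into a rule is to group adjacent same-strand genes whenever read coverage never drops below a threshold in the gap between them. The sketch below illustrates that idea on toy data; it is not the authors' operon-calling procedure, and the function name `group_transcription_units`, the `min_depth` threshold, and the gene coordinates are all hypothetical.

```python
def group_transcription_units(coverage, genes, min_depth=1):
    """Group adjacent same-strand genes joined by continuously covered gaps.

    coverage: per-base read depth, where coverage[0] is genome position 1.
    genes: list of dicts with 1-based inclusive 'start'/'end', 'strand', 'id',
           sorted by start position.
    """
    units = [[genes[0]["id"]]]
    for prev, nxt in zip(genes, genes[1:]):
        # Positions prev.end+1 .. nxt.start-1, converted to 0-based indices.
        gap = coverage[prev["end"]: nxt["start"] - 1]
        same_strand = prev["strand"] == nxt["strand"]
        covered = all(depth >= min_depth for depth in gap)  # empty gap counts as covered
        if same_strand and covered:
            units[-1].append(nxt["id"])
        else:
            units.append([nxt["id"]])
    return units


# Toy example: 40 bases, with a coverage drop at positions 31-34.
coverage = [5] * 30 + [0] * 4 + [3] * 6
genes = [
    {"id": "g1", "start": 1, "end": 10, "strand": "+"},
    {"id": "g2", "start": 14, "end": 20, "strand": "+"},
    {"id": "g3", "start": 25, "end": 30, "strand": "+"},
    {"id": "g4", "start": 35, "end": 40, "strand": "+"},
]

units = group_transcription_units(coverage, genes)
print(units)
```

In the toy data the first three genes are joined into one unit and the fourth is separated by the uncovered gap. A real analysis would also need strand-specific coverage and a depth threshold justified by the sequencing depth.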