Le Tang1,2,3, Songling Zhu1,2, Emilio Mastriani1,2, Xin Fang1,2, Yu-Jie Zhou1,2, Yong-Guo Li4, Randal N Johnston5, Zheng Guo6, Gui-Rong Liu1,2, Shu-Lin Liu1,2,4,7.
Abstract
Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.
Year: 2017 PMID: 28262684 PMCID: PMC5337935 DOI: 10.1038/srep43565
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Mountain plot representing the modeled secondary structure by height versus position.
The height m(k) is given by the number of base pairs enclosed at position k. Three curves are shown: the MFE structure (red), the pairing probabilities (black) and a positional entropy curve (green). Well-defined regions are identified by low entropy.
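The height definition above can be illustrated with a minimal sketch. It assumes the predicted structure is available in dot-bracket notation (an assumption; the caption does not name the input format) and takes m(k) to be the number of base pairs (i, j) with i < k < j. The function name `mountain_heights` is hypothetical. Only the MFE curve is covered; the pairing-probability and positional-entropy curves require a partition-function calculation that is not shown here.

```python
def mountain_heights(dot_bracket: str) -> list[int]:
    """Return m(k) for each position of a dot-bracket structure.

    m(k) is taken here as the number of base pairs (i, j) that strictly
    enclose position k (i < k < j). This convention is an assumption.
    """
    heights = []
    depth = 0  # number of base pairs currently open to the left
    for ch in dot_bracket:
        if ch == "(":
            # The opening base is not enclosed by its own pair.
            heights.append(depth)
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure: extra ')'")
            # The closing base is not enclosed by its own pair either.
            heights.append(depth)
        else:
            # Unpaired base: enclosed by every pair that is still open.
            heights.append(depth)
    if depth != 0:
        raise ValueError("unbalanced structure: missing ')'")
    return heights


if __name__ == "__main__":
    # A toy stem-loop: two stacked pairs around a two-base loop.
    print(mountain_heights("((..))"))
```

Plotting the returned list against position gives the plateau-and-slope shape of a mountain plot: slopes correspond to stems and plateaus to loops.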
Numbers of tetranucleotide sequences consisting of one each of C, T, A and G in representative strains of Salmonella and E. coli.
| CTAG | 850 | 1025 | 858 | 861 | 928 | 924 | 885 |
| AGTC | 9810 | 9800 | 9350 | 9985 | 9975 | 9269 | 9377 |
| GACT | 9693 | 9625 | 9142 | 9644 | 9686 | 9366 | 9528 |
| GCTA | 13201 | 12923 | 12504 | 12937 | 13015 | 12551 | 10608 |
| GTAC | 12993 | 12788 | 12319 | 13063 | 13053 | 12549 | 12036 |
| TAGC | 12983 | 13053 | 12516 | 13317 | 13168 | 12835 | 10606 |
| AGCT | 13948 | 14029 | 13220 | 14084 | 14016 | 13355 | 13333 |
| TCGA | 14306 | 13999 | 13511 | 14336 | 14308 | 13695 | 15457 |
| ACGT | 15426 | 15168 | 14624 | 15435 | 15423 | 14895 | 14545 |
| TGCA | 15872 | 15904 | 14995 | 15947 | 15812 | 15041 | 19761 |
| CATG | 16194 | 16140 | 15360 | 16326 | 16174 | 15542 | 15246 |
| TACG | 16501 | 16186 | 15580 | 16380 | 16402 | 15993 | 14101 |
| CGTA | 16431 | 16339 | 15759 | 16658 | 16374 | 15855 | 14324 |
| ACTG | 18472 | 18294 | 17357 | 18406 | 18318 | 17524 | 20435 |
| CAGT | 18347 | 18496 | 17303 | 18571 | 18614 | 17431 | 20477 |
| GTCA | 19434 | 19281 | 18510 | 19691 | 19871 | 18250 | 18388 |
| GATC | 19168 | 18787 | 18097 | 19138 | 18922 | 18455 | 19120 |
| TGAC | 19229 | 18976 | 17926 | 18981 | 18742 | 18654 | 18580 |
| ATGC | 21823 | 21318 | 20664 | 21794 | 21473 | 21041 | 21733 |
| GCAT | 21915 | 21835 | 20568 | 22005 | 21991 | 20890 | 21685 |
| CTGA | 24470 | 23934 | 22577 | 24204 | 24023 | 22834 | 24365 |
| TCAG | 23808 | 24177 | 22699 | 24459 | 24418 | 23114 | 24638 |
| CGAT | 26823 | 26068 | 25427 | 26801 | 26339 | 26146 | 24248 |
| ATCG | 26940 | 26430 | 25439 | 27020 | 27157 | 25781 | 24354 |
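Counts like those in the table above could be obtained with a sliding-window tally. The following is a minimal sketch, assuming overlapping windows on a single strand of a linear sequence; the source does not state the counting convention, and the function name `count_ctag_permutations` is hypothetical.

```python
from itertools import permutations


def count_ctag_permutations(seq: str) -> dict[str, int]:
    """Count each of the 24 tetranucleotides made of one C, T, A and G.

    Assumptions (not stated in the source): overlapping windows are
    counted, only the given strand is scanned, and the sequence is
    treated as linear rather than circular.
    """
    seq = seq.upper()
    # All 4! = 24 orderings of the four distinct bases.
    counts = {"".join(p): 0 for p in permutations("CTAG")}
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skips words with repeated or ambiguous bases
            counts[word] += 1
    return counts


if __name__ == "__main__":
    toy = "ACTAGCTAGT"  # toy sequence, not real genome data
    counts = count_ctag_permutations(toy)
    for word, n in sorted(counts.items(), key=lambda kv: kv[1]):
        if n:
            print(word, n)
```

For a real genome, the sequence string would be read from a FASTA file first; that step is omitted to keep the sketch self-contained.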
Figure 2. Phylogenetic tree of bacterial strains based on 16S rDNA sequence comparison.
(A) Bacterial strains representing a wide range of phyla; color categories for GC percentages: black, GC up to 45%; red, GC 46–55%; blue, GC >55% (see Supplementary Table 1). (B) Bacterial strains representing main branches of the Proteobacteria Phylum; purple color indicates bacteria that had lowest CTAG frequencies among the strains compared (see Supplementary Table 2). (C) Bacterial strains representing main branches of the Gammaproteobacteria Class; orange color indicates bacteria that had lowest CTAG frequencies among the strains compared (see Supplementary Table 3).
Phylogenetic distribution of bacteria having low CTAG sequence frequencies.
| Bacterial strain | Genome size (bp) | Number of CTAG | CTAG/kb | GC % |
|---|---|---|---|---|
| | 3799539 | 408 | 0.107 | 0.51 |
| | 4720462 | 609 | 0.129 | 0.54 |
| | 4734438 | 743 | 0.157 | 0.55 |
| | 4864217 | 810 | 0.167 | 0.52 |
| | 4962173 | 920 | 0.185 | 0.52 |
| | 4685848 | 807 | 0.172 | 0.52 |
| | 4809037 | 1026 | 0.213 | 0.52 |
| | 4585229 | 858 | 0.187 | 0.52 |
| | 4833080 | 928 | 0.192 | 0.52 |
| | 4857432 | 850 | 0.175 | 0.52 |
| | 4933631 | 957 | 0.194 | 0.52 |
| | 4460105 | 716 | 0.161 | 0.51 |
| | 4588711 | 784 | 0.171 | 0.50 |
| | 4641652 | 885 | 0.191 | 0.51 |
| | 5132068 | 951 | 0.185 | 0.51 |
| | 4717338 | 916 | 0.194 | 0.51 |
| | 4993013 | 1034 | 0.207 | 0.51 |
| | 5528445 | 1176 | 0.213 | 0.50 |
| | 4519823 | 1066 | 0.236 | 0.51 |
| | 4605545 | 987 | 0.214 | 0.53 |
| | 4368708 | 555 | 0.127 | 0.55 |
| | 3883467 | 996 | 0.256 | 0.54 |
Lineage-specific CTAG degeneration patterns in Salmonella and E. coli K12.
| LT2 site | Gene name | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 257125 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||
| 264404 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |||||||||||
| 440769 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||
| 1426591 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |||
| 1459627 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |
| 1597093 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | @ | |||
| 1818367 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||
| 1818375 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||
| 1977607 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||
| 2023278 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||
| 2149464 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||
| 2174259 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||||
| 2399521 | CTAG | CTAG | CTAG | CTAG | CTAG | |||||||||||||
| 2440495 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |||||||||||
| 2506054 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||
| 2800018 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |
| 2816155 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||
| 2914911 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||||||||||
| 3098680 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |
| 3142826 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||||||
| 3528054 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||
| 3576883 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||
| 3597875 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||
| 3834463 | CTAG | CTAG | ||||||||||||||||
| 3910806 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | @ | ||||
| 4101769 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |
| 4127619 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | @ | @ | CTAG | CTAG | CTAG | CTAG | CTAG | @ | ||||
| 4396313 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | |
| 4526230 | CTAG | @ | ||||||||||||||||
| 4606231 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | ||
| 4856458 | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG | CTAG |
S. tm: S. typhimurium LT2; S. ty: S. typhi Ty2; S. pA: S. paratyphi A ATCC9150; S. pu: S. pullorum RKS5078; S. ga: S. gallinarum 287/91; S. en: S. enteritidis P125109; S. ch: S. choleraesuis B67; S. pC: S. paratyphi C RKS4594; S. du: S. dublin CT_02021853; S. he: S. heidelberg B182; S. ag: S. agona SL483; S. ne: S. newport SL254; S. sc: S. schwarzengrund CVM19633; S. ja: S. javiana CFSAN001992; S. ar: S. arizonae RKS2980; S. bo: S. bongori NCTC12419; E. coli: E. coli K12. Note 2: The degenerated sequences with a different nucleotide from CTAG are in italic and underlined. Note 3: @ denotes a degenerated sequence at a location homologous to CTAG in LT2 or another Salmonella genome but with two nucleotides substituted. Genomic locations of CTAG are given for LT2 only.
CTAG sequences conserved between Salmonella and E. coli.
| CTAG site in LT2 | Gene_id | Annotation |
|---|---|---|
| 284175 | STM0242 | proline tRNA synthetase |
| 459521 | STM0403 & STM0404 | intergenic region between |
| 500950 | STM0445 & STM0446 | intergenic region between |
| 1280403 | STM1196 & STM1197 | intergenic region between |
| 1280562 | STM1197 | 3-oxoacyl-[acyl-carrier-protein] synthase II |
| 1459628 | STM1377 & STM1378 | intergenic region between |
| 1519898 | STM1444 | transcriptional regulator SlyA |
| 1794367 | STM1702 | RNase II |
| 1877803 | STM1780 | phosphoribosylpyrophosphate synthetase |
| 2035488 | STM1943 | tRNA-Cys |
| 2496632 | STM2385 & STM2386 | intergenic region between |
| 2544539 | STM2430 & STM2431 | intergenic region between |
| 2797006 | STM2657 | 23S ribosomal RNA |
| 2797988 | STM2657 | 23S ribosomal RNA |
| 2798973 | STM2657 | 23S ribosomal RNA |
| 2799967 | STM2658 | tRNA-Sec |
| 2800314 | STM2659 | 16S ribosomal RNA |
| 2801372 | STM2659 | 16S ribosomal RNA |
| 2844094 | STM2692 & STM2693 | intergenic region between STM2692 and STM2693 |
| 3221860 | STM3060 | putative cytoplasmic protein |
| 3346112 | STM3182 | putative esterase |
| 3414851 | STM3245 & STM3246 | intergenic region between |
| 3494593 | STM3330 | glutamate synthase, large subunit |
| 3585835 | STM3418 & STM3419 | intergenic region between |
| 3589847 | STM3427 | 30S ribosomal subunit protein S14 |
| 3593146 | STM3434 & STM3435 | intergenic region between |
| 4141162 | STM3933 | tRNA-Leu |
| 4631227 | STM4392 | primosomal replication protein N |
| 4810992 | STM4555 & STM4556 | intergenic region between |
Figure 3. Genomic location and computer modeling of inter-lpp-pykF.
(A) Location of lpp, the intergenic region and pykF, with the numbers at the bottom indicating the start and end nucleotides of genes lpp and pykF and the red vertical line indicating the location of CTAG (start and end nucleotides in the brackets); (B) Predicted stem-loop structure; (C) Changed structure when C in the CT(U)AG sequence is substituted by U. The positional entropy is coded by hues ranging from red (low entropy, well-defined) via green to blue and violet (high entropy, ill-defined).
Figure 4. Comparison of S. heidelberg B182 and SL476 for their differences in CTAG profiles.
(A) Whole genome alignment to show the two largest insertions in SL476 but not B182; (B) Distribution patterns of CTAG inside insertions 1 and 2. The red color indicates the regions of the insertions, with the start and end positions marked in both insertions, and black color indicates the up- and down-stream genomic sequences.
Profiles of tetranucleotides consisting of one each of C, T, A and G in insertions 1 and 2 of S. heidelberg SL476.
| | Insertion 1 | Insertion 2 |
|---|---|---|
| Calculated CTAG | 162 | 221 |
| CTAG | 58 | 39 |
| GTAC | 86 | 126 |
| TAGC | 92 | 122 |
| GACT | 103 | 147 |
| CGTA | 111 | 146 |
| TACG | 112 | 186 |
| AGTC | 113 | 132 |
| GCTA | 119 | 130 |
| ACGT | 122 | 147 |
| GTCA | 141 | 221 |
| TGAC | 147 | 233 |
| CAGT | 156 | 301 |
| ACTG | 163 | 254 |
| CATG | 164 | 263 |
| TCGA | 168 | 126 |
| ATCG | 170 | 196 |
| GCAT | 178 | 289 |
| ATGC | 180 | 297 |
| AGCT | 189 | 173 |
| TGCA | 193 | 265 |
| CGAT | 200 | 219 |
| TCAG | 216 | 399 |
| GATC | 221 | 169 |
| CTGA | 224 | 388 |
Profiles of the tetranucleotide CTAG in two recent insertions and the core genome of S. heidelberg SL476.
| | Insertion 1 | Insertion 2 | Core Genome |
|---|---|---|---|
| Length of DNA (bp) | 41606 | 57892 | 4789272 |
| Number of profiled CTAG | 58 | 39 | 833 |
| Density of CTAG (number/kb) | 1.39 | 0.67 | 0.17 |
| Number of calculated CTAG | 162 | 221 | 18648 |
| CTAG profiled/calculated (%) | 35.8 | 17.6 | 4.5 |
| CTAG index (%) | 1.6 | 0.785 | 0.212 |
Note 1: The length of the core genome is the whole-genome length of S. heidelberg SL476 (4888768 bp) minus the lengths of the two insertions. Note 2: The CTAG index is the ratio of the CTAG count to the total count of all 24 tetranucleotide permutations consisting of one each of C, T, A and G.
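The derived rows for the two insertions can be recomputed from the tabulated counts. The sketch below copies the lengths, the calculated CTAG numbers and the 24 tetranucleotide counts from the tables above; the function name `ctag_stats` and the dictionary keys are hypothetical, and the calculated CTAG value is taken as given because the source does not state how it was derived.

```python
def ctag_stats(length_bp: int, ctag_calculated: int, perm_counts: list[int]) -> dict[str, float]:
    """Recompute the derived table rows from raw counts.

    perm_counts holds the counts of all 24 C/T/A/G permutations, with the
    CTAG count as the first element (an ordering chosen for this sketch).
    """
    ctag = perm_counts[0]
    return {
        "density_per_kb": ctag / (length_bp / 1000),
        "profiled_over_calculated_pct": 100 * ctag / ctag_calculated,
        "ctag_index_pct": 100 * ctag / sum(perm_counts),
    }


# Counts copied from the insertion profile table, CTAG first.
INSERTION_1 = [58, 86, 92, 103, 111, 112, 113, 119, 122, 141, 147, 156,
               163, 164, 168, 170, 178, 180, 189, 193, 200, 216, 221, 224]
INSERTION_2 = [39, 126, 122, 147, 146, 186, 132, 130, 147, 221, 233, 301,
               254, 263, 126, 196, 289, 297, 173, 265, 219, 399, 169, 388]

if __name__ == "__main__":
    for name, length, calc, counts in [
        ("Insertion 1", 41606, 162, INSERTION_1),
        ("Insertion 2", 57892, 221, INSERTION_2),
    ]:
        stats = ctag_stats(length, calc, counts)
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

The core-genome column is not recomputed here because its 24 per-tetranucleotide counts are not listed in the tables.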