Osamu Miura (1, 2), Toshihiro Ogake (2), Takashi Ohyama (3, 4).
Abstract
Inverted repeat (IR) sequences are DNA sequences that read the same from 5' to 3' in each strand. Some IRs can form cruciforms under the stress of negative supercoiling, and such IRs are widely found in genomes. However, their biological significance remains unclear. The aim of the current study is to explore this issue further. We constructed the first comprehensive genome-wide map of IRs with cruciform-forming potential in Escherichia coli and, based on this map, performed detailed quantitative analyses. Here, we report that IRs with cruciform-forming potential are statistically enriched in the following five regions: the regions immediately downstream of the stop codon-coding sites (referred to as the stop codons); on and around the positions corresponding to mRNA ends (referred to as the gene ends); ~20 to ~45 bp upstream of the start codon-coding sites (referred to as the start codons), within the 5'-UTR (untranslated region); ~25 to ~60 bp downstream of the start codons; and promoter regions. In the regions immediately downstream of the stop codons and on and around the gene ends, most of the IRs with a repeat unit length of ≥ 8 bp and a spacer size of ≤ 8 bp were parts of intrinsic terminators, regardless of their location, and are presumably used for Rho-independent transcription termination. In contrast, fewer IRs were present in the small region preceding the start codons. In E. coli, IRs with cruciform-forming potential are actively placed in, or excluded from, the regulatory regions for the initiation and termination of transcription and translation, indicating their deep involvement in, or influence on, these processes.
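The IR definition above, in terms of a repeat unit length (R) and a spacer size (S), can be illustrated with a minimal scanner. This is a sketch under assumptions that do not come from the paper: only perfect Watson-Crick stems are detected (no mismatches or bulges), one strand of an A/C/G/T sequence is scanned, and the hypothetical function `find_inverted_repeats` reports each maximal IR once as `(start, R, S)`. The criteria actually used to build the cruciform-forming IR map may differ.

```python
# Sketch: exhaustive scan for perfect inverted repeats (IRs).
# Assumptions (not from the paper): exact Watson-Crick pairing only,
# no mismatches or bulges, ACGT alphabet, one strand scanned.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_inverted_repeats(seq, min_repeat=5, max_spacer=8):
    """Return (start, repeat_len, spacer_len) for each maximal perfect IR.

    The left arm is seq[start:start + R]; a spacer of length S follows it;
    the right arm is the reverse complement of the left arm.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for s in range(max_spacer + 1):
        for i in range(1, n):
            # i: index just after the left arm; j: first base of the right arm
            j = i + s
            if j >= n:
                break
            # If the outermost spacer bases pair, the stem extends inward;
            # that IR is reported with a shorter spacer instead (no duplicates).
            if s >= 2 and PAIR.get(seq[i]) == seq[j - 1]:
                continue
            # Extend the stem outward while the two arms stay complementary.
            r = 0
            while i - r - 1 >= 0 and j + r < n and PAIR.get(seq[i - r - 1]) == seq[j + r]:
                r += 1
            if r >= min_repeat:
                hits.append((i - r, r, s))
    return hits
```

For example, `find_inverted_repeats(seq, min_repeat=8, max_spacer=8)` would select the R ≥ 8, S ≤ 8 class that the abstract associates with intrinsic terminators. The inward-extension check is a design choice to report a stem once, with its smallest spacer.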
Keywords: Cruciform; E. coli; Genome-wide distribution; Intrinsic terminator; Inverted repeat (IR) sequence
Year: 2018 PMID: 29484452 PMCID: PMC6060812 DOI: 10.1007/s00294-018-0815-y
Source DB: PubMed Journal: Curr Genet ISSN: 0172-8083 Impact factor: 3.886
Fig. 1 Distribution of IRs in the E. coli genome. The position coordinates of the R ≥ 5, S ≤ 8 IRs (repeat unit length ≥ 5 bp, spacer size ≤ 8 bp) are overlaid on the annotated gene map, with their repeat unit lengths shown as line heights. The map can also be browsed interactively in CFIRs-Ec (http://www.waseda.jp/sem-ohyama/CFIRs-Ec).
Fig. 2 Regional distribution profiles of the R ≥ 5, S ≤ 8 IRs. a Position of each individual IR. Genic regions are subdivided into ORFs, 5′- and 3′-UTRs, and OUR-1s, -2s, and -3s, as shown schematically at the top. The gene end is defined as the DNA position corresponding to the end of the mRNA. The start codon-coding site and the stop codon-coding site are referred to as the start codon and the stop codon, respectively. Intergenic regions are subdivided into TANs, DIVs, and CONs. The IRs are sorted by their center positions. Each data panel shows the primary structures of the IRs in relation to their positions within the region of interest. Position 0 indicates the TSS, the first nucleotide of the start codon, the third nucleotide of the stop codon, or the gene end position. The region sizes shown are averages, except in the ORF panel (Supplementary Table S2). Because the ORFs are large on average (951 bp), that panel shows only the IRs found between the start codon and 200 bp downstream of it, or between 200 bp upstream of the stop codon and the stop codon. b Population histogram of the IRs for each region. Region-based population histograms of the R ≥ 5, S ≤ 8 IRs were generated from the data shown in (a), with a bin size of 5 bp (top). The control data (bottom) were obtained using 50 control genomes (“Materials and methods”). Statistical significance levels were calculated with the Grubbs test. *P < 0.05, **P < 0.01, ***P < 0.001
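The histogram construction in panel b (5 bp bins of IR positions measured from a reference site such as the TSS, start codon, stop codon, or gene end) can be sketched as follows. Assumptions not stated in the caption: each IR is reduced to its center, offsets for minus-strand genes are mirrored so that downstream is always positive, and Grubbs' test is applied to the bin counts to ask whether the most populated bin is an outlier. The caption does not say exactly how the test was applied, so this is one plausible reading. The helper names `ir_center`, `binned_profile`, and `grubbs_max_statistic` are hypothetical. Only the one-sided statistic G is computed; judging significance also requires a critical value derived from Student's t distribution, which is omitted here to keep the example dependency-free.

```python
import statistics


def ir_center(start, repeat_len, spacer_len):
    """Center of an IR spanning start .. start + 2R + S - 1 (0-based, inclusive)."""
    return start + (2 * repeat_len + spacer_len - 1) / 2


def binned_profile(centers, references, lo=-100, hi=100, bin_size=5):
    """Count IR centers in bin_size-bp bins covering [lo, hi) relative to each
    reference site. references: (position, strand) pairs, strand '+' or '-';
    minus-strand offsets are mirrored so downstream is always positive."""
    counts = [0] * ((hi - lo) // bin_size)
    for ref, strand in references:
        for c in centers:
            rel = c - ref if strand == "+" else ref - c
            if lo <= rel < hi:
                counts[int((rel - lo) // bin_size)] += 1
    return counts


def grubbs_max_statistic(counts):
    """One-sided Grubbs statistic for the largest count: G = (max - mean) / s,
    with s the sample standard deviation. Returns 0.0 if all counts are equal."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    if sd == 0:
        return 0.0
    return (max(counts) - mean) / sd
```

Comparing G from the genome profile with the same statistic computed on control-genome profiles is one way to read the top/bottom layout of panel b, but that pairing is an assumption.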
Fig. 3 Distribution of the IRs in the gene end region. a Distributions of the R ≥ 5, S ≤ 8 IRs and the R ≥ 8, S ≤ 8 IRs. In total, 218 pairs of TAN sample genes were used in the analysis. The paired genes are aligned according to the distance between the stop codon (dark blue) and the gene end (red) of the upstream gene; this distance increases from top to bottom. Position 0 indicates the third nucleotide of the stop codon. The black lines indicate the IRs. The inset shows the definitions of IRα and IRβ diagrammatically, using colored lines (see text for details). b IR-position-based classification of genes and occupancies of the intrinsic terminators. According to the presence or absence of an IR, or the position of the IR or IRs, the upstream genes were sorted into several types, shown in the insets on the left: types I and II for genes #1–113 in (a), and types III–VI for genes #114–218. The two pie charts in the middle show the proportions of type I and II genes among genes #1–113 and of type III, IV, and VI genes among genes #114–218, respectively (no type V genes were found). The three pie charts on the right show the occupancies of the intrinsic terminators in the IRs of the type I, III, and IV genes. c Population histogram of the R ≥ 8, S ≤ 8 IRs for the region centered on the gene end. The 1,131 genes whose end regions (−50 to +50 bp relative to the respective ends) do not contain the gene end of a downstream gene were analyzed. The bin size is 5 bp (top). The control data (bottom) were obtained using 50 control genomes (“Materials and methods”). Statistical significance levels were calculated with the Grubbs test. ***P < 0.001
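The gene selection for panel c, and the top-to-bottom ordering of panel a, can be sketched as below. Assumptions not taken from the caption: any other annotated gene end counts as an intrusion into a gene's −50 to +50 bp window regardless of strand, coordinates are integers on a single genome axis, and the stop position is the third nucleotide of the stop codon, as in Fig. 2. The function names and the gene record fields (`stop`, `end`, `strand`) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def isolated_gene_ends(gene_ends, half_window=50):
    """Return the gene ends whose [end - w, end + w] window contains no other
    gene end (assumption: any other gene end counts, regardless of strand)."""
    ends = sorted(gene_ends)
    kept = []
    for e in ends:
        lo = bisect_left(ends, e - half_window)
        hi = bisect_right(ends, e + half_window)
        if hi - lo == 1:  # only e itself lies inside its own window
            kept.append(e)
    return kept


def stop_to_end_distance(stop, end, strand):
    """Distance (bp) from the third stop-codon nucleotide to the gene end,
    measured along the gene's own strand."""
    return end - stop if strand == "+" else stop - end


def order_by_stop_to_end(genes):
    """Sort gene records (dicts with 'stop', 'end', 'strand') by increasing
    stop-to-end distance, mirroring the row order described for Fig. 3a."""
    return sorted(genes, key=lambda g: stop_to_end_distance(g["stop"], g["end"], g["strand"]))
```

Sorting the gene ends once and using binary search keeps the window check at O(log n) per gene, rather than comparing every pair of genes.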