Albert Pallejà, Eoghan D Harrington, Peer Bork.
Abstract
BACKGROUND: Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these overlaps are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation.
Year: 2008 PMID: 18627618 PMCID: PMC2478687 DOI: 10.1186/1471-2164-9-335
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Analysing previous reports of overlapping genes
| Objective of the study | Genes excluded | Genes accepted | Suggested misannotations |
| --- | --- | --- | --- |
| Comparison study of overlapping genes in two Mycoplasma genomes; study of overlapping genes in bacterial genomes | Homologous genes whose start codons were assigned differently, and genes coding for hypothetical or putative proteins | Authentic ORFs, i.e. genes not annotated as hypothetical or putative proteins and conserved in the COG database | Misprediction of the start codons |
| Study of non-coding DNA in prokaryotic genomes | Genes coding for hypothetical proteins and overlapping by more than 90 bp | Gene pairs not annotated as hypothetical or putative proteins and conserved in the COG database | Misprediction of start codons, falsely predicted genes and missed genes, frameshifts |
| Analysis of purifying and directional selection in overlapping prokaryotic genes | Genes not conserved in the COG database; co-directional and divergent overlapping pairs; overlapping gene pairs not conserved in two or more species | Convergent overlapping genes conserved both in the COG database and in two or more genomes | Misprediction of start codons (affecting co-directional and divergent overlaps) and loss of termination codons (affecting co-directional and convergent overlaps) |
| Study of the properties of overlapping genes in microbial genomes | Genes coding for hypothetical proteins | Gene pairs not annotated as hypothetical or putative proteins | Misidentification of coding sequences |
| Comparison study of overlapping genes in two Rickettsia genomes | Genes coding for hypothetical proteins | Gene pairs not annotated as hypothetical or unknown proteins | Incorrectly annotated ORFs |
| Study of the relative reading-frame bias in prokaryotic two-component system genes, which tend to overlap | Genes with ambiguous locations | Two-component system gene pairs with unambiguous locations in the chromosome | Invalid bacterial start codons or premature stop codons |
Comparison of previous studies of overlapping genes. For each study, the columns give the authors' objectives, the genes excluded from the study, the genes accepted for the study, and the misannotations which the authors suggest are present in prokaryotic chromosomes.
Figure 1. Types of misannotation. Schema of the five categories of putative misannotations. Both the number and the percentage of co-directional overlapping pairs longer than 60 bp classified in each group are shown. Gene a represents the upstream gene, while gene b represents the downstream gene. In the Fragmentation type, genes x, y and z represent the orthologs of genes a and b.
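The selection criterion named in the caption (co-directional pairs overlapping by more than 60 bp) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Gene` record, the 1-based inclusive coordinate convention, the function names and the example coordinates are all assumptions made for illustration, and only neighbouring genes (sorted by start position) are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record (assumed convention, not from the paper)."""
    name: str
    start: int   # 1-based, inclusive, start <= end
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of base pairs shared by the two gene intervals (0 if none)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def orientation(a: Gene, b: Gene) -> str:
    """Classify a pair as co-directional, convergent or divergent."""
    if a.strand == b.strand:
        return "co-directional"
    left, _right = sorted((a, b), key=lambda g: g.start)
    # Left gene on "+" and right gene on "-": the 3' ends face each other.
    return "convergent" if left.strand == "+" else "divergent"


def long_codirectional_overlaps(genes, min_len=60):
    """Return (upstream-by-position, downstream-by-position, length) for
    neighbouring co-directional pairs overlapping by more than min_len bp."""
    ordered = sorted(genes, key=lambda g: g.start)
    hits = []
    for a, b in zip(ordered, ordered[1:]):
        n = overlap_length(a, b)
        if n > min_len and orientation(a, b) == "co-directional":
            hits.append((a.name, b.name, n))
    return hits


# Made-up coordinates, purely illustrative.
genes = [
    Gene("a", 100, 400, "+"),
    Gene("b", 330, 700, "+"),
    Gene("c", 690, 900, "-"),
]
print(long_codirectional_overlaps(genes))  # [('a', 'b', 71)]
```

Comparing only adjacent genes keeps the sketch short; nested genes or overlaps spanning more than two annotations would need an interval-based search instead.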
Figure 2. Distribution of the overlapping pairs with respect to overlap length. The longest overlaps selected for manual analysis are indicated by the red box. Several species contribute a disproportionate number of overlapping pairs to the misannotations; the figure shows the five species that accumulate the most misannotations.
Figure 3. Aligning a co-directional true overlap. Overlap between the holD (coding for the DNA polymerase psi subunit) and rimI (coding for an alanine acetyltransferase) genes among Enterobacteria. A) Multiple alignment of the C-terminus of the DNA polymerase psi subunit and the N-terminus of the alanine acetyltransferase protein among Enterobacteria species. The grey boxes indicate the fragments that are encoded in the overlapping region between the holD and rimI genes. The alignments of Escherichia & Shigella, Salmonella and Yersinia are marked. B) Arrangement of overlapping regions and amino acid conservation within the overlap among Escherichia coli K12, Salmonella enterica Ty2 and Yersinia pestis CO92. The nucleotide consensus shows an asterisk for conserved nucleotides and a dot for non-conserved ones. Although only one species was chosen from each group marked in part A (Escherichia & Shigella, Salmonella and Yersinia), the high similarity is also evident at the level of these sample nucleotide sequences.