Jonghwan Baek, Jiyoung Lee, Kihoon Yoon, Hyunwoo Lee.
Abstract
Increasing evidence indicates that many, if not all, small genes encoding proteins ≤100 aa are missing from the annotations of currently available bacterial genomes. To uncover unannotated small genes in the model bacterium Salmonella enterica Typhimurium 14028s, we used the genomic technique ribosome profiling, which provides a snapshot of all mRNAs being translated (the translatome) in a given growth condition. For comprehensive identification of unannotated small genes, we obtained Salmonella translatomes from four different growth conditions: LB, MOPS rich defined medium, and two infection-relevant conditions, low Mg2+ (10 µM) and low pH (5.8). To facilitate the identification of small genes, ribosome profiling data were analyzed in combination with in silico predicted putative open reading frames (ORFs) and transcriptome profiles. As a result, we uncovered 130 unannotated ORFs. Of these, 98% were small ORFs putatively encoding peptides/proteins ≤100 aa, and some were expressed only in the infection-relevant low Mg2+ and/or low pH condition. We validated the expression of 25 of these ORFs by western blot, including the smallest, which encodes a peptide of 7 aa residues. Our results suggest that many sequenced bacterial genomes are underannotated with regard to small genes and that their gene annotations need to be revised.
Keywords: genome annotation; ribosome profiling; short ORF; small genes; small proteins
Year: 2017 PMID: 28122954 PMCID: PMC5345727 DOI: 10.1534/g3.116.036939
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Identification of unannotated and misannotated ORFs. (A) Criteria for pORFs. Two different lists of in silico predicted ORFs were generated from the genome sequence of S. Typhimurium 14028s using a custom-written Perl algorithm (File S1). (B) Visualization in the genome browser MochiView and manual inspection of ribosome profiling and mRNA-seq data for identification of unannotated and misannotated ORFs. Shown are examples of three unannotated ORFs: one (medium independent) identified in all four growth conditions and two identified only in the low Mg2+ condition. In the “medium-dependent” example, the annotated STM14_3166 (abbreviated as 3166) was identified as being misannotated. The y axis represents the ribosome and mRNA density per nucleotide. Annotated genes and putative ORFs are shown in gray and purple arrow boxes, respectively.
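The in silico ORF prediction step mentioned in Figure 1A can be illustrated with a minimal Python sketch. This is not the authors' Perl script (File S1), and the criteria used here are assumptions chosen for illustration: the start codons ATG, GTG, and TTG, the three standard stop codons, a six-frame scan, and a hypothetical length window of 2 to 100 codons. The names `find_orfs`, `min_codons`, and `max_codons` are likewise hypothetical.

```python
# Illustrative six-frame ORF scan. All criteria below are assumptions,
# not the pORF criteria defined by the authors in File S1.
STARTS = {"ATG", "GTG", "TTG"}  # assumed bacterial start codons
STOPS = {"TAA", "TAG", "TGA"}   # standard stop codons

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2, max_codons=100):
    """Return a list of (strand, start, end, n_codons) tuples.

    start/end are 0-based, half-open coordinates on the forward strand
    and include the stop codon. n_codons counts the start codon but not
    the stop codon. Every in-frame start upstream of a stop is reported,
    so nested candidates sharing one stop codon are all listed.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            open_starts = []  # start codons seen since the last stop
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if codon in STARTS:
                    open_starts.append(i)
                elif codon in STOPS:
                    for st in open_starts:
                        n_codons = (i - st) // 3
                        if min_codons <= n_codons <= max_codons:
                            if strand == "+":
                                orfs.append(("+", st, i + 3, n_codons))
                            else:
                                # map back to forward-strand coordinates
                                orfs.append(("-", n - (i + 3), n - st, n_codons))
                    open_starts = []
    return orfs


if __name__ == "__main__":
    # Toy sequence: ATG AAA TTT TAA (a 3-codon ORF on the forward strand)
    print(find_orfs("ATGAAATTTTAA"))
```

Candidates produced by a scan like this are only putative. In the study, such predictions were combined with ribosome profiling and mRNA-seq evidence before an ORF was called unannotated or misannotated.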
Figure 2. Genome-wide identification of misannotated and unannotated ORFs and their amino acid length distribution. (A) Misannotated (blue) and unannotated (red) ORFs identified were spread widely around the genome. (B) Unannotated ORFs were enriched with those putatively encoding small peptides/proteins ≤50 aa, whereas the majority of the misannotated genes encoded proteins >100 aa.
Figure 3. Verification of expression of selected misannotated and unannotated sORFs and small “y” genes by western blot. The sORFs and small genes examined for their expression are grouped into three categories: (A) Salmonella-specific, (B) conserved in Enterobacteriaceae, and (C) unassigned. “Unassigned” indicates sORFs whose conservation could not be determined by tBLASTn searches because of their short amino acid lengths. Mutant strains, each carrying a chromosomal SPA tag fused to the C terminus of a target sORF/small gene, were grown in the respective media used for the ribosome profiling experiments (see Materials and Methods). Whole cell extracts (equivalent to the cell number at OD600 0.05) were run on a 16.5% SDS-PAGE gel, and the expression of SPA-tagged peptides/proteins was determined by western blot using an alkaline phosphatase-conjugated anti-FLAG antibody. A negative value on the y axis (ribo or mRNA density) indicates that the gene is located on the reverse strand. The positions of the markers are shown for the approximate sizes of proteins (kDa). (D) As a negative and a positive control for the western blot, whole cell extracts of the wild type (no SPA tag) and of wild-type cells expressing only the SPA tag (tag only) were used.