Morgan N Price, Adam M Deutschbauer, Adam P Arkin.
Abstract
GapMind is a Web-based tool for annotating amino acid biosynthesis in bacteria and archaea (http://papers.genomics.lbl.gov/gaps). GapMind incorporates many variant pathways and 130 different reactions, and it analyzes a genome in just 15 s. To avoid error-prone transitive annotations, GapMind relies primarily on a database of experimentally characterized proteins. GapMind correctly handles fusion proteins and split proteins, which often cause errors for best-hit approaches. To improve GapMind's coverage, we examined genetic data from 35 bacteria that grow in defined media without amino acids, and we filled many gaps in amino acid biosynthesis pathways. For example, we identified additional genes for arginine synthesis with succinylated intermediates in Bacteroides thetaiotaomicron, and we propose that Dyella japonica synthesizes tyrosine from phenylalanine. Nevertheless, for many bacteria and archaea that grow in minimal media, genes for some steps still cannot be identified. To help interpret potential gaps, GapMind checks if they match known gaps in related microbes that can grow in minimal media. GapMind should aid the identification of microbial growth requirements.
IMPORTANCE Many microbes can make all of the amino acids (the building blocks of proteins). In principle, we should be able to predict which amino acids a microbe can make, and which it requires as nutrients, by checking its genome sequence for all of the necessary genes. However, in practice, it is difficult to check for all of the alternative pathways. Furthermore, new pathways and enzymes are still being discovered. We built an automated tool, GapMind, to annotate amino acid biosynthesis in bacterial and archaeal genomes. We used GapMind to list gaps: cases where a microbe makes an amino acid but a complete pathway cannot be identified in its genome. We used these gaps, together with data from mutants, to identify new pathways and enzymes. However, for most bacteria and archaea, we still do not know how they can make all of the amino acids.
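The notion of a gap with variant pathways can be illustrated with a minimal sketch. It is not GapMind's implementation: the function name `best_variant_gaps`, the variant labels, and the toy step lists are all hypothetical, and the lysine step names are used only as placeholders, not as GapMind's actual pathway definitions. The sketch assumes that each variant is a list of required steps and that the best variant is the one with the fewest steps lacking a candidate gene.

```python
from typing import Dict, List, Set, Tuple


def best_variant_gaps(
    variants: Dict[str, List[str]], found: Set[str]
) -> Tuple[str, List[str]]:
    """Return the variant with the fewest missing steps, plus those steps.

    variants: hypothetical map from variant name to its required steps.
    found: steps for which a candidate gene was identified in the genome.
    Ties go to the variant listed first.
    """
    if not variants:
        raise ValueError("at least one pathway variant is required")
    best_name = ""
    best_missing: List[str] = []
    first = True
    for name, steps in variants.items():
        missing = [step for step in steps if step not in found]
        if first or len(missing) < len(best_missing):
            best_name, best_missing = name, missing
            first = False
    return best_name, best_missing


# Toy data (assumed for illustration only).
toy_variants = {
    "succinylated": ["DapD", "DapC", "DapE"],
    "aminotransferase": ["DapL"],
}
name, gaps = best_variant_gaps(toy_variants, found={"DapD"})
print(name, gaps)
```

An empty list of missing steps means the variant is complete; a non-empty list is what the abstract calls a gap.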
Keywords: amino acid biosynthesis; gene annotation; high-throughput genetics
Year: 2020 PMID: 32576650 PMCID: PMC7311316 DOI: 10.1128/mSystems.00291-20
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 How GapMind works. (A) A pathway with no variants. (B) The definition of a step. (C) Confidence levels for candidates from ublast. (D) Confidence levels for candidates from HMMER.
FIG 2 GapMind handles fusion proteins and split proteins. (A) HSERO_RS20920 from Herbaspirillum seropedicae SmR1 is a fusion of AroL and AroB (shown with Swiss-Prot identifiers). (B) Split candidates for vitamin B12-dependent methionine synthase (MetH) in Burkholderia phytofirmans PsJN and Bacteroides thetaiotaomicron VPI-5482. a.a., amino acids.
FIG 3 Arginine biosynthesis with succinylated intermediates. (A) The standard pathway. Protein names are from Escherichia coli or Bacillus subtilis. The formation of carbamoyl phosphate (catalyzed by CarAB) is not shown. (B) The pathway in Bacteroides and in other Bacteroidetes. (C) Fitness data from Bacteroides thetaiotaomicron VPI-5482, Echinicola vietnamensis KMM 6221 (DSM 17526), and Pedobacter sp. strain GW460-11-11-14-LB5 (from references 8 and 20). Each fitness value is the log2 change in the abundance of the mutants in a gene during an experiment. Each experiment went from an optical density at 600 nm of 0.02 to saturation (usually 4 to 8 doublings). Fitness values for CA265_RS18510 were not estimated, because mutants of this gene were at low abundance in the starting samples.
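As a rough illustration of the two quantities in the FIG 3 legend, the sketch below computes a log2 change in relative abundance and the optical density reached after a given number of doublings. The function names and the example abundances are assumptions; any normalization applied in the cited references is not described here and is not modeled.

```python
import math


def log2_change(start_abundance: float, end_abundance: float) -> float:
    """Simplified fitness: log2 ratio of end to start relative abundance."""
    if start_abundance <= 0 or end_abundance <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(end_abundance / start_abundance)


def od_after_doublings(start_od: float, doublings: float) -> float:
    """Optical density after a number of population doublings."""
    return start_od * 2 ** doublings


# Hypothetical example: mutants in a gene fall from 0.5 to 0.125 of the pool.
fitness = log2_change(0.5, 0.125)

# Starting at an OD600 of 0.02, 4 to 8 doublings span roughly 0.32 to 5.12.
od_low = od_after_doublings(0.02, 4)
od_high = od_after_doublings(0.02, 8)
print(fitness, od_low, od_high)
```

Under this simplification, a fitness of 0 means the mutants kept pace with the rest of the population, and a fitness of -2 means their relative abundance dropped fourfold.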
FIG 4 Tyrosine synthesis via phenylalanine hydroxylase in Dyella japonica and Echinicola vietnamensis. (A) Gene fitness in Dyella japonica UNC79MFTsu3.2. The x axis shows the median fitness across 59 genes that are predicted to be involved in amino acid biosynthesis (by TIGRFam role [14]), and the y axis shows the fitness of the predicted phenylalanine hydroxylase (PAH). (B) Gene fitness in Echinicola vietnamensis KMM 6221 (DSM 17526) for prephenate dehydrogenase (x axis) and for PAH (y axis). In both panels, we color code experiments by whether or not tyrosine was present in the media. The experiments with tyrosine usually included it via yeast extract or Casamino Acids, while the experiments without tyrosine are in defined media with just one or no amino acids added. Lines show x = 0 and y = 0, corresponding to no effect of mutating the genes. In panel A, lines show x = y.
Remaining gaps in amino acid biosynthesis for 35 bacteria that can make all 20 amino acids and have large-scale genetic data
| Type of gap | Pathway: gap | Organism | Comment |
|---|---|---|---|
| Novel | Histidine: HisN | | Synpcc7942_1763 is a candidate for histidinol phosphatase but is not required for growth |
| Novel | Lysine: DapCE or DapL | | Echvi_3551 is a good candidate for the succinyltransferase DapD, which suggests succinylated intermediates, but DapC and DapE are missing, or Echvi_0124 might be a diverged diaminopimelate aminotransferase (DapL) |
| Novel | Lysine: DapE | | No convincing candidate for the desuccinylase DapE was found |
| Novel | Serine: SerACB | | This genome does not seem to encode the standard SerACB pathway of serine synthesis |
| Novel | Serine: SerACB | | This genome does not seem to encode the standard SerACB pathway of serine synthesis |
| Novel | Serine: SerB | | This genome has several weak candidates for phosphoserine phosphatase |
| Novel | Serine: SerB | | Synpcc7942_2078 is a candidate for phosphoserine phosphatase but is not required for growth |
| Novel | Threonine: ThrB | | This genome does not encode a standard threonine synthase or ThrC (BT2401) |
| Novel | Threonine: ThrB | | This genome does not encode a standard threonine synthase or ThrC (Dshi_1146) |
| Novel | Threonine: ThrB | | This genome does not encode a standard threonine synthase or ThrC (PGA1_c06310) |
| Diverged | Chorismate: AroA | | Echvi_0122 may be a diverged AroA; it appears to be essential |
| Diverged | Cysteine: CysE | | Echvi_0221 may be a diverged serine acetyltransferase; it appears to be essential |
| Diverged | Histidine: HisN | | DVU2940 may be a diverged histidinol phosphatase; it appears to be essential (V. V. Trotter, personal communication) |
| Diverged | Histidine: HisN | | DvMF_0940 may be a diverged histidinol phosphatase; it appears to be essential |
| Diverged | Histidine: HisC | | Synpcc7942_1030 may be a diverged histidinol-phosphate aminotransferase; it appears to be essential |
| Diverged | Leu/Ile/Val: IlvI | | Various strains of |
| Diverged | Methionine: MetC | | This organism probably uses the transsulfuration pathway (MetB = N515DRAFT_4363 is important for growth in minimal media); N515DRAFT_4305 is likely to be cystathionine beta-lyase (MetC), but it is also very similar to a cystathionine gamma-lyase (Q5H4T8) |
| Diverged | Phenylalanine: Pdehyd | | Synpcc7942_0881 may be a diverged prephenate dehydratase; it appears to be essential |
| Diverged | Serine: SerC | | Echvi_1811 may be a phosphoserine aminotransferase; it appears to be essential |
| Spurious | Chorismate: AroL | | An open reading frame with 41% identity to AROK_ECOLI is present, but no protein was predicted |
| Spurious | Chorismate: AroC | | A frameshift error splits AroC into two reading frames (SO3078.2 and SO_3079) |
| Spurious | Histidine: HisD | | A frameshift error in the genome sequence prevented this protein from being predicted ( |
| Spurious | Histidine: Prs | | An open reading frame with 67% identity to KPRS_ECOLI is present, but no protein was predicted |
| Spurious | Methionine: MetZ | | The published assembly is missing a region that has an open reading frame with 84% identity to METZ_PSEAE |
| Spurious | Serine: SerC | | An open reading frame with 58% identity to SERC_METBF is present, but no protein was predicted |
FIG 5 Number of gaps in amino acid biosynthesis in 148 diverse bacteria and archaea that can grow without amino acids. (These are distinct from the 35 bacteria with fitness data.)
FIG 6 GapMind’s website renders the best paths for amino acid biosynthesis in Desulfovibrio alaskensis G20. Each step is color coded by its confidence level, and a question mark indicates known gaps in related organisms.