Literature DB >> 24802639

Automated genome mining of ribosomal peptide natural products.

Hosein Mohimani¹, Roland D Kersten, Wei-Ting Liu, Mingxun Wang, Samuel O Purvine, Si Wu, Heather M Brewer, Ljiljana Pasa-Tolic, Nuno Bandeira, Bradley S Moore, Pavel A Pevzner, Pieter C Dorrestein.

Abstract

Ribosomally synthesized and posttranslationally modified peptides (RiPPs), especially from microbial sources, are a large group of bioactive natural products that are a promising source of new (bio)chemistry and bioactivity.1 In light of exponentially increasing microbial genome databases and improved mass spectrometry (MS)-based metabolomic platforms, there is a need for computational tools that connect natural product genotypes predicted from microbial genome sequences with their corresponding chemotypes from metabolomic data sets. Here, we introduce RiPPquest, a tandem mass spectrometry database search tool for identification of microbial RiPPs, and apply it to lanthipeptide discovery. RiPPquest uses genomics to limit search space to the vicinity of RiPP biosynthetic genes and proteomics to analyze extensive peptide modifications and compute p-values of peptide-spectrum matches (PSMs). We highlight RiPPquest by connecting multiple RiPPs from extracts of Streptomyces to their gene clusters and by the discovery of a new class III lanthipeptide, informatipeptin, from Streptomyces viridochromogenes DSM 40736 to reflect that it is a natural product that was discovered by mass spectrometry based genome mining using algorithmic tools rather than manual inspection of mass spectrometry data and genetic information. The presented tool is available at cyclo.ucsd.edu.

Entities: Chemical Disease Species

Mesh：

Substances：

Year: 2014 PMID： 24802639 PMCID： PMC4215869 DOI： 10.1021/cb500199h

Source DB: PubMed Journal: ACS Chem Biol ISSN： 1554-8929 Impact factor: 5.100

Discovery of microbial natural products by use of genome information is a promising approach to find new pharmaceutically valuable and ecologically important molecules.[2] Genome mining consists of a “prediction step” in which secondary metabolic pathways and their corresponding products are predicted from a genome sequence and a “connection step” in which an unknown natural product (chemotype) is linked to its biosynthetic gene cluster (genotype) based on information derived from the prediction step.[3] In current genome mining approaches, the “prediction step” has been starting to be automated by use of computational tools such as antiSMASH to rapidly characterize biosynthetic gene clusters in microbial genome sequences.[4] However, the “connection step” is time-consuming because only one target pathway is characterized in its product, for example, by genome manipulation or heterologous pathway expression. Here, we introduce the first genome mining tool that automates both prediction and connection steps for the discovery of ribosomally synthesized and posttranslationally modified peptides (RiPPs), specifically, lanthipeptides, from genome sequenced microbes and their metabolomes. RiPPs are a rapidly expanding group of natural products produced through the Post Ribosomal Peptide Synthesis (PRPS) pathway.[1] Among RiPPs, lanthipeptides are a prominent group of peptides with pharmaceutical and food industrial applications.[5] Lanthipeptides are structurally defined by the thioether amino acids lanthionine, methyllanthionine and labionin (Supporting Information (SI) Figure S1). Lanthionine and methyllanthionine are introduced by dehydration of a serine or threonine to generate a dehydroalanine or dehydrobutyrine and addition of a cysteine thiol, catalyzed by a dehydratase and a cyclase, respectively.[1,5] During lanthipeptide biosynthesis, a precursor gene lanA is translated by the ribosome to yield a precursor peptide LanA that consists of a N-terminal leader peptide and a C-terminal core peptide sequence. The core peptide is posttranslationally modified by the lanthionine biosynthetic machinery and other enzymes, proteolytically cleaved from the leader peptide to yield the mature lanthipeptide and exported out of the cell by transporters (SI Figure S1).[1,5] Lanthipeptides are divided into four classes based on their lanthionine synthetase genes. Class I lanthipeptides are formed by a separate dehydratase (LanB) and lanthionine cyclase (LanC), class II lanthipeptides by a bifunctional lanthionine synthetase (LanM) with a dehydratase and a cyclase domain, class III and class IV lanthipeptides by a lanthionine synthetase with a lyase, a kinase, and a cyclase domain where class III lanthipeptide synthetases (LanKC) lack zinc-binding residues in the cyclase domain.[1] Interestingly, class III lanthipeptides are structurally characterized by a special lanthionine modification, called labionin. Labionin is a triamino acid derived from a Ser(1)-X3-Ser(2)-X2–3-Cys motif. It consists of a Ser(2)-Cys lanthionine bridge and a Ser(1)-Ser(2) methylene bridge via a quartenary α-carbon at Ser(2), thus forming two ring systems within a peptide chain (SI Figure S1).[1,6] Workflow implemented in the RiPPquest algorithm for automated peptidogenomics of RiPPs. (a) Prediction of lanthipeptide gene clusters in microbial genome sequence. (b) Generation of 10 kb windows centered at LANC-domain of gene clusters. (c) Prediction of ORFs in each gene cluster. (d) Selection of all candidate precursor peptides ORFs <100 aa. (e) Generation of candidate core peptides via C-terminal half of each selected ORF. (f) Generation of all biosynthetic and gas phase products of each core peptide, exemplified by peptide TFCRS. (g) Generation of MS/MS peptide database of predicted lanthipeptide products. (h) MS/MS analysis of microbial extract. (i) Matching of MS/MS data with MS/MS lanthipeptide spectral database with computed p-values. (j) Molecular network analysis of MS/MS data to identify peptide homologues and to confirm PSMs. The automated detection of lanthipeptides has been historically difficult due to their complex posttranslational modifications and high molecular weight.[7] Increased understanding of lanthipeptide biosynthesis and accelerated sequencing of microbial genomes enabled characterization of lanthipeptides by genome mining of lanthionine synthetase genes.[8,9] Recent studies suggest that RiPPs such as lanthipeptides are widely distributed among bacterial genomes and that many microbial lanthipeptides remain to be discovered.[9−11] Since RiPPs are directly encoded in the genome as are proteins, a proteomic approach could be used to connect a RiPP chemotype with its genotype, that is, by matching a tandem MS spectrum of the RiPP with a peptide in the protein database of the target genome by a MS/MS database search tool. However, common proteomic database search tools such as Sequest[12] and Mascot[13] fail to identify RiPPs because (i) they have difficulty identifying highly modified peptides,[14] (ii) they have difficulty identifying poorly fragmented spectra typical for RiPPs with complex modifications, and (iii) they are unable to identify nonlinear peptides.[11] Moreover, genes encoding lanthipeptides are typically shorter than 100 amino acids and are often missed by existing gene prediction tools and not included in a target protein databases.[15] A recent study introduced a peptidogenomic approach for the rapid characterization of RiPPs by MS-guided genome mining.[11] Herein, de novo tandem MS sequence tags of RiPPs were obtained by manual MS/MS analysis and searched against the 6-frame translation of a target genome under consideration of biosynthetic and gas phase modifications of target core peptide sequences. The manual peptidogenomic approach relies on characterization of long MS/MS sequence tags of 4–5 amino acids (aa) and a reduction of the search space in the 6-frame translation of a microbial genome by consideration of only <100 aa-long ORFs. Since long sequence tags are often not present in macrocyclic RiPPs such as lanthipeptides and since manual peptidogenomic analysis of large metabolic LC-MS/MS data sets is limited, we implemented the peptidogenomic approach in a MS/MS database search tool, called RiPPquest, with computation of statistical significance for identified Peptide-Spectrum Matches (PSMs) to enable analysis of larger data sets (Figure 1). To overcome the weaknesses of proteomic MS/MS database tools to identify RiPPs, RiPPquest was specified in peptide database generation and peptide-spectrum matching for a connection of lanthipeptide MS/MS data with lanthipeptide gene clusters in microbial genomes.

Figure 1

Workflow implemented in the RiPPquest algorithm for automated peptidogenomics of RiPPs. (a) Prediction of lanthipeptide gene clusters in microbial genome sequence. (b) Generation of 10 kb windows centered at LANC-domain of gene clusters. (c) Prediction of ORFs in each gene cluster. (d) Selection of all candidate precursor peptides ORFs <100 aa. (e) Generation of candidate core peptides via C-terminal half of each selected ORF. (f) Generation of all biosynthetic and gas phase products of each core peptide, exemplified by peptide TFCRS. (g) Generation of MS/MS peptide database of predicted lanthipeptide products. (h) MS/MS analysis of microbial extract. (i) Matching of MS/MS data with MS/MS lanthipeptide spectral database with computed p-values. (j) Molecular network analysis of MS/MS data to identify peptide homologues and to confirm PSMs.

Characterization of class III lanthipeptide informatipeptin from Streptomyces viridochromogenes DSM 40736 by RiPPquest. (a) PSM of informatipeptin. (b) Gene cluster analysis of informatipeptin. (c) Predicted structures of informatipeptin. Abbreviation: N/A = not annotated.

Results and Discussion

RiPPquest Workflow

The RiPPquest workflow (Figure 1) starts with generation of a lanthipeptide database from a target microbial genome (Figure 1a–g). First, all lanthipeptide gene clusters are predicted in the target genome (Figure 1a). We selected the LANC-like domain (Pfam PF05147) for Pfam domain search of lanthipeptide gene clusters in RiPPquest due to its high specificity to lanthipeptide pathways (SI Figure S2). Then, ORFs <100 aa were predicted in the 6-frame translation of a 10 kb window centered at the LANC-like domain (Figure 1b–d). Only the C-terminal half of the selected ORFs were considered in the core peptide database (Figure 1e). Subsequently, MS/MS spectra of all possible mature lanthipeptides were computed for each putative core peptide based on possible biosynthetic transformations and gas phase reactions in lanthipeptides (Figure 1f,g, Methods, SI Methods). Next, peptide-spectrum matches were scored in order to identify candidate lanthipeptide chemotype-genotype connections between the LC-MS/MS data and the genome-derived lanthipeptide database (Figure 1i). We chose to score PSMs using a scoring function used in de novo peptide sequencing PepNovo.[16] In the brute force approach, one forms PSMs between each spectrum in the spectral data set and each modified core peptide if the parent masses of the spectrum and the modified core peptide are close to each other, that is, within 0.5 Da. Because it is time-consuming to compare each spectrum against each possible modified peptide for large spectral data sets, we use the spectral alignment technique to efficiently find modifications of the core peptide that best matches the spectrum.[17−20] In the next step, PSM scores were converted to p-values to make lanthipeptide chemotype-genotype connections more reliable. While PSM scores are useful for selecting top-scoring PSMs, they are notoriously unreliable for estimating the statistical significance of PSMs.[21] To convert scores into p-values, RiPPquest uses a recently developed algorithm to obtain statistical matches for nonlinear peptides called MS-DPR used in the evaluation of PSMs.[22] While there are other methods for evaluating p-values, MS-DPR (Direct Probality Distribution) is the only approach available today for evaluating p-values of PSMs formed by nonlinear, for example, cyclic peptides. Finally, a molecular network was generated from the MS/MS data set to identify homologues of characterized lanthipeptide from top-scored PSMs (Figure 1j). Molecular network is a visualization of spectra as familial groupings of corresponding molecules.[23,24,29] Edges in a molecular network connect nodes corresponding to spectra that represent molecules, in this case peptides, differing from each other by a mutation or a modification. Such pairs of spectra connected by edges in the molecular network are revealed using the spectral alignment approach.[18,20] Molecular networking enables discovery of uncharacterized homologues of known peptides, and families of related peptides and increases reliability of RiPPquest peptide identification. Most classes of RiPPs form families of related peptides, making molecular networks helpful in RiPP analysis.[24]

Connection of Lanthipeptide Chemotypes with Their Genotypes by RiPPquest

We tested the RiPPquest algorithm in the connection of lanthipeptide chemotypes with their genotypes in microbial genomes by analysis of LC-ESI-MS/MS, LC-HCD-MS/MS, and ESI-MS/MS data sets of organic extracts from 16 genome sequenced Streptomyces strains (SI Table S1). Genome mining revealed a total of 49 candidate lanthipeptide biosynthetic gene clusters in the 16 Streptomyces strains (SI Figure S3). The automated peptidogenomic pipeline identified six known lanthipeptides families from the target strains: SRO-2212 and SRO-3108 from Streptomyces roseosporus NRRL 15998,[11] AmfS from S. griseus IFO 13350,[25] SapB from S. coelicolor A3[2] and S. lividans TK24,[7] and SAL-2242 from S. albus J1074[11] (Table 1; SI Figure S4). The sheer number of lanthipeptide analogues was unexpected and not discovered manually.

Table 1

Top Lanthipeptides Discovered by the RiPPquest Automated Peptidogenomics Pipelinea

peptide	species	MS/MS type	MS/MS precursor [m/z]	z	p-value	annotation
informati-peptin	S. viridochromogenes DSM 40 736	CID	1065.56	2	9.2 × 10^–8	TDGGGASTVSLLSCISAASVLLCL (6 Dehyd)
	S. viridochromogenes DSM 40 736	HCD	1065.54	2	0.005	TDGGGASTVSLLSCISAASVLLCL (6 Dehyd)
	S. viridochromogenes DSM 40 736	CID	1015.02	2	0.013	DGGGASTVSLLSCISAASVLLCL (6 Dehyd)
	S. viridochromogenes DSM 40 736	HCD	1015.02	2	0.03	DGGGASTVSLLSCISAASVLLCL (6 Dehyd)
	S. viridochromogenes DSM 40 736	CID	957.51	2	0.0026	GGGASTVSLLSCISAASVLLCL (6 Dehyd)
	S. viridochromogenes DSM 40 736	HCD	957.50	2	0.0098	GGGASTVSLLSCISAASVLLCL (6 Dehyd)
	S. viridochromogenes DSM 40 736	CID	928.49	2	0.0003	GGASTVSLLSCISAASVLLCL (6 Dehyd)
	S. viridochromogenes DSM 40 736	HCD	928.99	2	0.012	GGASTVSLLSCISAASVLLCL (6 Dehyd)
AmfS	S. griseus IFO 13 350	CID	1028.04	2	2.5 × 10^–10	GSQVSLLVCEYSSLSVVLCTP (4 Dehyd)
SRO-2212	S. roseosporus NRRL 15 998	CID	1107.07	2	6.3 × 10^–10	TGSQVSLLVCEYSSLSVVLCTP (4 Dehyd)
	S. roseosporus NRRL 15 998	HCD	1107.06	2	5.3 × 10^–7	TGSQVSLLVCEYSSLSVVLCTP (4 Dehyd)
	S. roseosporus NRRL 15 998	HCD	738.37	3	6.9 × 10^–5	TGSQVSLLVCEYSSLSVVLCTP (4 Dehyd)
	S. roseosporus NRRL 15 998	CID	1056.55	2	2.0 × 10^–9	GSQVSLLVCEYSSLSVVLCTP (4 Dehyd)
	S. roseosporus NRRL 15 998	HCD	1065.53	2	0.001	GSQVSLLVCEYSSLSVVLCTP (3 Dehyd)
	S. roseosporus NRRL 15 998	HCD	710.693	3	0.001	GSQVSLLVCEYSSLSVVLCTP (3 Dehyd)
	S. roseosporus NRRL 15 998	CID	1142.58	2	3.8 × 10^–6	ATGSQVSLLVCEYSSLSVVLCTP (4 Dehyd)
	S. roseosporus NRRL 15 998	HCD	1142.57	2	0.013	ATGSQVSLLVCEYSSLSVVLCTP (4 Dehyd)
	S. roseosporus NRRL 15 998	HCD	1028.02	2	5.0 × 10^–5	SQVSLLVCEYSSLSVVLCTP (4 Dehyd)
SRO-3108	S. roseosporus NRRL 15 998	CID	1555.68	2	1.2 × 10^–6	TTWACATVTLTVTVCSPTGTLCGSCSMGTRGCC (9 Dehyd)
	S. roseosporus NRRL 15 998	HCD	1564.17	2	0.02	TTWACATVTLTVTVCSPTGTLCGSCSMGTRGCC (9 Dehyd)
	S. roseosporus NRRL 15 998	CID	1582.81	2	1.0 × 10^–7	{TTWAC}+54ATVTLTVTVCSPTGTLCGSCSMGTRGCC (9 Dehyd)
	S. roseosporus NRRL 15 998	CID	1636.71	2	0.001	{TTWAC}+162ATVTLTVTVCSPTGTLCGSCSMGTRGCC (9 Dehyd)
	S. roseosporus NRRL 15 998	CID	1037.11	2	0.0001	TTWACATVTLTVTVCSPTGTLCGSCSMGTRGCC (9 Dehyd)
	S. roseosporus NRRL 15 998	HCD	1037.11	3	0.0007	TTWACATVTLTVTVCSPTGTLCGSCSMGTRGCC (9 Dehyd)
	S. roseosporus NRRL 15 998	CID	1055.59	3	0.001	{TTWAC}+54ATVTLTVTVCSPTGTLCGSCSMGTRGCC (9 Dehyd)
	S. roseosporus NRRL 15 998	CID	1067.92	3	0.0001	TVTVCSPTGTLCGSCSMGTRGCC (5 Dehyd)
	S. roseosporus NRRL 15 998	HCD	1067.42	2	7.7 × 10^–7	TVTVCSPTGTLCGSCSMGTRGCC (5 Dehyd)
	S. roseosporus NRRL 15 998	CID	1123.99	2	4.6 × 10^–7	LTVTVCSPTGTLCGSCSMGTRGCC (5 Dehyd)
	S. roseosporus NRRL 15 998	CID	1166.47	2	0.0002	TLTVTVCSPTGTLCGSCSMGTRGCC (6 Dehyd)
	S. roseosporus NRRL 15 998	CID	1215.54	2	0.0003	VTLTVTVCSPTGTLCGSCSMGTRGCC (6 Dehyd)
	S. roseosporus NRRL 15 998	CID	1256.13	2	0.038	TVTLTVTVCSPTGTLCGSCSMGTRGCC (7 Dehyd)
	S. roseosporus NRRL 15 998	CID	1075.42	2	0.009	TVTVCS{PTGTLCGSCSMG}+16TRGCC (4 Dehyd)
	S. roseosporus NRRL 15 998	CID	1174.49	2	5.6 × 10^–8	TLTVTVCSPTGTLCGSCSMGTRGCC (5 Dehyd)
SapB	S. coelicolor A3(2)	CID	1013.99	1	0.02	TGSRASLLLCGDSSLSITTCN (4 Dehyd)
SapB	S. lividans TK24	HCD	934.96	2	0.016	SRASLLLCGDSSLSITTCN (4 Dehyd)
SapB	S. lividans TK24	HCD	1013.99	2	4.0 × 10^–5	TGSRASLLLCGDSSLSITTCN (4 Dehyd)
SAL-2242	S. albus J1074	HCD	1062.53	2	5.7 × 10^–5	GSQISLLICEYSSLSVTLCTP (4 Dehyd)
	S. albus J1074	HCD	1113.06	2	0.015	TGSQISLLICEYSSLSVTLCTP (5 Dehyd)
	S. albus J1074	HCD	1122.06	2	0.017	TGSQISLLICEYSSLSVTLCTP (4 Dehyd)

Number of dehydrations involved in lanthipeptide processing are shown in each case. Abbreviations: m/z, mass-to-charge ratio; z, precursor ion charge; dehydr, dehydration. Corresponding annotated spectra are shown in SI Figure S4.

Characterization of an Unknown Lanthipeptide Chemotype and Genotype from Streptomyces viridochromogenes DSM 40736 by RiPPquest

In addition to known lanthipeptides, RiPPquest yielded a PSM of an unknown lanthipeptide from LC-MS/MS data of an n-butanol extract of S. viridochromogenes DSM 40736, which was previously inspected manually[11](Figure 2A). Analysis of the corresponding gene cluster of the identified peptide revealed a lanthionine synthetase SSQG_07268, which lacks zinc ligand residues and thus indicates a class III lanthipeptide (Figure 2B, SI Table S2). We termed the new peptide informatipeptin, as it is the first natural product chemotype to be connected to its genotype in a microbial genome sequence by a genome mining algorithm. From FT-MS/MS analysis and sequence similarity to other class III lanthipeptides from S. chartreusis, S. erythraea, C. acidiphila, S. coelicolor, and S. griseus (SI Figures S5 and S6), we deduced that the Ser21/Ser24/Cys28-motif and the Ser30/Ser33/Cys37-motif are converted to either labionin or lanthionine. This leaves us with four structural candidates of two lanthionines, two labionins, or one lanthionine and one labionin in the peptide (Figure 2C). Based on the closest sequence similarity to avermipeptins from S. avermitilis and their characterized one lanthionine-one labionin modification,[26] we predict the same Lan/Lab ratio in informatipeptin. The fact that RiPPquest found all these peptides, including a new lanthipeptide in a single study illustrates the power of automated peptidogenomics for RiPP studies and that natural products can be discovered informatically.

Figure 2

Molecular network analysis revealed a plethora of compounds clustering with known lanthipeptides (SI Figure S7). The automated peptidogenomics pipeline revealed that many of these compounds are lanthipeptides with stepwise N-terminal leader processing and different dehydration numbers (Table 1). In this case of informatipeptin, up to six different N-terminal derivatives could be observed using mass spectrometry (SI Figure S8), which has been observed for other class III lanthipeptides such as the avermipeptins.[26] The putative cleavage site-promiscuous peptidase could be SSQG-07264. To date, only one class III lanthipeptide protease has been characterized,[27] which cleaves a similar leader peptide as the predicted informatipeptin leader peptide.

Conclusion

In this study, we introduced RiPPquest, the first genome mining algorithm for RiPP discovery. RiPPquest enables automated prediction of lanthipeptide gene clusters and connection to the corresponding chemotypes in a LC-MS/MS data set via a MS/MS database search tool approach. Cycloquest, a recent MS/MS database search tool for identifying cyclic ribosomal peptides requires well fragmented spectra and is limited to peptides with a small number of modifications.[28] In contrast, RiPPquest aims at discovering peptides with poor fragmentation and extensive number of modifications by characterization of the core peptides from genome sequences. Herein, RiPPquest capitalizes on the observation that the genes for RiPP precursor peptides usually appears in the vicinity of its specific biosynthetic enzymes in the genome. By reducing the search space to the RiPP gene clusters, one can perform a database search allowing for higher number of modifications. In addition to the identification of many known lanthipeptides in various Streptomyces strains (Table 1), RiPPquest discovered a new class III lanthipeptide informatipeptin from Streptomyces viridochromogenes DSM 40736. In contrast to the previous approaches where the compounds were analyzed after isolation,[11] RiPPquest enables analysis of MS/MS data collected on bacterial extracts simultaneously. This systematic approach enables discovery of putative differentially proteolyzed forms of lanthipeptides, missed by previous isolation-based approaches. While we focused on lanthipeptides throughout this study, the general pipeline can work for other types of RiPPs, since most RiPP classes have specific conserved biosynthetic gene clusters and go through specific modifications during their maturation. Several factors may contribute to the gap between the 49 putative lanthipeptide gene clusters discovered at the genotype level and only seven confirmed at the chemotype level. First, not all the candidate gene clusters found informatically are lanthipeptide gene clusters. Similar domains are found in other genes, and therefore, there is an overestimated search space. Perhaps tools such as BAGEL3 that is designed to find RiPPs at the genomic level can assist in improving these discoveries.[32] Second, gene expression for some of these clusters may be limited to specific environmental conditions. Third, sample preparation may be a determining factor in success of RiPP discovery. For example, among an n-butanol and a methanol extract data sets both collected on S. viridochromogenes DSM 40736, only the former contains spectra corresponding to informatipeptin. Our molecular network analysis revealed numerous distinct compounds related to known lanthipeptides. RiPPquest assigned these compounds to less abundant variants of the known lanthipeptides with alternative numbers of dehydrations and promiscuous N-terminal leader processing. This emphasizes the ability of computational mass spectrometry in discovering less abundant homologues that other methods fail to capture. Ultimately, the RiPPquest approach can greatly accelerate characterization of lanthipeptides, if further extended, of RiPPs leading to discovery of new peptide (bio)chemistry and bioactivity.

Methods

Cultivation and Extraction of Actinobacteria

We obtained 16 Streptomyces strains described in the Genome Data sets section (Supporting Information). Strains were grown on ISP2 agar plates (4 g yeast extracts, 10 g malt extract, 4 g d-glucose, 18 g agar, and 1000 mL water). Each agar plate was inoculated with each bacterial strain by 4 parallel streaks. The plates were incubated for 10 d at 28 °C. The agar was sliced into small pieces, covered with equal amount of Milli-Q water and n-butanol in a 50 mL centrifuge tube and shaken at 225 rpm for 12 h at 28 °C. The n-butanol layer was subsequently collected using transfer pipet and dried in vacuo.

MS/MS Analysis of Microbial Extracts

Collision-induced dissociation (CID) MS/MS data sets were collected with or without liquid chromatography (LC) separation in-line with mass spectrometry. For LC-MS, capillary columns were prepared by drawing a 360 μm O.D., 100 μm I.D. deactivated, fused silica tubing (Agilent) with a Model P-2000 laser puller (Sutter Instruments) (Heat, 330, 325, 320; Vel, 45; Del, 125) and were packed at 600 psi to a length of about 10 cm with C18 reverse-phase resin suspended in methanol. The column was equilibrated with 95% of solvent A (water, 0.1% AcOH) and loaded with 10 μL (10 ng/μL in 10% CH3CN) of bacterial butanol extract by flowing 95% of solvent A and 5% of solvent B (CH3CN, 0.1% AcOH) at 200 μL/min for 15 min. A gradient was established with a time-varying solvent mixture [(min, % of solvent A): (20,95), (30,60), (75,5)] and directly electrosprayed into the LTQ-FT MS inlet (source voltage, 1.8 kV; capillary temperature, 180 °C). The first scan was a high resolution broadband scan. The subsequent six scans were low resolution data-dependent on the first scan. In each data-dependent scan, the top intensity ions excluded the ones in exclusion list were selected to be fragmented by CID which generated hundreds of fragmentation spectra collected as individual data events. The resulting .RAW files were converted to .mzXML using the program ReAdW (http://tools.proteomecenter.org). Higher-energy collisional dissociation (HCD) data sets were acquired from samples prepared in 20% acetonitrile before injection. The constant flow capillary RPLC system used for peptide separations was similar to the previous report.[30] Briefly, the HPLC system consisted of a custom configuration of Agilent 1200 nanoflow pumps (Agilent Technologies), 2-position Valco valves (Valco Instruments Co., Houston, TX), and a PAL autosampler (Leap Technologies, Carrboro, NC), allowing for fully automated sample analysis across four separate HPLC columns (3-μm Jupiter C18 stationary phase, Phenomenex, Torrence, CA). Mobile phases consisted of 0.1% formic acid in water (A) and 0.1% formic acid acetonitrile (B). Flow rate through the capillary HPLC column was set as 300 nL/min. The HPLC system was equilibrated with 100% mobile phase A, and the following gradient was started 40 min after injection (5 μL sample loop): 0–2 min, 0–8% buffer B; 2–20 min, 8–12% buffer B; 20–75 min, 12–80% buffer B; 75–97 min, 80–95% buffer B. ESI using an etched fused-silica tip[31] was employed to interface the RPLC separation to a LTQ Orbitrap Velos mass spectrometer (Thermo Scientific, San Jose, CA). Precursor ion mass spectra (automatic gain control was set to 1 × 106) were collected for 400–2000 m/z range at a resolution of 60 K followed by data-dependent HCD MS/MS (resolution 7.5 K, normalized collision energy 45%, isolation window 2.5 Th, activation time 0.1 ms, AGC 5 × 104) of the ten most abundant ions. A dynamic exclusion time of 30 s was used to discriminate against previously analyzed ions.

Genome Mining of Lanthipeptide Gene Clusters

For each LANC-like domain in the microbial genome, a window of 10 kb centered at this domain is selected to form a database of putative core lanthipeptides for follow up MS/MS database search. Since lanthipetides usually appear in short ORFs, we further restrict our analysis to ORFs < 100 aa in the 6-frame translation of the genome. Because core lanthipeptides always appear at the C-terminus of an ORF, we only consider the peptide sequence of the C-terminal half of an ORF. This reduction the database size in RiPPquest searches is important since lanthipeptides are often poorly fragmented and identification of such poorly fragmented spectra in searches against large databases is problematic.

MS Analysis of Lanthipeptide Modifications

The most essential lanthipeptide modifications are dehydration of serine and threonine, and formation of the lanthionine and methyllanthionine bridges. Furthermore, a thiol elimination mechanism for lanthionine PTMs during mass spectrometry yields Cys and Dha at the position of Ser and Cys, respectively, in the core peptide.[11] Figure 1f,g shows all possible modified (mature) peptides for a hypothetical core lanthipeptide Thr-Phe-Cys-Arg-Ser. From a mass spectrometry standpoint, there are eight possible products by accumulation of PTMs, resulting in six possible scenarios for observed mass shifts in mass spectrometry (allowing Ser → Dha, Ser → Cys, Cys → Dha, and Thr → Dhb).

Scoring Peptide Spectrum Matches

All MS/MS database search tools score Peptide-Spectrum Matches (PSMs) with the goal to find out how well the experimental spectrum is explained by the theoretical spectrum formed by the fragment ions of the peptide (Figure 1i). We have chosen to score PSMs using an advanced scoring function used in de novo peptide sequencing PepNovo.[16]

Converting Scores to p-Values

To convert scores into p-values, RiPPquest uses a recently developed MS-DPR approach for evaluating p-values of PSMs.[22]

Spectral Networks

Spectral networks are a visualizion of spectra as familial groupings of corresponding peptides. Edges in a spectral network connect nodes corresponding to spectra that represent peptides differing from each other by a mutation or a modification. Such pairs of spectra connected by edges in the spectral network are revealed using the spectral alignment approach.[24]

32 in total

Review 1. Genomics-inspired discovery of natural products.

Authors: Jaclyn M Winter; Swantje Behnken; Christian Hertweck
Journal: Curr Opin Chem Biol Date: 2010-11-25 Impact factor: 8.822

2. PepNovo: de novo peptide sequencing via probabilistic network modeling.

Authors: Ari Frank; Pavel Pevzner
Journal: Anal Chem Date: 2005-02-15 Impact factor: 6.986

3. Identification of post-translational modifications by blind search of mass spectra.

Authors: Dekel Tsur; Stephen Tanner; Ebrahim Zandi; Vineet Bafna; Pavel A Pevzner
Journal: Nat Biotechnol Date: 2005-11-27 Impact factor: 54.908

4. MS/MS networking guided analysis of molecule and gene cluster families.

Authors: Don Duy Nguyen; Cheng-Hsuan Wu; Wilna J Moree; Anne Lamsa; Marnix H Medema; Xiling Zhao; Ronnie G Gavilan; Marystella Aparicio; Librada Atencio; Chanaye Jackson; Javier Ballesteros; Joel Sanchez; Jeramie D Watrous; Vanessa V Phelan; Corine van de Wiel; Roland D Kersten; Samina Mehnaz; René De Mot; Elizabeth A Shank; Pep Charusanti; Harish Nagarajan; Brendan M Duggan; Bradley S Moore; Nuno Bandeira; Bernhard Ø Palsson; Kit Pogliano; Marcelino Gutiérrez; Pieter C Dorrestein
Journal: Proc Natl Acad Sci U S A Date: 2013-06-24 Impact factor: 11.205

5. Labyrinthopeptins: a new class of carbacyclic lantibiotics.

Authors: Kathrin Meindl; Timo Schmiederer; Kathrin Schneider; Andreas Reicke; Diane Butz; Simone Keller; Hans Gühring; László Vértesy; Joachim Wink; Holger Hoffmann; Mark Brönstrup; George M Sheldrick; Roderich D Süssmuth
Journal: Angew Chem Int Ed Engl Date: 2010-02-01 Impact factor: 15.336

6. AmfS, an extracellular peptidic morphogen in Streptomyces griseus.

Authors: Kenji Ueda; Ken-Ichi Oinuma; Go Ikeda; Kuniaki Hosono; Yasuo Ohnishi; Sueharu Horinouchi; Teruhiko Beppu
Journal: J Bacteriol Date: 2002-03 Impact factor: 3.490

7. Characterization of new class III lantibiotics--erythreapeptin, avermipeptin and griseopeptin from Saccharopolyspora erythraea, Streptomyces avermitilis and Streptomyces griseus demonstrates stepwise N-terminal leader processing.

Authors: Ginka H Völler; Joanna M Krawczyk; Alexander Pesic; Bartlomiej Krawczyk; Jonny Nachtigall; Roderich D Süssmuth
Journal: Chembiochem Date: 2012-05-03 Impact factor: 3.164

8. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases.

Authors: Sangtae Kim; Nitin Gupta; Pavel A Pevzner
Journal: J Proteome Res Date: 2008-07-03 Impact factor: 4.466

9. Missing genes in the annotation of prokaryotic genomes.

Authors: Andrew S Warren; Jeremy Archuleta; Wu-Chun Feng; João Carlos Setubal
Journal: BMC Bioinformatics Date: 2010-03-15 Impact factor: 3.169

10. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences.

Authors: Marnix H Medema; Kai Blin; Peter Cimermancic; Victor de Jager; Piotr Zakrzewski; Michael A Fischbach; Tilmann Weber; Eriko Takano; Rainer Breitling
Journal: Nucleic Acids Res Date: 2011-06-14 Impact factor: 16.971

56 in total