MANTI: Automated Annotation of Protein N-Termini for Rapid Interpretation of N-Terminome Data Sets.

Fatih Demir^1,2, Jayachandran N Kizhakkedathu^3, Markus M Rinschen^1,4, Pitter F Huesgen^2,5.

Abstract

Site-specific proteolytic processing is an important, irreversible post-translational protein modification with implications in many diseases. Enrichment of protein N-terminal peptides followed by mass spectrometry-based identification and quantification enables proteome-wide characterization of proteolytic processes and protease substrates but is challenged by the lack of specific annotation tools. A common problem is, for example, ambiguous matches of identified peptides to multiple protein entries in the databases used for identification. We developed MaxQuant Advanced N-termini Interpreter (MANTI), a standalone Perl software with an optional graphical user interface that validates and annotates N-terminal peptides identified by database searches with the popular MaxQuant software package by integrating information from multiple data sources. MANTI utilizes diverse annotation information in a multistep decision process to assign a conservative preferred protein entry for each N-terminal peptide, which enables automated classification according to the likely origin, and determines significant changes in N-terminal peptide abundance. Auxiliary R scripts included in the software package summarize and visualize key aspects of the data. To showcase the utility of MANTI, we generated two large-scale TAILS N-terminome data sets from two different animal models of chemically and genetically induced kidney disease, puromycin adenonucleoside-treated rats (PAN), and heterozygous Wilms Tumor protein 1 mice (WT1). MANTI enabled rapid validation and autonomous annotation of >10 000 identified terminal peptides, revealing novel proteolytic proteoforms in 905 and 644 proteins, respectively. Quantitative analysis indicated that proteolytic activities with similar sequence specificity are involved in the pathogenesis of kidney injury and proteinuria in both models, whereas coagulation processes and complement activation were specifically induced after chemical injury.

Year: 2021 PMID: 33729755 PMCID: PMC8027985 DOI: 10.1021/acs.analchem.1c00310

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Introduction

Proteolysis is a regulatory protein modification that results in the generation of distinct proteoforms with altered or new functions.[1,2] Misregulated proteolytic processing is implicated in many diseases, including amyloid β-precursor protein (ABPP) processing in Alzheimer’s disease,[3] cancer microenvironment remodeling,[4] and podocyte damage in kidney disease.[5] Characterization of protein termini reveals affected substrate proteins and allows the determination of the underlying protease–substrate relationships, which is a critical requirement to qualify specific proteases as drug targets or antitargets in a disease.[6] The standard “bottom-up” proteomics uses proteolytic digestion of proteins into the peptides for analysis by tandem mass spectrometry. The acquired peptide fragmentation spectra are then identified by computational matching to the predicted[7] or previously collected[8] peptide fragmentation spectra. Protein termini, including neotermini generated by endogenous proteolytic enzymes before proteome analysis, are only rarely identified in these standard procedures since they are only a minor fraction in the proteome digest, often of low abundance, and typically not considered during the database matching due to their partial mismatch with the digestion enzyme specificity.[2,9] To overcome these issues, a large variety of methods for the enrichment of protein termini have been developed (for reviews, see refs (1, 10, 11)). Most regularly used are negative selection approaches, where free primary amines of native protein N-termini, including the protease-generated neo-N-termini, are chemically modified before enzymatic proteome digestion (Figure S1). 
All peptides generated by the enzymatic proteome digest carry a free primary amine, which is subsequently used for selective depletion, for example, by capture with an aldehyde-functionalized polymer in Terminal Amine Isotopic Labeling of Substrates (TAILS)[12] or for chemical modification and subsequent selection by altered chromatographic mobility in Charge-based/Combined FRActional DIagonal Chromatography (ChaFRADIC[13]/COFRADIC[14]). Recent developments have further simplified and increased the sensitivity of the N-termini enrichment procedures,[15,16] enabled automation for high-throughput analysis,[17] and improved the identification of specific subsets of the N-terminal peptides.[18−20] Different protein N-termini can arise from the translation of predicted and alternative initiation sites or spliced mRNA, known or predicted proteolytic maturation such as signal/transit peptide processing or propeptide cleavage, as well as unpredicted proteolytic activity.[1,2] Correct positional annotation is therefore the basis for classification and, together with quantitative evaluation across experimental conditions, required for biological interpretation of N-terminome data and identification of protease substrates. However, this classification is challenging, as many identified N-terminal peptide sequences match multiple, often distinct, protein sequences in the database. Several web-based software tools visualize peptide position in relation to protein sequences and annotated features: Protter[21] maps peptides to topological and domain features retrieved from UniProt,[22] while QARIP[23] and Proteator[24] additionally visualize peptide abundance data and thereby reveal differentially affected domains, as for example, membrane protein shedding, in quantitative proteome data sets. 
All of these tools can assist in mapping identified protein termini to sequence features but generate one graphical depiction per protein that requires labor-intensive manual assessment and interpretation. Automated pipelines, on the other hand, are rare. CLIPPER[25] is the only previous tool specifically designed for N-terminal peptide annotation that leverages “razor” protein assignments from protein-level data from the same experiment according to well-established criteria.[26] However, CLIPPER is tightly integrated within the Trans-Proteomic Pipeline (TPP)[27] and not compatible with other pipelines. In summary, a one-stop-shop solution for the annotation of data generated by the plethora of different N-termini enrichment workflows does not exist yet. Here we present MANTI, a new termini-centric software tool that facilitates validation, annotation, and quantitative evaluation of N-terminome data sets analyzed with MaxQuant.[28] MaxQuant is currently one of the most widely used software packages due to its ease of use and superior accuracy in peptide quantification, particularly with stable isotope labels. The key advantage of MANTI is the integration of an array of external annotation sources into an automated, hierarchical N-termini validation and annotation workflow that generates curated N-termini information (Figure 1). MANTI thereby provides a convenient, multidimensional overview of the data set, enables rapid identification of significant changes in substrate cleavage, and links these changes with annotation information that points to their potential physiological relevance. MANTI has already been successfully used for a multitude of different samples, including N-terminome data sets from both animals[5,29] and plants.[16,20,30,31]

Figure 1

MaxQuant source files, MANTI output, and external sources, which can be optionally integrated (blue). MANTI can integrate external sources for validation and interpretation of the N-termini data: (1) annotations for the validated N-terminal peptides can be retrieved from UniProt; (2) statistical validation with the limma package for R; (3) targeting signal/transit peptide predictions from LOCALIZER and/or TargetP2.0; (4) associated protein identifications from proteome data, e.g., aliquots before enrichment; and (5) TopFINDer annotation. The main output files are the MANTI Masterfile and summary; the validation.txt file allows monitoring of the MANTI filtering steps during the N-termini validation. N-terminal peptides are collapsed into unique N-termini by their first eight amino acids and grouped by protein ID in the Unique_N-termini file. Integrated external annotation sources are all copied into the “external-sources” folder for reference.

We further demonstrate the utility of MANTI for identification, annotation, and quantitative evaluation of proteolytic cleavages in two large-scale data sets comprising >10 000 peptide identifications from two independent animal models of chronic kidney disease. With the help of the annotation workflow, we observed that both chemically induced kidney injury in puromycin adenonucleoside (PAN)-treated rats[32,33] and genetically caused kidney disease in heterozygous Wilms Tumor protein 1 (WT1)-deficient mice[34] are associated with the altered abundance of specific subsets of proteolysis-derived protein N-termini.

Experimental Section

Animal Material

Glomeruli from PAN- or control-treated rats and kidney cortices of wt1+/– or control littermate mice were obtained from surplus material from our previous study.[5] Protocols for animal experiments were approved by the applicable local regulatory bodies, the Landesamt für Natur, Umwelt und Verbraucherschutz Nordrhein-Westfalen, and Regierungspräsidium Freiburg (G15-134). Four biological replicates were collected for the PAN model[5] and three for the WT1 model.[34]

N-Terminal Peptide Enrichment by TAILS

TAILS N-termini enrichment was performed as described in detail elsewhere.[5] In addition, enriched N-terminal peptides were fractionated at basic pH with steps of 15, 20, 25, 30, and 50% acetonitrile (ACN) in 10 mM ammonium hydroxide, followed by an acidic fraction with 50% ACN in 0.1% TFA using HR-X reverse-phase cartridges (Macherey-Nagel, Düren, Germany), subsequently purified by C18 StageTips[35] and analyzed by nano-LC-MS/MS as described.[5] The database queries were performed with MaxQuant v1.6.0.16 against the UniProt mouse (UniProt release 2017_10) and rat (2017_12) proteomes including isoform entries at a PSM FDR of 0.01 using standard settings for Bruker impactII Q-TOF instruments. The enzyme specificity was set to semispecific (free N-terminus) ArgC, as dimethylation prevents trypsin cleavage at Lys residues. Cys carbamidomethylation (+57.021464) was set as a fixed modification and MaxQuant labels were configured as light (+28.031300) and heavy dimethylation (+34.063117) on Lys residues and peptide N-termini, whereas Met oxidation (+15.994914), peptide N-terminal acetylation (+42.010565), and N-terminal pyro-Glu formation from Glu (−18.010565) or Gln (−17.026549) were set as variable modifications.
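
The label masses listed above determine the expected mass spacing between the light and heavy forms of a peptide. The following is a minimal Python sketch, assuming complete labeling so that every Lys side chain and a free peptide N-terminus each carry exactly one dimethyl label. The function names are illustrative and are not part of MaxQuant or MANTI.

```python
# Monoisotopic mass shifts (Da) of the dimethyl labels, as configured in the search above.
LIGHT_DIMETHYL = 28.031300
HEAVY_DIMETHYL = 34.063117


def label_sites(sequence: str, nterm_free: bool = True) -> int:
    """Count dimethylation sites: each Lys plus the N-terminus if it was a free amine.

    Assumption: labeling is complete. A blocked N-terminus (for example
    acetylated or pyro-Glu, nterm_free=False) cannot carry a label.
    """
    return sequence.count("K") + (1 if nterm_free else 0)


def heavy_light_shift(sequence: str, nterm_free: bool = True) -> float:
    """Expected mass difference (Da) between the heavy and light peptide forms."""
    return label_sites(sequence, nterm_free) * (HEAVY_DIMETHYL - LIGHT_DIMETHYL)
```

For a peptide such as AKAMGASQVVVIDLSASR with a dimethylated N-terminus, the sketch counts two label sites (one Lys plus the N-terminus), which corresponds to a heavy/light spacing of about 12.06 Da. A peptide with a blocked N-terminus and no Lys has no label site at all, so no isotope ratio can be expected for it.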

MANTI Dependencies and Source Code

The only dependency for MANTI is a Perl interpreter, e.g., the widely known Perl distributions ActiveState Perl (https://www.activestate.com/products/perl) or Strawberry Perl (http://strawberryperl.com/) that are available on a variety of operating systems; no additional modules or libraries are necessary. MANTI is available from the project site at sourceforge.net as the source code or bundled with executable files for the Win10 and Linux platforms (https://sourceforge.net/projects/manti/); additionally, R packages are required for the production of the graphs that can be obtained via CRAN: GGally, UpSetR, ggcorrplot, ggpubr, ggrepel, stringr, and svglite. Currently, N-terminal modification by reductive dimethylation with a single formaldehyde isotope (simplex), two (duplex) or three (triplex) stable isotope dimethyl labels, and SILAC labeling and TMT/iTRAQ multiplex labeling are supported.

Statistical Analysis

The auxiliary polvops.R script integrated into MANTI performs a limma-moderated[36] t-test.

Data Visualization

MANTI contains two R[37] scripts that allow the optional output of a variety of data plots with the ggplot2 library,[38] including volcano plots for statistical inference and UpSet plots[39] for data distribution among experiments. First, polvops.R incorporates statistical analysis and visualizes the results as sophisticated volcano plots, with the shape of the data points indicating the modification (circle, dimethylated; triangle, acetylated) and color hue reflecting the abundance change and classification (orange and red, significantly accumulating canonical and noncanonical termini; cyan and blue, depleted canonical and noncanonical termini) of the identified N-terminal peptides at the chosen cutoff values (here: LIMMA-moderated t-test p-value <0.05 and at least 50% change in abundance, log2 ±0.58). Those annotated volcano plots depicting the statistical significance and N-terminal categorization are generated for several, selected p-value and fold-change cutoff values. The second R script accompanying MANTI is called bufipretty.R, which generates a variety of data plots including a one-page PDF summary of the parsed N-terminome data.
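
The cutoff logic and color scheme described for the volcano plots can be sketched as follows. This is a hedged Python illustration and not the code of polvops.R, which is written in R. The function names are hypothetical, and the "grey" color for points that do not pass the cutoffs is an assumption, since the text does not name it.

```python
# Color scheme as described in the text: (direction, class) -> color.
COLORS = {
    ("accumulating", "canonical"): "orange",
    ("accumulating", "noncanonical"): "red",
    ("depleted", "canonical"): "cyan",
    ("depleted", "noncanonical"): "blue",
}


def classify_change(log2_ratio, p_value, canonical, p_cutoff=0.05, log2_cutoff=0.58):
    """Return (direction, class) for a significant change, or None otherwise.

    Defaults follow the cutoffs quoted in the text: p < 0.05 and an
    abundance change of at least 50% (log2 ratio beyond +/-0.58).
    """
    if p_value >= p_cutoff or abs(log2_ratio) <= log2_cutoff:
        return None
    direction = "accumulating" if log2_ratio > 0 else "depleted"
    kind = "canonical" if canonical else "noncanonical"
    return direction, kind


def point_color(log2_ratio, p_value, canonical):
    """Map one N-terminal peptide to its plot color ("grey" is an assumed default)."""
    return COLORS.get(classify_change(log2_ratio, p_value, canonical), "grey")
```

Keeping the cutoffs as parameters mirrors the statement that plots are generated for several selected p-value and fold-change cutoff values.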

Data Availability

Raw MS data have been deposited with the PRIDE repository[40] with the accession codes PXD023299 for the PAN data set and PXD023292 for the WT1het data set.

Results and Discussion

MANTI is a Perl[41] script that parses and validates peptide identifications from semispecific MaxQuant searches and integrates annotation from various sources to derive a consensus N-termini annotation to assist the N-terminome data interpretation. MANTI can be executed as a command-line Perl script or via the graphical user interface Yoğurtlu MANTI (Figure S2). MANTI follows a peptide-centric approach and first retrieves experimental information on the digestion protease, experiment names, fraction numbers, variable modifications considered, and the applied labeling strategies from the summary.txt file of the MaxQuant search result output (Figure 2b). Subsequently, MANTI parses the peptide sequences, their modification (from the modificationSpecificPeptides.txt), and quantitative information such as unprocessed MS1 intensities and associated MaxQuant ratios or TMT/iTRAQ reporter ion intensities and retrieves associated information such as peptide positions, amino acid information, razor protein ID (from the peptides.txt), and protein IDs (from proteinGroups.txt). In the “validation” step, MANTI eliminates peptides marked as contaminants or decoy database entries. N-terminal modifications considered during database queries typically include the amine modification introduced in the initial labeling step, such as dimethylation or TMT/iTRAQ, endogenous modifications such as peptide N-terminal acetylation, and pyro-Glutamate formation from N-terminal Gln/Glu, which can occur either as an endogenous modification or an artifact during the enrichment workflow. MANTI next removes nonsense identifications such as pyro-Glu-modified peptides lacking an N-terminal Glu or Gln, peptides with dimethyl label at the N-terminal proline residue (not possible as Pro is a secondary amine), and peptide identifications with mismatched last amino acids according to the protease specificity used for the semispecific MaxQuant database search.
In addition to these validation filters, stable isotope-based quantification is tested for plausibility; for example, quantifications reported for acetylated peptides that do not contain any lysine in their sequence are removed. All validation steps performed by MANTI are reported and summarized in a validation.txt output file.
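
The validation filters described above can be illustrated with a short sketch. It is written in Python for readability and is not MANTI's Perl code; the `Peptide` fields, the modification labels, and the function names are hypothetical, and the order of the checks is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peptide:
    sequence: str              # amino acid sequence, N- to C-terminus
    nterm_mod: str             # hypothetical labels: "dimethyl", "acetyl", "pyro-glu"
    is_decoy: bool = False     # hit against the decoy database
    is_contaminant: bool = False


def validation_failure(pep: Peptide, cleavage_residues: str = "R") -> Optional[str]:
    """Return the reason a peptide is rejected, or None if it passes.

    cleavage_residues lists the residues after which the digestion protease
    cuts ("R" for ArgC-like specificity, as in the searches described here).
    """
    if not pep.sequence:
        return "empty sequence"
    if pep.is_decoy:
        return "decoy"
    if pep.is_contaminant:
        return "contaminant"
    if pep.nterm_mod == "pyro-glu" and pep.sequence[0] not in "EQ":
        return "pyro-Glu without N-terminal Glu/Gln"
    if pep.nterm_mod == "dimethyl" and pep.sequence[0] == "P":
        return "dimethyl label on N-terminal Pro"
    if pep.sequence[-1] not in cleavage_residues:
        return "last residue does not match protease specificity"
    return None


def quantification_plausible(pep: Peptide) -> bool:
    """An acetylated peptide without Lys carries no isotope label, so any ratio is implausible."""
    return not (pep.nterm_mod == "acetyl" and "K" not in pep.sequence)
```

Returning the rejection reason rather than a plain boolean makes it straightforward to write a per-filter summary of the kind reported in validation.txt.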

Figure 2

N-termini categorization and the MANTI workflow. (a) Protein N-termini can be categorized into canonical N-termini derived from protein synthesis with or without co-translational N-terminal methionine excision (NME), which removes the initiator-methionine, N-terminal signal and transit peptide cleavages for subcellular targeting, cleavage of known inhibitory propeptides converting zymogens into active proteases, and noncanonical N-termini generated by site-specific proteolytic processing. (b) Workflow of MANTI processing. MaxQuant output files (white) are parsed (orange) and validated (purple), a preferred protein ID is assigned (pink), and the data are processed into quantitative MANTI analysis results (green). (c) Binary decision tree for the assignment of the preferred protein ID by MANTI during the positional annotation step.

In the next step, validated N-terminal peptides are annotated with position-specific information from several web-based resources: Known N-terminal sequence features such as pro-, signal-, and transit peptide cleavage positions are retrieved online from UniProt,[22] along with other sequence features such as transmembrane domains, motifs, and active sites (Figure 1). Termini-centric information such as known protease cleavage sites, alternative splice sites, and alternative translation start sites can be imported from TopFIND,[42] and improved predictions of signal or transit peptides as well as chloroplast targeting sequences for plant data sets can be retrieved from LOCALIZER[43] or TargetP2.0.[44] Annotations and associated prediction scores are presented in the MANTI result, but also saved locally for future reference. Sequences of N-terminal peptides, like any other single peptides, often match several protein entries in a proteome database, which may include closely related proteins with high sequence similarity, annotated isoforms, and redundant or outdated protein sequences not yet discarded by curation efforts.
This ambiguity is a critical limitation that impedes further biological interpretation. To limit the number of hypotheses that need to be considered during the evaluation, MANTI utilizes all available sequence and annotation information in a binary decision tree to select a “preferred” UniProt protein accession number among all of the protein entries matching the identified peptide sequences (Figure 2c). If matching proteome data is available, the preferred razor protein determined by MaxQuant is retained. If not, or multiple protein entries remain, the number of N-terminal peptides matching to each protein ID drives the decision process. Adhering to Occam’s razor principle,[45] ambiguous peptides are counted for the protein ID with the highest number of peptides, and the protein ID with the highest number of N-terminal peptides in the data set is set as the preferred protein. If multiple protein entries remain, the existing annotations are utilized to determine conservative preferred entries: First, high-quality protein models with UniProt protein IDs marked as “reviewed” (manually curated) are preferred, followed by the well-annotated UniProt entries starting with O/P/Q, then protein IDs with cleavage at position 1/2 or an annotated canonical cleavage are preferred. Finally, if no preferred protein ID could be determined up to this point, the UniProt annotation score and, as a last resort, the alphabetically first protein ID are used to select the preferred protein. MANTI thus places preference on high-quality protein models previously proven to exist, and on protein models indicating an origin by known synthesis and conserved processing events before postulating proteolytic processing events.
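
The tie-breaking hierarchy described above can be expressed compactly as a lexicographic sort key. The sketch below is a Python illustration of that idea under stated assumptions, not MANTI's implementation: the `Candidate` fields and the accessions in the usage example are hypothetical, and a higher annotation score is assumed to be better.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    accession: str                   # UniProt accession of a matching entry
    n_term_peptides: int             # N-terminal peptides in the data set matching this entry
    reviewed: bool                   # manually curated UniProt entry
    canonical_position: bool         # peptide starts at position 1/2 or an annotated processing site
    annotation_score: int            # UniProt annotation score (assumed: higher is better)
    is_proteome_razor: bool = False  # razor protein from matching proteome data, if any


def preferred_protein(candidates: List[Candidate]) -> str:
    """Pick one accession by applying the criteria in order, each breaking ties of the previous."""
    if not candidates:
        raise ValueError("at least one candidate entry is required")
    # Razor proteins from matching proteome data take precedence when available.
    pool = [c for c in candidates if c.is_proteome_razor] or candidates

    def rank(c: Candidate):
        return (
            -c.n_term_peptides,           # most N-terminal peptides first
            not c.reviewed,               # reviewed entries first
            c.accession[0] not in "OPQ",  # accessions starting with O/P/Q first
            not c.canonical_position,     # canonical positions first
            -c.annotation_score,          # highest annotation score first
            c.accession,                  # last resort: alphabetical order
        )

    return min(pool, key=rank).accession
```

For example, given two hypothetical entries with three matching N-terminal peptides each, one unreviewed (`A0A0G2K000`) and one reviewed (`P12345`), the sketch returns `P12345`; a reviewed entry with only one matching peptide would lose at the first criterion.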
For data sets with multiple experimental conditions, MANTI integrates quantitative information either from label-free quantification or relative abundance ratios determined from stable isotope labeled experiments, filters incorrect quantifications in stable isotope labeling experiments (e.g., N-terminally acetylated peptides without lysine residues resulting in aberrantly high ratios in the light channel), and tests for significant changes between the different conditions using a LIMMA-moderated t-test.[36] Finally, MANTI creates a summary of the validated peptides and the main output file listing all N-terminal peptide sequences, their modifications, and associated quantitative and annotation information in the main directory. MANTI further exports several subfolders for detailed investigations of the N-terminal peptides and their properties, including frequencies of N-terminal amino acids and modifications in different subcellular compartments (Figure and Table S1). Additionally, MANTI can automatically generate a variety of data plots using two embedded R scripts to summarize and visualize the data.

Annotation of N-Terminal Peptides and Identification of Proteolytic Cleavages in Kidney Disease Models

Proteases are emerging as key players in kidney disease.[46] We therefore chose to investigate altered processing in two different models of kidney disease, rats treated with PAN as a chemical kidney injury model[5] and heterozygous WT1 knockout mice (WT1het) as a genetic kidney injury model,[34] to determine common and distinct changes in proteolytic processing patterns associated with podocyte injury (Figure 3a). We extracted proteins under denaturing conditions in the presence of protease inhibitors, differentially labeled primary amines in proteins from diseased animals and corresponding controls using different stable isotopologues of formaldehyde, and enriched N-terminal peptides by negative selection using the well-established TAILS protocol.[9,12] Subsequent MS/MS analysis and database searches with MaxQuant identified 6764 peptides in 4 biological replicates of PAN-treated rats and 3986 peptides in the WT1 mice, at an FDR of 1%. Nevertheless, the MANTI workflow flagged 8.8% of the parsed peptides in the PAN data set (Figure 3b) and 10.6% of the peptides in the WT1het data set (Figure 3d) as hits against the decoy database, as peptides derived from common contaminants as marked by MaxQuant, or as invalid identifications, such as peptides identified with a pyro-Glu modification although they lack an N-terminal glutamic acid or glutamine residue. Thus, the final data set comprised 6170 bona fide N-terminal peptides from 1813 proteins in the PAN-treated rats and 3565 termini from 1558 proteins in the WT1 heterozygous mice (Figure S3). MANTI then further aggregated the peptides mapping to a unique position within a protein model, including the N-terminal peptides that have the same sequence but differ in their modification (e.g., acetylated and dimethylated peptides) or N-terminal peptides featuring a missed cleavage site or derived from different digestion enzymes. In both data sets, >60% of the sequences defined a unique position (Figure 3b,d).
Almost 69 and 63% of the peptides in the PAN and WT1 data sets, respectively, were quantified based on the dimethyl label. Lastly, more than 30% of the peptides in each data set feature a remapped preferred protein ID in comparison to the MaxQuant-assigned razor protein (which was typically the alphabetically first of multiple assigned protein IDs), allowing improved categorization (Figure 3c,e). Overall, 1839 N-terminal peptides from 1238 proteins (PAN) and 1605 N-terminal peptides from 1130 proteins (WT1het) matched position 1 or 2 of the preferred protein entries or lay within 5 amino acids of known or predicted signal peptide, transit peptide, and propeptide processing sites; we collectively term these “canonical” protein termini. As expected,[47] the majority of cytosolic protein termini starting at P1 or at P2 after the removal of the initiating Met were N-terminally acetylated, and only a smaller proportion was modified by a dimethyl label, indicating a free amino group in vivo (Figure 3c,e).
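
The positional rule for “canonical” termini stated above can be written as a small function. This is a hedged Python sketch with a hypothetical function name; it assumes that each processing site is given as the position of the first residue of the processed protein and that “within 5 amino acids” is inclusive in both directions.

```python
def classify_terminus(start_position, processing_sites=(), tolerance=5):
    """Classify an N-terminus by its start position in the preferred protein entry.

    start_position:   1-based position of the first peptide residue.
    processing_sites: positions of known or predicted signal peptide, transit
                      peptide, and propeptide processing sites (assumed to be
                      the first residue after the cleavage).
    tolerance:        allowed distance to a processing site, in residues.
    """
    if start_position in (1, 2):
        return "canonical"  # protein start with or without initiator Met removal
    if any(abs(start_position - site) <= tolerance for site in processing_sites):
        return "canonical"  # at or near an annotated maturation site
    return "noncanonical"
```

Under these assumptions, a terminus starting at position 23 next to an annotated site at position 25 counts as canonical, whereas a terminus at position 120 with no nearby annotated site is noncanonical.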

Figure 3

MANTI analysis of different kidney injury models. (a) Two models of chronic kidney injury, chemical treatment of rats with puromycin and genetic deficiency in WT1 protein in mice, were analyzed for altered proteolytic processing in comparison to the corresponding control. (b, c) MANTI evaluation of the PAN data set. (b) MANTI processing summary displaying the amount of MANTI-validated, unique, and quantified N-termini. Additionally, the number of reannotated preferred protein IDs is given. (c) N-termini categorization according to integrated annotation sources (UniProt, LOCALIZER/TargetP): canonical N-termini comprise mainly acetylated peptides, the mature protein N-terminus at position 1/2 or cleavages at signal (SP)/transit peptides (TP) and propeptides (PP). The remaining cleavages are mainly dimethylated and feature no expected cleavage, thus these cleavages are categorized as noncanonical (NON). (d, e) Respective graphs for the WT1 data set display a similar distribution of N-termini as the PAN data set with the majority of the identified N-termini being dimethylated and noncanonical.

Protein N-termini derived from known proteolytic maturation steps (signal, transit, and propeptide processing) were almost exclusively unmodified in vivo. In contrast, 3383 and 1258 N-terminal peptides matched to positions within 905 and 644 proteins in the PAN and WT1het data sets, respectively (Figures 3 and S3). The vast majority of these, termed noncanonical termini in the following text, were dimethylated and thus presumably derived from proteolytic processing, whereas a smaller fraction of N-terminal acetylated peptides may arise from unrecognized alternative translation initiation or alternative splicing.[48] Notably, we observed a higher proportion of noncanonical protein termini in the data set from PAN-treated rats (Figure 3c) compared to that from WT1 mice (Figure 3e).
According to the N-degron rule, protein termini and their modification have a strong impact on protein stability.[49] We therefore implemented an integrated auxiliary script (bufipretty.R) in MANTI to visualize the frequency of different amino acid residues as heatmaps, both collectively and separated by subcellular protein localization as annotated in the UniProt[40] database (Figures S4 and S5). Acetylated N-terminal peptides were mostly found in cytosolic and nucleus-localized proteins (Figures S4a and S5a) and frequently had Met, Ala, and Ser as first amino acids, reflecting previous observations for amino acid occurrence at P1 and P2.[50,51] In contrast, dimethylated N-termini feature a broader distribution of first amino acids (Figure S4c,d) with a bias toward Ala and Ser for the noncanonical cleavages (Figures S4d and S5d), followed by Thr, Gly, and Asp. In general, dimethylated, noncanonical cleavages more frequently feature first amino acid residues classified as destabilizing.[49]

Distinct Proteolytic Processes Are Associated with Chronic Kidney Injury in Both Disease Models

To identify proteolytic cleavages significantly altered in injured kidneys, MaxQuant peptide abundance ratios were parsed by MANTI and significant changes (LIMMA-moderated t-test p < 0.05 and at least 50% change in abundance) visualized by sophisticated volcano plots that indicate N-terminal peptide modification and classification (Figure 4a,b). In the more acute, chemical PAN kidney disease model, 535 (8.7%) of the N-terminal peptides showed significant changes (Figure 4a), whereas only 214 (6.0%) were found to be significantly altered in the genetic WT1 mouse model (Figure 4b). To gain insights into underlying proteolytic activities, regulated, presumably protease-generated neo-N-terminal peptides (dimethylated noncanonical peptides) were aligned and the sequence surrounding the cleavage sites plotted using the iceLogo server[52] (Figure 4c–f). Interestingly, similar iceLogos were obtained in both disease models. Depleted N-terminal peptides had an over-representation of Arg at P1 preceding the cleavage site (Figure 4c,e). The WT1 data set also indicated a higher prevalence of Trp and Asn at this position (Figure 4e), although this may in part derive from the low number of reduced cleavage events observed in heterozygous WT1 mice (27 compared to 253 in PAN-treated rats; Figure 4c). The significantly upregulated N-termini show a strong over-representation of Asn in both disease models (Figure 4d,f). In the PAN model, the cleavage logo additionally features Glu at P1 (Figure 4d). Together, this suggests that common proteolytic activities are regulated in both chronic kidney injury models. Arg-specific proteolytic cleavages appear less prevalent, as also observed in our previous in vitro data set using PAN-treated podocyte cell cultures,[5] whereas cleavage after Asn is induced.
It is tempting to speculate that the accumulating N-terminal peptides in PAN-treated rats and WT1het mice might be derived from asparaginyl endopeptidases (AEPs/legumains), cysteine proteases that exhibit a clear preference for cleavage after an Asn residue at an acidic pH.[53] Legumain is localized in lysosomes and the endocytic pathway and has previously been implicated in extracellular matrix remodeling of renal proximal tubular cells.[54] In contrast, the Arg-specific cleavage motif derived from noncanonical N-termini with reduced abundance is often (although not exclusively) observed for trypsin-like extracellular serine proteases.[55] In PAN-treated rats, Leu is additionally over-represented at the positions P2/P3 as well as Phe at P2, which is consistent with the specificity of kallikreins.[56] Interestingly, a group of gene ontology[57] (GO) terms are regulated differentially in the two animal models as evaluated by Enrichr analysis:[58] hexose biosynthetic process, glucose metabolic process, and gluconeogenesis are strongly enriched in the significantly accumulating N-termini for WT1het mice (Figure 4f and Table S7) and in the decreased N-termini for the PAN-treated rats (Figure 4c and Table S6). Sorbitol dehydrogenase (Sord) was one example where proteolytic N-termini were differently regulated: rat Sord (P27867) N-termini starting at position 120 (CATPPDDGNLCR) and 192 (AKAMGASQVVVIDLSASR) were significantly depleted in the PAN treatment, whereas for the mouse Sord (Q64442), one proteolytic N-terminus starting at position 41 (SVGICGSDVHYWEHGR) significantly accumulated in the WT1het mice. Notably, in both models, the canonical N-terminus did not show a strong change in abundance, indicating that Sord protein turnover may be differentially regulated in both models. In both models, proteolysis-derived neo-N-termini of proteins involved in blood coagulation accumulated.
Specifically, accumulating neo-N-termini in the PAN model belonged to proteins associated with regulated exocytosis and protein metabolism (Table S5), whereas depleted neo-N-termini in the WT1het mice mainly belonged to fatty acid metabolism proteins (Table S8).

Figure 4

N-termini alterations in the PAN and WT1 kidney disease models. (a, b) Volcano plots generated with the polvops.R script display significantly accumulated (log2 > 0.58, LIMMA-moderated t-test p < 0.05) and depleted (log2 < −0.58, LIMMA-moderated t-test p < 0.05) canonical (orange/cyan) and noncanonical (red/blue) N-terminal peptides. N-terminal modifications are presented by the data point shapes (circle, dimethylated; triangle, acetylated). (c–f) iceLogos visualize the amino acid frequency of regulated proteolytic cleavage events deduced from putative proteolysis-generated N-terminal peptides (significantly regulated, dimethylated noncanonical peptides) in nonredundant N-terminal cleavage windows of the corresponding N-termini for the PAN (c, d) and WT1 (e, f) data sets, including the most significantly enriched gene ontology (GO) biological process (BP) terms as determined by Enrichr. The number of nonredundant, N-terminal cleavage windows used for the generation of the iceLogo is depicted on each panel, and the scissile bond is marked by a dashed line.

To identify common and distinct processing events, we first determined N-terminal peptides mapping to similar cleavage sites in homologous rat and mouse proteins. The majority of the shared N-terminal peptides showed no strong changes in both data sets (Figure S6a). A subset of mostly canonical acetylated N-terminal peptides matching positions 1 and 2 was strongly depleted in both models (Figure S6b), indicating decreased protein abundance. Notably, we did not observe common induced cleavage events. A number of neo-N-termini accumulated in the PAN rats with unchanged abundance in heterozygous WT1 mice (Figure S6a,c). Among these were cleavages in cytoskeletal proteins such as actin (Actv) and serum proteins such as albumin (Alb), fibrinogen alpha (Fba)/beta (Fbg), and complement C3 (Figure S6c).
Consistent with the acute, more prominent PAN-induced kidney damage, N-terminal peptides indicating the proteolytic activation of complement C3 have also been observed in cisplatin-induced kidney injury in mice,[59] and the observed peptides derived from fibrinogen cleavage have been proposed as potential biomarkers for kidney disease.[60]
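The cutoffs quoted in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that a log2 ratio and a moderated t-test p-value have already been computed for each peptide (for example with limma in R); the function name `classify_peptide` and the example values are hypothetical and are not part of MANTI or polvops.R. The cutoff of 0.58 corresponds to roughly a 1.5-fold change, since log2(1.5) ≈ 0.585.

```python
# Thresholds as quoted in the Figure 4 caption.
LOG2_CUTOFF = 0.58  # roughly log2(1.5), i.e. about a 1.5-fold change
P_CUTOFF = 0.05     # p-value from a moderated t-test, assumed precomputed


def classify_peptide(log2_ratio, p_value):
    """Label one N-terminal peptide by abundance change.

    Hypothetical helper for illustration only; it does not reproduce
    the statistics themselves, only the thresholding step.
    """
    if p_value < P_CUTOFF and log2_ratio > LOG2_CUTOFF:
        return "accumulated"
    if p_value < P_CUTOFF and log2_ratio < -LOG2_CUTOFF:
        return "depleted"
    return "unchanged"


# Made-up example values, not taken from the PAN or WT1 data sets.
examples = [
    ("peptide_A", 1.20, 0.003),
    ("peptide_B", -0.90, 0.010),
    ("peptide_C", 1.20, 0.200),
    ("peptide_D", 0.10, 0.001),
]

for name, log2_ratio, p_value in examples:
    print(name, classify_peptide(log2_ratio, p_value))
```

Both conditions must hold for a peptide to count as regulated, so a large fold change with a nonsignificant p-value, or a significant p-value with a small fold change, is treated as unchanged.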

Conclusions

Protein N-terminal peptides provide a wealth of information on co- and post-translational N-terminal modifications, protein stability, and proteolytic processing. We present MANTI as a new software tool that integrates well-established sources of protein annotation information, including UniProt, LOCALIZER,[43] TargetP,[44] and TopFINDer,[42] with routines for statistical evaluation of quantitative changes, such as limma-moderated t-tests,[36] to facilitate the generation of testable hypotheses on the origin and function of identified protein N-termini. A graphical user interface, Yoğurtlu MANTI, provides a convenient, low-barrier entry point for N-terminome data annotation, while auxiliary R scripts (bufipretty.R and polvops.R) enable visualization and characterization of the N-termini data set in publication-quality graphs. MANTI has been designed to handle peptide identification data from the popular MaxQuant data analysis pipeline, but we envision that it will be expanded to accommodate identifications obtained from other common data analysis tools such as OpenMS[61] and the TPP.[27] The source code is freely available under the Perl Artistic License v2.0 and is usable for commercial and private purposes without any limitations. Finally, we demonstrate the value of MANTI for biological interpretation using two large-scale N-terminome data sets comprising >10 000 identified N-terminal peptides from two animal models of chronic kidney disease, which revealed common and distinct changes in proteolytic processing. These data sets further substantiate the importance of proteolytic processing in maintaining glomerular function and may serve as a useful resource for further studies.[46]

56 in total

1. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification.

Authors: Jürgen Cox; Matthias Mann
Journal: Nat Biotechnol Date: 2008-11-30 Impact factor: 54.908

2. WT1-dependent sulfatase expression maintains the normal glomerular filtration barrier.

Authors: Valérie A Schumacher; Ursula Schlötzer-Schrehardt; S Ananth Karumanchi; Xiaofeng Shi; Joseph Zaia; Stefanie Jeruschke; Dongsheng Zhang; Hermann Pavenstädt; Astrid Drenckhan; Kerstin Amann; Carrie Ng; Sunny Hartwig; Kar-Hui Ng; Jacqueline Ho; Jordan A Kreidberg; Mary Taglienti; Brigitte Royer-Pokora; Xingbin Ai
Journal: J Am Soc Nephrol Date: 2011-06-30 Impact factor: 10.121

Review 3. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics.

Authors: Eric W Deutsch; Luis Mendoza; David Shteynberg; Joseph Slagel; Zhi Sun; Robert L Moritz
Journal: Proteomics Clin Appl Date: 2015-04-02 Impact factor: 3.494

4. OpenMS: a flexible open-source software platform for mass spectrometry data analysis.

Authors: Hannes L Röst; Timo Sachsenberg; Stephan Aiche; Chris Bielow; Hendrik Weisser; Fabian Aicheler; Sandro Andreotti; Hans-Christian Ehrlich; Petra Gutenbrunner; Erhan Kenar; Xiao Liang; Sven Nahnsen; Lars Nilse; Julianus Pfeuffer; George Rosenberger; Marc Rurik; Uwe Schmitt; Johannes Veit; Mathias Walzer; David Wojnar; Witold E Wolski; Oliver Schilling; Jyoti S Choudhary; Lars Malmström; Ruedi Aebersold; Knut Reinert; Oliver Kohlbacher
Journal: Nat Methods Date: 2016-08-30 Impact factor: 28.547

Review 5. Missing the target: matrix metalloproteinase antitargets in inflammation and cancer.

Authors: Antoine Dufour; Christopher M Overall
Journal: Trends Pharmacol Sci Date: 2013-03-26 Impact factor: 14.819

Review 6. Matrix metalloproteinases: regulators of the tumor microenvironment.

Authors: Kai Kessenbrock; Vicki Plaks; Zena Werb
Journal: Cell Date: 2010-04-02 Impact factor: 41.582

7. Legumain/asparaginyl endopeptidase controls extracellular matrix remodeling through the degradation of fibronectin in mouse renal proximal tubular cells.

Authors: Yoshikata Morita; Hisazumi Araki; Toshiro Sugimoto; Keisuke Takeuchi; Takuya Yamane; Toshinaga Maeda; Yoshio Yamamoto; Katsuji Nishi; Masahide Asano; Kanae Shirahama-Noda; Mikio Nishimura; Takashi Uzu; Ikuko Hara-Nishimura; Daisuke Koya; Atsunori Kashiwagi; Iwao Ohkubo
Journal: FEBS Lett Date: 2007-03-05 Impact factor: 4.124

8. limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors: Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal: Nucleic Acids Res Date: 2015-01-20 Impact factor: 16.971

9. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

Authors: Maxim V Kuleshov; Matthew R Jones; Andrew D Rouillard; Nicolas F Fernandez; Qiaonan Duan; Zichen Wang; Simon Koplev; Sherry L Jenkins; Kathleen M Jagodnik; Alexander Lachmann; Michael G McDermott; Caroline D Monteiro; Gregory W Gundersen; Avi Ma'ayan
Journal: Nucleic Acids Res Date: 2016-05-03 Impact factor: 16.971

10. The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Authors: Yasset Perez-Riverol; Attila Csordas; Jingwen Bai; Manuel Bernal-Llinares; Suresh Hewapathirana; Deepti J Kundu; Avinash Inuganti; Johannes Griss; Gerhard Mayer; Martin Eisenacher; Enrique Pérez; Julian Uszkoreit; Julianus Pfeuffer; Timo Sachsenberg; Sule Yilmaz; Shivani Tiwary; Jürgen Cox; Enrique Audain; Mathias Walzer; Andrew F Jarnuczak; Tobias Ternent; Alvis Brazma; Juan Antonio Vizcaíno
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

3 in total