The Pacific Northwest National Laboratory library of bacterial and archaeal proteomic biodiversity.

Samuel H Payne1, Matthew E Monroe1, Christopher C Overall1, Gary R Kiebel1, Michael Degan1, Bryson C Gibbons1, Grant M Fujimoto1, Samuel O Purvine2, Joshua N Adkins1, Mary S Lipton1, Richard D Smith1.   

Abstract

This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprises 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data has been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations which allows for pathway oriented investigation. We present the data as a freely available community resource. A variety of data re-use options are described for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.

Year:  2015        PMID: 26306205      PMCID: PMC4540001          DOI: 10.1038/sdata.2015.41

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

Global measurements of -omic molecular data (genome, transcriptome, proteome, metabolome, etc.) are changing the way we research and think about biological systems. Computational biology research, which attempts to identify novel biological phenomena using these large-scale global measurements, depends on publicly available data for training and testing new algorithms. Repositories like GEO[1] were vital to the development of robust computational methods for analyzing microarray and other genomics technologies. Therefore, depositing complementary proteomics data for a large number of organisms is a similarly valuable public resource. Researchers at the Pacific Northwest National Laboratory have participated in hundreds of collaborative projects that have involved mass spectrometry-based proteomic analysis of more than 300 species or distinct environmental communities. A portion of this data has been freely available through our website (omics.pnl.gov) for almost a decade, while metadata is maintained by our in-house LIMS systems[2]. In addition to the numerous project-specific publications, meta-analyses of this massive corpus have advanced both computational algorithms[3-5] and biological discovery[6-8]. The size of the library, however, has precluded broad distribution due to a lack of public repositories large enough to host the data. Recently, the ProteomeXchange[9] repository system enabled accommodation of significantly larger data volumes. The purpose of this Data Descriptor is to announce the deposition of proteomics data from 112 microbial organisms representing 15 phyla into public third-party repositories (Table 1 (available online only)). All the data has been prepared, parsed and organized in a uniform manner to facilitate analysis and reuse (Fig. 1). The combined data deposited is 13 TB (compressed) from 35,162 mass spectrometry files and their associated analysis files.
In total, the library contains >70 million spectra identified at q<0.0001, with 3 million peptides from 230,000 proteins. The median number of observed proteins per organism is 2154, or roughly half of the annotated proteins in the proteome. By releasing this data, we hope to promote open science. In this manuscript, we describe a variety of re-uses for mass spectrometry, algorithmic computation and basic biology.
Table 1

Overview of the organisms included in the PNNL Biodiversity Library

Organism Name # datasets Proteins Peptides Spectra False Spectra
Each organism is listed along with information about how many datasets, proteins, peptides and spectra were included in the Library creation. The number of false spectra identifications, as estimated by the decoy search strategy, is also included to help understand the quality of the data in the Library.     
Acidiphilium_cryptum_JF-54014547306123,9440
Actinosynnema_mirum_DSM_438278520831397580,9740
Anabaena_variabilis341332435814456,60013
Anaeromyxobacter_dehalogenans201250768554,0920
Anaplasma_phagocytophilium12969030668,3630
Arthrobacter_sp_FB24353327946122663,86129
Bacillus_anthracis_Ames20849648736,0140
Bacillus_anthracis_Sterne158250626165152,3390
Bacillus_subtilis_168242243539415709,1790
Bartonella_henselae_Houston-1160124536347508,8031
Borrelia_burdorferi_B3110487719060193,7431
Brachybacterium_faecium_DSM_481078208829248121,9140
Burkholderia_mallei189197033824270,6092
Candidatus_chloracidobacterium_thermophilum11156323606,0377
Caulobacter_crescentus_CB1510162971637732,320,21241
Cellulomonas_flavigena_DSM_20109103238238033175,1480
Cenarchaeum_symbiosum1083587574,1720
Chlorobaculum_tepidum_WT94158426561137,1500
Chloroflexus_aurantiacus225273644200323,03616
Clostridium_thermocellum4082110616181,140,2515
Cryptobacterium_curtum_DSM_156417811151788875,9030
Cyanobacterium_synechocystis_PCC6803279225127351344,4730
Cyanothece_sp_ATCC5114214043309572191,879,2390
Cyanothece_strain_ATCC5147293214620702134,8790
Cyanothece_strain_PCC742410921822076391,3890
Cyanothece_strain_PCC7425104259027429114,4980
Cyanothece_strain_PCC7822166323037493336,0520
Cyanothece_strain_PCC880190239923313145,7140
Cyanothece_strain_PCC880281679395822,7850
Dehalococcoides_ethenogenes437102525378262,6440
Deinococcus_radiodurans_R13682212434091,002,9310
Delta_proteobacterium_NaphS215822101300794,3780
Desulfovibrio_desulfuricans_G20463256448057916,18328
Desulfovibrio_sp_ND13292257342717225,5460
Desulfovibrio_vulgaris_Hildenborough218239736518451,5180
Dethiosulfovibrio_peptidovorans_DSM_110027817782528795,9400
Ehrlichia_chaffeensis127532404512,5450
Enterobacter_cloacae_SCF158179319194235,7210
Escherichia_coli_BL21191156420792190,6450
Escherichia_coli_K-123925332412211411,592,291461
Escherichia_coli_RK43533117661708953,3340
Fibrobacter_succinogenes_S85122150517408265,2190
Geobacter_bemidjiensis_Bem_T780256633313682,8970
Geobacter_metallireducens_GS-15155245527442245,6910
Geobacter_sulfurreducens_PCA11222770452982,597,9150
Geobacter_uraniumreducens287277942812401,5860
Haloferax_volcanii341463950256,6690
Halogeometricum_borinquense_DSM_115518322161877656,2250
Halorhabdus_utahensis_DSM_129408420822298789,8200
Heliobacterium_modesticaldum77154922105107,1000
Kineococcus_radiotolerans_SRS30216192274238694480,7050
Kosmotoga_olearia_TBF_19-5-125878721929,0800
Methanosarcina_barkeri89185224889116,2240
Methanospirillum_hungatei_JF-17817222304160,3000
Methylophilales_HTCC218161117220615104,6032
Mycobacterium_tuberculosis6472714352061,296,0187
Nakamurella_multipartita_DSM_442337821711978257,5820
Nocardiopsis_dassonvillei_DSM_431118019651650776,4040
Novosphingobium_aromaticivorans_F1992315631278375,8501
Opitutaceae_bacterium_TAV2125222920659292,6440
Pelagibacter_ubique_HTC10624261201305361,294,94558
Pelobacter_carbinolicus_DSM_238022830416819,0780
Prochlorococcus121128519310158,2250
Pseudomonas_aerunginosa40305424670109,8541
Pseudomonas_fluorescens_PfO-1397343744819535,8830
Pseudonocardia_sp481884481,4060
Ralstonia_pickettii194  5,7860
Rhodobacter_capsulatus_SB10034952761588941,316,84476
Rhodobacter_sphaeroides_2.4.116393362729773,982,59647
Rhodopseudomonas_palustris587274325366247,3730
Roseiflexus_castenholzii7625372841491,0630
Saccharomonospora_viridis_DSM_430177819181545839,3840
Salmonella_typhi_TY24992729502929,832,722385
Salmonella_typhimurium_ATCC_1402837013779914961,555,61728
Salmonella_typhimurium_LT28233214623991,051,5653
Sanguibacter_keddieii_DSM_105427822132048989,4270
Shewanella_amazonensis_SB2B114207927295469,39732
Shewanella_baltica_OS1556312988477251,874,615112
Shewanella_baltica_OS185190285341613395,8674
Shewanella_baltica_OS19565219323560175,2375
Shewanella_baltica_OS2237521582239192,4240
Shewanella_denitrificans_OS21791231325045240,0690
Shewanella_frigidimarina_NCIMB_40066228626819192,7587
Shewanella_loihica_PV-468208227260181,8145
Shewanella_oneidensis_MR-139543307941955,561,459442
Shewanella_putrefaciens_20064220025524191,9462
Shewanella_putrefaciens_CN-3278233829135254,18714
Shewanella_putrefaciens_W3-18-180197518900273,70512
Shewanella_sp_ANA-362215425870145,4744
Shewanella_sp_MR-465205225724195,5476
Shewanella_sp_MR-764221729196178,77616
Sinorhizobium_medicae40255717954164,0882
Sinorhizobium_meliloti_1021137254721631581,0602
Slackia_heliotrinireducens_DSM_2047679164522440105,6671
Stackebrandtia_nassauensis_DSM_44728104215822213138,7350
Sulfolobus_acidocaldarius_DSM_6393313801609392,4020
Synechococcus_sp_PCC700210502606524691,842,4521
Syntrophobacter_fumaroxidans8018931884671,2870
Thermobispora_bispora_DSM_43833781470919323,8610
Thermosynechococcus_elongatus_BP-113712668161182,6980
Thermosynechococcus_sp_NAK55471106958787,5500
Thermotoga_maritima281152652050544,7956
Thiocapsa_marina_DSM_5653T1301632857726,6490
Verrucomicrobium_sp_TAV13222881505979,8960
Verrucomicrobium_sp_TAV53321161289387,4432
Xylanimonas_cellulosilytica_DSM_15894200248741090319,0300
Yersinia_enterocolitica86162211983107,7312
Yersinia_pestis_CO92210230233289993,71419
Yersinia_pestis_KIM24013711108498,7800
Yersinia_pestis_Pestoides_F99215322980523,77916
Yersinia_pseudotuberculosis_IP_32953400215124690301,17511
Yersinia_pseudotuberculosis_PB1_Plus99202920729470,57516
Total: 35,162 datasets; 229,537 proteins; 3,147,576 peptides; 70,455,991 spectra; 1,951 false spectra
Figure 1

Workflow for library creation.

Biological samples are used to create MS/MS data as part of an experiment and primary publication. All of this data is stored on our servers for re-analysis. Historical data was collated by organism and re-searched for release in the Biodiversity Library.

As part of the analysis, we have cross referenced protein identifications to KEGG functional annotation where possible. Nine of the 112 organisms are not processed by KEGG, and therefore were excluded from this additional analysis. When viewing the Library as a whole, annotated biological pathways are broadly covered by the identified proteins. For example, the reference ‘cysteine and methionine metabolism pathway’ as defined by KEGG consists of 81 orthologous genes participating in 73 reactions. As expected, not all orthologs are annotated in every genome, e.g., Cellulomonas flavigena has only 23 of the 81 genes. By searching all MS/MS data with standard RefSeq databases, we can easily identify that 21 of the 23 Cellulomonas genes were observed in MS/MS data, or 91%. When considering all organisms in the Library, the median coverage of the cysteine and methionine metabolism pathway is 89%. A summary of the coverage of every KEGG pathway for each organism is presented in Supplementary Table 1. Using KEGG pathway categories, we determined the median coverage of all functionally classified proteins (Fig. 2). For example, in all 13 pathways for amino acid metabolism, the median coverage across the entire library is 89%. This high coverage is seen for most KEGG pathway categories: 82% for lipid metabolism, 83% for vitamin and cofactor metabolism, etc.
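The coverage figures above are simple set arithmetic, which the following minimal sketch makes explicit. It is not the code used to build the Library; the function names and the gene identifiers are hypothetical, and the inputs are assumed to be lists of gene identifiers already mapped to a KEGG pathway.

```python
from statistics import median

def pathway_coverage(annotated, observed):
    """Percent of a genome's annotated pathway genes that were observed by MS/MS."""
    annotated = set(annotated)
    if not annotated:
        return None  # pathway not annotated in this genome
    return 100.0 * len(annotated & set(observed)) / len(annotated)

def median_coverage(per_organism):
    """Median coverage across organisms, skipping genomes that lack the pathway.

    per_organism maps an organism name to an (annotated, observed) pair.
    """
    values = [pathway_coverage(a, o) for a, o in per_organism.values()]
    return median(v for v in values if v is not None)

# Hypothetical identifiers standing in for the 23 annotated C. flavigena genes,
# 21 of which were observed.
annotated = [f"gene_{i:02d}" for i in range(23)]
observed = annotated[:21]
print(round(pathway_coverage(annotated, observed)))  # 91
```

Applying `median_coverage` to one pathway across all KEGG-annotated organisms corresponds to the per-pathway median quoted in the text.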
Figure 2

KEGG pathway coverage.

(top) Using the pathway classifications provided by KEGG, we can determine how many annotated proteins were identified in mass spectrometry data. The cysteine and methionine metabolism pathway is provided as an example. For each organism, we calculate the percentage of identified proteins. C. flavigena has 23 proteins in the pathway, 21 of which were observed (91%). The box plot shows average coverage of the 103 organisms that KEGG has annotated. Circles depict outliers. (bottom) Pathway coverage for all 13 amino acid metabolic pathways is shown.

Methods

As the library encompasses 35,162 mass spectrometry files from 10+ years of research, it is impossible to fully describe the evolving and diverse protocols for experimental sample preparation or data acquisition. In Supplementary Table 2, we have provided data from our LIMS system[2] about each sample data file (called a dataset). Below is a set of descriptions that represent a large fraction of the methods applied to generate the released datasets. Either an established or optimized protein extraction protocol was applied to each sample[7]. In brief, a typical experimental approach included global (total), insoluble, and soluble protein extractions from lysed cell cultures that were then washed and suspended in 100 mM NH4HCO3, pH 8.4 buffer. Global protein extracts were denatured and reduced by adding urea, thiourea, and dithiothreitol (DTT) followed by incubation at ~60 °C for ~30 min. Following incubation, the global protein samples were diluted to reduce salt concentration and then proteolytically digested, at 37 °C for ~4 h, using sequencing grade trypsin (Roche, Indianapolis, IN) at a ratio of 1 unit per 50 units of protein (1 unit=~1 μg of protein). Following incubation, digested samples were desalted using an appropriately sized C-18 SPE column (Supelco, St Louis, MO) and a vacuum manifold. The collected peptides were concentrated to a final volume ranging from 50 to 100 μl and measured using the BCA assay (Pierce Chemical Co., Rockford, IL) according to the manufacturer's instructions. Insoluble protein extracts were produced by ultracentrifuging the cell lysate at 4 °C and 100,000 rpm for 10 min. The resulting supernatant that contained soluble proteins was separated from the pellet and retained for digestion as previously described for the global extraction. The pellet was washed by suspending it in 100 mM NH4HCO3, pH 7.8, using mild sonication and then ultracentrifuged at 100,000 rpm for 5 min, again at 4 °C.
Following centrifugation, the pellet was resuspended in a solubilizing solution that contained urea, thiourea, 1% CHAPS in 50 mM NH4HCO3, pH 7.8. An aliquot of 50 mM DTT solution was also added to a final concentration of 5 mM. The insoluble protein sample was then incubated and digested as described above with the exception that a 50 mM NH4HCO3, pH 7.8 buffer was used for the dilution step. Following proteolytic digestion, the pH of the sample was slowly lowered to <4.0 by adding small volumes (1 to 2 μl) of 20% formic acid. Removal of salts and detergent was performed using either an appropriately sized strong cation exchange (SCX) or solid phase extraction column (Supelco, St Louis, MO) and vacuum manifold. Peptides were then concentrated and their concentration measured as described above. The HPLCs used to run the samples were built in-house utilizing various commercial pumps, valves, and autosamplers, all of which were coordinated by a custom software package called LCMSnet. The data sets analyzed for this paper were run using LC columns that were 75 μm inner diameter, and either 30 or 65 cm in length. These LC columns were packed in-house with Phenomenex Jupiter C18 3 μm porous beads. The flow rate was 300 nl/min. Mobile phase A was 0.1% formic acid in water and mobile phase B was 0.1% formic acid in acetonitrile. The 100 min gradient was delivered by starting at 5% mobile phase B and advancing to 8, 12, 35, 60, and 75% at times (in minutes) 2, 20, 75, 97, 100 respectively. Typically 2.5 μg of peptides were loaded to the head of the column or to a trapping column. Although operating conditions varied by capabilities of each instrument, typical conditions for each are as follows. The LTQ was run in data-dependent MS/MS mode, selecting the top 10 parent ions from each survey scan.
The LTQ-Orbitrap and the Velos-Orbitrap instruments were typically set to have a high resolution survey scan of 60,000 resolution followed by the top 6 or 10 data-dependent MS/MS scans, respectively. Because of the diversity of data sets presented in this work, this is not a comprehensive list of conditions. Instrumentation details can be found in the raw data files (.RAW or .mzML).
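For readers who want to relate retention time to solvent composition, the 100 min gradient described above can be written as a small lookup. This is only a sketch: it assumes the gradient starts at 5% B at time zero and ramps linearly between the listed breakpoints, neither of which is stated explicitly in the text, and the names `GRADIENT` and `percent_b` are illustrative.

```python
# Gradient breakpoints from the text: (time in minutes, % mobile phase B).
# The (0, 5) starting point is an assumption about when the gradient begins.
GRADIENT = [(0, 5), (2, 8), (20, 12), (75, 35), (97, 60), (100, 75)]

def percent_b(t, gradient=GRADIENT):
    """Return %B at time t (minutes), assuming linear ramps between breakpoints."""
    if t <= gradient[0][0]:
        return float(gradient[0][1])
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return float(gradient[-1][1])  # hold the final composition after the last step
```

Under these assumptions, most of the run (20 to 75 min) is spent on the shallow ramp from 12% to 35% B.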

Code availability

Software used in the generation of this project is largely third party software as described in the Data Records section, i.e., MSGF+ and Bibliospec. The only remaining software was used to link protein identifications to KEGG functional assignments. This was done via custom parsing of the files and cross-referencing the KEGG database. This code is trivially reproducible.

Data Records

To maximize the utility and ease of access, the data described in this publication have been uploaded to the ProteomeXchange[9] with accession PXD001860 via MassIVE (Data Citation 1). On MassIVE (identifier MSV000079053), each organism’s data is located in a separate folder, with both raw and processed data as described below. Data is organized around a tandem mass spectrometry file that represents one run of the instrument on a biological sample. In our terminology this is called a dataset. Each dataset has the following associated files.

Mass spectrometry data

Each dataset is available in the original vendor format and the community standard open format mzML[10]. These files contain the raw mass spectra. Mass spectrometry data is a combination of MS and MS/MS data showing both the detection of all analytes at a particular time in chromatography (MS data) and the fragmentation of a particular analyte (MS/MS data). See the review by Aebersold and Mann for a basic primer of proteomic mass spectrometry data[11].

Peptide identifications

Each dataset is associated with a file describing the peptides that were identified via the spectra. This file was created using the MSGF+ algorithm[12] version v9979. All 35,162 datasets were analyzed with a consistent set of parameters. Searches included oxidized methionine as an optional post-translational modification, and specified partial trypsin specificity. For experiments that utilized iodoacetamide as an alkylation agent, the static modification (C+57) was also added. Precursor and fragment mass tolerances were set according to the resolving power of the mass analyzer. The output of MSGF+ is stored in the community standard mzIdentML format[13], which describes the peptide/spectrum match (PSM), search parameters and scoring details. The one caveat for peptide identification was that three organisms did not have a RefSeq proteome set derived from a publicly available genome sequence. Escherichia coli RK4353 did not have a sequenced genome at NCBI, so we used the related strain BW2952. Cyanothece strain ATCC51472 also lacks a sequence at NCBI; we substituted strain 8801. Thiocapsa marina DSM_5653T lacks a RefSeq genome; the GenBank submission was used instead.

Metadata

Data acquired at PNNL has been tracked using an in-house LIMS system since 2000. Each dataset is recorded with a variety of details including: acquisition date and time, instrument, chromatography details, organism, etc. These metadata are presented in Supplementary Table 2 with this publication.

Spectrum library

A spectrum library is a condensed collection of annotated tandem mass spectra. In addition to serving as an efficient storage format for very large datasets, these libraries are also utilized for annotating new datasets[14,15]. With this deposition, we created a spectrum library for each microbial organism using Bibliospec[16]. Peptide/spectrum matches were filtered for high quality matches (MSGF+’s q-value <0.0001). When viewed in aggregate, the 112 organisms had 70,455,991 spectra passing this cutoff (with 1951 false hits and an estimated FDR of 2e-5). This strict filtering is necessary to control false-positives when creating very large libraries. The libraries, stored as .blib files, are also available on the MassIVE repository.

Technical Validation

When releasing the Library, we took a conservative stance on spectral quality. Considering the large number of spectra, even a 1% false-positive rate would mean polluting the resource with nearly one million false-positive spectral identifications. Moreover, a well-known problem in proteomics is that aggregating numerous datasets leads to the inflation of false-positives when considering the entire group. This is especially true when rolling results up to a peptide or protein level, as many true spectra are associated with a single true protein, whereas false proteins are typically represented by very few false-positive spectrum identifications. The primary method to reduce false-positive peptide and protein identifications is to be more stringent on spectrum quality. When aggregating 35,162 datasets into the Library, using a typical q-value cutoff of 0.01 on each individual dataset was insufficient to ensure high quality of the library as a whole (Fig. 3). Although the spectral false discovery rate was indeed 1%, the protein-level false discovery rate was an astonishing 37%. We applied a q-value cutoff of 0.0001, or two orders of magnitude more stringent than common practice. In this filtering process, 23 million true-positive spectra are removed. Although this may seem overly conservative, the more stringent filter also removed 600,000 false-positive peptides and 200,000 false-positive protein identifications. This allowed for a permissible false-discovery rate at spectrum, peptide, and protein levels (0.00002, 0.00009 and 0.001 respectively).
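To make the filtering logic concrete, the sketch below shows one common way to estimate FDR and q-values from a target-decoy search. It is an illustration under stated assumptions, not the MSGF+ implementation: FDR is taken as decoy hits divided by target hits, higher scores are assumed to be better, and the function names are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold.

    psms: list of (score, is_decoy) pairs; a higher score means a better match.
    """
    targets = sum(1 for s, d in psms if s >= threshold and not d)
    decoys = sum(1 for s, d in psms if s >= threshold and d)
    return decoys / targets if targets else 0.0

def q_values(psms):
    """Return (score, is_decoy, q) tuples sorted best-first.

    q is the smallest estimated FDR at which the PSM would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        decoys += is_decoy
        targets += not is_decoy
        fdrs.append(decoys / max(targets, 1))
    # Enforce monotonicity by sweeping from the worst score to the best.
    running_min = float("inf")
    out = []
    for (score, is_decoy), fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        out.append((score, is_decoy, running_min))
    return out[::-1]
```

Treating the 70,455,991 Library spectra as target hits and the 1,951 false spectra as decoy hits, the same ratio is on the order of 2e-5 to 3e-5, in line with the reported spectrum-level estimate. Protein-level FDR inflates on aggregation because the false hits scatter across many proteins while true hits pile onto the same ones.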
Figure 3

False Discovery Rate.

Due to the large size of the Library, the false-discovery rate of the aggregated data can inflate significantly, especially when rolled up to protein and peptide level. Data is shown for the FDR of the entire Library when using a specified q-value cutoff of PSMs from the MSGF+ results. When using a loose PSM filter of q-value<0.01, the protein and peptide FDRs are unacceptably high. We chose the cutoff q-value<0.0001, which produces high data quality at spectrum, peptide and protein levels.

Usage Notes

Our purpose in depositing such a large corpus of data is to promote reuse and open science. The richness of the PNNL Biodiversity Library is seen in both the breadth and depth of coverage for proteins and phylogeny. Besides sheer size, a unique feature of the Library is the pairs of spectra that come from similar peptides; one million peptides in the Library are one mutation away from another peptide (edit distance=1). These pairs originate from orthologues, where the proteins share significant sequence identity (Fig. 4). Indeed, 21,721 peptides have four or more ‘one mutation’ neighbours. This vast web of sequence related spectra can be productively mined for a wide variety of bioinformatics and fundamental mass spectrometry research.
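Pairs of peptides that are one mutation apart can be found without comparing every peptide against every other. The sketch below is one possible approach, not the method used for the counts above: it considers single-residue substitutions only (an edit distance of 1 could also arise from an insertion or deletion), and the function name is hypothetical.

```python
from collections import defaultdict

def one_substitution_pairs(peptides):
    """Yield (pep_a, pep_b, position) for peptides differing by exactly one residue.

    Each peptide is indexed under every 'masked' variant, with one residue
    replaced by '*'. Two distinct peptides sharing a masked key are identical
    everywhere except at the masked position.
    """
    buckets = defaultdict(set)
    for pep in set(peptides):
        for i in range(len(pep)):
            buckets[(i, pep[:i] + "*" + pep[i + 1:])].add(pep)
    for (i, _), group in buckets.items():
        members = sorted(group)
        for a_idx, a in enumerate(members):
            for b in members[a_idx + 1:]:
                yield a, b, i

# Made-up sequences for illustration only.
pairs = sorted(one_substitution_pairs(["LVEALK", "LVESLK", "LVESVK", "AAAK"]))
print(pairs)  # [('LVEALK', 'LVESLK', 3), ('LVESLK', 'LVESVK', 4)]
```

The work grows with the total number of residues rather than with the square of the number of peptides, which matters at the scale of 3 million peptides. Filtering the output by the residues at `position` would select specific changes such as alanine to serine.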
Figure 4

Peptide observation across taxa.

This is a multiple sequence alignment of a section of an ABC transporter (accessions and organism given), with observed peptides from the PNNL Biodiversity Library in blue. For simplicity, we displayed sequences from the Actinobacteria phylum, with Roseiflexus as an out group. The right side of the alignment shows consistent discovery in proteomics data across the phylum and in the out group. The left side of the alignment is only observed in the proteomics data for Arthrobacter sp. FB24.

Ion fragmentation

Exploring the fundamentals of fragmentation is typically done working with purified peptides in low throughput[17,18]. With the Biodiversity library, however, pairs of related spectra could easily be mined to understand the effect of residue changes on the intensity of fragment ions. For example, there are 2,854 peptides where sequences only differ in that an alanine residue is changed to a serine residue. Additionally, many peptides are repeatedly identified. Indeed 53,828 peptides have over 200 spectra. Replicate spectra for a peptide are often used in understanding and modelling fragmentation patterns. However, in the library we note that 30,672 peptides with over 200 spectra are from conserved regions of proteins found in multiple organisms. Thus they contain distinct background and noise in the MS/MS spectra, aiding in the identification of novel fragment peaks.

Proteotypic peptides

Computational prediction of which peptides are discoverable in experimental conditions is a valuable tool in proteomics workflows[19]. Such machine learning efforts will undoubtedly improve with the 3 million peptides provided by the PNNL Biodiversity Library. Yet the related sequences mentioned above provide a truly distinct perspective on peptide observability. Several important features of orthology can be utilized to improve the quality of machine learning predictions. First, as seen in Fig. 4, there are regions of a protein sequence which are fundamentally observable. In many orthologs spanning a large phylogeny, these regions are consistently observed. The sequence variation present in these regions can be leveraged to identify the physicochemical factors that govern mass spectrometry identification. Also seen are regions that are rarely observed. These could provide valuable negative training data for machine learning approaches.

Library search of MS/MS data

Spectrum annotation via library search is both faster and more sensitive than database search algorithms[20]. Due to a lack of data, library search has previously not been practical except for the most commonly used model systems (e.g., human and yeast). Since the Biodiversity Library contains data for nearly every model system, including numerous environmentally and medically relevant microbes, peptide identification via spectrum library matching becomes an attractive alternative to database searching.
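As an illustration of what spectrum library matching involves, the sketch below scores a query spectrum against library spectra with a normalized dot product over binned peaks, one widely used similarity measure. It does not reproduce any particular library search tool: the 1.0 m/z bin width and the function names are assumptions, and real tools also restrict candidates by precursor mass and apply intensity transformations, which are omitted here.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned

def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product of two binned spectra (0 = no shared bins)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_library_match(query, library, bin_width=1.0):
    """Return (peptide, score) for the library spectrum most similar to the query.

    library maps a peptide sequence to its list of (mz, intensity) peaks.
    """
    return max(
        ((pep, cosine_similarity(query, peaks, bin_width)) for pep, peaks in library.items()),
        key=lambda hit: hit[1],
    )
```

Because each comparison is a sparse dot product against a previously observed spectrum, scoring is cheap compared with generating and scoring theoretical spectra for every candidate peptide in a sequence database.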

Novel scoring functions

Bioinformatics algorithms to identify peptides from mass spectrometry data are constantly being developed and refined. For these, having free access to a large pool of training data is essential[12,21-23]. With data presented on different classes of instruments and multiple fragmentation modalities, the PNNL Biodiversity Library is an ideal source of data to test new scoring functions.

Unidentified spectra

Another application that we envision is the investigation of unidentified or unattributed spectra. With tens of thousands of LC-MS/MS data sets, there are literally hundreds of millions of fragmentation spectra for which there is not a confident identification using the current search tool and parameters. Of those unidentified species, many are fragmented in multiple data sets; spectrum averaging or other methods could be utilized to obtain a confident identification.

Novel post-translational modifications

For simplicity and sensitivity, only the most common post-translational modification (oxidized methionine) was included in the database search parameters. However, numerous post-translational modifications are observable in proteomics mass spectrometry[24]. Some modifications are rare, and therefore not commonly included in database searches. We recently uncovered a novel PTM switch in Salmonella for S-thiolation[25] and believe that many such unexpected post-translational modifications exist. Identifying which observed PTMs are functionally relevant is a difficult task, but observing a modification across different taxa and showing its evolutionary conservation provides a valuable filter for high-priority targets[26,27].

Proteogenomics

The process of using peptides from mass spectrometry to assist genome annotation, or proteogenomics, has been very successful in identifying both false-negative omissions in a genome’s protein list, and also false-positives. To date most of the work in this area has been focused on a single genome, or a group of closely related genomes[4,28,29]. With the Biodiversity Library, one can now attempt to leverage identifications across an entire phylum, or perhaps the entire tree of life.

Additional Information

How to cite this article: Payne, S. H. et al. The Pacific Northwest National Laboratory library of bacterial and archaeal proteomic biodiversity. Sci. Data 2:150041 doi: 10.1038/sdata.2015.41 (2015).
  28 in total

1.  Open mass spectrometry search algorithm.

Authors:  Lewis Y Geer; Sanford P Markey; Jeffrey A Kowalak; Lukas Wagner; Ming Xu; Dawn M Maynard; Xiaoyu Yang; Wenyao Shi; Stephen H Bryant
Journal:  J Proteome Res       Date:  2004 Sep-Oct       Impact factor: 4.466

2.  The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search.

Authors:  Sangtae Kim; Nikolai Mischerikow; Nuno Bandeira; J Daniel Navarro; Louis Wich; Shabaz Mohammed; Albert J R Heck; Pavel A Pevzner
Journal:  Mol Cell Proteomics       Date:  2010-09-09       Impact factor: 5.911

3.  Computational prediction of proteotypic peptides for quantitative proteomics.

Authors:  Parag Mallick; Markus Schirle; Sharon S Chen; Mark R Flory; Hookeun Lee; Daniel Martin; Jeffrey Ranish; Brian Raught; Robert Schmitt; Thilo Werner; Bernhard Kuster; Ruedi Aebersold
Journal:  Nat Biotechnol       Date:  2006-12-31       Impact factor: 54.908

4.  Does trypsin cut before proline?

Authors:  Jesse Rodriguez; Nitin Gupta; Richard D Smith; Pavel A Pevzner
Journal:  J Proteome Res       Date:  2007-12-08       Impact factor: 4.466

5.  Estimating probabilities of correct identification from results of mass spectral library searches.

Authors:  S E Stein
Journal:  J Am Soc Mass Spectrom       Date:  1994-04       Impact factor: 3.109

6.  Understanding the improved sensitivity of spectral library searching over sequence database searching in proteomics data analysis.

Authors:  Xin Zhang; Yunzi Li; Wenguang Shao; Henry Lam
Journal:  Proteomics       Date:  2011-02-07       Impact factor: 3.984

Review 7.  Building and searching tandem mass spectral libraries for peptide identification.

Authors:  Henry Lam
Journal:  Mol Cell Proteomics       Date:  2011-09-06       Impact factor: 5.911

8.  Top-down proteomics reveals a unique protein S-thiolation switch in Salmonella Typhimurium in response to infection-like conditions.

Authors:  Charles Ansong; Si Wu; Da Meng; Xiaowen Liu; Heather M Brewer; Brooke L Deatherage Kaiser; Ernesto S Nakayasu; John R Cort; Pavel Pevzner; Richard D Smith; Fred Heffron; Joshua N Adkins; Ljiljana Pasa-Tolic
Journal:  Proc Natl Acad Sci U S A       Date:  2013-05-29       Impact factor: 11.205

9.  A proteogenomic update to Yersinia: enhancing genome annotation.

Authors:  Samuel H Payne; Shih-Ting Huang; Rembert Pieper
Journal:  BMC Genomics       Date:  2010-08-05       Impact factor: 3.969

10.  The mzIdentML data standard for mass spectrometry-based proteomics results.

Authors:  Andrew R Jones; Martin Eisenacher; Gerhard Mayer; Oliver Kohlbacher; Jennifer Siepen; Simon J Hubbard; Julian N Selley; Brian C Searle; James Shofstahl; Sean L Seymour; Randall Julian; Pierre-Alain Binz; Eric W Deutsch; Henning Hermjakob; Florian Reisinger; Johannes Griss; Juan Antonio Vizcaíno; Matthew Chambers; Angel Pizarro; David Creasy
Journal:  Mol Cell Proteomics       Date:  2012-02-27       Impact factor: 5.911
