Samuel H Payne1, Matthew E Monroe1, Christopher C Overall1, Gary R Kiebel1, Michael Degan1, Bryson C Gibbons1, Grant M Fujimoto1, Samuel O Purvine2, Joshua N Adkins1, Mary S Lipton1, Richard D Smith1.
Abstract
This Data Descriptor announces the submission to public repositories of the PNNL Biodiversity Library, a large collection of global proteomics data for 112 bacterial and archaeal organisms. The data comprises 35,162 tandem mass spectrometry (MS/MS) datasets from ~10 years of research. All data has been searched, annotated and organized in a consistent manner to promote reuse by the community. Protein identifications were cross-referenced with KEGG functional annotations, which allows for pathway-oriented investigation. We present the data as a freely available community resource. A variety of data re-use options are described for computational modelling, proteomics assay design and bioengineering. Instrument data and analysis files are available at ProteomeXchange via the MassIVE partner repository under the identifiers PXD001860 and MSV000079053.
Year: 2015 PMID: 26306205 PMCID: PMC4540001 DOI: 10.1038/sdata.2015.41
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Overview of the organisms included in the PNNL Biodiversity Library

Each organism is listed along with information about how many datasets, proteins, peptides and spectra were included in the Library creation. The number of false spectra identifications, as estimated by the decoy search strategy, is also included to help understand the quality of the data in the Library.

| Organism | Datasets | Proteins | Peptides | Spectra | False spectra identifications |
|---|---|---|---|---|---|
| Acidiphilium_cryptum_JF-5 | 40 | 1454 | 7306 | 123,944 | 0 |
| Actinosynnema_mirum_DSM_43827 | 85 | 2083 | 13975 | 80,974 | 0 |
| Anabaena_variabilis | 341 | 3324 | 35814 | 456,600 | 13 |
| Anaeromyxobacter_dehalogenans | 20 | 1250 | 7685 | 54,092 | 0 |
| Anaplasma_phagocytophilium | 129 | 690 | 3066 | 8,363 | 0 |
| Arthrobacter_sp_FB24 | 353 | 3279 | 46122 | 663,861 | 29 |
| Bacillus_anthracis_Ames | 20 | 849 | 6487 | 36,014 | 0 |
| Bacillus_anthracis_Sterne | 158 | 2506 | 26165 | 152,339 | 0 |
| Bacillus_subtilis_168 | 242 | 2435 | 39415 | 709,179 | 0 |
| Bartonella_henselae_Houston-1 | 160 | 1245 | 36347 | 508,803 | 1 |
| Borrelia_burdorferi_B31 | 104 | 877 | 19060 | 193,743 | 1 |
| Brachybacterium_faecium_DSM_4810 | 78 | 2088 | 29248 | 121,914 | 0 |
| Burkholderia_mallei | 189 | 1970 | 33824 | 270,609 | 2 |
| Candidatus_chloracidobacterium_thermophilum | 111 | 563 | 2360 | 6,037 | 7 |
| Caulobacter_crescentus_CB15 | 1016 | 2971 | 63773 | 2,320,212 | 41 |
| Cellulomonas_flavigena_DSM_20109 | 103 | 2382 | 38033 | 175,148 | 0 |
| Cenarchaeum_symbiosum | 108 | 358 | 757 | 4,172 | 0 |
| Chlorobaculum_tepidum_WT | 94 | 1584 | 26561 | 137,150 | 0 |
| Chloroflexus_aurantiacus | 225 | 2736 | 44200 | 323,036 | 16 |
| Clostridium_thermocellum | 408 | 2110 | 61618 | 1,140,251 | 5 |
| Cryptobacterium_curtum_DSM_15641 | 78 | 1115 | 17888 | 75,903 | 0 |
| Cyanobacterium_synechocystis_PCC6803 | 279 | 2251 | 27351 | 344,473 | 0 |
| Cyanothece_sp_ATCC51142 | 1404 | 3309 | 57219 | 1,879,239 | 0 |
| Cyanothece_strain_ATCC51472 | 93 | 2146 | 20702 | 134,879 | 0 |
| Cyanothece_strain_PCC7424 | 109 | 2182 | 20763 | 91,389 | 0 |
| Cyanothece_strain_PCC7425 | 104 | 2590 | 27429 | 114,498 | 0 |
| Cyanothece_strain_PCC7822 | 166 | 3230 | 37493 | 336,052 | 0 |
| Cyanothece_strain_PCC8801 | 90 | 2399 | 23313 | 145,714 | 0 |
| Cyanothece_strain_PCC8802 | 81 | 679 | 3958 | 22,785 | 0 |
| Dehalococcoides_ethenogenes | 437 | 1025 | 25378 | 262,644 | 0 |
| Deinococcus_radiodurans_R1 | 368 | 2212 | 43409 | 1,002,931 | 0 |
| Delta_proteobacterium_NaphS2 | 158 | 2210 | 13007 | 94,378 | 0 |
| Desulfovibrio_desulfuricans_G20 | 463 | 2564 | 48057 | 916,183 | 28 |
| Desulfovibrio_sp_ND132 | 92 | 2573 | 42717 | 225,546 | 0 |
| Desulfovibrio_vulgaris_Hildenborough | 218 | 2397 | 36518 | 451,518 | 0 |
| Dethiosulfovibrio_peptidovorans_DSM_11002 | 78 | 1778 | 25287 | 95,940 | 0 |
| Ehrlichia_chaffeensis | 127 | 532 | 4045 | 12,545 | 0 |
| Enterobacter_cloacae_SCF1 | 58 | 1793 | 19194 | 235,721 | 0 |
| Escherichia_coli_BL21 | 191 | 1564 | 20792 | 190,645 | 0 |
| Escherichia_coli_K-12 | 3925 | 3324 | 122114 | 11,592,291 | 461 |
| Escherichia_coli_RK4353 | 31 | 1766 | 17089 | 53,334 | 0 |
| Fibrobacter_succinogenes_S85 | 122 | 1505 | 17408 | 265,219 | 0 |
| Geobacter_bemidjiensis_Bem_T | 780 | 2566 | 33313 | 682,897 | 0 |
| Geobacter_metallireducens_GS-15 | 155 | 2455 | 27442 | 245,691 | 0 |
| Geobacter_sulfurreducens_PCA | 1122 | 2770 | 45298 | 2,597,915 | 0 |
| Geobacter_uraniumreducens | 287 | 2779 | 42812 | 401,586 | 0 |
| Haloferax_volcanii | 34 | 1463 | 9502 | 56,669 | 0 |
| Halogeometricum_borinquense_DSM_11551 | 83 | 2216 | 18776 | 56,225 | 0 |
| Halorhabdus_utahensis_DSM_12940 | 84 | 2082 | 22987 | 89,820 | 0 |
| Heliobacterium_modesticaldum | 77 | 1549 | 22105 | 107,100 | 0 |
| Kineococcus_radiotolerans_SRS30216 | 192 | 2742 | 38694 | 480,705 | 0 |
| Kosmotoga_olearia_TBF_19-5-1 | 25 | 878 | 7219 | 29,080 | 0 |
| Methanosarcina_barkeri | 89 | 1852 | 24889 | 116,224 | 0 |
| Methanospirillum_hungatei_JF-1 | 78 | 1722 | 23041 | 60,300 | 0 |
| Methylophilales_HTCC2181 | 61 | 1172 | 20615 | 104,603 | 2 |
| Mycobacterium_tuberculosis | 647 | 2714 | 35206 | 1,296,018 | 7 |
| Nakamurella_multipartita_DSM_44233 | 78 | 2171 | 19782 | 57,582 | 0 |
| Nocardiopsis_dassonvillei_DSM_43111 | 80 | 1965 | 16507 | 76,404 | 0 |
| Novosphingobium_aromaticivorans_F199 | 23 | 1563 | 12783 | 75,850 | 1 |
| Opitutaceae_bacterium_TAV2 | 125 | 2229 | 20659 | 292,644 | 0 |
| Pelagibacter_ubique_HTC1062 | 426 | 1201 | 30536 | 1,294,945 | 58 |
| Pelobacter_carbinolicus_DSM_2380 | 22 | 830 | 4168 | 19,078 | 0 |
| Prochlorococcus | 121 | 1285 | 19310 | 158,225 | 0 |
| Pseudomonas_aerunginosa | 40 | 3054 | 24670 | 109,854 | 1 |
| Pseudomonas_fluorescens_PfO-1 | 397 | 3437 | 44819 | 535,883 | 0 |
| Pseudonocardia_sp | 48 | 188 | 448 | 1,406 | 0 |
| Ralstonia_pickettii | 194 | 5,786 | 0 | ||
| Rhodobacter_capsulatus_SB1003 | 495 | 2761 | 58894 | 1,316,844 | 76 |
| Rhodobacter_sphaeroides_2.4.1 | 1639 | 3362 | 72977 | 3,982,596 | 47 |
| Rhodopseudomonas_palustris | 587 | 2743 | 25366 | 247,373 | 0 |
| Roseiflexus_castenholzii | 76 | 2537 | 28414 | 91,063 | 0 |
| Saccharomonospora_viridis_DSM_43017 | 78 | 1918 | 15458 | 39,384 | 0 |
| Salmonella_typhi_TY2 | 499 | 2729 | 50292 | 9,832,722 | 385 |
| Salmonella_typhimurium_ATCC_14028 | 3701 | 3779 | 91496 | 1,555,617 | 28 |
| Salmonella_typhimurium_LT2 | 823 | 3214 | 62399 | 1,051,565 | 3 |
| Sanguibacter_keddieii_DSM_10542 | 78 | 2213 | 20489 | 89,427 | 0 |
| Shewanella_amazonensis_SB2B | 114 | 2079 | 27295 | 469,397 | 32 |
| Shewanella_baltica_OS155 | 631 | 2988 | 47725 | 1,874,615 | 112 |
| Shewanella_baltica_OS185 | 190 | 2853 | 41613 | 395,867 | 4 |
| Shewanella_baltica_OS195 | 65 | 2193 | 23560 | 175,237 | 5 |
| Shewanella_baltica_OS223 | 75 | 2158 | 22391 | 92,424 | 0 |
| Shewanella_denitrificans_OS217 | 91 | 2313 | 25045 | 240,069 | 0 |
| Shewanella_frigidimarina_NCIMB_400 | 66 | 2286 | 26819 | 192,758 | 7 |
| Shewanella_loihica_PV-4 | 68 | 2082 | 27260 | 181,814 | 5 |
| Shewanella_oneidensis_MR-1 | 3954 | 3307 | 94195 | 5,561,459 | 442 |
| Shewanella_putrefaciens_200 | 64 | 2200 | 25524 | 191,946 | 2 |
| Shewanella_putrefaciens_CN-32 | 78 | 2338 | 29135 | 254,187 | 14 |
| Shewanella_putrefaciens_W3-18-1 | 80 | 1975 | 18900 | 273,705 | 12 |
| Shewanella_sp_ANA-3 | 62 | 2154 | 25870 | 145,474 | 4 |
| Shewanella_sp_MR-4 | 65 | 2052 | 25724 | 195,547 | 6 |
| Shewanella_sp_MR-7 | 64 | 2217 | 29196 | 178,776 | 16 |
| Sinorhizobium_medicae | 40 | 2557 | 17954 | 164,088 | 2 |
| Sinorhizobium_meliloti_1021 | 137 | 2547 | 21631 | 581,060 | 2 |
| Slackia_heliotrinireducens_DSM_20476 | 79 | 1645 | 22440 | 105,667 | 1 |
| Stackebrandtia_nassauensis_DSM_44728 | 104 | 2158 | 22213 | 138,735 | 0 |
| Sulfolobus_acidocaldarius_DSM_639 | 33 | 1380 | 16093 | 92,402 | 0 |
| Synechococcus_sp_PCC7002 | 1050 | 2606 | 52469 | 1,842,452 | 1 |
| Syntrophobacter_fumaroxidans | 80 | 1893 | 18846 | 71,287 | 0 |
| Thermobispora_bispora_DSM_43833 | 78 | 1470 | 9193 | 23,861 | 0 |
| Thermosynechococcus_elongatus_BP-1 | 137 | 1266 | 8161 | 182,698 | 0 |
| Thermosynechococcus_sp_NAK55 | 47 | 1106 | 9587 | 87,550 | 0 |
| Thermotoga_maritima | 281 | 1526 | 52050 | 544,795 | 6 |
| Thiocapsa_marina_DSM_5653T | 130 | 1632 | 8577 | 26,649 | 0 |
| Verrucomicrobium_sp_TAV1 | 32 | 2288 | 15059 | 79,896 | 0 |
| Verrucomicrobium_sp_TAV5 | 33 | 2116 | 12893 | 87,443 | 2 |
| Xylanimonas_cellulosilytica_DSM_15894 | 200 | 2487 | 41090 | 319,030 | 0 |
| Yersinia_enterocolitica | 86 | 1622 | 11983 | 107,731 | 2 |
| Yersinia_pestis_CO92 | 210 | 2302 | 33289 | 993,714 | 19 |
| Yersinia_pestis_KIM | 240 | 1371 | 11084 | 98,780 | 0 |
| Yersinia_pestis_Pestoides_F | 99 | 2153 | 22980 | 523,779 | 16 |
| Yersinia_pseudotuberculosis_IP_32953 | 400 | 2151 | 24690 | 301,175 | 11 |
| Yersinia_pseudotuberculosis_PB1_Plus | 99 | 2029 | 20729 | 470,575 | 16 |
| Total | 35162 | 229,537 | 3,147,576 | 70,455,991 | 1951 |
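
As a rough reading aid, the false-identification counts can be turned into a spectrum-level error fraction by dividing them by the number of spectra. The sketch below does this for three rows copied from the table above; the function name and the dictionary layout are illustrative assumptions, not part of the Library's tooling.

```python
# Rows copied from the table above: organism -> (spectra, estimated false spectra IDs).
rows = {
    "Escherichia_coli_K-12": (11_592_291, 461),
    "Salmonella_typhi_TY2": (9_832_722, 385),
    "Total": (70_455_991, 1951),
}


def false_id_fraction(spectra, false_ids):
    """Fraction of spectra whose identification is estimated to be false."""
    if spectra <= 0:
        raise ValueError("spectra must be positive")
    return false_ids / spectra


# Hypothetical helper output: one fraction per organism.
fractions = {name: false_id_fraction(*counts) for name, counts in rows.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.2e}")
```

For the Library as a whole this works out to roughly 2.8 estimated false identifications per 100,000 spectra.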
Figure 1. Workflow for library creation.
Biological samples are used to create MS/MS data as part of an experiment and primary publication. All of this data is stored on our servers for re-analysis. Historical data was collated by organism and re-searched for release in the Biodiversity Library.
Figure 2. KEGG pathway coverage.
(top) Using the pathway classifications provided by KEGG, we can determine how many annotated proteins were identified in mass spectrometry data. The cysteine and methionine metabolism pathway is provided as an example. For each organism, we calculate the percentage of identified proteins. C. flavigena has 23 proteins in the pathway, 21 of which were observed (91%). The box plot shows average coverage of the 103 organisms that KEGG has annotated. Circles depict outliers. (bottom) Pathway coverage for all 13 amino acid metabolic pathways is shown.
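
The per-organism coverage described in this caption is a simple set calculation. The following is a minimal sketch, assuming the annotated pathway members and the observed proteins are available as lists of accessions; the function name and the placeholder accessions are hypothetical, and only the 21-of-23 counts come from the caption.

```python
def pathway_coverage(pathway_proteins, observed_proteins):
    """Return (n_observed, n_annotated, percent) for one pathway in one organism."""
    pathway = set(pathway_proteins)
    if not pathway:
        raise ValueError("pathway has no annotated proteins")
    # Only proteins that are both annotated in the pathway and observed count.
    hits = pathway & set(observed_proteins)
    return len(hits), len(pathway), 100.0 * len(hits) / len(pathway)


# Placeholder accessions standing in for the 23 annotated C. flavigena proteins.
annotated = [f"prot_{i:02d}" for i in range(1, 24)]
# 21 of them were observed, plus one observed protein outside the pathway.
observed = annotated[:21] + ["unrelated_protein"]

n_obs, n_total, pct = pathway_coverage(annotated, observed)
print(f"{n_obs}/{n_total} = {pct:.0f}%")  # 21/23 = 91%
```

Using sets means that proteins observed outside the pathway, or listed more than once, do not distort the percentage.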
Figure 3. False Discovery Rate.
Because the Library is so large, the false discovery rate (FDR) of the aggregated data can inflate significantly, especially when rolled up to the peptide and protein levels. Data is shown for the FDR of the entire Library when using a specified q-value cutoff on PSMs from the MSGF+ results. When using a loose PSM filter of q-value < 0.01, the protein and peptide FDRs are unacceptably high. We choose the cutoff q-value < 0.0001, which produces high data quality at the spectrum, peptide and protein levels.
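
The inflation happens because correct identifications pile up on the same peptides and proteins, while false ones scatter across many different ones. Here is a toy sketch of that effect, assuming the common target-decoy estimate (decoy count divided by target count) and precomputed PSM q-values; the `PSM` record, the `fdr_by_level` function and all values are invented for illustration and are not the authors' pipeline.

```python
from collections import namedtuple

# Hypothetical PSM record; real search output has many more fields.
PSM = namedtuple("PSM", "spectrum peptide protein qvalue is_decoy")


def fdr_by_level(psms, qvalue_cutoff):
    """Estimate FDR as decoys/targets among unique spectra, peptides and proteins."""
    passing = [p for p in psms if p.qvalue < qvalue_cutoff]
    result = {}
    for level in ("spectrum", "peptide", "protein"):
        targets = {getattr(p, level) for p in passing if not p.is_decoy}
        decoys = {getattr(p, level) for p in passing if p.is_decoy}
        result[level] = len(decoys) / len(targets) if targets else 0.0
    return result


# 100 confident target PSMs that collapse onto 10 peptides and 2 proteins.
psms = [PSM(f"scan{i}", f"PEP{i % 10}", f"PROT{i % 2}", 1e-5, False)
        for i in range(100)]
# 2 weaker decoy PSMs, each hitting its own decoy peptide and protein.
psms += [PSM(f"scan{100 + j}", f"XXX_PEP{j}", f"XXX_PROT{j}", 5e-3, True)
         for j in range(2)]

loose = fdr_by_level(psms, 0.01)     # decoys pass the loose filter
strict = fdr_by_level(psms, 0.0001)  # decoys are removed by the strict filter
print("q < 0.01:  ", loose)
print("q < 0.0001:", strict)
```

In this toy data the same two decoy hits give 2% at the spectrum level, 20% at the peptide level and 100% at the protein level under the loose filter, and none survive the strict one.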
Figure 4. Peptide observation across taxa.
This is a multiple sequence alignment of a section of an ABC transporter (accessions and organism given), with observed peptides from the PNNL Biodiversity Library in blue. For simplicity, we displayed sequences from the Actinobacteria phylum, with Roseiflexus as an outgroup. The right side of the alignment shows consistent discovery in proteomics data across the phylum and in the outgroup. The left side of the alignment is only observed in the proteomics data for Arthrobacter sp. FB24.