Linhuan Wu, Qinglan Sun, Hideaki Sugawara, Song Yang, Yuguang Zhou, Kevin McCluskey, Alexander Vasilenko, Ken-Ichiro Suzuki, Moriya Ohkuma, Yeonhee Lee, Vincent Robert, Supawadee Ingsriswang, François Guissart, Philippe Desmeth, Juncai Ma.
Abstract
BACKGROUND: Throughout the long history of industrial and academic research, many microbes have been isolated, characterized and, whenever possible, preserved in culture collections. With the steady accumulation of biodiversity observation data and microbial sequencing data, bio-resource centers have to function as data and information repositories that serve academia, industry, and regulators on behalf of the general public. Hence, the World Data Centre for Microorganisms (WDCM) has taken on the responsibility of constructing an effective information environment that promotes and sustains microbial research data activities and bridges the gaps currently present within and outside the microbiology communities.
DESCRIPTION: Strain catalogue information was collected from collections by online submission. We developed tools for automatic extraction of strain numbers and species names from various sources, including GenBank, PubMed, and SwissProt. These tools connect strain catalogue information with the corresponding nucleotide and protein sequences, as well as with genome sequences and references citing a particular strain. All information has been processed and compiled to create a comprehensive database of microbial resources, named the Global Catalogue of Microorganisms (GCM). The current version of GCM contains information on 273,933 strains, covering 43,436 bacterial, fungal and archaeal species from 52 collections in 25 countries and regions. A number of online analysis and statistical tools have been integrated, together with advanced search functions, which should greatly facilitate exploration of the content of GCM.
Year: 2013 PMID: 24377417 PMCID: PMC3890509 DOI: 10.1186/1471-2164-14-933
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Participant list of GCM collections
| Acronym | Collection name | Country or region |
|---|---|---|
| BCC | BIOTEC Culture Collection | Thailand |
| BCCM/DCG | BCCM Diatom Collection Gent | Belgium |
| BCCM/IHEM | Belgian Coordinated Collections of Microorganisms / IHEM Fungi Collection | Belgium |
| BCCM/LMBP | Belgian Coordinated Collections of Microorganisms / LMBP Plasmid Collection | Belgium |
| BCCM/LMG | Belgian Coordinated Collections of Microorganisms / LMG Bacteria Collection | Belgium |
| BCCM/MUCL | Mycotheque de l’Universite catholique de Louvain | Belgium |
| BCCM/ULC | BCCM/ULC Culture Collection of (sub)polar cyanobacteria | Belgium |
| BCRC | Bioresource Collection and Research Center | Chinese Taipei |
| BIM | Belarusian Collection of non-pathogenic microorganisms | Belarus |
| CBS | Centraalbureau voor Schimmelcultures, Filamentous fungi and Yeast Collection | Netherlands |
| CCAP | Culture Collection of Algae and Protozoa | U.K. |
| CCARM | Culture Collection of Antimicrobial Resistant Microorganisms | Korea |
| CCCryo | Culture Collection of Cryophilic Algae | Germany |
| CECT | Coleccion Espanola de Cultivos Tipo | Spain |
| CGMCC | China General Microbiological Culture Collection Center | China |
| CIP | The Collection of the Institut Pasteur | France |
| CIRM-CF | Centre International de Ressources Microbiennes - Champignons Filamenteux | France |
| CIRM-CFBP | Centre International de Ressources Microbiennes - Levures (CLBP) | France |
| CIRM-Levures | Centre International de Ressources Microbiennes - Levures | France |
| CM-CNRG | Coleccion de Microorganismos del Centro Nacional de Recursos Geneticos | Mexico |
| CVCM | Centro Venezolano de Colecciones de Microorganismos | Venezuela |
| CWU-MACC | Herbarium of Kharkov University (CWU) – Micro Algae Cultures Collection | Ukraine |
| DMic | Medical importance fungi culture collection | Argentina |
| DSMZ | Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH | Germany |
| FACHB | Freshwater Algae Culture Collection, Chinese Academy of Sciences | China |
| FGSC | Fungal Genetics Stock Center | USA |
| Fiocruz-CLIOC | Coleção de Leishmania do Instituto Oswaldo Cruz | Brazil |
| GDMCC | Guangdong Culture Collection Centre of Microbiology | China |
| HPKTCC | Helicobacter pylori Korean Type Culture Collection | Korea |
| IMI(CABI) | CABI Genetic Resource Collection | U.K. |
| ITDI | Industrial Technology Development Institute Microbial Culture Collection | Philippines |
| ITM | Belgian Coordinated Collections of Microorganisms Mycobacterial Culture Collection | Belgium |
| JCM | Japan Collection of Microorganisms | Japan |
| KCTC | Korean Collection for Type Cultures | Korea |
| KEMB | Korea national Environmental Microorganisms Bank | Korea |
| KMMCC | Korea Marine Microalgae Culture Center | Korea |
| LEF | Korea Lichen & Allied Bioresource Center | Korea |
| LIPIMC | Lembaga Ilmu Pengetahuan Indonesia, Indonesian Institute for Sciences | Indonesia |
| MCC-MNH | Microbial Culture Collection - Museum of Natural History, Museum of Natural History (MNH) | Philippines |
| NBRC | NITE Biological Resource Center | Japan |
| PNCM | Philippine National Collection of Microorganisms | Philippines |
| PVGB | Plant Virus GenBank | Korea |
| TISTR | TISTR Culture Collection, Bangkok MIRCEN | Thailand |
| UCCAA | Ukrainian Collection of Cholera Aetiological Agents O1 and non O1 serogroups | Ukraine |
| UCDFST | Phaff Yeast Culture Collection | USA |
| UL | The UNILAB Clinical Culture Collection, United Laboratories | Philippines |
| UMinho-MUM | Micoteca da Universidade do Minho | Portugal |
| UOA/HCPF | University of Athens/Hellenic Collection of Pathogenic Fungi | Greece |
| UPCC | Natural Sciences Research Institute Culture Collection | Philippines |
| UPMC | Microbial Culture Collection Unit | Malaysia |
| VKM | All-Russian Collection of Microorganisms | Russia |
| VTCC | Vietnam Type Culture Collection | Vietnam |
Summary of GCM strain data
| Antibody | 7 | 33 | 0 | 0 | 0 | 0 |
| Phage | 181 | 239 | 0 | 0 | 1 | 0 |
| Virus | 33 | 296 | 0 | 0 | 0 | 0 |
| Cyanobacteria | 134 | 287 | 0 | 178 | 0 | 0 |
| Protozoa | 236 | 754 | 0 | 0 | 0 | 0 |
| Actinomycetes | 842 | 1490 | 0 | 271 | 192 | 9 |
| Archaea | 1410 | 3273 | 1165 | 2176 | 1573 | 48 |
| Microalgae | 1820 | 5495 | 4 | 2 | 1 | 1 |
| Plasmid | 2030 | 2030 | 0 | 0 | 5 | 9 |
| Yeast | 3668 | 34907 | 4796 | 54773 | 2089 | 98 |
| Bacteria | 13714 | 101395 | 14233 | 29304 | 10975 | 268 |
| Fungi | 18537 | 121548 | 29916 | 94348 | 1960 | 65 |
| Diatom | 19 | 242 | 0 | 0 | 0 | 0 |
| Mycobacteria | 50 | 214 | 0 | 0 | 0 | 0 |
| Other | 755 | 1730 | 0 | 0 | 0 | 0 |
| Total | 43436 | 273933 | 50114 | 181052 | 16796 | 498 |
Catalogue information is shown as submitted by the individual collections. Sequence, publication, and patent data were extracted from GenBank, PubMed, and the patent database using the strain numbers.
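The strain-number lookup described in the note above can be illustrated with a minimal sketch. It is not the GCM extraction tool: the regular expression, the function name `extract_strain_numbers`, and the sample sentence are assumptions made for illustration, and only the collection acronyms come from the participant list.

```python
import re

# A few acronyms from the participant list; a real run would load all of them.
ACRONYMS = ["CGMCC", "JCM", "NBRC", "KCTC", "CBS"]

# Assumed pattern (not taken from the paper): acronym, an optional space or
# hyphen, then digits that may contain one dot, e.g. "CGMCC 1.2345".
STRAIN_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ACRONYMS)) + r")[ -]?(\d+(?:\.\d+)?)\b"
)


def extract_strain_numbers(text):
    """Return the sorted, de-duplicated 'ACRONYM number' strings found in text."""
    return sorted({f"{m.group(1)} {m.group(2)}" for m in STRAIN_RE.finditer(text)})


# Invented sample sentence; the strain numbers are placeholders.
sample = "Strain JCM 1002 (= CBS-123.45) was compared with CGMCC1.2345."
print(extract_strain_numbers(sample))
```

Normalising every hit to one "ACRONYM number" form is what would let the same strain be matched across sources that write it with or without a separator.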
Figure 1. Scheme of the GCM workflow. Catalogue information from each collection, shown on the left, is used to construct the framework of the global catalogue database. Species names and strain numbers are collected from the catalogue information and are then used to identify and extract information from public databases such as GenBank, NCBI Genome, SwissProt, PDB, PubMed and the patent database. The data warehouse is built in a SQL database and can be accessed via a web interface through different search options. Search results can be displayed in different formats, and users can refine them with filters. The final results are displayed either as a strain page or gathered into a species page, depending on the query. BLAST and ClustalW are provided for further analysis of the results.
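As a rough illustration of the SQL data warehouse and the strain-page versus species-page views in Figure 1, the sketch below uses an in-memory SQLite database. The table names, column names, the `species_page` function, and all rows are hypothetical; the actual GCM schema is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: one row per catalogued strain ...
    CREATE TABLE strain (
        strain_number TEXT PRIMARY KEY,
        species       TEXT NOT NULL,
        collection    TEXT NOT NULL
    );
    -- ... and one row per external record linked to a strain.
    CREATE TABLE linked_record (
        strain_number TEXT NOT NULL REFERENCES strain(strain_number),
        source        TEXT NOT NULL,   -- e.g. 'GenBank', 'PubMed'
        record_id     TEXT NOT NULL
    );
    """
)

# Placeholder rows, invented for this example.
conn.executemany(
    "INSERT INTO strain VALUES (?, ?, ?)",
    [("DEMO 1", "Example species", "DEMO"), ("DEMO 2", "Example species", "DEMO")],
)
conn.executemany(
    "INSERT INTO linked_record VALUES (?, ?, ?)",
    [("DEMO 1", "GenBank", "placeholder-a"), ("DEMO 1", "PubMed", "placeholder-b")],
)


def species_page(conn, species):
    """Gather all strains of a species with their number of linked records."""
    return conn.execute(
        """
        SELECT s.strain_number, s.collection, COUNT(l.record_id)
        FROM strain AS s
        LEFT JOIN linked_record AS l ON l.strain_number = s.strain_number
        WHERE s.species = ?
        GROUP BY s.strain_number, s.collection
        ORDER BY s.strain_number
        """,
        (species,),
    ).fetchall()


print(species_page(conn, "Example species"))
```

The `LEFT JOIN` keeps strains that have no linked sequence or publication, so a species page would still list them.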
Figure 2. Example of a species page of subsp.
Isolation sources of Strains sorted by type of organism
| Sludge/Wastewater | 1 | 1091 | 6 | - | 9 | - | 2 | 1109 |
| Soil | 1708 | 3468 | 484 | 264 | 95 | 1 | 1 | 6021 |
| Sediment | 4 | 46 | 17 | - | 14 | - | - | 81 |
| Fermentation products | 123 | 358 | 327 | - | 1 | - | - | 809 |
| Plant-associated | 405 | 314 | 644 | - | 1 | - | 2 | 1366 |
| Host-associated | 139 | 480 | 180 | - | 3 | - | - | 802 |
| Human-associated | 18 | 11167 | 55 | - | 15 | - | 2 | 11257 |
| Water | 4 | 398 | 50 | - | 48 | - | - | 500 |
| Microbial-mat/Biofilm | - | - | - | - | 1 | - | - | 1 |
| Air | 6 | 20 | 29 | - | 1 | - | - | 56 |
| Genetic engineering strain | 22698 | - | - | - | - | - | - | - |
| Food | 193 | 83 | 69 | - | 2 | - | - | 347 |
| Others | 135 | 728 | 76 | - | 25 | - | 1 | 965 |
| Total | 2736 | 18153 | 1937 | 264 | 215 | 1 | 8 | 23314 |
Top 20 countries from which strains were collected
| Rank | Country | Strains | Rank | Country | Strains |
|---|---|---|---|---|---|
| 1 | Japan | 8248 | 11 | China | 3429 |
| 2 | France | 8070 | 12 | India | 2907 |
| 3 | United States | 7701 | 13 | Russian Federation | 2872 |
| 4 | Netherlands | 6709 | 14 | South Africa | 2419 |
| 5 | Korea | 6270 | 15 | Italy | 2009 |
| 6 | Germany | 6051 | 16 | Canada | 1848 |
| 7 | Thailand | 5894 | 17 | Vietnam | 1818 |
| 8 | United Kingdom | 5717 | 18 | Sweden | 1786 |
| 9 | Belgium | 5177 | 19 | Australia | 1695 |
| 10 | Spain | 3869 | 20 | Switzerland | 1466 |
| Total | | 85955 | | | |
Of the 273,933 strains, 114,578 carry information about their geographic origin. These strains were collected from 164 countries and regions, but 85,955 of them come from just 20 countries. That is approximately 75% of the strains with a recorded origin, which indicates a relatively high sampling effort in these countries.
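The proportions behind this note can be recomputed from the three counts it states. The short sketch below does only that; the helper name `share` is an illustrative choice.

```python
# Counts stated in the note above.
strains_total = 273_933        # all strains in GCM
strains_with_origin = 114_578  # strains with geographic origin information
strains_top20 = 85_955         # strains collected from the top 20 countries


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(share(strains_with_origin, strains_total))  # strains with a known origin
print(share(strains_top20, strains_with_origin))  # top 20 among those with an origin
print(share(strains_top20, strains_total))        # top 20 among all strains
```

Which denominator is used matters: the top 20 countries account for about three quarters of the strains with a recorded origin, but under a third of all strains in the catalogue.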
Result summary of species name check
| Type | Species names checked | Unmatched names | Unmatched (%) |
|---|---|---|---|
| Archaea | 1399 | 32 | 2.30% |
| Microalgae | 1457 | 360 | 24.70% |
| Fungi | 20719 | 698 | 3.40% |
| Bacteria | 12855 | 1098 | 8.50% |
| Total | 36430 | 2188 | 6.00% |
This table compares the species names in GCM with public microbial nomenclature databases. Species2000, NCBI Taxonomy, LPSN and MycoBank were used as reference databases. Overall, 6% of the names were unmatched; archaea and fungi fall below this figure, whereas bacteria (8.5%) and especially microalgae (24.7%) lie above it. The high percentage for microalgae is possibly due to irregular naming practices for this group.
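A species name check of this kind can be sketched as a set lookup against a reference list. This is a minimal illustration that assumes exact matching after whitespace and case normalisation; the matching rules actually used for GCM are not described here, and the function names and the tiny name lists are made up for the example.

```python
def normalise(name):
    """Collapse repeated whitespace and ignore case before comparing names."""
    return " ".join(name.split()).lower()


def check_names(catalogue_names, reference_names):
    """Return (unmatched names, percentage unmatched) for a list of names."""
    reference = {normalise(n) for n in reference_names}
    unmatched = [n for n in catalogue_names if normalise(n) not in reference]
    if not catalogue_names:
        return unmatched, 0.0
    return unmatched, round(100 * len(unmatched) / len(catalogue_names), 2)


# Tiny illustrative lists, not taken from any of the reference databases.
reference = ["Escherichia coli", "Bacillus subtilis", "Saccharomyces cerevisiae"]
catalogue = [
    "Escherichia  coli",        # extra space: still matches
    "bacillus subtilis",        # different case: still matches
    "Sacharomyces cerevisiae",  # misspelt: unmatched
    "Aspergillus niger",        # absent from the reference list: unmatched
]
print(check_names(catalogue, reference))

# The overall percentage in the table follows the same formula.
print(round(100 * 2188 / 36430, 1))
```

Exact matching flags misspellings and names missing from the reference alike, so an unmatched name is a prompt for curation rather than proof of an error.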
Figure 3. Example of strain information of subsp.
Figure 4. Database management system for collections. Users import data by preparing an Excel file that conforms to the WDCM RDS. After the data are imported into the system, users can update or edit the catalogue information online. The system also reports the result of a species name check, which gives an overview of data quality and supports further corrections.
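The import step in Figure 4 implies some validation of submitted rows before they enter the catalogue. The sketch below shows one way such a check might look, assuming the spreadsheet has been exported to CSV. The column names in `REQUIRED_COLUMNS` are hypothetical stand-ins, since the fields of the WDCM RDS are not listed here, and `validate_rows` is an illustrative name.

```python
import csv
import io

# Hypothetical required columns; the real WDCM RDS fields are not given here.
REQUIRED_COLUMNS = ["strain_number", "species_name"]


def validate_rows(csv_text):
    """Return a list of human-readable problems found in a CSV submission."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column: {c}" for c in missing]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"row {line_no}: empty {col}")
    return problems


# Invented submission with one incomplete row.
submission = "strain_number,species_name\nDEMO 1,Example species\nDEMO 2,\n"
print(validate_rows(submission))
```

Reporting every problem at once, instead of stopping at the first one, matches the idea of giving collections an overview of data quality that they can then correct online.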