Christopher Markosian, Luigi Di Costanzo, Monica Sekharan, Chenghua Shao, Stephen K Burley, Christine Zardecki.
Abstract
Since 1971, the Protein Data Bank (PDB) archive has served as the single, global repository for open access to atomic-level data for biological macromolecules. The archive currently holds >140,000 structures (>1 billion atoms). These structures are the molecules of life found in all organisms. Knowing the 3D structure of a biological macromolecule is essential for understanding the molecule's function, providing insights into health and disease, food and energy production, and other topics of concern to prosperity and sustainability. PDB data are freely and publicly available, without restrictions on usage. Through bibliometric and usage studies, we sought to determine the impact of the PDB across disciplines and demographics. Our analysis shows that even though research areas such as molecular biology and biochemistry account for the most usage, other fields are increasingly using PDB resources. PDB usage is seen across 150 disciplines in applied sciences, humanities, and social sciences. Data are also re-used and integrated with >400 resources. Our study identifies trends in PDB usage and documents its utility across research disciplines.
Year: 2018 PMID: 30325351 PMCID: PMC6190746 DOI: 10.1038/sdata.2018.212
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Number of publications for the top-assigned Web of Science Journal Subject Category for all documents (2000–2016) citing the inaugural Berman et al. (2000) reference.
Biochemistry Molecular Biology is the largest category (6,735 publications), followed by Biophysics (2,872), Biochemical Research Methods (2,161), Computer Science Interdisciplinary Applications (1,852), Chemistry Medicinal (1,666), Chemistry Multidisciplinary (1,660), Mathematical Computational Biology (1,656), Biotechnology Applied Microbiology (1,297), Chemistry Physical (871), and Multidisciplinary Sciences (789).
Figure 2. Number of articles citing the inaugural Berman et al. (2000) reference each year.
Total number of articles is shown in blue; the top Journal Subject Categories are below. The number of articles in the areas of Chemistry Medicinal, Chemistry Multidisciplinary, and Multidisciplinary Sciences is increasing (shown in bold); the number of articles in the areas of Biochemistry Molecular Biology, Biophysics, and Biotechnology Applied Microbiology does not show statistically significant growth.
Figure 3. The top Web of Science Journal Subject Categories demonstrating the greatest yearly growth in all documents citing the Berman et al. (2000) reference (2000–2016).
The study compares 34 categories with at least 100 citations. Growth rate was calculated as the slope coefficient of the linear regression model between the number of citations in the category and year of publication, starting with the first year an article appeared, and expressed as a normalized percentage of the average yearly publication of that category. Multidisciplinary Sciences has grown at the greatest rate (15.3%), followed by Medicine Research Experimental (11.6%), Mathematics Interdisciplinary Applications (9.8%), Biology (9.8%), Plant Sciences (9.6%), Chemistry Medicinal (9.5%), Pharmacology Pharmacy (9.3%), Chemistry Multidisciplinary (8.1%), Physics Atomic Molecular Chemical (8.0%), and Mathematical Computational Biology (7.8%).
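The growth-rate calculation described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `growth_rate_percent` and the input layout (a mapping from year to citation count for one category) are hypothetical, and the slope is taken from an ordinary least-squares fit.

```python
from statistics import mean


def growth_rate_percent(counts_by_year):
    """Slope of a least-squares line through (year, count), expressed as a
    percentage of the category's average yearly count.

    counts_by_year is a hypothetical input: {year: number of citing articles}.
    Years before the first year with a nonzero count are ignored.
    """
    years = sorted(counts_by_year)
    nonzero_years = [y for y in years if counts_by_year[y] > 0]
    if not nonzero_years:
        raise ValueError("category has no citing articles")
    first_year = nonzero_years[0]

    xs = [y for y in years if y >= first_year]
    ys = [counts_by_year[y] for y in xs]
    if len(xs) < 2:
        raise ValueError("need at least two years to fit a line")

    x_bar, y_bar = mean(xs), mean(ys)
    # Ordinary least-squares slope: covariance(x, y) / variance(x)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx

    # Normalize the slope by the average yearly count of the category.
    return 100.0 * slope / y_bar


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the study.
    example = {2000: 0, 2001: 10, 2002: 20, 2003: 30}
    print(growth_rate_percent(example))
```

Normalizing by the category's mean yearly count lets small and large categories be compared on the same scale, which is presumably why the caption reports percentages rather than raw slopes.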
Figure 4. Network visualization of term occurrences extracted from abstracts and titles of 2000–2016 publications citing the inaugural Berman et al. (2000) reference.
Figure created using VOSviewer[46]. A minimum threshold of 30 term co-occurrences was used. The location of citation keywords is based on their overall position within the network; keywords located in more common regions of the map have higher network connectivity, i.e., they are more interconnected with surrounding keywords. Darker colors and larger font sizes represent keywords that appear more frequently among citations. Keywords are clustered in four main regions: red corresponds to keywords representing “computational” use of the data; green corresponds to 3D-structure and mechanism of action; blue corresponds to function; and yellow corresponds to keywords related to genetics and genomics.
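To illustrate what a co-occurrence threshold does, the sketch below counts how often pairs of terms appear together in the same document and keeps only pairs at or above a cutoff. It is an assumption-laden toy, not VOSviewer's algorithm: the function name `cooccurrence_edges`, the input format (one list of already-extracted terms per publication), and the pairwise counting rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs_terms, min_count=30):
    """Return {(term_a, term_b): count} for term pairs that co-occur in at
    least min_count documents.

    docs_terms is a hypothetical input: one iterable of terms per document
    (for example, terms extracted from a title and abstract).
    """
    counts = Counter()
    for terms in docs_terms:
        # Count each pair once per document; sort so (a, b) == (b, a).
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Made-up documents for illustration only.
    docs = [
        ["protein", "structure"],
        ["structure", "protein", "docking"],
        ["docking"],
    ]
    print(cooccurrence_edges(docs, min_count=2))
```

A higher cutoff yields a sparser network containing only the most strongly linked terms.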
Distribution of the 429 active resources in the NAR Molecular Biology Database Collection that utilize PDB archive data across major categories (bold) and subcategories (italics), and corresponding resources in the “golden set”[41] of the NAR Molecular Biology Database Collection.
wwPDB partners RCSB PDB, PDBe, and PDBj are included in the subcategory

| Category or subcategory | Number of resources | “Golden set” resource |
|---|---|---|
| | 81 | MMDB |
| | 11 | ChEBI |
| | 8 | |
| | 7 | |
| | 42 | GPCRDB |
| | 14 | CDD |
| | 15 | ELM |
| | 10 | dbPTM |
| | 8 | |
| | 6 | PIR |
| | 30 | STITCH |
| | 14 | CAZy |
| | 10 | BioCyc |
| | 4 | |
| | 11 | EcoCyc |
| | 9 | |
| | 8 | |
| | 7 | VectorBase |
| | 6 | CGD |
| | 4 | Genenames |
| | 2 | KEGG |
| | 5 | |
| | 3 | |
| | 7 | |
| | 7 | CanSAR |
| | 1 | COSMIC |
| | 21 | ChEMBL |
| | 3 | |
| | 1 | |
| | 11 | JASPAR |
| | 5 | |
| | 3 | DDBJ |
| | 1 | |
| | | IEDB |
| | | NONCODE |
| | 5 | FANTOM |
| | 4 | Ensembl |
| | 2 | UCSC Genome Browser |
| | 5 | |
| Arabidopsis thaliana | 1 | |
| | 1 | |
| | | dbPTM |
| | | ArrayExpress |