Holly Miller, Catherine N Norton, Indra Neil Sarkar.
Abstract
BACKGROUND: GenBank is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of literature associated with a given molecular sequence may be an essential first step in developing research hypotheses. Although many of the publications associated with GenBank records may not be linked into or part of complementary literature databases (e.g., PubMed), GenBank records associated with literature indexed in Medline are identifiable as they contain PubMed identifiers (PMIDs).
Year: 2009 PMID: 19508734 PMCID: PMC2704225 DOI: 10.1186/1756-0500-2-101
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Distribution of publication type information associated with GenBank sequences. Pie chart showing the percentage of GenBank records that have: no associated citation information (blue; 50,480,022 records), a PubMed ID indicating the publication's abstract is available from NCBI's PubMed database (red; 26,417,760 records), citation information corresponding to PubMed-indexed publications but no PubMed ID (yellow; 3,641,494 records), a patent citation (green; 4,830,186 records), or citation information corresponding to a journal not indexed in the PubMed database (purple; 1,747,039 records).
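The five categories in Figure 1 can be illustrated with a minimal sketch, assuming each record is available as GenBank flat-file text. The `PUBMED` and `JOURNAL` line labels come from the flat-file format. Everything else is an assumption made for illustration and not the authors' procedure: the function names, the category labels, the precedence among categories, the treatment of direct submissions and `Unpublished` entries as having no citation, and the caller-supplied set of PubMed-indexed journal names.

```python
import re

def journal_title(journal_line):
    """Return the lower-cased text before the first digit or parenthesis.

    Assumption: that leading text is the journal name.
    """
    match = re.match(r"[^\d(]+", journal_line)
    return match.group(0).strip().lower() if match else ""

def classify_record(flatfile_text, pubmed_indexed_journals):
    """Assign one GenBank flat-file record to a (hypothetical) Figure 1 category."""
    # Any PUBMED line means at least one reference carries a PMID.
    if re.search(r"^ +PUBMED +\d+", flatfile_text, flags=re.MULTILINE):
        return "pubmed_id"
    journals = re.findall(r"^ +JOURNAL +(.+)$", flatfile_text, flags=re.MULTILINE)
    # Assumption: direct submissions and unpublished entries are not citations.
    cited = [j.strip() for j in journals
             if not j.startswith(("Submitted", "Unpublished"))]
    if not cited:
        return "no_citation"
    if any(j.startswith("Patent:") for j in cited):
        return "patent"
    if any(journal_title(j) in pubmed_indexed_journals for j in cited):
        return "pubmed_journal_no_id"
    return "non_pubmed_journal"
```

A record whose only `JOURNAL` line names a journal in the supplied set, with no `PUBMED` line, would fall into the "PubMed-indexed but no PubMed ID" group under these assumptions.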
Journals Indexed in PubMed Linked to Most Sequences (Including Genome Sequencing Projects)
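| Journal | % of sequences | Mean sequences per article | Articles | Sequences |
|---|---|---|---|---|
| PLOS Biology | 17.7 | 60199 (32517)* | 108 (105) | 6501501 (3414295) |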
| Genome Research | 13.9 | 10091 | 505 | 5095937 |
| The Proceedings of the National Academy of Sciences | 7.6 | 325 | 8564 | 2779264 |
| Science | 5.4 | 1299 | 1532 | 1989825 |
| Nature | 4.6 | 703 | 2384 | 1675206 |
| Genome Biology | 4.0 | 19737 | 74 | 1460541 |
| Plant Physiology | 2.5 | 430 | 2173 | 935009 |
| BMC Genomics | 2.0 | 2975 | 247 | 734823 |
| Plant Molecular Biology | 1.7 | 235 | 2695 | 633337 |
| Nature Genetics | 1.3 | 694 | 710 | 492838 |
* Values in parentheses are generated after the exclusion of the metagenomic papers from the Global Ocean Sampling expedition ([10-12])
Citation data extracted from the GenBank records was used to determine how many sequences are associated with each journal. This table lists the top ten journals indexed in PubMed as defined by the number of GenBank sequences that cite an article in that journal. The percentage of sequences associated with the journal is calculated from the total number of GenBank entries that have some citation information. As described in the discussion, three articles in PLoS Biology resulted in more than 3 million sequences being deposited in GenBank. The adjusted values, calculated after omitting these sequences, are shown in parentheses.
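As a worked check of the percentage and per-article columns, the sketch below recomputes two rows from the counts given in this section. It assumes the denominator is the sum of the four cited categories reported in Figure 1; the source does not state that total explicitly, so this is an inference. The names `CITED_COUNTS` and `journal_summary` are hypothetical.

```python
# Record counts for the four Figure 1 categories that carry some citation.
CITED_COUNTS = {
    "pubmed_id": 26_417_760,
    "pubmed_journal_no_id": 3_641_494,
    "patent": 4_830_186,
    "non_pubmed_journal": 1_747_039,
}

# Assumed denominator: every record with some citation information.
TOTAL_CITED = sum(CITED_COUNTS.values())

def journal_summary(n_sequences, n_articles, total_cited=TOTAL_CITED):
    """Return the percentage of cited sequences and mean sequences per article."""
    return {
        "percent": 100 * n_sequences / total_cited,
        "mean_per_article": n_sequences / n_articles,
    }

plos = journal_summary(n_sequences=6_501_501, n_articles=108)
genome_research = journal_summary(n_sequences=5_095_937, n_articles=505)
print(round(plos["percent"], 1), round(plos["mean_per_article"]))
print(round(genome_research["percent"], 1), round(genome_research["mean_per_article"]))
```

Under this assumption the rounded results agree with the PLoS Biology and Genome Research rows of the table (17.7 and 60199; 13.9 and 10091).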
Figure 2. Number of sequences associated with the "Top 10 Journals" by year. Graph showing the number of sequences associated with articles published in the journals listed in Table 1 per year.
Journals Not Indexed in PubMed Linked to Most Sequences (Including Genome Sequencing Projects)
| Journal | % of sequences | Mean sequences per article | Articles | Sequences |
|---|---|---|---|---|
| Genetics and Molecular Biology | 0.65 | 10331 | 23 | 237603 |
| Breeding Science | 0.21 | 4366 | 18 | 78594 |
| Molecular Plant Pathology | 0.14 | 525 | 100 | 52510 |
| Journal of Phycology | 0.10 | 103 | 368 | 37808 |
| Molecular Ecology Resources (formerly Molecular Ecology Notes) | 0.08 | 19 | 1556 | 28898 |
| Systematic Botany | 0.07 | 73 | 359 | 26374 |
| Integrative and Comparative Biology | 0.06 | 1776 | 13 | 23092 |
| Plant Biotechnology | 0.05 | 302 | 62 | 18753 |
| Phycologia | 0.04 | 243 | 77 | 18738 |
| Plant Molecular Biology Reporter | 0.04 | 702 | 20 | 14036 |
Citation data extracted from the GenBank records was used to determine how many sequences are associated with each journal. This table lists the top ten journals not indexed in PubMed as defined by the number of GenBank sequences that cite an article in that journal. The percentage of sequences associated with the journal is calculated from the total number of GenBank entries that have some citation information.
Journals Indexed in PubMed Linked to Most Sequences
| Journal | Sequences |
|---|---|
| Journal of Biological Chemistry | 21873 |
| The Proceedings of the National Academy of Sciences | 17388 |
| Gene | 14487 |
| Nucleic Acids Research | 11408 |
| Journal of Bacteriology | 10633 |
| Applied Environmental Microbiology | 8512 |
| Genomics | 8275 |
| Biochem. Biophys. Research Communications | 7088 |
| Journal of Virology | 6051 |
| Int Journal of Systematic Evol Microbiol | 5502 |
Citation data extracted from the GenBank records was used to determine how many sequences are associated with each journal. The data was filtered so that only articles with 14 or fewer attributed sequences were considered, in order to eliminate articles that result from large genome sequencing projects. This table lists the top ten journals indexed in PubMed as defined by the number of GenBank sequences that cite an article in that journal.
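The filtering step described in this caption can be sketched as follows. This is a minimal illustration, assuming the citation data has already been reduced to one `(journal, sequence_count)` pair per article; that input shape, the function name `top_journals`, and the toy data are hypothetical. Only the threshold of 14 sequences per article comes from the caption.

```python
from collections import Counter

def top_journals(articles, max_sequences_per_article=14, n=10):
    """Rank journals by sequence count, ignoring articles above the threshold.

    `articles` is an iterable of (journal, sequence_count) pairs, one per article.
    """
    totals = Counter()
    for journal, sequence_count in articles:
        # Skip articles likely to come from large genome sequencing projects.
        if sequence_count <= max_sequences_per_article:
            totals[journal] += sequence_count
    return totals.most_common(n)

# Toy data for illustration only; not taken from GenBank.
example_articles = [
    ("Journal A", 3),
    ("Journal A", 14),
    ("Journal B", 5),
    ("Journal B", 250000),  # excluded: more than 14 sequences
    ("Journal C", 15),      # excluded: more than 14 sequences
]
ranking = top_journals(example_articles)
print(ranking)
```

In the toy data, Journal B's single large article is dropped, so it ranks below Journal A even though it has far more sequences overall.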