Elliot J Yates, Louise C Dixon.
Abstract
BACKGROUND: Optimal ranking of literature importance is vital in overcoming article overload. Existing ranking methods are typically based on raw citation counts, which sum 'inbound' links without considering the importance of each citing article. PageRank, an algorithm originally developed by Google for ranking webpages, could potentially be adapted to bibliometrics to quantify the relative importance weightings within a citation network. This article seeks to validate such an approach on the freely available PubMed Central open access subset (PMC-OAS) of biomedical literature.
Keywords: Bibliometrics; Citation count; Impact factor; Journal ranking; PageRank
Year: 2015 PMID: 26664434 PMCID: PMC4674919 DOI: 10.1186/s13029-015-0046-2
Source DB: PubMed Journal: Source Code Biol Med ISSN: 1751-0473
Fig. 1 Methodology flowchart. Flowchart representing the major steps of data manipulation, as outlined in Methods
Fig. 2 PageRank algorithm. Representation of the PageRank algorithm. Set of unique PMIDs in the citation network [pᵢ], individual PageRank [PR(pᵢ)], damping factor [d = 0.85], total number of unique PMIDs [N], set of all inbound citations to pᵢ [M(pᵢ)], PageRank values of all inbound citations to pᵢ [PR(pⱼ)] and number of outbound citations of pⱼ [L(pⱼ)]
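The symbols in the Fig. 2 caption match the commonly stated PageRank recurrence, PR(pᵢ) = (1 − d)/N + d · Σ PR(pⱼ)/L(pⱼ), summed over every pⱼ in M(pᵢ). The following is a minimal sketch of that recurrence solved by power iteration, not the authors' implementation. The function name `pagerank`, the toy paper labels, the convergence tolerance, and the choice to spread the rank of papers with no outbound citations evenly over all papers are assumptions made for illustration.

```python
def pagerank(citations, d=0.85, tol=1e-10, max_iter=200):
    """Return a PageRank score per paper.

    `citations` maps each citing paper to the list of papers it cites
    (hypothetical input layout, chosen for this sketch).
    """
    # Every paper that cites or is cited is a node in the network.
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    nodes = sorted(nodes)
    n = len(nodes)

    # Outbound citations per paper, with duplicate references removed.
    out_links = {p: list(dict.fromkeys(citations.get(p, []))) for p in nodes}

    pr = {p: 1.0 / n for p in nodes}  # uniform starting scores
    for _ in range(max_iter):
        # Assumption: rank held by papers with no outbound citations
        # is redistributed evenly so the scores keep summing to 1.
        dangling = sum(pr[p] for p in nodes if not out_links[p])
        new = {p: (1.0 - d) / n + d * dangling / n for p in nodes}
        for pj in nodes:
            if out_links[pj]:
                share = d * pr[pj] / len(out_links[pj])  # d * PR(pj) / L(pj)
                for pi in out_links[pj]:
                    new[pi] += share
        delta = sum(abs(new[p] - pr[p]) for p in nodes)
        pr = new
        if delta < tol:
            break
    return pr


# Toy network with hypothetical labels: A and B cite C, and C cites D.
toy_network = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
toy_ranks = pagerank(toy_network)
```

In this toy network C outranks A and B because it has two inbound citations, while D outranks C despite having only one, because that single citation comes from a highly ranked paper. That is the distinction from raw citation counts that the abstract describes.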
Fig. 3 PageRank versus citation count. Scatter plot of PageRank versus citation count for a random 5% sample of the data. R = 0.905 (P < 0.01), R² = 0.819 (P < 0.01)
Top of the corpus comparison
| PubMed ID (PMID) | Paper title | PageRank (×10⁻⁵) | Citation count |
|---|---|---|---|
| 9254694 | Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. | 3.19 | 6291 |
| 2231712 | Basic local alignment search tool. | 2.88 | 5385 |
| 10802651 | Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. | 2.37 | 4293 |
| 11846609 | Analysis of relative gene expression data using real-time quantitative PCR and the 2(−Delta Delta C(T)) Method. | 1.95 | 6012† |
| 7984417 | CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. | 1.78 | 3899 |
| 942051 | A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. | 1.62 | 3850 |
| 21546353 | MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. | 1.58 | 3431 |
| 17488738 | MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. | 1.47 | 3075 |
| 5432063 | Cleavage of structural proteins during the assembly of the head of bacteriophage T4. | 1.13 | 2881 |
| 3447015 | The neighbor-joining method: a new method for reconstructing phylogenetic trees. | 1.12 | 2171 |
Top of the corpus comparison (n = 10), sorted by PageRank, descending. Paper titles were sourced from PMID via PMC-OAS look-up and were not included in the initial XML extraction. Rankings accurate as of January 2015
Tests of normality
| Variable | Kolmogorov-Smirnovᵃ statistic | df | Sig. |
|---|---|---|---|
| PageRank | .383 | 314664 | .000 |
| CitationCount | .399 | 314664 | .000 |
ᵃ Lilliefors Significance Correction
Correlations
| Variable | Measure | PageRank | CitationCount |
|---|---|---|---|
| PageRank | Pearson Correlation | 1 | .905ᵃ |
| | Sig. (1-tailed) | | .000 |
| | N | 314664 | 314664 |
| CitationCount | Pearson Correlation | .905ᵃ | 1 |
| | Sig. (1-tailed) | .000 | |
| | N | 314664 | 314664 |
ᵃ Correlation is significant at the 0.01 level (1-tailed)
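For readers who want to see how a Pearson coefficient like the one in the table is obtained, the following minimal sketch computes it from paired values. The helper name `pearson_r` and the five toy data points are hypothetical and are not taken from the study; only the reported coefficient of .905 comes from the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values, for illustration only.
toy_pagerank = [1.0e-6, 2.0e-6, 3.0e-6, 4.0e-6, 5.0e-6]
toy_citations = [2, 4, 5, 4, 5]
toy_r = pearson_r(toy_pagerank, toy_citations)

# Squaring the reported coefficient gives the reported R Square to 3 d.p.
reported_r = 0.905
reported_r_squared = round(reported_r ** 2, 3)
```

Because the regression has a single predictor, R Square is simply the square of the Pearson coefficient, which is why .905 and .819 appear together throughout the output.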
Model summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .905ᵃ | .819 | .819 | 4.844 |
ᵃ Predictors: (Constant), PageRank
ANOVAᵃ
| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 33389613.691 | 1 | 33389613.691 | 1423001.392 | .000ᵇ |
| | Residual | 7383297.502 | 314662 | 23.464 | | |
| | Total | 40772911.193 | 314663 | | | |
ᵃ Dependent Variable: CitationCount
ᵇ Predictors: (Constant), PageRank
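The ANOVA figures can be cross-checked against the model summary with a few lines of arithmetic. The sketch below uses only the sums of squares and degrees of freedom reported in the table; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 33389613.691
ss_residual = 7383297.502
ss_total = 40772911.193
df_regression = 1
df_residual = 314662

# R Square is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# The standard error of the estimate is the root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)
```

Rounded to three decimal places, these reproduce the reported R Square (.819) and Std. Error of the Estimate (4.844), and the F ratio agrees with the tabulated 1423001.392.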
Coefficientsᵃ
| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | −27.861 | .027 | | −1023.789 | .000 |
| | PageRank | 204365203.423 | 171318.510 | .905 | 1192.896 | .000 |
ᵃ Dependent Variable: CitationCount
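As a worked example, the fitted line implied by the coefficients is CitationCount = −27.861 + 204365203.423 × PageRank. The sketch below applies it to the top-ranked paper in the corpus comparison table (PMID 9254694). It assumes the regression was fitted on raw PageRank values, so the tabulated 3.19 is entered as 3.19e-5; the function and variable names are illustrative.

```python
# Coefficients copied from the table.
intercept = -27.861
slope = 204365203.423
slope_std_error = 171318.510


def predicted_citations(pagerank):
    """Citation count predicted by the fitted line for a raw PageRank value."""
    return intercept + slope * pagerank


# Top-ranked paper (PMID 9254694): PageRank 3.19e-5, 6291 citations.
top_pagerank = 3.19e-5
top_citations = 6291

top_prediction = predicted_citations(top_pagerank)  # about 6491.4
top_residual = top_citations - top_prediction       # about -200.4

# The t statistic for the slope is the coefficient over its standard error.
slope_t = slope / slope_std_error
```

The prediction overshoots the observed count by roughly 200 citations. The slope's t statistic also matches the tabulated 1192.896, and its square is approximately the F ratio in the ANOVA table, as expected for a single-predictor model.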