Richard Klavans, Kevin W Boyack, Dewey A Murdick.
Abstract
The prediction of exceptional or surprising growth in research is an issue with deep roots and few practical solutions. In this study, we develop and validate a novel approach to forecasting growth in highly specific research communities. Each research community is represented by a cluster of papers. Multiple indicators were tested, and a composite indicator was created that predicts which research communities will experience exceptional growth over the next three years. The accuracy of this predictor was tested using hundreds of thousands of community-level forecasts and was found to exceed the performance benchmarks established in the Intelligence Advanced Research Projects Activity's (IARPA) Foresight Using Scientific Exposition (FUSE) program in six of nine major fields in science. Furthermore, 10 of 11 disciplines within the Computing Technologies field met the benchmarks. Specific detailed forecast examples are given and evaluated, and a critical evaluation of the forecasting approach is also provided.
Year: 2020 PMID: 32931500 PMCID: PMC7491740 DOI: 10.1371/journal.pone.0239177
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Numbers of papers by year in each global model of science.
For Model 1, numbers of papers added originally and at each update are also shown.
| | Model 1 | | | | | Model 2 |
|---|---|---|---|---|---|---|
| Year | Original | 2016_01 | 2017_05 | 2018_05 | DC5 | STS5 |
| 1996 | 926,967 | 3,747 | 14,008 | 8,739 | 953,461 | 985,021 |
| 1997 | 951,188 | 2,952 | 15,738 | 8,512 | 978,390 | 1,012,744 |
| 1998 | 964,061 | 3,706 | 15,795 | 11,638 | 995,200 | 1,037,234 |
| 1999 | 979,298 | 5,007 | 15,602 | 15,481 | 1,015,388 | 1,054,479 |
| 2000 | 1,031,993 | 11,370 | 23,138 | 13,031 | 1,079,532 | 1,117,072 |
| 2001 | 1,089,015 | 12,670 | 25,912 | 16,467 | 1,144,064 | 1,179,964 |
| 2002 | 1,137,594 | 16,562 | 32,886 | 19,539 | 1,206,581 | 1,247,285 |
| 2003 | 1,207,002 | 41,316 | 19,802 | 17,143 | 1,285,263 | 1,335,419 |
| 2004 | 1,343,284 | 22,680 | 15,439 | 13,735 | 1,395,138 | 1,443,125 |
| 2005 | 1,495,559 | 45,018 | 17,593 | 8,738 | 1,566,908 | 1,623,300 |
| 2006 | 1,606,285 | 43,362 | 16,976 | 10,116 | 1,676,739 | 1,743,001 |
| 2007 | 1,704,068 | 51,182 | 22,055 | 10,151 | 1,787,456 | 1,862,707 |
| 2008 | 1,802,622 | 60,046 | 24,905 | 12,163 | 1,899,736 | 1,977,881 |
| 2009 | 1,919,363 | 70,072 | 20,630 | 11,829 | 2,021,894 | 2,111,872 |
| 2010 | 2,033,280 | 104,847 | 21,567 | 12,284 | 2,171,978 | 2,241,956 |
| 2011 | 2,159,551 | 118,872 | 23,839 | 12,236 | 2,314,498 | 2,393,555 |
| 2012 | 2,169,761 | 182,997 | 43,046 | 21,156 | 2,416,960 | 2,523,847 |
| 2013 | | 2,427,223 | 47,810 | 31,583 | 2,506,616 | 2,622,512 |
| 2014 | | 2,373,740 | 153,262 | 38,794 | 2,565,796 | 2,685,118 |
| 2015 | | 1,688,949 | 754,767 | 5,850 | 2,449,566 | 2,658,829 |
| 2016 | | | 2,380,075 | 155,018 | 2,535,093 | 2,738,674 |
| 2017 | | | 662,165 | 1,742,014 | 2,404,179 | 2,813,466 |
| 2018 | | | | 620,305 | 620,305 | 2,868,636 |
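The layout of this table can be sanity-checked with a little arithmetic: in the rows tested here, the DC5 count equals the sum of the Model 1 counts for that year (the original build plus any later updates). This is an observation from the numbers, not a statement taken from the source. A minimal Python sketch, with the helper name `dc5_matches_updates` chosen only for illustration:

```python
# Per-year Model 1 counts (original build and/or updates) and the DC5 count,
# copied from three rows of the table above.
rows = {
    2012: ([2_169_761, 182_997, 43_046, 21_156], 2_416_960),
    2013: ([2_427_223, 47_810, 31_583], 2_506_616),
    2018: ([620_305], 620_305),
}


def dc5_matches_updates(parts, dc5_count):
    """Return True when the per-update counts add up to the DC5 count."""
    return sum(parts) == dc5_count


for year, (parts, dc5_count) in rows.items():
    print(year, dc5_matches_updates(parts, dc5_count))
```

The same check can be applied to any other row to confirm which update column a value belongs to.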
Fig 1. Temporal profile of a research community in the DC5 model.
The forecast year (FY) is two years after the model was built (MY), and the peak year (PK) occurs before the forecast year. The growth rate (GR) is shown for both the PK to target year (TY) timespan as well as the FY to TY timespan.
Fig 2. Likelihood of a research community having exceptional growth before and after a model is created (RY = 0).
Indicators that were tested for prediction of exceptional growth (std = standardized; log = log transformed).
| Type | Name | Definition | Transform |
|---|---|---|---|
| Life cycle | stage | Reciprocal length of time to peak year | Std |
| Life cycle | cvit | Average reciprocal paper age | Std [log] |
| Life cycle | rvit | Average reciprocal reference age from papers in FY | Std [4th root] |
| Life cycle | Δrvit | Change in rvit | See text |
| Academic importance | ntopj | Number of articles in top 250 journals in FY | Std [log] |
| Academic importance | ctopj | Number of references to top 250 journals from articles in FY | Std [log] |
| Academic importance | eigen | Number of articles in top 250 Eigenvalue journals in FY | Std [log] |
| Size | nart | Number of non-review articles in FY | Std [log] |
| Size | nrev | Number of review articles in FY | Std [log] |
| Size | nref | Number of references | Std [log] |
Likelihood of exceptional growth (xg) by stage using RCs with at least 20 papers in the FY.
| | | DC5 (MY = 2012, RY = +1) | | | STS5 (MY = 2018, RY = -3) | | |
|---|---|---|---|---|---|---|---|
| FY-PK | Stage | #RC (2013) | #xg | %xg | #RC (2015) | #xg | %xg |
| 0 | 1.000 | 5,397 | 967 | 17.92 | 4,585 | 1,050 | 22.90 |
| 1 | 0.500 | 2,379 | 159 | 6.68 | 2,155 | 170 | 7.89 |
| 2 | 0.333 | 1,814 | 29 | 1.60 | 1,558 | 48 | 3.08 |
| 3 | 0.250 | 1,555 | 10 | 0.64 | 1,354 | 10 | 0.74 |
| 4 | 0.200 | 1,464 | 6 | 0.41 | 1,258 | 3 | 0.24 |
| 5 | 0.166 | 1,543 | 2 | 0.13 | 1,260 | 1 | 0.08 |
| >5 | 0.143 | 13,150 | 4 | 0.03 | 10,730 | 4 | 0.04 |
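The Stage column is consistent with the reciprocal of one plus the number of years between the peak year and the forecast year, with the final row (stage 0.143, or 1/7) acting as a floor for all older peaks. The sketch below encodes that reading. The `+ 1` offset and the cap of 7 are inferred from the tabulated values rather than stated in the source, and the function name `stage` is illustrative.

```python
def stage(forecast_year: int, peak_year: int, cap: int = 7) -> float:
    """Reciprocal time-to-peak, as inferred from the table above.

    gap = FY - PK; the value is 1 / (gap + 1), with the denominator
    capped so that every gap above 5 maps to 1/7 (an assumption).
    """
    gap = forecast_year - peak_year
    if gap < 0:
        raise ValueError("peak year must not be later than forecast year")
    return 1.0 / min(gap + 1, cap)


# Reproduce the Stage column for gaps 0..5 and for one older peak.
for gap in list(range(6)) + [12]:
    print(gap, round(stage(2013, 2013 - gap), 3))
```

Note that 1/6 rounds to 0.167, whereas the table lists 0.166, which suggests the published values were truncated rather than rounded.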
Relationship between stage and likelihood of reaching a peak publication share in the next year.
| FY-PK | DC5 (MY = 2012, RY = -1) | | | STS5 (MY = 2018, RY = -1) | | |
|---|---|---|---|---|---|---|
| | #RC (2011) | %RC | %pk (2012) | #RC (2017) | %RC | %pk (2018) |
| 0 | 10,663 | 11.9 | 31.4 | 10,491 | 11.4 | 31.1 |
| 1 | 7,884 | 8.8 | 21.2 | 7,628 | 8.3 | 21.1 |
| 2 | 6,503 | 7.3 | 13.8 | 6,569 | 7.1 | 14.6 |
| 3 | 5,647 | 6.3 | 10.7 | 5,669 | 6.1 | 10.7 |
| 4 | 5,579 | 6.2 | 8.5 | 6,528 | 7.1 | 8.5 |
| 5 | 5,744 | 6.4 | 6.2 | 6,586 | 7.1 | 7.0 |
| >5 | 47,532 | 53.1 | 3.9 | 48,905 | 52.9 | 3.9 |
Indicator construction using different data samples.
| Data Sample | | Coefficients from Probit Analysis | | | | #RCs | Pseudo-R² |
|---|---|---|---|---|---|---|---|
| Model and FY | RY | stage | cvit | Δrvit | ntopj | | |
| DC5 (2008–09) | -3, -4 | 0.235 | 0.524 | 0.069 | 0.015 | 178,641 | 0.2694 |
| STS5 (2014–15) | -3, -4 | 0.185 | 0.561 | 0.073 | 0.059 | 172,795 | 0.2706 |
| STS5 (2008–09) | -9, -10 | 0.236 | 0.414 | 0.030 | 0.069 | 178,897 | 0.2070 |
| DC5 (2013–14) | +1, +2 | 0.312 | 0.540 | 0.167 | 0.124 | 54,347 | 0.3563 |
| DC5 (2008–09) | -3, -4 | 0.374 | 0.481 | 0.134 | 0.040 | 51,849 | 0.3129 |
| STS5 (2014–15) | -3, -4 | 0.393 | 0.583 | 0.087 | 0.067 | 46,137 | 0.3641 |
| STS5 (2008–09) | -9, -10 | 0.410 | 0.624 | 0.176 | 0.068 | 41,081 | 0.3388 |
† transforms for all variables are listed in Table 4.
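To illustrate how probit coefficients such as these weight the four indicators, the sketch below forms a simple weighted sum of standardized indicator values and ranks communities by it. This is only an illustration: the intercept and the exact formula behind the published composite scores are not given in this record, so the numbers it produces are not expected to match any Score value reported in the tables. The coefficients are copied from the first row, DC5 (2008–09); the community names and indicator values are hypothetical.

```python
# Probit coefficients from the first table row, DC5 (2008-09).
COEFFICIENTS = {"stage": 0.235, "cvit": 0.524, "drvit": 0.069, "ntopj": 0.015}


def linear_score(indicators, coefficients=COEFFICIENTS):
    """Weighted sum of already-standardized indicators (no intercept: an assumption)."""
    return sum(coefficients[name] * indicators[name] for name in coefficients)


# Hypothetical standardized indicator values for two made-up communities.
communities = {
    "rc_a": {"stage": 1.0, "cvit": 2.0, "drvit": 0.5, "ntopj": 1.0},
    "rc_b": {"stage": -0.5, "cvit": 0.3, "drvit": 0.0, "ntopj": 2.0},
}

# Rank communities from highest to lowest score.
ranked = sorted(communities, key=lambda rc: linear_score(communities[rc]), reverse=True)
print(ranked)
```

Because cvit carries the largest coefficient in every row of the table, it dominates any ranking built this way.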
Fig 3. CSI scores by model and relative year.
Precision (%prec) and recall (%rec) for nine fields of research.
| Field | NPR Intensity | DC5 [2014 model year] | | | | STS5 [2014 model year] | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | #RC | #xg | %Prec | %Rec | #RC | #xg | %Prec | %Rec |
| Biochemistry | 0.147 | 2,685 | 98 | | | 2,321 | 102 | | |
| Computing Tech | 0.143 | 3,261 | 172 | | | 3,223 | 253 | | |
| Applied Physics | 0.125 | 2,451 | 139 | | | 2,156 | 138 | | |
| Medicine | 0.099 | 5,466 | 113 | | | 4,387 | 156 | | |
| Inf. Disease | 0.077 | 971 | 21 | 31.0 | 42.9 | 803 | 28 | | |
| Engineering | 0.034 | 2,907 | 163 | | | 2,915 | 192 | | |
| Sustainability | 0.032 | 3,618 | 134 | 30.7 | 45.5 | 2,940 | 132 | 27.8 | 41.7 |
| Basic Physics | 0.027 | 877 | 10 | | | 729 | 17 | 24.0 | 35.3 |
| Civics | 0.015 | 4,473 | 155 | 19.7 | 29.7 | 3,756 | 231 | 30.1 | 45.5 |
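For readers unfamiliar with the two measures, precision is the share of forecasted communities that actually showed exceptional growth, and recall is the share of exceptional-growth communities that were forecasted. The sketch below applies the standard definitions to the Inf. Disease row for DC5. The table gives only #xg = 21; the counts of 9 correct forecasts out of 29 forecasts are assumed here because they are the smallest integers that reproduce the listed 31.0% and 42.9%, and they are not taken from the source.

```python
def precision_recall(hits: int, forecasts: int, actual: int) -> tuple[float, float]:
    """Return (precision %, recall %) from raw counts.

    hits      -- forecasted communities that did show exceptional growth
    forecasts -- all communities forecasted to show exceptional growth
    actual    -- all communities that showed exceptional growth (#xg)
    """
    precision = 100.0 * hits / forecasts
    recall = 100.0 * hits / actual
    return precision, recall


# Assumed counts for Inf. Disease (DC5): 9 hits among 29 forecasts, #xg = 21.
prec, rec = precision_recall(hits=9, forecasts=29, actual=21)
print(round(prec, 1), round(rec, 1))
```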
Precision (%prec) and recall (%rec) for the eleven DC2 disciplines in the Computing Technology field in both models using the 2014 model year.
| DC2 discipline | DC5 [2014 model year] | | | | STS5 [2014 model year] | | | |
|---|---|---|---|---|---|---|---|---|
| | #RC | #xg | %Prec | %Rec | #RC | #xg | %Prec | %Rec |
| 9 – Computer Vision/Language | 522 | 43 | | | 520 | 62 | | |
| 27 – Networks | 347 | 27 | | | 340 | 46 | | |
| 67 – Human Computing | 179 | 19 | | | 181 | 19 | | |
| 52 – Telecommunications | 213 | 17 | | | 199 | 17 | | |
| 6 – Computing | 560 | 16 | 29.2 | 43.8 | 555 | 47 | | |
| 34 – Industrial Engineering | 340 | 16 | | | 340 | 16 | | |
| 83 – Cryptography | 152 | 12 | | | 139 | 15 | | |
| 72 – Statistics | 164 | 6 | | | 172 | 6 | 22.2 | 33.3 |
| 45 – Operations Research | 240 | 6 | | | 258 | 10 | | |
| 102 – Nonlinear Dynamics | 60 | 5 | | | 57 | 5 | | |
| 20 – Mathematics | 484 | 5 | 28.6 | 40.0 | 462 | 10 | | |
Top 10 forecasted DC5 RCs from the Computing Technology field (FY = 2014, TY = 2017, RY = +2).
| DC5 | Label | Stage | Cvit | ΔRvit | Ntopj | Score | Growth |
|---|---|---|---|---|---|---|---|
| 25308 | software defined networks | 3.47 | 5.03 | 0.54 | 3.12 | 3.80 | |
| 48081 | D2D communication | 3.47 | 4.95 | 0.50 | 1.76 | 3.60 | |
| 12007 | mobile security/malware | 3.47 | 4.32 | -0.05 | 2.76 | 3.36 | |
| 14215 | Twitter event detection | 3.47 | 4.45 | -0.24 | 2.32 | 3.36 | |
| 54895 | nature-inspired optimization | 3.47 | 4.50 | 0.77 | 0.97 | 3.33 | |
| 23854 | computation offloading | 3.47 | 3.98 | 1.65 | 2.32 | 3.32 | |
| 14700 | appliance load monitoring | 3.47 | 3.55 | 0.73 | 4.62 | 3.29 | 4.4% |
| 13672 | cellular network energy efficiency | 3.47 | 4.13 | 0.24 | 2.32 | 3.25 | -1.0% |
| 3922 | EV wireless charging | 3.47 | 3.31 | 1.47 | 4.62 | 3.25 | |
| 31270 | internet of things | 3.47 | 4.43 | -0.06 | 0.97 | 3.21 | |
† values listed are after transforms and standardization have been applied
Top 10 forecasted STS5 RCs from the Computing Technology field (FY = 2014, TY = 2017, RY = -4, #papers in 2014 >= 20).
| STS5 | Label | #Papers | Score | Growth |
|---|---|---|---|---|
| 6681 | cloud radio access networks | 126 | 2.65 | |
| 3602 | D2D communication | 377 | 2.64 | |
| 385 | software defined networks | 675 | 2.62 | |
| 7974 | cellular content caching | 75 | 2.61 | |
| 44976 | (general computing) | 80 | 2.60 | -77.0% |
| 4223 | nature-inspired optimization | 247 | 2.60 | |
| 61637 | ontology mapping | 34 | 2.56 | -30.4% |
| 3046 | EM wave metamaterial absorbers | 439 | 2.52 | |
| 24180 | spectrum sharing | 50 | 2.47 | 1.6% |
| 51600 | (general image processing) | 23 | 2.46 | -6.0% |
Top 10 forecasted STS5 RCs (FY = 2018) from the Computing Technology field.
| STS5 | Label | #P | Score | Top Institution | Country |
|---|---|---|---|---|---|
| 5495 | generative adversarial networks | 964 | 3.37 | Alphabet | U.S. |
| 27709 | intelligent fault diagnosis | 142 | 3.27 | Xi’an Jiaotong Univ | China |
| 105 | convolutional neural networks | 4238 | 3.11 | Tsinghua Univ | China |
| 3647 | semantic image segmentation | 1038 | 3.03 | Univ CAS | China |
| 44644 | deep computational models | 58 | 3.00 | Dalian Univ | China |
| 6403 | image captioning | 615 | 2.92 | Microsoft | U.S. |
| 28965 | hate speech detection | 105 | 2.88 | Poly Univ Valencia | Spain |
| 30977 | ReLU networks | 62 | 2.86 | Alphabet | U.S. |
| 37537 | few-shot learning | 53 | 2.85 | Alphabet | U.S. |
| 1005 | word embedding | 1831 | 2.79 | Tsinghua Univ | China |
Fig 4. Characterization of STS5 topic #5495.