Joseph Mariani¹, Gil Francopoulo², Patrick Paroubek¹, Frédéric Vernier¹.
Abstract
This paper aims to analyze the changes in the fields of speech and natural language processing over the past 5 years (2016–2020). It continues a series of two papers that we published in 2019 on the analysis of the NLP4NLP corpus, which contained articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965–2015) and was analyzed with the methods developed in the field of NLP, hence its name. The extended NLP4NLP+5 corpus now covers 55 years, comprising close to 90,000 documents [+30% compared with NLP4NLP: as many articles were published in the single year 2020 as over the first 25 years (1965–1989)], 67,000 authors (+40%), 590,000 references (+80%), and approximately 380 million words (+40%). These analyses are conducted globally or comparatively among sources and also with the general scientific literature, with a focus on the past 5 years. The paper concludes by identifying profound changes in research topics, as well as the emergence of a new generation of authors and the appearance of new publications around artificial intelligence, neural networks, machine learning, and word embedding.
Keywords: artificial intelligence; machine learning; natural language processing; neural networks; research metrics; speech processing; text mining
Year: 2022 PMID: 35965665 PMCID: PMC9363593 DOI: 10.3389/frma.2022.863126
Source DB: PubMed Journal: Front Res Metr Anal ISSN: 2504-0537
The NLP4NLP+5 corpus of conferences (24) and journals (10).
| Short name | # Docs | Type | Full name | Language | Access | Period | # Events |
|---|---|---|---|---|---|---|---|
| acl | 6,713 | Conference | Association for Computational Linguistics conference series | English | Open access | 1979–2020 | 42 |
| acmtslp | 82 | Journal | ACM Transactions on Speech and Language Processing | English | Private access | 2004–2013 | 10 |
| alta | 361 | Conference | Australasian Language Technology Association conference series | English | Open access | 2003–2019 | 17 |
| anlp | 278 | Conference | Applied Natural Language Processing | English | Open access | 1983–2000 | 6 |
| cath | 927 | Journal | Computers and the Humanities | English | Private access | 1966–2004 | 39 |
| cl | 905 | Journal | American Journal of Computational Linguistics | English | Open access | 1980–2020 | 41 |
| coling | 5,091 | Conference | Conference on Computational Linguistics | English | Open access | 1965–2020 | 24 |
| conll | 1,124 | Conference | Computational Natural Language Learning | English | Open access | 1997–2020 | 23 |
| csal | 1,111 | Journal | Computer Speech and Language | English | Private access | 1986–2020 | 34 |
| eacl | 1,139 | Conference | European Chapter of the ACL conference series | English | Open access | 1983–2017 | 15 |
| emnlp | 4,588 | Conference | Empirical methods in natural language processing | English | Open access | 1996–2020 | 25 |
| hlt | 2,219 | Conference | Human Language Technology | English | Open access | 1986–2015 | 19 |
| icassps | 10,971 | Conference | IEEE International Conference on Acoustics, Speech and Signal Processing | English | Private access | 1990–2020 | 31 |
| ijcnlp | 2,047 | Conference | International Joint Conference on NLP | English | Open access | 2005–2019 | 8 |
| inlg | 495 | Conference | International Conference on Natural Language Generation | English | Open access | 1996–2020 | 12 |
| isca | 22,778 | Conference | International Speech Communication Association conference series | English | Open access | 1987–2020 | 33 |
| jep | 739 | Conference | Journées d'Etudes sur la Parole | French | Open access | 2002–2020 | 8 |
| lre | 490 | Journal | Language Resources and Evaluation | English | Private access | 2005–2020 | 16 |
| lrec | 6,920 | Conference | Language Resources and Evaluation Conference | English | Open access | 1998–2020 | 12 |
| ltc | 793 | Conference | Language and Technology Conference | English | Private access | 1995–2019 | 9 |
| modulad | 232 | Journal | Le Monde des Utilisateurs de L'Analyse des Données | French | Open access | 1988–2010 | 23 |
| mts | 906 | Conference | Machine Translation Summit | English | Open access | 1987–2019 | 17 |
| muc | 149 | Conference | Message Understanding Conference | English | Open access | 1991–1998 | 5 |
| naacl | 2,175 | Conference | North American Chapter of the ACL conference series | English | Open access | 2000–2019 | 14 |
| paclic | 1,352 | Conference | Pacific Asia Conference on Language, Information and Computation | English | Open access | 1995–2018 | 23 |
| ranlp | 521 | Conference | Recent Advances in Natural Language Processing | English | Open access | 2009–2019 | 4 |
| sem | 1,089 | Conference | Lexical and Computational Semantics / Semantic Evaluation | English | Open access | 2001–2020 | 13 |
| speechc | 1,087 | Journal | Speech Communication | English | Private access | 1982–2020 | 39 |
| tacl | 307 | Journal | Transactions of the Association for Computational Linguistics | English | Open access | 2013–2020 | 8 |
| tal | 222 | Journal | Revue Traitement Automatique du Langage | French | Open access | 2006–2020 | 15 |
| taln | 1,250 | Conference | Traitement Automatique du Langage Naturel | French | Open access | 1997–2020 | 24 |
| taslp | 7,387 | Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing | English | Private access | 1975–2020 | 46 |
| tipster | 105 | Conference | Tipster Defense Advanced Research Projects Agency (DARPA) text program | English | Open access | 1993–1998 | 3 |
| trec | 2,199 | Conference | Text Retrieval Conference | English | Open access | 1992–2020 | 29 |
| Total incl. duplicates | 88,752 | | | | | 1965–2020 | 687 |
| Total excl. duplicates | 85,138 | | | | | 1965–2020 | 667 |
Included in the ACL Anthology.
Sources attached to each of the two research areas.
| Research area | Sources | # Papers |
|---|---|---|
| NLP-oriented | acl, alta, anlp, cath, cl, coling, conll, eacl, emnlp, hlt, ijcnlp, inlg, lre, lrec, ltc, mts, muc, naacl, paclic, ranlp, sem, tacl, tal, taln, tipster, trec | 40,751 |
| Speech-oriented | acmtslp, csal, icassps, isca, jep, lre, lrec, ltc, mts, speechc, taslp | 53,264 |
Figure 1. Number of papers each year.
Figure 2. Increase in the number of papers over the years.
Figure 3. Cumulative number of papers over the years.
Figure 4. Number of different authors over the years.
Figure 5. Number of different authors, new authors, and completely new authors over time.
Figure 6. Percentage of new authors and completely new authors over time.
Figure 7. Percentage of completely new authors in the most recent event across the sources in 2020 (red) and difference with 2015 (blue).
Figure 8. Author redundancy over time.
Figure 9. Author redundancy across the sources in 2020 (red) and difference with 2015 (blue).
Figure 10. Number of papers and authorships over time.
Figure 11. Gender of the authors' contributions over time.
Figure 12. Percentage of female authors across the sources in 2020 (red) and difference with 2015 (blue).
12 most productive authors (up to 2020, in comparison with 2015).
| Rank (2020) | Author | # Papers (up to 2020) | Rank (2015) | # Papers (up to 2015) | Increase | Increase (%) |
|---|---|---|---|---|---|---|
| 1 | Shrikanth S. Narayanan | 453 | 1 | 358 | 95 | 27% |
| 2 | Hermann Ney | 388 | 2 | 343 | 45 | 13% |
| 3 | John H. L. Hansen | 354 | 3 | 299 | 55 | 18% |
| 4 | Haizhou Li | 350 | 4 | 257 | 93 | 36% |
| 5 | Satoshi Nakamura | 263 | 7 | 205 | 58 | 28% |
| 6 | Chin Hui P. Lee | 261 | 5 | 218 | 43 | 20% |
| 7 | Alex Waibel | 234 | 6 | 207 | 27 | 13% |
| 8 | Mark J. F. Gales | 230 | 8 | 195 | 35 | 18% |
| 9 | James R. Glass | 214 | 25 | 142 | 72 | 51% |
| 10 | Yang Liu | 209 | 19 | 148 | 61 | 41% |
| 11 | Lin Shan Lee | 204 | 9 | 193 | 11 | 6% |
| 12 | Li Deng | 201 | 10 | 192 | 9 | 5% |
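The increase and percentage columns follow from the two snapshots: the increase is the count up to 2020 minus the count up to 2015, and the percentage is that increase relative to the 2015 count, rounded to the nearest integer. A minimal Python sketch of this reading of the columns (it reproduces, for instance, 95 and 27% for Shrikanth S. Narayanan); the function name is illustrative only:

```python
def productivity_gain(papers_2020: int, papers_2015: int) -> tuple[int, int]:
    """Return (papers added since 2015, percentage increase rounded to an integer)."""
    increase = papers_2020 - papers_2015
    percent = round(100 * increase / papers_2015)
    return increase, percent


# Counts copied from the table: (papers up to 2020, papers up to 2015).
authors = {
    "Shrikanth S. Narayanan": (453, 358),
    "James R. Glass": (214, 142),
    "Li Deng": (201, 192),
}
for name, (n2020, n2015) in authors.items():
    inc, pct = productivity_gain(n2020, n2015)
    print(f"{name}: +{inc} papers ({pct}%)")
```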
12 most productive authors in the past 5 years (2016 to 2020).
| Rank | Author | # Papers (2016–2020) |
|---|---|---|
| 1 | Graham Neubig | 109 |
| 2 | Shrikanth S. Narayanan | 103 |
| 3 | Haizhou Li | 100 |
| 4 | Yue Zhang | 99 |
| 5 | Björn W. Schuller | 91 |
| 6 | Dong Yu | 83 |
| 7 | Iryna Gurevych | 80 |
| 8 | Junichi Yamagishi | 80 |
| 9 | Shinji Watanabe | 78 |
| 10 | James R. Glass | 77 |
| 11 | Helen M. Meng | 72 |
| 12 | Pushpak Bhattacharyya | 71 |
Figure 13. Average number of authors per paper.
Number and names of authors of single-author papers.
| # Single-author papers | # Authors | Authors |
|---|---|---|
| 28 | 1 | W. Nick Campbell |
| 26 | 1 | Jerome R. Bellegarda |
| 24 | 2 | Ellen M. Voorhees, Olivier Ferret |
| 21 | 1 | Ralph Grishman |
| 20 | 1 | Takayuki Arai |
| 18 | 2 | Mark A. Johnson, Rathinavelu Chengalvarayan |
| 17 | 3 | Beth M. Sundheim, Douglas B. Paul, Kenneth C. Litkowski |
| 16 | 3 | Jerry R. Hobbs, Oi Yee Kwong, Steven M. Kay |
| 15 | 1 | Donna Harman |
| 14 | 4 | Dominique Desbois, John Makhoul, Patrick Saint-Dizier, Sadaoki Furui |
| 13 | 4 | Eckhard Bick, Paul S. Jacobs, Rens Bod, Robert C. Moore |
| 12 | 11 | David S. Pallett, Harvey F. Silverman, Jen Tzung Chien, Jörg Tiedemann, Lynette Hirschman, Marius A. Pasca, Martin Kay, Reinhard Rapp, Stephen Tomlinson, Ted Pedersen, Yorick Wilks |
| 11 | 10 | Dekang Lin, Eduard H. Hovy, Hagai Aronowitz, Michael Schiehlen, Philip Rose, Philippe Blache, Roger K. Moore, Shunichi Ishihara, Stephanie Seneff, Tomek Strzalkowski |
| 10 | 11 | Aravind K. Joshi, Hermann Ney, Hugo Van Hamme, Joshua T. Goodman, Karen Spärck Jones, Kenneth Ward Church, Kuldip K. Paliwal, Mark Hepple, Mark A. Huckvale, Mark Jan Nederhof, Olov Engwall |
| 9 | 31 | … |
| 8 | 25 | … |
| 7 | 51 | … |
| 6 | 90 | … |
| 5 | 124 | … |
| 4 | 224 | … |
| 3 | 447 | … |
| 2 | 1,088 | … |
| 1 | 4,667 | … |
| 0 | 60,193 | … |
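This distribution (how many authors wrote exactly k single-author papers, including the 60,193 authors who wrote none) can be derived from author lists alone; its counts sum to 66,995, consistent with the roughly 67,000 authors of the corpus. A minimal Python sketch on a hypothetical toy list of papers, assuming each paper is represented by the list of its author names (the author-name disambiguation that the real corpus requires is ignored here):

```python
from collections import Counter


def single_author_histogram(papers):
    """Map k -> number of authors who wrote exactly k single-author papers (k may be 0)."""
    all_authors = {name for authors in papers for name in authors}
    # Counter returns 0 for authors who never appear alone on a paper.
    solo = Counter(authors[0] for authors in papers if len(authors) == 1)
    return Counter(solo[name] for name in all_authors)


# Hypothetical toy corpus: each paper is the list of its authors.
papers = [["A"], ["A"], ["B"], ["A", "C"], ["C", "D"]]
hist = single_author_histogram(papers)
for k in sorted(hist, reverse=True):
    print(k, hist[k])
```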
The 12 authors with the largest number of co-authors (up to 2020, in comparison with 2015).
| Author | # Co-authors (up to 2020) | Rank (2020) | Rank (2015) | # Co-authors (up to 2015) | Increase |
|---|---|---|---|---|---|
| Shrikanth S. Narayanan | 403 | 1 | 1 | 299 | 104 |
| Haizhou Li | 355 | 2 | 3 | 252 | 103 |
| Satoshi Nakamura | 292 | 3 | 4 | 234 | 58 |
| Björn W. Schuller | 291 | 4 | 39 | 135 | 156 |
| Yang Liu | 290 | 5 | 12 | 178 | 112 |
| Hermann Ney | 288 | 6 | 2 | 254 | 34 |
| Sanjeev Khudanpur | 284 | 7 | 8 | 193 | 91 |
| Khalid Choukri | 253 | 8 | 15 | 177 | 76 |
| Ming Zhou | 246 | 9 | 71 | 115 | 131 |
| Chin Hui P. Lee | 241 | 10 | 7 | 194 | 47 |
| Dong Yu | 241 | 10 | 187 | 82 | 159 |
| Alan W. Black | 238 | 12 | 25 | 149 | 89 |
The 12 authors with the largest number of collaborations (up to 2020, in comparison with 2015).
| Author | # Collaborations (up to 2020) | Rank (2020) | Rank (2015) | # Collaborations (up to 2015) | Increase |
|---|---|---|---|---|---|
| Shrikanth S. Narayanan | 1,411 | 1 | 1 | 1,035 | 376 |
| Haizhou Li | 1,288 | 2 | 2 | 899 | 389 |
| Hermann Ney | 1,026 | 3 | 3 | 890 | 136 |
| Satoshi Nakamura | 861 | 4 | 4 | 672 | 189 |
| Björn W. Schuller | 841 | 5 | 26 | 408 | 433 |
| Helen M. Meng | 717 | 6 | 46 | 337 | 380 |
| Dong Yu | 716 | 7 | 63 | 293 | 423 |
| Chin Hui P. Lee | 710 | 8 | 6 | 544 | 166 |
| Junichi Yamagishi | 685 | 9 | 48 | 332 | 353 |
| Ming Zhou | 680 | 10 | 57 | 315 | 365 |
| Alex Waibel | 679 | 11 | 5 | 580 | 99 |
| Bin Ma | 670 | 12 | 10 | 503 | 167 |
The 12 authors with the largest number of co-authors in the past 5 years (2016–2020).
| Rank | Author | # Co-authors (2016–2020) |
|---|---|---|
| 1 | Graham Neubig | 193 |
| 1 | Björn W. Schuller | 193 |
| 3 | Yue Zhang | 187 |
| 4 | Dong Yu | 175 |
| 4 | Yu Zhang | 175 |
| 6 | Haizhou Li | 161 |
| 7 | Kong Aik Lee | 158 |
| 8 | Shrikanth S. Narayanan | 154 |
| 9 | Ming Zhou | 151 |
| 10 | Shinji Watanabe | 145 |
| 10 | Jan Hajic | 145 |
| 12 | Yang Liu | 143 |
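The co-author and collaboration counts above, and the mean degree of the collaboration graph, can be derived from a co-authorship graph whose nodes are authors and whose edges link two authors who signed at least one paper together. A minimal Python sketch on a hypothetical toy corpus. It assumes that the number of co-authors is an author's degree (distinct partners) and that the number of collaborations is the weighted degree (each co-signed paper counted once per partner); these readings are assumptions, not definitions given in this record.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Undirected weighted graph: edge (a, b) weight = number of papers a and b co-signed."""
    weights = defaultdict(int)
    authors = set()
    for author_list in papers:
        unique = sorted(set(author_list))
        authors.update(unique)
        for a, b in combinations(unique, 2):  # each pair once per paper, with a < b
            weights[(a, b)] += 1
    return sorted(authors), dict(weights)


def degree_stats(authors, weights):
    """Return (co-authors per author, collaborations per author, mean degree)."""
    coauthors = {a: 0 for a in authors}       # degree: distinct co-authors
    collaborations = {a: 0 for a in authors}  # weighted degree (assumed meaning)
    for (a, b), w in weights.items():
        coauthors[a] += 1
        coauthors[b] += 1
        collaborations[a] += w
        collaborations[b] += w
    # Mean degree of an undirected graph: 2 * edges / nodes.
    mean_degree = 2 * len(weights) / len(authors) if authors else 0.0
    return coauthors, collaborations, mean_degree


# Hypothetical toy corpus: each paper is the list of its authors.
papers = [["A", "B"], ["A", "B", "C"], ["C", "D"], ["E"]]
authors, weights = coauthor_graph(papers)
coauthors, collaborations, mean_degree = degree_stats(authors, weights)
print(coauthors)
print(collaborations)
print(mean_degree)  # 1.6
```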
Figure 14. Mean degree of the collaboration graph for the 34 sources in 2015 (blue) and 2020 (red).
Computation and comparison of the closeness centrality, degree centrality, and betweenness centrality for the 10 most central authors (up to 2020, in comparison with 2015).
| Rank (2020) | Closeness: rank (2015) | Closeness: author | Closeness centrality | Degree: rank (2015) | Degree: author | Betweenness: rank (2015) | Betweenness: author | Betweenness centrality |
|---|---|---|---|---|---|---|---|---|
| 1 | 8 | Sanjeev Khudanpur | 17863.281 | 1 | Shrikanth S. Narayanan | 1 | Shrikanth S. Narayanan | 44717979 |
| 2 | 5 | Haizhou Li | 17782.575 | 3 | Haizhou Li | 2 | Haizhou Li | 34084103 |
| 3 | 2 | Shrikanth S. Narayanan | 17709.094 | 4 | Satoshi Nakamura | 8 | Yang Liu | 32048199 |
| 4 | 1 | Mari Ostendorf | 17565.169 | 41 | Björn W. Schuller | 3 | Satoshi Nakamura | 28679912 |
| 5 | 3 | Chin Hui P. Lee | 17454.696 | 12 | Yang Liu | 4 | Chin Hui P. Lee | 25895571 |
| 6 | 6 | Julia B. Hirschberg | 17449.533 | 2 | Hermann Ney | 28 | Laurent Besacier | 25076596 |
| 7 | 15 | Yang Liu | 17442.071 | 8 | Sanjeev Khudanpur | 11 | Alan W. Black | 23527696 |
| 8 | 11 | Alan W. Black | 17409.874 | 15 | Khalid Choukri | 10 | Khalid Choukri | 22889904 |
| 9 | 4 | Hermann Ney | 17272.551 | 14 | Ming Zhou | 18 | Sanjeev Khudanpur | 21917631 |
| 10 | 115 | Dong Yu | 17249.284 | 7 | Chin Hui P. Lee | 5 | Hermann Ney | 21262259 |
| 10 | | | | 187 | Dong Yu | | | |
Closeness centrality for the 10 most central authors in the past 5 years (2016–2020).
| Rank | Author | Closeness centrality |
|---|---|---|
| 1 | Dong Yu | 7205.507 |
| 2 | Yu Zhang | 7109.654 |
| 3 | Graham Neubig | 7103.21 |
| 4 | Yue Zhang | 7012.758 |
| 5 | Sanjeev Khudanpur | 6908.953 |
| 6 | Heng Ji | 6897.558 |
| 7 | Shinji Watanabe | 6881.992 |
| 8 | Xin Wang | 6836.757 |
| 9 | Mark A. Hasegawa Johnson | 6811.851 |
| 10 | Lukás Burget | 6732.778 |
Values in bold indicate normalized values based on the first one.
Betweenness centrality for the 10 most central authors in the past 5 years (2016–2020).
| Rank | Author | Betweenness centrality |
|---|---|---|
| 1 | Yue Zhang | 12633450 |
| 2 | Graham Neubig | 12539019 |
| 3 | Dong Yu | 10394169 |
| 4 | Yu Zhang | 9117498 |
| 5 | Shrikanth S. Narayanan | 8093016 |
| 6 | Laurent Besacier | 7640198 |
| 7 | Yang Liu | 6931507 |
| 8 | Shinji Watanabe | 6751311 |
| 9 | Haizhou Li | 6233480 |
| 10 | Xin Wang | 6096768 |
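Degree, closeness, and betweenness centrality are standard measures on the co-authorship graph. The record does not state how the values above are scaled (they are not the textbook normalized values, which lie between 0 and 1), so the following is only a sketch of the standard unnormalized measures on an unweighted graph: degree as the number of neighbors, closeness from breadth-first-search distances, and betweenness with Brandes' algorithm. The adjacency-list format and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph, v):
    """Closeness = (reachable nodes other than v) / (sum of their distances)."""
    dist = bfs_distances(graph, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def betweenness(graph):
    """Brandes' algorithm, unnormalized, for an undirected unweighted graph."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical co-authorship graph as adjacency lists: the path A-B-C-D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
degree = {v: len(nbrs) for v, nbrs in graph.items()}
print(degree)
print({v: round(closeness(graph, v), 3) for v in graph})
print(betweenness(graph))
```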
Figure 15. The average number of references per paper over the years.
Figure 16. The average number of citations of a paper over the years.
Absence of citations of authors and papers within NLP4NLP+5.
| | Number | % (up to 2020) | % (up to 2015) |
|---|---|---|---|
| Papers never referenced | 31,603 | 37 | 44 |
| Papers never referenced (ignoring self-references) | 40,111 | 47 | 54 |
| Authors never referenced | 23,850 | 36 | 42 |
| Authors never referenced (ignoring self-references) | 25,281 | 38 | 44 |
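The table distinguishes papers (and authors) that are never referenced at all from those referenced only by themselves. A minimal Python sketch of the paper-level count, assuming that a self-reference is a reference between two papers sharing at least one author (the record does not state the exact definition); the data layout and the toy corpus are hypothetical. The percentage column appears to divide by the 85,138 distinct documents (31,603 / 85,138 ≈ 37%).

```python
def never_referenced(papers, references, ignore_self=False):
    """Return the sorted ids of papers that receive no reference.

    papers: dict paper_id -> set of author names
    references: iterable of (citing_id, cited_id) pairs
    ignore_self: if True, references between papers sharing an author are skipped
    """
    cited = set()
    for citing, cited_id in references:
        if ignore_self and papers[citing] & papers[cited_id]:
            continue  # self-reference: at least one author in common (assumed definition)
        cited.add(cited_id)
    return sorted(set(papers) - cited)


# Hypothetical toy corpus.
papers = {
    "p1": {"X"},
    "p2": {"X", "Y"},
    "p3": {"Z"},
    "p4": {"W"},
}
refs = [("p2", "p1"), ("p3", "p2")]
print(never_referenced(papers, refs))                    # ['p3', 'p4']
print(never_referenced(papers, refs, ignore_self=True))  # ['p1', 'p3', 'p4']
```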
The 20 most cited authors up to 2020.
| Rank (2020) | Rank (2015) | Author | # Citations | # Papers | Citations per paper | Self-citations (%) |
|---|---|---|---|---|---|---|
| 1 | 3 | Christopher D. Manning | 13,195 | 152 | 86.809 | 2.145 |
| 2 | 1 | Hermann Ney | 7,109 | 388 | 18.322 | 16.205 |
| 3 | >20 | Christopher Dyer | 5,372 | 114 | 47.123 | 3.984 |
| 4 | >20 | Richard Socher | 5,175 | 37 | 139.865 | 1.198 |
| 5 | 2 | Franz Josef Och | 5,041 | 42 | 120.024 | 1.825 |
| 6 | 5 | Dan Klein | 4,945 | 130 | 38.038 | 6.249 |
| 7 | 4 | Philipp Koehn | 4,726 | 59 | 80.102 | 2.412 |
| 8 | >20 | Noah A. Smith | 4,648 | 160 | 29.05 | 6.713 |
| 9 | 7 | Andreas Stolcke | 4,532 | 145 | 31.255 | 6.355 |
| 10 | 6 | Michael John Collins | 4,256 | 69 | 61.681 | 3.195 |
| 11 | >20 | Kenton Lee | 4,251 | 21 | 202.429 | 0.729 |
| 12 | >20 | Luke S. Zettlemoyer | 4,158 | 92 | 45.196 | 5.075 |
| 13 | 9 | Salim Roukos | 4,132 | 71 | 58.197 | 1.5 |
| 14 | 18 | Daniel Jurafsky | 4,056 | 118 | 34.373 | 2.342 |
| 15 | >20 | Kristina Toutanova | 4,055 | 47 | 86.277 | 0.764 |
| 16 | >20 | Sanjeev Khudanpur | 4,051 | 135 | 30.007 | 6.492 |
| 17 | >20 | Daniel Povey | 3,796 | 112 | 33.893 | 7.929 |
| 18 | 16 | Li Deng | 3,672 | 201 | 18.269 | 14.842 |
| 19 | >20 | Dong Yu | 3,653 | 177 | 20.638 | 10.895 |
| 20 | >20 | Mirella Lapata | 3,578 | 138 | 25.928 | 6.987 |
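The ratio column is simply citations divided by papers (13,195 / 152 ≈ 86.809 for Christopher D. Manning). The last column appears to be the share of self-citations: for Hermann Ney, the table of citations for the 20 most productive authors lists 1,152 and 5,957 citations, which add up to the 7,109 shown here, and 1,152 / 7,109 ≈ 16.205%. A small Python sketch under that reading, which is an inference from the numbers rather than a stated definition:

```python
def citation_profile(citations, papers, self_citations):
    """Return (citations per paper, self-citations as % of all citations), to 3 decimals."""
    per_paper = round(citations / papers, 3)
    self_share = round(100 * self_citations / citations, 3) if citations else 0.0
    return per_paper, self_share


# Hermann Ney and Li Deng; 1,152 and 545 are read as their self-citation counts.
print(citation_profile(7109, 388, 1152))  # (18.322, 16.205)
print(citation_profile(3672, 201, 545))   # (18.269, 14.842)
```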
The number of citations for the 20 most productive authors (1965–2020).
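| # Papers | Author | | % | | % | Single-author papers | % | Rank (most cited) | Self-citations | Self-citations per paper | Citations by others | Citations by others per paper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|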
| 453 | Shrikanth S. Narayanan | 13 | 3 | 388 | 86 | 0 | 0 | >20 | 782 | 1.726 | 2,129 | 4.7 |
| 388 | Hermann Ney | 27 | 7 | 325 | 84 | 10 | 3 | 2 | 1,152 | 2.969 | 5,957 | 15.353 |
| 354 | John H. L. Hansen | 29 | 8 | 283 | 80 | 3 | 1 | >20 | 779 | 2.201 | 1,076 | 3.04 |
| 350 | Haizhou Li | 13 | 4 | 256 | 73 | 2 | 1 | >20 | 490 | 1.4 | 1,623 | 4.637 |
| 263 | Satoshi Nakamura | 17 | 6 | 190 | 72 | 1 | 0 | >20 | 160 | 0.608 | 648 | 2.464 |
| 261 | Chin Hui P. Lee | 14 | 5 | 207 | 79 | 5 | 2 | >20 | 577 | 2.211 | 2,852 | 10.927 |
| 234 | Alex Waibel | 13 | 6 | 199 | 85 | 2 | 1 | >20 | 262 | 1.12 | 2,048 | 8.752 |
| 230 | Mark J. F. Gales | 31 | 13 | 105 | 46 | 9 | 4 | >20 | 638 | 2.774 | 2,923 | 12.709 |
| 214 | James R. Glass | 11 | 5 | 152 | 71 | 1 | 0 | >20 | 428 | 2 | 2,084 | 9.738 |
| 209 | Yang Liu | 48 | 23 | 83 | 40 | 3 | 1 | >20 | 240 | 1.148 | 2,080 | 9.952 |
| 204 | Lin Shan Lee | 10 | 5 | 189 | 93 | 0 | 0 | >20 | 328 | 1.608 | 656 | 3.216 |
| 201 | Li Deng | 57 | 28 | 73 | 36 | 6 | 3 | 18 | 545 | 2.711 | 3,127 | 15.557 |
| 197 | Hervé Bourlard | 10 | 5 | 141 | 72 | 3 | 2 | >20 | 277 | 1.406 | 940 | 4.772 |
| 195 | Mari Ostendorf | 29 | 15 | 100 | 51 | 5 | 3 | >20 | 309 | 1.585 | 2,136 | 10.954 |
| 195 | Tatsuya Kawahara | 33 | 17 | 110 | 56 | 0 | 0 | >20 | 248 | 1.272 | 708 | 3.631 |
| 192 | Björn W. Schuller | 40 | 21 | 105 | 55 | 0 | 0 | >20 | 511 | 2.661 | 1,583 | 8.245 |
| 188 | Keikichi Hirose | 28 | 15 | 95 | 51 | 1 | 1 | >20 | 140 | 0.745 | 330 | 1.755 |
| 183 | Frank K. Soong | 9 | 5 | 78 | 43 | 0 | 0 | >20 | 208 | 1.137 | 1,240 | 6.776 |
| 182 | Kiyohiro Shikano | 1 | 1 | 142 | 78 | 0 | 0 | >20 | 276 | 1.516 | 1,161 | 6.379 |
| 180 | Timothy Baldwin | 21 | 12 | 115 | 64 | 4 | 2 | >20 | 216 | 1.2 | 1,160 | 6.444 |
The 20 most cited authors in the past 5 years (2016–2020).
| Rank | Author | # Citations | # Papers | Citations per paper | Self-citations (%) |
|---|---|---|---|---|---|
| 1 | Christopher D. Manning | 9,148 | 152 | 60.184 | 0.875 |
| 2 | Richard Socher | 4,404 | 37 | 119.027 | 0.749 |
| 3 | Kenton Lee | 4,250 | 21 | 202.381 | 0.729 |
| 4 | Christopher Dyer | 3,881 | 114 | 34.044 | 3.015 |
| 5 | Luke S. Zettlemoyer | 3,640 | 92 | 39.565 | 3.407 |
| 6 | Sanjeev Khudanpur | 3,168 | 135 | 23.467 | 5.966 |
| 7 | Kristina Toutanova | 3,154 | 47 | 67.106 | 0.254 |
| 8 | Noah A. Smith | 3,115 | 160 | 19.469 | 4.687 |
| 9 | Ming Wei Chang | 2,990 | 31 | 96.452 | 1.204 |
| 10 | Daniel Povey | 2,852 | 112 | 25.464 | 6.872 |
| 11 | Jacob Devlin | 2,836 | 20 | 141.8 | 0.353 |
| 12 | Jeffrey Pennington | 2,586 | 2 | 1293 | 0 |
| 13 | Percy Liang | 2,312 | 56 | 41.286 | 3.287 |
| 14 | Dong Yu | 2,238 | 177 | 12.644 | 6.702 |
| 15 | Tomáš Mikolov | 2,232 | 18 | 124 | 0.314 |
| 16 | Yoshua Bengio | 2,170 | 47 | 46.17 | 2.074 |
| 17 | Mirella Lapata | 2,106 | 138 | 15.261 | 7.123 |
| 18 | Daniel Jurafsky | 2,002 | 118 | 16.966 | 1.049 |
| 19 | Eduard H. Hovy | 1,970 | 168 | 11.726 | 2.69 |
| 20 | Yoav Goldberg | 1,860 | 72 | 25.833 | 2.527 |
List of the 20 authors with the largest h-index up to 2020 in comparison with 2015.
| Rank (2020) | Rank (2015) | Author | h-index (2020) | h-index (2015) |
|---|---|---|---|---|
| 1 | 1 | Christopher D. Manning | 49 | 32 |
| 2 | 12 | Noah A. Smith | 36 | 22 |
| 3 | 4 | Dan Klein | 35 | 25 |
| 4 | 2 | Hermann Ney | 34 | 29 |
| 5 | 12 | Daniel Jurafsky | 33 | 22 |
| 6 | 15 | Mirella Lapata | 33 | 21 |
| 7 | 12 | Li Deng | 33 | 22 |
| 8 | 3 | Andreas Stolcke | 32 | 28 |
| 9 | >20 | Christopher Dyer | 31 | |
| 10 | >20 | Luke S. Zettlemoyer | 31 | |
| 11 | >20 | Kevin Knight | 29 | |
| 12 | 5 | Michael John Collins | 29 | 24 |
| 13 | >20 | Dan Roth | 28 | |
| 14 | >20 | Dong Yu | 28 | |
| 15 | >20 | Regina Barzilay | 27 | |
| 16 | 12 | Stephen J. Young | 27 | 22 |
| 17 | >20 | Eduard H. Hovy | 27 | |
| 18 | >20 | Daniel Povey | 27 | |
| 19 | 15 | Joakim Nivre | 27 | 21 |
| 20 | >20 | Deliang Wang | 26 | |
List of the 20 authors with the largest h-index for the past 5 years (2016–2020).
| Rank | Author | h-index (2016–2020) |
|---|---|---|
| 1 | Christopher D. Manning | 38 |
| 2 | Noah A. Smith | 31 |
| 3 | Christopher Dyer | 29 |
| 4 | Luke S. Zettlemoyer | 28 |
| 5 | Mirella Lapata | 26 |
| 6 | Daniel Jurafsky | 23 |
| 7 | Dong Yu | 23 |
| 8 | Daniel Povey | 22 |
| 9 | Tara N. Sainath | 22 |
| 10 | Dan Klein | 22 |
| 11 | Yoav Goldberg | 22 |
| 12 | Percy Liang | 21 |
| 13 | Dan Roth | 21 |
| 14 | Yang Liu | 21 |
| 15 | Shinji Watanabe | 20 |
| 16 | Sanjeev Khudanpur | 20 |
| 17 | Regina Barzilay | 20 |
| 18 | Deliang Wang | 20 |
| 19 | Björn W. Schuller | 20 |
| 20 | Yue Zhang | 20 |
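The h-index used in these two tables is the largest number h such that an author has h papers each cited at least h times (here presumably counting citations from within the NLP4NLP+5 corpus, as for the other citation statistics). A minimal Python sketch of the definition, on made-up citation counts:

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for position, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites < position:
            break
        h = position
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 3, 1]))        # 2
print(h_index([]))                # 0
```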
The 20 most cited papers up to 2020.
| Rank (2020) | Rank (2015) | Title | Authors | Source | Year | # Citations (up to 2020) | # Citations (up to 2015) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | BLEU: a Method for Automatic Evaluation of Machine Translation | Kishore A. Papineni, Salim Roukos, Todd R. Ward, Wei Jing Zhu | acl | 2002 | 3,020 | 1514 |
| 2 | >20 | GloVe: Global Vectors for Word Representation | Jeffrey Pennington, Richard Socher, Christopher D. Manning | emnlp | 2014 | 2,590 | |
| 3 | 0 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming Wei Chang, Kenton Lee, Kristina Toutanova | naacl | 2019 | 2,468 | |
| 4 | 2 | Building a Large Annotated Corpus of English: The Penn Treebank | Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz | cl | 1993 | 1,610 | 1145 |
| 5 | 3 | Moses: Open Source Toolkit for Statistical Machine Translation | Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Christopher Dyer, Ondrej Bojar, Alexandra Constantin, Evan Herbst | acl | 2007 | 1,380 | 860 |
| 6 | 5 | SRILM - an extensible language modeling toolkit | Andreas Stolcke | isca | 2002 | 1,319 | 831 |
| 7 | >20 | Front-End Factor Analysis for Speaker Verification | Najim Dehak, Patrick J. Kenny, Réda Dehak, Pierre Dumouchel, Pierre Ouellet | taslp | 2011 | 1,170 | |
| 8 | 0 | Deep Contextualized Word Representations | Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke S. Zettlemoyer | naacl | 2018 | 1,166 | |
| 9 | 4 | A Systematic Comparison of Various Statistical Alignment Models | Franz Josef Och, Hermann Ney | cl | 2003 | 1,079 | 855 |
| 10 | 6 | Statistical Phrase-Based Translation | Philipp Koehn, Franz Josef Och, Daniel Marcu | hlt, naacl | 2003 | 1,038 | 829 |
| 11 | 7 | The Mathematics of Statistical Machine Translation: Parameter Estimation | Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer | cl | 1993 | 978 | 820 |
| 12 | 0 | Effective Approaches to Attention-based Neural Machine Translation | Thang Luong, Hieu Pham, Christopher D. Manning | emnlp | 2015 | 907 | |
| 13 | 8 | Minimum Error Rate Training in Statistical Machine Translation | Franz Josef Och | acl | 2003 | 879 | 726 |
| 14 | >20 | Convolutional Neural Networks for Sentence Classification | Yoon Chul Kim | emnlp | 2014 | 862 | |
| 15 | 0 | Neural Machine Translation of Rare Words with Subword Units | Rico Sennrich, Barry Haddow, Alexandra Birch | acl | 2016 | 836 | |
| 16 | >20 | WordNet: A Lexical Database for English | George A. Miller | hlt | 1992 | 814 | |
| 17 | >20 | Spoken Language Translation | Hwee Tou Ng | emnlp | 1997 | 774 | |
| 18 | 15 | Europarl: A Parallel Corpus for Statistical Machine Translation | Philipp Koehn | mts | 2005 | 760 | 472 |
| 19 | 10 | Suppression of acoustic noise in speech using spectral subtraction | Steven F. Boll | taslp | 1979 | 728 | 566 |
| 20 | 13 | Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator | Yariv Ephraim, David Malah | taslp | 1984 | 708 | 488 |
The 20 most cited papers in the past 5 years (2016–2020).
| Rank | Title | Source | Year | Authors | # Citations (2016–2020) |
|---|---|---|---|---|---|
| 1 | GloVe: Global Vectors for Word Representation | emnlp | 2014 | Jeffrey Pennington, Richard Socher, Christopher D. Manning | 2,486 |
| 2 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | naacl | 2019 | Jacob Devlin, Ming Wei Chang, Kenton Lee, Kristina Toutanova | 2,468 |
| 3 | BLEU: a Method for Automatic Evaluation of Machine Translation | acl | 2002 | Kishore A. Papineni, Salim Roukos, Todd R. Ward, Wei Jing Zhu | 1,491 |
| 4 | Deep Contextualized Word Representations | naacl | 2018 | Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke S. Zettlemoyer | 1,166 |
| 5 | Effective Approaches to Attention-based Neural Machine Translation | emnlp | 2015 | Thang Luong, Hieu Pham, Christopher D. Manning | 907 |
| 6 | Neural Machine Translation of Rare Words with Subword Units | acl | 2016 | Rico Sennrich, Barry Haddow, Alexandra Birch | 836 |
| 7 | Convolutional Neural Networks for Sentence Classification | emnlp | 2014 | Yoon Chul Kim | 820 |
| 8 | Front-End Factor Analysis for Speaker Verification | taslp | 2011 | Najim Dehak, Patrick J. Kenny, Réda Dehak, Pierre Dumouchel, Pierre Ouellet | 738 |
| 9 | Enriching Word Vectors with Subword Information | tacl | 2017 | Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomáš Mikolov | 687 |
| 10 | Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation | emnlp | 2014 | Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio | 566 |
| 11 | SQuAD: 100,000+ Questions for Machine Comprehension of Text | emnlp | 2016 | Pranav Rajpurkar, Jian Justin Zhang, Konstantin Lopyrev, Percy Liang | 556 |
| 12 | Moses: Open Source Toolkit for Statistical Machine Translation | acl | 2007 | Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Christopher Dyer, Ondrej Bojar, Alexandra Constantin, Evan Herbst | 505 |
| 13 | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | emnlp | 2013 | Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, Christopher Potts | 488 |
| 14 | LibriSpeech: An ASR Corpus Based on Public Domain Audio Books | icassps | 2015 | Vassil Panayotov, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur | 474 |
| 15 | Recurrent neural network-based language model | isca | 2010 | Tomáš Mikolov, Martin Karafiát, Lukás Burget, Jan Honza Cernocký, Sanjeev Khudanpur | 472 |
| 16 | WordNet: A Lexical Database for English | hlt | 1992 | George A. Miller | 456 |
| 17 | Get To The Point: Summarization with Pointer-Generator Networks | acl | 2017 | Abigail See, Peter J. Liu, Christopher D. Manning | 455 |
| 18 | Building a Large Annotated Corpus of English: The Penn Treebank | cl | 1993 | Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz | 447 |
| 19 | A large annotated corpus for learning natural language inference | emnlp | 2015 | Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning | 446 |
| 20 | Neural Architectures for Named Entity Recognition | naacl | 2016 | Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Christopher Dyer | 432 |
Figure 17. Number of NLP (blue) vs. Speech (red) articles being cited over time.
Figure 18. Percentage of NLP (blue) vs. Speech (red) articles being cited over time.
Figure 19. Number of references to papers of 6 major conferences over the years.
Figure 20. Number of references to papers of 6 major journals over the years.
Figure 21. Mean degree of authors citing authors in general for the 34 sources in 2015 (blue) and 2020 (red).
Figure 22. Mean degree of authors being cited for the 34 sources in 2015 (blue) and 2020 (red).
Figure 23. Mean degree of papers citing papers in general for the 34 sources in 2015 (blue) and 2020 (red).
Figure 24. Mean degree of papers being cited for the 34 sources in 2015 (blue) and 2020 (red).
Figure 25. Internal h-index of the 34 sources in 2015 (blue) and 2020 (red).
Figure 26. Total number of references (blue) and of NLP4NLP references (red) in NLP4NLP papers yearly.
Figure 27. Percentage of NLP4NLP papers in the references.
Figure 28. Average number of references per paper globally (blue) or only to NLP4NLP papers (red).
Figure 29. Number of references to arXiv preprints.
Figure 30. Number of references related to AI, neural networks, and machine-learning sources external to NLP4NLP.
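Ranking of the 28 top sources according to their Google Scholar h5-index over the past 5 years (2016–2020), in comparison with the previous ranking over 2011–2015.
| Rank (2016–2020) | Rank (2011–2015) | Source | h5-index (2016–2020) | h5-median (2016–2020) | h5-index (2011–2015) | h5-median (2011–2015) |
|---|---|---|---|---|---|---|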
| 1 | 1 | Meeting of the Association for Computational Linguistics (ACL) | 157 | 275 | 65 | 99 |
| 2 | 2 | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 132 | 235 | 56 | 81 |
| 3 | 5 | Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL) | 105 | 195 | 48 | 71 |
| 4 | 3 | IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 96 | 143 | 54 | 73 |
| 5 | 6 | Conference of the International Speech Communication Association (INTERSPEECH) | 89 | 150 | 39 | 70 |
| 6 | 8 | International Conference on Computational Linguistics (COLING) | 64 | 103 | 38 | 59 |
| 7 | 4 | IEEE/ACM Transactions on Audio, Speech, and Language Processing | 60 | 87 | 51 | 78 |
| 8 | | Transactions of the Association for Computational Linguistics (TACL) | 59 | 136 | | |
| 9 | 7 | International Conference on Language Resources and Evaluation (LREC) | 53 | 81 | 38 | 64 |
| 10 | 15 | International Workshop on Semantic Evaluation (SEMEVAL) | 52 | 93 | 23 | 41 |
| 10 | 16 | Conference of the European Chapter of the Association for Computational Linguistics (EACL) | 52 | 98 | 21 | 34 |
| 12 | 20 | Workshop on Machine Translation (WMT) | 47 | 74 | 18 | 24 |
| 13 | 13 | Conference on Computational Natural Language Learning (CoNLL) | 43 | 77 | 24 | 36 |
| 14 | 10 | Computer Speech & Language (CSL) | 34 | 49 | 32 | 51 |
| 14 | 19 | Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) | 34 | 51 | 18 | 27 |
| 16 | | IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) | 33 | 52 | | |
| 16 | 18 | IEEE Spoken Language Technology Workshop (SLT) | 33 | 58 | 18 | 28 |
| 18 | 12 | Computational Linguistics (CL) | 30 | 48 | 31 | 40 |
| 18 | 17 | International Joint Conference on Natural Language Processing (IJCNLP) | 30 | 48 | 20 | 27 |
| 20 | 11 | Speech Communication | 28 | 49 | 32 | 49 |
| 21 | | Workshop on Representation Learning for NLP | 27 | 72 | | |
| 22 | | Biomedical Natural Language Processing | 26 | 37 | | |
| 23 | | Workshop on Innovative Use of NLP for Building Educational Applications | 25 | 34 | | |
| 24 | 14 | Language Resources and Evaluation (LRE) | 24 | 36 | 23 | 42 |
| 24 | | Odyssey: The Speaker and Language Recognition Workshop | 24 | 45 | | |
| 24 | | International Conference on Natural Language Generation (INLG) | 24 | 35 | | |
| 27 | | Natural Language Engineering | 23 | 48 | | |
| 28 | | IEEE International Conference on Semantic Computing | 22 | 31 | | |
According to Google Scholar, “h5-index is the h-index for articles published in the last 5 complete years. It is the largest number h such that h articles published in 2016–2020 have at least h citations each. h5-median for a publication is the median number of citations for the articles that make up its h5-index”.
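The definition quoted above translates directly into code: keep the articles of the 5-year window, sort their citation counts in decreasing order, find the largest h such that the h-th article has at least h citations, and take the median of those h counts. A minimal Python sketch; the (year, citations) input format and the toy venue are hypothetical, and averaging the two middle values of an even-sized core (as `statistics.median` does) is an assumption about Google Scholar's computation.

```python
from statistics import median


def h5_metrics(articles, first_year=2016, last_year=2020):
    """Return (h5-index, h5-median) for one venue.

    articles: list of (publication_year, citation_count) pairs.
    """
    window = sorted(
        (cites for year, cites in articles if first_year <= year <= last_year),
        reverse=True,
    )
    h = 0
    for position, cites in enumerate(window, start=1):
        if cites < position:
            break
        h = position
    core = window[:h]  # the h articles that make up the h5-index
    return h, (median(core) if core else 0)


# Hypothetical venue: the 2015 article falls outside the 2016-2020 window.
venue = [(2015, 500), (2016, 40), (2017, 9), (2018, 5), (2019, 3), (2020, 1)]
print(h5_metrics(venue))  # (3, 9)
```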
The number of 25 most frequent terms up to 2020 overall, with number of occurrences and existences, frequency and presence, in comparison with 2015 (terms marked in green are those which progressed in frequency).
| Rank | Term | Occurrences | Frequency | Existences | Presence | Avg. occurrences per document | Rank in 2015 | Rank change |
|---|---|---|---|---|---|---|---|---|
| 1 | Dataset | 240,691 | 0.00758 | 24,288 | 0.28969 | 9.91 | 11 | 10 |
| 2 | Annotation | 187,175 | 0.00589 | 19,942 | 0.23786 | 9.39 | 4 | 2 |
| 3 | SR | 179,579 | 0.00566 | 25,916 | 0.30911 | 6.93 | 2 | −1 |
| 4 | LM | 164,944 | 0.00519 | 19,139 | 0.22828 | 8.62 | 3 | −1 |
| 5 | HMM | 155,335 | 0.00489 | 17,131 | 0.20433 | 9.07 | 1 | −4 |
| 6 | Embedding | 145,844 | 0.00459 | 11,804 | 0.14079 | 12.36 | 29 | 23 |
| 7 | Classifier | 143,885 | 0.00453 | 18,540 | 0.22114 | 7.76 | 6 | −1 |
| 8 | POS | 135,022 | 0.00425 | 18,946 | 0.22598 | 7.13 | 5 | −3 |
| 9 | NP | 111,726 | 0.00352 | 12,139 | 0.14479 | 9.20 | 7 | −2 |
| 10 | Parser | 107,678 | 0.00339 | 12,071 | 0.14398 | 8.92 | 8 | −2 |
| 11 | Neural network | 97,039 | 0.00306 | 18,724 | 0.22333 | 5.18 | 17 | 6 |
| 12 | Metric | 95,056 | 0.00299 | 20,451 | 0.24393 | 4.65 | 18 | 6 |
| 13 | Segmentation | 94,888 | 0.00299 | 14,033 | 0.16738 | 6.76 | 9 | −4 |
| 14 | SNR | 90,820 | 0.00286 | 8,517 | 0.10159 | 10.66 | 10 | −4 |
| 15 | MT | 88,790 | 0.0028 | 13,603 | 0.16225 | 6.53 | 15 | 0 |
| 16 | Parsing | 75,189 | 0.00237 | 12,551 | 0.1497 | 5.99 | 13 | −3 |
| 17 | DNN | 74,921 | 0.00236 | 5,740 | 0.06846 | 13.05 | 63 | 46 |
| 18 | GMM | 74,820 | 0.00236 | 8,203 | 0.09784 | 9.12 | 14 | −4 |
| 19 | ngram | 73,159 | 0.0023 | 11,285 | 0.1346 | 6.48 | 21 | 2 |
| 20 | Semantic | 70,186 | 0.00221 | 16,697 | 0.19915 | 4.20 | 12 | −8 |
| 21 | Decoder | 69,385 | 0.00219 | 10,274 | 0.12254 | 6.75 | 71 | 50 |
| 22 | WER | 69,297 | 0.00218 | 8,547 | 0.10194 | 8.11 | 20 | −2 |
| 23 | LSTM | 68,445 | 0.00216 | 7,090 | 0.08457 | 9.65 | 145 | 122 |
| 24 | SVM | 67,610 | 0.00213 | 9,005 | 0.10741 | 7.51 | 19 | −5 |
| 25 | Iteration | 65,686 | 0.00207 | 15,372 | 0.18335 | 4.27 | 16 | −9 |
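The columns of the term table can be derived from per-document term counts. The sketch below illustrates this under stated assumptions and is not the NLP4NLP+5 pipeline. It makes three choices:

- Presence is existences divided by the number of documents.
- Frequency is occurrences divided by the total occurrences of all counted terms. This is an assumption, since the table does not state the denominator.
- The average column is occurrences divided by existences, which matches the published figures (240,691 / 24,288 ≈ 9.91 for Dataset).

The rank change is the 2015 rank minus the current rank (for Embedding, 29 − 6 = 23). The function names are hypothetical.

```python
from collections import Counter


def term_statistics(docs):
    """docs: one Counter per document mapping term -> number of occurrences.

    Returns {term: (occurrences, frequency, existences, presence, avg_per_doc)}.
    Assumption: frequency = occurrences / total occurrences of all counted terms.
    """
    occurrences = Counter()
    existences = Counter()  # number of documents containing the term
    for doc in docs:
        for term, count in doc.items():
            if count > 0:
                occurrences[term] += count
                existences[term] += 1
    total = sum(occurrences.values())
    n_docs = len(docs)
    return {
        term: (occ, occ / total, existences[term],
               existences[term] / n_docs,   # presence
               occ / existences[term])      # avg occurrences where present
        for term, occ in occurrences.items()
    }


def rank_change(rank_2015, rank_2020):
    """Positive when the term moved up the ranking."""
    return rank_2015 - rank_2020


# Checks against the published table:
print(round(240_691 / 24_288, 2))  # 9.91 (Dataset)
print(rank_change(29, 6))          # 23 (Embedding)
```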
Figure 31. Overview of the GapChart (2000 to 2011) illustrating the parameters. Years appear on the X-axis, and ordered terms (here according to their presence) appear on the Y-axis (10 terms in each color).
Figure 32. Evolution of the top 25 terms over the past 10 years (2010 to 2020) according to their presence (raw ranking without smoothing).
Figure 33. Evolution of the terms “HMM” (in green) and “Neural Network” (in blue) over the past 30 years (1990 to 2020) according to their presence in the papers.
Figure 34. Evolution of the terms “LSTM” (brown), “RNN” (green), “DNN” (blue) and “CNN” (red) over the past 5 years (from the 100th rank in 2015 to the 30th in 2020), according to their presence.
Figure 36. Evolution of the terms “softmax” (blue), “hyperparameter” (red) and “epoch” (green) over the past 5 years (from the 100th rank in 2015 to the 20th in 2020), according to their presence.
Figure 37. (A) Tag cloud based on the abstracts of 2015. (B) Tag cloud based on the abstracts of 2020.
Research topic prediction based on term frequency using the selected Weka algorithm.
| Rank | Term | Frequency | Term | Frequency | Term | Frequency | Term | Frequency |
|---|---|---|---|---|---|---|---|---|
| 1 | DATASET | 0.019411 | Dataset | 0.019293 | Embedding | 0.020756 | Dataset | 0.020833 |
| 2 | EMBEDDING | 0.012028 | Embedding | 0.018099 | Dataset | 0.017509 | Embedding | 0.015237 |
| 3 | ANNOTATION | 0.008888 | Encoder | 0.009572 | Encoder | 0.011884 | BERT | 0.01076 |
| 4 | LSTM | 0.008571 | LSTM | 0.008271 | BERT | 0.008609 | Annotation | 0.009168 |
| 5 | DNN | 0.006005 | Decoder | 0.007093 | Decoder | 0.008261 | Encoder | 0.009156 |
| 6 | SR | 0.005689 | LM | 0.006079 | Classifier | 0.007376 | LM | 0.006342 |
| 7 | RNN | 0.005585 | Metric | 0.005929 | LM | 0.006825 | Transformer | 0.006299 |
| 8 | Encoder | 0.005373 | BERT | 0.005745 | Metric | 0.006738 | SR | 0.006232 |
| 9 | Classifier | 0.005365 | SR | 0.005388 | LSTM | 0.006276 | Metric | 0.00604 |
| 10 | Neural network | 0.005334 | Annotation | 0.005326 | Transformer | 0.004887 | LSTM | 0.005866 |
Predictions are marked in green.
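The caption does not name the Weka algorithm that was selected. The following sketch therefore substitutes a deliberately simple technique. It fits an ordinary least-squares linear trend to each term's yearly frequency, extrapolates that trend to a target year, and then ranks the terms by forecast frequency. It only illustrates the general shape of a frequency-based topic prediction and is not the authors' method. `linear_forecast`, `predict_top_terms` and the input layout are hypothetical.

```python
def linear_forecast(years, values, target_year):
    """Ordinary least-squares line through (year, value), evaluated at target_year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx if sxx else 0.0  # a single year gives a flat forecast
    return mean_y + slope * (target_year - mean_x)


def predict_top_terms(history, target_year, k=10):
    """history: {term: {year: frequency}}; returns the k (term, forecast)
    pairs with the highest forecast frequency."""
    forecasts = {}
    for term, series in history.items():
        years = sorted(series)
        values = [series[y] for y in years]
        # Frequencies cannot be negative, so clip the extrapolation at 0.
        forecasts[term] = max(0.0, linear_forecast(years, values, target_year))
    return sorted(forecasts.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy data (not from the corpus): one rising and one declining term.
history = {
    "rising": {2016: 0.01, 2017: 0.02, 2018: 0.03},
    "declining": {2016: 0.03, 2017: 0.02, 2018: 0.01},
}
print(predict_top_terms(history, 2019, k=1))  # "rising" first, forecast about 0.04
```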
Comparison of the terms predicted in 2015 for the next 5 years (2016–2020) with the actual observations for those years (predictions are in italics; terms correctly predicted to appear among the top 10 terms are marked in yellow, and the term correctly predicted at its rank is marked in green).
| Rank | | | | | |
|---|---|---|---|---|---|
| 1 | Dataset | Dataset | Dataset | Dataset | Dataset |
| 2 | Annotation | Embedding | Embedding | Embedding | Embedding |
| 3 | DNN | DNN | Annotation | Encoder | BERT |
| 4 | embedding | LSTM | LSTM | LSTM | Annotation |
| 5 | SR | SR | DNN | Decoder | Encoder |
| 6 | LSTM | RNN | SR | LM | LM |
| 7 | POS | Annotation | RNN | Metric | Transformer |
| 8 | Classifier | Neural network | Encoder | BERT | SR |
| 9 | Neural network | Classifier | Classifier | SR | Metric |
| 10 | RNN | LM | Neural network | Annotation | LSTM |
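The caption distinguishes two kinds of success: a term predicted to appear among the top 10 terms, and a term predicted at its exact rank. Both can be counted as shown below. The `rank_distance` function is a hypothetical example of a prediction-to-observation distance: it sums the absolute rank differences and applies a fixed penalty for missed terms. The distance actually plotted in Figure 38 is not specified here and may differ.

```python
def compare_rankings(predicted, observed):
    """Return (predicted terms that appear anywhere in the observed top list,
    predicted terms placed at exactly their observed rank)."""
    observed_set = set(observed)
    in_top = [term for term in predicted if term in observed_set]
    at_rank = [p for p, o in zip(predicted, observed) if p == o]
    return in_top, at_rank


def rank_distance(predicted, observed):
    """Hypothetical distance: sum of absolute rank differences, where a
    predicted term missing from the observed list costs len(observed)."""
    position = {term: i for i, term in enumerate(observed)}
    k = len(observed)
    return sum(abs(i - position[term]) if term in position else k
               for i, term in enumerate(predicted))


# Toy lists, not actual predictions from the table.
predicted = ["Dataset", "Annotation", "DNN"]
observed = ["Dataset", "Embedding", "Annotation"]
print(compare_rankings(predicted, observed))  # (['Dataset', 'Annotation'], ['Dataset'])
print(rank_distance(predicted, observed))     # 4
```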
Figure 38. Evolution of the distance between prediction and observation over the years.
Figure 39. Measure of the expectation of an emerging research topic: Deep Neural Networks (DNN).
Predictions for the next 5 years (2021–2025).
| Rank | | | | | | | |
|---|---|---|---|---|---|---|---|
| 1 | Dataset | Dataset | Dataset | Dataset | Dataset | Dataset | Dataset |
| 2 | Embedding | Embedding | Embedding | Embedding | BERT | BERT | BERT |
| 3 | Encoder | BERT | BERT | BERT | Embedding | Embedding | Embedding |
| 4 | LSTM | Annotation | Encoder | Annotation | Encoder | Annotation | Transformer |
| 5 | Decoder | Encoder | Transformer | Encoder | Transformer | Transformer | Encoder |
| 6 | LM | LM | LM | transformer | LM | Encoder | LM |
| 7 | Metric | Transformer | Annotation | LM | Annotation | LM | Annotation |
| 8 | BERT | SR | Metric | Metric | Metric | Metric | Metric |
| 9 | SR | Metric | SR | SR | SR | SR | SR |
| 10 | Annotation | LSTM | Decoder | Decoder | Decoder | Annotator | Decoder |
Predictions are marked in green.
The 10 most present terms in 2020, with their variants, the date, authors, and publications where they were first introduced, their number of occurrences and existences in 2020, and their number of occurrences, frequency, number of existences, and presence in the 55-year archive. The table also gives their ranking, the average number of occurrences of the terms in the documents where they appear, and a comparison with the ranking in 2015 (the terms that joined the top 10 are marked in green, while the 5 that dropped out are marked in orange with their new and former rankings).
| Rank 2020 | Rank 2015 | Term | First year | First authors | First publications | Occurrences | Frequency | Existences | Presence | Rank (frequency) | Rank | Avg. occurrences per document | Occurrences 2020 | Existences 2020 | Frequency 2020 | Presence 2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Dataset | 1966 | Laurence Urdang | cath1966-3 | 240,691 | 0.0076 | 24,288 | 0.290 | 1 | 2 | 9.91 | 59,794 | 4,313 | 0.0224 | 0.795 |
| 2 | 30 | Embedding | 1967 | Aravind K. Joshi, Danuta Hiz, Jane J. Robinson, Steven I. Laszlo | C67-1007 C67-1010 C67-1015 | 145,845 | 0.0046 | 11,804 | 0.141 | 6 | 25 | 12.36 | 37,346 | 3,193 | 0.0140 | 0.588 |
| 3 | 2 | Metric | 1965 | A Andreyewsky | C65-1002 | 95,056 | 0.0030 | 20,451 | 0.244 | 12 | 4 | 4.65 | 14,352 | 2,915 | 0.0054 | 0.537 |
| 4 | 7 | Neural network | 1972 | P J. Brown | cath1972-21 | 97,031 | 0.0031 | 18,716 | 0.223 | 11 | 8 | 5.18 | 9,190 | 2,623 | 0.0034 | 0.483 |
| 5 | >200 | Encoder | 1968 | Raymond F. Erickson | cath1968-2 | 62,324 | 0.0020 | 6,874 | 0.082 | 28 | 74 | 9.07 | 21,444 | 2,350 | 0.0080 | 0.433 |
| 6 | 6 | Annotation | 1967 | Kenneth Janda, Martin Kay | cath1967-12 cath1967-8 | 187,175 | 0.0059 | 19,942 | 0.238 | 2 | 5 | 9.39 | 21,751 | 2,160 | 0.0081 | 0.398 |
| 7 | 67 | Hyperparameter | 1989 | G Demoment | taslp1989-131 | 22,593 | 0.0007 | 7,900 | 0.094 | 104 | 58 | 2.86 | 5,232 | 2,110 | 0.0020 | 0.389 |
| 8 | 9 | LM | 1965 | Sheldon Klein | C65-1014 | 164,564 | 0.0052 | 19,080 | 0.228 | 4 | 6 | 8.62 | 14,850 | 1,977 | 0.0056 | 0.364 |
| 9 | 14 | NLP | 1965 | Denis M. Manelski, Gilbert K. Krulee | C65-1018 | 46,094 | 0.0015 | 14,243 | 0.170 | 40 | 14 | 3.24 | 6,978 | 1,946 | 0.0026 | 0.359 |
| 10 | 146 | LSTM | 1999 | Felix A. Gers, Fred Cummins, Juergen Schmidhuber | e99_93 | 68,445 | 0.0022 | 7,090 | 0.085 | 23 | 70 | 9.65 | 13,767 | 1,934 | 0.0051 | 0.356 |
| 12 | 3 | Subset | 1965 | Denis M. Manelski, E. D. Pendergraft, Gilbert K. Krulee, Itiroo Sakai, N. Dale, Wojciech Skalmowski | C65-1006 C65-1018 C65-1021 C65-1025 | 65,243 | 0.0021 | 24,171 | 0.288 | 26 | 29 | 2.70 | 5,239 | 1,913 | 0.0020 | 0.353 |
| 14 | 4 | Classifier | 1967 | Aravind K. Joshi, Danuta Hiz | C67-1007 | 143,885 | 0.0045 | 18,540 | 0.221 | 7 | 13 | 7.76 | 11,125 | 1,847 | 0.0042 | 0.340 |
| 24 | 5 | SR | 1965 | Denis M. Manelski, Dániel Várga, Gilbert K. Krulee, Makoto Nagao, Toshiyuki Sakai | C65-1018 C65-1022 C65-1029 | 179,579 | 0.0056 | 25,916 | 0.309 | 3 | 1 | 6.93 | 14,630 | 1,423 | 0.0055 | 0.262 |
| 27 | 10 | Optimization | 1967 | Ellis B. Page | C67-1032 | 48,412 | 0.0015 | 15,221 | 0.182 | 36 | 13 | 3.18 | 3,514 | 1,356 | 0.0013 | 0.250 |
| 33 | 8 | POS | 1965 | Denis M. Manelski, Dániel Várga, Gilbert K. Krulee, Makoto Nagao, Toshiyuki Sakai | C65-1018 C65-1022 C65-1029 | 135,022 | 0.0042 | 18,946 | 0.226 | 8 | 14 | 7.13 | 7,278 | 1,158 | 0.0027 | 0.213 |
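The green and orange markings in the table above amount to a set difference between two top-10 lists. The sketch below recomputes them from the 2020 and 2015 rankings given in the first two columns of each row. The function name `top_k_turnover` is hypothetical, and encoding ">200" as 201 is an assumption that only matters for being outside the top 10.

```python
def top_k_turnover(rank_new, rank_old, k=10):
    """Return (terms that joined the top k, terms that left it)."""
    top_new = {term for term, rank in rank_new.items() if rank <= k}
    top_old = {term for term, rank in rank_old.items() if rank <= k}
    return sorted(top_new - top_old), sorted(top_old - top_new)


# (term, rank 2020, rank 2015) from the table; ">200" for Encoder is encoded as 201.
ROWS = [
    ("Dataset", 1, 1), ("Embedding", 2, 30), ("Metric", 3, 2),
    ("Neural network", 4, 7), ("Encoder", 5, 201), ("Annotation", 6, 6),
    ("Hyperparameter", 7, 67), ("LM", 8, 9), ("NLP", 9, 14),
    ("LSTM", 10, 146), ("Subset", 12, 3), ("Classifier", 14, 4),
    ("SR", 24, 5), ("Optimization", 27, 10), ("POS", 33, 8),
]
rank_2020 = {term: new for term, new, _ in ROWS}
rank_2015 = {term: old for term, _, old in ROWS}
joined, left = top_k_turnover(rank_2020, rank_2015)
print(joined)  # ['Embedding', 'Encoder', 'Hyperparameter', 'LSTM', 'NLP']
print(left)    # ['Classifier', 'Optimization', 'POS', 'SR', 'Subset']
```

The five terms that left the top 10 are exactly the five rows listed below the top 10 in the table.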
Global ranking of the innovation score of the terms overall and separately for speech and NLP up to 2020.
| Rank | Overall | NLP | Speech |
|---|---|---|---|
| 1 | Speech recognition | Semantic | Speech recognition |
| 2 | Subset | NP | Spectral |
| 3 | Semantic | Syntactic | HMM |
| 4 | LM | POS | Filtering |
| 5 | Filtering | Parsing | Subset |
| 6 | POS | Subset | Acoustics |
| 7 | HMM | Parser | Gaussian |
| 8 | Iteration | Lexical | Fourier |
| 9 | Spectral | Machine translation | Acoustic |
| 10 | Metric | Annotation | Linear |
Figure 40. Cumulative presence of the 10 most important terms over time (% of all papers).
Global ranking of the authors overall and separately for speech and NLP up to 2020.
| Rank | Overall | NLP | Speech |
|---|---|---|---|
| 1 | Lawrence R. Rabiner | Ralph Grishman | Lawrence R. Rabiner |
| 2 | Hermann Ney | Jun'Ichi Tsujii | Shrikanth S. Narayanan |
| 3 | Shrikanth S. Narayanan | Kathleen R. Mckeown | John H. L. Hansen |
| 4 | John H. L. Hansen | Aravind K. Joshi | Hermann Ney |
| 5 | Chin Hui P. Lee | Christopher D. Manning | Chin Hui P. Lee |
| 6 | Haizhou Li | Mark A. Johnson | Haizhou Li |
| 7 | Mark J. F. Gales | Noah A. Smith | Mark J. F. Gales |
| 8 | Mari Ostendorf | Ralph M. Weischedel | Li Deng |
| 9 | Li Deng | Eduard H. Hovy | Hervé Bourlard |
| 10 | Alex Waibel | Timothy Baldwin | Frank K. Soong |
Global ranking of the sources overall and separately for speech and NLP up to 2020.
| Rank | Overall | NLP | Speech |
|---|---|---|---|
| 1 | isca | acl | isca |
| 2 | taslp | coling | taslp |
| 3 | icassps | lrec | icassps |
| 4 | acl | emnlp | lrec |
| 5 | coling | cath | csal |
| 6 | lrec | cl | speechc |
| 7 | emnlp | hlt | mts |
| 8 | hlt | eacl | lre |
| 9 | cl | trec | ltc |
| 10 | csal | naacl | acmtslp |
Figure 41. Authors' contributions to “HMM” in speech and NLP (% of topical papers).
Figure 42. Sources' contributions to “HMM” in speech and NLP (% of topical papers).
Figure 43. Percentages of authors (blue) and of papers (red) mentioning “DNN” in speech and NLP over the years (% of all papers).
Figure 44. Cumulative authors' contributions to the study of “DNN” in speech and NLP (% of topical papers).
Figure 45. Main contribution areas for Dong Yu (% of topical papers).
Figure 46. Cumulative sources' contributions to “DNN” in speech processing and NLP (% of topical papers).
Figure 47. Percentages of authors (blue) and of papers (red) mentioning “Embedding” in speech and NLP (% of all papers).
Figure 48. Cumulative authors' contributions to the study of “Embedding” in speech and NLP (% of topical papers).
Figure 49. Cumulative sources' contributions to the study of “Embedding” in speech and NLP (% of topical papers).
Figure 50. Main contributions of the EMNLP conference series (% of topical papers).
Figure 51. Evolution of the number of papers and of mentions of language resources in papers over the years.
Figure 52. Evolution of the ratio between the number of mentions of language resources in papers and the number of papers over the years.
Presence of the LRE Map language resources in NLP4NLP+5 articles (2020 compared with 2015).
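| Rank 2020 | Rank 2015 | Language resource | Type | Existences | Occurrences | Authors of first mention | Source of first mention | Year of first mention | Last year of mention |
|---|---|---|---|---|---|---|---|---|---|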
| 1 | 3 | Wikipedia | NLPCorpus | 6,348 | 36,695 | Ana Licuanan, Jinxi Xu, Ralph M. Weischedel | trec | 2003 | 2020 |
| 2 | 1 | WordNet | NLPLexicon | 5,803 | 37,654 | Kenji Sakamoto, Kouichi Yamaguchi, Toshio Akabane, Yoshiji Fujimoto | isca | 1990 | 2020 |
| 3 | >10 | BLEU | NLPSpecification | 4,595 | 42,311 | Ludovic Lebart | modulad | 2001 | 2020 |
| 4 | 2 | Timit | NLPCorpus | 3,982 | 15,984 | Andrej Ljolje, Benjamin Chigier, David Goodine, David S. Pallett, Erik Urdang, Fileno Alleva, Francine R. Chen, George R. Doddington, Hong C. Leung, Hsiao Wuen Hon, James L. Hieronymus, James R. Glass, Jan Robin Rohlicek, Jeff Shrager, Jeffrey N. Marcus, John Dowding, John F. Pitrelli, John S. Garofolo, Joseph H. Polifroni, Judith R. Spitz, Julia B. Hirschberg, Kai Fu Lee, L. G. Miller, Mari Ostendorf, Mark Liberman, Meiyuh Hwang, Michael D. Riley, Michael S. Phillips, Robert Weide, Stephanie Seneff, Stephen E. Levinson, Vassilios V. Digalakis, Victor W. Zue | hlt, isca, taslp | 1989 | 2020 |
| 5 | 4 | Penn Treebank | NLPCorpus | 2,786 | 10,622 | Beatrice Santorini, David M. Magerman, Eric Brill, Mitchell P. Marcus | hlt | 1990 | 2020 |
| 6 | >10 | Word2Vec | NLPTool | 2,536 | 8,245 | Allan Hanbury, Amir Globerson, Angelina Ivanova, Baobao Chang, Bin Gao, Bing Qin, Bo Tang, Brigitte Grau, Bruno Martins, Bryan Rink, Carina Silberer, Carlos Guestrin, Carmen Banea, Chengqing Zong, Christopher D. Manning, Chuchu Huang, Claire Cardie, Cícero Nogueira Dos Santos, D. Song, Dakun Zhang, Daniel Zeman, Daniel P. Flickinger, Danqi Chen, David B. Bracewell, Daxiang Dong, Deniz Yuret, Di Chen, Dianhai Yu, Dimitri Kartsaklis, Dmitrijs Milajevs, Duyu Tang, Emanuela Boros, Enhong Chen, Fabin Shi, Fei Tian, Filip Ginter, Furu Wei, Georgiana Dinu, Germán Kruszewski, Guang Chen, Guoxin Cui, Haifeng Wang, Haiyang Wu, Hal Daumé Iii, Hanjun Dai, Heike Adel, Hinrich Schütze, Hu Junfeng, Hua Wu, Idan Szpektor, Ido Dagan, Ignacio Cano, Ion Androutsopoulos, Ivan Titov, Jacob Goldberger, Jan Hajic, Janyce M. Wiebe, Jason Weston, Jeffrey Pennington, Jenna Kanerva, Jiajun Zhang, Jiang Bian, Jiang Guo, Jianlin Feng, Jianwen Zhang, Johan Bos, Johannes Bjerva, John Pavlopoulos, Jordan Boyd Graber, João Filgueiras, João Palotti, Juhani Luotolahti, Jun Zhao, Jun Cheng Guo, Kai Hakala, Kang Liu, Karen Livescu, Kazuma Hashimoto, Keith Adams, Kevin Gimpel, Leonardo Claudino, Li Dong, Liheng Xu, Linda Anderson, Liumingjing Xiao, Maira Gatti, Makoto Miwa, Malvina Nissim, Maosong Sun, Marc Tomlinson, Marco Baroni, Marco Kuhlmann, Marek Rei, Mark Dredze, Matthew Purver, Mehrnoosh Sadrzadeh, Michael Mohler, Miguel B. Almeida, Mikhail Kozhevnikov, Ming Zhou, Mirella Lapata, Mo Yu, Mohit Bansal, Mohit Iyyer, Mu Li, Mário J. Silva, Nan Yang, Navid Rekabsaz, Nianwen Xue, Olivier Ferret, Omer Levy, Oren Melamud, P. Zhang, Peng Hsuan Li, Peter Enns, Philip Resnik, Pontus Stenetorp, Qinlong Wang, Rada F. Mihalcea, Regina Barzilay, Richard Socher, Rob Van Der Goot, Romaric Besançon, Rui Zhang, Sameer Singh, Sanda Maria Harabagiu, Shaoda He, Shizhu He, Shujie Liu, Silvio Amir, Stephan Oepen, Sumit Chopra, Suwisa Kaewphan, Tao Ge, Tao Li, Tatsuya Izuha, Ted Briscoe, Tie Yan Liu, Ting Liu, Travis R. Goodwin, Wanxiang Che, Wei He, Weiran Xu, Wen Ting Wang, Wenzhe Pei, Xiaobo Hao, Xiaoguang Hu, Xiaojun Zou, Xiaolei Liu, Xiaozhao Zhao, Xingxing Zhang, Xinxiong Chen, Xueke Xu, Xueqi Cheng, Yang Liu, Yi Zhang, Yoav Goldberg, Yonatan Belinkov, Yongqiang Chen, Yoon Chul Kim, Yoshimasa Tsuruoka, Yuanyuan Qi, Yuanzhe Zhang, Yuchen Zhang, Yue Liu, Yusuke Miyao, Yuta Tsuboi, Zhen Wang, Zheng Chen, Zhenjun Tang, Zhiqiang Toh, Zhiyuan Liu | acl, coling, conll, eacl, emnlp, lrec, sem, tacl, trec | 2014 | 2020 |
| 7 | 5 | Praat | NLPTool | 2,123 | 4,359 | Carlos Gussenhoven, Toni C. M. Rietveld | isca | 1997 | 2020 |
| 8 | >10 | MATLAB | NLPTool | 1,915 | 2,842 | Demosthenis Stavrinides, Michael D. Zoltowski | taslp | 1989 | 2020 |
| 9 | >10 | GloVe | NLPTool | 1,863 | 6,686 | Christopher D. Manning, Jeffrey Pennington, Richard Socher | emnlp | 2014 | 2020 |
| 10 | >10 | AnCora | NLPCorpus | 1,694 | 3,233 | Barbara J. Grosz, Jaime G. Carbonell, Mitchell P. Marcus, Ralph M. Weischedel, Raymond Perrault, Robert Wilensky, Wendy G. Lehnert | hlt | 1989 | 2020 |
Top 10 most mentioned LRE Map language resources per year (2000–2020), in rank order.
| Year | LR mentions | Papers | Top 10 most mentioned language resources |
|---|---|---|---|
| 2000 | 1,923 | 2,118 | Timit, WordNet, RST, HPSG, Penn Treebank, AnCora, EAGLES, ATIS, LFG, Pronunciation Dictionary |
| 2001 | 1,283 | 1,551 | WordNet, Timit, Penn Treebank, NOISEX, ATIS, SENSEVAL, HPSG, MATLAB, Maximum Likelihood Linear Regression, TDT |
| 2002 | 2,200 | 2,074 | WordNet, Timit, Penn Treebank, MATLAB, HPSG, British National Corpus, AnCora, Praat, EAGLES, BAF |
| 2003 | 2,085 | 1,991 | Timit, WordNet, Penn Treebank, BLEU, BAF, AQUAINT, Pronunciation Dictionary, British National Corpus, HPSG, TAG |
| 2004 | 3,633 | 2,695 | WordNet, Timit, Penn Treebank, AnCora, Praat, BLEU, British National Corpus, FrameNet, AQUAINT, EuroWordNet |
| 2005 | 3,453 | 2,416 | WordNet, Timit, BLEU, Penn Treebank, Praat, AQUAINT, GIZA++, MATLAB, Pronunciation Dictionary, ICSI |
| 2006 | 5,681 | 3,101 | WordNet, Timit, BLEU, Penn Treebank, AnCora, Praat, PropBank, AQUAINT, FrameNet, MATLAB |
| 2007 | 4,910 | 2,663 | WordNet, BLEU, Timit, Penn Treebank, GIZA++, Praat, MATLAB, SRILM, GALE, Wikipedia |
| 2008 | 6,582 | 3,208 | WordNet, BLEU, Wikipedia, Timit, Penn Treebank, Praat, AnCora, PropBank, GALE, FrameNet |
| 2009 | 6,067 | 2,919 | WordNet, BLEU, Wikipedia, Timit, Penn Treebank, Praat, SRILM, GALE, Europarl, GIZA++ |
| 2010 | 8,782 | 3,547 | WordNet, Wikipedia, BLEU, Penn Treebank, Timit, AnCora, GIZA++, MATLAB, Europarl, FrameNet |
| 2011 | 6,105 | 2,864 | Wikipedia, WordNet, BLEU, Timit, Penn Treebank, GIZA++, MATLAB, SRILM, Praat, Weka |
| 2012 | 10,097 | 3,663 | Wikipedia, WordNet, BLEU, Timit, Penn Treebank, Praat, Europarl, AnCora, GIZA++, MATLAB |
| 2013 | 8,874 | 3,342 | Wikipedia, WordNet, BLEU, Timit, Penn Treebank, SRILM, Weka, GIZA++, MATLAB, Praat |
| 2014 | 10,793 | 3,663 | Wikipedia, WordNet, BLEU, Timit, Penn Treebank, Praat, AnCora, MATLAB, Weka, SRILM |
| 2015 | 9,932 | 3,568 | Wikipedia, WordNet, Word2Vec, BLEU, Timit, SemEval, MATLAB, Penn Treebank, Praat, Weka |
| 2016 | 11,303 | 3,814 | Wikipedia, Word2Vec, WordNet, BLEU, Timit, Praat, Penn Treebank, MATLAB, AnCora, Europarl |
| 2017 | 7,915 | 3,042 | Wikipedia, Word2Vec, BLEU, WordNet, Timit, GloVe, Praat, MATLAB, Penn Treebank, Keras |
| 2018 | 13,295 | 4,482 | Wikipedia, Word2Vec, BLEU, GloVe, WordNet, Seq2seq, Timit, Penn Treebank, ROUGE, CoreNLP |
| 2019 | 13,461 | 5,003 | Wikipedia, BLEU, GloVe, Word2Vec, Seq2seq, WordNet, ROUGE, Timit, Penn Treebank, SQuAD |
| 2020 | 15,652 | 5,426 | Wikipedia, BLEU, GloVe, Word2Vec, Seq2seq, RoBERTa, WordNet, ROUGE, LibriSpeech, SQuAD |
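The ratio plotted in Figure 52 can be recomputed from the per-year table above. This sketch assumes that, in each row, the first count is the number of language-resource mentions and the second is the number of papers, following the description of Figure 51. The names `LR_COUNTS` and `mentions_per_paper` are hypothetical.

```python
# (LR mentions, papers) per year, copied from the table above.
# Assumption: first count = LR mentions, second count = number of papers.
LR_COUNTS = {
    2000: (1_923, 2_118), 2001: (1_283, 1_551), 2002: (2_200, 2_074),
    2003: (2_085, 1_991), 2004: (3_633, 2_695), 2005: (3_453, 2_416),
    2006: (5_681, 3_101), 2007: (4_910, 2_663), 2008: (6_582, 3_208),
    2009: (6_067, 2_919), 2010: (8_782, 3_547), 2011: (6_105, 2_864),
    2012: (10_097, 3_663), 2013: (8_874, 3_342), 2014: (10_793, 3_663),
    2015: (9_932, 3_568), 2016: (11_303, 3_814), 2017: (7_915, 3_042),
    2018: (13_295, 4_482), 2019: (13_461, 5_003), 2020: (15_652, 5_426),
}


def mentions_per_paper(counts):
    """Ratio of LR mentions to papers for each year, in chronological order."""
    return {year: mentions / papers
            for year, (mentions, papers) in sorted(counts.items())}


ratios = mentions_per_paper(LR_COUNTS)
print(round(ratios[2000], 2), round(ratios[2020], 2))  # 0.91 2.88
```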
Language resources impact factor (data, specifications, and tools).
| Data | Impact factor | Specifications | Impact factor | Tools | Impact factor |
|---|---|---|---|---|---|
| Wikipedia | 6,348 | BLEU | 4,595 | Word2Vec | 2,536 |
| WordNet | 5,803 | ROUGE | 1,335 | Praat | 2,123 |
| Timit | 3,982 | | | MATLAB | 1,915 |
| Penn Treebank | 2,786 | | | GloVe | 1,863 |
| AnCora | 1,694 | | | SRILM | 1,375 |
| Europarl | 1,405 | | | GIZA++ | 1,314 |
| SemEval | 1,257 | | | Weka | 1,220 |
| FrameNet | 1,202 | | | Seq2seq | 1,162 |
| CoNLL | 1,091 | | | | |
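The same counts appear in both the presence table and the impact-factor table: Wikipedia 6,348, WordNet 5,803, BLEU 4,595. This suggests that the impact factor of a language resource is the number of distinct papers that mention it. The sketch below computes that count and groups resources by type. It is an illustration under that assumption, not the NLP4NLP+5 implementation, and the names `impact_factor` and `top_by_type` are hypothetical.

```python
from collections import Counter


def impact_factor(papers):
    """papers: iterable of collections of LR names detected in each paper.
    Assumption: impact factor = number of distinct papers mentioning the LR."""
    counts = Counter()
    for resources in papers:
        counts.update(set(resources))  # count each paper at most once per LR
    return counts


def top_by_type(counts, types, k=10):
    """Group LRs by type (e.g. data, specification, tool) and keep the k
    most mentioned in each group."""
    grouped = {}
    for name, n in counts.most_common():
        grouped.setdefault(types.get(name, "Other"), []).append((name, n))
    return {kind: items[:k] for kind, items in grouped.items()}


# Toy input, not corpus data: BLEU appears twice in the last paper but counts once.
papers = [{"WordNet", "BLEU"}, {"WordNet"}, {"Praat", "WordNet"}, ["BLEU", "BLEU"]]
types = {"WordNet": "Data", "BLEU": "Specification", "Praat": "Tool"}
print(top_by_type(impact_factor(papers), types))
# {'Data': [('WordNet', 3)], 'Specification': [('BLEU', 2)], 'Tool': [('Praat', 1)]}
```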
Figure 53. Percentage of papers reusing a part of other papers over the years.
Figure 54. Percentage of papers being reused by other papers over the years.
Definition of terms.
| | | |
|---|---|---|
| At least one author in both papers | Self-Reuse | Self-Plagiarism |
| No author in common | Reuse | Plagiarism |
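The row labels of the definition table depend only on whether the two papers share at least one author. The column headers did not survive extraction. The sketch below therefore assumes, as a labeled hypothesis, that the columns separate copies whose source is cited (reuse) from copies whose source is not cited (plagiarism). Detecting the copied passage itself is outside the scope of the sketch, and `classify_copy` is a hypothetical name.

```python
def classify_copy(authors_a, authors_b, source_cited):
    """Classify a detected text overlap between two papers.

    Rows of the definition table: whether the papers share at least one author.
    Columns (assumption, headers lost): whether the copied source is cited.
    """
    shared_author = bool(set(authors_a) & set(authors_b))
    if source_cited:
        return "Self-Reuse" if shared_author else "Reuse"
    return "Self-Plagiarism" if shared_author else "Plagiarism"


print(classify_copy(["A. Author", "B. Author"], ["B. Author"], source_cited=True))  # Self-Reuse
print(classify_copy(["A. Author"], ["C. Author"], source_cited=False))             # Plagiarism
```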