Literature DB >> 35434524

Exploring machine learning: a scientometrics approach using bibliometrix and VOSviewer.

David Opeoluwa Oyewola1, Emmanuel Gbenga Dada2.   

Abstract

Machine Learning has found application in solving complex problems in different fields of human endeavors such as intelligent gaming, automated transportation, cyborg technology, environmental protection, enhanced health care, innovation in banking and home security, and smart homes. This research is motivated by the need to explore the global structure of machine learning to ascertain the level of bibliographic coupling, collaboration among research institutions, co-authorship network of countries, and sources coupling in publications on machine learning techniques. The Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) was applied to clustering prediction of authors dominance ranking in this paper. Publications related to machine learning were retrieved and extracted from the Dimensions database with no language restrictions. Bibliometrix was employed in computation and visualization to extract bibliographic information and perform a descriptive analysis. VOSviewer (version 1.6.16) tool was used to construct and visualize structure map of source coupling networks of researchers and co-authorship. About 10,814 research papers on machine learning published from 2010 to 2020 were retrieved for the research. Experimental results showed that the highest degree of betweenness centrality was obtained from cluster 3 with 153.86 from the University of California and Harvard University with 24.70. In cluster 1, the national university of Singapore has the highest degree betweenness of 91.72. Also, in cluster 5, the University of Cambridge (52.24) and imperial college London (4.52) having the highest betweenness centrality manifesting that he could control the collaborative relationship and that they possessed and controlled a large number of research resources. Findings revealed that this work has the potential to provide valuable guidance for new perspectives and future research work in the rapidly developing field of machine learning.
© The Author(s) 2022.

Entities:  

Keywords:  Bibliometrix; Coupling; Machine learning; Scientometrics; VOSviewer

Year:  2022        PMID: 35434524      PMCID: PMC8996204          DOI: 10.1007/s42452-022-05027-7

Source DB:  PubMed          Journal:  SN Appl Sci        ISSN: 2523-3963


Introduction

Machine learning is a branch of science that studies how systems might be taught to learn on their own and continue to get better with time. In this respect, learning is related to the ability to identify sophisticated patterns and make informed judgments using the data [1]. The sub-discipline of machine learning studies various human learning processes as well as the scientific examination of various learning algorithms and methodologies for a variety of application areas [2]. Machine learning research has prompted researchers and businesses to predict massive mortality incidents [3], uncontaminated water management [4], client segmentation in commercial banking [5], text categorization [6], and crop productivity, like cocoa [7]. Machine learning has also been applied in the field of Optimization [8], intrusion detection [9], email spam filtering [10], image processing [11], crude oil price prediction [12], and others. The problem is that describing the set of all potential decisions given all input combinations is too complicated. To address this challenge, the discipline of machine learning creates algorithms that use strong statistical and computational approaches to uncover knowledge from particular data. In this area, supervised and unsupervised learning approaches are used to solve problems such as classification, prediction, regression, clustering, and association. Machine learning has become a cornerstone of digital technologies and, as a result, a vital aspect of our lives in recent years [13]. The Scientometrics of machine learning is explored in this research, a topic that has increased in prominence in recent years. This paper contributes to knowledge by performing a scientometrics investigation of machine learning publications in several fields of study. A summary of recent progress in ML models development, global structural network, and their application to different fields was presented. Scientometric exploration of several ML types of research that have made meaningful impact among the research community was conducted. Moreover, this paper also investigates the global structure of ML, and also apply Bibliometrix and VOSviewer simulation tools to generate and visualize structure map of 10,814 machine learning research papers published between 2010 and 2020. The Hierarchical Density-Based Spatial Clustering Applications with Noise (HDBSCAN) which is unsupervised learning was used to cluster prediction of authors dominance ranking and its effectiveness was evaluated in this paper. Scientometrics is a branch of statistics that studies science from a numerical standpoint. The quantification of effect, interpretation of scientific citations, and the creation of indices for use in planning and management contexts are among its main research interests. One of the most extensively utilized scientometric methodologies is citation analysis. It analyzes the frequency, structures, and trajectories of citations in publications using citations in scholarly publications to build relationships to other works or several other scholars. Using citation analysis, scientometrics measures have been developed to evaluate and compare different researchers' research activities based on their productivity. Counting how often journal articles are cited is the most basic of these metrics. They are founded on the notion that notable scholars and publications will be cited more often than others. They are an approach that can be used to objectively describe a researcher's scientific output as a series of numerical data. Scientometrics measures are now routinely used as a key mechanism by numerous funding organizations and promotion panels to examine practically every scientific assessment decision [14]. As a result, scientometrics is becoming a progressively popular subject in the scientific world. Eventually, scientometrics is concerned with not only evaluating the output of research but also with reviewing researcher strategies, socio-organizational frameworks, research and innovation service delivery, the role of science and technology in economic development, policy decisions on research, and technology, and so forth. Researchers all across the globe keep publishing a significant number of scientific publications as knowledge improves. Presently, the volume of data that can be not only saved but also analyzed is growing exponentially. Because of the volume, humans are unable to analyze the data using traditional statistical procedures by hand. Machine learning gives the capabilities for appropriately managing and dealing with massive amounts of data in this context. It also makes it easier to use numerical methods models to make predictions based on experience. It is a significant issue because the work of prediction is regarded as the basis of science. In this paper, a scientometric analysis of machine learning literature was done. ML algorithms have proven their efficacy in handling large data beyond every reasonable doubt. It is therefore very important to demystify the scientometric of the global structure of machine learning publications. The major contributions of this work include: A survey of recent advances in ML models, global structural network, and their application to classifying, standardizing, and grouping of related publications was presented Application of scientometric to explore different ML research topics that have attracted the research community Investigate the global structure of machine learning to determine bibliographic coupling, collaboration among research institutions, co-authorship network of countries, and sources coupling in machine learning; and Apply Bibliometrix and VOSviewer (version 1.6.16) software tool to create and visualize structure map of source coupling networks of journals, scholars, or different publications using citation, bibliographic coupling, co-citation, or co-authorship relations of 10,814 machine learning research papers published between 2010 and 2020. Apply Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to clustering prediction of authors dominance ranking. The rest of this paper is organized as follows: related works are done in Sect. 2. The methodology employed for this work as well as performance measurements is discussed in Sect. 3. The results and the discussion are presented in Sect. 4, and we conclude in Sect. 5.

Related works

Some research work has used ML techniques to carry out scientometric analysis of related publications where ML was applied to solve some problems. Rincon-Patino, Ramirez-Gonzalez, and Corrales [15] used the SciMAT tool to extracted data of machine learning publications from the Scopus database. Their analysis illustrates the tactical maps of progression and a group of topical networks. The findings give deep insight into machine learning's wide trends. The findings demonstrate that SciMAT is a helpful tool for conducting a scientific mapping study, and they support the notion that ML has a wide range of applications and will remain a fascinating subject of research in years to come. The drawback of their work is that it only covers the period 2007 to 2017. This means that recent publications on the application of ML to various fields were not analyzed. Recently, Klein et al. [16] integrated bibliometric with ML techniques for monoclonal antibody data curation and model development for uncommon illness medication identification. Their approach was used to find novel chemicals that could be used as medication candidates using data gathered from the literature to develop a Bayesian model. The proposed technique was used to evaluate sets of compounds that offer a range of chemically varied structures, and rate these molecules for in vitro testing. Recently, researchers have focused on the scientometric and bibliometric of the Coronavirus that has ravaged the world. For example, Aristovnik, Raveslj, and Umek [17] conducted a bibliometric examination of COVID-19 publications in the scientific and non-scientific research fields. The study made use of the Scopus database, as well as all relevant and current information on COVID-19 literature, which reached 16,866 in the first six months of 2020. The disadvantage of this research is that several papers on COVID-19 after June 2020 were not examined. Another group of researchers, Haghani et al. [18] carried out a scientometric analysis and exploratory investigation of COVID-19-related papers. The analysis of various recently published COVID-19 literature was not included in this study, which is a limitation. Doanvo et al. [19] used machine learning approaches to analyze the true content of coronavirus publication summaries to find research intersections between COVID-19 and other coronavirus illnesses, as well as research topics that have piqued interest and that require further examination. The downside of this study is that it did not examine various literature on COVID-19 after June 2020. Furthermore, Dong et al. [20] used topic modeling to understand research flashpoints surrounding COVID-19 and illnesses induced by coronavirus variations. The downside of the study is that it did not examine numerous COVID-19 papers after April 2020. Also, Le et al. [21] used COVID-19 and CORD-19 publishing records to project COVID-19 research activities from the moment the fatal virus was proclaimed a pandemic until May 2020. As a result, research on COVID-19 that was completed after May 2020 was not included in the study. Another work was done by Mao et al. [22] where the authors conducted a global bibliometric and prospective study on the importance and progress of coronavirus research. These authors looked at coronavirus-related literature from 2003 through the second month of 2020. Moreover, Abd-Alrazaq et al. [23] conducted a bibliometric analysis of COVID-19 papers using machine learning. The scientists found 196,630 literature in the CORD-19 database, however, only 28,904 were used in their analysis. The authors, on the other hand, solely utilized ML to divide subjects into topical clusters. The study has one drawback: it only includes COVID-19 publications for a period of seven months (January to July 2020). Several important pieces of the literature were not examined after this period. Furthermore, the study's only result was the percentage of topic and cluster dominance. There is no metric for evaluating the accuracy of the proposed system's machine learning models. In another related work, Colavizza et al. [24] did a scientometric summary of the CORD-19 database. From a scientometric standpoint, the authors looked into the description of publications included in the CORD-19 database. The limitation of the work is that the articles examined are those that are only valid until May 2020. As a result, many COVID-19 studies that were later published were not analyzed. Abualigah et al. [48] developed the Arithmetic Optimization Algorithm (AOA). It is a novel meta-heuristic technique that takes advantage of the distribution behavior of the major arithmetic operators in mathematics. AOA is scientifically developed and deployed to optimize processes across a wide range of search spaces. To demonstrate AOA's universality, its performance is tested on twenty-nine benchmark functions and various real-world engineering design issues. Different situations were used to assess the proposed AOA's performance, convergence tendencies, and computing complexities. Simulation results showed that AOA performed better than the other eleven popular optimization algorithms compared in the paper. Experimental results indicated that AOA gives highly promising outcomes in handling hard optimization issues. In terms of solution quality and computational performance, AOA outperforms other famous optimization techniques for the majority of the problems studied. Furthermore, AOAs’ results demonstrated their supremacy in evading being stuck in the local optima. Abualigah et al. [49] present a novel multilevel thresholding method centered on the Evolutionary Arithmetic Optimization Algorithm (AOA). The algorithm was instigated by the arithmetic operators used in science. The proposed strategy, DAOA, uses the Differential Evolution method to improve AOA local research. Employing Kapur's measure between-group variance functions, the presented approach is applied to the multilevel thresholding problem. The proposed DAOA is used to analyze images, which are comprised of eight standardized test images from two different classes: nature and CT COVID-19 images. The effectiveness of the developed DAOA method was evaluated against existing multilevel thresholding techniques. The findings indicated that the DAOA process is better than other similar methods and generates enhanced results. In summary, the research gaps identified from this literature showed that machine learning has been applied to solve problems in different fields of human endeavor. Moreover, the bulk of the COVID-19-related articles examined in the studies have narrow dates, around three months following the commencement of the COVID-19 pandemic. As a result, some subsequent papers were not examined. Furthermore, rather than focusing on COVID-19, numerous investigations looked into the literature related to a variety of coronaviruses. As a result, the findings of COVID-19-related research were integrated with those connected with other coronavirus variations. Furthermore, a small number of COVID-19-related papers were included in numerous studies. Furthermore, many research did not look into the subject that previous studies had looked into, instead of focusing on the metadata of those studies (such as countries, author name, author affiliation, total citations, bibliometric items, source journals, and others). Finally, rather than employing machine learning approaches, the classification of subjects across different studies was done manually. This paper will conduct a wide scientometric analysis of existing machine learning publications that have been applied to different fields to adequately address the identified gaps in the literature.

Methods

Data collection

Dimensions is a comprehensive worldwide academic database that includes more than 1.4 billion citations, data sets, patents, and policy papers from several millions of research publications [25]. The Dimensions database includes several features which may be utilized for bibliometric study, including the title, author, institution, country, year of publication, grants, clinical trials, and keywords [26, 27]. As a dependable data source for scientific analysis with applications in mathematical research, the Dimensions database has received greater attention recently [28-31]. We carried out a search in Dimensions database for articles using machine learning from 2010 to 2020. All documents were incorporated and no constraint of language has been established. By carefully examining the obtained papers, we confirmed the reliability of our search approach. The information retrieved from the Dimensions database are Publication ID, DOI, Title, Abstract, Source title, PubYear, Volume, Issue, Pagination, Authors, Authors affiliations, Dimensions URL, Times cited, and Cited references. All information was collected and stored in CSV format from the retrieved Dimensions Database.

Visualization and scientometrics analysis

We focused on using scientometric analysis to detect annual scientific production, co-citation network of the authors, collaboration network, documents coupling, research frontiers, and other scientometric information in machine learning. The scientometric analysis comprises the construction and graphical display of bibliometric maps [32]. In this study, we employed bibliometric analysis tools on the Dimensions data. Bibliometrix [33] has been used for bibliographical information extraction, analysis, and visualization such as authors co-citation network, institutions collaboration networks. Massimo Aria and Cuccurullo developed Bibliometrix. This is an open-source research tool for scientific and bibliometric quantitative research, which includes all major bibliometry testing methodologies [34]. VOSviewer (version 1.6.16; Leiden University) [35] has been used for collecting bibliographic data on collaborative networks, documents, and sources of researchers, authors and countries. VOSviewer is a software tool that creates maps using network data to build networks of scientific articles, scientific journals, scientists, research organizations, countries, and keywords. VOSviewer creates network-based maps, visualizes and explores maps. VOSviewer supports three map visualizations: the visualization of the network, the overlay visualization, and the visualization of density [36].

Hierarchical density-based spatial clustering of applications with noise (HDBSCAN)

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) is unsupervised learning. HDBSCAN is a hierarchical clustering algorithm that improves on Density-Based Spatial Clustering of Applications with Noise (DBSCAN) by using a strategy to extract a flat clustering based on cluster stability [37]. HDBSCAN mathematical representation is as follows: The optimization problem for the sum of cluster stabilities is given as:where is the normal distance, is the core distance, is the stability, is the density value, is the cluster, is the Boolean indicator, is the leaf cluster, is the set of clusters on the paths from leaves to the excluded root.

Clustering predictions of authors dominance ranking

This study proposed clustering authors dominance ranking by extracting the author’s name, dominance factor, number of authored articles, number of single-authored articles, number of multi-authored articles, number of first-authored articles, author ranking by number of articles, and author ranking by dominance factor from the dominance function equation. A total of 5421 of the author dominance ranking during 2010–2021 was extracted. Cluster analysis is a strong data mining method for identifying distinct groups of authors and other sorts of behaviors that are not identified by dominance ranking. It aids in the discovery of groups in unlabeled data, with components belonging to the same group sharing comparable dataset feature values. While clustering has a wide range of applications, we will be focusing on clustering for exploratory data analysis. Exploratory data analysis refers to the act of looking for intriguing patterns in a data collection, such as author dominance ranking, to develop new hypotheses or research questions regarding the data set. In this section, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) will be used to forecast clustering of author dominance rankings. HDBSCAN is a density-based clustering method that builds a cluster hierarchy tree and then extracts flat clusters from it using a specified stability metric. HDBSCAN builds a hierarchy for all potential epsilon values concerning minimum cluster size, rather than selecting clusters based on a global epsilon threshold.

Results

The data set was extracted from the Dimensions bibliographic database. It includes all publications of the document types such as article, letter, proceedings paper, and so on, published between 2010 and 2020. The number of documents in the data set is 10,814 while the number of references is 161394. The single-authored documents have 2926 while the multi-authored documents are 20,788. This shows that authors prefer multi-authored documents in machine learning to be published as single-authored. The statistics for the data set are summarized in Table 1. Figure 1 displays the average article citations per year of both single and multi-authored. The year 2010 has the most cited document followed by the year 2016. 2020 has the least cited documents. The most cited Source is Lecture Notes in Computer Science with 3999 articles followed by Nature journal with 1879 articles. The least was from Neuroimage and Journal of NeuroScience with 1020 and 809, respectively, as shown in Fig. 2.
Table 1

Main Information about the data

DescriptionResults
TimeSpan2010–2020
Sources (journal, books, etc.)4462
Documents10,814
References161,394
Authors23,714
Authors of single-authored documents2926
Authors of multi-authored documents20,788
Fig. 1

Average article citations per year

Fig. 2

Most cited sources

Main Information about the data Average article citations per year Most cited sources A measure of the frequency with which the average article is cited for a certain year in a journal can be defined as a source impact factor. It is used to assess a journal's importance or rank by counting the number of times articles are cited. In this study, we use three different measures that are frequently used to measure the impact factors of the journal such as h_index, g_index, and m_index. PLOS ONE ranked high with an h_index of 28 while IEEE TRANSACTIONS ON IMAGE PROCESSING and SENSORS ranked high with 33 g_index if we considered using g_index (see Table 2). However, PLOS ONE ranked first using m_index with a 2.3 impact factor. This shows that articles published in PLOS ONE have been cited more than other journals. Figure 3 presents the most relevant top 10 authors in machine learning research. Jefferson T. (68 publications) ranked first among all authors, followed by Wang J (64 publications), Li J and Zhang Y (59 publications), and Wang Y (58 publications). Figure 4 is the scientific production of machine learning from several countries, in terms of the paper published. The geographic distribution of papers based on all authors' affiliations is concentrated in Asia countries with China (1582 publications), ranked first among all the countries, followed by European countries (UK (649 publications), Germany (495 publications)).
Table 2

Source impact factor

Sourceh_indexg_indexm_indexTCPY
Lecture notes in computer science18272.2517692014
Advances in intelligent systems and computing81113192014
PLOS one28152.331162010
Proceedings of Spie670.51362010
IEEE access11251.26992013
Communications in computer and information science550.71922015
International journal of engineering and advanced technology110.362019
Sensors15331.2511302010
Journal of physics conference series4100.361202011
BIORXIV81011502014
BMC bioinformatics14271.177522010
Studies in computational intelligence11181.3754002014
Contemporary sociology a journal of reviews220.2102012
Neural computing and applications14321.6710952010
International statistical review360.253934
Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering581952017
IEEE transactions on image processing17331.4225572010
Multimedia tools and applications10140.832672010
Fig. 3

Most relevant authors in machine learning

Fig. 4

Country scientific production

Source impact factor Most relevant authors in machine learning Country scientific production Table 3 displays the ten most globally cited documents. It contains four columns: Paper, Doi, Total Citation (TC), Normalized Total Citation (NTC), and Country. The author Oostenveld R in computational intelligence and neuroscience journal ranked first with 4894 total citations followed by Babanko B of IEEE transactions on pattern analysis and machine intelligence journal with 1553 total citations. Four out of ten of the global cited documents are concentrated in IEEE journal while seven out of ten of the global cited country is concentrated in the United States of American. Figure 5 is the frequency word of the abstract of the machine learning shown on a TreeMap. DATA were the most frequently used word in the abstract with 5911 (7%) occurrence, while MODEL was the most frequent word with 3571 (4%). This shows the importance of data and modeling in machine learning.
Table 3

Most global cited documents

PaperDOITCNTCCountry
Oostenveld, 2010, computational intelligence and neuroscience [38]10.1155/2011/1568694894150.64Netherlands
Babenko, 2010, IEEE transactions on pattern analysis and machine intelligence [39]10.1109/TPAMI.2010.226155347.80United States
Cai, 2010, IEEE transactions on pattern analysis and machine intelligence [40]10.1109/TPAMI.2010.231129439.83United States
Barnich, 2010, IEEE Transactions On Image Processing [41]10.1109/TIP.2010.2101613120136.97Belgium
Goferman, 2011, IEEE transactions on pattern analysis and machine intelligence [42]10.1109/TPAMI.2011.272115060.43Israel
Graveley, 2010, nature [43]10.1038/NATURE09715110433.98United States
Reich, 2010, nature [44]10.1038/NATURE09710110333.95United States
Roy, 2010, science [45]10.1126/SCIENCE.119837493228.69United States
Cao, 2010, journal of operations management [46]10.1016/J.JOM.2010.12.00883725.76United States
Shulaev, 2010, nature genetics [47]10.1038/NG.74083425.67United States
Fig. 5

Abstracts TreeMap of machine learning

Most global cited documents Abstracts TreeMap of machine learning Bibliometrix can be used to acquire an overview of the most often mentioned publications, the citation relationships between these publications, the time order of publications, and the assignment of publications to clusters for a certain number of publications. In this study, we want to better comprehend the co-citation published by authors from various levels of clusters. Figure 6 shows a visualization of bibliometrix in three groups of 30 of the most often cited papers. Each publication of the author is shown in a circle and is denoted by the author's name. The color of a publication shows the cluster to which the author's publication belongs, with red, blue, and green corresponding to clusters 1, 2, and 3, respectively. We examined the co-citation patterns of 30 productive authors and created a co-citation map. We discovered that few authors tended to collaborate with a large group, resulting in three primary author clusters, each with one or two main authors. According to the social network analysis, it proved that the research co-citation network in machine learning is very strong. The component analysis found that three research groups can be regarded as the backbone in this field. Therefore, researchers in machine learning should strengthen their co-citation network to improve the development and academic level of this field. Each node of the figure represents an author, and the connections among the nodes represent the co-citation relationships among authors. The weight of a link indicates the number of publications co-authored by two scholars. In this author’s co-citation network, the highest betweenness of wang j, wang x, and yang x was within the range of 4–6.4, indicating that they played a pivotal role in the co-citation network in cluster 2(blue). In cluster 1 (red), zhang x, li x, and liu y obtained the highest betweenness centrality manifesting that they could control co-citation relationship and that he possessed and controlled a large number of research resources. However, in cluster 3(green), li y, yang j, and wang z obtained the highest betweenness within the range of 1–7. In a co-citation network, the closer the distance between one author and the other, the easier it is to exchange information and build a cooperative research relationship (Appendix for Table 7).
Fig. 6

Authors co-citation network in machine learning

Table 7

Authors co-citation network

NodeClusterBetweennessClosenessPage rank
Na139.8620.0344830.100977
Zhang y16.7780260.0344830.052232
Breiman l10.4496030.0250.024264
Liu y12.8829570.031250.03954
Li j11.733190.0303030.035369
Li x12.9303810.0333330.038452
Zhang j12.3471370.031250.035592
Li h10.6429860.0277780.028383
Wang l10.6411680.0270270.026202
Zhang x11.3597540.0303030.028648
Chang c10.2843450.023810.019648
Lecun y10.5255480.0256410.022815
Wang j24.9459390.0333330.046948
Wang y24.6805090.0322580.050481
Wang x24.439750.0322580.04237
Chen y21.7916220.031250.033844
Zhang z21.016730.0294120.029029
Wang h21.381460.0277780.031242
Yang x20.152040.023810.020418
Liu x21.3588010.0285710.028303
Chen j20.2668280.0250.02309
Zhang h20.3231040.0256410.023837
Liu h20.7776020.0285710.026198
Li y37.6900780.0344830.05035
Wang z31.454170.0277780.027624
Zhang l31.331070.0294120.030797
Kim j30.2655660.0217390.015888
Yang y30.6116210.0263160.022786
Lee j30.4627320.0222220.017127
Yang j31.6132880.0285710.027548
Authors co-citation network in machine learning Figure 7 is the structure map of the institution collaboration network of machine learning. There are 11 clusters of institution collaborative networks structure but only three have the highest number of institutions and these are clusters 1, 2, and 3. Each node of the figure represents an institution, and the connections among the nodes represent the collaborative relationships among institutions. The weight of a link indicates the number of publications co-authored by two scholars in different institutions. In this collaboration network, the highest degree of betweenness centrality was obtained from cluster 3 with 153.86 from the University of California and Harvard University with 24.70. In cluster 1, the national university of Singapore has the highest degree betweenness of 91.72. Also, in cluster 5, the University of Cambridge (52.24) and imperial college London (4.52) having the highest betweenness centrality manifesting that he could control collaborative relationships and that they possessed and controlled a large number of research resources. Furthermore, it was observed that there is no strong level of cooperation among other institutions (Appendix Table 8).
Fig. 7

Structure map of institutions collaboration network of machine learning

Table 8

Institutions collaboration networks

NodeClusterBetweennessClosenessPageRank
National university of Singapore191.72380.00690.04302
Nanyang technological university100.005880.01461
Shanghai Jiao tong university200.006290.01545
Zhejiang university250.34830.007250.04838
Beihang university200.006130.01095
University of California3153.8650.007580.15613
Harvard university324.70610.007190.09291
University of Pennsylvania300.006370.00968
University of Michigan300.006450.01363
Johns Hopkins university300.006410.02469
University of southern California300.006370.00968
Massachusetts institute of technology300.006490.02863
University of Washington300.006710.02787
University of Toronto312.12960.006990.04273
STANFORD university31.500440.006850.03614
Pennsylvania state university300.006370.00968
Cornell university300.006490.02002
Tsinghua university434.30560.007140.04059
University of Florida414.87760.007040.04504
Peking university40.185710.006290.01467
Imperial college London54.515940.00690.03503
University college London51.4040.006620.04638
University of oxford51.196150.00680.0358
University of Cambridge552.24180.00730.06542
Carnegie Mellon university600.001150.00546
University of Chinese academy of sciences700.004670.03467
University of technology Sydney8480.006020.01886
Institute of automation9250.005290.04295
Northeastern university1000.001150.00546
Rwth Aachen university1100.001150.00546
Structure map of institutions collaboration network of machine learning Figure 8 provides a VOSviewer (version 1.6.16) of visualization of the 50 most co-authorship countries in five clusters. The minimum number of documents of a country is set as 5 while the minimum number of citations of a country is set as 2. As shown in Fig. 8, the co-authorship analysis of countries reflects the collaboration relationship between countries in this field, as well as the degree of collaboration. The larger nodes represent the most productive countries in the field of machine learning; the thickness and length of links between nodes represent the cooperative relationship between countries. Figure 8 shows the 50 most productive countries in the field of machine learning from 5 collaboration clusters, which were distinguished by different colors. The countries with the highest total link strength were the USA with 1352 documents and the link strength of 601 followed by China with 413 total link strength and 1014 documents. The United Kingdom was in the third position with 470 documents and total link strengths of 388 (see Appendix Table 9). As shown in Fig. 9, we used VOSviewer to build a visualization structural network map of the top 50 sources (journals books, etc.) in 5 clusters. The minimum number of documents of a source is set at 5, while the minimum number of citations of a source is set at 2. The larger node represents the most productive source in the field of machine learning. Lecture notes in computer science have the highest total link length of 8618 and 467 documents followed by IEEE access with 94 documents and total link strength of 3500. The third position is PLOS ONE with total link strength of 1893 and 129 documents (see Appendix Table 10).
Fig. 8

Structure map of countries co-authorships in machine learning

Table 9

Co-authorship structural network in machine learning

CountryDocumentsTotal link strength
United States1352601
China1014413
United Kingdom470388
Germany290234
Australia215205
France171161
Canada195158
Spain171161
Italy216123
Netherlands108104
Singapore8788
Switzerland9188
India27265
South Korea14865
Sweden7361
Japan15456
Belgium6553
Denmark4953
Finland6253
Norway2952
Poland10352
Portugal5650
Taiwan11350
Malaysia8248
Iran7946
Austria5043
Ireland3143
Brazil10242
Greece4230
Saudi Arabia3130
Russia6928
Turkey7428
Egypt3727
Pakistan3727
New Zealand3025
Vietnam1525
Mexico3223
Israel4822
Hungary1418
Romania2516
Algeria815
South Africa2915
United Arab Emirate1515
Indonesia9014
Qatar714
Colombia1412
Ecuador2611
Czechia2010
Serbia1410
Chile149
Fig. 9

Structure map of sources coupling in machine learning

Table 10

Coupling structural network in machine learning

SourceDocumentsTotal link strength
Lecture notes in computer science4678618
IEEE access943500
PLoS one1293140
BMC bioinformatics421893
IEEE transactions on image processing331769
IEEE Transactions on pattern analysis and machine learning291735
Sensors561622
Advances in intelligent systems and computing1371621
Studies in computational intelligence411214
Proceedings of Spie1011189
Multimedia tools and application331078
Bmc genomics271032
Neural computing and application331078
Biorxiv44950
IEEE journal of biomedical and health informatics22934
IEEE transactions on neural networks and learning30863
IEEE transactions on medical imaging9729
Neural Networks20768
IEEE transactions on geoscience and remote sensing181682
IEEE transactions on multimedia14578
Neural computation8563
IEEE transaction on circuits and systems for video8557
Springer handbooks of computational statistics11551
Knowledge and information systems21548
Remote sensing11539
Journal of biomedical informatics11539
Neuroimage23539
Artificial intelligence in medicine12516
Computer methods and programs in biomedicine13513
IEEE transactions on signal processing22507
Computational and mathematical methods in medicine11498
IEEE transactions on signal processing21495
Communication in computer and information science63481
Scientific reports11448
Bioinformatics18429
Machine learning12412
Plos computational biology15409
Medical image analysis7404
International Journal of computer vision6402
Artificial intelligence review11392
International journal of machine learning and cybersecurity14379
IEEE transactions on visualization and computer graphics6375
Springerbriefs in computer science5375
Algorithms in computational molecular biology10367
Entropy12359
Bmc system biology14352
Computers in biology and medicine12343
Neuroscience5343
Soft computing13341
The scientific world journal18306
Structure map of countries co-authorships in machine learning Structure map of sources coupling in machine learning Table 4 provides the authorship patterns of publication in machine learning of this table where the majority of the articles were contributed from the Multi authors with 9994(60.17%). The second position of the articles was contributed by the first authors with 6423(38.67%). The third position of the articles was contributed by the single authors and five authors were contributed 191(1.15%). This shows that researchers in machine learning published more as Multi-authored than single-authored. Figure 10 is the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) of Authors Dominance Ranking Tree. HDBSCAN cluster authors dominance ranking into three and these are cluster 1, 2, and 3. In this clustering prediction of authors dominance ranking, the highest degree was obtained from cluster 3 with 8.18 followed by cluster 1 with 2.54 as shown in Table 5. Table 6 consists of the 10 most ranking authors from each cluster. In cluster 3, Bennett RJ, Joughin L, and Sachnev V were having a membership probability of 1 and total articles of 5 while Berlin I was having the highest dominance ranking with total articles of 14 but a membership probability of 0. This shows that the author Berlin I is having a stability problem while Bernett RJ, Joughim L, and Sachnev V are very stable with other researchers. In cluster 2, Dai Y, Dehzangi O, and Feng Y are more stable with other researchers than Gao H and Huang K with 0 membership probability. In cluster 1, most of the authors are having higher total articles but lower membership probability. This shows that authors with higher total articles are not stable.
Table 4

Authorship pattern of publication in machine learning

AuthorsArticleCumulative% Article% Cumulative
First6423642338.6738.67
Single19166141.1539.82
Multi999416,60860.17100.00
Fig. 10

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) of Authors Dominance Ranking Tree

Table 5

Cluster scores of authors dominance ranking

Cluster 1Cluster 2Cluster 3
2.54221.61058.1829
Table 6

Authors dominance ranking stability

S/NoAuthorTotal articlesClusterMembership_prob
1Berlin I1430
2Gray A1330.08
3Bowen WG630.5806
4Gaggioli A630.5806
5Jarvis S630.58003
6Moser M630.5806
7Bennett RJ531
8Innis H530.94306
9Joughin L531
10Sachnev V531
11Lu Z620.13506
12Gao H520
13Huang K820.20189
14Dai Y421
15Dehzangi O421
16Feng Y421
17Han W421
18Hu H421
19Ma J421
20Nguyen N421
21Luo Y910.52198
22Chen S1510.66437
23Yang D810.67509
24He Y710.68969
25Kim W710.68969
26Ma H710.68969
27Yuan Y710.68969
28Wang T1310.62296
29Chen K1210.66437
30Zhao Z1210.66437
Authorship pattern of publication in machine learning Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) of Authors Dominance Ranking Tree Cluster scores of authors dominance ranking Authors dominance ranking stability

Conclusion

Through the aid of scientometric quantitative analysis and visualization network map of the data extracted from the Dimensions database, the current study reveals the average article cited per year, most cited sources, most relevant authors, countries scientific production, most global cited documents, authors co-citation, institutions collaboration, countries co-authorships in machine learning research. Application of Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to clustering prediction of authors dominance ranking was implemented. The result shows that most of the authors that are having higher total articles are having a lower membership probability. This shows that authors with higher total articles are not stable with other researchers. We anticipate that by completely describing the trends in machine learning research, our findings will provide useful insight into future research paths and perspectives in this rapidly evolving subject. Many avenues for significant future work exist. Future work should explore several unsupervised learning on author dominance ranking to obtain the prediction of the cluster. Although this study concentrated on the Dimensions database, other databases can be applied such as Scopus, Web of Science, Cochrane Library, and Pubmed.
  21 in total

1.  Context-aware saliency detection.

Authors:  Stas Goferman; Lihi Zelnik-Manor; Ayellet Tal
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2012-10       Impact factor: 6.226

2.  Software survey: VOSviewer, a computer program for bibliometric mapping.

Authors:  Nees Jan van Eck; Ludo Waltman
Journal:  Scientometrics       Date:  2009-12-31       Impact factor: 3.238

3.  Robust Object Tracking with Online Multiple Instance Learning.

Authors:  Boris Babenko; Ming-Hsuan Yang; Serge Belongie
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2010-12-23       Impact factor: 6.226

4.  Identification of functional elements and regulatory circuits by Drosophila modENCODE.

Authors:  Sushmita Roy; Jason Ernst; Peter V Kharchenko; Pouya Kheradpour; Nicolas Negre; Matthew L Eaton; Jane M Landolin; Christopher A Bristow; Lijia Ma; Michael F Lin; Stefan Washietl; Bradley I Arshinoff; Ferhat Ay; Patrick E Meyer; Nicolas Robine; Nicole L Washington; Luisa Di Stefano; Eugene Berezikov; Christopher D Brown; Rogerio Candeias; Joseph W Carlson; Adrian Carr; Irwin Jungreis; Daniel Marbach; Rachel Sealfon; Michael Y Tolstorukov; Sebastian Will; Artyom A Alekseyenko; Carlo Artieri; Benjamin W Booth; Angela N Brooks; Qi Dai; Carrie A Davis; Michael O Duff; Xin Feng; Andrey A Gorchakov; Tingting Gu; Jorja G Henikoff; Philipp Kapranov; Renhua Li; Heather K MacAlpine; John Malone; Aki Minoda; Jared Nordman; Katsutomo Okamura; Marc Perry; Sara K Powell; Nicole C Riddle; Akiko Sakai; Anastasia Samsonova; Jeremy E Sandler; Yuri B Schwartz; Noa Sher; Rebecca Spokony; David Sturgill; Marijke van Baren; Kenneth H Wan; Li Yang; Charles Yu; Elise Feingold; Peter Good; Mark Guyer; Rebecca Lowdon; Kami Ahmad; Justen Andrews; Bonnie Berger; Steven E Brenner; Michael R Brent; Lucy Cherbas; Sarah C R Elgin; Thomas R Gingeras; Robert Grossman; Roger A Hoskins; Thomas C Kaufman; William Kent; Mitzi I Kuroda; Terry Orr-Weaver; Norbert Perrimon; Vincenzo Pirrotta; James W Posakony; Bing Ren; Steven Russell; Peter Cherbas; Brenton R Graveley; Suzanna Lewis; Gos Micklem; Brian Oliver; Peter J Park; Susan E Celniker; Steven Henikoff; Gary H Karpen; Eric C Lai; David M MacAlpine; Lincoln D Stein; Kevin P White; Manolis Kellis
Journal:  Science       Date:  2010-12-22       Impact factor: 47.728

5.  Genetic history of an archaic hominin group from Denisova Cave in Siberia.

Authors:  David Reich; Richard E Green; Martin Kircher; Johannes Krause; Nick Patterson; Eric Y Durand; Bence Viola; Adrian W Briggs; Udo Stenzel; Philip L F Johnson; Tomislav Maricic; Jeffrey M Good; Tomas Marques-Bonet; Can Alkan; Qiaomei Fu; Swapan Mallick; Heng Li; Matthias Meyer; Evan E Eichler; Mark Stoneking; Michael Richards; Sahra Talamo; Michael V Shunkov; Anatoli P Derevianko; Jean-Jacques Hublin; Janet Kelso; Montgomery Slatkin; Svante Pääbo
Journal:  Nature       Date:  2010-12-23       Impact factor: 49.962

6.  Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations.

Authors:  Alberto Martín-Martín; Mike Thelwall; Enrique Orduna-Malea; Emilio Delgado López-Cózar
Journal:  Scientometrics       Date:  2020-09-21       Impact factor: 3.238

Review 7.  The status and trends of coronavirus research: A global bibliometric and visualized analysis.

Authors:  Xingjia Mao; Lu Guo; Panfeng Fu; Chuan Xiang
Journal:  Medicine (Baltimore)       Date:  2020-05-29       Impact factor: 1.889

8.  Graph Regularized Nonnegative Matrix Factorization for Data Representation.

Authors:  Deng Cai; Xiaofei He; Jiawei Han; Thomas S Huang
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2010-12-23       Impact factor: 6.226

9.  Constraint-Based Hierarchical Cluster Selection in Automotive Radar Data.

Authors:  Claudia Malzer; Marcus Baum
Journal:  Sensors (Basel)       Date:  2021-05-13       Impact factor: 3.576

10.  The scientific literature on Coronaviruses, COVID-19 and its associated safety-related research dimensions: A scientometric analysis and scoping review.

Authors:  Milad Haghani; Michiel C J Bliemer; Floris Goerlandt; Jie Li
Journal:  Saf Sci       Date:  2020-05-07       Impact factor: 6.392

View more
  2 in total

1.  Exosomes in the Field of Neuroscience: A Scientometric Study and Visualization Analysis.

Authors:  Junzi Long; Yasu Zhang; Xiaomin Liu; Mengyang Pan; Qian Gao
Journal:  Front Neurol       Date:  2022-05-17       Impact factor: 4.086

2.  Bibliometric Analysis of Publications on the Omicron Variant from 2020 to 2022 in the Scopus Database Using R and VOSviewer.

Authors:  Hasan Ejaz; Hafiz Muhammad Zeeshan; Fahad Ahmad; Syed Nasir Abbas Bukhari; Naeem Anwar; Awadh Alanazi; Ashina Sadiq; Kashaf Junaid; Muhammad Atif; Khalid Omer Abdalla Abosalif; Abid Iqbal; Manhal Ahmed Hamza; Sonia Younas
Journal:  Int J Environ Res Public Health       Date:  2022-09-29       Impact factor: 4.614

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.