Literature DB >> 30735548

Measuring national capability over big science's multidisciplinarity: A case study of nuclear fusion research.

Hyunuk Kim^1,2,3, Inho Hong⁴, Woo-Sung Jung^1,4,5.

Abstract

In the era of big science, countries allocate big research and development budgets to large scientific facilities that boost collaboration and research capability. A nuclear fusion device called the "tokamak" is a source of great interest for many countries because it ideally generates sustainable energy expected to solve the energy crisis in the future. Here, to explore the scientific effects of tokamaks, we map a country's research capability in nuclear fusion research with normalized revealed comparative advantage on five topical clusters-material, plasma, device, diagnostics, and simulation-detected through a dynamic topic model. Our approach captures not only the growth of China, India, and the Republic of Korea but also the decline of Canada, Japan, Sweden, and the Netherlands. Time points of their rise and fall are related to tokamak operation, highlighting the importance of large facilities in big science. The gravity model points out that two countries collaborate less in device, diagnostics, and plasma research if they have comparative advantages in different topics. This relation is a unique feature of nuclear fusion compared to other science fields. Our results can be used and extended when building national policies for big science.

Entities: CellLine Chemical Disease Gene

Mesh：

Year: 2019 PMID： 30735548 PMCID： PMC6368312 DOI： 10.1371/journal.pone.0211963

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Big science is characterized by its big budgets, manpower, and machines. It includes a number of multidisciplinary fields such as nuclear fusion, particle accelerators, and space science [1]. Most of them originated for military reasons in World War II and were mainly led by superpowers. In recent decades, as these fields become more demanding, countries actively collaborate to utilize the resources of others and build shared infrastructure [2-4]. In this sense, compared to little science, big science requires more international collaboration and resource accessibility [5]. A large facility is considered the core resource of big science. From construction to operation, it requires participation of various stakeholders under the leadership of national government, resulting in economic spillovers to society [6-8]. A large facility also stimulates scientific advancements by supporting research activities that are hard to conduct in a laboratory. It attracts researchers of diverse disciplines and enhances scientific collaborations. Despite its scientific importance, little attention has been paid to examining how large facilities raise national research capacities because of difficulties in unraveling the multidisciplinarity of big science [9-11]. Moreover, national research capacity is difficult to quantify as it is built on the complex interactions between private and public domains [12, 13]. Depending on science and technology policies, countries have different goals, such as training experts, publishing papers, or granting patents, that constitute the national research capacity [14, 15]. Among many aspects of the national research capacity, this study focuses on academic publishing to estimate the capacity quantitatively [16-22], which we term “research capability,” by implementing topic modeling and revealed comparative advantage on the bibliographic information of research papers. The dynamic topic model [23, 24] first detects subject fields from paper abstracts and distributes publication counts over the detected fields in real values. Normalized revealed comparative advantage (NRCA) [25] is applied to fractional publication counts for projecting a country’s research capability as well as its changes by facility construction. Based on NRCA, we measure how similar two countries’ research capabilities are and include the distance in a gravity model to show its impact on international collaboration. For a case study, we investigate nuclear fusion, in which the construction of large facilities and international collaborations are crucial. Nuclear fusion is a field that countries have interest in as it produces clean, affordable, and sustainable energy [26, 27]. The history of nuclear fusion consists of the footprints of major successes in tokamaks [28]. After the nuclear fusion reaction of hydrogen was identified as the source of solar energy in the 1920s [29], scientists began to study controlled thermonuclear fusion for sustainable energy production in the 1950s [30]. The tokamak is a device that magnetically confines high-temperature plasmas essential for steady thermonuclear reactions [31], and now it is the most dominant and actively studied device for nuclear fusion research [32]. Tokamaks are composed of strong magnets for confining plasmas, several wall-components in a vacuum vessel for protection, heating devices, and diagnostic devices, which require knowledge across diverse fields: plasma physics, numerical simulations, diagnostics, material science, and engineering [31]. The performance of tokamaks positively scales with size, thus tokamaks have become greater, better, and more expensive [33-36]. The large budgets for tokamaks have increased international collaborations since the 1990s, as seen in the cases of JET (Joint European Torus) [37] and ITER (International Thermonuclear Experimental Reactor) construction [34]. Our approach successfully captures various aspects of nuclear fusion from a bibliographic database over 40 years, 1976–2016. The dynamic topic model disentangles multidisciplinarity and classifies 41 topics grouped into five topical clusters: material, plasma, device, diagnostics, and simulation. Furthermore, the revealed comparative advantage identifies leading countries that participate in international projects or have their own tokamak. The rise and fall of these countries match well with tokamak operation. With the gravity model of scientific collaboration, we additionally address whether complementarity leads to collaboration in nuclear fusion research. The regression results show that countries collaborate less if they have research capability in different topics. It is a unique characteristic of nuclear fusion compared to other sciences in which complementarity enhances collaborations [38-42]. This paper provides quantitative evidence for establishing strategic policies that initiate and evaluate big science projects.

Data and methods

Bibliographic data

We analyzed 25,085 nuclear fusion research papers published during 1976-2016. They were collected from the Scopus database (document type: article) and contain the term “tokamak” in the title, abstract, or keyword fields. Papers without affiliation information were manually filled by checking their original documents. When an author had multiple affiliations, we considered the first one as her/his nationality. We used the fractional counting method to obtain the number of papers for each country. For example, if a paper was written by three American and two Korean researchers, 0.6 and 0.4 were assigned to both countries’ paper counts. The fractional counting method gives more weight to leading countries, so that would embrace their inherent academic leadership. Nevertheless, the fractional counting method gives less biased results than the full counting method that assigns an equal weight to all countries in a paper. The full counting method could overrepresent some countries (e.g. the United States) which participate in many international projects. Systemic comparisons of the two methods recommend the fractional counting method in co-authorship analysis [43, 44], especially for scientific fields conducting large-scale international experiments. For this reason, we chose the fractional counting method to estimate research capability as well as the degree of collaborations. Among 75 countries in our dataset, we focused on the top 14 countries that published more than 250 papers in our time scope. The distribution of paper counts was highly skewed. These 14 countries published more than 90% of the research articles. The top 14 countries were the United States, Japan, China, Germany, the United Kingdom, Russia, France, Italy, the Republic of Korea, Switzerland, India, Sweden, Canada, and the Netherlands. The basic statistics of these countries are listed in Table 1. A paper written by more than two authors in different countries is classified as a collaborative paper.

Table 1

Summary statistics of 14 leading countries in nuclear fusion research.

All values are real numbers as we count the number of papers by the fractional counting method. Ratio is the proportion of collaborative papers to total papers.

Country	Collaborative Papers	Total Papers	Ratio
United States	978.4	7646.4	0.13
Japan	411.7	3025.7	0.14
China	335.7	2777.7	0.12
Germany	738.1	2147.1	0.34
United Kingdom	522.5	1775.5	0.29
Russia	299.5	1392.5	0.22
France	403.6	1135.6	0.36
Italy	325.1	964.1	0.34
Republic of Korea	115.8	424.8	0.27
Switzerland	153.5	409.5	0.37
India	49.8	400.8	0.12
Sweden	135.0	326.0	0.41
Canada	73.6	292.6	0.25
Netherlands	102.4	276.4	0.37

Summary statistics of 14 leading countries in nuclear fusion research.

All values are real numbers as we count the number of papers by the fractional counting method. Ratio is the proportion of collaborative papers to total papers.

Topic modeling and clustering

The dynamic topic model (DTM) conceptualizes the knowledge in nuclear fusion research [23, 24]. The DTM specifies topics in a set of documents based on latent Dirichlet allocation (LDA) [45], and it also describes the temporal evolution of detected topics by updating consequent input hyperparameters α and β by each year. α affects the topic distribution of a document, and β indicates the word distribution in a topic. The DTM infers both parameters to reproduce the empirical word distribution under the assumption that a document is made by both processes in year t, choosing a topic for a document by α and sampling words in that topic by β. α and β are used as references to estimate α and β. In our DTM implementation, insignificant words were filtered out if their term frequency–inverse document frequency (tf-idf) values were less than 0.01. Then, we used the words that appeared more than 10 times in the whole document. As a result, our dictionary contained 7,851 unique words, and the documents contained 1,619,233 words in total. The number of topics K needed to be determined before running the DTM. Following the recent approach [46], we specified the number of topics K = 41 (see S1 Appendix and S1 Fig). Open source codes were written by the authors of the DTM paper and available at https://github.com/blei-lab/dtm. We manually labelled 41 topics from their word frequencies (see S1 Table). The DTM provides an article’s topic distribution based on the learned parameters. As we set the number of topics to 41, the topic distribution of an article was given as a vector of length 41. Topic distribution was allocated to countries in proportion to their contributions on each article. For instance, if an article was written by American authors only, the topic distribution of the article was fully given to the United States. For another article written by three American and two Korean researchers, 60% of the topic distribution would be added to the United States. In this way, a country’s research capability over 41 topics was estimated for each year from 1976–2016.

Fractional publication and collaboration counts by topics

The fractional counting method was used for calculating a country’s publication and collaboration counts (Fig 1). For year t when n papers are published, we have two matrices, the fractional publication counts by countries (A: n papers × 75 countries) and the topic distributions of papers (B: n papers × 41 topics). represents the fractional publication counts of 75 countries by 41 topics at year t. Based on the five topical clusters that we found (Fig 2), the fractional counts were summed into five columns to obtain the discriminant power for further analysis. We will explain these topical clusters in the result section. We hereafter call this summarized matrix as national research capability over 5 topical clusters at year t, R (75 countries × 5 topical clusters). Collaborations were also counted in fractions. We multiplied the country profile of a paper and its transpose to obtain the collaboration matrix. The matrix was distributed over five matrices in proportion to topical cluster weights.

Fig 1

Schematics of the fractional counting method for publication and collaboration counts.

Two matrices, the fractional publication counts by countries A and the topic distributions of papers B, were extracted from the document set of year t. (1) represents the fractional publication counts by topics at year t. For further analysis, based on the hierarchical tree of clusters in Fig 2, the fractional publications by 41 topics are grouped into five topical clusters: material, plasma, device, diagnostics, and simulation. R is the aggregated matrix and is transposed in the figure to match with the hierarchical tree of 41 clusters. (2) The country profile of a paper is transformed into a collaboration matrix W1, which was distributed over the five topical clusters by weights. For each year, by aggregating the collaboration matrices of all published papers, we had five fractional collaboration matrices.

Fig 2

Hierarchical tree of 41 topics detected from the dynamic topic model.

Topics were agglomerated by the ward.D method [50]. The distance between topics was measured by the Jensen-Shannon distance [51], a square root of the Jensen-Shannon divergence. Five topical clusters—material, plasma, device, diagnostics, and simulation—are revealed. The branches are colored by the corresponding topical clusters.

Schematics of the fractional counting method for publication and collaboration counts.

Hierarchical tree of 41 topics detected from the dynamic topic model.

Normalized revealed comparative advantage (NRCA)

Normalized revealed comparative advantage (NRCA) [25], one of revealed comparative advantage indices, represents how much an entity’s value exceeds expectations. When comparing longitudinal RCA values, NRCA outperforms the Balassa index (BRCA) [47], the most popular RCA index that defines comparative advantage as a ratio of observations to expectations. Let be country i’s research capability on topical cluster j at year t. , the NRCA of country i on topical cluster j at year t, is calculated as where is the sum of country i’s research capability across five topical clusters at year t (), R is the sum of all countries’ research capabilities on topical cluster j at year t (), and R is the sum of all countries’ research capabilities on five topical clusters at year t, denoted by . A positive value means that country i has a comparative advantage on topical cluster j at year t. Countries have comparative advantages on different topics as it is almost impossible to be competitive in all topics. We measured how similar two countries’ research capabilities are as follows. First, the NRCA of each country was transformed into the binary vector by changing positive NRCA values to 1 and negative values to 0 to identify the topics with significant comparative advantages. Second, the Jaccard distance between two countries’ binary NRCA vectors was calculated for determining their topical dissimilarity (Eq 2). We call this distance between country m and n on topical cluster j at year t the capability distance c. A high c represents that two countries are in complementary relation where their differences in research capability generate synergy by collaborations.

Gravity model of scientific collaboration

Scientific collaboration between country m and n in topical cluster j at year t, w, is related to the number of publications of the two (P and P) and their geographical distance (d). The gravity model explains their relationships in many scientific fields [48, 49]. P and P positively and d negatively affects w. We added the capability distance to the gravity model for checking whether complementarity increases collaboration. Our basic model is written as where d is the Haversine distance (km) between capitals. For two countries m and n, we counted w, P, and P in real values, and calculated c from the binary transformed NRCA vectors. A positive λ indicates that complementarity stimulates collaboration.

Results

Knowledge structure of nuclear fusion research

The DTM detected 41 topics in the dataset. Each topic had its word distribution indicating the extent of word assignments to the topic. We assumed that two topics were close if their word distributions were similar. The topic distance between topic k1 and k2 was obtained by the Jensen-Shannon distance [51], a square root of the Jensen-Shannon divergence. For simplicity, we used the word distribution at the last year, and . A knowledge structure of nuclear fusion research was drawn by agglomerating 41 topics with the ward.D method [50]. The hierarchical tree consists of five distinguishable topical clusters: material, plasma, device, diagnostics, and simulation (Fig 2). Each cluster is clearly characterized by its topics. We observe the details of each branch from the top of the tree. The “material” cluster is described by tokamak edge plasmas and components as plasmas interact with wall materials at the edge. The “plasma” cluster contains general plasma-related topics (i.e., plasma flow, magnetohydrodynamics, and discharge), major instabilities in tokamak configurations (i.e., Alfvén eigenmode, neoclassical tearing mode, and edge-localized mode), and heating methods (i.e., lower hybrid current drive and electron cyclotron resonance heating). The “device” cluster includes mechanical components in tokamaks (i.e., coil, power supply, vessel, magnet, and blanket) and several tokamaks (i.e., Tore Supra, KSTAR, and EAST). The “diagnostics” cluster is composed of plasma diagnostics methods such as soft X-ray, neutron detector, and spectroscopy. Finally, the “simulation” cluster focuses on analytic calculations and computations.

National research capability and its overall trends

Normalized revealed comparative advantage (NRCA) on the fractional publication counts extracted national research capability over 40 years (Fig 3). In all countries, NRCA changes are in good agreement with tokamak construction and operation, representing the scientific effects of large facilities across multiple domains. The United States and Japan have led nuclear fusion research, while Japan’s influence has been decreasing since the 2000s. It may be due to the upgrade of their major tokamak JT-60 which was disassembled in 2009-2012 and is being upgraded to JT-60SA for first plasma in 2020. China rapidly develops research capability overall except in material-related topics. Even though we consider the rise of China in all science and technology fields, their pace in nuclear fusion research is surprisingly fast. China’s tokamaks, HT-7 and HL-2A, raise research capability in device, diagnostics, and simulation. At the point of EAST (Experimental Advanced Superconducting Tokamak) operation in 2006, they also began to equip plasma capability as well. The other countries operating their own tokamaks, Germany, the United Kingdom, Russia, France, Italy, and Switzerland, actively engage in nuclear fusion research. However, the countries without their own tokamak operation, Sweden and the Netherlands, are losing their research capabilities. Canada’s fall seems plausible as they left tokamak projects in the early 2000s [52]. There are two interesting countries, the Republic of Korea and India, that obtain research capability in all fields. Their rises coincide with the ITER project and construction of tokamaks, KSTAR (first plasma in 2008) and SST-1 (first plasma in 2013).

Fig 3

Ranks of normalized revealed comparative advantages for the top 14 countries.

Rank series of the countries are smoothed with LOESS (locally estimated scatterplot smoothing) and colored by the topical clusters.

Ranks of normalized revealed comparative advantages for the top 14 countries.

Rank series of the countries are smoothed with LOESS (locally estimated scatterplot smoothing) and colored by the topical clusters.

Negative relation between complementarity and collaboration

Complementarity positively affects collaboration in many science fields [38-42]. Researchers and countries find collaborators that exchange knowledge as well as resources they do not have. We assume complementarity boosts collaboration even in big science because countries have limited budgets and manpower. To observe whether our assumption holds, we implemented the gravity model of collaboration with the capability distance, a Jaccard distance of the binary NRCA vectors in five topical clusters (Eq 3). The OLS regression results with fixed time effects are given in Table 2. The coefficients of publication counts of two countries are the same because they are symmetric in the collaboration matrix.

Table 2

Gravity model OLS regression results.

Variables	Material	Plasma	Device	Diagnostics	Simulation
ln(P_m,j)	0.497***(0.033)	0.508***(0.032)	0.411***(0.030)	0.438***(0.033)	0.488***(0.033)
ln(P_n,j)	0.497***(0.033)	0.508***(0.032)	0.411***(0.030)	0.438***(0.033)	0.488***(0.033)
ln(d_mn)	-0.495***(0.044)	-0.451***(0.040)	-0.464***(0.042)	-0.546***(0.049)	-0.485***(0.043)
c_mn,j	-0.133(0.222)	-0.911***(0.284)	-0.949***(0.232)	-0.690***(0.175)	-0.027(0.194)
Observations	3518	3518	3518	3518	3518
R²	0.113	0.123	0.101	0.094	0.107

Standard error is in parenthesis.

Fixed time effects are included.

* p-value < 0.1,

** p-value < 0.05,

*** p-value < 0.01

Standard error is in parenthesis. Fixed time effects are included. * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01 In all topical clusters, as expected, the number of publications had a positive coefficient, and the geographical distance had a negative coefficient. This means that collaborations occur frequently when two countries have high research capability and locate closely. In contrast to our assumption, the capability distance negatively affects collaboration, indicating that countries collaborate less if they have research capabilities in different topics. This tendency is found in three clusters, plasma, device, and diagnostics, with respect to fusion reaction in tokamak facilities. Collaborations on material and simulation are not related to the capability distance. The regression results suggest that complementarity would affect collaborations differently by topics in big science. International collaborations in core knowledge fields happen when two countries mutually benefit based on similar research capability.

Discussion and conclusion

Large facilities and international collaboration, two core components of big science, were investigated with bibliographic data, the dynamic topic model, and revealed comparative advantage. In this study, we chose nuclear fusion for a case study. Word similarity between topics unfolded the knowledge structure of nuclear fusion comprising five multidisciplinary topical clusters: material, plasma, device, diagnostics, and simulation. Different countries have different comparative advantages over these clusters. The time points that the comparative advantage trend changes match well with tokamak operation. Catching-up countries that have built their own tokamaks have developed their research capability while countries that do not operate a tokamak miss their productivity. Revealed comparative advantage can be used as a new indicator of big science project evaluation. Through time series analysis [53], we can examine the connections between facility construction and revealed comparative advantages in different topical clusters. The time series analysis addresses whether knowledge spillover occurs in various scales from facilities to countries [54-56]. In addition, with external information such as the amount of funding, the number of employees, and instrument specifications, we can investigate the impact of facility construction and international collaboration in detail. The publishing policy of large facilities also needs to be considered when interpreting the comparative advantage. Large facilities that restrict the publication of academic papers for the purpose of secrecy [57] have low research capability in our study, relative to others that promote academic publishing. These qualitative factors of facilities require further evaluations to estimate their scientific impacts accurately as the measure for policy making, investment, and education [58]. The international collaboration in nuclear fusion was estimated by the gravity model with the capability distance that represents how similar two countries’ research capabilities are. The regression results show high capability distance distracts the international collaborations in fusion reaction related clusters: plasma, device, and diagnostics. This tendency contrasts with that of other science fields favoring collaborators that have complementary comparative advantages [38-42]. Real collaborations in nuclear fusion governed by this pattern are worth studying. Countries may have distinct motivations to collaborate with other countries and to participate in international projects. Political and societal factors would also be involved in the policy making process. Understanding the history of nuclear fusion research gives us insights into what science policy a country has to take depending on the development stage. Our approach can be applied to other fields of big science. Particle physics and Antarctic science are the potential targets. They depend on large facilities, particle accelerators, and research stations in Antarctica. In particle physics, we expect that the dynamic topic model differentiates various types of particle accelerators [59]. A country’s strategic decisions for particle accelerators can be traced with comparative advantages on topical clusters. In Antarctic science, research stations may increase research capabilities on geography-dependent topics [60, 61] because its location expands the range of research activities. An increasing comparative advantage on spatial topics will support this idea. Antarctic science, especially, has interesting aspects that affect the gravity model of collaboration. Collaboration in Antarctica would occur frequently between close research stations, not between close capitals, so the geographical distance of the model should be defined in a different way. The Antarctic Treaty System, which enforces the peaceful usage of Antarctica and freedom of scientific investigation [62], can encourage countries to collaborate with others having complementary comparative advantages. It is necessary to determine in particle physics and Antarctic science whether collaboration in big science decreases by complementarity as in the case of nuclear fusion. More studies are needed to understand the nature of big science.

Determining the number of topics from static LDA model.

(PDF) Click here for additional data file.

Topic usage distribution for static LDA model.

We used the topic usage distribution for static K = 500 model to calculate the cutoff that specifies sufficiently used topics. The minimum of KDE (blue line) derivative determines the cutoff (red dashed line), and the number of topics above this point, K = 41, is used for the DTM. (TIF) Click here for additional data file.

The top 10 words for 41 topics in nuclear fusion research.

(PDF) Click here for additional data file.

8 in total

1. Canada prepares to pull the plug on fusion project.

Authors: Geoff Brumfiel
Journal: Nature Date: 2003-10-30 Impact factor: 49.962

2. Big science: Price to the present.

Authors: J H Capshew; K A Rader
Journal: Osiris Date: 1992 Impact factor: 0.548

3. Openness versus Secrecy in Scientific Research Abstract.

Authors: David B Resnik
Journal: Episteme (Edinb) Date: 2006-02-01

4. Research funding. China bets big on big science.

Authors: Hao Xin; Gong Yidong
Journal: Science Date: 2006-03-17 Impact factor: 47.728

5. Impact of Large-Scale Science on the United States: Big science is here to stay, but we have yet to make the hard financial and educational choices it imposes.

Authors: A M Weinberg
Journal: Science Date: 1961-07-21 Impact factor: 47.728