
Multidimensional scaling analysis of virus diseases.

António M Lopes¹, José P Andrade², J A Tenreiro Machado³.

Abstract

BACKGROUND AND OBJECTIVE: Viruses are infectious agents that replicate inside organisms and reveal a plethora of distinct characteristics. Viral infections spread in many ways, but often have devastating consequences and represent a huge danger for public health. It is important to design statistical and computational techniques capable of handling the available data and highlighting the most important features.
METHODS: This paper reviews the quantitative and qualitative behaviour of 22 infectious diseases caused by viruses. The information is compared and visualized by means of the multidimensional scaling technique.
RESULTS: The results are robust to uncertainties in the data and proved to be consistent with clinical practice.
CONCLUSIONS: The paper shows that the proposed methodology may represent a solid mathematical tool to tackle a larger number of viruses and additional information about these infectious agents.


Keywords: Clustering; Multidimensional scaling; Virus diseases


Year: 2016 PMID: 27265052 PMCID: PMC7114580 DOI: 10.1016/j.cmpb.2016.03.029

Source DB: PubMed Journal: Comput Methods Programs Biomed ISSN: 0169-2607 Impact factor: 5.428

Introduction

Viruses exert enormous damage on humans worldwide and are the single most important cause of infectious morbidity and mortality. History has been shaped by viral diseases since ancient times, and still is. These diseases began to be characterized in the 19th century, leading to the identification and differentiation of many viral illnesses [1]. The first viruses were identified at the end of the 19th century, and since then the process of discovery has continued steadily, gaining momentum in recent years. In fact, it is now possible to visualize viral structure at an atomic level of resolution, the nucleotide sequences of viral genomes are known, and the functional domains of numerous viruses and enzymes have been established [1], [2]. This information is now being applied to the development of diagnostic tools and effective antiviral therapies.

The classification of viruses has also evolved. Firstly, sub-classifications were based on pathologic features, such as the preference for a specific organ (for example, the liver in viral hepatitis). Secondly, some epidemiologic characteristics were defined, such as transmission by arthropods (arbovirus, for example) [1]. The current classifications are based on the type and structure of the viral nucleic acid and its replication strategy, the symmetry type of the capsid of the virus, and the presence or absence of a lipid envelope [1], [2]. More than 2000 species of viruses have been identified and approximately 650 are capable of infecting humans and animals [2]. Diseases can range from the common cold to fatal events such as Ebola, Smallpox or Rabies [2]. Globally, viral diseases are very diverse and present several degrees of complexity.

In this study we adopt multidimensional scaling (MDS) to visualize the relationships between 22 selected human viral infectious diseases. Some viruses were selected based on recent viral outbreaks and presence in the media (for example, Influenza A virus subtype H5N1, Ebola and Chikungunya), others were chosen for historical reasons (for example, Rabies, Poliomyelitis and Smallpox), and still others due to their prevalence and incidence in human populations (for example, Influenza, Rhinovirus and Norovirus). For two viral diseases (Human Immunodeficiency Virus and Rabies) we consider both the treated and untreated paradigms of the disease due to the huge discrepancy in mortality.

MDS offers a new perspective for visualizing global data associated with human pathologies. MDS is a set of techniques used to analyse similarities in data that produce spatial or geometric representations of complex objects [3], [4], [5]. MDS had its origin in the behavioural sciences, where it helped in understanding the judgements of individuals (such as preference or relatedness) concerning elements in a set of objects [6], [7], [8]. Nowadays, MDS is used with a large variety of real data, such as biological taxonomy [9], [10], [11], [12], finance [13], [14], marketing [15], sociology [16], physics [17], geophysics [18], [19], [20], communication networks [21], [22], biology and biomedics [23], [24], among others [25], [26].

Bearing these ideas in mind, the paper is organized as follows. In Section 2 we present the MDS technique. In Section 3 we study and compare data regarding 22 virus diseases. Finally, in Section 4 we draw the main conclusions.

Multidimensional scaling

Given s objects in an m-dimensional space and a measure of proximity, $\delta_{ij}$, between objects i and j, a symmetric s × s matrix, $\Delta = [\delta_{ij}]$, of item to item (dis)similarities is calculated in a first step. The MDS algorithm produces an s × q (q < m) configuration, X, representing point coordinates (items), where q is specified by the user. Thus, row i of matrix X gives the coordinates of object i in the q-dimensional embedding space. Configuration X preserves, as well as possible, the proximities between pairwise elements in the higher m-dimensional space and unveils the underlying data structure. MDS is, consequently, different from other similar techniques, such as factor and cluster analysis, because there are no assumptions concerning which factors might drive each dimension. Additionally, MDS is able to treat distinct types of data, has better convergence rates, and is less complex than other methods [3], [27].

In order to arrive at the best configuration X, MDS evaluates different alternative configurations while minimizing a goodness-of-fit function. This problem, equivalent to minimizing the raw stress function, $\sigma^2$, can be formulated as [28]:

$$\sigma^2 = \sum_{i<j} w_{ij} \left[ d_{ij}(X) - \delta_{ij} \right]^2$$

where $w_{ij}$ is a user chosen non-negative weight and $d_{ij}(X)$ is a measure of the (dis)similarities among the items in the embedding space. Therefore, $d_{ij}$ is usually a distance measure. Smaller (larger) distances between two objects translate into more (less) similarities between them. For example, the Minkowski distance provides a general way to specify distance for quantitative data in a multidimensional space:

$$d_{ij} = \left( \sum_{k} \alpha_k \left| x_{ik} - x_{jk} \right|^{r} \right)^{1/r}$$

where $x_{ik}$ is the value of dimension k for object i and $\alpha_k$ is a weight factor. When $\alpha_k = 1$, the Euclidean and the city-block distances are obtained for r = 2 and r = 1, respectively. Nevertheless, the MDS technique allows users to choose other metrics for the comparison of objects that may be better suited to their data. In the sequel we will adopt the Canberra distance and the cosine correlation.

There are different stress measures, such as the normalized raw stress, which is $\sigma^2$ divided by the sum of squared dissimilarities. Possible alternatives are Kruskal's stress-1 and Kruskal's stress-2, which normalize the stress by the sum of squared distances, or by a function of the variances of distances, respectively. Another example is the S-stress measure given by the sum of squared errors between squared distances and squared dissimilarities [29], [30].

The Shepard diagram is used to infer the quality of the MDS solution. Let $\delta_{ij}$ denote the (dis)similarities between objects i and j. A Shepard diagram plots, for every pair of objects, the distance $d_{ij}$ in the embedding space against $\delta_{ij}$. If a line is fitted through these pairs, the deviation of each point from the line gives the approximation error for the corresponding dissimilarity. The Shepard diagram is thus useful for visualizing the residuals and outliers resulting from the MDS application to the data. A narrow scatter around the 45 degree line indicates a good fit between $d_{ij}$ and $\delta_{ij}$. The stress plot represents $\sigma^2$ versus the number of dimensions q of the MDS maps. Usually, we get a monotonically decreasing chart and we choose q as a compromise between reducing $\sigma^2$ and having a low dimension for the MDS charts.

MDS can be divided according to the classification of data similarities, the number of similarity matrices and the nature of the MDS model. We thus have non-metric or metric MDS, depending on whether the similarity data are qualitative or quantitative, respectively. As regards the number of similarity matrices and the nature of the model, we have classical MDS (i.e., with one matrix and unweighted models), replicated MDS (i.e., with several matrices and unweighted models) and weighted MDS (i.e., with several matrices and weighted models). The MDS interpretation is based on the emerging clusters and distances between points in the map, rather than on their absolute coordinates, or the geometrical form of the locus. Thus, we can rotate or translate the MDS chart since the distances between points remain identical.
Usually, two or three dimensional charts are selected, because they allow a direct graphical representation. MDS has advantages over other methods, such as principal component analysis (PCA), since MDS can handle similarity/dissimilarity matrices based on several distinct metrics. MDS uses the inter-object distances rather than the coordinates of the objects and, therefore, is a more general method than PCA [18], [31].
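The raw stress and the weighted Minkowski distance described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `minkowski` and `raw_stress`, the default unit weights and the toy dissimilarities are not taken from the paper, and no minimization is performed; the code merely evaluates the stress of a given configuration.

```python
from itertools import combinations

def minkowski(x, y, r=2.0, alpha=None):
    """Weighted Minkowski distance; alpha defaults to unit weights (assumption)."""
    if alpha is None:
        alpha = [1.0] * len(x)
    return sum(a * abs(xi - yi) ** r for a, xi, yi in zip(alpha, x, y)) ** (1.0 / r)

def raw_stress(delta, X, w=None, r=2.0):
    """Raw stress: sum over i<j of w_ij * (d_ij(X) - delta_ij)^2."""
    total = 0.0
    for i, j in combinations(range(len(X)), 2):
        w_ij = 1.0 if w is None else w[i][j]
        total += w_ij * (minkowski(X[i], X[j], r) - delta[i][j]) ** 2
    return total

# Toy example: three objects whose dissimilarities form a 3-4-5 right triangle.
delta = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]  # 2D layout reproducing delta
line = [(0.0,), (3.0,), (4.0,)]               # 1D layout distorting one pair

print(raw_stress(delta, exact))  # essentially zero
print(raw_stress(delta, line))   # larger: the pair (1, 2) is badly represented
```

Comparing the two layouts shows the trade-off that the stress plot summarizes: lowering q from 2 to 1 forces one pair of objects much closer than its dissimilarity, and the stress rises accordingly.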

Multidimensional scaling analysis

In this section we use MDS tools to visualize the relationships between s = 22 infectious diseases caused by viruses, namely Bird Flu (BFlu), Chicken Pox (CPox), Chikungunya (Chi), Dengue Fever (Den), Ebola (Ebo), Hepatitis B (HepB), HIV (HIV), HIV—untreated (HIV Un), Marburg virus disease (Mar), Measles (Mea), MERS (MERS), Mumps (Mum), Norovirus (Nor), Polio (Pol), Rabies (Rab), Rabies—untreated (Rab Un), Rhinovirus (Rhi), Rotavirus (Rot), Rubella (Rub), SARS (SARS), Seasonal Flu (SFlu) and Smallpox (Sma). To disease $i$, $i = 1, \ldots, s$, we associate $m_1 = 5$ quantitative and $m_2 = 2$ qualitative features. Therefore, we have an $m = m_1 + m_2 = 7$ dimensional space of attributes. We start by pre-processing the quantitative and the qualitative data, yielding a new equivalent µ-dimensional space (to be defined in the sequel) for disease comparison. Table 1 lists the quantitative data, indexed by disease i = 1, …, 22 and attribute g = 1, …, 5. We consider the disease fatality rate, the average basic reproductive number, the average serial interval, the incubation period and the virus survival time outside the host.

Table 1

Quantitative attributes of the diseases considered in the study.

	Disease	Acronym	Fatality rate (%)	Average basic reproductive number	Average serial interval (days)	Incubation period (days)	Survival outside host (days)
i		g	1	2	3	4	5
1	Bird flu	BFlu	59.00	2.00	3.00	3.0	30.0
2	Chicken pox	CPox	0.00	7.50	14.00	14.0	2.0
3	Chikungunya	Chi	0.40	4.00	23.00	2.5	–
4	Dengue fever	Den	5.00	3.00	16.00	7.0	63.0
5	Ebola	Ebo	75.00	2.50	15.30	11.4	50.0
6	Hepatitis B	HepB	0.75	6.00	25.00	75.0	28.0
7	HIV	HIV	2.10	3.50	–	60.0	42.0
8	HIV—untreated	HIV Un	80.00	3.50	–	60.0	42.0
9	Marburg virus disease	Mar	25.00	1.60	9.00	6.0	21.0
10	Measles	Mea	0.20	15.00	11.70	11.0	0.1
11	MERS	MERS	27.00	0.50	7.60	5.0	3.0
12	Mumps	Mum	0.01	5.50	18.00	17.0	0.3
13	Norovirus	Nor	0.08	3.70	1.86	1.5	24.0
14	Polio	Pol	22.00	6.00	–	13.0	160.0
15	Rabies	Rab	0.00	1.60	–	40.0	6.0
16	Rabies—untreated	Rab Un	100.00	1.60	–	40.0	6.0
17	Rhinovirus	Rhi	0.00	3.70	7.50	3.0	1.0
18	Rotavirus	Rot	0.00	3.50	7.00	1.5	60.0
19	Rubella	Rub	0.00	6.50	18.30	17.7	0.9
20	SARS	SARS	11.00	3.50	10.00	8.0	9.0
21	Seasonal flu	SFlu	0.01	1.30	3.30	2.0	2.0
22	Smallpox	Sma	15.00	6.00	17.70	14.0	1.5

Tables 2 and 3 summarize the qualitative features, namely the transmission mode and the main symptoms of the disease.

Table 2

Main transmission mode of the diseases considered in the study.

	Disease	Acronym	Animal–human	Airborne droplet	Bites	Body fluids	Sexual contact	Surfaces	Faecal–oral
i		h	1	2	3	4	5	6	7
1	Bird flu	BFlu	1	0	0	0	0	0	0
2	Chicken pox	CPox	0	1	0	0	0	0	0
3	Chikungunya	Chi	0	0	1	0	0	0	0
4	Dengue fever	Den	0	0	1	0	0	0	0
5	Ebola	Ebo	0	0	0	1	0	0	0
6	Hepatitis B	HepB	0	0	0	1	1	0	0
7	HIV	HIV	0	0	0	1	1	0	0
8	HIV—untreated	HIV Un	0	0	0	1	1	0	0
9	Marburg virus disease	Mar	0	0	0	1	0	0	0
10	Measles	Mea	0	1	0	0	0	0	0
11	MERS	MERS	0	1	0	0	0	0	0
12	Mumps	Mum	0	1	0	0	0	1	0
13	Norovirus	Nor	0	0	0	0	0	1	1
14	Polio	Pol	0	0	0	0	0	0	1
15	Rabies	Rab	0	0	1	0	0	0	0
16	Rabies—untreated	Rab Un	0	0	1	0	0	0	0
17	Rhinovirus	Rhi	0	1	0	0	0	0	1
18	Rotavirus	Rot	0	0	0	0	0	1	1
19	Rubella	Rub	0	1	0	0	0	0	0
20	SARS	SARS	0	1	0	0	0	0	1
21	Seasonal flu	SFlu	0	1	0	0	0	0	0
22	Smallpox	Sma	0	1	0	0	0	0	0

Table 3

Main symptoms of the diseases considered in the study.

	Disease	Acronym	Fever	Cough	Sore throat	Muscle and body aches	Nausea	Abdominal pain	Diarrhoea	Vomiting	Rash	Itchy	Fluid blisters	Tiredness, fatigue, weakness	Loss of appetite	Headache	Joint pain	Joint swelling	Bleeding	Jaundice	Swollen glands	Chills	Chest pain	Runny nose	Conjunctivitis	Red spots	Shortness of breath	Confusion	Agitation	Sneezing
i		l	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28
1	Bird flu	BFlu	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
2	Chicken pox	CPox	1	0	0	0	0	0	0	0	1	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
3	Chikungunya	Chi	1	0	0	1	0	0	0	0	1	0	0	0	0	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
4	Dengue fever	Den	1	0	0	1	0	0	0	0	1	0	0	0	0	1	1	0	1	0	0	0	0	0	0	0	0	0	0	0
5	Ebola	Ebo	1	0	0	1	0	1	1	1	0	0	0	1	0	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0
6	Hepatitis B	HepB	1	0	0	0	1	1	0	1	0	0	0	1	1	0	1	0	0	1	0	0	0	0	0	0	0	0	0	0
7	HIV	HIV	1	0	1	1	0	0	0	0	1	0	0	1	0	1	1	0	0	0	1	0	0	0	0	0	0	0	0	0
8	HIV—untreated	HIV Un	1	0	1	1	0	0	0	0	1	0	0	1	0	1	1	0	0	0	1	0	0	0	0	0	0	0	0	0
9	Marburg virus disease	Mar	1	0	1	1	1	1	1	1	1	0	0	0	0	1	0	0	1	0	0	1	1	0	0	0	0	0	0	0
10	Measles	Mea	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	1	1	0	0	0	0
11	MERS	MERS	1	1	0	0	1	0	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0
12	Mumps	Mum	1	0	0	1	0	0	0	0	0	0	0	1	1	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0
13	Norovirus	Nor	1	0	0	1	1	1	1	1	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
14	Polio	Pol	1	0	1	0	1	1	0	0	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
15	Rabies	Rab	1	0	0	0	0	0	0	0	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0	0	1	1	0
16	Rabies—untreated	Rab Un	1	0	0	0	0	0	0	0	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0	0	1	1	0
17	Rhinovirus	Rhi	0	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	1
18	Rotavirus	Rot	1	0	0	0	0	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
19	Rubella	Rub	1	0	0	0	0	0	0	0	1	0	0	0	0	0	1	0	0	0	1	0	0	0	0	0	0	0	0	0
20	SARS	SARS	1	0	0	1	0	0	1	0	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
21	Seasonal flu	SFlu	1	1	1	1	0	0	1	1	0	0	0	1	0	1	0	0	0	0	0	0	0	1	0	0	0	0	0	0
22	Smallpox	Sma	1	0	0	1	0	0	0	1	0	0	0	1	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0

The transmission mode dimension is represented as binary data (Table 2), indexed by disease i = 1, …, 22 and transmission mode h = 1, …, 7. This means that we consider t = 7 conditions, specifically animal–human, airborne droplet, bites, body fluids, sexual contact, surfaces and faecal–oral. A value of 1 means that disease i has the transmission mode h, and a value of 0 means that it does not. In a similar way, for the symptoms dimension we consider y = 28 indicators, represented as binary data indexed by disease i and symptom l = 1, …, 28 (Table 3). A value of 1 (or 0) means that disease i has (or has not) symptom l.

Before applying the MDS algorithm we start by “normalizing” the quantitative data to the interval [0, 1]. For the qualitative data, a corresponding transformation is applied to the transmission mode and symptom indicators. In this way we avoid having some features saturating the numerical values. We proceed by constructing the vectors of features, one per disease, yielding a µ-dimensional space that embeds all quantitative and qualitative data. This is equivalent to the m-dimensional space defined previously for disease comparison. In the next subsections we use two indices to compare the preprocessed data, namely the Canberra distance and the cosine correlation. Other indices can be adopted, but these two are sufficient to explain the working concepts. We then apply the MDS technique and interpret the generated maps. Fig. 1 depicts a synoptic diagram of the disease's characteristics and quantification method.

Fig. 1

Synoptic diagram of the disease's characteristics and quantification method. The indices denote: i, disease; g, quantitative data; h, qualitative data, transmission mode; l, qualitative data, symptoms.

In constructing Tables 1, 2 and 3, data were collected from the following sources: Influenza A virus subtype H5N1 commonly known as “Bird Flu” [32], [33], [34], [35]; Chicken Pox (varicella-zoster virus infection) [36], [37], [38], [39]; Chikungunya [40], [41], [42]; Dengue Fever [43], [44]; Ebola [45], [46], [47]; Hepatitis B [48], [49], [50]; Human Immunodeficiency Virus (HIV) and HIV—untreated [51], [52], [53], [54]; Marburg haemorrhagic fever [47], [55]; Measles [56], [57], [58], [59], [60]; Middle East Respiratory Syndrome (MERS) [61], [62], [63]; Mumps [64], [65]; Norovirus [66], [67]; Poliomyelitis [68], [69], [70]; Rabies and Rabies—untreated [71], [72], [73], [74]; Rhinovirus [75], [76], [77]; Rotavirus [78], [79], [80], [81]; Rubella [58], [82]; Severe Acute Respiratory Syndrome (SARS) [61], [83]; Influenza causing seasonal flu [34], [84], [85]; and Smallpox [86], [87].

MDS analysis based on the Canberra distance

In this subsection we consider the construction of matrix X using a measure based on the Canberra distance between diseases i and j, whose standard form is:

$$d_{ij}^{C} = \sum_{k=1}^{\mu} \frac{\left| x_{ik} - x_{jk} \right|}{\left| x_{ik} \right| + \left| x_{jk} \right|}$$

where $x_{ik}$ denotes the k-th component of the vector of features of disease i. Given this index, the s × s symmetric matrix of dissimilarities is then computed and the MDS tool applied. While several MDS criteria were tested, the Sammon criterion revealed good results and was adopted in all calculations. It should be noted that this criterion tries to optimize a cost function that describes how well the pairwise distances in a data set are preserved [88], [89]. Fig. 2 depicts the 2- and 3-dimensional (2D and 3D) maps produced by MDS. Each point represents a disease, denoted by the corresponding label as shown in Tables 1, 2 and 3. We can observe that the Canberra index leads to poor clustering. Nonetheless, we should note that MDS is merely a mathematical clustering and visualization tool and that a physical perspective of the reported results must be sought in the light of the comparison index [90]. Therefore, a further explanation of the physical mechanisms associated with the results must be pursued by standard complementary procedures.
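To make the construction of the dissimilarity matrix concrete, the sketch below computes Canberra distances for four diseases using the five quantitative attributes of Table 1. Several assumptions apply: the values are the raw table entries rather than the preprocessed feature vectors used in the paper, the qualitative features are left out, terms with a zero denominator are taken as zero (a common convention, not stated in the paper), and diseases with missing entries are avoided. The function name `canberra` is illustrative.

```python
def canberra(x, y):
    """Canberra distance: sum_k |x_k - y_k| / (|x_k| + |y_k|); 0/0 terms count as 0."""
    total = 0.0
    for xk, yk in zip(x, y):
        denom = abs(xk) + abs(yk)
        if denom > 0.0:
            total += abs(xk - yk) / denom
    return total

# Quantitative attributes from Table 1 (fatality rate, basic reproductive
# number, serial interval, incubation period, survival outside host).
rows = {
    "Ebo": (75.00, 2.50, 15.30, 11.4, 50.0),
    "Mar": (25.00, 1.60, 9.00, 6.0, 21.0),
    "Mea": (0.20, 15.00, 11.70, 11.0, 0.1),
    "Rub": (0.00, 6.50, 18.30, 17.7, 0.9),
}
labels = list(rows)

# Symmetric s x s matrix of pairwise dissimilarities (here s = 4).
D = [[canberra(rows[a], rows[b]) for b in labels] for a in labels]

for name, row in zip(labels, D):
    print(name, " ".join(f"{d:6.3f}" for d in row))
```

Each term of the Canberra sum lies between 0 and 1, so with five attributes every entry of the matrix is bounded by 5, regardless of the units of the individual attributes.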

Fig. 2

MDS maps for the Canberra index with representations: (A) 2D; (B) 3D.

Figs. 3 and 4 depict the Shepard and stress plots, respectively, which represent standard tools for the assessment of the MDS results. The Shepard diagram shows a good distribution of points around the 45 degree line, particularly for the 3D representation, which means a good fit of the distances to the dissimilarities. The stress plot reveals that a three dimensional space describes well the locus of the s = 22 diseases. In fact, the stress diminishes strongly until q = 2, moderately towards q = 3 and weakly thereafter. The maximum curvature point of the stress plot is often adopted as the criterion for deciding the dimensionality of the MDS maps. This means that, although four or more dimensions would represent the data slightly more accurately, 3D maps represent a good compromise between accuracy and ease of visualization.

Fig. 3

Shepard plots for the Canberra index for representations: (A) 2D; (B) 3D.

Fig. 4

Stress versus q plot for the Canberra index.


MDS analysis based on the cosine correlation

In this subsection we adopt the cosine correlation to construct the matrix X. For each disease pair i and j, the index is given by expression (4), in which the weights α are specified by the user and are usually chosen to favour adequate clustering. Given expression (4), the s × s symmetric matrix is computed and the MDS is applied. Fig. 5 represents the 2D and 3D maps resulting from the MDS analysis with weights that include α = 0.5 and α = 2. The Shepard and stress plots are similar to those shown in Figs. 3 and 4, revealing, as before, that the MDS results are trustworthy. We now observe the emergence of a different pattern, but the main idea of clustering remains. This observation is usual in MDS charts, where alternative indices capture different characteristics of the phenomena and lead to distinct plots, while allowing the same conclusions. The “best” index is simply the one that produces an MDS map whose clusters reflect the real world most directly.

Fig. 5

MDS maps for the cosine correlation with representations: (A) 2D; (B) 3D.
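As an illustration of a weighted cosine index, the sketch below compares transmission-mode vectors from Table 2. It should be read with care: the exact weighted expression used in the paper is not reproduced here, so the form implemented (one weight per component, applied to the inner product and to both norms) is an assumption, as are the function name `cosine_correlation` and the choice of returning 0 for an all-zero vector.

```python
from math import sqrt

def cosine_correlation(x, y, alpha=None):
    """Weighted cosine of the angle between x and y (assumed form, see lead-in)."""
    if alpha is None:
        alpha = [1.0] * len(x)  # unit weights by default (assumption)
    num = sum(a * xi * yi for a, xi, yi in zip(alpha, x, y))
    den = sqrt(sum(a * xi * xi for a, xi in zip(alpha, x))) * \
          sqrt(sum(a * yi * yi for a, yi in zip(alpha, y)))
    return num / den if den > 0.0 else 0.0

# Transmission-mode rows from Table 2 (animal-human, airborne droplet, bites,
# body fluids, sexual contact, surfaces, faecal-oral).
hepb = (0, 0, 0, 1, 1, 0, 0)
hiv = (0, 0, 0, 1, 1, 0, 0)
ebola = (0, 0, 0, 1, 0, 0, 0)
measles = (0, 1, 0, 0, 0, 0, 0)

print(cosine_correlation(hepb, hiv))       # identical profiles
print(cosine_correlation(hepb, ebola))     # one shared mode out of two
print(cosine_correlation(ebola, measles))  # no shared mode
```

Note that the cosine correlation is a similarity (larger means more alike), whereas the Canberra index is a distance; how the similarity is turned into the input of the MDS routine is not covered by this sketch.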

Clustering analysis

The standard MDS analysis is based on the object groups in the final map. We can rely either on the direct visualization of the plot, or on the implementation of some extra algorithm to extract the clusters. Bearing this idea in mind, in subsection 3.3.1 we adopt the non-hierarchical clustering algorithm K-means to identify clusters in the MDS map. In subsection 3.3.2 we use hierarchical clustering to confirm the results obtained. In subsection 3.3.3 we analyse the sensitivity of the MDS maps. In subsection 3.3.4 we discuss the results. We restrict the analysis to the cosine correlation metric since it revealed better results.

K-means clustering

Clustering consists in grouping objects that are, in some sense, similar to each other. K-means is a non-hierarchical clustering method commonly used in machine learning and data mining [91]. The algorithm starts with a collection of s objects, where each object is a point in a q-dimensional space, and a given number of clusters, K, specified in advance by the user. K-means groups the s objects into K ≤ s clusters, so as to minimize the objective function given by the sum of distances between the points and the centres of their clusters. It arrives at a solution in which objects within each cluster are as close to each other as possible, and as far from objects in other clusters as possible.

A key issue in K-means is how to determine the correct number of clusters, K. It should be noted that the very notion of “good clustering” is subjective and is a question of point of view. However, we can rely on different indices to measure the quality of the clustering, namely the Davies–Bouldin, the Caliński–Harabasz and the silhouette indices [92], [93], [94]. In this work we adopt the silhouette to compare different solutions. The silhouette value, S, for each object, is a measure of how well each object lies within its cluster. Silhouette values vary in the interval S ∈ [−1, 1]. Silhouette values closer to S = 1 correspond to objects that are very distant from neighbouring clusters and, therefore, are assigned to the right cluster. If S = 0, then the objects could equally be assigned to another cluster. When S = −1, the objects are assigned to the wrong cluster.

Given the coordinates of the s = 22 objects in the q = 3 dimensional space generated by the MDS, we evaluate the clusters identified by the K-means algorithm when varying the number of clusters in the interval K ∈ [2, 7]. Fig. 6 depicts the silhouette average values versus the number of clusters, K, leading to the optimum value K = 4. Fig. 7 illustrates the shape of the silhouettes obtained for different values of K, where we can see that the best shape is obtained for K = 4.

Fig. 6

Silhouette average values versus number of clusters K, for K ∈ [2, 7].

Fig. 7

Silhouette shape for different values of K.

For K = 4, the K-means generates four clusters, to be discussed in subsection 3.3.4.
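The silhouette value used to compare clusterings can be computed directly from point coordinates and cluster labels. The following sketch uses Euclidean distances and toy 2D points; the function name `silhouette_values`, the toy data and the convention of returning 0 for singleton clusters are assumptions made for illustration, and the K-means step itself is not reproduced.

```python
from math import dist

def silhouette_values(points, labels):
    """Silhouette S = (b - a) / max(a, b) for each point.

    a: mean distance to the other members of the point's own cluster.
    b: smallest mean distance to the members of any other cluster.
    """
    values = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            values.append(0.0)  # singleton or single cluster: S = 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        m = max(a, b)
        values.append((b - a) / m if m > 0.0 else 0.0)
    return values

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1])  # compact, well separated groups
bad = silhouette_values(points, [0, 1, 0, 1])   # each point paired with a far one

print(sum(good) / len(good))  # positive average: sensible assignment
print(sum(bad) / len(bad))    # negative average: points sit in the wrong cluster
```

Averaging the silhouette values for each candidate K, as done for Fig. 6, then amounts to repeating this computation on the labels returned by the clustering algorithm for every K.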

Hierarchical clustering

As an alternative approach, not involving the MDS, we use a hierarchical clustering algorithm that is fed directly with the matrix of cosine correlation values. Fig. 8 depicts the dendrogram generated by successive (agglomerative) clustering with the average-linkage method [95], [96]. We cut the tree at the level 0.27, since below this value the clusters become too close to each other. We see that four clusters emerge with this method, confirming the results obtained in the previous subsection. Nevertheless, MDS uses the space more efficiently and produces a more informative map of the objects.

Fig. 8

Dendrogram generated by the hierarchical clustering algorithm for the cosine correlation index.
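Agglomerative clustering with average linkage and a cut level can be sketched as follows. The toy dissimilarity matrix is invented for illustration, the function name `average_linkage` is hypothetical, and the cut value 0.27 is borrowed from the text only as an example threshold; the sketch assumes the input holds dissimilarities (smaller means more alike).

```python
def average_linkage(D, cut):
    """Merge the closest pair of clusters until their linkage exceeds `cut`.

    D is a symmetric matrix of pairwise dissimilarities; the linkage between
    two clusters is the mean dissimilarity over all cross-cluster pairs.
    """
    clusters = [[i] for i in range(len(D))]

    def linkage(a, b):
        return sum(D[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        d, p, q = min(pairs)
        if d > cut:
            break  # cutting the tree: remaining clusters are too far apart
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Invented dissimilarities: objects 0-1 are close, and so are objects 2-3.
toy = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
print(average_linkage(toy, cut=0.27))
```

Raising the cut level merges more clusters and lowering it leaves more of them separate, which is the choice being made when the dendrogram is cut at a given height.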

Sensitivity analysis

The s = 22 virus diseases were compared based on quantitative and qualitative features. As qualitative characteristics are subjective, their influence upon the final results needs to be analysed, so as to prevent biased conclusions. In this line of thought, we vary the weights of the two qualitative features within a given interval and we check their influence on the generated MDS map. Each weight is discretized into r distinct values evenly spaced in its interval and then r × r instances of the s × s dimensional similarity matrix are calculated. We use these matrices as the input for the MDS algorithm that generates r × r intermediate maps of “points” (i.e., one map per pair of weights). Finally, the charts are processed by means of Procrustes analysis in order to obtain a single global plot of “shapes”, where the “points” of the original maps are optimally superimposed [97]. Procrustes analysis performs linear transformations, namely translation, reflection, orthogonal rotation and scaling, with the objective of minimizing a measure of the difference between the “points” in the original maps. The algorithm (i) chooses one MDS map for reference (by selecting one of the available instances); (ii) superimposes all other MDS instances onto the current reference; (iii) computes the mean form of the current set of superimposed maps; (iv) compares the distance between the mean and the reference instances to a given threshold value and, if above, sets the reference to the mean form and continues to step (ii). Fig. 9 shows the 3D MDS global map obtained by the Procrustes algorithm, as well as the clusters identified previously. As can be seen, the results are quite robust to large variations in the qualitative features, since the r × r points corresponding to each original object (disease) deviate somewhat, but the clusters remain.
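Step (ii) of the procedure, superimposing one map onto a reference, can be illustrated in two dimensions with complex arithmetic. This is a simplified sketch rather than the procedure used in the paper: it handles 2D maps only (the paper works with 3D maps), it covers a single pairwise fit and not the iteration over the mean form, and the names `centre` and `superimpose` are hypothetical.

```python
def centre(points):
    """Translate a 2D map to its centroid and encode points as complex numbers."""
    zs = [complex(x, y) for x, y in points]
    mean = sum(zs) / len(zs)
    return [z - mean for z in zs]

def superimpose(reference, other):
    """Fit `other` onto `reference` by translation, rotation, scaling and
    optional reflection; return (squared error, fitted centred points)."""
    a = centre(reference)
    b0 = centre(other)
    best = None
    for b in (b0, [z.conjugate() for z in b0]):  # second pass tries a reflection
        # Least-squares complex factor c: |c| is the scale, arg(c) the rotation.
        c = sum(ak * bk.conjugate() for ak, bk in zip(a, b)) / \
            sum(abs(bk) ** 2 for bk in b)
        fitted = [c * bk for bk in b]
        err = sum(abs(ak - fk) ** 2 for ak, fk in zip(a, fitted))
        if best is None or err < best[0]:
            best = (err, fitted)
    return best

reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
# Same shape rotated by 90 degrees, scaled by 3 and shifted by (10, -4).
moved = [(10.0, -4.0), (10.0, 2.0), (7.0, -4.0)]
mirrored = [(-x, y) for x, y in moved]

print(superimpose(reference, moved)[0])     # close to zero
print(superimpose(reference, mirrored)[0])  # close to zero, via the reflection
```

Because the MDS interpretation depends only on relative positions, two maps that differ by such a transformation carry the same information, which is why they can be superimposed before being compared.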

Fig. 9

Three-dimensional MDS global map for the cosine correlation, obtained by Procrustes analysis.

From a complementary perspective, we address in the sequel the sensitivity of the MDS results to quantitative features. In fact, the quantitative values found in the literature diverge slightly, since they depend on the time of the study and on the conditions observed in each particular case, namely environmental conditions (e.g., temperature, humidity, pressure), geographic region, human development and medical assistance, among others. To assess the sensitivity we add random noise to the values of the quantitative features, with amplitude in the interval ±10% of the values shown in Table 1 (values are limited to zero to avoid negative numbers). In these conditions, we perform ten experiments, generating one MDS map per trial, and then the individual MDS maps are combined using Procrustes. Fig. 10 illustrates the 3D MDS global map obtained and the corresponding superimposed clusters. We conclude that the method is robust to variations in the values of the quantitative features used in the study.

Fig. 10

Three-dimensional MDS global map for the cosine correlation, obtained by Procrustes with random variations in the values of the quantitative features.
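The perturbation of the quantitative features can be sketched as follows for a single row of Table 1. The uniform distribution of the noise, the fixed seed and the function name `perturb` are assumptions; the text only specifies an amplitude of ±10%, a lower limit of zero and ten experiments.

```python
import random

def perturb(values, amplitude=0.10, rng=random):
    """Scale each value by a random factor in [1 - amplitude, 1 + amplitude]
    (uniform noise is an assumption) and clamp the result at zero."""
    return [max(0.0, v * (1.0 + rng.uniform(-amplitude, amplitude)))
            for v in values]

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
ebola = [75.00, 2.50, 15.30, 11.4, 50.0]  # Ebola row of Table 1

# Ten noisy replicas, one per experiment; each would feed one MDS run.
trials = [perturb(ebola, rng=rng) for _ in range(10)]
for trial in trials:
    print(" ".join(f"{v:7.2f}" for v in trial))
```

Each replica would then be preprocessed and mapped by MDS in the same way as the original data before the resulting maps are superimposed.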

Discussion

The clusters have interesting medical and epidemiological value. In the first cluster, we find the Ebola and the Marburg viruses, which can cause serious or, most of the time, lethal human disease, even with therapeutic interventions. According to the National Institutes of Health (NIH) Guidelines for Research Involving Recombinant or Synthetic Nucleic Acid Molecules (http://osp.od.nih.gov/sites/default/files/NIH_Guidelines.html#_Toc351276292) of the USA, from November 2013, those viruses are classified as Risk Group 4 pathogens. There are no vaccines or effective treatments for their infections. Accordingly, these viruses are manipulated only in Biosafety Level 4 conditions due to their high individual and community risk. They are also considered biological agents with material threat determinations in the scope of bioterrorism, in the USA [98]. Also, the highly infectious agents responsible for MERS (from the Coronaviridae Subfamily) and Bird flu (Influenza A virus subtype H5N1), classified as Risk Group 3, are mapped in the same cluster. These viruses are recommended to be manipulated with Biosafety Level 3 precautions, indicated for agents that may cause serious or potentially lethal disease. In this cluster, untreated HIV and untreated rabies infections are also present.

In contrast, and distant in the MDS map, we have a second cluster. Most of its viruses are in Risk Group 2, because they generally do not cause serious or life threatening illness and most of them are readily treated or easily prevented with vaccines. They are manipulated, as most viruses, in Biosafety Level 2 environments. The exception is Chikungunya, an important cause of febrile illness in the world, and now re-emerging as a cause of large outbreaks of human disease [99]. The arbovirus (arthropod-borne) alphavirus responsible for this pathology is considered a Risk Group 3 pathogen and requires Biosafety Level 3 precautions [100].

A third cluster presents several virus species of different Risk Groups. Polio virus is a Risk Group 2 pathogen, as is the dengue fever virus, an arbovirus. On the other hand, the SARS-associated coronavirus is a Risk Group 3 pathogen. Furthermore, smallpox, the disease caused by the variola virus, is also present in this cluster. Variola is considered a life-threatening disease posing the highest risk to national security due to its potential use as a biological weapon, given the high mortality rates and the major public health impact [98]. The reason is that smallpox was declared eradicated by the World Health Organization (WHO) in 1980, and vaccination, once widely practised, stopped in the same year [98]. Therefore, this third cluster can be considered as a transition cluster between the first two. In other words, it is located in the MDS map “equidistant” from both.

Identical reasoning can be applied to the fourth cluster. In this cluster are the lentivirus (a subgroup of retrovirus) that causes HIV infection and acquired immunodeficiency syndrome (AIDS), a Risk Group 3 pathogen but generally manipulated with Biosafety Level 2 precautions, and the hepatitis B virus, considered as belonging to Risk Group 2, equally transmitted by body fluids, and requiring a Biosafety Level 2 environment. Also in Risk Group 2 is the third pathogen present in this cluster, i.e., the virus of the Lyssavirus genus of the Rhabdoviridae family that causes rabies. Note that if the diseases caused by HIV and rabies are untreated, then there is a high lethality in humans. In that case they emerge in the first cluster of the MDS map.

In conclusion, the MDS map resulted in a new visualization of the complex quantitative and qualitative data of several diseases caused by viruses, and several clusters emerged with medical and epidemiological interest. In particular, a cluster emerged with viruses like Ebola and MERS, which are responsible for some recent viral outbreaks. In contrast, in the same MDS map, and distant from the previous group, there is a cluster of viruses associated with human diseases for which preventive and therapeutic interventions are generally available. The development of this methodology may help in understanding the dynamics of viral diseases.

Conclusion

This paper addressed the clinical characteristics of 22 viruses. A significant number of quantitative and qualitative characteristics were considered. When handling a large volume of information, we are confronted with the problem of comparing all details while highlighting the most important properties. Discarding information a priori may lead to incomplete or even biased results. Therefore, embedding all details requires adequate statistical, computational and visualization techniques capable of revealing the main aspects while “filtering” the information with low relevance. The MDS technique adopted in this study proved to produce solid results in accordance with present day knowledge about those infectious agents.

Conflict of interest

The authors declare no conflict of interest.
