Susanne Fischer, Conrad M Freuling, Thomas Müller, Florian Pfaff, Ulrich Bodenhofer, Dirk Höper, Mareike Fischer, Denise A Marston, Anthony R Fooks, Thomas C Mettenleiter, Franz J Conraths, Timo Homeier-Bachmann.
Abstract
Rabies is caused by lyssaviruses and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses have difficulty defining verifiable criteria for allocating sequences to clusters in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named 'affinity propagation clustering' (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data. Cluster analysis of sequences, however, is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform, transparent manner, not only for RABV but also for other comparative sequence analyses.
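To make the clustering step described in the abstract concrete, the following is a minimal sketch of affinity propagation on a pairwise similarity matrix. It is not the authors' pipeline: the function names `identity` and `affinity_propagation`, the six toy sequences, the use of the fraction of identical aligned positions as similarity, and the values chosen for `preference`, `damping` and `max_iter` are all assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def affinity_propagation(S, preference, damping=0.5, max_iter=200):
    """Plain AP message passing; returns the exemplar index assigned to each sample."""
    n = len(S)
    S = [row[:] for row in S]
    for k in range(n):
        S[k][k] = preference  # the input preference sits on the diagonal
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (runner_up if k == best else scores[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k});
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try another preference or more iterations")
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Toy alignment (hypothetical data): two well-separated groups of three sequences.
SEQS = ["AAAAAAAAAAAA", "AAAAAAAAAAAC", "AAAAAAAAAACC",
        "GGGGGGGGGGGG", "GGGGGGGGGGGT", "GGGGGGGGGGTT"]
SIM = [[identity(a, b) for b in SEQS] for a in SEQS]
LABELS = affinity_propagation(SIM, preference=0.5)
print(LABELS)
```

Each entry of `LABELS` is the index of the exemplar chosen for that sequence, so sequences sharing a value belong to the same AP cluster. Lowering `preference` tends to merge clusters and raising it tends to split them, which is why the number of clusters has to be read off across a range of preference values rather than from a single run.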
Year: 2018 PMID: 29357361 PMCID: PMC5794188 DOI: 10.1371/journal.pntd.0006182
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Details of the 46 RABV isolates sequenced in this study.
| GenBank | Lab ID | Country | Year of isolation | Species | Taxonomic | Genome length (nt) |
|---|---|---|---|---|---|---|
| LT909545 | 20282 | Afghanistan | 2006 | dog | | 11,929 |
| LT909542 | 13125 | Algeria | 1984 | dog | | 11,931 |
| LT909535 | 13123 | Algeria | 1989 | dog | | 11,928 |
| LT909530 | 13251 | Chile | 1979 | human | | 11,931 |
| LT909534 | 13465 | Kenya | 1997 | jackal | n.d. | 11,923 |
| LT909536 | 13471 | Kenya | 2001 | dog | | 11,923 |
| LT909546 | 13135 | Nigeria | 1988 | cat | | 11,927 |
| LT909548 | 13138 | Nigeria | 1989 | dog | | 11,923 |
| LT909541 | 13086 | Pakistan | 1984 | dog | | 11,928 |
| LT909531 | 13177 | Sudan | 1993 | dog | | 11,930 |
| LT909528 | 20520 | Tanzania | 2009 | jackal | n.d. | 11,923 |
| LT909551 | 13473 | Ethiopia | 1992 | dog | | 11,927 |
| LT909547 | 13284 | Germany | 1990 | fox | | 11,923 |
| LT909537 | 12951 | Estonia | 2000 | fox | | 11,923 |
| LT909543 | 13249 | Chile | 1973 | human | | 11,925 |
| LT909538 | 12989 | Finland | 1990 | fox | | 11,923 |
| LT909539 | 13182 | India | 2002 | dog | | 11,929 |
| LT909527 | 13102 | Indonesia | 1988 | dog | | 11,930 |
| LT909526 | 13162 | Iran | 1991 | fox | | 11,924 |
| LT909550 | 13020 | Norway | 2000 | fox | | 11,927 |
| LT909532 | 12929 | Poland | 1994 | fox | | 11,924 |
| LT909533 | 13044 | Saudi Arabia | 1990 | fox | | 11,924 |
| LT909540 | 13043 | Saudi Arabia | 1987 | fox | | 11,924 |
| LT909529 | 13122 | Algeria | 1984 | dog | | 11,928 |
| LT909544 | 13212 | Mexico | 2002 | dog | | 11,925 |
| LT909549 | 34873 | Thailand | 1988 | unknown | | 11,930 |
| MG458304 | RV50 | United States | 1975 | bat | n.d. | 11,922 |
| MG458305 | RV108 | Chile | unknown | bat | | 11,923 |
| MG458306 | RV860 | Czech Republic | unknown | fox | | 11,924 |
| MG458307 | RV995 | South Africa | 2000 | cat | | 11,922 |
| MG458308 | RV1009 | South Africa | 2000 | mongoose | n.d. | 11,926 |
| KY860584 | RV1124 | Turkey | 1999 | dog | | 11,923 |
| MG458309 | RV1185 | Serbia | 1978 | dog | | 11,923 |
| MG458310 | RV1189 | Serbia | 1986 | fox | | 11,923 |
| MG458311 | RV1196 | Serbia | 1998 | fox | | 11,923 |
| MG458312 | RV1219 | Serbia | 1997 | fox | | 11,923 |
| MG458313 | RV1336 | Russia | 1996 | dog | | 11,926 |
| MG458314 | RV1789 | British West Indies | 1997 | bat | | 11,922 |
| MG458315 | RV2321 | Egypt | 1998 | dog | | 11,923 |
| MG458316 | RV2322 | Egypt | 1998 | dog | | 11,923 |
| MG458317 | RV2323 | Egypt | 1999 | dog | | 11,923 |
| MG458318 | RV2481 | South Africa | 2008 | human | | 11,918 |
| MG458319 | RV2854 | Grenada | 2011 | mongoose | | 11,925 |
| MG458320 | RV2924 | Nepal | 2012 | human | | 11,927 |
| KP723638 | RV2985 | Ethiopia | 2014 | wolf | | 11,926 |
| MG458321 | ChDg | China | unknown | dog | | 11,924 |
n.d. = not determined. The exact species could not be determined because at least two different species of jackals, mongooses, and bats occur in the respective countries.
Summary of studies analyzing global RABV sequence diversity.
| Target sequence | Number of sequences analyzed | Aim of study | Focus of study | Cluster designation | References |
|---|---|---|---|---|---|
| N-gene (220 nt) | 61 | Epidemiologic and historical evaluations of relationships among RABV isolates | Global analysis of RABV | Numeric and geographic combinations | [ |
| N-gene | 54 | Molecular and phylogenetic analyses to evaluate the intrinsic variability and the evolutionary pattern of RABV N-genes | Global analyses of RABV | Combination of artificial names and numbers (e.g. Vaccine 1) and geographic/numerical combinations (e.g. Africa1b) | [ |
| N-gene | 80 | Better understanding of the selection pressures acting on RABV | Global analyses of RABV (focus on selection pressures) | Host associated with geographical origin (e.g. Skunk (Canada)) | [ |
| G-L region | 65 | Determine the population | Global analyses (bat RABV as outgroup) focusing on African isolates | Combination of host and geographical origin (e.g. USA skunks, African canids) or regional names (e.g. Middle East) | [ |
| N-gene | 151 | Stochastic processes of genetic drift and | Global analyses of RABV dog related isolates | Geographical names (numerical) (e.g. Asian, Africa-3) | [ |
| N-gene | 228 | Provide molecular and virologic evidence that domestic dog rabies is no longer enzootic to the United States | Global analyses of RABV dog related isolates | Geographical, numerical and host combinations | [ |
| N-genes of full genomes | 22 | Elucidate the origin of new RABV isolates circulating in Sri Lanka | Global analyses with study focus on Sri Lanka | (e.g. America, India) | [ |
| N-gene | 80 | Molecular epidemiological study of the Arctic/Arctic-like lineage of RABV to date | Global analyses without bats; study focus on Arctic regions | Geographical names (lineages & groups), numerical (cluster & subcluster) | [ |
| G-genes | 172 | Investigation of RABV host shifts in the Flagstaff area via large-scale genetic analyses | Global analyses, specific host cluster analyses | Host acronyms or geographical names or acronyms | [ |
| Full genomes (and extracted N-genes) | 53 | Integration of new South Korean isolates into the global RABV distribution | Global analyses, further detailed for Asian isolates | Geographical names | [ |
| Full genomes | 32 | Comparison of molecular differences between an isolate from China and one from Mexico; integration of both into the global phylogeny | Global analyses, focusing on Asian isolates | Geographical names (numerical: e.g. SEA1) | [ |
| Full genomes | 36 | Evolutionary analyses of RABV; quantify the currently circulating animal rabies occurrence in Laos and complete the molecular characterization of the viruses | Global analyses focusing on Asian isolates | Geographical names (numerical: e.g. China 1) | [ |
| Full genomes | 321 | Large genome wide evolutionary investigation, aim is to identify those evolutionary patterns and processes associated | Global analyses | Geographical names (numerical) (e.g. Asian, Africa-3) | [ |
| Full genomes | 562 | Application of AP clustering, a novel mathematical tool for transparent cluster allocation | Global analyses | Geographical names (e.g. Cosmopolitan, New World, Asian, Arctic) | This study |
Fig 1. Graphical display of AP clusters over the range of input parameters for an extended data set (562 sequences, A) and a reduced data set (392 sequences, B). The optimal input preference for intraspecific analyses, i.e. the optimal number of clusters, was defined as the largest plateau (here four AP clusters). The two-cluster plateau (shaded gray) was excluded because, methodologically, its lower bound cannot be defined with certainty. In A, the increasing length of the five-cluster plateau suggests the existence of an additional AP cluster that is not yet supported by sufficient data.
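The plateau criterion in this caption can be sketched in a few lines. This is a hedged illustration only: it assumes the number of AP clusters has already been recorded for a series of evenly spaced input preference values, and the function name `largest_plateau` and the example counts are hypothetical rather than taken from the study.

```python
from itertools import groupby


def largest_plateau(cluster_counts, exclude=(2,)):
    """Return (number_of_clusters, plateau_length) for the longest run of equal counts.

    cluster_counts: number of AP clusters per preference value, ordered by
    increasing preference on an evenly spaced grid.
    exclude: cluster numbers to ignore; the two-cluster plateau is skipped
    because its lower bound cannot be determined.
    """
    best_k, best_len = None, 0
    for k, run in groupby(cluster_counts):
        length = sum(1 for _ in run)
        if k not in exclude and length > best_len:
            best_k, best_len = k, length
    return best_k, best_len


# Hypothetical scan: cluster numbers observed while raising the input preference.
COUNTS = [2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 7, 9]
print(largest_plateau(COUNTS))
```

With these made-up counts the two-cluster run is the longest overall, but it is excluded, so the four-cluster plateau of length five is selected. If two plateaus are equally long, this sketch keeps the one at the lower preference, which is an arbitrary choice.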
Fig 2. Global distribution of all 562 RABV full-genome sequences according to the results of AP clustering.
The width of each pie chart represents the total number of sequences from a specific country. Forty-six newly generated sequences from previously underrepresented areas in the Near East, Europe, South America and Africa were included in this study. The allocation to the AP clusters, i.e. New World cluster (blue), Arctic/Arctic-like (grey), Cosmopolitan (red), and Asian (green), is indicated. The nomenclature of AP clusters was based on previously assigned names. Samples from the previously described Indian subcontinent are highlighted with a red circle (Cosmopolitan sequences) and a green circle (Asian sequences).
Fig 3. Condensed phylogenetic neighbor-joining tree of 562 full-genome RABV sequences based on the Tamura-3-parameter evolution model as implemented in MEGA6.
The tree was compressed so that each condensed branch contains only sequences of one defined AP cluster. The allocation to the AP clusters, i.e. New World cluster (blue), Arctic/Arctic-like (grey), Cosmopolitan (red), and Asian (green), is indicated. Highlighted branches contain sequences from the Africa-2 lineage (*) and the Indian subcontinent (**).
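For readers unfamiliar with the Tamura-3-parameter model named in this caption, the sketch below computes its pairwise distance from the observed proportions of transitions (P) and transversions (Q) and the G+C content. It is an illustration, not a reproduction of the MEGA implementation: the function name `tamura3_distance` is hypothetical, sites with gaps or ambiguous bases are simply skipped, rates are assumed uniform across sites, and the G+C term uses h = θ1 + θ2 − 2θ1θ2, which reduces to the original 2θ(1 − θ) when both sequences have the same G+C content θ.

```python
import math

PURINES = {"A", "G"}


def tamura3_distance(seq1, seq2):
    """Tamura 3-parameter distance between two aligned nucleotide sequences."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]  # skip gaps and ambiguity codes
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    # A difference within purines or within pyrimidines is a transition,
    # a difference between the two classes is a transversion.
    p = sum(a != b and (a in PURINES) == (b in PURINES) for a, b in pairs) / n
    q = sum(a != b and (a in PURINES) != (b in PURINES) for a, b in pairs) / n
    gc1 = sum(a in "GC" for a, _ in pairs) / n
    gc2 = sum(b in "GC" for _, b in pairs) / n
    h = gc1 + gc2 - 2 * gc1 * gc2
    if h == 0:
        raise ValueError("G+C content of 0 or 1 in both sequences; distance undefined")
    term1, term2 = 1 - p / h - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -h * math.log(term1) - 0.5 * (1 - h) * math.log(term2)


# One transition (A -> G) in eight sites: the corrected distance exceeds p = 0.125.
print(tamura3_distance("ACGTACGT", "GCGTACGT"))
```

A matrix of such pairwise distances is the usual input for neighbor-joining tree construction.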
Similarities (in %) of the four cluster exemplars to the Asian and Cosmopolitan clusters and, additionally, similarities of sequences from the Indian subcontinent (N = 6) to both the Asian and Cosmopolitan cluster exemplars.
| Exemplars and individual sequences | Exemplar Cosmopolitan | Exemplar Asian | Exemplar New World | Exemplar Arctic |
|---|---|---|---|---|
| Exemplar Cosmopolitan | 100% | 85.22% | 82.80% | 87.74% |
| Exemplar Asian | 85.22% | 100% | 83.40% | 84.50% |
| Exemplar Arctic | 87.74% | 84.50% | 82.37% | 100% |
| Exemplar New World | 82.80% | 83.40% | 100% | 82.37% |
| KX148108_Nepal_2011 | 85.15% | 84.86% | 83.16% | 84.86% |
| KX148245_Nepal_2009 | 85.02% | 84.75% | 83.08% | 84.78% |
| KX148246_India_1997 | 84.97% | 85.21% | 83.15% | 84.37% |
| AB569299_Sri Lanka_human_2008 | 84.67% | 85.10% | 83.06% | 84.36% |
| AB635373_Sri Lanka_cat_2009 | 84.53% | 84.88% | 82.95% | 84.19% |
| KF154999_United Kingdom_dog_2008 | 84.80% | 85.09% | 83.18% | 84.44% |
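The values in this table show how narrow the gap between the two candidate clusters is for the six Indian subcontinent sequences. The short sketch below compares each sequence's similarity to the Cosmopolitan and Asian exemplars. It assumes that the first two value columns refer to the Cosmopolitan and Asian exemplars, as the 100% self-similarity entries of the exemplar rows suggest. The helper `closest_exemplar` is hypothetical, and the output is a plain comparison of table values, not the cluster allocation reported by the study.

```python
# Similarities (%) copied from the table; column order is an assumption (see lead-in).
SIMILARITY = {
    "KX148108_Nepal_2011": {"Cosmopolitan": 85.15, "Asian": 84.86},
    "KX148245_Nepal_2009": {"Cosmopolitan": 85.02, "Asian": 84.75},
    "KX148246_India_1997": {"Cosmopolitan": 84.97, "Asian": 85.21},
    "AB569299_Sri Lanka_human_2008": {"Cosmopolitan": 84.67, "Asian": 85.10},
    "AB635373_Sri Lanka_cat_2009": {"Cosmopolitan": 84.53, "Asian": 84.88},
    "KF154999_United Kingdom_dog_2008": {"Cosmopolitan": 84.80, "Asian": 85.09},
}


def closest_exemplar(sims):
    """Return the most similar exemplar and its margin over the runner-up (in % points)."""
    best = max(sims, key=sims.get)
    runner_up = max(v for k, v in sims.items() if k != best)
    return best, round(sims[best] - runner_up, 2)


CLOSEST = {name: closest_exemplar(sims) for name, sims in SIMILARITY.items()}
for name, (exemplar, margin) in CLOSEST.items():
    print(f"{name}: closest to {exemplar} exemplar (margin {margin} percentage points)")
```

Every margin is below half a percentage point, so small changes in the data set or the similarity measure could plausibly shift these sequences between the two clusters.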
Fig 4. Graphical display of AP clusters over the range of input parameters for G-gene sequences (extracted from full-genome sequences, S1 Table) and additional sequences from Nepal (N = 9) and Sri Lanka (N = 49) (S2 Table).
G-gene analysis supported the existence of a fifth AP cluster as well as an additionally increased adjacent plateau. As the length of the two-cluster plateau cannot be defined, it is shaded in gray.
Fig 5. A) Phylogenetic neighbor-joining tree of 21 full-genome lyssavirus sequences (only lyssavirus species with at least three divergent, complete full-genome sequences available were used) based on the Tamura-3-parameter evolution model as implemented in MEGA6. Bootstrap values (500 iterations) are indicated next to the branches. B) Graphical display of AP clusters over the range of input parameters for the 21 sequences. The largest plateau comprised seven AP clusters (highlighted). As the length of the two-cluster plateau cannot be defined, it is shaded in gray.