Eric Garcia, Daniel Wright, Remy Gatins, May B Roberts, Hudson T Pinheiro, Eva Salas, Jei-Ying Chen, Jacob R Winnikoff, Giacomo Bernardi.
Abstract
Haplotype networks are a common way of illustrating phylogeographic results. While these networks help to visualize relationships between individuals, populations, and species, evolutionary studies often quantitatively analyze only the genetic diversity among haplotypes and ignore other network properties. Here, we present a new metric, haplotype network branch diversity (HBd), as an easy way to quantitatively compare haplotype network complexity. Our metric builds on the logic of combining genetic and topological diversity to estimate complexity, previously used by the published metric haplotype network diversity (HNd). HNd, however, uses a combination of network features to produce complexity values that cannot be defined in probabilistic terms, which obscures the values' implications for a sampled population. HBd instead uses frequencies of haplotype classes to incorporate the topological information of networks, keeping the focus on the population and providing easy-to-interpret probabilistic values for randomly sampled individuals. The goal of this study is to introduce this more intuitive metric and provide an R script that allows researchers to calculate diversity and complexity indices from haplotype networks. A group of datasets, generated manually (model dataset) and based on published data (empirical dataset), was used to illustrate the behavior of HBd and both of its terms: haplotype diversity (Hd) and a new index called branch diversity (Bd). Results followed a predicted trend in both model and empirical datasets, from low metric values in simple networks to high values in complex networks. In short, the new combined metric joins the genetic and topological diversity of haplotype networks into a single complexity value.
Based on our analysis, we recommend the use of HBd, as it makes direct comparisons of network complexity straightforward and provides probabilistic values that can readily discriminate situations that are difficult to resolve with available metrics.
Year: 2021 PMID: 34191803 PMCID: PMC8244886 DOI: 10.1371/journal.pone.0251878
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Haplotype networks and diversity indices for manually generated sequence datasets.
Values for haplotype diversity (Hd), branch diversity (Bd), and haplotype network branch diversity (HBd) are given for each haplotype network. Colors represent individuals, since each individual was intentionally set to represent a distinct population. For each network, haplotype classes (Hc) are given in parentheses as pairs of numbers, with the class's number of branches (nbHc) to the left of the colon and its number of individuals (niHc) to the right. For example, the haplotype network in Panel A is made up of two haplotype classes, so (1:15, 5:6) means that there are 15 individuals in the 1-branch haplotype class and 6 individuals in the 5-branch haplotype class. This breakdown indicates the number of haplotype classes and the evenness of their frequencies, components that directly affect Bd and, subsequently, HBd. All panels show datasets comprising 21 individuals and ranging from 6 to 21 haplotypes. Top panels (A, B, C, and D) illustrate four simple network configurations with six haplotypes that hold Hd constant but can be differentiated by Bd and HBd. Middle panels (E, F, G, and H) show variation in network configuration that holds Bd constant while Hd increases from left to right. Bottom panels show more complex haplotype networks with 9, 17, 21, and 21 haplotypes in panels I, J, K, and L, respectively, where Bd provides a larger margin for comparison than Hd, particularly between panels with equal Hd values (H and L). Additional dataset information for each panel is given in Table 1. Sequence and site files for all panels can be found in S1–S13 Files.
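The haplotype-class notation in this caption can be turned into a number with a short calculation. The Python sketch below is illustrative only and is not the authors' R script: the function names `parse_classes` and `class_diversity` are hypothetical, and the formula is an assumption, namely that Bd has the same form as a sample-size-corrected haplotype diversity, n/(n-1) × (1 − Σp²), applied to haplotype-class frequencies. The text here does not state the formula, but under this assumption the Panel A breakdown (1:15, 5:6) gives a value that rounds to the Bd of 0.43 listed for that panel in Table 1.

```python
from typing import Dict


def parse_classes(notation: str) -> Dict[int, int]:
    """Parse a caption-style string such as "(1:15, 5:6)".

    Returns a mapping from branches per haplotype (nbHc) to the number of
    individuals in that class (niHc). Hypothetical helper for illustration.
    """
    counts: Dict[int, int] = {}
    for pair in notation.strip("() ").split(","):
        branches, individuals = pair.split(":")
        counts[int(branches)] = int(individuals)
    return counts


def class_diversity(class_counts: Dict[int, int]) -> float:
    """Sample-size-corrected diversity over haplotype classes.

    ASSUMPTION: Bd = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the
    fraction of individuals in haplotype class i. This form is not quoted
    from the source; it is chosen because it matches the Panel A value.
    """
    n = sum(class_counts.values())
    if n < 2:
        raise ValueError("at least two individuals are required")
    sum_sq = sum((count / n) ** 2 for count in class_counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Panel A: 15 individuals in the 1-branch class, 6 in the 5-branch class.
panel_a = parse_classes("(1:15, 5:6)")
bd_panel_a = class_diversity(panel_a)
print(round(bd_panel_a, 2))  # 0.43
```

Read probabilistically, the uncorrected part of this quantity, 1 − Σp², is the chance that two individuals drawn at random with replacement belong to different haplotype classes, and the n/(n-1) factor is the usual correction for sampling without replacement.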
Fig 2. Haplotype networks and diversity indices for published sequence datasets.
Colors represent sampled populations. All datasets (described in the Methods section) are based on CO1 or d-loop sequences of fish species shown in italics and were chosen to represent real scenarios with different levels of network complexity (A and B [26]; C [25]; and D [24]). Values for haplotype diversity (Hd), branch diversity (Bd), and haplotype network branch diversity (HBd) are given for each haplotype network. As in Fig 1, haplotype classes (Hc) are given in parentheses as pairs of numbers, with the class's number of branches (nbHc) to the left of the colon and its number of individuals (niHc) to the right. Additional values referring to these networks are given in Table 1. Sequence and site files for all panels can be found in S14–S21 Files.
Table 1. Diversity indices for model (Fig 1) and published datasets (Fig 2).
| Panel | Dataset | n | nH | nHc | Hd | Bd | HBd |
|---|---|---|---|---|---|---|---|
| Fig 1A | testing_A | 21 | 6 | 2 | 0.86 | 0.43 | 0.35 |
| Fig 1B | testing_B | 21 | 6 | 3 | 0.86 | 0.47 | 0.39 |
| Fig 1C | testing_C | 21 | 6 | 2 | 0.86 | 0.51 | 0.42 |
| Fig 1D | testing_D | 21 | 6 | 3 | 0.86 | 0.6 | 0.49 |
| Fig 1E | testing_E | 21 | 6 | 3 | 0.86 | 0.47 | 0.39 |
| Fig 1F | testing_F | 21 | 8 | 3 | 0.87 | 0.47 | 0.39 |
| Fig 1G | testing_G | 21 | 11 | 3 | 0.91 | 0.47 | 0.41 |
| Fig 1H | testing_H | 21 | 14 | 3 | 0.97 | 0.47 | 0.43 |
| Fig 1I | testing_I | 21 | 9 | 2 | 0.63 | 0.5 | 0.3 |
| Fig 1J | testing_J | 21 | 17 | 3 | 0.95 | 0.61 | 0.55 |
| Fig 1K | testing_K | 21 | 21 | 4 | 1 | 0.53 | 0.51 |
| Fig 1L | testing_L | 21 | 21 | 4 | 1 | 0.71 | 0.67 |
| Fig 2A | | 16 | 4 | 2 | 0.44 | 0.23 | 0.1 |
| Fig 2B | | 17 | 8 | 3 | 0.87 | 0.63 | 0.52 |
| Fig 2C | | 145 | 48 | 6 | 0.91 | 0.81 | 0.73 |
| Fig 2D | | 185 | 152 | 10 | 0.99 | 0.67 | 0.66 |
Columns from left to right: Panel and names of datasets corresponding to Figs 1 and 2, number of individuals (n), number of haplotypes (nH), number of haplotype classes (nHc), haplotype diversity (Hd), branch diversity (Bd), and haplotype network branch diversity (HBd).