Yukako Katsura, Masatoshi Nei.
Abstract
We previously introduced a numerical quantity called the stability (Ps) of an inferred tree and showed that, for the tree to be reliable, both this stability and the reliability of the tree, which is usually computed as the bootstrap probability (Pb), must be high. However, if genome duplication occurs in a species, the gene families in the genome are also duplicated, and for this reason alone some Ps values can be high in a tree of the duplicated gene families. In addition, the topology of a duplicated gene family can be similar to that of the original gene family if such gene families are identifiable. After genome duplication, however, gene families are often partially deleted or partially duplicated, and the duplicated gene family may not show the same topology as the original family. It is therefore necessary to compute the similarity of the topologies of the duplicated and the original gene families. In this paper, we introduce another quantity, called the reproducibility (Pr), for measuring the similarity of the two gene families. To show how to compute the Pr values, we first compute the Pb and Ps values for each of the MHC class II α and β chain gene families, which were apparently generated by genome duplication. We then compute the Pr values for the MHC class II α and β chain gene families. The Pr values for the α and β chain gene families are now low, which suggests that diploidization of gene segregation has occurred after the genome duplication. At present, higher animals, defined as animals with complex phenotypic characters, generally have a larger genome size, and this increase in genome size appears to have been caused by genome duplication and the subsequent diploidization of gene segregation.
Keywords: MHC class II α and β chain gene families; computer program RESTA; diploidization; gene segregation; genome duplication; phylogenetic trees; reliability; reproducibility; stability
Year: 2020 PMID: 31950994 PMCID: PMC7012300 DOI: 10.1093/gbe/evz272
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—An example of computation of the Ps and Pr values. We assume that the original inferred tree of a gene family is composed of DNA or protein sequences (1), (2), and (3), and that its topology is ((A,B), C), called T1 (Topology 1). The tree of the duplicated gene family is composed of sequences (4), (5), and (6), and its topology is ((B,C), A), called T2 (Topology 2). T1 represents three possible subtree topologies, which are expressed as ((A,B), C), ((B,C), A), and ((A,C), B), and one of the close outgroups is used. The number of bootstrap replications used for each subtree topology is assumed to be 1,000. The probability of obtaining a subtree topology is shown as a percentage for each of the three subtree topologies (85%, 10%, and 5%). The Ps value is the bootstrap value for the subtree topology showing the highest bootstrap probability among the three (85%), and in the present case we have assumed that this is the subtree topology ((A,B), C). In this study, we compute the reproducibility of topologies between the original and the duplicated gene families. The reproducibility value (Pr) is the average of the bootstrap probability (Pr1) showing T2 among the replicates of the subtree of T1 and the bootstrap probability (Pr2) showing T1 among the replicates of the original tree of T2. In this figure, Pr1 is 10%; Pr2 is not shown.
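The computation described in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RESTA program: the function names `stability`, `support_for`, and `reproducibility` are hypothetical, the replicate list is built directly from the 85%, 10%, and 5% figures in the caption, and the Pr2 value of 20% is a placeholder because the figure does not give Pr2.

```python
from collections import Counter

def stability(replicate_topologies):
    """Ps: percentage of replicates showing the most frequent subtree topology."""
    counts = Counter(replicate_topologies)
    topology, n = counts.most_common(1)[0]
    return topology, 100.0 * n / len(replicate_topologies)

def support_for(replicate_topologies, target):
    """Percentage of replicates showing the target topology."""
    hits = sum(1 for t in replicate_topologies if t == target)
    return 100.0 * hits / len(replicate_topologies)

def reproducibility(pr1, pr2):
    """Pr: average of Pr1 and Pr2, as defined in the caption."""
    return (pr1 + pr2) / 2.0

T1 = "((A,B),C)"  # topology of the original gene family
T2 = "((B,C),A)"  # topology of the duplicated gene family
T3 = "((A,C),B)"  # the remaining subtree topology

# 1,000 bootstrap replicates of the original family, split 85% / 10% / 5%.
replicates_original = [T1] * 850 + [T2] * 100 + [T3] * 50

best_topology, ps = stability(replicates_original)
pr1 = support_for(replicates_original, T2)  # replicates of T1's subtree showing T2

# Pr2 would come from the replicates of the duplicated family; 20.0 is a
# hypothetical placeholder, not a value from the paper.
pr_example = reproducibility(pr1, 20.0)
```

With these inputs, `ps` is the 85% stability of ((A,B), C) and `pr1` is the 10% quoted in the caption; `pr_example` only shows how the averaging step would work once a real Pr2 is available.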
Fig. 2.—Pb and Ps values of the phylogenetic tree for 27 MHC class II DPA, DQA, and DRA genes in 9 primates. The phylogenetic tree was obtained by the NJp method (Saitou and Nei 1987; Yoshida and Nei 2016). The number of nucleotides used was 939 bp per sequence. The Pb value is shown for each interior branch, and the number of bootstrap replications used was 1,000. The Ps value was also obtained from 1,000 replications for each gene family using the RESTA program (Katsura et al. 2017) and is shown as a bold and italic number below the Pb value for each relevant interior branch. The aligned sequences are in Supplementary Material online.
Fig. 3.—Pb and Ps values of the phylogenetic tree for 27 MHC class II DPB, DQB, and DRB genes in 9 primates. The Ps value is shown as a bold and italic number below the Pb value for each relevant interior branch. The number of nucleotides used was 927 bp per sequence.
The Pr1, Pr2, and Pr Values for the Phylogenetic Tree
(A) The Pr1, Pr2, and Pr values (%) between the α and β chain genes for the tree of DP, DQ, and DR genes

| Gene pair | Pr1 | Pr2 | Pr |
|---|---|---|---|
| DPA–DPB | 0.1 | 1.3 | 0.7 |
| DQA–DQB | 8.0 | 66.8 | 37.4 |
| DRA–DRB | 0.5 | 0.0 | 0.3 |

| Gene pair | Pr1 | Pr2 | Pr |
|---|---|---|---|
| DPA–DQA | 0.5 | 13.0 | 6.8 |
| DPA–DQB | 12.7 | 0.0 | 6.4 |
| DPB–DQA | 3.4 | 2.0 | 2.7 |
| DPB–DQB | 0.0 | 0.0 | 0.0 |
| DQA–DRA | 10.7 | 1.7 | 6.2 |
| DQA–DRB | 42.7 | 1.4 | 22.1 |
| DQB–DRA | 11.2 | 6.7 | 9.0 |
| DQB–DRB | 2.0 | 1.4 | 1.7 |
| DRA–DPA | 1.1 | 1.6 | 1.4 |
| DRA–DPB | 0.0 | 15.7 | 7.9 |
| DRB–DPA | 0.1 | 0.0 | 0.1 |
| DRB–DPB | 5.7 | 0.0 | 2.9 |
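
Because Pr is defined as the average of Pr1 and Pr2, the Pr column can be recomputed from the other two columns as a consistency check. The sketch below does this for every row of the table. It assumes that the published values were rounded half up to one decimal place, which the source does not state, and the helper name `mean_pr` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def mean_pr(pr1: str, pr2: str) -> Decimal:
    """Average Pr1 and Pr2 and round half up to one decimal place (assumed rule)."""
    average = (Decimal(pr1) + Decimal(pr2)) / 2
    return average.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# (Pr1, Pr2, reported Pr) in percent, copied from the table as strings
# so that decimal arithmetic stays exact.
rows = {
    "DPA–DPB": ("0.1", "1.3", "0.7"),
    "DQA–DQB": ("8.0", "66.8", "37.4"),
    "DRA–DRB": ("0.5", "0.0", "0.3"),
    "DPA–DQA": ("0.5", "13.0", "6.8"),
    "DPA–DQB": ("12.7", "0.0", "6.4"),
    "DPB–DQA": ("3.4", "2.0", "2.7"),
    "DPB–DQB": ("0.0", "0.0", "0.0"),
    "DQA–DRA": ("10.7", "1.7", "6.2"),
    "DQA–DRB": ("42.7", "1.4", "22.1"),
    "DQB–DRA": ("11.2", "6.7", "9.0"),
    "DQB–DRB": ("2.0", "1.4", "1.7"),
    "DRA–DPA": ("1.1", "1.6", "1.4"),
    "DRA–DPB": ("0.0", "15.7", "7.9"),
    "DRB–DPA": ("0.1", "0.0", "0.1"),
    "DRB–DPB": ("5.7", "0.0", "2.9"),
}

# Collect any gene pair whose reported Pr differs from the recomputed average.
mismatches = [
    pair
    for pair, (pr1, pr2, reported) in rows.items()
    if mean_pr(pr1, pr2) != Decimal(reported)
]
```

Under the half-up assumption, `mismatches` stays empty, so every reported Pr agrees with the average of its Pr1 and Pr2. `Decimal` is used instead of floats so that borderline cases such as 0.25 or 22.05 round as a reader would expect.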