Vassily Lyubetsky, Roman Gershgorin, Alexander Seliverstov, Konstantin Gorbunov.
Abstract
BACKGROUND: One of the main aims of phylogenomics is to reconstruct, along the whole phylogenetic tree, objects that are defined at its leaves so as to minimize a specified functional; the task may also include generating the phylogenetic tree itself. Such objects can include nucleotide and amino acid sequences, chromosomal structures, etc. The structures can have any set of linear and circular chromosomes and variable gene composition, can include any number of paralogs, and the individual evolutionary operations that transform a chromosome structure can carry any weights. Many heuristic algorithms have been proposed for this purpose but, to our knowledge, only a few of them are exact algorithms with low (linear, cubic or similar) polynomial computational complexity. The algorithms naturally start by calculating both the distance between two structures and the shortest sequence of operations transforming one structure into another. Such a calculation is per se an NP-hard problem.
Year: 2016 PMID: 26780836 PMCID: PMC4717669 DOI: 10.1186/s12859-016-0878-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Two chromosome structures
Fig. 2 Sequence of structures and operations generated by the algorithm for the example shown in Fig. 1
Fig. 3 Shortest sequence of structures and operations in the case of circular weights for the example shown in Fig. 1
Mitochondrial chromosome structures in the class Aconoidasida
| Subclass | Species | Locus in GenBank | Type | Composition |
|---|---|---|---|---|
| Haemosporida | | FJ168564.1 | C | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7 ss4 |
| | | FJ168563.1 | C | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7 |
| | | NC_009336.1 | L | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 |
| | | NC_015303.1 | L | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 |
| | | NC_002375.1 | L | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 |
| | | NC_009961.2 | L | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 |
| | | AY722799.1 | C | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb |
| | | NC_008288.1 | L | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 |
| | | NC_008279.1 | L | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 |
| | | NC_007232.1 | C | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb |
| | | NC_009960.2 | L | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 |
| | | NC_002235.1 | L | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 |
| | | NC_012426.1 | C | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 |
| | | NC_007233.1 | C | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 cox1 cytb ls8 ss5 ss1 |
| | | NC_007243.1 | C | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb |
| Piroplasmida | | NC_009902.1 | L | cox1 *cox3 ls1 *ls2 *ls3 *cytb *ls4 ls5 |
| | | NC_011005.1 | L | cox1 *cox3 ls1 *ls3 *cytb *ls5 ls4 |
| | | CR940346.1 | L | cox1 *cox3 ls1 *ls3 *ls2 *cytb *ls5 ls4 |
Circular and linear chromosomes are marked by C and L, respectively. Throughout the gene lists, an asterisk indicates the complementary strand relative to that specified in GenBank. The rightmost column shows the gene order using standard gene names.
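The notation in the Composition column can be read mechanically. The following is a minimal sketch, not the authors' code: the function names (`parse_composition`, `reverse_complement`, `same_circular`) are hypothetical, and the only assumptions are the ones stated in the table note (an asterisk marks the complementary strand) plus the general fact that a circular gene order has no distinguished start and can be read from either strand. It checks that the circular entries FJ168563.1 and AY722799.1 describe the same gene order up to rotation.

```python
from typing import List, Tuple

# (gene name, strand): +1 as specified in GenBank, -1 for the complementary strand
Gene = Tuple[str, int]

def parse_composition(text: str) -> List[Gene]:
    """Parse a gene-order string such as 'ls4 *cox3 ls8' into signed genes."""
    genes: List[Gene] = []
    for token in text.split():
        if token.startswith("*"):
            genes.append((token[1:], -1))
        else:
            genes.append((token, +1))
    return genes

def reverse_complement(genes: List[Gene]) -> List[Gene]:
    """Read the same chromosome from the opposite strand."""
    return [(name, -strand) for name, strand in reversed(genes)]

def same_circular(a: List[Gene], b: List[Gene]) -> bool:
    """True if two circular gene orders differ only by rotation or reading strand."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    for candidate in (b, reverse_complement(b)):
        doubled = candidate + candidate  # every rotation is a window of this list
        if any(doubled[i:i + len(a)] == a for i in range(len(a))):
            return True
    return False

fj168563 = parse_composition(
    "ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7")
ay722799 = parse_composition(
    "ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb")
nc_007233 = parse_composition(
    "ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 cox1 cytb ls8 ss5 ss1")

print(same_circular(fj168563, ay722799))   # True
print(same_circular(fj168563, nc_007233))  # False
```

This comparison applies only to chromosomes marked C; for chromosomes marked L the positions of the ends matter, so rotations are not equivalent.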
Analyzed 66 species with rhodophytic plastids
| Locus in GenBank | Species | #prot | #clust | #sing |
|---|---|---|---|---|
| NC_024079.1 | | 134 | 129 | 0 |
| NC_024080.1 | | 145 | 138 | 1 |
| NC_012898.1 | | 105 | 105 | 0 |
| NC_012903.1 | | 110 | 110 | 0 |
| NC_011395.1 | | 32 | 22 | 7 |
| NC_021075.1 | | 201 | 200 | 1 |
| NC_025313.1 | | 132 | 130 | 0 |
| NC_025310.1 | | 131 | 128 | 0 |
| NC_020795.1 | | 204 | 204 | 0 |
| NC_026522.1 | | 71 | 71 | 0 |
| NC_014340.2 | | 78 | 51 | 24 |
| NC_014345.1 | | 81 | 69 | 5 |
| NC_024081.1 | | 139 | 130 | 0 |
| NC_013703.1 | | 82 | 79 | 3 |
| NC_004799.1 | | 207 | 189 | 18 |
| NC_001840.1 | | 197 | 186 | 11 |
| NC_024082.1 | | 161 | 141 | 13 |
| NC_024083.1 | | 130 | 128 | 0 |
| NC_014287.1 | | 129 | 127 | 0 |
| NC_013498.1 | | 148 | 143 | 1 |
| NC_004823.1 | | 28 | 21 | 7 |
| NC_007288.1 | | 119 | 112 | 7 |
| NC_024928.1 | | 160 | 136 | 2 |
| NC_015403.1 | | 135 | 130 | 1 |
| NC_016735.1 | | 139 | 139 | 0 |
| NC_024665.1 | | 182 | 181 | 1 |
| NC_023785.1 | | 202 | 200 | 2 |
| NC_006137.1 | | 203 | 201 | 2 |
| NC_021618.1 | | 233 | 201 | 32 |
| NC_000926.1 | | 147 | 142 | 5 |
| NC_010772.1 | | 156 | 139 | 3 |
| NC_014267.1 | | 139 | 132 | 6 |
| NC_027093.1 | | 62 | 52 | 7 |
| NC_024084.1 | | 132 | 130 | 0 |
| NC_022667.1 | | 30 | 30 | 0 |
| NC_024085.1 | | 138 | 129 | 0 |
| NC_020014.1 | | 119 | 116 | 3 |
| NC_022259.1 | | 125 | 123 | 0 |
| NC_022262.1 | | 124 | 123 | 0 |
| NC_022263.1 | | 126 | 123 | 1 |
| NC_022260.1 | | 126 | 123 | 0 |
| NC_022261.1 | | 123 | 123 | 0 |
| NC_001713.1 | | 140 | 128 | 9 |
| NC_020371.1 | | 111 | 102 | 9 |
| NC_016703.2 | | 108 | 108 | 0 |
| NC_021637.1 | | 108 | 108 | 0 |
| NC_008588.1 | | 132 | 130 | 0 |
| NC_023293.1 | | 31 | 31 | 0 |
| NC_000925.1 | | 209 | 209 | 0 |
| NC_023133.1 | | 224 | 183 | 40 |
| NC_021189.1 | | 211 | 210 | 1 |
| NC_024050.1 | | 209 | 207 | 2 |
| NC_007932.1 | | 209 | 206 | 3 |
| NC_025311.1 | | 135 | 123 | 1 |
| NC_009573.1 | | 146 | 143 | 3 |
| NC_025312.1 | | 140 | 126 | 0 |
| NC_018523.1 | | 139 | 139 | 0 |
| NC_014808.1 | | 142 | 126 | 1 |
| NC_008589.1 | | 141 | 127 | 0 |
| NC_025314.1 | | 141 | 127 | 0 |
| NC_007758.1 | | 44 | 27 | 12 |
| NC_001799.1 | | 26 | 21 | 5 |
| NC_026851.1 | | 137 | 124 | 8 |
| NC_016731.1 | | 130 | 128 | 0 |
| NC_011600.1 | | 139 | 138 | 1 |
| NC_026523.1 | | 192 | 191 | 1 |
#prot, number of plastid-encoded proteins in the species; #clust, number of clusters containing at least one protein from the species and at least one protein from another species; #sing, number of plastid-encoded proteins from the species not included in any cluster.
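The three counts defined in the note can be computed directly from a protein clustering. The sketch below is illustrative only: the function name `species_counts`, the data layout (a cluster as a set of `(species, protein)` pairs) and the toy input are assumptions made here, not the clustering procedure or data used in the study.

```python
from typing import Dict, List, Set, Tuple

# (species label, protein id); both are hypothetical labels for illustration
Protein = Tuple[str, str]

def species_counts(species: str,
                   proteins: Dict[str, Set[str]],
                   clusters: List[Set[Protein]]) -> Tuple[int, int, int]:
    """Return (#prot, #clust, #sing) for one species, following the table note."""
    own = {(species, p) for p in proteins[species]}
    n_prot = len(own)
    # Clusters with at least one protein of this species and one of another species
    n_clust = sum(
        1 for c in clusters
        if any(s == species for s, _ in c) and any(s != species for s, _ in c)
    )
    # Proteins of this species that appear in no cluster at all
    clustered: Set[Protein] = set().union(*clusters)
    n_sing = len(own - clustered)
    return n_prot, n_clust, n_sing

# Toy input, invented for illustration only (not data from the table).
proteins = {"A": {"p1", "p2", "p3", "p4"}, "B": {"q1", "q2"}}
clusters = [
    {("A", "p1"), ("A", "p2"), ("B", "q1")},  # two proteins of A share a cluster
    {("A", "p3"), ("B", "q2")},
]
print(species_counts("A", proteins, clusters))  # (4, 2, 1)
```

Under these definitions #prot need not equal #clust + #sing: in the toy input two proteins of species A fall into one cluster, so they contribute two to #prot but only one to #clust.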
Fig. 4 The tree of chromosome structures of mitochondria in sporozoan class Aconoidasida generated by the descent algorithm. (http://purl.org/phylo/treebase/phylows/study/TB2:S18685?x-access-code=bf7e98f7d030be83c7c2d1116c7faf0e&format=html)
Phylogenetic reconstruction of mitochondrial chromosome structures in sporozoan class Aconoidasida
| Tree node | Chromosome structure |
|---|---|
| | ss1 cox1 *cox3 ls1 *ls3 *ss3 *ls6 ls8 ss5 (C); *ls7 *ss6 *ss4 ls9 ss2 ls4 ls5 cytb ls2 (L) |
| | *ls4 ls5 cytb ls2 ls3 *ls1 cox3 *cox1 (L) |
| | *ls4 ls5 cytb ls2 ls3 *ls1 cox3 *cox1 (L) |
| | cox1 *cox3 ls1 *ls3 *cytb *ls5 ls4 (L) |
| | cox1 *cox3 ls1 *ls3 *ls2 *cytb *ls5 ls4 (L) |
| | cox1 *cox3 ls1 *ls2 *ls3 *cytb *ls4 ls5 (L) |
| | *ls7 *ss6 *ss4 *ls1 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (C) |
| | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss1 cox1 cytb ls1 ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 (C) |
| | ss1 cox1 cytb ls1 ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 (C) |
| | ss1 cox1 cytb ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 (C) |
| | ss1 cox1 cytb ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 (C) |
| | ss1 cox1 cytb ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 (C) |
| | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7 (C) |
| | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
| | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7 ss4 (C) |
| | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 cox1 cytb ls8 ss5 ss1 (C) |
Reconstruction was generated by the descent algorithm for the tree in Fig. 4. Circular and linear chromosomes are marked by C and L, respectively. The left column identifies a non-leaf tree node by its first and last leaves. The right column shows the chromosome structure at the node (the order of rows corresponds to the traversal of the tree in Fig. 4). The leaves are labelled by (l); only their chromosomal structures are fed as input to our algorithm.
Fig. 5 The tree of chromosome structures of mitochondria in sporozoan class Aconoidasida. The tree was generated by the algorithm from the Section “The first algorithm solving the reconstruction problem for structures without paralogs”. (http://purl.org/phylo/treebase/phylows/study/TB2:S18685?x-access-code=bf7e98f7d030be83c7c2d1116c7faf0e&format=html)
Phylogenetic reconstruction of mitochondrial chromosome structures in sporozoan class Aconoidasida
| Tree node | Chromosome structure |
|---|---|
| | ss1 cox1 *cox3 ls1 *ls3 *ss3 *ls6 ls8 ss5 (C); *ls7 *ss6 *ss4 ls9 ss2 ls4 ls5 cytb ls2 (L) |
| | *ls4 ls5 cytb ls2 ls3 *ls1 cox3 *cox1 (L) |
| | cox1 *cox3 ls1 *ls3 *ls2 *cytb *ls5 ls4 (L) |
| | cox1 *cox3 ls1 *ls2 *ls3 *cytb *ls4 ls5 (L) |
| | cox1 *cox3 ls1 *ls3 *cytb *ls5 ls4 (L) |
| | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (C) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
| | ss1 cox1 cytb ls1 ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 (C) |
| | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7 ss4 (C) |
| | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 cox1 cytb ls8 ss5 ss1 (C) |
| | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
Reconstruction was generated by the algorithm from the Section “The first algorithm solving the reconstruction problem for structures without paralogs” for the tree in Fig. 5
Fig. 6 Tree of chromosomal structures of rhodophytic plastids generated by the descent algorithm. The data were obtained from GenBank for chromosomes listed in Table 2. The chromosome structures that were fed to our algorithm are shown in Additional file 1, #3, Tables S3a–S3b in rows denoted by (l). (http://purl.org/phylo/treebase/phylows/study/TB2:S18685?x-access-code=bf7e98f7d030be83c7c2d1116c7faf0e&format=html)
Fig. 7 (a) Evolutionary scenario of chromosome structures along the small tree. The following events are shown on edges: −1, loss of one of two paralogs of gene psbY; +1, emergence of a paralog of gene psbY; +R, emergence of an inverted repeat of a chromosome segment; I1, inversion of a chromosome segment; T1, transversion of a chromosome segment; T2, translocation of a chromosome segment; I2, insertion of a chromosome segment; and D, disappearance of a chromosome segment. The number of events is given in parentheses when greater than 1. For the reconstruction details, see Table S3a in Additional file 1, #3. (b) Evolutionary scenario of chromosome structures along the large tree. The following events are shown on edges: −1, loss of gene psbY; −2, loss of one of two paralogs of gene rpoC2; +2, emergence of a paralog of gene rpoC2; +3, emergence of a paralog of gene clpC; I1, inversion of a chromosome segment; T1, transversion of a chromosome segment; T2, translocation of a chromosome segment; 2F, fusion of two paralogs of gene rpoC2 into one large gene; and D, deletion of a chromosome segment. The number of events is given in parentheses when greater than 1. For the reconstruction details, see Table S3b in Additional file 1, #3.