Vassily Lyubetsky¹,², Roman Gershgorin¹, Konstantin Gorbunov³
Abstract
BACKGROUND: A chromosome structure is a very limited model of a genome. It records the genome's chromosomes, whether each one is linear or circular, the order of genes on each chromosome, and the DNA strand on which each gene is encoded. Gene lengths, nucleotide composition, and intergenic regions are ignored. Although highly incomplete, such a structure is useful in many cases, e.g., to reconstruct phylogeny and evolutionary events, to identify gene synteny, and to find regulatory elements and promoters (considering highly conserved elements). Three problems are considered; all of them allow unequal gene content and the presence of gene paralogs. The distance problem is to determine the minimum number of operations required to transform one chromosome structure into another, together with the corresponding transformation itself, including the identification of paralogs in the two structures. We use the DCJ (double cut and join) model, one of the most studied combinatorial rearrangement models. The model includes double, sesqui-, and single operations as well as the deletion and insertion of a chromosome region; the single operations are cut and join. In the reconstruction problem, a phylogenetic tree with chromosome structures in its leaves is given. The task is to assign structures to the inner nodes of the tree so as to minimize the sum of distances between the terminal structures of each edge, and to identify mutual paralogs in a fairly large set of structures. A linear-time algorithm is known for the distance problem without paralogs, whereas the presence of paralogs makes the problem NP-hard. If paralogs are allowed but insertion and deletion operations are excluded (and special constraints are imposed), a reduction of the distance problem to integer linear programming is known. The reconstruction problem is apparently NP-hard even in the absence of paralogs. The problem of contigs is to find the optimal arrangement for each given set of contigs, which also includes the mutual identification of paralogs.
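As a point of reference for the DCJ model mentioned above, the sketch below computes the classical unit-cost DCJ distance in the simplest special case only: both structures have identical gene content, no gene occurs twice (no paralogs), and no insertions or deletions are needed. In that case the distance is N - (C + I/2), where N is the number of genes and C and I are the numbers of cycles and odd paths in the adjacency graph (Bergeron, Mixtacki and Stoye). This is not the authors' ILP-based method and ignores sesqui-operations, indels, and paralogs; the genome encoding and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed encoding: a genome is a list of (genes, circular) chromosomes, and
# genes is a list of (name, forward) pairs; forward=False marks a gene on the
# complementary strand (written with * in the tables).

def gene_ends(name, forward):
    """Return the (left, right) extremities of a gene in reading order."""
    tail, head = (name, "t"), (name, "h")
    return (tail, head) if forward else (head, tail)

def adjacencies(genome):
    """List adjacencies (2-element sets) and telomeres (1-element sets)."""
    result = []
    for genes, circular in genome:
        ends = [gene_ends(name, fwd) for name, fwd in genes]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            result.append(frozenset({right, left}))
        if circular:
            result.append(frozenset({ends[-1][1], ends[0][0]}))
        else:
            result.append(frozenset({ends[0][0]}))   # left telomere
            result.append(frozenset({ends[-1][1]}))  # right telomere
    return result

def dcj_distance(genome_a, genome_b):
    """Unit-cost DCJ distance N - (C + I/2); equal gene content, no paralogs."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    ext_a = [x for adj in adj_a for x in adj]
    ext_b = [x for adj in adj_b for x in adj]
    if len(set(ext_a)) != len(ext_a) or Counter(ext_a) != Counter(ext_b):
        raise ValueError("need identical gene sets and no repeated genes")

    # Union-find over the vertices of the adjacency graph.
    parent = {("A", i): ("A", i) for i in range(len(adj_a))}
    parent.update({("B", j): ("B", j) for j in range(len(adj_b))})

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    vertex_b = {x: ("B", j) for j, adj in enumerate(adj_b) for x in adj}
    # Every extremity yields one edge joining its A-vertex and its B-vertex.
    for i, adj in enumerate(adj_a):
        for x in adj:
            ra, rb = find(("A", i)), find(vertex_b[x])
            if ra != rb:
                parent[ra] = rb

    n_vertices, n_edges = defaultdict(int), defaultdict(int)
    for v in list(parent):
        n_vertices[find(v)] += 1
    for i, adj in enumerate(adj_a):
        n_edges[find(("A", i))] += len(adj)
    # A component is a cycle if it has as many edges as vertices, else a path.
    cycles = sum(1 for r in n_vertices if n_edges[r] == n_vertices[r])
    odd_paths = sum(1 for r in n_vertices
                    if n_edges[r] < n_vertices[r] and n_edges[r] % 2 == 1)
    n_genes = len(ext_a) // 2
    return n_genes - (cycles + odd_paths / 2)

a = [([("a", True), ("b", True), ("c", True)], False)]   # linear: a b c
b = [([("a", True), ("b", False), ("c", True)], False)]  # linear: a *b c
print(dcj_distance(a, b))  # 1.0 (a single inversion)
```

The union-find keeps the component count near-linear in the number of genes; handling paralogs would require choosing a matching between gene copies first, which is exactly what makes the general problem hard.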
Keywords: Ancestral genome; Chromosomal rearrangement; Chromosome structure; Efficient algorithms; Evolution along the tree; Integer linear programming; Parsimony principle; Reconstruction of ancestral genomes; Transformation of chromosome structures
Year: 2017 PMID: 29212445 PMCID: PMC5719933 DOI: 10.1186/s12859-017-1944-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. (a) Concatenation of any two neighboring special nodes $s_1$ and $s_2$ (both from a): the nodes $s_1$ and $s_2$ are replaced with one special node $s_1 s_2$, whose sequence is the concatenation of the sequences of the two initial special nodes. The same applies to b. (b) Removal of a special node: the large point is an a-special node s, and the resulting combined edge is marked a. The same applies to b.
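To make panel (a) concrete, the sketch below applies the concatenation rule to a path of nodes. The encoding is a hypothetical toy, not the authors' graph representation: ordinary nodes are ("node", label) and special nodes are ("special", structure, genes). Only panel (a) is modeled; removing a special node as in panel (b) would additionally require explicit, labeled edges.

```python
# Hypothetical toy encoding (not the authors' data structure): a path is a
# list of nodes; an ordinary node is ("node", label) and a special node is
# ("special", structure, genes), where structure is "a" or "b" and genes is
# the gene sequence carried by that node.

def concatenate_special(path):
    """Replace every run of neighboring special nodes from the same
    structure with a single special node carrying the concatenated genes."""
    merged = []
    for node in path:
        prev = merged[-1] if merged else None
        if (node[0] == "special" and prev is not None
                and prev[0] == "special" and prev[1] == node[1]):
            merged[-1] = ("special", node[1], prev[2] + node[2])
        else:
            merged.append(node)
    return merged

path = [("node", "x"), ("special", "a", ["g1"]), ("special", "a", ["g2", "g3"]),
        ("special", "b", ["g4"]), ("node", "y")]
print(concatenate_special(path))
# [('node', 'x'), ('special', 'a', ['g1', 'g2', 'g3']), ('special', 'b', ['g4']), ('node', 'y')]
```

Special nodes from different structures (an a-node next to a b-node) are deliberately left separate, matching the caption's "both from a" condition.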
Reconstruction obtained by reduction to ILP for chromosome structures in Rhizobium spp.
| Node | Chromosome structure |
|---|---|
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 rpsU_2 rpoE_3 rpsP rplS *rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 rpsU_2 rpoE_3 rpsP rplS rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 rpsU_2 rpoE_3 rpsP rplS rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoE_3 *rpoE_4 rpoH_1 rpsU_2 rpsP rplS rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 rpsU_2 rpoE_3 rpsP rplS rpoH_2 rplU (C) |
|  | *rpsA rpsO rplT rpsT rpoN rpoE_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 *rpsU_1 rpsU_2 rpsP rplS *rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 rpsU_2 rpsP rplS rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpoE_3 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 rpsU_2 rpsP rplS rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoE_2 rplY rpoH_1 rpsU_2 rpsP rplS rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpsU_1 *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD *rpoD *rpoZ rpoH_1 rpsU_2 rpoE_3 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpoE_1 rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoH_1 rpsU_2 rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT *rpoN *rplI *rpsR *rpsI *rplM rplK rplA rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rplV rpsC rplP rpsQ rplN rplX rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoH_1 *rpoH_2 rplU (C) |
|  | *rpsA *rpsO rplT rpsT rpoN rpsU_1 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD rpoH_1 rpsU_2 rpoH_2 rplU (C) |
|  | *rpsO rplT rpsT *rpoN rpoZ *rpsR *rpsF *rpsI *rplM rpsB *rpsD *rplQ *rpoA *rpsK *rpsM *rplO *rpsE *rplR *rplF *rpsH *rpsN *rplE *rplX *rplN *rplP *rpsC *rplV *rpsS *rplB *rplW *rplD *rplC *rpsJ *rpsG *rpsL *rpoC *rpoB *rplL *rplJ *rplA *rplK *rpoD rplY rpoH_1 rpsP rplS *rplU (C) rpsU_1 *rpsU_2 rpsA (L) |
|  | rpsO rpsA rplT rpsT rpoN rpoZ *rplI *rpsR *rpsF rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ *rpsI *rplM rpsD *rpsB *rpoD rplY rpoH_1 rpsU_1 *rpsU_2 rpsU_3 rpsP rplS rpoH_2 *rplU (C) |
|  | rpsO rpsA rplT rpsT rpoN rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsD *rpsB *rpoD rplY rpoH_1 rpsU_1 *rpsU_2 rpsU_3 rpsP rplS rpoH_2 *rplU (C) |
|  | rpsO rpsA rplT *rpoN rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB rpsU_2 *rpsD *rpoD rplY rpoH_1 rpsU_1 rplU rpsP rplS *rpoH_2 (C) |
|  | rpsO rpsA rplT rpsT rpoN rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD *rpoD rplY rpoH_1 rpsU_1 rpsP rplS rpoH_2 *rplU (C) |
| Tree root | rpsO rpsA rplT rpsT rpoN rpsU_2 rpoZ *rplI *rpsR *rpsF *rpsI *rplM rplK rplA rplJ rplL rpoB rpoC rpsL rpsG rpsJ rplC rplD rplW rplB rpsS rplV rpsC rplP rpsQ rplN rplX rplE rpsN rpsH rplF rplR rpsE rplO rpsM rpsK rpoA rplQ rpsB *rpsD *rpoD rplY rpoH_1 rpsU_1 rpsP rplS rpoH_2 rplU (C) |
For other designations, see Tables 1 and 2.
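Because the rows of this table differ in gene content as well as in gene order and strand, a quick way to see what changed between two rows is to compare their gene sets. The helper below is a hypothetical sketch (the names `gene_content` and `compare` are illustrative), assuming each gene name appears at most once per row, which holds here because paralogs carry distinct suffixes. It only diffs gene sets and strands; it does not compute a rearrangement distance.

```python
def gene_content(row):
    """Map each gene name in a table row to its strand: '+' or '-' (for *)."""
    content = {}
    for token in row.split():
        if token in ("(C)", "(L)"):
            continue  # chromosome-type markers, not genes
        content[token.lstrip("*")] = "-" if token.startswith("*") else "+"
    return content

def compare(old_row, new_row):
    """Return (genes only in old, genes only in new, genes whose strand differs)."""
    old, new = gene_content(old_row), gene_content(new_row)
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    flipped = sorted(g for g in old.keys() & new.keys() if old[g] != new[g])
    return only_old, only_new, flipped

print(compare("a *b (C)", "a b c (L)"))  # ([], ['c'], ['b'])
```

For example, comparing the tree-root row (as `old_row`) with the row immediately above it reports rpsU_2 as absent from the latter and rplU as lying on the opposite strand.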
Reconstruction obtained by reduction to ILP for mitochondrial chromosome structures in the sporozoan class Aconoidasida. The leaf data are given in the rows marked (l) after the species name; they were obtained from genomes deposited in GenBank.
| Node | Chromosome structure |
|---|---|
|  | *ls5 ls6 ls2 (L) ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 (C) |
|  | cox1 *cox3 ls1 *ls3 *cytb *ls5 ls4 (L) |
|  | cox1 *cox3 ls1 *ls3 *cytb *ls5 ls4 (L) |
|  | cox1 *cox3 ls1 *ls3 *cytb *ls5 ls4 (L) |
|  | cox1 *cox3 ls1 *ls3 *ls2 *cytb *ls5 ls4 (L) |
|  | cox1 *cox3 ls1 *ls2 *ls3 *cytb *ls4 ls5 (L) |
|  | ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 (C) ls2 (L) |
|  | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
|  | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
|  | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
|  | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
|  | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
|  | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
|  | ls1 ss4 ss6 ls7 ls6 ss3 ls3 ls9 ss2 ls4 ls5 *cox3 ls8 ss5 ss1 cox1 cytb ls2 (L) |
|  | ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 (C) ls2 (L) |
|  | ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 (C) ls2 (L) |
|  | ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 (C) |
|  | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
|  | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (C) |
|  | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
|  | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
|  | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
|  | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
|  | ss3 ls3 ls9 ss2 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
|  | ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 (C) |
|  | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss4 ss6 ls7 (L) |
|  | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
|  | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
|  | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
|  | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
|  | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7 (C) |
|  | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
|  | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb (C) |
|  | ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 ss6 ls7 ss4 (C) |
|  | ls1 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 cox1 cytb ls8 ss5 ss1 (C) |
A structure with two chromosomes lists both, each followed by its own marker: (C) for a circular chromosome and (L) for a linear one. The symbol * denotes a gene on the complementary strand.
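The notation used in these tables is regular enough to parse mechanically. The sketch below (the function name `parse_structure` is hypothetical) turns one table row into a list of chromosomes, relying only on the stated conventions: a leading * marks the complementary strand, and each chromosome is closed by (C) or (L). It assumes the chromosomes of a multi-chromosome structure appear in the same row, as in the first row of the Aconoidasida table.

```python
def parse_structure(row):
    """Split one table row into chromosomes.

    Returns a list of (genes, circular) pairs, where genes is a list of
    (name, forward) tuples and forward is False for '*'-marked genes.
    """
    chromosomes, current = [], []
    for token in row.split():
        if token in ("(C)", "(L)"):
            if not current:
                raise ValueError("chromosome marker without genes")
            chromosomes.append((current, token == "(C)"))
            current = []
        else:
            current.append((token.lstrip("*"), not token.startswith("*")))
    if current:
        raise ValueError("genes after the last (C)/(L) marker")
    return chromosomes

row = "*ls5 ls6 ls2 (L) ss4 ss6 ls7 ss3 ls3 ls9 ss2 ls4 *cox3 ls8 ss5 ss1 cox1 cytb ls1 (C)"
for genes, circular in parse_structure(row):
    print("circular" if circular else "linear", len(genes))
# linear 3
# circular 15
```

Raising an error on a dangling gene list is a deliberate choice: a row that does not end with a marker most likely lost text during extraction.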
Reconstruction obtained by reduction to ILP for plastid chromosome structures with paralogs in brown algae
| Node | Chromosome structure |
|---|---|
|  | rpl32_1 rpl21_1 *rps4 *rps16 *rps1 rpl9 rpl11 rpl1 rpl12 *rps10 *tufa *rps7 *rps12 *rpl31 *rps9 *rpl13 *rpoa *rps11 *rps13 *rpl36 *rps5 *rpl18 *rpl6 *rps8 *rpl5 *rpl24 *rpl14 *rps17 *rpl29 *rpl16 *rps3 *rpl22 *rps19 *rpl2 *rpl23 *rpl4 *rpl3 *rpl21_2 *rpl32_2 *rpl35 rpl20 *rpl19 rpl27 rpl34 rps20 rpob rpoc1 rpoc2 rps2 rps14 *rps18 *rpl33 clpc rbcl (C) |
|  | *rpl19 rpl27 rpl34 rps20 rpob rpoc1 rpoc2 rps2 rpl35 rpl20 rbcl rps14 *clpc rpl33 rps18 *rpl32_2 rps16 rps4 rps1 rpl9 rpl11 rpl1 rpl12 *rps10 *tufa *rps7 *rps12 *rpl31 *rps9 *rpl13 *rpoa *rps11 *rps13 *rpl36 *rps5 *rpl18 *rpl6 *rps8 *rpl5 *rpl24 *rpl14 *rps17 *rpl29 *rpl16 *rps3 *rpl22 *rps19 *rpl2 *rpl23 *rpl4 *rpl3 *rpl21_2 (C) |
|  | *rps2 *rpoc2 *rpoc1 *rpob *rps20 *rpl34 *rpl27 rpl19 rpl35 rpl20 rbcl rps14 *rps18 *rpl33 clpc rpl32_1 rpl21_1 rpl3 rpl4 rpl23 rpl2 rps19 rpl22 rps3 rpl16 rpl29 rps17 rpl14 rpl24 rpl5 rps8 rpl6 rpl18 rps5 rpl36 rps13 rps11 rpoa rpl13 rps9 rpl31 rps12 rps7 tufa rps10 *rpl12 *rpl1 *rpl11 *rpl9 rps1 *rps4 *rps16 (C) |
| Inner non-root node | *rpl19 rpl27 rpl34 rps20 rpob rpoc1 rpoc2 rps2 rpl35 rpl20 rbcl rps14 rpl32_2 *rps18 *rpl33 clpc rpl32_1 rpl21_1 *rps4 *rps16 *rps1 rpl9 rpl11 rpl1 rpl12 *rps10 *tufa *rps7 *rps12 *rpl31 *rps9 *rpl13 *rpoa *rps11 *rps13 *rpl36 *rps5 *rpl18 *rpl6 *rps8 *rpl5 *rpl24 *rpl14 *rps17 *rpl29 *rpl16 *rps3 *rpl22 *rps19 *rpl2 *rpl23 *rpl4 *rpl3 *rpl21_2 (C) |
| Tree root | rpl32_1 rpl21_1 *rps4 *rps16 *rps1 rpl9 rpl11 rpl1 rpl12 *rps10 *tufa *rps7 *rps12 *rpl31 *rps9 *rpl13 *rpoa *rps11 *rps13 *rpl36 *rps5 *rpl18 *rpl6 *rps8 *rpl5 *rpl24 *rpl14 *rps17 *rpl29 *rpl16 *rps3 *rpl22 *rps19 *rpl2 *rpl23 *rpl4 *rpl3 *rpl21_2 *rpl19 rpl27 rpl34 rps20 rpob rpoc1 rpoc2 rps2 rps14 rpl32_2 *rps18 *rpl33 clpc rpl35 rpl20 rbcl (C) |
Paralog numbers follow the underscore (e.g., rpl32_1 and rpl32_2 are two paralogs of rpl32). For other designations, see Table 1.
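Because paralog copies are distinguished only by the numeric suffix after the underscore, the multi-copy gene families in a structure can be read off directly. The helper below is a hypothetical sketch that assumes every paralog follows this suffix convention; names without a numeric suffix are treated as single-copy genes unless they literally repeat.

```python
from collections import defaultdict

def paralog_families(row):
    """Return {family: [copy numbers]} for gene families with several copies.

    A token such as '*rpl32_2' is read as family 'rpl32', copy 2; tokens
    without a numeric '_' suffix are treated as single-copy genes.
    """
    families = defaultdict(list)
    for token in row.split():
        if token in ("(C)", "(L)"):
            continue  # chromosome-type markers
        name = token.lstrip("*")
        family, sep, copy = name.rpartition("_")
        if sep and copy.isdigit():
            families[family].append(int(copy))
        else:
            families[name].append(None)  # no paralog suffix
    return {f: copies for f, copies in families.items() if len(copies) > 1}

print(paralog_families("rpl32_1 *rps4 *rpl32_2 (C)"))  # {'rpl32': [1, 2]}
```

Applied to the tree-root row of this table, it reports two multi-copy families, rpl32 and rpl21, each with copies 1 and 2.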
Fig. 2. Tree of chromosome structures of Rhizobium spp., generated from the chromosome structures given in Table 3 in the rows marked (l) after the species name. The reconstruction result is presented in the remaining rows of Table 3.
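The reconstruction objective described in the abstract is the sum, over all tree edges, of the distances between the structures at the two ends of each edge. The sketch below only evaluates that objective for a given assignment of structures to nodes; it does not search for the optimum. The helper names and the tree encoding (an edge list plus a node-to-structure dictionary) are hypothetical, each structure is assumed to be a single circular chromosome, and the default distance is a crude breakpoint count used as a stand-in, not the DCJ-based distance used by the authors. Because that count is not symmetric when gene content differs, each edge is read as (parent, child).

```python
def adjacency_set(genes):
    """Adjacencies of one circular chromosome of (name, forward) genes."""
    ends = [((g, "t"), (g, "h")) if fwd else ((g, "h"), (g, "t"))
            for g, fwd in genes]
    return {frozenset({ends[i][1], ends[(i + 1) % len(ends)][0]})
            for i in range(len(ends))}

def breakpoint_count(x, y):
    """Crude stand-in distance: adjacencies of x that are missing from y."""
    return len(adjacency_set(x) - adjacency_set(y))

def tree_cost(edges, assignment, distance=breakpoint_count):
    """Parsimony objective: sum of distances between the ends of every edge."""
    return sum(distance(assignment[u], assignment[v]) for u, v in edges)

edges = [("root", "leaf1"), ("root", "leaf2")]
assignment = {
    "root": [("a", True), ("b", True), ("c", True)],
    "leaf1": [("a", True), ("b", True), ("c", True)],
    "leaf2": [("a", True), ("b", False), ("c", True)],  # b on the * strand
}
print(tree_cost(edges, assignment))  # 2
```

Passing a different `distance` function is the intended extension point; the objective itself does not depend on which metric is plugged in.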
Fig. 3. (a) Given sets a and b, each composed of three contigs. (b) Problem solution: the minimum cycles for a (left) and b (right).
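The abstract defines the contig problem as finding an optimal arrangement for each given set of contigs. For very small inputs one can simply try every order and orientation. The sketch below does that under assumptions of its own: the contigs are joined into a single circular chromosome, there are no paralogs, and the score is the number of adjacencies shared with a known reference structure. This scoring and all names are hypothetical; the authors' formulation (Fig. 3 shows minimum cycles for two contig sets) is different, and the enumeration grows as (k-1)! * 2^(k-1) for k contigs.

```python
from itertools import permutations, product

def reverse_contig(contig):
    """Read a contig from the other strand: reverse the order, flip strands."""
    return [(g, not fwd) for g, fwd in reversed(contig)]

def circular_adjacencies(genes):
    """Adjacencies of one circular chromosome of (name, forward) genes."""
    ends = [((g, "t"), (g, "h")) if fwd else ((g, "h"), (g, "t"))
            for g, fwd in genes]
    return {frozenset({ends[i][1], ends[(i + 1) % len(ends)][0]})
            for i in range(len(ends))}

def best_arrangement(contigs, reference):
    """Brute force: join all contigs into one circle, maximizing the number
    of adjacencies shared with a reference circular chromosome."""
    target = circular_adjacencies(reference)
    # The first contig is kept fixed and forward: rotating or reversing a
    # whole circle does not change its adjacency set.
    first, rest = list(contigs[0]), contigs[1:]
    best, best_score = None, -1
    for order in permutations(rest):
        for flips in product([False, True], repeat=len(rest)):
            genes = list(first)
            for contig, flip in zip(order, flips):
                genes += reverse_contig(contig) if flip else list(contig)
            score = len(circular_adjacencies(genes) & target)
            if score > best_score:
                best, best_score = genes, score
    return best, best_score

contigs = [[("a", True)], [("c", False)], [("b", True)]]
reference = [("a", True), ("b", True), ("c", True)]
print(best_arrangement(contigs, reference))
# ([('a', True), ('b', True), ('c', True)], 3)
```

Fixing the first contig removes the redundancy of equivalent circles without losing any distinct arrangement.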