Xiyin Wang, Xiaoli Shi, Zhe Li, Qihui Zhu, Lei Kong, Wen Tang, Song Ge, Jingchu Luo.
Abstract
BACKGROUND: The identification of chromosomal homology sheds light on open questions in genome evolution such as DNA duplication, rearrangement and loss. Several approaches have been developed to detect chromosomal homology based on gene synteny or colinearity. However, the previously reported implementations lack the statistical inference that is essential for revealing actual homologies.
Year: 2006 PMID: 17038171 PMCID: PMC1626491 DOI: 10.1186/1471-2105-7-447
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A modified Smith-Waterman algorithm to locate colinearity. (A) A simplified gene homology matrix (GHM, denoted as H). Genes A1, A2, ..., A18 on chromosome A are arranged horizontally, and genes B1, B2, ..., B14 on chromosome B are arranged vertically. Each cell of the matrix is filled with "1" or "0" based on the homology information from a BLASTP search, e.g., genes A1 and B1 are homologous, whereas genes A2 and B2 are not. (B) A modified dynamic programming procedure. A scoring matrix S is constructed recursively from H, with mg set to 2 genes. This distance criterion requires that neighboring genes in a colinear path be no more than 2 genes apart. Pointers are shown by dark or grey arrows. Two colinear paths containing 9 and 5 genes are shown by dark arrows; both reflect the same colinear relationship between the corresponding chromosomal regions.
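The recursion described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "no more than mg genes apart" means the row and column indices of consecutive genes on a path each increase by at most mg, it scores every homologous cell as 1 with no gap penalty, and it follows only one orientation (both indices increasing). The function name `longest_colinear_path` is hypothetical.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]

def longest_colinear_path(H: List[List[int]], mg: int = 2) -> List[Cell]:
    """Return the longest chain of homologous cells (row, col) in the binary
    matrix H where consecutive cells advance by 1..mg in both row and column.
    Assumption: plain chain length is the score; no gap penalty is applied."""
    n_rows = len(H)
    n_cols = len(H[0]) if H else 0
    # S[i][j]: length of the best chain ending at cell (i, j)
    S = [[0] * n_cols for _ in range(n_rows)]
    # ptr[i][j]: previous cell on that chain, or None at the chain start
    ptr: List[List[Optional[Cell]]] = [[None] * n_cols for _ in range(n_rows)]
    best, best_cell = 0, None
    for i in range(n_rows):
        for j in range(n_cols):
            if not H[i][j]:
                continue  # only homologous cells can join a chain
            S[i][j] = 1
            # look back at most mg rows and mg columns for a predecessor
            for pi in range(max(0, i - mg), i):
                for pj in range(max(0, j - mg), j):
                    if H[pi][pj] and S[pi][pj] + 1 > S[i][j]:
                        S[i][j] = S[pi][pj] + 1
                        ptr[i][j] = (pi, pj)
            if S[i][j] > best:
                best, best_cell = S[i][j], (i, j)
    # follow the pointers back from the best-scoring cell
    path: List[Cell] = []
    cell = best_cell
    while cell is not None:
        path.append(cell)
        cell = ptr[cell[0]][cell[1]]
    path.reverse()
    return path

# Toy matrix (hypothetical data, not the matrix drawn in Figure 1)
H_toy = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
path_mg2 = longest_colinear_path(H_toy, mg=2)
path_mg1 = longest_colinear_path(H_toy, mg=1)
```

With the toy matrix, tightening mg from 2 to 1 breaks the chain at the first step that skips a gene, which is the effect the distance criterion is meant to have.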
Figure 2. Examples of dot maps. (A) A dot map between rice chromosomes 2 and 4. Each dot in the map reflects a homologous gene pair with a BLASTP score > 100. The dots are not distributed uniformly in the map. The map also features many horizontal and vertical lines formed by repetitive genes. (B) A dot map between the same chromosomes as in (A), with repetitive genes filtered out. (C) A dot map of rice chromosome 1 against itself. Self-matching dots form a solid diagonal line. (D) A dot map with self-matching and repetitive genes filtered out. A diagonal line reflecting the neighboring homologues can still be seen.
Figure 3. Distance distribution of homologous genes. (A) The distance distribution of rice homologous genes. (B) The distance distribution of Arabidopsis homologous genes.
Figure 4. Different distribution patterns of homologous genes in sister regions. The same number of homologous genes is located in two pairs of sister regions of different sizes. Homologue pairs are more densely located in (B) than in (A). Dark horizontal lines represent chromosomes, round dots denote genes on chromosomes, and lines linking the dots indicate gene homology.
The number and percentage of genes in duplicated blocks in rice and Arabidopsis genomes
| | Multiplication level¹ | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Arabidopsis | Gene No. | 13452 | 3934 | 1345 | 269 | 115 | 86 | 0 |
| | Percentage² | 0.750 | 0.224 | 0.071 | 0.018 | 0.008 | 0.003 | 0.000 |
| Rice | Gene No. | 17947 | 11114 | 6020 | 2930 | 1371 | 846 | 550 |
| | Percentage² | 0.762 | 0.429 | 0.223 | 0.111 | 0.057 | 0.031 | 0.016 |
1. The multiplication level of a gene indicates how many times the duplicated segment containing it appears in the genome.
2. The percentage is the fraction of genes whose multiplication level is greater than or equal to the given number.
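The cumulative ratio defined in note 2 can be sketched as follows. The source does not state which gene set forms the denominator, so this sketch assumes it is every gene in the input list. The function name `fraction_at_or_above` and the level values are hypothetical and are not taken from the table.

```python
from typing import Dict, List

def fraction_at_or_above(levels: List[int], max_level: int) -> Dict[int, float]:
    """For each k in 2..max_level, return the fraction of genes whose
    multiplication level is >= k. Assumption: the denominator is len(levels)."""
    total = len(levels)
    if total == 0:
        return {k: 0.0 for k in range(2, max_level + 1)}
    return {
        k: sum(1 for lv in levels if lv >= k) / total
        for k in range(2, max_level + 1)
    }

# Hypothetical multiplication levels for eight genes
toy_levels = [1, 1, 2, 2, 2, 3, 4, 2]
toy_fractions = fraction_at_or_above(toy_levels, max_level=4)
```

Because each threshold counts every gene at that level or higher, the fractions can only stay equal or decrease as the level increases, which matches the pattern in both percentage rows.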
The 20 longest duplicated segments in the Arabidopsis genome
| | | | Segment A in Arabidopsis | | | | Segment B in Arabidopsis | | | |
| Colinear gene number | Gene orientation identity | Epv | Starting gene | Ending gene | Gene number | Length (Mb) | Starting gene | Ending gene | Gene number | Length (Mb) |
| 106 | 0.96 | 1.92E-226 | 2_3518 | 2_4078 | 560 | 1.88 | 3_4725 | 3_5165 | 440 | 1.55 |
| 83 | 0.96 | 3.99E-200 | 1_1726 | 1_2130 | 404 | 1.54 | 1_6013 | 1_6467 | 454 | 1.65 |
| 82 | 0.98 | 3.64E-199 | 1_6041 | 1_6467 | 426 | 1.56 | 1_1752 | 1_2130 | 378 | 1.45 |
| 78 | 0.97 | 1.83E-155 | 3_840 | 3_1141 | 301 | 0.97 | 5_151 | 5_609 | 458 | 1.54 |
| 70 | 0.90 | 1.38E-138 | 2_3108 | 2_3445 | 337 | 1.26 | 3_4285 | 3_4619 | 334 | 1.19 |
| 69 | 0.97 | 1.68E-129 | 1_3629 | 1_3991 | 362 | 1.43 | 3_1669 | 3_2096 | 427 | 1.56 |
| 64 | 0.94 | 1.23E-116 | 1_639 | 1_343 | 296 | 1.02 | 2_2172 | 2_2582 | 410 | 1.47 |
| 64 | 0.95 | 5.65E-127 | 3_118 | 3_353 | 235 | 0.75 | 5_1391 | 5_1691 | 300 | 1.05 |
| 62 | 1.00 | 2.82E-123 | 1_4258 | 1_4017 | 241 | 0.90 | 3_1328 | 3_1603 | 275 | 0.97 |
| 60 | 0.97 | 1.67E-114 | 4_1800 | 4_1590 | 210 | 0.78 | 5_3789 | 5_4037 | 248 | 0.90 |
| 59 | 0.97 | 1.24E-110 | 2_1607 | 2_1840 | 233 | 0.97 | 4_2963 | 4_3236 | 273 | 1.02 |
| 54 | 0.91 | 9.58E-101 | 4_2685 | 4_2484 | 201 | 0.68 | 5_4682 | 5_5062 | 380 | 1.35 |
| 53 | 0.98 | 6.29E-098 | 2_1326 | 2_1096 | 230 | 0.86 | 4_2727 | 4_2960 | 233 | 0.85 |
| 53 | 0.96 | 3.97E-091 | 3_2507 | 3_2114 | 393 | 1.57 | 4_1114 | 4_1412 | 298 | 1.27 |
| 51 | 0.94 | 6.88E-100 | 2_935 | 2_1093 | 158 | 0.62 | 4_3464 | 4_3625 | 161 | 0.57 |
| 41 | 0.98 | 1.07E-063 | 1_1423 | 1_1285 | 138 | 0.47 | 2_6 | 2_245 | 239 | 0.98 |
| 37 | 0.92 | 3.83E-062 | 2_1308 | 2_1472 | 164 | 0.56 | 4_3786 | 4_3958 | 172 | 0.57 |
| 37 | 0.89 | 1.33E-056 | 4_2327 | 4_2478 | 151 | 0.50 | 5_4245 | 5_4549 | 304 | 1.10 |
| 33 | 0.88 | 1.91E-044 | 1_134 | 1_271 | 137 | 0.44 | 4_191 | 4_396 | 205 | 0.83 |
| 32 | 1.00 | 6.68E-055 | 3_111 | 3_1 | 110 | 0.32 | 5_1250 | 5_1390 | 140 | 0.47 |
* The gene names (also in Tables 4 and 5) reflect the chromosome and the gene order, e.g. '2_3518' represents the 3518th gene on chromosome 2.
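The naming convention in the note above can be parsed with a short sketch. The helper names `parse_gene_name` and `gene_span` are hypothetical. In the rows shown, the "Gene number" column matches the absolute difference between the starting and ending gene orders, and the sketch relies on that observation rather than on a definition given in the source.

```python
from typing import Tuple

def parse_gene_name(name: str) -> Tuple[int, int]:
    """Split a name such as '2_3518' into (chromosome, gene order)."""
    chrom, order = name.split("_")
    return int(chrom), int(order)

def gene_span(start: str, end: str) -> int:
    """Number of gene positions between the two ends of a segment.
    The order may be descending (e.g. '1_639' to '1_343'), so the
    absolute difference is taken."""
    start_chrom, start_order = parse_gene_name(start)
    end_chrom, end_order = parse_gene_name(end)
    if start_chrom != end_chrom:
        raise ValueError("a segment must lie on a single chromosome")
    return abs(end_order - start_order)

# Segment A of the first row in the table above
first_row_span = gene_span("2_3518", "2_4078")
# A segment listed in descending gene order
reversed_span = gene_span("1_639", "1_343")
```

A segment whose starting gene order exceeds its ending gene order is simply listed in the reverse direction along the chromosome, so the span is the same either way.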
The 20 longest duplicated segments in the rice genome
| | | | Segment A in rice | | | | Segment B in rice | | | |
| Colinear gene number | Epv | Gene orientation identity | Starting gene | Ending gene | Gene number | Length (Mb) | Starting gene | Ending gene | Gene number | Length (Mb) |
| 194 | 0 | 0.95 | 11_5 | 11_700 | 695 | 4.11 | 12_73 | 12_691 | 618 | 3.73 |
| 191 | 0 | 0.95 | 2_3549 | 2_4505 | 956 | 6.18 | 4_3002 | 4_4139 | 1137 | 7.03 |
| 157 | 0 | 0.94 | 1_5264 | 1_3976 | 1288 | 8.42 | 5_3767 | 5_4423 | 656 | 3.85 |
| 139 | 0 | 0.97 | 1_6225 | 1_5317 | 908 | 5.68 | 5_2981 | 5_3719 | 738 | 4.50 |
| 126 | 3.65E-296 | 0.90 | 8_3168 | 8_4097 | 929 | 5.92 | 9_1944 | 9_2863 | 919 | 5.73 |
| 122 | 8.23E-286 | 0.89 | 3_2952 | 3_1610 | 1342 | 9.01 | 7_3177 | 7_4010 | 833 | 5.10 |
| 110 | 3.85E-257 | 0.94 | 2_5335 | 2_4716 | 619 | 3.94 | 6_682 | 6_1503 | 821 | 5.59 |
| 88 | 5.33E-191 | 0.95 | 2_2758 | 2_3516 | 758 | 5.07 | 4_2121 | 4_2941 | 820 | 5.56 |
| 59 | 2.57E-112 | 0.83 | 3_5355 | 3_5625 | 270 | 1.61 | 7_335 | 7_910 | 575 | 3.87 |
| 41 | 1.81E-067 | 0.98 | 1_743 | 1_1160 | 417 | 2.97 | 5_701 | 5_1067 | 366 | 2.75 |
| 40 | 4.49E-072 | 0.95 | 2_896 | 2_661 | 235 | 1.54 | 6_3717 | 6_4079 | 362 | 2.57 |
| 34 | 4.97E-054 | 0.97 | 3_3955 | 3_4295 | 340 | 2.28 | 12_2686 | 12_2997 | 311 | 2.08 |
| 33 | 1.83E-059 | 0.94 | 2_643 | 2_310 | 333 | 2.22 | 6_4127 | 6_4386 | 259 | 1.64 |
| 32 | 1.30E-051 | 0.94 | 2_1059 | 2_907 | 152 | 1.05 | 6_3325 | 6_3648 | 323 | 2.39 |
| 30 | 8.41E-049 | 0.90 | 2_1361 | 2_1112 | 249 | 1.76 | 6_2904 | 6_3209 | 305 | 2.18 |
| 23 | 1.49E-036 | 0.96 | 3_661 | 3_561 | 100 | 0.59 | 10_1721 | 10_1879 | 158 | 1.16 |
| 22 | 5.24E-030 | 1.00 | 1_6600 | 1_6816 | 216 | 1.30 | 5_2616 | 5_2780 | 164 | 1.15 |
| 22 | 9.83E-031 | 0.95 | 8_2936 | 8_3126 | 190 | 1.40 | 9_1646 | 9_1894 | 248 | 1.80 |
| 21 | 3.50E-038 | 0.95 | 1_581 | 1_724 | 143 | 0.74 | 5_559 | 5_644 | 85 | 0.56 |
| 21 | 3.23E-030 | 0.81 | 3_5157 | 3_5293 | 136 | 0.92 | 7_71 | 7_279 | 208 | 1.32 |
The 20 longest colinear regions between the Arabidopsis and rice genomes
| | | | Segment in | | | | Segments in rice | | | |
| Colinear gene number | Gene orientation identity | Epv | Starting gene | Ending gene | Gene number | Length (Mb) | Starting gene | Ending gene | Gene number | Length (Mb) |
| 14 | 0.93 | 5.53E-16 | 4_3648 | 4_3741 | 93 | 0.61 | 2_3736 | 2_3815 | 79 | 0.32 |
| 11 | 0.64 | 1.45E-09 | 2_5222 | 2_5287 | 65 | 0.38 | 4_2954 | 4_3026 | 72 | 0.26 |
| 11 | 0.91 | 2.69E-10 | 4_2502 | 4_2430 | 72 | 0.46 | 2_2835 | 2_2883 | 48 | 0.19 |
| 10 | 0.60 | 1.37E-07 | 7_562 | 7_658 | 96 | 0.64 | 3_2215 | 3_2280 | 65 | 0.24 |
| 10 | 0.70 | 9.51E-08 | 10_3103 | 10_3021 | 82 | 0.54 | 2_3733 | 2_3781 | 48 | 0.19 |
| 10 | 0.70 | 3.61E-09 | 1_5674 | 1_5595 | 79 | 0.51 | 2_3286 | 2_3339 | 53 | 0.18 |
| 10 | 0.90 | 4.37E-10 | 8_3510 | 8_3590 | 80 | 0.42 | 5_4866 | 5_4931 | 65 | 0.23 |
| 9 | 0.78 | 7.81E-07 | 3_1660 | 3_1736 | 76 | 0.48 | 2_2221 | 2_2248 | 27 | 0.12 |
| 9 | 0.89 | 3.63E-07 | 2_4713 | 2_4635 | 78 | 0.41 | 5_4870 | 5_4948 | 78 | 0.29 |
| 9 | 0.56 | 4.60E-08 | 9_2767 | 9_2822 | 55 | 0.31 | 4_3435 | 4_3468 | 33 | 0.12 |
| 9 | 0.89 | 1.44E-08 | 7_3347 | 7_3305 | 42 | 0.28 | 1_5880 | 1_5908 | 28 | 0.10 |
| 9 | 1.00 | 4.51E-10 | 1_5843 | 1_5889 | 46 | 0.31 | 3_4249 | 3_4279 | 30 | 0.10 |
| 8 | 0.75 | 1.32E-05 | 5_3294 | 5_3386 | 92 | 0.56 | 5_104 | 5_198 | 94 | 0.29 |
| 8 | 0.75 | 6.45E-05 | 6_1100 | 6_1031 | 69 | 0.54 | 5_5811 | 5_5863 | 52 | 0.21 |
| 8 | 0.75 | 6.08E-05 | 7_3242 | 7_3189 | 53 | 0.32 | 5_5408 | 5_5441 | 33 | 0.12 |
| 8 | 0.75 | 4.70E-05 | 4_2502 | 4_2430 | 72 | 0.46 | 3_4109 | 3_4154 | 45 | 0.16 |
| 8 | 0.75 | 3.60E-05 | 4_3665 | 4_3741 | 76 | 0.50 | 3_4887 | 3_4952 | 65 | 0.23 |
| 8 | 0.63 | 2.24E-05 | 6_772 | 6_682 | 90 | 0.57 | 4_3081 | 4_3144 | 63 | 0.21 |
| 8 | 0.63 | 1.25E-05 | 4_2498 | 4_2425 | 73 | 0.49 | 2_1946 | 2_2009 | 63 | 0.19 |
| 8 | 0.63 | 1.12E-05 | 1_5796 | 1_5720 | 76 | 0.43 | 5_128 | 5_207 | 79 | 0.25 |