Ismael A Vergara, Allan K Mah, Jim C Huang, Maja Tarailo-Graovac, Robert C Johnsen, David L Baillie, Nansheng Chen.
Abstract
BACKGROUND: The nematode Caenorhabditis elegans was the first multicellular organism to have its genome fully sequenced. In the 10 years since the original publication in 1998, the C. elegans genome has been scrutinized, and the last gaps were filled in November 2002, which presents a unique opportunity for examining genome-wide segmental duplications.
Year: 2009 PMID: 19622155 PMCID: PMC2728738 DOI: 10.1186/1471-2164-10-329
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Size distribution of perfect duplicons in C. elegans. (a) Size distribution of all perfect duplicons in the C. elegans genome measured in number of genes. (b) Size distribution of all perfect duplicons in the C. elegans genome measured in kb. Each value N on the x-axis represents all duplicons whose size falls in the range [N-1..N) kb. The y-axis shows the frequency of each duplicon size on a logarithmic scale (base 10). Thus, a bin with no visible bar means that only one duplicon is observed for that particular value, since log10(1) = 0.
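The binning described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function names `bin_sizes_kb` and `log10_frequencies` and the example sizes are hypothetical, and 1 kb is assumed to be 1,000 bp.

```python
import math
from collections import Counter


def bin_sizes_kb(sizes_bp):
    """Count duplicons per 1-kb bin, where bin N holds sizes in [N-1, N) kb.

    Assumes 1 kb = 1,000 bp (an assumption; the caption does not say).
    """
    counts = Counter()
    for size in sizes_bp:
        # 0..999 bp -> bin 1, 1000..1999 bp -> bin 2, and so on.
        counts[size // 1000 + 1] += 1
    return dict(counts)


def log10_frequencies(counts):
    """Convert raw bin counts to the log10 values plotted on the y-axis."""
    return {n: math.log10(c) for n, c in counts.items()}


# Hypothetical duplicon sizes in bp, for illustration only.
sizes = [500, 999, 1000, 1500, 106707]
counts = bin_sizes_kb(sizes)
heights = log10_frequencies(counts)
# The bin holding a single duplicon has height log10(1) = 0, so it has no visible bar.
print(counts)
print(heights)
```

With this layout a bin containing exactly one duplicon and an empty bin look the same in the plot, which is why the caption spells out how to read bins without a visible bar.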
Figure 2. The two largest duplicons in the C. elegans genome. (a) Genome browser image of the largest duplicons CL-2198_1 (depicted in black) and CL-2198_2 (depicted in gray), and flanking Cemar1 transposons (shown in red). (b) Alignment of the two largest duplicons, indicating the locations of the small differences. From 5' to 3': (1) A 319 bp deletion in the first duplicon. (2) A single nucleotide insertion ('C') in the first duplicon at 2,381,150 bp. (3) A single nucleotide difference ('T' in the first duplicon at 2,420,123 bp and 'C' in the second duplicon at 2,528,402 bp). (4) A single nucleotide difference ('A' in the first duplicon at 2,420,126 bp and 'T' in the second duplicon at 2,528,405 bp). (5) A single nucleotide difference ('T' in the first duplicon at 2,420,132 bp and 'C' in the second duplicon at 2,528,411 bp). (6) A triplet difference ('TAC' in the first duplicon from 2,420,134 bp to 2,420,136 bp and 'ACT' in the second duplicon from 2,528,413 bp to 2,528,415 bp). (c) The 319 bp unique sequence in the largest duplicon. Multiple copies of the Ce000266 repetitive element are located in the region. The upper and lower panels show the upstream and downstream copies of the largest duplicons, respectively.
List of genes within the duplicated region.
| Sequence Name | EST Support | Paralog Sequence Name | EST Support | Identical? | Method of repair |
|---|---|---|---|---|---|
| Y19D10A.7 | NS | F56A4.9 | NS | N | Longest: F56A4.9 |
| Y19D10A.9 | PS | F56A4.2 | PS | Y | N.A. |
| Y19D10A.8 | NS | F56A4.10 | NS | N | Longest: F56A4.10 |
| Y19D10A.6 | NS | F56A4.1 | NS | N | Evidence: nas-2 |
| Y19D10A.10 | NS | F56A4.11 | NS | N | Longest: F56A4.11 |
| Y19D10A.11 | NS | F56A4.12 | NS | N | Longest: Y19D10A.11 |
| Y19D10A.12 | PS | C01B4.9 | PS | N | Longest: C01B4.9 |
| Y19D10A.5 | FS | C01B4.8 | FS | Y | N.A. |
| Y19D10A.4 | PS | C01B4.7 | PS | Y | N.A. |
| Y19D10A.16 | FS | C01B4.6 | FS | Y | N.A. |
| Y19D10A.15 | NS | C01B4.5 | NS | Y | N.A. |
| Y19D10A.2 | NS | C01B4.3 | NS | Y | N.A. |
| Y19D10A.13 | NS | C01B4.10 | NS | Y | N.A. |
| Y19D10A.1 | NS | C01B4.1 | NS | N | Evidence: str-257 |
| Y19D10A.17 | NS | Y45G12C.8 | NS | Y | N.A. |
| C13B7.3 | NS | Y45G12C.7 | NS | Y | N.A. |
| C13B7.6 | PS | Y45G12C.16 | PS | N | Longest: Y45G12C.16 |
| C13B7.4 | NS | Y45G12C.9 | NS | Y | N.A. |
| C13B7.5 | NS | Y45G12C.10 | NS | N | Evidence: str-119 |
| C13B7.2 | NS | Y45G12C.6 | NS | N | Evidence: str-120 |
| C13B7.1 | NS | Y45G12C.5 | NS | Y | N.A. |
| F56A4.5 | NS | Y45G12C.4 | NS | N | GeneWise: E02C12.11 |
| F56A4.6 | NS | Y45G12C.11 | NS | N | Longest: F56A4.6 |
| F56A4.4 | PS | Y45G12C.3 | PS | Y | N.A. |
| F56A4.7 | NS | Y45G12C.12 | NS | Y | N.A. |
| F56A4.3 | FS | Y45G12C.2 | FS | N | * |
Each gene pair is shown in order of appearance from 5' to 3'. The "Method of repair" column suggests a way to fix those gene models that have different peptide sequences, given the lack of supporting information for a better improvement. Longest: suggests taking the longest peptide sequence as the correct model. Evidence: suggests considering as correct the member of the pair that has been reported as "person evidence" in WormBase. GeneWise: suggests a paralog gene that can be used to predict a gene structure in the region within each member of the pair. Each gene is characterized in terms of EST data support as NS (Not Supported) if no intron is supported, PS (Partially Supported) if at least one intron is not supported, and FS (Fully Supported) if all introns are supported by EST data. *: see text.
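The EST support labels defined in the table note can be expressed as a small classifier. This is a sketch under stated assumptions: the function name `classify_est_support` and its input (one boolean per intron) are hypothetical, PS is read as "some but not all introns supported", and a gene with no introns falls through to NS because the note does not cover that case.

```python
def classify_est_support(intron_supported):
    """Label a gene model as NS, PS or FS from per-intron EST support.

    intron_supported: list of booleans, one per intron (hypothetical input).
    Assumption: PS means some, but not all, introns are supported.
    Assumption: an empty list (no introns) is reported as NS.
    """
    if not any(intron_supported):
        return "NS"  # no intron is supported by EST data
    if all(intron_supported):
        return "FS"  # every intron is supported by EST data
    return "PS"      # at least one intron supported, at least one not


# Illustrative calls with made-up support patterns.
print(classify_est_support([False, False, False]))
print(classify_est_support([True, False, True]))
print(classify_est_support([True, True]))
```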
Figure 3. PCR analysis of the largest tandem segmental duplicons. (a) A schematic illustration of the largest duplicons, with PCR primers used for genotyping labeled. (b) A representative gel for strains that do not carry the largest duplication. (c) A representative gel for strains carrying the largest duplication. Lane 1 shows PCR product using primers 319L and 319OR; lane 2 shows PCR product using primers 319L and 319IR; lane 3 shows PCR product using primers DupOL and DupIR; lane 4 shows PCR product using primers DupIL and DupIR; and lane 5 shows PCR product using primers DupIL and DupOR.
Tandem segmental duplications in C. elegans of size 1,000 bp or larger
| Coordinates Dup1 | Coordinates Dup2 | Matches (bp) | Orientation | N Genes Dup1 | N Genes Dup2 | Associated Transposons |
|---|---|---|---|---|---|---|
| V:2347883..2454596 | V:2455844..2562875 | 106707 | F | 26 | 26 | Cemar1 |
| V:8813143..8850811 | V:8855237..8892906 | 37642 | F | 11 | 13 | TC5, Cer9 |
| III:1251054..1258404 | III:1259414..1266845 | 7339 | F | 4 | 4 | NO |
| IV:12471444..12478970 | IV:12478981..12486507 | 7527 | F | 3 | 3 | NO |
| X:226651..231363 | X:236067..240779 | 4713 | F | 3 | 3 | NDNAX1 |
| IV:5241391..5244977 | IV:5246223..5249809 | 3587 | R | 3 | 3 | NO |
| I:12627236..12632544 | I:12635161..12640469 | 5304 | R | 2 | 2 | NO |
| X:1940626..1945025 | X:1949155..1953554 | 4399 | F | 2 | 2 | NO |
| V:9087269..9088593 | V:9089256..9090580 | 1325 | R | 2 | 2 | NO |
| X:3558880..3563952 | X:3567445..3572527 | 4985 | F | 1 | 1 | NO |
| IV:13129621..13133199 | IV:13135213..13138791 | 3579 | F | 1 | 1 | NDNAX3 |
| I:11616806..11620253 | I:11623105..11626552 | 3448 | R | 1 | 1 | NO |
| IV:4348439..4351841 | IV:4352611..4356013 | 3403 | R | 1 | 1 | NO |
| X:4333166..4336008 | X:4339618..4342467 | 2823 | R | 1 | 1 | NO |
| II:11757121..11759167 | II:11759614..11761660 | 2047 | R | 1 | 1 | NO |
| V:13967844..13969831 | V:13974541..13976528 | 1988 | R | 1 | 1 | NO |
| III:7171786..7173519 | III:7174002..7175735 | 1734 | R | 1 | 1 | NO |
| IV:16339625..16341334 | IV:16342450..16344159 | 1710 | R | 1 | 1 | NO |
| III:11787629..11789338 | III:11790417..11792126 | 1709 | R | 1 | 1 | NO |
| III:2433538..2435215 | III:2436093..2437770 | 1678 | R | 1 | 1 | NO |
| I:11303731..11305210 | I:11308113..11309592 | 1480 | R | 1 | 1 | NO |
| IV:1617460..1618943 | IV:1622242..1623725 | 1481 | R | 1 | 1 | NO |
| II:3588277..3589715 | II:3592045..3593483 | 1439 | R | 1 | 1 | NO |
| IV:9284870..9286382 | IV:9292363..9293902 | 1466 | F | 1 | 1 | NO |
| IV:2566235..2567558 | IV:2569372..2570695 | 1324 | R | 1 | 1 | NO |
| IV:16766557..16767821 | IV:16768481..16769745 | 1265 | R | 1 | 1 | LINE2 |
| X:8319606..8320838 | X:8322049..8323281 | 1233 | R | 1 | 1 | NO |
| I:11355228..11356362 | I:11358159..11359293 | 1135 | R | 1 | 1 | NO |
| I:13890329..13891445 | I:13893120..13894236 | 1117 | R | 1 | 1 | NO |
| II:13079317..13080572 | II:13082405..13083577 | 1173 | R | 1 | 1 | NO |
| IV:5232834..5233864 | IV:5236511..5237541 | 1031 | R | 1 | 1 | NO |
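The coordinate strings in the table above can be parsed to recover each duplicon's length and the spacer between the two copies. This is a minimal sketch: the helper names `parse_coords`, `span_length` and `spacer_length` are hypothetical, and coordinates are assumed to be 1-based and inclusive, which is consistent with the row on chromosome IV whose span equals its 7527 matched bases.

```python
import re

# Chromosome (I, II, III, IV, V or X), then start..end, as written in the table.
COORD_RE = re.compile(r"^([IVX]+):(\d+)\.\.(\d+)$")


def parse_coords(text):
    """Split a string such as 'V:2347883..2454596' into (chrom, start, end)."""
    match = COORD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def span_length(start, end):
    """Length in bp, assuming 1-based inclusive coordinates."""
    return end - start + 1


def spacer_length(dup1, dup2):
    """Number of bases lying strictly between the two tandem copies."""
    chrom1, _, end1 = parse_coords(dup1)
    chrom2, start2, _ = parse_coords(dup2)
    if chrom1 != chrom2:
        raise ValueError("tandem duplicons must lie on the same chromosome")
    return start2 - end1 - 1


# Largest pair from the table: spans of both copies and the spacer between them.
_, s1, e1 = parse_coords("V:2347883..2454596")
_, s2, e2 = parse_coords("V:2455844..2562875")
print(span_length(s1, e1), span_length(s2, e2))
print(spacer_length("V:2347883..2454596", "V:2455844..2562875"))
```

For the largest pair the two spans differ by 318 bp under this convention, which fits the 319 bp deletion and single nucleotide insertion reported for the first duplicon.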
Figure 4. Gene F56A4.3 at the junction of the largest pair of duplicons. The F56A4.3 gene model (shown in the "Gene Models" track) is fully supported by an EST sequence (shown in the "ESTs aligned by BLAT (best)" track). The black and grey bars represent the ends of the largest pair of duplicons.