Yi-Min Chen, Chun-Hui Yu, Chi-Chuan Hwang, Tsunglin Liu.
Abstract
BACKGROUND: Genome sequencing and assembly are essential for revealing the secrets of life hidden in genomes. Because most genomes contain repeats, current programs collate sequencing data into a set of assembled sequences, called contigs, instead of a complete genome. Toward completing a genome, optical mapping is a powerful way to determine the relative order of contigs on the genome, a step called scaffolding. However, connecting neighboring contigs with nucleotide sequences requires further effort. Nagarajan et al. have recently proposed a software module, FINISH, to close the gaps between contigs with other contig sequences after scaffolding contigs using an optical map. The results, however, are not yet satisfying.
Year: 2013 PMID: 24564959 PMCID: PMC4029551 DOI: 10.1186/1752-0509-7-S6-S7
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. OMACC workflow. OMACC runs in four steps: 1) aligning contigs (colored lines, with an arrow indicating orientation) to the optical map (top rectangle) using SOMA2 to obtain the relative order of contigs on the map; 2) rescaling the optical map by comparing the lengths of restriction fragments on the map with the corresponding lengths on the contigs; 3) searching the contig graph for all possible paths of contigs that connect each pair of neighboring contigs; and 4) selecting the best path, i.e., the one whose length most closely matches the gap size.
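Step 1 ends with an ordered list of contigs on the map. The sketch below shows how neighboring pairs and their gap sizes could be read off that order. It assumes each alignment is summarized by a hypothetical `Placement` record (oriented contig label plus start and end in map coordinates); this is not SOMA2's actual output format.

```python
from typing import List, NamedTuple, Tuple


class Placement(NamedTuple):
    """Hypothetical summary of one contig aligned to the optical map."""
    contig: str   # oriented contig label, e.g. "006+"
    start: float  # leftmost aligned map coordinate (bp)
    end: float    # rightmost aligned map coordinate (bp)


def neighboring_gaps(placements: List[Placement]) -> List[Tuple[str, str, float]]:
    """Order contigs by map position and return (left, right, gap) for each neighboring pair."""
    ordered = sorted(placements, key=lambda p: p.start)
    return [(a.contig, b.contig, b.start - a.end) for a, b in zip(ordered, ordered[1:])]


# Illustrative coordinates only, chosen so that the gap equals 2326 bp.
print(neighboring_gaps([Placement("091+", 12000.0, 20000.0), Placement("006+", 0.0, 9674.0)]))
# [('006+', '091+', 2326.0)]
```

The gaps produced here are still in map units; step 2 is what converts them into sequence length.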
Figure 2. Ratio of restriction fragment lengths on the optical map to those on the contig sequences (GI1 data). The ratio varies for contigs shorter than 10 Kb; for contigs of at least 10 Kb, a consistent ratio of 0.9174 is obtained.
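On the GI1 data, map fragments come out about 8% shorter than the same fragments on the contig sequences (ratio 0.9174). Gap sizes measured on the map therefore need converting before they are compared with contig path lengths. The sketch below rests on two assumptions the caption does not state: the ratio is pooled as total map length over total contig length for fragments from contigs of at least 10 Kb, and a map distance is converted by dividing by that ratio. `estimate_scale` and `rescale_gap` are hypothetical names.

```python
from typing import Iterable, Tuple

# (contig_length_bp, fragment_length_on_map_bp, fragment_length_on_contig_bp)
Fragment = Tuple[int, float, float]


def estimate_scale(fragments: Iterable[Fragment], min_contig_len: int = 10_000) -> float:
    """Pooled map/contig length ratio over fragments from contigs >= min_contig_len bp."""
    usable = [(m, c) for length, m, c in fragments if length >= min_contig_len]
    if not usable:
        raise ValueError("no aligned fragments from sufficiently long contigs")
    return sum(m for m, _ in usable) / sum(c for _, c in usable)


def rescale_gap(map_gap_bp: float, ratio: float) -> float:
    """Convert a distance on the optical map into sequence length (assumed: divide by ratio)."""
    return map_gap_bp / ratio


# Made-up fragments: the 3 Kb contig is ignored because it is shorter than 10 Kb.
frags = [(50_000, 9174.0, 10_000.0), (20_000, 4587.0, 5_000.0), (3_000, 1200.0, 1_000.0)]
ratio = estimate_scale(frags)             # about 0.9174
print(round(rescale_gap(9174.0, ratio)))  # 10000
```

Discarding short contigs mirrors the figure's observation that their ratio is unstable. Pooling (rather than averaging per-fragment ratios) is a design choice of this sketch that weights long fragments more heavily.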
Figure 3. Pseudocode of our modified DFS algorithm.
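The pseudocode of Figure 3 is not reproduced here. As a hedged stand-in, the sketch below shows a length-bounded depth-first search over a contig graph, followed by step 4's selection of the path closest to the gap size. Two choices are assumptions. First, contigs may be revisited: several best paths in the tables repeat contigs (e.g. 175+:142- several times). The search is therefore bounded by accumulated length plus a hypothetical `slack` rather than by a visited set. Second, a path's length is taken as the sum of its intermediate contigs. The tables report nonzero lengths even for direct connections such as 015+:021-, so the real measure presumably also counts parts of the flanking contigs.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]  # oriented contig -> oriented contigs that may follow it


def enumerate_paths(graph: Graph, lengths: Dict[str, int], source: str, target: str,
                    max_len: int) -> List[List[str]]:
    """All source->target paths whose intermediate contigs total at most max_len bp.

    Contigs may be revisited (repeats), so termination relies on every
    intermediate contig having a positive length.
    """
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str], acc: int) -> None:
        for nxt in graph.get(node, []):
            if nxt == target:
                paths.append(path + [nxt])
                continue
            new_acc = acc + lengths[nxt]
            if new_acc <= max_len:  # prune branches that already overshoot the gap
                dfs(nxt, path + [nxt], new_acc)

    dfs(source, [source], 0)
    return paths


def path_length(path: List[str], lengths: Dict[str, int]) -> int:
    """Assumed measure: total length of the contigs strictly between the two ends."""
    return sum(lengths[c] for c in path[1:-1])


def best_path(graph: Graph, lengths: Dict[str, int], source: str, target: str,
              gap_size: int, slack: int) -> Optional[Tuple[List[str], int]]:
    """Step 4: the path whose length is closest to the gap, with its length difference."""
    candidates = enumerate_paths(graph, lengths, source, target, gap_size + slack)
    if not candidates:
        return None
    best = min(candidates, key=lambda p: abs(path_length(p, lengths) - gap_size))
    return best, abs(path_length(best, lengths) - gap_size)


# Toy graph with a repeat pair R+/S+ that can be traversed more than once.
toy = {"A+": ["R+", "B+"], "R+": ["S+"], "S+": ["R+", "B+"]}
print(best_path(toy, {"R+": 100, "S+": 50}, "A+", "B+", gap_size=160, slack=200))
# (['A+', 'R+', 'S+', 'B+'], 10)
```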
Connections of the E. coli contigs by OMACC. In each contig label, the number is the contig index and "+" or "-" is the strand; ":" joins consecutive contigs in a path.
| Neighboring contigs | Gap size (bp) | Best contig path | Contig path length (bp) | Length difference | Correct |
|---|---|---|---|---|---|
| 006+,091+ | 2326 | 006+:116+:091+ | 2327 | 1 | 1 |
| 004+,086+ | 10658 | 004+:005+:086+ | 10660 | 2 | 1 |
| 032-,046+ | 6058 | 032-:047+:046+ | 6056 | 2 | 1 |
| 044-,017+ | 11342 | 044-:047+:017+ | 11340 | 2 | 1 |
| 013+,061+ | 23861 | 013+:011-:053+:061+ | 23864 | 3 | 1 |
| 010+,069- | 4003 | 010+:078+:069- | 4007 | 4 | 1 |
| 050-,059- | 17496 | 050-:116-:059- | 17500 | 4 | 1 |
| 012-,036- | 20629 | 012-:011+:036- | 20635 | 6 | 1 |
| 086+,054- | 6739 | 086+:117-:054- | 6745 | 6 | 1 |
| 017+,012- | 10239 | 017+:018+:012- | 10246 | 7 | 1 |
| 029-,010+ | 38377 | 029-:015+:056+:116-:010+ | 38384 | 7 | 1 |
| 057-,079- | 29696 | 057-:053-:011+:075+:111-:080-:079- | 29703 | 7 | 1 |
| 023+,022+ | 40884 | 023+:117+:022+ | 40893 | 9 | 1 |
| 022+,002+ | 49821 | 022+:116-:068-:011-:053+:081-:116-:002+ | 49831 | 10 | 1 |
| 048+,044- | 27181 | 048+:028+:044- | 27191 | 10 | 1 |
| 002+,050- | 25463 | 002+:015+:050- | 25475 | 12 | 1 |
| 108+,062- | 48785 | 108+:117-:062- | 48800 | 15 | 1 |
| 062-,016+ | 11600 | 062-:015-:016+ | 11238 | 362 | 1 |
| 091+,067- | 24244 | 091+:053-:011+:067- | 23876 | 368 | 1 |
| 076+,029- | 10053 | 076+:028+:029- | 9285 | 768 | 1 |
| 001+,023+ | 29005 | 001+:117+:023+ | 29788 | 783 | 0* |
| 016+,033- | 8649 | 016+:083+:033- | 6842 | 1807 | 0* |
The last column indicates whether the best contig path matches the true contig order given in Table S1, Additional file 1 (1 = match, 0 = no match).
*The true contig path is not present in the contig graph.
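Below is a small helper for the path notation used in these tables. It assumes contig indices are decimal and that trailing `*` marks are footnotes. The matching function adds one assumption the text does not state: a path and its reverse complement (order reversed, strands flipped) count as the same connection. The example true order is illustrative, not taken from Table S1.

```python
from typing import List, Tuple

OrientedContig = Tuple[int, str]  # (contig index, strand "+" or "-")
FLIP = {"+": "-", "-": "+"}


def parse_path(path: str) -> List[OrientedContig]:
    """Parse table notation such as "013+:011-:053+:061+" (trailing '*' footnote marks ignored)."""
    contigs = []
    for token in path.rstrip("*").split(":"):
        index, strand = token[:-1], token[-1:]
        if strand not in FLIP or not index.isdigit():
            raise ValueError(f"malformed contig token: {token!r}")
        contigs.append((int(index), strand))
    return contigs


def reverse_complement(path: List[OrientedContig]) -> List[OrientedContig]:
    """The same connection read from the opposite strand: order reversed, strands flipped."""
    return [(index, FLIP[strand]) for index, strand in reversed(path)]


def matches_true_order(predicted: str, true_order: str) -> bool:
    """Assumed check behind the 'Correct' column: equal on either strand."""
    p, t = parse_path(predicted), parse_path(true_order)
    return p == t or p == reverse_complement(t)


print(parse_path("006+:116+:091+"))                            # [(6, '+'), (116, '+'), (91, '+')]
print(matches_true_order("050-:116-:059-", "059+:116+:050+"))  # True
```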
Connections of the E. coli contigs by FINISH. Notation is the same as in Table 1.
| Neighboring contigs | Gap size (bp) | Best contig path | Contig path length (bp) | Length difference | Correct |
|---|---|---|---|---|---|
| 006+,091+ | 2326 | 006+:116+:091+ | 2327 | 1 | 1 |
| 004+,086+ | 10658 | 004+:005+:086+ | 10660 | 2 | 1 |
| 032-,046+ | 6058 | 032-:047+:046+ | 6056 | 2 | 1 |
| 044-,017+ | 11342 | 044-:047+:017+ | 11340 | 2 | 1 |
| 013+,061+ | 23861 | 013+:011-:053+:061+ | 23864 | 3 | 1 |
| 010+,069- | 4003 | 010+:078+:069- | 4007 | 4 | 1 |
| 050-,059- | 17496 | 050-:116-:059- | 17500 | 4 | 1 |
| 012-,036- | 20629 | 012-:011+:036- | 20635 | 6 | 1 |
| 086+,054- | 6739 | 086+:117-:054- | 6745 | 6 | 1 |
| 017+,012- | 10239 | 017+:018+:012- | 10246 | 7 | 1 |
| 029-,010+ | 38377 | 029-:015+:056+:116-:010+ | 38384 | 7 | 1 |
| 057-,079- | 29696 | 057-:053-:011+:075+:111-:080-:079- | 29703 | 7 | 1 |
| 023+,022+ | 40884 | 023+:117+:022+ | 40893 | 9 | 1 |
| 048+,044- | 27181 | 048+:028+:044- | 27191 | 10 | 1 |
| 002+,050- | 25463 | 002+:015+:050- | 25475 | 12 | 1 |
| 108+,062- | 48785 | 108+:117-:062- | 48800 | 15 | 1 |
| 120+,105- | 57776 | 120+:115-:043-:103-:039+:104+:113-:100-:073-:105- | 57885 | 109 | 0 |
| 059-,107- | 15847 | 059-:078+:063+:092+:110-:027-:107- | 15609 | 238 | 0* |
| 062-,016+ | 11600 | 062-:015-:016+ | 11238 | 362 | 1 |
| 091+,067- | 24244 | 091+:053-:011+:067- | 23876 | 368 | 1 |
| 046+,120+ | 18124 | 046+:101+:084-:087-:025-:122-:120+ | 18753 | 629 | 1 |
| 054-,032- | 41348 | 054-:101+:084-:087-:025-:122-:123-:118-:116-:032- | 41983 | 635 | 1 |
| 076+,029- | 10053 | 076+:028+:029- | 9285 | 768 | 1 |
| 001+,023+ | 29005 | 001+:117+:023+ | 29788 | 783 | 0* |
| 016+,033- | 8649 | 016+:083+:033- | 6842 | 1807 | 0* |
| 069-,108+ | 113278 | 069-:116-:074+:082+:084-:087-:025-:122-:123-:088+:026-:092+:110-:027-:108+ | 111423 | 1855 | 0* |
| 045+,057- | 56400 | 045+:115-:043-:121-:024-:005+:003-:058+:119+:057- | 46042 | 10358 | 0* |
| 022+,002+ | 49821 | 022+:116-:002+ | 14242 | 35579 | 0 |
| 038-,076+ | 144606 | 038-:117+:111+:075-:011-:053+:021+:124+:014-:015-:076+ | 86949 | 57657 | 0* |
| 085+,048+ | 122463 | 085+:116+:007-:111-:048+ | 52387 | 70076 | 0* |
Connections of GI1 contigs by OMACC.
| Neighboring contigs | Gap size (bp) | Best contig path | Contig path length (bp) | Length difference | Consistent |
|---|---|---|---|---|---|
| 010+,002+ | 6451 | 010+:245+:253+:245+:253+:245+:253+:245+:002+** | 6444 | 7 | 1 |
| 019-,034+ | 24373 | 019-:200-:034+ | 24362 | 11 | 1 |
| 017+,043- | 9345 | 017+:176-:043- | 9365 | 20 | 1 |
| 018-,033- | 21609 | 018-:262+:181-:259-:033- | 21547 | 62 | 1 |
| 034+,007+ | 12182 | 034+:175-:007+ | 12096 | 86 | 1 |
| 003+,038+ | 16008 | 003+:227-:205+:038+ | 16095 | 87 | 1 |
| 015+,021-* | 11178 | 015+:021- | 11025 | 153 | 1 |
| 036+,013- | 17106 | 036+:149-:013- | 17269 | 163 | 1 |
| 039-,057- | 16268 | 039-:175-:057- | 16105 | 163 | 1 |
| 084+,071+ | 40296 | 084+:175+:142-:175+:142-:175+:142-:175+:142-:175+:142-:175+:142-:175+:071+** | 40523 | 227 | 1 |
| 026+,044+ | 17568 | 026+:199+:193-:199+:044+** | 17305 | 263 | 1 |
| 029-,070- | 8250 | 029-:179-:070- | 7930 | 320 | 1 |
| 035+,006- | 67333 | 035+:261-:173-:099-:182+:006- | 67009 | 324 | 1 |
| 023+,010+ | 19779 | 023+:159+:010+ | 20127 | 348 | 1 |
| 027-,016- | 75414 | 027-:196+:152+:196+:103-:126-:063-:016-** | 74985 | 429 | 1 |
| 037+,025+ | 18649 | 037+:176+:025+ | 18218 | 431 | 1 |
| 043-,056+ | 10828 | 043-:240+:056+ | 10288 | 540 | 1 |
| 004-,035+ | 24146 | 004-:175-:035+ | 23260 | 886 | 1 |
| 044+,020- | 44467 | 044+:168+:020- | 43369 | 1098 | 1 |
| 022-,026+* | 11242 | 022-:026+ | 9992 | 1250 | 1 |
| 041+,104+ | 27829 | 041+:175-:142+:175-:142+:175-:104+** | 29356 | 1527 | 1 |
The last column indicates whether the contig path length is consistent with the gap size, i.e., whether the length difference is at most 2 Kb (1 = consistent, 0 = not). *Gap closure unique to OMACC. **Best contig path differs from that found by FINISH.
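The consistency column reduces to a single comparison. The sketch below assumes 2 Kb means 2000 bp; the function names are illustrative. The example values are the 041+,104+ rows from the OMACC and FINISH GI1 tables.

```python
CONSISTENCY_THRESHOLD_BP = 2000  # "<= 2 Kb", taking 1 Kb as 1000 bp (an assumption)


def length_difference(gap_size: int, path_length: int) -> int:
    """Absolute difference between the rescaled gap and the contig path length."""
    return abs(path_length - gap_size)


def is_consistent(gap_size: int, path_length: int,
                  threshold: int = CONSISTENCY_THRESHOLD_BP) -> int:
    """1 if the path length agrees with the gap within the threshold, else 0 (table convention)."""
    return int(length_difference(gap_size, path_length) <= threshold)


print(length_difference(27829, 29356), is_consistent(27829, 29356))  # 1527 1
print(length_difference(27829, 20998), is_consistent(27829, 20998))  # 6831 0
```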
Connections of GI1 contigs by FINISH. Although FINISH does not rescale the optical map, the rescaled gap sizes are shown here.
| Neighboring contigs | Gap size (bp) | Best contig path | Contig path length (bp) | Length difference | Consistent |
|---|---|---|---|---|---|
| 019-,034+ | 24373 | 019-:200-:034+ | 24362 | 11 | 1 |
| 017+,043- | 9345 | 017+:176-:043- | 9365 | 20 | 1 |
| 018-,033- | 21609 | 018-:262+:181-:259-:033- | 21547 | 62 | 1 |
| 034+,007+ | 12182 | 034+:175-:007+ | 12096 | 86 | 1 |
| 003+,038+ | 16008 | 003+:227-:205+:038+ | 16095 | 87 | 1 |
| 036+,013- | 17106 | 036+:149-:013- | 17269 | 163 | 1 |
| 039-,057- | 16268 | 039-:175-:057- | 16105 | 163 | 1 |
| 029-,070- | 8250 | 029-:179-:070- | 7930 | 320 | 1 |
| 035+,006- | 67333 | 035+:261-:173-:099-:182+:006- | 67009 | 324 | 1 |
| 023+,010+ | 19779 | 023+:159+:010+ | 20127 | 348 | 1 |
| 037+,025+ | 18649 | 037+:176+:025+ | 18218 | 431 | 1 |
| 043-,056+ | 10828 | 043-:240+:056+ | 10288 | 540 | 1 |
| 004-,035+ | 24146 | 004-:175-:035+ | 23260 | 886 | 1 |
| 010+,002+ | 6451 | 010+:245+:002+** | 5556 | 895 | 1 |
| 044+,020- | 44467 | 044+:168+:020- | 43369 | 1098 | 1 |
| 026+,044+ | 17568 | 026+:199+:044+** | 15881 | 1687 | 1 |
| 027-,016- | 75414 | 027-:196+:103-:126-:063-:016-** | 72109 | 3305 | 0 |
| 005+,008-* | 10972 | 005+:175+:008- | 17552 | 6580 | 0 |
| 041+,104+ | 27829 | 041+:175-:104+** | 20998 | 6831 | 0 |
| 084+,071+ | 40296 | 084+:175+:071+** | 15449 | 24847 | 0 |
| 030-,132+* | 2838 | 030-:227-:205+:064+:159+:053-:175-:065-:132+ | 109213 | 106375 | 0 |
| 046-,119+* | 23104 | 046-:230-:009-:154+:139-:101+:186-:105+:126-:048+:175+:053+:159-:064-:205-:227+:119+ | 322943 | 299839 | 0 |
*Gap closure unique to FINISH. **Best contig path differs from that found by OMACC.
Differences in contig connections before ("<") and after (">") including non-uniquely aligned contigs, for (a) OMACC on E. coli data, (b) FINISH on E. coli data, (c) OMACC on GI1 data, and (d) FINISH on GI1 data.
| In | Neighboring contigs | Gap size (bp) | Best contig path | Contig path length (bp) | Length difference | Correct/Consistent |
|---|---|---|---|---|---|---|
| (a) | ||||||
| > | 118-,032- | 9331 | 118-:116-:032- | 9334 | 3 | 1 |
| > | 060+,085+ | 6070 | 060+:116+:085+ | 7271 | 1201 | 0 |
| (b) | ||||||
| > | 118-,032- | 9331 | 118-:116-:032- | 9334 | 3 | 1 |
| < | 059-,107- | 15847 | 059-:078+:063+:092+:110-:027-:107-* | 15609 | 238 | 0 |
| > | 054-,118- | 13029 | 054-:101+:084-:087-:025-:122-:123-:118- | 13656 | 627 | 1 |
| < | 054-,032- | 41348 | 054-:101+:084-:087-:025-:122-:123-:118-:116-:032- | 41983 | 635 | 1 |
| > | 060+,085+ | 6070 | 060+:116+:085+ | 7271 | 1201 | 0 |
| < | 069-,108+ | 113278 | 069-:116-:074+:082+:084-:087-:025-:122-:123-:088+:026-:092+:110-:027-:108+ | 111423 | 1855 | 0 |
| > | 112+,066+ | 10905 | 112+:116-:065+:117+:111+:066+ | 14522 | 3617 | 0 |
| > | 066+,048+ | 45302 | 066+:117+:111+:048+ | 19187 | 26115 | 0 |
| > | 081+,052- | 68596 | 081+:053-:011+:051-:052- | 39775 | 28821 | 0 |
| < | 038-,076+ | 144606 | 038-:117+:111+:075-:011-:053+:021+:124+:014-:015-:076+ | 86949 | 57657 | 0 |
| > | 038-,081+ | 36179 | 038-:117+:089-:015+:014+:124-:021-:053-:011+:068+:116+:081+ | 103480 | 67301 | 0 |
| < | 085+,048+ | 122463 | 085+:116+:007-:111-:048+ | 52387 | 70076 | 0 |
| (c) | ||||||
| > | 014+,069+ | 6852 | 014+:069+ | 6761 | 91 | 1 |
| < | 035+,006- | 67333 | 035+:261-:173-:099-:182+:006- | 67009 | 324 | 1 |
| < | 027-,016- | 75414 | 027-:196+:152+:196+:103-:126-:063-:016- | 74985 | 429 | 1 |
| > | 048+,060- | 55177 | 048+:175+:142-:175+:142-:175+:142-:175+:142-:175+:142-:175+:060- | 55737 | 560 | 1 |
| (d) | ||||||
| < | 035+,006- | 67333 | 035+:261-:173-:099-:182+:006- | 67009 | 324 | 1 |
| < | 027-,016- | 75414 | 027-:196+:103-:126-:063-:016- | 72109 | 3305 | 0 |
| > | 104+,074+ | 35095 | 104+:175+:112-:167+:102+:258+:160-:136+:198-:225+:228+:074+ | 37882 | 2787 | 0 |
| > | 048+,060- | 55177 | 048+:175+:060- | 34842 | 20335 | 0 |
| > | 111+,006- | 25002 | 111+:188-:149-:115-:173-:099-:182+:006- | 52888 | 27886 | 0 |
| < | 046-,119+ | 23104 | 046-:230-:009-:154+:139-:101+:186-:105+:126-:048+:175+:053+:159-:064-:205-:227+:119+* | 322943 | 299839 | 0 |
*These contig connections are not disrupted by any non-uniquely aligned contig, but disappear when non-uniquely aligned contigs are included, indicating a bug in FINISH.
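The "<" and ">" rows above amount to comparing two sets of connections. The sketch below assumes each run's result is a hypothetical dictionary mapping a neighboring-contig pair to its best contig path. Listing a pair whose path changes once on each side is a design choice of this sketch, not something the table confirms. The example entries come from the FINISH E. coli rows, truncated to three pairs for illustration.

```python
from typing import Dict, List, Tuple


def diff_connections(before: Dict[str, str], after: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Rows found only before ("<") or only after (">") adding non-uniquely aligned contigs.

    Keys are neighboring-contig pairs such as "118-,032-"; values are best contig paths.
    A pair whose best path changes is listed once on each side.
    """
    rows = []
    for pair in sorted(set(before) | set(after)):
        old, new = before.get(pair), after.get(pair)
        if old == new:
            continue  # unchanged connections are not reported
        if old is not None:
            rows.append(("<", pair, old))
        if new is not None:
            rows.append((">", pair, new))
    return rows


before = {"054-,032-": "054-:101+:084-:087-:025-:122-:123-:118-:116-:032-",
          "006+,091+": "006+:116+:091+"}
after = {"054-,118-": "054-:101+:084-:087-:025-:122-:123-:118-",
         "118-,032-": "118-:116-:032-",
         "006+,091+": "006+:116+:091+"}
print([(mark, pair) for mark, pair, _ in diff_connections(before, after)])
# [('<', '054-,032-'), ('>', '054-,118-'), ('>', '118-,032-')]
```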