Shiguo Zhou, Michael C Bechner, Michael Place, Chris P Churas, Louise Pape, Sally A Leong, Rod Runnheim, Dan K Forrest, Steve Goldstein, Miron Livny, David C Schwartz.
Abstract
BACKGROUND: Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data.
Year: 2007 PMID: 17697381 PMCID: PMC2048515 DOI: 10.1186/1471-2164-8-278
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Flow chart showing the strategy used for the assembly of optical maps.
Figure 2. A whole-genome optical map of rice (Oryza sativa ssp. japonica cv. Nipponbare). The 14 optical map contigs are displayed as horizontal lines representing consensus maps; green boxes mark their centromeric regions, and partial boxes indicate incomplete centromeric coverage. A consensus map comprises many individual restriction maps (29,512 maps; 14 contigs), each constructed from one endonuclease-digested molecule (~470 kb) and shown overlapping other molecules along the accompanying diagonal track. Chromosomes marked with an "*" have partial optical map contigs. The inset shows a zoomed view of a ~400 kb interval on chromosome 12 (28.19 Mb). Here, each horizontal track depicts an optical map; its "daughter" restriction fragments are consecutive colored bars, and congruent fragments across separate optical maps are color-keyed. Since restriction digestion is not quantitative, some bars (restriction fragments) bear missing or false cleavage sites relative to the consensus map; these are flagged by disparate colors.
Figure 4. Whole-genome view showing optical map vs. IRGSP or TIGR sequence data (pseudomolecules): identification of errors and their loci (Additional files 1 and 2). Six tracks depict data and comparisons for each of the rice chromosomes (1–12): track 1 (gold solid horizontal line), in silico SwaI maps of the pseudomolecule data; track 2 (grey bars), false cut: cut present in optical map, but absent in sequence data; track 3 (red bars), gaps present in sequence but filled by optical maps; track 4 (blue bars), sequence misassemblies; track 5 (green bars), missing cut: cut present in sequence, but absent in map data; track 6 (magenta bars), new gaps called within the sequence pseudomolecule by optical maps (Tables 2 and 3).
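As a rough illustration of what an "in silico SwaI map" (track 1) is, the sketch below cuts a sequence at every SwaI recognition site (ATTTAAAT, blunt cut between ATTT and AAAT) and returns the ordered fragment lengths. The function name `in_silico_digest` and the toy sequence are hypothetical and not taken from the paper; the sketch assumes a linear, uppercase sequence and is not the authors' pipeline.

```python
SWAI_SITE = "ATTTAAAT"   # SwaI recognition sequence (palindromic)
SWAI_CUT_OFFSET = 4      # blunt cut between ATTT and AAAT


def in_silico_digest(sequence, site=SWAI_SITE, cut_offset=SWAI_CUT_OFFSET):
    """Return the ordered fragment lengths (bp) of a linear sequence cut at every site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        # Advance by one base so that overlapping site occurrences are also found.
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Hypothetical toy sequence with two SwaI sites (28 bp in total).
toy = "GGGG" + "ATTTAAAT" + "CCCCCC" + "ATTTAAAT" + "TT"
print(in_silico_digest(toy))  # [8, 14, 6]
```

An ordered list of fragment sizes like this is what gets aligned against an optical map; a "false cut" or "missing cut" then corresponds to a fragment boundary present in one list but not the other.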
Table 1. Rice (Oryza sativa ssp. japonica cv. Nipponbare) whole-genome shotgun optical mapping
| 1 | 51.5 | 44 | 56.9/1.63 | 45.05 | 1(F) | 43.66 | 43.73/1.61 | 17.77/0.62/30.12/3.02 | 15.86 | 3,311 | 40.48 | 1.44 |
| 2 | 43.4 | 39.8 | 48.8/1.68 | 36.78 | 1(F) | 36.14 | 36.21/1.62 | 19.86/9.91/34.77/1.57 | 15.15 | 2,456 | 35.69 | 1.47 |
| 3 | 47.5 | 40.8 | 52.5/0.89 | 37.37 | 1(F) | 36.97 | 37.03/0.84 | 28.27/10.18/41.78/0.92 | 15.30 | 2,669 | 39.38 | 1.46 |
| 4 | 36.8 | 39 | 35.5/5.13 | 36.15 | 1(F) | 35.80 | 35.85/2.61 | 2.65/8.79/0.98/0.84 | 14.55 | 2,368 | 35.38 | 1.39 |
| 5 | 33.6 | 33.2 | 34.6/1.99 | 30.00 | 1(F) | 30.06 | 30.11/1.44 | 11.59/10.26/14.91/0.37 | 14.30 | 1,845 | 33.03 | 1.40 |
| 6 | 35.1 | 31.8 | 35.7/0.93 | 31.60 | 1 | 16.09 | 31.94/1.01 | 9.89/0.44/11.77/1.06 | 13.99 | 1,536 | 47.03 | 1.37 |
| | | | | | 2 | 15.78 | | | 14.89 | 1,483 | 44.17 | 1.40 |
| 7 | 33.1 | 35 | 32.3/1.78 | 30.28 | 1(F) | 29.65 | 29.71/1.45 | 11.41/17.81/8.72/1.92 | 14.41 | 2,052 | 35.93 | 1.37 |
| 8 | 33.6 | 27.6 | 30.0/1.38 | 28.56 | 1(F) | 28.40 | 28.45/1.09 | 18.10/2.99/5.45/0.39 | 13.46 | 2,244 | 39.38 | 1.37 |
| 9 | 27 | 21.6 | 26.7/6.54 | 30.53 | 1 | 24.50 | 26.53/2.77 | 1.77/18.58/0.64/15.08 | 14.16 | 2,112 | 39.19 | 1.48 |
| 10 | 23.7 | 26.8 | 23.0/3.07 | 23.96 | 1(F) | 23.96 | 24.00/1.85 | 1.25/11.67/4.17/0.17 | 13.12 | 2,658 | 55.74 | 1.32 |
| 11 | 33.7 | 30.3 | 28.8/1.32 | 30.76 | 1 | 17.21 | 31.00/1.39 | 8.71/2.26/7.10/0.77 | 14.34 | 1,202 | 35.38 | 1.34 |
| | | | | | 2 | 12.37 | | | 14.11 | 931 | 35.55 | 1.39 |
| 12 | 30.9 | 30.6 | 25.1/1.32 | 27.77 | 1(F) | 28.19 | 28.24/1.22 | 9.42/8.36/11.12/1.66 | 14.07 | 2,645 | 45.38 | 1.27 |
*Ch. = Chromosome; est. = estimated; AR = Arm Ratio; Opt. = Optical; Ave. = Average; Frag. = Fragment; SD = Standard Deviation; Cov. = Coverage; F = Finished; Difference = |Estimated size − Optical Map size| / Optical Map size × 100%; Mol. No. = the number of single-molecule maps used to build the chromosome optical map contig; X Coverage = (total mass of maps within a contig)/(chromosome optical map consensus size). a Estimated chromosome size is based on the YAC and BAC physical map [7]; b Estimated chromosome size is based on the BAC/PAC physical map [13]; c Estimated chromosome size by pachytene FISH [62]; d International Rice Genome Sequencing Project [37]; e Predicted chromosome size = optical map size + gap sizes [the centromere sizes in the BAC and YAC maps for chromosomes 6 and 11] + missing small fragment sizes (Table 2).
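The Difference column can be checked numerically. The sketch below applies the footnote's formula to the chromosome 1 row, assuming the four size estimates in that row follow the footnote order a to d. The tabulated values (17.77/0.62/30.12/3.02) appear to be reproduced when the predicted size (43.73 Mb) is used as the denominator; the optical map contig size (43.66 Mb) gives slightly different numbers. The function name `percent_difference` is hypothetical.

```python
def percent_difference(estimated_mb, reference_mb):
    """|estimated - reference| / reference * 100, following the Table 1 footnote."""
    return abs(estimated_mb - reference_mb) / reference_mb * 100


# Chromosome 1 values copied from the table row (Mb); the a-d labels are assumed
# to follow the footnote order.
estimates_mb = {
    "a (YAC/BAC physical map)": 51.5,
    "b (BAC/PAC physical map)": 44.0,
    "c (pachytene FISH)": 56.9,
    "d (IRGSP sequence)": 45.05,
}
predicted_size_mb = 43.73  # predicted chromosome size from the same row

differences = {
    label: round(percent_difference(size, predicted_size_mb), 2)
    for label, size in estimates_mb.items()
}
for label, value in differences.items():
    print(label, value)  # 17.77, 0.62, 30.12, 3.02
```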
Table 2. Comparison between the optical maps and the in silico maps from the IRGSP pseudomolecules of the rice genome sequence
| 1 | 2605 | 42.22 | 96.70 | 3.30 | 2/12 | 1/707 | 5/69 | 6/42 | 3/57 | 87/66.38 | 43/4 |
| 2 | 2272 | 35.23 | 97.48 | 3.41 | 1/6 | 1/429 | 3/21 | 2/12 | 2/10 | 73/65.55 | 27/1 |
| 3 | 2312 | 36.75 | 99.40 | 3.37 | 2/37 | 1/438 | 4/496 | 7/75 | 4/239 | 74/63.74 | 45/4 |
| 4 | 2230 | 33.65 | 93.99 | 3.49 | 2/111 | 0/0 | 5/192 | 15/359 | 15/1036 | 63/51.45 | 36/5 |
| 5 | 1985 | 29.03 | 96.57 | 3.60 | 2/255 | 1/54 | 5/96 | 9/66 | 8/314 | 61/52.26 | 29/3 |
| 6 | 2114 | 30.84 | 96.56 | 3.34 | 2/175 | 1/810 | 2/12 | 3/17 | 1/36 | 80/68.71 | 40/5 |
| 7 | 1974 | 29.20 | 98.48 | 3.60 | 1/13 | 2/84 | 0/0 | 8/145 | 3/55 | 64/57.67 | 39/4 |
| 8 | 1999 | 28.00 | 98.59 | 3.37 | 2/16 | 1/24 | 0/0 | 6/104 | 1/39 | 59/53.10 | 30/0 |
| 9 | 1613 | 22.31 | 84.25 | 3.63 | 2/3503 | 1/320 | 3/51 | 3/16 | 1/55 | 59/52.24 | 42/5 |
| 10 | 1664 | 22.51 | 93.95 | 3.35 | 2/54 | 1/117 | 5/671 | 7/79 | 3/62 | 45/36.53 | 28/4 |
| 11 | 1892 | 27.29 | 88.26 | 3.60 | 2/67 | 1/2090 | 4/178 | 9/141 | 5/635 | 91/76.07 | 23/8 |
| 12 | 1844 | 26.70 | 94.71 | 3.41 | 2/40 | 1/338 | 0/0 | 9/327 | 7/438 | 59/46.55 | 44/2 |
| Total | 24,504 | 363.73 | | | 22/4289 | 12/5240 | 36/1786 | 84/1383 | 53/2974 | 815/690 | 426/46 |
| Average | | | 94.91 | 3.46 | | | | | | | |
*Ch. = Chromosome, Ave. = Average; frag. = fragment; Centr. = centromeric; a The predicted chromosome size based on optical map was used as the total mass to compute the percentage of the matching fragment mass. Gap or misassembly masses in the alignments were calculated based on the optical mapping data.
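The two summary rows of the IRGSP comparison can be cross-checked against the twelve per-chromosome rows. In the sketch below, columns 2 and 3 are summed and columns 4 and 5 are averaged; the columns are referred to only by position because their header labels are not present in this record, and the variable names are hypothetical.

```python
from statistics import mean

# Columns 2-5 of the twelve per-chromosome rows, copied from the table above.
rows = [
    (2605, 42.22, 96.70, 3.30),
    (2272, 35.23, 97.48, 3.41),
    (2312, 36.75, 99.40, 3.37),
    (2230, 33.65, 93.99, 3.49),
    (1985, 29.03, 96.57, 3.60),
    (2114, 30.84, 96.56, 3.34),
    (1974, 29.20, 98.48, 3.60),
    (1999, 28.00, 98.59, 3.37),
    (1613, 22.31, 84.25, 3.63),
    (1664, 22.51, 93.95, 3.35),
    (1892, 27.29, 88.26, 3.60),
    (1844, 26.70, 94.71, 3.41),
]

col2_total = sum(r[0] for r in rows)
col3_total = round(sum(r[1] for r in rows), 2)
col4_mean = round(mean(r[2] for r in rows), 2)
col5_mean = round(mean(r[3] for r in rows), 2)

print(col2_total, col3_total, col4_mean, col5_mean)  # 24504 363.73 94.91 3.46
```

The results match the summary rows (24,504; 363.73; 94.91; 3.46), which supports reading the first summary row as column totals and the second as column averages.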
Table 3. Comparison between the optical maps and the in silico maps from the TIGR pseudomolecules of the rice genome sequence
| 1 | 2641 | 42.94 | 98.35 | 3.32 | 2/14 | 1/795 | 6/69 | 6/42 | 3/328 | 69/57.60 | 44/4 |
| 2 | 2301 | 35.52 | 98.28 | 3.45 | 1/10 | 1/466 | 3/17 | 2/12 | 2/14 | 62/52.53 | 33/1 |
| 3 | 2261 | 35.93 | 97.19 | 3.46 | 2/37 | 1/432 | 4/496 | 6/67 | 1/7.5 | 74/63.56 | 45/4 |
| 4 | 2279 | 34.63 | 96.73 | 3.57 | 2/112 | 0/0 | 3/65 | 21/621 | 11/435 | 62/51.92 | 36/5 |
| 5 | 2005 | 29.48 | 98.07 | 3.65 | 2/255 | 1/77 | 4/60 | 8/62 | 5/250 | 56/47.05 | 34/4 |
| 6 | 2118 | 30.90 | 96.74 | 3.31 | 2/175 | 1/810 | 1/2 | 3/17 | 1/42 | 73/61.90 | 39/4 |
| 7 | 1977 | 29.38 | 99.09 | 3.64 | 2/20 | 1/61 | 1/23 | 8/146 | 2/30 | 54/48.19 | 48/5 |
| 8 | 2009 | 28.10 | 98.94 | 3.49 | 2/23 | 1/24 | 1/1 | 8/245 | 2/68 | 52/45.18 | 37/2 |
| 9 | 1625 | 22.56 | 85.20 | 3.73 | 2/3451 | 1/302 | 4/54 | 3/20 | 2/57 | 58/52.50 | 42/4 |
| 10 | 1683 | 22.52 | 93.99 | 3.34 | 2/51 | 1/402 | 6/498 | 6/76 | 1/9 | 43/34.02 | 28/4 |
| 11 | 1939 | 28.15 | 91.04 | 3.62 | 2/66 | 1/2090 | 5/218 | 11/211 | 6/238 | 83/69.79 | 30/7 |
| 12 | 1878 | 27.03 | 95.89 | 3.52 | 2/41 | 1/342 | 0/0 | 11/358 | 3/63 | 58/48.18 | 38/2 |
| Total | 24,716 | 367.14 | | | 23/4255 | 11/5801 | 38/1503 | 93/1877 | 39/1540 | 744/632.39 | 454/48 |
| Average | | | 96.07 | 3.51 | | | | | | | |
*Ch. = Chromosome, Ave. = Average; frag. = fragment; Centr. = centromeric; a The predicted chromosome size based on optical map was used as the total mass to compute the percentage of the matching fragment mass. Gap or misassembly masses in the alignments were calculated based on the optical mapping data.
Figure 3. SwaI optical maps of chromosome 10 vs. IRGSP sequence pseudomolecule data. A: plot of sizing error: optical map fragments vs. in silico map fragments from well-aligned regions. The error bars represent the SD of optical map fragment sizes about the calculated means. B: plot of the relative error of optical fragment size vs. in silico map fragments derived from sequence data.
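The quantities plotted in panels A and B can be sketched as follows for a single aligned fragment. This is a minimal illustration, assuming that relative error means (mean optical size − in silico size) / in silico size; that definition, the function name `sizing_summary`, and the measurements are hypothetical and not taken from the paper.

```python
from statistics import mean, stdev


def sizing_summary(optical_sizes_kb, in_silico_kb):
    """Mean and sample SD of repeated optical measurements, plus relative error."""
    avg = mean(optical_sizes_kb)
    sd = stdev(optical_sizes_kb)  # would be drawn as the error bar in a panel-A style plot
    relative_error = (avg - in_silico_kb) / in_silico_kb  # assumed definition
    return avg, sd, relative_error


# Hypothetical optical measurements (kb) of one fragment whose in silico size is 20 kb.
avg, sd, rel = sizing_summary([19.0, 20.0, 21.0], 20.0)
print(avg, sd, rel)  # 20.0 1.0 0.0
```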
Figure 5. Examples of gap filling, gap calling and sequence assembly discordances detected by alignments between in silico (pseudomolecule sequence) and optical maps. Panels A-D show types of discordances revealed through alignment of optical maps with in silico restriction maps from IRGSPBuild4 and TIGRBuild4 pseudomolecules; red arrows show their basepair locations, and green bars highlight the size of reported gaps in pseudomolecules. Aligned in silico (blue) and optical maps (gold) are shown as tracks comprising individual restriction fragments drawn as numbered bars whose length scales with size (kb). Identified discordances are annotated by color-keyed bars describing restriction map features presented by optical maps, but not found within corresponding in silico restriction maps: magenta = consecutive restriction fragments; red = restriction cut site(s); turquoise = missing restriction site(s). A: The top panel (IRGSPBuild4 Ch01, 10,043,782 – 10,107,330 bp) shows an overestimated sequence gap (green bar; 50.0 kb vs. optical map = 13.10 kb + 12.22 kb); bottom panel (IRGSPBuild4 Ch10, 3,968,375 – 4,068,843 bp), an underestimated gap (green bar; 100.468 kb vs. 26 optical restriction fragments = 507.18 kb, arrow). B: Discovered gaps in pseudomolecules: IRGSPBuild4 Ch08 (3,241,575 bp; 0.41 kb + 30.07 kb) and TIGRBuild4 Ch11 (27,515,267 bp; 2.65 kb + 47.26 kb). C: Extra sequence: IRGSPBuild4 Ch10 (14,828,584 – 14,978,980 bp, 150.396 kb vs. 48.36 kb, turquoise bar); TIGRBuild4 Ch11 (19,298,056 – 19,328,366 bp, 30.310 kb vs. 11.65 kb, turquoise bar). D: Misassembly: IRGSPBuild4 Ch04 (12,428,850 – 12,585,538 bp) vs. a stretch of 11 unaligned optical restriction fragments; TIGRBuild4 Ch04 (15,179,265 – 15,246,346 bp) vs. 5 unaligned restriction fragments. Panels E and F show examples of large-scale misassembly of sequence. 
In silico and optical maps are horizontal tracks comprising restriction fragments demarcated by vertical lines with aligned portions color-keyed and indicated by connecting lines; unaligned restriction fragments are white. E: IRGSPBuild4 Ch11 (29,945,713 – 30,823,503 bp; 877.790 kb) shows an 89.796 kb inversion (blue) and two significant portions (18.121 kb, 202.752 kb; white) unaligned to the optical map. F: IRGSPBuild4 & TIGRBuild4 Ch11 (18,181,576 – 18,600,983 bp; 419.407 kb & 15,853,410 – 16,272,766 bp; 419.356 kb, blue lettering for TIGRBuild4) show a 39.334 kb inversion (blue), a small insertion and portion (18,356,214 – 18,365,879 bp) missing a possible repetitive region characterized by the optical map.
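Several of the sizes quoted in the Figure 5 caption follow directly from the pseudomolecule coordinates. The sketch below recomputes them as (end − start) / 1000, using only coordinates stated in the caption; the helper name `span_kb` and the dictionary labels are hypothetical.

```python
def span_kb(start_bp, end_bp):
    """Length of a coordinate interval in kb, computed as end minus start."""
    return (end_bp - start_bp) / 1000


# Coordinates copied from the Figure 5 caption (bp).
intervals = {
    "A bottom, IRGSPBuild4 Ch10": (3_968_375, 4_068_843),
    "C, IRGSPBuild4 Ch10": (14_828_584, 14_978_980),
    "C, TIGRBuild4 Ch11": (19_298_056, 19_328_366),
    "E, IRGSPBuild4 Ch11": (29_945_713, 30_823_503),
    "F, IRGSPBuild4 Ch11": (18_181_576, 18_600_983),
    "F, TIGRBuild4 Ch11": (15_853_410, 16_272_766),
}

sizes_kb = {label: span_kb(*coords) for label, coords in intervals.items()}
for label, size in sizes_kb.items():
    print(f"{label}: {size:.3f} kb")
```

The results (100.468, 150.396, 30.310, 877.790, 419.407 and 419.356 kb) agree with the sizes given in the caption.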
Figure 6. Estimates of chromosome 1 gap sizes by genetic markers or fibre (or pachytene) FISH vs. optical mapping results. The diagram shows the gaps (spaces between contigs) and their size estimates among the 7 sequence contigs.