Hiroaki Sakai, Ken Naito, Eri Ogiso-Tanaka, Yu Takahashi, Kohtaro Iseki, Chiaki Muto, Kazuhito Satou, Kuniko Teruya, Akino Shiroma, Makiko Shimoji, Takashi Hirano, Takeshi Itoh, Akito Kaga, Norihiko Tomooka.
Abstract
Second-generation sequencers (SGS) have been game-changing, making whole-genome sequencing cost-effective in many non-model organisms. However, a large portion of these genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single-molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced contigs 100 times longer, with a 100-fold smaller amount of gaps, than the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on a high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.
Year: 2015 PMID: 26616024 PMCID: PMC4663752 DOI: 10.1038/srep16780
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Statistics of the azuki genome assemblies.
| | Contigs: Assembly_1 | Contigs: Assembly_2 | Contigs: Assembly_3 | Scaffolds: Assembly_1 | Scaffolds: Assembly_2 | Scaffolds: Assembly_3 | |
|---|---|---|---|---|---|---|---|
| Assembly | |||||||
| Estimated genome size (bp) | 540,000,000 | 540,000,000 | 540,000,000 | 540,000,000 | 540,000,000 | 540,000,000 | |
| No. of sequences | 42,291 | 46,291 | 4,638 | 8,910 | 3,611 | 2,529 | |
| N50 (bp) | 27,734 | 20,134 | 809,255 | 612,411 | 3,015,641 | 2,952,390 | |
| Mean (bp) | 10,729 | 8,402 | 113,058 | 56,654 | 131,037 | 203,251 | |
| Max (bp) | 238,199 | 200,680 | 7,479,592 | 2,943,707 | 14,105,755 | 12,729,393 | |
| Total (bp) | 453,752,535 | 388,940,178 | 524,364,527 | 504,793,233 | 473,173,317 | 514,022,036 | |
| Coverage (%) | 84.0 | 72.0 | 97.1 | 93.5 | 87.6 | 95.2 | |
| Gap (%) | 0 | 0 | 0 | 10.1 | 16.1 | 0.1 | |
| No. of detected misassemblies | – | – | 19 | 376 | 69 | 0 | |
| No. of coding genes | – | – | – | 31,153 | 30,187 | 31,310 | |
| No. of non-coding genes | – | – | – | 2,658 | 2,482 | 2,493 | |
| No. of CEGs | – | – | – | 436 | 439 | 447 | |
| Anchored contigs/scaffolds | |||||||
| No. of sequences | – | – | 759 | 1,024 | 308 | 279 | |
| Total | – | – | 448,540,275 | 462,178,104 | 451,228,859 | 462,506,651 | |
| Coverage (%) | – | – | 83.1 | 85.6 | 83.6 | 85.6 | |
| Gap (%) | – | – | – | 6.8 | 14.7 | 0.07 | |
| No. of coding genes | – | – | – | 30,397 | 29,528 | 30,507 | |
| No. of non-coding genes | – | – | – | 2,548 | 2,377 | 2,453 | |
| No. of CEGs | – | – | – | 432 | 436 | 440 | |
*Calculated before anchoring.
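The "Mean (bp)" and "Coverage (%)" rows of the assembly block can be reproduced from the other rows. The following is a minimal sketch, assuming that the mean is the total length divided by the number of sequences and that coverage is the total length divided by the estimated genome size. Both are inferences that match the contig columns of the table, not definitions stated in this record, and the helper names are illustrative.

```python
# Values copied from the contig columns of the table above.
GENOME_SIZE = 540_000_000  # estimated genome size (bp), from the table

# assembly name -> (number of sequences, total length in bp)
contigs = {
    "Assembly_1": (42_291, 453_752_535),
    "Assembly_2": (46_291, 388_940_178),
    "Assembly_3": (4_638, 524_364_527),
}


def mean_length(n_seqs, total_bp):
    """Average sequence length (assumed: total / number of sequences)."""
    return total_bp / n_seqs


def coverage_pct(total_bp, genome_size=GENOME_SIZE):
    """Assembled fraction of the estimated genome, in percent (assumed)."""
    return 100.0 * total_bp / genome_size


# assembly name -> (mean length rounded to bp, coverage rounded to 0.1%)
summary = {
    name: (round(mean_length(n, total)), round(coverage_pct(total), 1))
    for name, (n, total) in contigs.items()
}

for name, (mean_bp, cov) in summary.items():
    print(f"{name}: mean = {mean_bp:,} bp, coverage = {cov}%")
```

Under these assumptions the computed values agree with the tabulated means (10,729, 8,402 and 113,058 bp) and coverages (84.0, 72.0 and 97.1%).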
Statistics of recently assembled legume genomes.
| Assembly version | v1.0 | ver3 | ver6 | v1.0 | v5.0 | v1.0 | Wm82.a2.v1 | Mt4.0v1 |
|---|---|---|---|---|---|---|---|---|
| Estimated genome size | 540 Mb | 591 Mb | 543 Mb | 738 Mb | 833 Mb | 587 Mb | 1,115 Mb | 463 Mb |
| No. of chromosomes | 11 | 11 | 11 | 8 | 11 | 11 | 20 | 8 |
| Sequencer | PacBio Illumina | Illumina Roche | Illumina Roche | Illumina | Illumina | Roche Illumina | Sanger | Sanger Illumina |
| BAC | No | No | No | No | No | Yes | Yes | Yes |
| Optical mapping | No | No | No | No | No | No | No | Yes |
| All scaffolds | ||||||||
| Total size (bp) | 522,761,097 | 444,438,822 | 463,085,359 | 532,289,632 | 605,780,537 | 521,076,998 | 978,495,272 | 411,831,487 |
| Gap rate (%) | 1.8 | 10.2 | 7.3 | 15.5 | 5.7 | 9.3 | 2.4 | 5.5 |
| Coverage (%) | 95.1 | 67.5 | 79.1 | 61.0 | 68.6 | 80.5 | 85.7 | 84.0 |
| No. of CEGs | 447 | 442 | 443 | 443 | 440 | 441 | 447 | 445 |
| Anchored scaffolds | ||||||||
| Total size (bp) | 471,245,712 | 227,273,901 | 333,308,464 | 347,247,377 | 247,494,949 | 514,820,528 | 949,183,385 | 384,466,993 |
| Gap rate (%) | 1.9 | 7.6 | 5.8 | 10.0 | 4.7 | 9.1 | 1.8 | 4.8 |
| Coverage (%) | 85.6 | 35.5 | 57.8 | 42.3 | 28.3 | 79.8 | 83.6 | 79.0 |
| No. of CEGs | 440 | 287 | 382 | 402 | 285 | 439 | 446 | 439 |
| Unanchored scaffolds | ||||||||
| Total size (bp) | 51,515,385 | 217,164,921 | 129,776,895 | 185,042,255 | 358,285,588 | 6,256,470 | 29,311,887 | 27,364,494 |
*Calculated after anchoring.
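In this second table, the coverage values appear to exclude gap bases, and the unanchored total is the difference between the "All scaffolds" and "Anchored scaffolds" totals. The sketch below checks this for the first column (v1.0, 540 Mb). The formula is an inference from the tabulated numbers, not a definition given in this record, and because the gap rate is only reported to one decimal place, the last digit of a recomputed coverage could differ for other columns.

```python
def nongap_coverage_pct(total_bp, gap_rate_pct, genome_size_bp):
    """Percent of the estimated genome covered by non-gap bases.

    Assumed formula: total * (1 - gap_rate) / genome_size.
    """
    nongap_bp = total_bp * (1 - gap_rate_pct / 100)
    return 100 * nongap_bp / genome_size_bp


# First column of the table (assembly version v1.0).
genome_size = 540_000_000
all_total, all_gap = 522_761_097, 1.8
anchored_total, anchored_gap = 471_245_712, 1.9

all_cov = round(nongap_coverage_pct(all_total, all_gap, genome_size), 1)
anchored_cov = round(
    nongap_coverage_pct(anchored_total, anchored_gap, genome_size), 1
)
unanchored_total = all_total - anchored_total

print(all_cov, anchored_cov, unanchored_total)
```

For this column the sketch gives 95.1% and 85.6% coverage and 51,515,385 bp of unanchored sequence, matching the table.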
Figure 1. NG graphs of the three assemblies by scaffold length (a) and contig length (b). The y-axis indicates the calculated NG contig/scaffold length (NG1 through NG100, see text for detail) in each assembled genome. The vertical line indicates the NG50 contig/scaffold length.
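The NG values plotted in Figure 1 can be sketched as follows. The full text with the authors' exact definition is not part of this record, so the code assumes the commonly used one: NGx is the length of the sequence at which the cumulative length of sequences, sorted from longest to shortest, first reaches x% of the estimated genome size (not of the assembly size, as in Nx). The function name and the toy lengths are illustrative only.

```python
def ng_values(lengths, genome_size, xs=range(1, 101)):
    """Return {x: NGx} for ascending integer percentages xs.

    NGx is reported as 0 when the whole assembly is shorter than
    x% of the estimated genome size.
    """
    ordered = sorted(lengths, reverse=True)
    result = {}
    cumulative = 0
    i = 0  # number of sequences consumed so far
    for x in xs:
        # Integer comparison avoids floating-point issues:
        # cumulative >= genome_size * x / 100
        while i < len(ordered) and cumulative * 100 < genome_size * x:
            cumulative += ordered[i]
            i += 1
        reached = cumulative * 100 >= genome_size * x
        result[x] = ordered[i - 1] if reached else 0
    return result


# Toy example: four sequences, estimated genome size of 100 bp.
toy = ng_values([20, 50, 10, 30], genome_size=100)
print(toy[50], toy[80], toy[100])

# Same sequences against a 200 bp genome: NG values drop to 0
# once the assembly can no longer reach x% of the genome.
toy_large = ng_values([20, 50, 10, 30], genome_size=200)
print(toy_large[50], toy_large[55], toy_large[56])
```

Plotting `result[x]` against x for each assembly gives a curve of the kind described in the caption, with NG50 read off at x = 50.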
Figure 2. Summary of annotations.
(a) The amounts of unique sequences, repetitive sequences, gaps, and unassembled sequences in each assembly. (b) Examples of incorrect annotations in Assembly_2. At the locus of Vigan.02G030200 (top) in Assembly_3, the sequence from the 2nd to the 3rd intron was left as a gap in Assembly_2, leading to fragmentation of this locus. The 23 kb region of the locus Vigan.03G124500 (bottom) was assembled into only a 13 kb contig in Assembly_2, in which both ends of the region were entirely unassembled and a 2 kb region in the 9th intron was missing. In this case, two genes were also annotated, one of which was composed mostly of intronic sequences. (c) Number of gene families with size differences. ++ and −− indicate gene families with differences of more than +4 and −4 in size, respectively. (d) Difference in total gene numbers in gene families with size differences.
Figure 3. NG graphs of legume genomes for (a) contigs and (b) pseudomolecules. The x-axis indicates NG integers, and the y-axis indicates the calculated NG length in each assembled genome. The vertical line indicates the NG50 contig/scaffold length. The labels are sorted according to the ranking of contig/scaffold NG50. The solid lines indicate the reference-grade assemblies (total size of anchored scaffolds covering ~80% of the genome), whereas broken and dotted lines indicate the draft assemblies (total size of anchored scaffolds covering ~50% and ~30%, respectively).
Figure 4. An overview of the azuki bean genome.
The x-axis indicates the physical position in Mb in pseudomolecules of LG1, 2, and 5.