Dongmei Yin, Changmian Ji, Xingli Ma, Hang Li, Wanke Zhang, Song Li, Fuyan Liu, Kunkun Zhao, Fapeng Li, Ke Li, Longlong Ning, Jialin He, Yuejun Wang, Fei Zhao, Yilin Xie, Hongkun Zheng, Xingguo Zhang, Yijing Zhang, Jinsong Zhang.
Abstract
Arachis monticola (2n = 4x = 40) is the only allotetraploid wild peanut within the Arachis genus and section, with an AABB-type genome of ∼2.7 Gb. The AA-type subgenome is derived from the diploid wild peanut Arachis duranensis, and the BB-type subgenome from the diploid wild peanut Arachis ipaensis. A. monticola is regarded either as the direct progenitor of the cultivated peanut or as an introgressive derivative between the cultivated peanut and wild species. The large polyploid genome and its extensive, nearly identical regions make the assembly of chromosomal pseudomolecules very challenging. Here we report the first reference-quality assembly of the A. monticola genome, produced with a combination of advanced technologies. The final assembly is ∼2.62 Gb, with a contig N50 of 106.66 Kb and a scaffold N50 of 124.92 Mb. The vast majority (91.83%) of the assembled sequence was anchored onto the 20 pseudo-chromosomes, and 96.07% of the assembly was accurately separated into AA- and BB-subgenomes. We demonstrate the efficiency of a current strategy for de novo assembly of a highly complex allotetraploid species, the wild peanut (A. monticola), based on whole-genome shotgun sequencing, single-molecule real-time sequencing, high-throughput chromosome conformation capture technology, and BioNano optical genome maps. Together, these technologies produced a reference-quality genome of the allotetraploid wild peanut, which is valuable for understanding peanut domestication and evolution within the Arachis genus and among legume crops.
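The contig and scaffold N50 values quoted in the abstract follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly. The snippet below is a minimal sketch of that definition, not the authors' code; the function name `nx` and the toy lengths are hypothetical.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx length: the length of the sequence at which the
    running total of lengths, taken longest first, reaches `fraction`
    of the total assembly size (fraction=0.5 gives N50)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical contig lengths, for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
print(nx(toy_contigs))        # N50 -> 80
print(nx(toy_contigs, 0.9))   # N90 -> 40
```

The same function applies to contigs or scaffolds; only the input list of lengths changes.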
Year: 2018 PMID: 29931126 PMCID: PMC6009596 DOI: 10.1093/gigascience/giy066
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Morphological characters of Arachis monticola. Mature plants in the field (A), flowers (B), and pods (C) are shown.
Statistics of pseudochromosomes of A. monticola
| Subgenome | Chr | Length (bp) | No. of gaps | Gap length (bp) | Gap ratio (%) | Anchored percent (%) |
|---|---|---|---|---|---|---|
| A | A.mon-A01 | 118,283,061 | 1,961 | 12,923,146 | 10.93 | 4.51 |
| A | A.mon-A02 | 84,409,872 | 1,598 | 13,652,890 | 16.17 | 3.22 |
| A | A.mon-A03 | 123,011,103 | 2,089 | 18,448,429 | 15.00 | 4.69 |
| A | A.mon-A04 | 106,244,467 | 2,020 | 15,031,534 | 14.15 | 4.05 |
| A | A.mon-A05 | 123,320,146 | 1,950 | 15,552,662 | 12.61 | 4.70 |
| A | A.mon-A06 | 98,474,784 | 1,770 | 11,764,791 | 11.95 | 3.75 |
| A | A.mon-A07 | 72,108,480 | 1,299 | 7,250,302 | 10.05 | 2.75 |
| A | A.mon-A08 | 39,681,652 | 442 | 1,898,702 | 4.78 | 1.51 |
| A | A.mon-A09 | 107,717,523 | 1,889 | 11,324,084 | 10.51 | 4.11 |
| A | A.mon-A10 | 100,634,791 | 1,847 | 13,895,555 | 13.81 | 3.84 |
| A | Un-chr | 61,870,352 | 422 | 7,811,614 | 12.63 | 2.36 |
| B | A.mon-B01 | 140,073,190 | 2,773 | 17,354,378 | 12.39 | 5.34 |
| B | A.mon-B02 | 124,915,013 | 2,271 | 14,941,271 | 11.96 | 4.76 |
| B | A.mon-B03 | 160,549,902 | 2,512 | 18,727,668 | 11.66 | 6.12 |
| B | A.mon-B04 | 147,957,427 | 2,521 | 16,939,677 | 11.45 | 5.64 |
| B | A.mon-B05 | 121,568,645 | 2,396 | 14,347,666 | 11.80 | 4.63 |
| B | A.mon-B06 | 154,488,041 | 2,644 | 22,222,939 | 14.38 | 5.89 |
| B | A.mon-B07 | 136,067,974 | 2,462 | 15,804,193 | 11.61 | 5.19 |
| B | A.mon-B08 | 138,850,997 | 2,492 | 17,429,178 | 12.55 | 5.29 |
| B | A.mon-B09 | 163,848,611 | 2,991 | 16,573,361 | 10.12 | 6.24 |
| B | A.mon-B10 | 147,468,805 | 2,693 | 18,369,757 | 12.46 | 5.62 |
| B | Un-chr | 49,370,401 | 428 | 7,142,698 | 14.47 | 1.88 |
| Unknown | – | 103,005,886 | 972 | 16,282,706 | 15.81 | – |
| Total | – | 2,623,921,123 | 46,879 | 325,689,201 | 12.41 | – |
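The percentage columns in the table above can be recomputed from the raw lengths, and the same arithmetic appears to reproduce the two headline figures in the abstract. The sketch below assumes that the 91.83% anchored figure excludes both the Un-chr and Unknown rows, and that the 96.07% subgenome figure excludes only the Unknown row. That reading is an interpretation that happens to match the numbers, not something the record states, and the helper name `percent` is hypothetical.

```python
# Values copied from the pseudochromosome statistics table.
TOTAL_BP = 2_623_921_123
UNKNOWN_BP = 103_005_886                # not assigned to either subgenome
UNPLACED_BP = 61_870_352 + 49_370_401   # the two Un-chr rows (A and B)


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Per-chromosome columns, shown for A.mon-A01.
gap_ratio_a01 = percent(12_923_146, 118_283_061)   # gap length / length
share_a01 = percent(118_283_061, TOTAL_BP)         # length / total assembly

# Assumed reading of the abstract's headline figures.
anchored = percent(TOTAL_BP - UNKNOWN_BP - UNPLACED_BP, TOTAL_BP)
separated = percent(TOTAL_BP - UNKNOWN_BP, TOTAL_BP)

print(gap_ratio_a01)  # 10.93
print(share_a01)      # 4.51
print(anchored)       # 91.83
print(separated)      # 96.07
```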
Comparison of assembly results between A. monticola and its progenitors
| | A. monticola A subgenome | A. monticola B subgenome | A. duranensis | A. ipaensis |
|---|---|---|---|---|
| Genome size (bp) | 1,035,756,231 | 1,485,159,006 | 1,068,326,401 | 1,257,035,815 |
| Contig number | 18,620 | 27,431 | 135,613 | 123,165 |
| Max length (bp) | 1,481,449 | 1,683,058 | 221,145 | 250,973 |
| Min length (bp) | 14,852 | 10,392 | 10,007 | 10,021 |
| Contig N50 (bp) | 107,702 | 110,501 | 22,900 | 22,562 |
| Contig N90 (bp) | 29,116 | 29,291 | 3,342 | 5,216 |
| Gap number | 18,005 | 26,847 | 134,110 | 122,617 |
| Gap ratio (%) | 12.50 | 12.11 | 11.95 | 7.32 |
| GC content (%) | 35.79 | 36.18 | 35.81 | 36.85 |
Note: Only sequences longer than 10 kb are considered.
Figure 3: Interaction frequency distribution of Hi-C links among chromosomes. (A) Genome-wide Hi-C map of A. monticola. (B) Genome-wide Hi-C map of A. ipaensis and A. duranensis. We divided the genome into 500-kb nonoverlapping windows (bins), counted the valid Hi-C interaction links between every pair of bins, and took the log2 of each link count. The distribution of links among chromosomes is shown as a heatmap drawn with HiCPlotter. The color key, ranging from light yellow to dark red, indicates the frequency of Hi-C interaction links from low to high (0∼10).
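The binning described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input format (tuples of chromosome and 0-based position for each read pair), the function names, and the toy coordinates are all hypothetical, and only bin pairs with at least one link are stored, since the caption does not say how empty bin pairs are treated.

```python
import math
from collections import Counter

BIN_SIZE = 500_000  # 500-kb nonoverlapping windows, as in the caption


def bin_of(chrom, pos):
    """Map a 0-based position to its (chromosome, bin index) pair."""
    return (chrom, pos // BIN_SIZE)


def count_links(pairs):
    """Count valid Hi-C links between every pair of bins.

    `pairs` is an iterable of (chrom1, pos1, chrom2, pos2) tuples.
    Bin pairs are stored in sorted order so that (a, b) and (b, a)
    are counted together.
    """
    counts = Counter()
    for c1, p1, c2, p2 in pairs:
        a, b = bin_of(c1, p1), bin_of(c2, p2)
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts


def log2_links(counts):
    """Return log2 of the link count for each observed bin pair."""
    return {key: math.log2(n) for key, n in counts.items()}


# Hypothetical read pairs, for illustration only.
toy_pairs = [
    ("A01", 120_000, "A01", 480_000),
    ("A01", 10, "A01", 499_999),
    ("A01", 600_000, "A01", 100),
    ("B01", 1_200_000, "A01", 700_000),
]
toy_counts = count_links(toy_pairs)
toy_log2 = log2_links(toy_counts)
print(toy_log2[(("A01", 0), ("A01", 0))])  # 1.0
```

The resulting dictionary is what a plotting tool would render as a heatmap, with one cell per bin pair.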
Figure 2: Workflow for the assembly of the allotetraploid wild peanut (A. monticola). We first corrected SMRT subreads with the error-correction module of Canu, based on 36.10x PacBio subreads. Subreads discarded by Canu were corrected with LoRDEC, using ∼50-fold coverage of Illumina short reads. We then assembled these high-quality data with Canu, Falcon, and WTDBG separately and polished each assembly with Pilon. To combine the advantages of the different algorithms, we merged the assemblies with Quickmerge. We also curated chimeric errors in the assembly by combining PacBio molecules, BioNano data, and Hi-C links, and scaffolded the contigs using SSPACE and IrysView. Further analysis of scaffold order and orientation with HiC-Pro and LACHESIS produced chromosome-length scaffolds. SMRT subreads and short reads were used for gap filling and genome polishing with PBJelly, GapCloser, and Pilon. The AA- and BB-subgenomes were distinguished by the overall macro-synteny between the genome assembly and its corresponding ancestors.