Qiuju Xia1,2,3, Lei Pan4, Ru Zhang5, Xuemei Ni2,3, Yangzi Wang2,3, Xiao Dong2,3, Yun Gao6, Zhe Zhang2,3, Ling Kui7, Yong Li2,3, Wen Wang5,7, Huanming Yang1,8, Chanyou Chen4, Jianhua Miao9, Wei Chen10,11,12, Yang Dong13,14,15.
Abstract
Asparagus bean (Vigna unguiculata ssp. sesquipedalis), known for its very long and tender green pods, is an important vegetable crop widely grown in developing Asian countries. In this study, we report a 632.8 Mb assembly (549.81 Mb non-N size) of asparagus bean based on a whole-genome shotgun sequencing strategy. We also generated a linkage map for asparagus bean, which helped anchor scaffolds covering 94.42% of the assembly length into 11 pseudo-chromosomes. A total of 42,609 protein-coding genes and 3,579 non-protein-coding genes were predicted from the assembly. Taken together, these genomic resources will help develop a pan-genome of V. unguiculata and facilitate the investigation of economically valuable traits in this species, supporting the cultivation of this plant to help combat protein and energy malnutrition in the developing world.
Year: 2019 PMID: 31316072 PMCID: PMC6638192 DOI: 10.1038/s41597-019-0130-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 General description of the assembly workflow. The pipeline included removal of low-quality and adapter-contaminated reads, de novo assembly, construction of a linkage map, chromosome-scale assembly, and genome annotation.
Statistics of Raw Data after Filtering.
| Insert size (bp) | Read length (bp), platform | Number of clean read pairs | Clean bases (Gb) | Sequence coverage (X) |
|---|---|---|---|---|
| 350 | 2 × 125, HiSeq 4000 | 143,324,095 | 35.831 | 54.92 |
| 445 | 2 × 125, HiSeq 4000 | 200,584,850 | 50.146 | 76.86 |
| 758 | 2 × 125, HiSeq 4000 | 60,211,855 | 15.053 | 23.07 |
| 912 | 2 × 125, HiSeq 4000 | 113,659,706 | 28.415 | 43.55 |
| 2000 | 2 × 125, HiSeq 4000 | 79,141,602 | 19.785 | 30.32 |
| 3000 | 2 × 125, HiSeq 4000 | 82,610,562 | 20.653 | 31.65 |
| 5000 | 2 × 125, HiSeq 4000 | 80,415,362 | 20.104 | 30.81 |
| 9000 | 2 × 125, HiSeq 4000 | 72,037,228 | 18.009 | 27.6 |
| 15000 | 2 × 125, HiSeq 4000 | 59,701,495 | 14.925 | 22.87 |
| Total | — | 891,686,755 | 222.921 | 341.66 |
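Each Clean bases value can be reproduced as count × 2 × 125 bp, which implies that the counts are read pairs rather than single reads. A minimal Python check of that arithmetic follows (the variable and function names are illustrative). The exact total comes to about 222.922 Gb; the table's 222.921 equals the sum of the rounded per-library values.

```python
# Clean read pairs per library, keyed by insert size (bp); copied from the table above.
READ_PAIRS = {
    350: 143_324_095,
    445: 200_584_850,
    758: 60_211_855,
    912: 113_659_706,
    2000: 79_141_602,
    3000: 82_610_562,
    5000: 80_415_362,
    9000: 72_037_228,
    15000: 59_701_495,
}
READ_LEN_BP = 125  # each mate is 125 bp (2 x 125 paired-end)


def clean_gb(pairs: int, read_len: int = READ_LEN_BP) -> float:
    """Clean bases in Gb: read pairs x 2 mates x read length."""
    return pairs * 2 * read_len / 1e9


for insert_bp, pairs in READ_PAIRS.items():
    print(f"{insert_bp:>6} bp insert: {clean_gb(pairs):.3f} Gb")

total_pairs = sum(READ_PAIRS.values())
print(f"total: {total_pairs:,} pairs, {clean_gb(total_pairs):.3f} Gb")
```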
Estimation of genome size and heterozygosity of asparagus bean by k-mer analysis.
| k | Total number of k-mers | Minimum coverage (X) | Number of erroneous k-mers | Homozygous peak depth (X) | Estimated genome size (Mb) | Estimated heterozygosity (%) |
|---|---|---|---|---|---|---|
| 17 | 61,995,624,762 | 36 | 2,614,930,973 | 100 | 593.81 | 0.81758 |
| 19 | 61,069,619,958 | 30 | 5,112,996,411 | 93 | 601.68 | 0.90512 |
| 21 | 60,143,656,148 | 28 | 6,464,570,689 | 90 | 596.43 | 0.89166 |
| 23 | 59,217,725,396 | 27 | 7,226,388,959 | 89 | 584.17 | 0.83246 |
| 25 | 58,291,828,145 | 26 | 7,766,008,145 | 86 | 587.51 | 0.76826 |
| 27 | 57,365,972,074 | 26 | 8,206,249,053 | 84 | 585.23 | 0.71033 |
| 29 | 56,440,164,790 | 24 | 8,580,620,676 | 82 | 583.65 | 0.66007 |
| 31 | 55,514,382,010 | 22 | 8,908,657,517 | 79 | 589.95 | 0.61617 |
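The genome size estimates in this table agree, to rounding, with the common k-mer estimate G ≈ (total k-mers − erroneous k-mers) / homozygous peak depth. For k = 17: (61,995,624,762 − 2,614,930,973) / 100 ≈ 593.81 Mb. The record does not state which tool or exact formula the authors used, so the sketch below only reproduces the table's arithmetic under that assumption; heterozygosity is not modeled.

```python
# (k, total k-mers, erroneous k-mers, homozygous peak depth), copied from the table above.
KMER_STATS = [
    (17, 61_995_624_762, 2_614_930_973, 100),
    (19, 61_069_619_958, 5_112_996_411, 93),
    (21, 60_143_656_148, 6_464_570_689, 90),
    (23, 59_217_725_396, 7_226_388_959, 89),
    (25, 58_291_828_145, 7_766_008_145, 86),
    (27, 57_365_972_074, 8_206_249_053, 84),
    (29, 56_440_164_790, 8_580_620_676, 82),
    (31, 55_514_382_010, 8_908_657_517, 79),
]


def genome_size_mb(total: int, erroneous: int, peak_depth: int) -> float:
    """Assumed estimator: error-free k-mers divided by the homozygous peak depth, in Mb."""
    return (total - erroneous) / peak_depth / 1e6


estimates = {k: genome_size_mb(n, err, peak) for k, n, err, peak in KMER_STATS}
for k, size in estimates.items():
    print(f"k={k}: {size:.2f} Mb")
```

Across k = 17 to 31 the estimates stay within roughly 584 to 602 Mb.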
Fig. 2 17-mer frequency distribution of sequencing reads.
Results of the asparagus bean genome assembly.
| Metric | Contig size (bp) | Contig number | Scaffold size (bp) | Scaffold number |
|---|---|---|---|---|
| N90 | 4,293 | 36,621 | 221,483 | 308 |
| N80 | 7,053 | 26,804 | 918,008 | 183 |
| N70 | 9,566 | 20,138 | 1,507,419 | 130 |
| N60 | 12,222 | 15,059 | 2,195,354 | 96 |
| N50 | 15,154 | 11,022 | 2,730,264 | 70 |
| Longest | 119,701 | — | 14,145,393 | — |
| Total number (≥ 500 bp) | — | 61,962 | — | 9,083 |
| Total number (≥ 1 kb) | — | 54,864 | — | 5,621 |
| Total | 549,819,688 | 80,696 | 632,812,756 | 21,836 |
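The N-statistics in this table follow the usual definition: sequences are sorted longest first and accumulated until x% of the total length is covered. The size column is the length of the sequence that crosses the threshold, and the number column is how many sequences were needed (for example, 70 scaffolds for the scaffold N50 of 2,730,264 bp). The record does not name the tool that produced these numbers; the sketch below implements the standard definition on made-up lengths.

```python
from __future__ import annotations

from typing import Sequence


def nx(lengths: Sequence[int], x: float = 50.0) -> tuple[int, int]:
    """Return (Nx length, Nx number) for a collection of sequence lengths.

    Sequences are sorted longest first and accumulated until they cover at
    least x% of all bases; the length of the last sequence added is the Nx
    length and the number of sequences added is the Nx number.
    """
    if not lengths:
        raise ValueError("no sequences given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100.0
    covered = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        covered += length
        if covered >= threshold:
            return length, count
    raise AssertionError("unreachable: the full set always reaches the threshold")


# Hypothetical toy scaffold lengths (bp), not taken from the assembly.
toy = [100, 80, 60, 40, 20]
print(nx(toy, 50))  # (80, 2)
print(nx(toy, 90))  # (40, 4)
```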
Statistics of pseudo-chromosomes and genetic map in asparagus bean.
| Chromosome | Anchored scaffolds | Total length (Mb) | SNP number | Bin marker number | Genetic distance (cM) | GenBank accession |
|---|---|---|---|---|---|---|
| Vu01 | 162 | 52.07 | 54,989 | 159 | 113.72 | CP039350 |
| Vu02 | 81 | 41.88 | 41,888 | 170 | 125.31 | CP039348 |
| Vu03 | 161 | 82.25 | 58,426 | 306 | 398.24 | CP039346 |
| Vu04 | 233 | 55.8 | 40,719 | 175 | 185.51 | CP039349 |
| Vu05 | 87 | 60.58 | 31,849 | 171 | 83.74 | CP039354 |
| Vu06 | 97 | 45.38 | 36,916 | 154 | 94.72 | CP039345 |
| Vu07 | 81 | 51.81 | 23,748 | 189 | 207.05 | CP039353 |
| Vu08 | 148 | 49.22 | 44,186 | 193 | 333.04 | CP039351 |
| Vu09 | 79 | 53.94 | 27,657 | 179 | 260.14 | CP039355 |
| Vu10 | 203 | 49.61 | 95,735 | 155 | 164.48 | CP039352 |
| Vu11 | 224 | 54.99 | 80,711 | 162 | 214.19 | CP039347 |
| Total | 1556 | 597.53 | 536,824 | 2013 | 2180.14 |
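The 94.42% anchoring rate in the abstract refers to assembled length (597.53 Mb of the 632.81 Mb scaffold assembly), not to scaffold count: the 1,556 anchored scaffolds are only about 7.1% of the 21,836 scaffolds. A small Python sketch that recomputes the totals row and both ratios from the per-chromosome values (variable names are illustrative):

```python
# Per pseudo-chromosome: (anchored scaffolds, length in Mb, genetic length in cM), from the table above.
CHROMS = {
    "Vu01": (162, 52.07, 113.72),
    "Vu02": (81, 41.88, 125.31),
    "Vu03": (161, 82.25, 398.24),
    "Vu04": (233, 55.80, 185.51),
    "Vu05": (87, 60.58, 83.74),
    "Vu06": (97, 45.38, 94.72),
    "Vu07": (81, 51.81, 207.05),
    "Vu08": (148, 49.22, 333.04),
    "Vu09": (79, 53.94, 260.14),
    "Vu10": (203, 49.61, 164.48),
    "Vu11": (224, 54.99, 214.19),
}
ASSEMBLY_MB = 632.812756      # total scaffold length (632,812,756 bp) from the assembly statistics
ASSEMBLY_SCAFFOLDS = 21_836   # total scaffold count from the same table

n_anchored = sum(s for s, _, _ in CHROMS.values())
anchored_mb = sum(mb for _, mb, _ in CHROMS.values())
map_cm = sum(cm for _, _, cm in CHROMS.values())

print(f"{n_anchored} scaffolds, {anchored_mb:.2f} Mb, {map_cm:.2f} cM")
print(f"by length: {100 * anchored_mb / ASSEMBLY_MB:.2f}% of the assembly")
print(f"by count:  {100 * n_anchored / ASSEMBLY_SCAFFOLDS:.2f}% of scaffolds")
```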
Statistics of Repeats in the asparagus bean genome.
| Type | Repeat Size (bp) | % of genome |
|---|---|---|
| Trf | 67,718,076 | 10.67 |
| Repeatmasker | 41,222,404 | 6.49 |
| Proteinmask | 64,741,265 | 10.2 |
| De novo | 264,487,557 | 41.67 |
| Total | 294,953,638 | 46.47 |
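The Total row (294,953,638 bp) is well below the sum of the rows above it (about 438 Mb), presumably because the tools annotate overlapping regions and each masked base is counted only once. A minimal sketch of such a non-redundant union, assuming each tool's hits are half-open (start, end) intervals on a single sequence; the coordinates are hypothetical and this is not the authors' pipeline.

```python
from __future__ import annotations


def merged_length(intervals: list[tuple[int, int]]) -> int:
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical hits from three repeat annotators on one scaffold.
trf = [(0, 100), (500, 550)]
repeatmasker = [(80, 200)]
de_novo = [(150, 300), (540, 600)]
hits = trf + repeatmasker + de_novo
print(sum(e - s for s, e in hits))  # naive sum: 480
print(merged_length(hits))          # non-redundant: 400
```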
TE content in the assembled asparagus bean genome.
| Type | Repbase TEs length (bp) | Repbase TEs % in genome | TE proteins length (bp) | TE proteins % in genome | De novo length (bp) | De novo % in genome | Combined TEs length (bp) | Combined TEs % in genome |
|---|---|---|---|---|---|---|---|---|
| DNA | 6,870,914 | 1.0825 | 9,850,112 | 1.5518 | 41,887,195 | 6.5992 | 46,098,143 | 7.2626 |
| LINE | 698,393 | 0.11 | 1,195,466 | 0.1883 | 1,651,447 | 0.2601 | 2,666,968 | 0.4201 |
| SINE | 30,804 | 0.0048 | — | — | 62,452 | 0.0098 | 74,704 | 0.0117 |
| LTR | 33,989,514 | 5.3549 | 53,894,224 | 8.4908 | 112,113,184 | 17.6631 | 122,145,625 | 19.2437 |
| Other | 13,118 | 0.002 | — | — | — | — | 13,118 | 0.002 |
| Unknown | — | — | — | — | 119,021,287 | 18.7515 | 119,021,287 | 18.7515 |
| Total | 41,222,404 | 6.494 | 64,741,265 | 10.1998 | 261,567,318 | 41.2092 | 272,160,906 | 42.8782 |
Prediction of protein-coding genes in asparagus bean genome.
| Method | Gene set | Gene number | Avg. gene length (bp) | Avg. CDS length (bp) | Total exon number | Avg. exon number per gene | Avg. exon length (bp) | Total intron length (bp) |
|---|---|---|---|---|---|---|---|---|
| De novo | Augustus | 45,883 | 2,243.10 | 1,005.11 | 207,693 | 4.53 | 222.05 | 56,802,940 |
| Homology | Arabidopsis | 26,867 | 3,133.37 | 1,080.92 | 124,326 | 4.63 | 233.59 | 55,143,207 |
| Homology | Pigeonpea | 44,018 | 3,055.98 | 996.71 | 169,707 | 3.86 | 258.52 | 90,644,666 |
| Homology | Chickpea | 29,722 | 3,267.60 | 1,101.41 | 135,727 | 4.57 | 241.19 | 64,383,299 |
| Homology | Soybean | 35,380 | 2,919.91 | 1,032.92 | 152,214 | 4.3 | 240.09 | 66,761,546 |
| Homology | Lotus | 37,713 | 2,436.51 | 912.21 | 142,619 | 3.78 | 241.22 | 57,486,204 |
| Homology | Medicago | 37,164 | 2,785.79 | 951.18 | 148,495 | 4 | 238.05 | 68,181,528 |
| Homology | Rice | 25,956 | 2,971.76 | 1,010.14 | 112,815 | 4.35 | 232.41 | 50,915,754 |
| Homology | Common bean | 32,860 | 3,059.37 | 1,099.25 | 149,363 | 4.55 | 241.84 | 64,409,431 |
| Homology | Mungbean | 29,468 | 3,695.35 | 1,123.44 | 143,184 | 4.86 | 231.21 | 75,789,153 |
| Homology | Grape | 27,358 | 3,732.39 | 1,059.30 | 134,163 | 4.9 | 216.01 | 73,130,296 |
| Homology | Adzuki bean | 37,596 | 3,191.78 | 991.8 | 160,449 | 4.27 | 232.4 | 82,710,459 |
| De novo | Genscan | 40,736 | 8,880.46 | 1,153.45 | 230,011 | 5.65 | 204.28 | 314,767,263 |
| De novo | GlimmerHMM | 46,755 | 1,867.51 | 847.52 | 164,690 | 3.52 | 240.61 | 47,689,651 |
| Transcriptome | Transcriptome | 114,947 | 8,244.23 | 752.27 | 243,192 | 2.12 | 355.57 | 861,179,063 |
| Consensus | EVidenceModeler | 42,609 | 3,156.05 | 1,043.18 | 190,304 | 4.47 | 233.57 | 90,027,213 |
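The values in the last column match gene number × (average gene length − average CDS length) to within rounding, which indicates they are total intron lengths in bp rather than intron counts. For EVidenceModeler: 42,609 × (3,156.05 − 1,043.18) ≈ 90.03 million bp. The sketch below checks a few rows, assuming gene length is measured from start to stop codon so that everything outside the CDS is intronic.

```python
# (gene number, avg gene length bp, avg CDS length bp, reported last-column value), from the table above.
ROWS = {
    "Augustus": (45_883, 2_243.10, 1_005.11, 56_802_940),
    "Genscan": (40_736, 8_880.46, 1_153.45, 314_767_263),
    "GlimmerHMM": (46_755, 1_867.51, 847.52, 47_689_651),
    "Transcriptome": (114_947, 8_244.23, 752.27, 861_179_063),
    "EVidenceModeler": (42_609, 3_156.05, 1_043.18, 90_027_213),
}


def implied_intron_bp(genes: int, avg_gene: float, avg_cds: float) -> float:
    """Total non-coding span inside genes, assuming (gene length - CDS length) is intronic."""
    return genes * (avg_gene - avg_cds)


rel_diffs = {}
for name, (genes, gene_len, cds_len, reported) in ROWS.items():
    est = implied_intron_bp(genes, gene_len, cds_len)
    rel_diffs[name] = abs(est - reported) / reported
    print(f"{name:16s} implied {est:,.0f} bp, reported {reported:,} bp, "
          f"rel. diff {rel_diffs[name]:.1e}")
```

The remaining discrepancies are on the order of 1e-6, consistent with the averages being rounded to two decimals.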
Fig. 3 The distribution of bin markers. Black arrows indicate low-recombination regions.
Functional annotation of predicted genes in asparagus bean genome.
| Database | Number | Percent (%) |
|---|---|---|
| Total | 42,609 | 100 |
| InterPro | 25,254 | 59.27 |
| GO | 19,254 | 45.19 |
| KEGG | 18,372 | 43.12 |
| Swiss-Prot | 23,953 | 56.22 |
| TrEMBL | 32,126 | 75.4 |
| NR | 32,356 | 75.94 |
| Annotated | 32,513 | 76.31 |
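Each percentage is the number of genes with a hit in that database divided by the 42,609 predicted protein-coding genes. The Annotated row is presumably the union across databases, which is why it is not the sum of the rows. A minimal Python check of the percentages (names are illustrative):

```python
TOTAL_GENES = 42_609  # predicted protein-coding genes

# Genes with at least one hit per database, copied from the table above.
HITS = {
    "InterPro": 25_254,
    "GO": 19_254,
    "KEGG": 18_372,
    "Swiss-Prot": 23_953,
    "TrEMBL": 32_126,
    "NR": 32_356,
    "Annotated": 32_513,
}


def percent(n: int, total: int = TOTAL_GENES) -> float:
    """Share of genes, in percent, rounded to two decimals as in the table."""
    return round(100 * n / total, 2)


for db, n in HITS.items():
    print(f"{db:10s} {n:>6,} {percent(n):6.2f}%")
```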
Annotation of non-coding RNA in asparagus bean genome.
| Class | Type | Copy number | Average length (bp) | Total length (bp) | % of genome |
|---|---|---|---|---|---|
| miRNA | All | 210 | 118.0571 | 24792 | 0.003906 |
| tRNA | All | 1593 | 75.10295 | 119639 | 0.018849 |
| rRNA | All | 538 | 155.6636 | 83747 | 0.013194 |
| rRNA | 18S | 114 | 346.0877 | 39454 | 0.006216 |
| rRNA | 28S | 77 | 116.2338 | 8950 | 0.00141 |
| rRNA | 5.8S | 22 | 146.8636 | 3231 | 0.000509 |
| rRNA | 5S | 325 | 98.80615 | 32112 | 0.005059 |
| snRNA | All | 350 | 120.44 | 42154 | 0.006641 |
| snRNA | C/D box | 179 | 102.1285 | 18281 | 0.00288 |
| snRNA | H/ACA box | 24 | 123.7083 | 2969 | 0.000468 |
| snRNA | splicing | 147 | 142.2041 | 20904 | 0.003293 |
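In this table the rRNA and snRNA class rows are the sums of their subtype rows, and every average length is total length divided by copy number. A short sketch that reproduces the class rows from the subtypes (the dictionary layout is illustrative):

```python
from __future__ import annotations

# (copies, total length in bp) per subtype, copied from the table above.
RRNA = {"18S": (114, 39_454), "28S": (77, 8_950), "5.8S": (22, 3_231), "5S": (325, 32_112)}
SNRNA = {"C/D box": (179, 18_281), "H/ACA box": (24, 2_969), "splicing": (147, 20_904)}


def summarize(subtypes: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Class-level (copies, total length, average length) from subtype rows."""
    copies = sum(c for c, _ in subtypes.values())
    length = sum(bp for _, bp in subtypes.values())
    return copies, length, length / copies


print(summarize(RRNA))   # 538 copies, 83,747 bp, ~155.66 bp average
print(summarize(SNRNA))  # 350 copies, 42,154 bp, ~120.44 bp average
```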
Experimental study and data records.
| Subjects | Protocol 1 | Protocol 2 | Protocol 3 | Data 1 | Protocol 4 | Data 2 |
|---|---|---|---|---|---|---|
| Xiabao II | Young leaves dissection | DNA extraction | Whole-genome shotgun sequencing | Accession range: SRR7135464-SRR7135488 | | Accession range: CP039345-CP039355; https://db.cngb.org/search/?q=CNP0000264&from=CNSA; 10.6084/m9.figshare.8131823 |
| 97 F2 individuals | Young leaves dissection | DNA extraction | Whole-genome resequencing | Accession range: SRR7125688-SRR7125784 | Genetic map construction and chromosome assembly | 10.6084/m9.figshare.8131823 |
| Xiabao II | Root and stem tissues | RNA extraction | RNA-seq | 10.6084/m9.figshare.8131535 | Annotation based on RNA-seq | 10.6084/m9.figshare.8131823 |
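The accession ranges in this table can be expanded into individual accessions, for example to script downloads. The hypothetical helper below assumes both ends share a prefix and that every number in between belongs to the project; verify against the SRA/GenBank records before relying on it. Under that assumption, SRR7125688-SRR7125784 gives 97 runs, matching the 97 F2 individuals, and CP039345-CP039355 gives 11 records, matching the 11 pseudo-chromosomes.

```python
from __future__ import annotations

import re


def expand_range(acc_range: str) -> list[str]:
    """Expand e.g. 'SRR7125688-SRR7125784' into every accession in between (inclusive).

    Assumes both ends share the same letter prefix and contiguous numbering.
    """
    m = re.fullmatch(r"([A-Z]+)(\d+)-([A-Z]+)(\d+)", acc_range.strip())
    if m is None or m.group(1) != m.group(3):
        raise ValueError(f"unrecognized range: {acc_range!r}")
    prefix, start, end = m.group(1), m.group(2), m.group(4)
    width = len(start)  # keep zero padding, e.g. CP039345
    return [f"{prefix}{i:0{width}d}" for i in range(int(start), int(end) + 1)]


print(len(expand_range("SRR7135464-SRR7135488")))  # WGS runs for Xiabao II
print(len(expand_range("SRR7125688-SRR7125784")))  # resequencing runs for the F2 population
print(expand_range("CP039345-CP039355")[:2])       # first two pseudo-chromosome records
```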
Fig. 4 Synteny between asparagus bean pseudo-chromosomes and the “ZZ v.2” linkage map. Each linkage group on the right is connected by lines to its corresponding chromosome on the left.
Fig. 5 Comparative genome analysis between Xiabao II and IT97K-499-35a. Black arrows indicate inconsistent regions between the two genomes.
Comparison of the Xiabao II assembly with four other published cowpea assemblies.
| Metric | Xiabao II | IT97K-499-35 (a) | IT97K-499-35 (b) | IT97K-499-35 (c) | IT86D-1010 |
|---|---|---|---|---|---|
| Assembled non-N size (Mb) | 549.81 | 518.8 | 323.3 | 568.1 | 609.5 |
| GC content (%) | 28.78 | 32.99 | 35.96 | 33.6 | 33.59 |
| Repeat elements (%) | 46.47 | 49.5 | NA | NA | NA |
| Scaffold N50 size (kb) | 2,730.26 | 16,417.66 | 6.33 | 17.92 | 36.69 |
| Total scaffolds | 21,836 | 722 | 644,126 | 57,590 | 39,123 |
| Scaffolds anchored to chromosomes | 1,556 | 47 | NA | NA | NA |
| Annotated protein-coding genes | 42,609 | 29,773 | NA | NA | NA |
| Number of CDSs (d) | 41,457 | 42,287 | 14,994 | 40,055 | 40,198 |
(a) IT97K-499-35 assembled by Lonardi et al. 2019.
(b) IT97K-499-35 assembled by Munoz-Amatriain et al. 2017.
(c) IT97K-499-35 assembled by Spriggs et al. 2018.
(d) A total of 42,287 CDS sequences from Vigna unguiculata v1.0 (NSF, UCR, USAID, DOE-JGI), http://phytozome.jgi.doe.gov/.
| Design Type(s) | sequence assembly objective • sequence annotation objective |
| Measurement Type(s) | genome assembly |
| Technology Type(s) | DNA sequencing |
| Factor Type(s) | |
| Sample Characteristic(s) | Vigna unguiculata subsp. sesquipedalis • leaf |