Weijian Zhuang1, Hua Chen2, Meng Yang3, Jianping Wang4,5, Manish K Pandey6, Chong Zhang2, Wen-Chi Chang7,8, Liangsheng Zhang4, Xingtan Zhang4, Ronghua Tang9, Vanika Garg6, Xingjun Wang10, Haibao Tang4, Chi-Nga Chow7,8, Jinpeng Wang11, Ye Deng2, Depeng Wang3, Aamir W Khan6,12, Qiang Yang2, Tiecheng Cai2, Prasad Bajaj6, Kangcheng Wu2,4, Baozhu Guo2,13, Xinyou Zhang14, Jingjing Li3, Fan Liang3, Jiang Hu3, Boshou Liao15, Shengyi Liu15, Annapurna Chitikineni6, Hansong Yan4, Yixiong Zheng2,16, Shihua Shan10, Qinzheng Liu2, Dongyang Xie2, Zhenyi Wang11, Shahid Ali Khan2, Niaz Ali2, Chuanzhi Zhao10, Xinguo Li10, Ziliang Luo5, Shubiao Zhang2,17, Ruirong Zhuang2, Ze Peng5, Shuaiyin Wang2, Gandeka Mamadou2, Yuhui Zhuang2,18, Zifan Zhao5, Weichang Yu19, Faqian Xiong9, Weipeng Quan3, Mei Yuan10, Yu Li2,17, Huasong Zou2, Han Xia10, Li Zha2, Junpeng Fan3, Jigao Yu11, Wenping Xie2, Jiaqing Yuan11, Kun Chen2, Shanshan Zhao2, Wenting Chu2, Yuting Chen2, Pengchuan Sun11, Fanbo Meng11, Tao Zhuo2, Yuhao Zhao11, Chunjuan Li10, Guohao He20, Yongli Zhao20, Congcong Wang16, Polavarapu Bilhan Kavikishor21, Rong-Long Pan2,22, Andrew H Paterson11,23, Xiyin Wang24, Ray Ming25,26, Rajeev K Varshney27,28.
Abstract
High oil and protein content make tetraploid peanut a leading oil and food legume. Here we report a high-quality peanut genome sequence, comprising 2.54 Gb with 20 pseudomolecules and 83,709 protein-coding gene models. We characterize gene functional groups implicated in seed size evolution, seed oil content, disease resistance and symbiotic nitrogen fixation. The peanut B subgenome has more genes and general expression dominance, temporally associated with long-terminal-repeat expansion in the A subgenome that also raises questions about the A-genome progenitor. The polyploid genome provided insights into the evolution of Arachis hypogaea and other legume chromosomes. Resequencing of 52 accessions suggests that independent domestications formed peanut ecotypes. Whereas 0.42–0.47 million years ago (Ma) polyploidy constrained genetic variation, the peanut genome sequence aids mapping and candidate-gene discovery for traits such as seed size and color, foliar disease resistance and others, also providing a cornerstone for functional genomics and peanut improvement.
Year: 2019 PMID: 31043757 PMCID: PMC7188672 DOI: 10.1038/s41588-019-0402-2
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Peanut genome assembly statistics
| | Illumina | Illumina + Linkage Map | Illumina | Illumina + Linkage Map | PacBio^a | PacBio + Hi-C | PacBio + Hi-C + Linkage Map |
|---|---|---|---|---|---|---|---|
| Total assembly size of contigs (bp) | 1,211,482,656 | | 1,512,089,950 | | 2,538,408,906 | | |
| Number of contigs | 765,406 | | 869,435 | | 7,232 | | |
| N50 contig length (bp) | 22,293 | | 23,492 | | 1,509,423 | | |
| N90 contig length (bp) | NA | | NA | | 342,540 | | |
| L50 contig count | 12,992 | | 15,898 | | 505 | | |
| L90 contig count | NA | | NA | | 1,804 | | |
| Longest contig (bp) | 221,145 | | 250,973 | | 8,550,813 | | |
| Total assembly size of scaffolds (bp) | 1,074,450,206 | 1,041,781,911 | 1,388,638,929 | 1,342,408,530 | | 2,424,161,010 | 2,506,735,760 |
| Number of scaffolds | 635,392 | 10 | 759,499 | 10 | | 20 | 20 |
| N50 scaffold length (bp) | 947,955 | 110,037,037 | 5,343,284 | 136,175,642 | | 129,846,058 | 135,108,068 |
| N90 scaffold length (bp) | NA | 94,617,824 | NA | 126,351,151 | | 104,681,234 | 109,264,827 |
| L50 scaffold count | 334 | 5 | 86 | 5 | | 10 | 9 |
| L90 scaffold count | NA | 8 | NA | 9 | | 17 | 17 |
| Missing bases (%)^b | 11.3 | 3.0 | 8.2 | 3.3 | | 4.5 | 1.3 |

NA, not available; L50, smallest number of contigs whose length sum makes up half of the assembled genome; L90, smallest number of contigs whose length sum makes up 90% of the assembled genome.
^a With HiSeq clean data 1,350 Gb for quivering.
^b Missing bases (%) = gap length / total assembly size × 100.
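The contiguity statistics defined in the table notes can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `nx_lx` and `missing_bases_pct` are hypothetical, and the lengths in the usage note are toy values rather than data from the assembly.

```python
from typing import Sequence, Tuple


def nx_lx(lengths: Sequence[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx) for a set of contig or scaffold lengths.

    Lx is the smallest number of sequences whose summed length reaches
    `fraction` of the total assembly size; Nx is the length of the
    shortest sequence in that set. fraction=0.5 gives N50/L50 and
    fraction=0.9 gives N90/L90.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    return ordered[-1], len(ordered)  # fallback for fractions above 1


def missing_bases_pct(gap_length: int, total_assembly_size: int) -> float:
    """Missing bases (%) = gap length / total assembly size x 100."""
    return gap_length / total_assembly_size * 100
```

For the toy lengths `[2, 2, 2, 3, 3, 4, 8, 8]` (total 32), `nx_lx(lengths)` returns `(8, 2)`, because the two longest sequences already cover half of the total, and `nx_lx(lengths, 0.9)` returns `(2, 7)`.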
Fig. 1 Expression differentiation of paired homeologous genes between peanut subgenomes and repeat expansion among peanut and diploid ancestor genomes.
a, Widespread expression differentiation of homeologous gene pairs between the two subgenomes. Homeologous chromosomes are indicated at the bottom of the figure. b, Density distribution of substitution rates using the paired-end sequences of LTR retrotransposons in the A. hypogaea, A. hypogaea-SubA, A. hypogaea-SubB, A. duranensis and A. ipaensis genomes. The LTRs in A. hypogaea and A. hypogaea-SubA exhibited rapid expansion ~246,700 years ago, but those of A. hypogaea-SubB, A. duranensis and A. ipaensis did so about 0.8922, 1.1206 and 1.0049 Ma, respectively, based on the formula T = S/(2µ) (where T is the evolution time, S is the substitution rate between the paired LTR sequences and µ is the substitution rate of 1.64 × 10⁻⁸ per year; Supplementary Note 3.3.5). The number of LTR retrotransposons and the peak substitution rate for each part are embedded in the figure.
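The dating formula in this caption, T = S/(2µ), can be written out as a short Python sketch. The function names are hypothetical, and the S value in the usage note is back-calculated from the quoted expansion time. It is not a figure reported in the source.

```python
LTR_SUBSTITUTION_RATE = 1.64e-8  # µ from the caption, substitutions per year


def ltr_expansion_time_years(s: float, mu: float = LTR_SUBSTITUTION_RATE) -> float:
    """Apply T = S / (2µ): time in years from the substitution rate S
    measured between the two LTR ends of a retrotransposon."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return s / (2 * mu)


def substitution_rate_for_time(t_years: float, mu: float = LTR_SUBSTITUTION_RATE) -> float:
    """Invert the formula: S = 2µT (illustrative helper)."""
    return 2 * mu * t_years
```

Under these assumptions, an expansion about 246,700 years ago corresponds to a peak S of roughly 0.0081, and `ltr_expansion_time_years(0.0081)` gives a time close to 0.247 million years.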
Fig. 2 Characterization of the peanut genome and chromosomes.
a, Circos diagram depicting relationships of A and B subgenome chromosomal pseudomolecules. The scale for the chromosomes (outer bars) is megabases; colors represent the density of nonautonomous LTR retrotransposons and Ty3-gypsy elements (blue) and genes (green). Homeologous blocks of ≥30 gene pairs between Chr01–Chr10 and Chr11–Chr20 (A01–A10 and B01–B10, respectively) are connected with lines. b, Syntenic comparisons between peanut subgenomes and diploid A and B genomes. The outer three circles are chromosomes, density of genes and of Ty3-gypsy and nonautonomous LTR retrotransposons (as shown in a). Colored lines connect blocks with ≥30 orthologous gene pairs between the A and B subgenomes and A. duranensis and A. ipaensis genomes, respectively, based on BLASTN. c, Alignment of diploid peanut A03 and B03 contigs to corresponding tetraploid chromosomes, with parameters: -a 8 -p blastn -m 9 -e 1e-10. The best hits with alignment length ≥2,000 bp were plotted. Translocation between chromosomes A03 and B03 is evident in cultivated peanut. d, Eleven-genome alignment using co-linear genes, each mapped onto the barrel medic chromosomes. A, A. hypogaea A; B, A. hypogaea B; C, C. cajan; D, A. duranensis; E, C. arietinum; G, G. max; I, A. ipaensis; L, L. japonicus; M, M. truncatula; P, P. vulgaris; R, V. radiata; U, V. angularis.
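The hit selection described for panel c (best hits with alignment length ≥2,000 bp) could be sketched in Python as below. This assumes the standard 12-column tabular BLAST output that `-m 9` produces, with `#` comment lines. It also assumes that "best hit" means the highest bit score per query and that the length filter is applied afterwards; the caption does not specify either point. The names `Hit`, `parse_tabular` and `best_long_hits` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from tabular BLAST output, skipping '#' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns: query, subject, % identity, alignment length, mismatches,
        # gap openings, q.start, q.end, s.start, s.end, e-value, bit score.
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[6]), int(f[7]),
                  int(f[8]), int(f[9]), float(f[10]), float(f[11]))


def best_long_hits(lines: Iterable[str], min_length: int = 2000) -> List[Hit]:
    """Keep the top-scoring hit per query, then drop hits shorter than min_length."""
    best: Dict[str, Hit] = {}
    for hit in parse_tabular(lines):
        current = best.get(hit.query)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query] = hit
    return [h for h in best.values() if h.length >= min_length]
```

Each returned hit carries the query and subject coordinates, so a contig can be plotted against its position on the corresponding tetraploid chromosome.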
Fig. 3 Legume karyotype evolution.
The 16 ancestral legume chromosomes (called Lu, denoted by capital letters A–Q) were reconstructed by using corresponding common bean genes and compared with extant legume genomes. By using dot plots between Lu and each legume genome, and between close legume relatives, we reconstruct the origin of peanut chromosomes, including A. duranensis and A. ipaensis.
Fig. 4 Peanut gene retention after tetraploidization.
a, Numbers of shared and unique orthologous protein-coding gene clusters in peanut (AHAB), A. duranensis (Aradu) and A. ipaensis (Araip). b, The number of single-copy gene sets is presented (blue), retained as a single copy (orange) or lost (gray) in the peanut A or B subgenomes. c, Maximum likelihood tree of ARF gene family, with 114, 28 and 28 members in peanut, A. duranensis, and A. ipaensis, respectively. Branch values represent the percentage of 1,000 bootstrap replicates supporting the topology. Scale bar represents substitutions per site. d, Chromosome distributions of genes involved in fatty acid metabolism, symbiotic nitrogen fixation pathways and biotic stress resistance in cultivated peanut, from outer to inner circles representing chromosomes, R genes, acyl-lipid-related and nodulation-related genes.
Fig. 5 Evolutionary history of peanut.
a, Ks distributions of gene pairs in each species. Diploid A (Ad) and B (Bd) genomes diverged from one another about 2.6 Ma, and from their corresponding subgenomes ~0.42–0.47 Ma based on a mutation rate of 8.21 × 10⁻⁹ Ks yr⁻¹ (ref. [38]). b, Maximum likelihood tree of 52 varieties generated from 17.16 million SNPs. Colored brackets indicate different groups. Topologies are supported by percentages of 1,000 bootstrap replicates indicated by branch values. Scale bar represents substitutions per site. c, Pattern of admixture analysis of the 52 accessions when K = 3. Of the three major groups detected, accessions from A-genome, Pr-genome and one synthetic, ISATGR 184, clustered together as group 1 (red bars). All of the accessions belonging to the B-genome, K-genome, C-genome and E-genome clustered together with three synthetics (ISATGR 5, ISATGR 1212 and ISATGR 278-18) as group 2 (blue bars). The largest group was group 3 (green bars), consisting of all the tetraploids, except the synthetics. d, Evolutionary relationships and distribution of Arachis species, showing the hypothesized hybridization producing tetraploid A. monticola and the subsequent evolution of peanut into two subspecies and four (later six) varieties or ecotypes. Dashed line arrows A–D show the original A. hypogaea varieties were moved and domesticated independently to form var. hypogaea in Bolivia; Peruvian type (var. hirsuta) in Ancon, Peru; Valencia type (fastigiata) in Paraguay-central Brazil; and Spanish type (vulgaris) in the Guarani area (Paraguay–Argentina–Brazil), respectively[5,40]. Accessions are shown based on collection site. A.h., Arachis hypogaea; ssp., subspecies; var., variant.
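The conversion from a Ks peak to a divergence time in panel a can be illustrated with a short Python sketch. It assumes the commonly used relation T = Ks/(2r), which the caption does not state explicitly, and the function names are hypothetical. The Ks values in the usage note are back-calculated from the quoted dates rather than taken from the source.

```python
KS_RATE_PER_YEAR = 8.21e-9  # mutation rate quoted in the caption (Ks per year)


def divergence_time_ma(ks: float, rate: float = KS_RATE_PER_YEAR) -> float:
    """Divergence time in million years (Ma), assuming T = Ks / (2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * rate) / 1e6


def ks_for_time(t_ma: float, rate: float = KS_RATE_PER_YEAR) -> float:
    """Inverse relation, Ks = 2rT, with T given in Ma (illustrative helper)."""
    return 2 * rate * t_ma * 1e6
```

Under this assumption, the 0.42–0.47 Ma window corresponds to Ks peaks of roughly 0.0069 to 0.0077, and the ~2.6 Ma split between the diploid A and B genomes to a Ks of about 0.043.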
Fig. 6 Candidate genes underlying seed size and color and foliar disease resistances.
a, Seeds with red and pink testa color, linkage mapping and the candidate-gene model with SNPs. Scale bars indicate 1 cm. b, Phenotypes of RILs with pod size segregation, BSA mapping by resequencing and QTL mapping of pod size (100 pod weight in RILs (Yueyou 92 × Xinhuixiaoli)). Seed size QTLs were mapped on Chr07 (A07) and Chr12 (B02) using genetic mapping and QTL-seq approaches. Scale bar indicates 1 cm. c, Phenotypes of LLS-susceptible and LLS-resistant RILs from TAG 24 × GPBD 4. A Chr13 (B03) genomic region was mapped for both LLS and rust resistance. Scale bars indicate 5 cm.