Qing Lu, Haifen Li, Yanbin Hong, Guoqiang Zhang, Shijie Wen, Xingyu Li, Guiyuan Zhou, Shaoxiong Li, Hao Liu, Haiyan Liu, Zhongjian Liu, Rajeev K Varshney, Xiaoping Chen, Xuanqiang Liang.
Abstract
Peanut (Arachis hypogaea L.), an important leguminous crop, is widely cultivated in tropical and subtropical regions. Peanut is an allotetraploid whose A and B subgenomes are thought to have originated from its diploid progenitors Arachis duranensis (A genome) and Arachis ipaensis (B genome), respectively. We previously sequenced the former; here we present the draft genome of the latter, expanding our knowledge of the unique biology of Arachis. The assembled A. ipaensis genome is ~1.39 Gb and contains 39,704 predicted protein-coding genes. A gene family analysis suggested that the FAR1 family may be involved in regulating the unusual fruit development of peanut. Evolutionary analyses estimated that the two progenitors diverged ~3.3 million years ago and suggested that A. ipaensis underwent a whole-genome duplication event after its divergence from Glycine max. We identified a set of disease resistance-related genes and candidate genes for biological nitrogen fixation. In particular, we identified two homologous genes in A. ipaensis and four in A. duranensis that may be involved in regulating nodule development. We also outline a comprehensive network involved in drought adaptation. Additionally, we analyzed the metabolic pathways of oil biosynthesis and identified genes related to fatty acid and triacylglycerol synthesis. Importantly, three new FAD2 homologs were identified in A. ipaensis, one of which is identical at the amino acid level to FAD2 from A. hypogaea. The availability of the A. ipaensis and A. duranensis genome assemblies will advance our knowledge of the peanut genome.
Keywords: Arachis ipaensis; genome evolution; genome sequence; polyploidizations; whole genome duplication
Year: 2018 PMID: 29774047 PMCID: PMC5943715 DOI: 10.3389/fpls.2018.00604
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Genome assembly and annotation statistics of A. ipaensis.
| Feature | Value |
|---|---|
| Number of scaffolds | 79,408 |
| Total span | 1,391,700,926 bp (~1.39 Gb) |
| N50 (scaffolds) | 170,050 bp |
| Longest scaffold | 1,172,168 bp |
| Number of contigs | 1,008,989 |
| N50 (contigs) | 8,067 bp |
| Longest contig | 81,804 bp |
| GC content | 36.70% |
| Number of gene models | 39,704 |
| Mean transcript length | 3,741 bp |
| Mean coding sequence length | 1,246 bp |
| Mean number of exons per gene | 4.99 |
| Mean exon length | 250 bp |
| Mean intron length | 625 bp |
| Mean gene density | one gene per 35.05 kb |
| Number of genes annotated | 39,645 |
| Number of genes unannotated | 59 |
| Number of pre-miRNA genes | 71 |
| Mean length of pre-miRNA genes | 123 bp |
| Genome share of pre-miRNA genes | 0.000590% |
| Number of pre-rRNA fragments | 313 |
| Mean length of pre-rRNA fragments | 186 bp |
| Genome share of pre-rRNA fragments | 0.003928% |
| Number of pre-tRNA genes | 2,914 |
| Mean length of pre-tRNA genes | 75 bp |
| Genome share of pre-tRNA genes | 0.014836% |
| Number of pre-snRNA genes | 152 |
| Mean length of pre-snRNA genes | 111 bp |
| Genome share of pre-snRNA genes | 0.001139% |
| Total length of transposable elements (TEs) | 1,125,924,736 bp |
| Genome share of TEs | 75.97% |
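The table reports N50 values and a mean gene density without showing how they are derived. The Python sketch below is an illustration only: it assumes the standard N50 definition (sort lengths in descending order and return the length at which the running total first reaches half of all assembled bases), since the assembly pipeline's own implementation is not described here. It also checks that dividing the total span by the number of gene models reproduces the tabulated density of about one gene per 35.05 kb. The function names are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


def mean_gene_spacing_kb(total_span_bp: int, n_genes: int) -> float:
    """Average assembly span per gene model, in kilobases."""
    return total_span_bp / n_genes / 1_000


if __name__ == "__main__":
    # Toy scaffold lengths (hypothetical, not from the assembly)
    print(n50([100, 80, 50, 40, 30]))  # 80
    # Total span and gene-model count taken from the table
    print(round(mean_gene_spacing_kb(1_391_700_926, 39_704), 2))  # 35.05
```

In the toy example the two longest sequences (100 + 80 = 180 bp) already exceed half of the 300 bp total, so the N50 is 80.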
Figure 1. A. ipaensis genome overview. From the outer edge inward, the circles represent the 50 largest scaffolds (green), the genes on each scaffold (purple), the non-coding RNAs on each scaffold (brown), GC content (red and blue), repeat density at 10-kb resolution (yellow), and transposable element density at 10-kb resolution (black).
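The inner tracks of Figure 1 are windowed summaries along each scaffold. A minimal Python sketch of such summaries follows, assuming non-overlapping 10 kb windows and treating "density" as the number of features (for example, repeat or TE start positions) that fall in each window. The caption does not say whether its density tracks count features or covered bases, and both function names are hypothetical.

```python
from typing import Iterable, List


def windowed_gc(seq: str, window: int = 10_000) -> List[float]:
    """GC fraction in consecutive non-overlapping windows.

    The last window may be shorter. Ambiguous bases (e.g. N) are left out
    of the denominator; a window with no A/C/G/T calls yields NaN.
    """
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        fractions.append(gc / called if called else float("nan"))
    return fractions


def windowed_counts(positions: Iterable[int], seq_len: int, window: int = 10_000) -> List[int]:
    """Number of features whose 0-based start position falls in each window."""
    n_windows = (seq_len + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < seq_len:  # ignore positions outside the scaffold
            counts[pos // window] += 1
    return counts


if __name__ == "__main__":
    # Toy inputs with tiny windows so the output is easy to check by hand
    print(windowed_gc("GGCCAATT", window=4))                    # [1.0, 0.0]
    print(windowed_counts([0, 5, 12], seq_len=20, window=10))  # [2, 1]
```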
Figure 2. Comparative genomic and evolutionary analysis. (A) Scatter plot of the percentage of A. ipaensis transcription factors relative to L. japonicus, C. arietinum, C. cajan, G. max, A. thaliana, and O. sativa. (B) Venn diagram showing the distribution of gene families among A. ipaensis, G. max, M. truncatula, C. cajan, and C. arietinum. (C) Cluster tree of 17 plant species, including common leguminous and gramineous crops, based on single-copy orthologous genes. (D) Phylogenetic tree of 7 representative plant species. The number at each node is the estimated divergence time, calibrated with the divergence time between A. thaliana and G. max (~108–114 Mya). (E) Syntenic relationships between A. ipaensis scaffolds and G. max chromosomes. (F) Synonymous substitution rate (Ks) dating of duplication blocks in A. ipaensis and of ortholog pairs from different combinations of A. duranensis, A. thaliana, C. arietinum, G. max, and O. sativa. Colored lines show the Ks distributions of orthologous gene pairs between different plant species. The inset shows the Ks distribution of gene pairs within duplicated blocks of the A. ipaensis genome.
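Panels D and F rest on two standard dating relations. Under a molecular clock, two sequences that split T years ago accumulate synonymous substitutions along both lineages, so T = Ks / (2r) for a rate r of substitutions per site per year; and a single calibration point (here the ~108–114 Mya A. thaliana–G. max split) lets other node distances be converted to time by proportion. The Python sketch below illustrates both under a strict clock. This section does not give the substitution rate or node distances actually used, so the inputs in the usage lines are hypothetical placeholders, and the authors' dating software may handle rate variation differently.

```python
def divergence_time_mya(ks: float, rate_per_site_per_year: float) -> float:
    """T = Ks / (2r), returned in millions of years.

    The factor 2 reflects substitutions accumulating independently along
    both lineages since the split (strict molecular clock assumed).
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * rate_per_site_per_year) / 1e6


def calibrated_time_mya(distance: float, cal_distance: float, cal_time_mya: float) -> float:
    """Convert a node distance to time by proportion to one calibration node
    (strict clock assumed: distance grows linearly with time)."""
    if cal_distance <= 0:
        raise ValueError("calibration distance must be positive")
    return cal_time_mya * distance / cal_distance


if __name__ == "__main__":
    # Hypothetical Ks and rate, for illustration only
    print(round(divergence_time_mya(0.1, 1e-8), 2))  # 5.0
    # 111 Mya is the midpoint of the ~108-114 Mya calibration; distances are hypothetical
    print(round(calibrated_time_mya(0.2, 0.4, 111.0), 2))  # 55.5
```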
Organization of repetitive sequences in the A. ipaensis genome.
| Type | Number | Length (bp) | % of TEs | % of genome |
|---|---|---|---|---|
| Total retrotransposons | 2,444,183 | 988,193,900 | 87.77 | 66.68 |
| LINE retrotransposons | 163,947 | 43,942,874 | 3.9 | 2.97 |
| SINE retrotransposons | 2,859 | 726,676 | 0.06 | 0.05 |
| LTR retrotransposons | 2,277,377 | 950,690,158 | 84.44 | 64.15 |
| Gypsy | 1,727,232 | 796,763,491 | 70.77 | 53.76 |
| Copia | 343,066 | 91,500,532 | 8.13 | 6.17 |
| LTR | 23,529 | 1,543,961 | 0.14 | 0.10 |
| Other | 183,550 | 98,476,493 | 8.75 | 6.64 |
| Other retrotransposons | 668 | 47,680 | 0.00 | 0.00 |
| Total DNA transposons | 364,250 | 98,441,246 | 8.74 | 6.64 |
| Total unclassified elements | 311,209 | 84,709,729 | 7.52 | 5.72 |
| Total transposable elements | 3,120,310 | 1,125,924,736 | – | 75.97 |
| Total length, redundant | – | 1,171,344,875 | – | – |
| Total length, nonredundant | – | 1,125,924,736 | – | – |
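The first percentage column of the repeat table appears to be each category's length divided by the nonredundant TE total (1,125,924,736 bp). Because some bases are assigned to more than one category, the category lengths add up to the redundant total (1,171,344,875 bp) rather than the nonredundant one, which is why the retrotransposon, DNA transposon, and unclassified percentages sum to about 104% instead of 100%. The short Python check below reproduces those figures from the table's own lengths. The genome-size denominator behind the last column is not stated, so that column is not recomputed.

```python
# Category lengths (bp) copied from the repeat table
CATEGORY_BP = {
    "retrotransposons": 988_193_900,
    "DNA transposons": 98_441_246,
    "unclassified": 84_709_729,
}
NONREDUNDANT_TE_BP = 1_125_924_736


def percent_of_tes(length_bp: int, total_te_bp: int = NONREDUNDANT_TE_BP) -> float:
    """Share of all (nonredundant) TE bases covered by one category, in percent."""
    return 100 * length_bp / total_te_bp


# Overlapping categories sum to the redundant total, not the nonredundant one
redundant_bp = sum(CATEGORY_BP.values())

if __name__ == "__main__":
    print(redundant_bp)  # 1171344875
    for name, bp in CATEGORY_BP.items():
        # Prints 87.77, 8.74 and 7.52 for the three categories
        print(f"{name}: {percent_of_tes(bp):.2f}% of TE bases")
```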
Figure 3. Biological nitrogen fixation in leguminous plants. (A) Genes involved in nodule initiation, development, and the signal recognition pathway. (B) Protein sequence alignment of Nod-related genes identified in A. ipaensis and A. duranensis. (C) Phylogenetic tree of nodule development genes and their homologs from A. ipaensis and A. duranensis. (D) Identification of highly conserved domains of leucine-rich repeat (LRR) receptor kinases. Red dashed boxes mark the conserved LRR motif. In (A), rhizobia (blue) attach to the surface of a root hair cell. After root hair swelling, deformation, and curling and the formation of an infection thread, the bacteria are released into host cells by endocytosis, forming vacuole-like structures (symbiosomes) in which they convert N₂ to NH₃. How, then, is the Nod signal transmitted? Initially, the rhizobia-derived signal is perceived by LysM-type receptor kinases, such as NFR1 and NFR5 (Radutoiu et al., 2003) in L. japonicus and SYM10 (Schneider et al., 2002) in P. sativum, followed by a downstream leucine-rich repeat receptor kinase, for example SYMRK from L. japonicus (Stracke et al., 2002) and Sesbania rostrata (Capoen et al., 2005), NORK from M. sativa (Endre et al., 2002), DMI2 from M. truncatula (Catoira et al., 2000), and SYM19 from P. sativum (Stracke et al., 2004). The Nod factor (NF) signal is then processed through a signal transduction cascade involving ion channels [DMI1 (Ané et al., 2004), CASTOR (Imaizumi-Anraku et al., 2005), POLLUX (Imaizumi-Anraku et al., 2005), and SYM8 (Edwards et al., 2007)], calcium-calmodulin-dependent kinases (CaCaDK; DMI3 and SYM9) (Lévy et al., 2004), and transcription factors [NSP1 (Smit et al., 2005), NSP2 (Kaló et al., 2005), SYM7 (Kaló et al., 2005), NIN (Schauser et al., 1999), and SYM35 (Borisov et al., 2003)]. Finally, rhizobial infection proceeds primarily through as yet uncharacterized target genes that may be activated by these TFs.
Figure 4. Homologous genes of Δ12 oleic acid desaturase (FAD2). (A) FAD2 catalyzes the conversion of oleate to linoleate. (B) Multiple alignment of the amino acid sequences of the FAD2 substrate-binding motif from oilseed plants and of its homologous genes in A. ipaensis. (C) Phylogenetic tree of FAD2 and its homologous genes from different species. (D) Signal peptide analysis of the FAD2 homologous gene XP_007162321.1-D2 from A. ipaensis. (E) Transmembrane region prediction for the FAD2 homologous gene XP_007162321.1-D2. Red, blue, and pink boxes represent transmembrane, inside, and outside domains, respectively. (F) Hydrophobicity and hydrophilicity prediction for the homologous gene XP_007162321.1-D2. The pink box marks the hydrophobic region of the protein.
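Panels E and F rely on sequence-based predictions whose tools are not named in the caption. As an illustration only, the Python sketch below computes a Kyte–Doolittle hydropathy profile, a classic sliding-window approach to hydrophobicity plots, and flags windows whose mean exceeds the commonly cited threshold of 1.6 (originally suggested for 19-residue windows) as candidate membrane-spanning segments. It is not the authors' pipeline and is far cruder than dedicated transmembrane predictors. No sequence for XP_007162321.1-D2 is given here, so the usage line uses a toy peptide, and the function names are hypothetical.

```python
from typing import List

# Kyte & Doolittle (1982) hydropathy scale for the 20 standard amino acids
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full-length sliding window along the protein.

    Raises KeyError for non-standard residue letters.
    """
    values = [KD_SCALE[aa] for aa in protein.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window one residue
        profile.append(total / window)
    return profile


def candidate_tm_windows(profile: List[float], threshold: float = 1.6) -> List[int]:
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, score in enumerate(profile) if score > threshold]


if __name__ == "__main__":
    # Toy peptide: hydrophobic Ile block followed by charged Arg block
    print(candidate_tm_windows(hydropathy_profile("IIIIIRRRRR", window=5)))  # [0, 1]
```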