Xinxin Yi1, Jing Liu1,2, Shengcai Chen1,2, Hao Wu1, Min Liu1, Qing Xu1, Lingshan Lei1, Seunghee Lee3, Bao Zhang2, Dave Kudrna3, Wei Fan1,2, Rod A Wing3, Xuelu Wang2, Mengchen Zhang4, Jianwei Zhang1,3, Chunyan Yang4, Nansheng Chen1,5.
Abstract
Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different growing conditions. Each soybean strain carries its own genetic variation, and additional high-quality soybean genomes can strengthen comparative genomic analysis aimed at identifying the genetic basis of strain-specific traits. In this study, we constructed a high-quality de novo assembly of the elite soybean cultivar Jidou 17 (JD17) with chromosome-level contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. Using the JD17 assembly as the reference, we performed a genome-wide comparison with 3 published soybean genomes (WM82, ZH13, and W05), which identified 5 large inversions and 2 large translocations specific to JD17, as well as 20,984-46,912 presence-absence variations spanning 13.1-46.9 Mb. A total of 1,695,741-3,664,629 SNPs and 446,689-800,489 indels were identified and annotated between JD17 and the 3 other genomes. Symbiotic nitrogen fixation genes were identified, and the effects of these variants on them were evaluated; the coding sequences of 9 nitrogen fixation-related genes were found to be greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.
Keywords: assembly; comparative genomics; genome; soybean cultivar Jidou 17; symbiotic nitrogen fixation
Year: 2022 PMID: 35188189 PMCID: PMC8982393 DOI: 10.1093/g3journal/jkac017
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Assembly statistics of Glycine_max_JD17 (JD17), Glycine_max_v4.0 (WM82), Gmax_ZH13 (ZH13), and W05.
| | JD17 | WM82 | ZH13 | W05 |
|---|---|---|---|---|
| Assembly feature | | | | |
| Estimated genome size (by K-mer analysis) (Mb) | 1,109 | 1,115 | — | — |
| Number of contigs | 446 | 9,200 | 1,528 | 1,870 |
| Total size of contigs (Mb) | 995.0 | 952.5 | 1,007 | 998.6 |
| Longest contig (Mb) | 31.8 | — | — | — |
| Number of contigs > 1 Mb | 97 | — | — | — |
| Number of contigs > 10 Mb | 39 | — | — | — |
| N50 contig length (Mb) | 18.0 (PacBio) | 0.4 | CANU: 2.9; +Bionano: 18.0; +Hi-C: 22.6 | 3.3 |
| L50 contig count | 21 | 649 | 66 | 58 |
| Anchored contigs | | | | |
| Number of chromosomes | 20 | 20 | 20 | 20 |
| Number of contigs | 411 | — | — | 772 |
| Total size (Mb) | 965.8 | 978.4 | 1,011.2 | 1,013.2 |
| Number of gaps | 391 | 7,221 | 448 | 750 |
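
The N50 and L50 rows above follow the usual assembly-statistics definitions: with contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches half of the total assembly size, and L50 is the number of contigs needed to get there. A minimal Python sketch of that calculation is given below; the function name `n50_l50` and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the cumulative sum of the
         length-sorted contigs first reaches half the total size.
    L50: number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical contig lengths (arbitrary units), not taken from the table.
toy_contigs = [8, 6, 4, 3, 2, 1]
print(n50_l50(toy_contigs))  # (6, 2): 8 + 6 = 14 reaches half of 24
```

Under these definitions, a higher N50 together with a lower L50 indicates a more contiguous assembly, which is how the JD17 column (N50 18.0 Mb, L50 21) compares with the other three assemblies.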
Fig. 1. Overview of the JD17 reference genome. Tracks from outer to inner circles indicate: the chromosomes of the genome; the gene density map; the repeat sequence density map; density distribution of SNPs between JD17 and WM82; density distribution of InDels between JD17 and WM82; PAV length distribution between JD17 and WM82; GC content of JD17.
Comparison of genome annotation of JD17, WM82, ZH13, and W05.
| | JD17 | WM82 | ZH13 | W05 |
|---|---|---|---|---|
| Number of genes | 52,840 | 52,872 | 55,573 | 47,201 |
| Number of transcripts | 74,054 | 86,256 | 96,496 | 69,277 |
| Average number of transcripts per gene | 1.4 | 1.6 | 1.7 | 1.5 |
| Average length of transcript (bp) | 4,465 | 4,889 | 5,230 | 5,198 |
| Average exons number per transcript | 5.9 | 6.5 | 6.5 | 6.6 |
| Average length of 5′ UTR (bp) | 302 | 294 | 395 | 252 |
| Average length of 3′ UTR (bp) | 487 | 448 | 562 | 336 |
| Number of single exon mRNA | 10,057 | 12,065 | 8,466 | 7,803 |
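
The "Average number of transcripts per gene" row can be reproduced from the two rows above it by dividing the transcript count by the gene count. The short Python sketch below does this for the four annotations; the dictionary layout and the helper name `transcripts_per_gene` are illustrative assumptions, while the counts are copied from the table.

```python
# (number of genes, number of transcripts), copied from the table above.
annotation_counts = {
    "JD17": (52_840, 74_054),
    "WM82": (52_872, 86_256),
    "ZH13": (55_573, 96_496),
    "W05": (47_201, 69_277),
}


def transcripts_per_gene(genes, transcripts):
    """Average number of transcripts per gene, rounded to one decimal."""
    return round(transcripts / genes, 1)


ratios = {
    name: transcripts_per_gene(genes, transcripts)
    for name, (genes, transcripts) in annotation_counts.items()
}
print(ratios)  # {'JD17': 1.4, 'WM82': 1.6, 'ZH13': 1.7, 'W05': 1.5}
```

The rounded ratios match the values reported in the table.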
Fig. 2. Chromosome distribution map and percentage of different types of TEs. Chromosomes were split into nonoverlapping 200 kb bins, and the percentage of each major TE type in each bin was calculated. TE types include: DNA, DNA transposons; LINE, long interspersed nuclear elements; low complexity; LTR, long terminal repeat retrotransposons; NON LTR, non-long terminal repeat retrotransposons; other; simple repeat; SINE, short interspersed nuclear elements.
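
The binning described in the Fig. 2 caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the caption does not say whether the percentage is based on covered bases or on element counts, so the sketch assumes base coverage. It also assumes 0-based half-open `(start, end, te_class)` intervals that do not overlap within a class. The function name `te_percent_per_bin` and the example intervals are hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 200_000  # 200 kb bins without overlap, as in the caption


def te_percent_per_bin(chrom_length, te_intervals, bin_size=BIN_SIZE):
    """Percentage of each bin covered by each TE class.

    te_intervals: iterable of (start, end, te_class), 0-based half-open.
    Assumes intervals of the same class do not overlap one another.
    Returns {te_class: [percent for bin 0, bin 1, ...]}.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    covered = defaultdict(lambda: [0] * n_bins)
    for start, end, te_class in te_intervals:
        end = min(end, chrom_length)
        pos = start
        while pos < end:
            # Split the interval at bin boundaries so each bin only
            # receives the bases that actually fall inside it.
            bin_index = pos // bin_size
            piece_end = min((bin_index + 1) * bin_size, end)
            covered[te_class][bin_index] += piece_end - pos
            pos = piece_end
    percent = {}
    for te_class, bases in covered.items():
        percent[te_class] = [
            # The last bin may be shorter than bin_size.
            100.0 * n / (min((i + 1) * bin_size, chrom_length) - i * bin_size)
            for i, n in enumerate(bases)
        ]
    return percent


# Hypothetical 500 kb chromosome with two annotated elements.
example = te_percent_per_bin(
    500_000,
    [(150_000, 250_000, "LTR"), (400_000, 450_000, "DNA")],
)
print(example["LTR"])  # [25.0, 25.0, 0.0]
print(example["DNA"])  # [0.0, 0.0, 50.0]
```

Splitting each interval at bin boundaries keeps an element that straddles two bins from being credited entirely to one of them.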