Min Xie¹, Claire Yik-Lok Chung¹, Man-Wah Li¹, Fuk-Ling Wong¹, Xin Wang¹, Ailin Liu¹, Zhili Wang¹, Alden King-Yung Leung¹, Tin-Hang Wong¹, Suk-Wah Tong¹, Zhixia Xiao¹, Kejing Fan¹, Ming-Sin Ng¹, Xinpeng Qi¹, Linfeng Yang², Tianquan Deng², Lijuan He², Lu Chen², Aisi Fu³, Qiong Ding³, Junxian He¹, Gyuhwa Chung⁴, Sachiko Isobe⁵, Takanari Tanabata⁵, Babu Valliyodan⁶, Henry T Nguyen⁶, Steven B Cannon⁷, Christine H Foyer⁸, Ting-Fung Chan⁹, Hon-Ming Lam¹⁰.
Abstract
Efficient crop improvement depends on the application of accurate genetic information contained in diverse germplasm resources. Here we report a reference-grade genome of wild soybean accession W05, with a final assembled genome size of 1013.2 Mb and a contig N50 of 3.3 Mb. The analytical power of the W05 genome is demonstrated by several examples. First, we identify an inversion at the locus determining seed coat color during domestication. Second, a translocation event between chromosomes 11 and 13 of some genotypes is shown to interfere with the assignment of QTLs. Third, we find a region containing copy number variations of the Kunitz trypsin inhibitor (KTI) genes. Such findings illustrate the power of this assembly in the analysis of large structural variations in soybean germplasm collections. The wild soybean genome assembly has wide applications in comparative genomic and evolutionary studies, as well as in crop breeding and improvement programs.
Year: 2019 PMID: 30872580 PMCID: PMC6418295 DOI: 10.1038/s41467-019-09142-9
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Summary of W05 genome assembly and annotation
| Category | Type | Length (Mb) | Number | Percentage (%) |
|---|---|---|---|---|
| Assembly | Contigs | 988.6 | 1870 | − |
| | Contig N50 | 3.3 | 58 | − |
| | Contig N90 | 0.4 | 432 | − |
| | Scaffolds | 1013.2 | 1118 | − |
| | Scaffold N50 | 50.7 | 10 | − |
| | Scaffold N90 | 38.4 | 19 | − |
| Protein-coding genes | Total transcripts | − | 89,477 | 100.0 |
| | Function-assigned transcripts | − | 82,567 | 92.3 |
| Non-coding RNAs | miRNA | 0.036 | 288 | 0.004 |
| | snRNA | 0.216 | 1988 | 0.021 |
| | rRNA | 0.032 | 147 | 0.003 |
| | tRNA | 0.067 | 892 | 0.007 |
| Transposable elements | Class I: Retroelements | 359.6 | − | 35.5 |
| | SINEs | 1.1 | − | 0.1 |
| | LINEs | 13.3 | − | 1.3 |
| | LTR elements | 345.2 | − | 34.1 |
| | Ty1/Copia (LTR) | 93.5 | − | 9.2 |
| | Ty3/gypsy (LTR) | 248.0 | − | 24.5 |
| | Others (LTR) | 3.8 | − | 0.4 |
| | Class II: DNA transposons | 74.8 | − | 7.4 |
| | CMC-EnSpm | 29.7 | − | 2.9 |
| | MULE | 27.9 | − | 2.8 |
| | TcMar | 0.8 | − | 0.1 |
| | hAT | 8.7 | − | 0.9 |
| | Helitron | 4.2 | − | 0.4 |
| | Others (Class II) | 3.5 | − | 0.3 |
| | Satellites | 4.9 | − | 0.5 |
| | Simple repeats | 44.1 | − | 4.4 |
| | Low complexity | 3.1 | − | 0.3 |
| | Unknown | 59.8 | − | 5.9 |
| | Total transposable elements | 546.4 | − | 53.9 |
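The N50/N90 rows and the Percentage column follow standard assembly arithmetic. The sketch below assumes the usual definition: sort sequences longest first and report the length at which the running total first reaches 50% (or 90%) of the assembly, together with the number of sequences needed to get there. That count appears to be what the Number column reports for those rows (often called L50/L90). The helper `percent_of_assembly` is hypothetical; it reproduces the repeat percentages by dividing by the 1013.2-Mb scaffold assembly (for example, 546.4 / 1013.2 ≈ 53.9%). The lengths used are toy values, and this is not the authors' pipeline.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (Nx length, Lx count); fraction=0.5 gives N50/L50, 0.9 gives N90/L90."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # First sequence at which the cumulative length reaches the target
        if running >= target:
            return length, count
    raise ValueError("lengths must not be empty")


def percent_of_assembly(length_mb, assembly_mb=1013.2):
    """Hypothetical helper: share (%) of the scaffold assembly occupied by a feature."""
    return 100.0 * length_mb / assembly_mb


# Toy contig lengths (not W05 data)
print(nx_stats([10, 8, 6, 4, 2]))            # (8, 2)
print(nx_stats([10, 8, 6, 4, 2], 0.9))       # (4, 4)
print(round(percent_of_assembly(546.4), 1))  # 53.9
```

Reporting the count alongside the length makes the table's "Contig N50: 3.3 Mb, 58" row readable as "the 58 longest contigs, each at least 3.3 Mb, cover half of the contig assembly".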
Fig. 1 Distribution of W05 genomic features. The outer layer shows the 20 chromosomes of W05 in megabases (Mb). a Repeat coverage, calculated as the proportion of each 1-Mb window (step size: 500 kb) occupied by repeat sequences. b Gene coverage, calculated as the proportion of each 1-Mb window (step size: 500 kb) occupied by coding sequences. c GC content, calculated in 200-kb windows. d Positions of simple sequence repeat (SSR) markers, shown in purple; marker information can be found in Supplementary Data 1. e Telomeric tandem arrays and cent91/92 soybean-specific centromeric repeats, marked in pink and blue, respectively
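The windowed tracks in panels a to c can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that features (repeats or coding sequences) are given as half-open `(start, end)` coordinates on one chromosome, that overlapping features are merged so no base is counted twice, that GC windows are non-overlapping (the caption gives no step size for panel c), and that N bases are excluded from the GC denominator. The function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals so bases are not double-counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def window_coverage(intervals, chrom_length, window=1_000_000, step=500_000):
    """Fraction of each sliding window occupied by features (defaults: 1-Mb window, 500-kb step)."""
    merged = merge_intervals(intervals)
    results = []
    win_start = 0
    while win_start < chrom_length:
        win_end = min(win_start + window, chrom_length)  # last window may be shorter
        covered = sum(
            max(0, min(end, win_end) - max(start, win_start)) for start, end in merged
        )
        results.append((win_start, win_end, covered / (win_end - win_start)))
        if win_end == chrom_length:
            break
        win_start += step
    return results


def gc_content(seq, window=200_000):
    """GC fraction in non-overlapping windows, ignoring N and other non-ACGT symbols."""
    values = []
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window].upper()
        acgt = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        values.append(gc / acgt if acgt else float("nan"))
    return values


# Toy example with tiny windows instead of 1 Mb / 500 kb
print(window_coverage([(0, 5), (3, 8)], chrom_length=20, window=10, step=5))
# [(0, 10, 0.8), (5, 15, 0.3), (10, 20, 0.0)]
print(gc_content("GGCCAATT", window=4))  # [1.0, 0.0]
```

The overlapping 500-kb step smooths the repeat and gene tracks, which is why adjacent windows in the toy output share bases.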
Predicted phenotypes based on genomic assemblies and observed phenotypes*
| Trait | Locus | W05 allele type | W05 predicted phenotype | W05 observed phenotype | Wm82 allele type | Wm82 predicted phenotype | Wm82 observed phenotype |
|---|---|---|---|---|---|---|---|
| Salt tolerance | | Intact | Salt tolerant | Salt tolerant | TE-inserted | Salt sensitive | Salt sensitive |
| Nodulation | | | Do not restrict neither | Do not restrict neither | | Restrict some strains of | Restrict some strains of |
| Flower color | | | Purple flower | Purple flower | | White flower | White flower |
| Seed coat color | | | Pigmented | Pigmented | | Colorless | Colorless |
| Seed coat color | | | Stay green after seed maturation | Stay green after seed maturation | | Do not stay green after seed maturation | Do not stay green after seed maturation |
*Italicized text denotes gene loci, gene alleles, or species names
Fig. 2 Causal structural variation that controls soybean seed coat pigmentation. a Sequence comparison between the W05 genome and Wm82 bacterial artificial chromosome (BAC) sequences at the I locus region. CHS genes and subtilisin genes/gene fragments are indicated in blue and orange, respectively. b Top panel: cartoon showing the exon structure of the subtilisin gene fragment (orange), the CHS1 gene (blue), and the expressed sequence tag (EST) sequence Gm-c1069–6017. Positions of the primers designed for PCR amplification of the subtilisin-anti-CHS1 chimeric transcript are indicated with black arrows. Bottom panel: PCR amplification of the subtilisin-anti-CHS1 chimeric transcript. The experiment was repeated at least twice with independent samples. Marker: 1 Kb Plus DNA ladder (NEB, cat. N3200S). NTC, no-template control. GmACT11 is used as a housekeeping control. The unprocessed gel image is provided in the Source Data file. c Proposed model for the generation of siRNAs originating from a large structural rearrangement in the I locus. CHS genes and the subtilisin gene/gene fragments are shown in blue and orange, respectively. Arrowheads indicate the direction of transcription that leads to the formation of double-stranded RNA. Clusters A and B are named according to a previous report[23]. IR-CHS gene cluster: inverted repeat of CHS gene cluster
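The model in panel c rests on base-pairing: a sense CHS transcript and an antisense (for example, subtilisin-anti-CHS1 read-through) transcript are reverse complements of each other, and a read-through transcript of an inverted repeat can fold back on itself. Either case yields double-stranded RNA, the substrate for siRNA production. The toy sketch below only illustrates that complementarity check. The sequences are hypothetical (not real CHS sequence), T stands in for U, and Dicer processing and imperfect pairing are not modeled.

```python
# Complement table for DNA letters; T stands in for U in the RNA.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a sequence written in DNA letters."""
    return seq.translate(COMPLEMENT)[::-1]


def can_form_duplex(rna_a, rna_b):
    """Two transcripts pair perfectly along their length if one is the reverse complement of the other."""
    return len(rna_a) == len(rna_b) and rna_a == reverse_complement(rna_b)


def forms_hairpin(rna, arm_length):
    """A read-through transcript of an inverted repeat folds back if its two arms are complementary."""
    if arm_length <= 0 or 2 * arm_length > len(rna):
        return False
    return rna[:arm_length] == reverse_complement(rna[-arm_length:])


# Hypothetical CHS-like fragment and an inverted-repeat arrangement of it
chs = "ATGGCTTCAGTGGAC"
spacer = "TTAGCA"
inverted_repeat = chs + spacer + reverse_complement(chs)

sense = chs
antisense = reverse_complement(chs)  # as if transcribed from the opposite strand

print(can_form_duplex(sense, antisense))            # True
print(forms_hairpin(inverted_repeat, len(chs)))     # True
print(forms_hairpin(chs + spacer + chs, len(chs)))  # False (direct repeat)
```

The last line shows why orientation matters: a direct (same-orientation) duplicate does not produce a self-complementary transcript, whereas the inverted arrangement does.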
Fig. 3 Large structural variations in soybean genomes detected by optical mapping (OM). a Seed coat pigmentation causal inversion in the I locus. Pink regions are the aligned flanking regions of the I locus. Aligned blocks within the I locus are painted in different colors to illustrate the inversion and duplication in accessions with colorless seed coats. b Reciprocal inter-chromosomal translocation between chromosomes 11 and 13. Segments in blue and red are regions homologous to W05 chromosomes 11 and 13, respectively. Segments in gray contain optical signals that cannot be aligned to the W05 in silico map. c A previously reported cultivated soybean-specific region on chromosome 15[7]. Blue regions are the aligned flanking regions of the previously proposed cultivated soybean-specific region. Segments that cannot be aligned with the W05 in silico map are shown in gray. d Length polymorphism of a KTI gene cluster on chromosome 8. Orange triangles indicate the locations of KTI genes in W05 (top track) and Wm82_v2 (bottom track). KTI, Kunitz trypsin inhibitor genes. Asterisks (*) next to accession IDs indicate that an in silico map was used instead of optical contigs
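The rearrangement types in panels a, b, and d can be illustrated with a deliberately coarse classification of aligned blocks. This is a sketch under stated assumptions, not the optical-mapping alignment software or the authors' method. The `AlignedBlock` record format is hypothetical, query chromosomes are assumed to be named by homology to W05, and real pipelines must also handle breakpoints, partial overlaps, and unaligned (gray) segments. The helper `genes_in_region` counts gene copies in a window, as one might when comparing KTI copy number between W05 and Wm82_v2.

```python
from dataclasses import dataclass


@dataclass
class AlignedBlock:
    """One aligned segment between a query map and the W05 reference (hypothetical format)."""
    query_chrom: str  # chromosome the query segment is assigned to
    ref_chrom: str    # W05 chromosome the segment aligns to
    strand: str       # "+" same orientation as W05, "-" reversed


def classify_block(block):
    """Coarse label for one block relative to the W05 reference."""
    if block.ref_chrom != block.query_chrom:
        return "translocation"  # homologous to a different W05 chromosome
    if block.strand == "-":
        return "inversion"      # same chromosome, opposite orientation
    return "collinear"


def genes_in_region(gene_starts, start, end):
    """Count genes whose start coordinate lies in [start, end), e.g. copies in a cluster."""
    return sum(start <= pos < end for pos in gene_starts)


blocks = [
    AlignedBlock("chr11", "chr11", "+"),
    AlignedBlock("chr11", "chr13", "+"),
    AlignedBlock("chr08", "chr08", "-"),
]
print([classify_block(b) for b in blocks])  # ['collinear', 'translocation', 'inversion']
print(genes_in_region([100, 250, 400, 2000], 0, 500))  # 3
```

Keeping orientation and chromosome identity as separate checks mirrors how panels a and b are read: an inversion keeps the chromosome but flips strand, whereas a reciprocal translocation moves a segment to the homolog of another W05 chromosome.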