Ruolin Yang, David E Jarvis, Hao Chen, Mark A Beilstein, Jane Grimwood, Jerry Jenkins, Shengqiang Shu, Simon Prochnik, Mingming Xin, Chuang Ma, Jeremy Schmutz, Rod A Wing, Thomas Mitchell-Olds, Karen S Schumaker, Xiangfeng Wang.
Abstract
Halophytes are plants that can naturally tolerate high concentrations of salt in the soil, and their tolerance to salt stress may occur through various evolutionary and molecular mechanisms. Eutrema salsugineum is a halophytic species in the Brassicaceae that can naturally tolerate multiple types of abiotic stresses that typically limit crop productivity, including extreme salinity and cold. It has been widely used as a laboratory model for stress biology research in plants. Here, we present the reference genome sequence (241 Mb) of E. salsugineum, sequenced at 8× coverage using the traditional Sanger sequencing-based approach, and compare it with that of its close relative Arabidopsis thaliana. The E. salsugineum genome contains 26,531 protein-coding genes, and 51.4% of the genome is composed of repetitive sequences that mostly reside in pericentromeric regions. Comparative analyses of the genome structures, protein-coding genes, microRNAs, stress-related pathways, and estimated translation efficiency of proteins between E. salsugineum and A. thaliana suggest that halophyte adaptation to environmental stresses may occur via a global network adjustment of multiple regulatory mechanisms. The E. salsugineum genome provides a resource for identifying naturally occurring genetic alterations that contribute to the adaptation of halophytic plants to salinity and that might be bioengineered in related crop species.
Keywords: Arabidopsis thaliana; Brassicaceae; Eutrema salsugineum; genome annotation; halophyte; whole-genome sequencing
Year: 2013 PMID: 23518688 PMCID: PMC3604812 DOI: 10.3389/fpls.2013.00046
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 6. Expansion of the F-box gene family in E. salsugineum. (A) Tandem duplications contribute to F-box gene family expansion. (B) The lineage-specific F-box genes in the four species. A total of 1,912 orthologous F-box genes in the four species were grouped into 428 clusters, each containing at least two F-box genes. (C) Phylogenetic analysis of the ∼1,900 F-box genes identified in the four species. The tree on the right was built from the ∼60-aa F-box domain. Four E. salsugineum-specific F-box clades are numbered 1–4. The tree on the left was built from the alignment of full CDS sequences of F-box genes in the four species.
Figure 1. Abbreviated phylogeny of the .
Summary statistics of the whole-genome assembly output prior to screening, removal of organelle and contaminating scaffolds, and chromosome-scale pseudomolecule construction.
| Minimum scaffold size (bp) | No. of scaffolds | No. of contigs | Total scaffold size (bp) | Non-gap basepairs | % Non-gap basepairs |
|---|---|---|---|---|---|
| 5,000,000 | 21 | 1,699 | 195,535,853 | 193,524,991 | 98.97 |
| 2,500,000 | 28 | 1,891 | 218,149,215 | 215,804,235 | 98.93 |
| 1,000,000 | 37 | 2,086 | 232,605,192 | 229,708,972 | 98.75 |
| 500,000 | 42 | 2,146 | 236,141,537 | 233,187,252 | 98.75 |
| 250,000 | 48 | 2,192 | 238,193,084 | 234,998,491 | 98.66 |
| 100,000 | 54 | 2,224 | 238,995,661 | 235,442,776 | 98.51 |
| 50,000 | 58 | 2,255 | 239,276,985 | 235,673,122 | 98.49 |
| 25,000 | 79 | 2,375 | 240,118,515 | 236,163,550 | 98.35 |
| 10,000 | 198 | 2,680 | 241,758,756 | 237,686,842 | 98.32 |
| 5,000 | 516 | 3,354 | 243,995,540 | 239,753,605 | 98.26 |
| 2,500 | 960 | 4,121 | 245,685,632 | 241,300,862 | 98.22 |
| 1,000 | 1,003 | 4,186 | 245,762,997 | 241,369,752 | 98.21 |
| 0 | 1,107 | 4,292 | 245,820,000 | 241,426,360 | 98.21 |
The table shows the total number of contigs and the total assembled basepairs for each set of scaffolds larger than the size listed in the left-hand column.
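The cumulative rows in this table can be reproduced from a per-scaffold listing. The following is a minimal Python sketch, assuming a hypothetical input of `(scaffold_length_bp, non_gap_bp, n_contigs)` tuples; the function names and input layout are illustrative and are not taken from the original assembly pipeline.

```python
from typing import List, Tuple

# Hypothetical record layout: (scaffold length in bp, non-gap bp, contig count)
Scaffold = Tuple[int, int, int]


def pct_non_gap(non_gap_bp: int, scaffold_bp: int) -> float:
    """Percentage of scaffold sequence that is not gap, rounded to two decimals."""
    return round(100.0 * non_gap_bp / scaffold_bp, 2)


def cumulative_stats(scaffolds: List[Scaffold], min_size: int):
    """Summarise all scaffolds strictly larger than min_size.

    Returns (n_scaffolds, n_contigs, total_scaffold_bp, non_gap_bp, pct_non_gap).
    """
    kept = [s for s in scaffolds if s[0] > min_size]
    total_bp = sum(s[0] for s in kept)
    non_gap = sum(s[1] for s in kept)
    contigs = sum(s[2] for s in kept)
    pct = pct_non_gap(non_gap, total_bp) if total_bp else 0.0
    return len(kept), contigs, total_bp, non_gap, pct


# Toy data, not real scaffolds from the assembly
toy = [(1000, 990, 2), (500, 500, 1), (100, 90, 1)]
summary = cumulative_stats(toy, 250)

# Consistency check on the reported 5,000,000 bp row of the table
row_5mb = pct_non_gap(193_524_991, 195_535_853)
```

Applied to the reported totals, `pct_non_gap` gives 98.97 for the 5,000,000 bp row, which agrees with the last column of the table.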
Final summary assembly statistics for chromosome-scale assembly.
| Statistic | Value |
|---|---|
| Scaffold total | 639 |
| Contig total | 3,511 |
| Scaffold sequence total | 243.1 Mb |
| Contig sequence total | 238.5 Mb (1.9% gap) |
| Scaffold N/L50 | 8/13.4 Mb |
| Contig N/L50 | 272/251.6 kb |
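The "N/L50" entries pair a sequence count with a length. As a hedged illustration, the sketch below assumes the first value is the smallest number of sequences whose combined length reaches half of the assembly and the second is the length of the shortest sequence in that set; the function name `n_l50` is hypothetical.

```python
from typing import Iterable, Tuple


def n_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (count, length) at the 50% point of the total sequence length.

    Sequences are taken from longest to shortest until their running sum
    reaches at least half of the total; count is how many were taken and
    length is the size of the last one taken.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:
            return count, length
    raise ValueError("lengths must contain at least one sequence")


# Toy lengths, not the real scaffold sizes
count, length = n_l50([10, 8, 6, 4, 2])
```

Under this reading, the table's scaffold entry would mean that the 8 largest scaffolds cover at least half of the assembly and that the smallest of those 8 is 13.4 Mb long.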
Statistics of predicted genes in E. salsugineum.
| Statistic | Value |
|---|---|
| Genome size | 243.1 Mb |
| Total genes | 26,351 |
| Total exons | 137,652 |
| Average CDS length | 1,224 bp |
| Average intron length | 363 bp |
| Average gene length | 2,209 bp |
Figure 2. Determination of homology cutoff. (A) Protein-level homology distributions of E. salsugineum and rice genes against A. thaliana genes. (B) Cumulative percentage of orthologs in E. salsugineum and rice compared to A. thaliana.
Identification of duplication events in each genome and genomic synteny between A. thaliana and E. salsugineum.
| Genome (comparison) | Multiplicons | Total SDs | Total SD length | % of genome | Duplicated genes | % of genes |
|---|---|---|---|---|---|---|
| A. thaliana (within genome) | 200 | 843 | 40.9 Mb | 34.1% | 10,330 | 35.9% |
| E. salsugineum (within genome) | 215 | 1,012 | 91.6 Mb | 37.7% | 10,067 | 38.2% |
| A. thaliana (between genomes) | 123 | 2,105 | 96.3 Mb | 80.2% | 25,232 | 87.7% |
| E. salsugineum (between genomes) | 123 | 2,120 | 187.5 Mb | 77.2% | 21,617 | 82.0% |
Because a genomic segment containing the same group of genes may be duplicated multiple times, multiply duplicated segments were merged into a single multiplicon (Simillion et al.).
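The merging step described in this note can be pictured as grouping pairwise-duplicated segments into connected sets. The sketch below is only an illustration of that idea using a union-find structure; it is not the method of Simillion et al., and the segment identifiers and the function name `merge_into_multiplicons` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def merge_into_multiplicons(
    pairs: Iterable[Tuple[Hashable, Hashable]]
) -> List[List[Hashable]]:
    """Group segments linked by pairwise duplication into merged sets.

    Each input pair (a, b) states that segments a and b are duplicates of
    each other. Segments connected through any chain of pairs end up in the
    same group, which stands in for one multiplicon.
    """
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[Hashable, set] = {}
    for segment in list(parent):
        groups.setdefault(find(segment), set()).add(segment)
    return sorted(sorted(group) for group in groups.values())


# Toy duplication pairs: s1, s2 and s3 are copies of one another
multiplicons = merge_into_multiplicons(
    [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
)
```

In this toy input, three pairwise-related segments collapse into one group, so two groups are counted rather than three duplication pairs.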
Figure 3. Dramatic expansion of pericentromeric heterochromatin in E. salsugineum. (A) Macro-synteny between the 25 largest scaffolds in E. salsugineum and the five chromosomes in A. thaliana. (B) Scaffolds 5, 7, 8, and 9 are in synteny with AtChr1. (C) Scaffolds S1 and S3 are in synteny with AtChr4.
Figure 4. The digital karyotypes of the .
Figure 5. Chromosome-scale synteny between A. thaliana and E. salsugineum. Each dot represents a pair of orthologs between A. thaliana and E. salsugineum. Red dots, macro-synteny orthologs. Blue dots, tandemly duplicated orthologs. Green dots, background noise.
Figure 7. (A) The composition of SOS-like gene families is similar in A. thaliana and E. salsugineum. (B) A genomic region on AtChr4 shows two tandem duplication events in A. thaliana increasing the gene dosage of the AtCDPK family. (C) The genes involved in “ubiquitin-dependent protein modification” are more enriched in E. salsugineum than in A. thaliana.
Figure 8. Tandemly duplicated stress-related miRNAs in E. salsugineum. (A) The hairpin structures of ath-MIR399a and ath-MIR168a in A. thaliana, and the four copies of esa-MIR399a and esa-MIR168a miRNA precursors in E. salsugineum. (B) Copy number variation of stress-related microRNA genes MIR399a and MIR168a between A. thaliana and E. salsugineum.
Figure 9. High translational efficiency of transportation-related genes in E. salsugineum. (A) Kernel density plot showing the globally similar codon usage bias (tAI) between A. thaliana and E. salsugineum. The x- and y-axes denote the tAI coefficient of genes. (B) Comparison of codon usage bias (tAI) of genes in terms of functional categories.
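The tRNA adaptation index (tAI) of a gene is commonly computed as the geometric mean of per-codon relative adaptiveness weights. The sketch below shows that calculation under stated assumptions: the weights are made-up illustrative values rather than weights derived from the tRNA gene complement of either species, codons without a weight are skipped, and the function name `tai` is hypothetical.

```python
import math
from typing import Dict


def tai(cds: str, weights: Dict[str, float]) -> float:
    """Geometric mean of codon weights over a coding sequence.

    cds     : coding sequence read in frame from its first base
    weights : codon -> relative adaptiveness in (0, 1]; codons missing from
              this mapping (for example stop codons) are ignored
    """
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon in the sequence has a weight")
    return math.exp(sum(logs) / len(logs))


# Illustrative weights only, not measured values
toy_weights = {"GCT": 1.0, "GCC": 0.25}
score = tai("GCTGCC", toy_weights)
```

A gene built mostly from high-weight codons scores close to 1, so comparing score distributions between functional categories is one way to read panel (B).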