Sebastian Reyes-Chin-Wo, Zhiwen Wang, Xinhua Yang, Alexander Kozik, Siwaret Arikit, Chi Song, Liangfeng Xia, Lutz Froenicke, Dean O Lavelle, María-José Truco, Rui Xia, Shilin Zhu, Chunyan Xu, Huaqin Xu, Xun Xu, Kyle Cox, Ian Korf, Blake C Meyers, Richard W Michelmore.
Abstract
Lettuce (Lactuca sativa) is a major crop and a member of the large, highly successful Compositae family of flowering plants. Here we present a reference assembly for the species and family. This was generated using whole-genome shotgun Illumina reads plus in vitro proximity ligation data to create large superscaffolds; it was validated genetically and superscaffolds were oriented in genetic bins ordered along nine chromosomal pseudomolecules. We identify several genomic features that may have contributed to the success of the family, including genes encoding Cycloidea-like transcription factors, kinases, enzymes involved in rubber biosynthesis and disease resistance proteins that are expanded in the genome. We characterize 21 novel microRNAs, one of which may trigger phasiRNAs from numerous kinase transcripts. We provide evidence for a whole-genome triplication event specific but basal to the Compositae. We detect 26% of the genome in triplicated regions containing 30% of all genes that are enriched for regulatory sequences and depleted for genes involved in defence.
Year: 2017 PMID: 28401891 PMCID: PMC5394340 DOI: 10.1038/ncomms14953
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Figure 1. Improvement of the L. sativa cv. Salinas genome assembly after scaffolding with HiRise.
(a) Scaffolding with HiRise increased the contiguity of the genome assembly, shown as an increase in sequence size and in the proportion of large sequences. (b) Genotype calls for RILs with crossovers across the longest HiRise superscaffold. Red bars represent L. sativa alleles, blue bars represent L. serriola alleles and yellow bars represent heterozygotes. Alternating discontinuities in the black line on top of the genotypes represent joins between SOAPdenovo scaffolds. (c) Ordering, orientation and incorporation of additional scaffolds spanning the Major Resistance Cluster 2 (ref. 31) in the two HiRise assemblies. Expanded view shows a single superscaffold that organized four genetic bins into a single unit.
Assembly statistics for the genome of L. sativa cv. Salinas.
| | SOAPdenovo | HiRise (2 lanes) |
| Contigs | | |
| N50 (size/number) | 36 kb/21,116 | — |
| Largest | 253 kb | — |
| Total size | 2.21 Gb | — |
| Total number | 153,952 | — |
| Scaffolds | | |
| N50 (size/number) | 476 kb/1,445 | 1.8 Mb |
| N90 (size/number) | 118 kb/5,237 | 360 kb/1,520 |
| Largest | 3.1 Mb | 12.2 Mb |
| Total size | 2.38 Gb | 2.38 Gb |
| Total number | 21,686 | 11,474 |
| | Family | Total length |
| Transposable elements | Retroelements | 1.5 Gb (61.5%) |
| | DNA elements | 29.5 Mb (1.2%) |
| | MITEs | 103.7 Mb (4.4%) |
| | Others | 115.3 kb (<1%) |
| | Unknown | 152.9 Mb (6.3%) |
| | Total | 1.8 Gb (74.2%) |
| | Type | Copies |
| Non-coding RNA | rRNAs | 2,587 |
| | tRNAs | 1,347 |
| | Predicted miRNAs | 483 |
| | Detected miRNAs | 86 |
| | snRNAs | 1,514 |
| Protein coding genes | Total number | 38,919 |
| | Annotated transcripts | 31,348 |
| | Average CDS length | 1.05 kb |
MITE, Miniature Inverted-Repeat Transposable Elements.
*Annotation provided for HiRise assembly.
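The N50 and N90 entries in the assembly table are given as size/number pairs. As a minimal sketch of how such values are conventionally computed (this is not the authors' code; the function name `nx` and the toy lengths are hypothetical), sort the sequences from longest to shortest and walk down the list until the running total reaches the chosen fraction of the assembly size:

```python
def nx(lengths, fraction):
    """Return (size, number) for an Nx statistic.

    size   -- length of the shortest sequence in the smallest set of
              longest sequences that covers `fraction` of the total
    number -- how many sequences are in that set
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)

    running = 0
    for number, size in enumerate(ordered, start=1):
        running += size
        if running >= threshold:
            return size, number


# Toy scaffold lengths (hypothetical, not taken from the assembly).
toy = [10, 8, 6, 4, 2]
n50 = nx(toy, 0.5)  # half of the 30 total units
n90 = nx(toy, 0.9)  # 90% of the 30 total units
```

For a real assembly, the lengths would come from the sequence records of the FASTA file, and the size/number pair would correspond to entries such as "476 kb/1,445" in the table.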
Figure 2. Overview of the L. sativa cv. Salinas genome.
(a) Number of scaffolds in 1 Mb intervals indicating the greater contiguity with the HiRise analysis. Blue, SOAPdenovo scaffolds; red, HiRise superscaffolds. (b) Chromosomal pseudomolecules. Dark areas indicate the 63% of the genome that is positioned and oriented accurately. (c) Gene density (in 1 Mb windows). (d) Repeat density (in 1 Mb windows). (e) Density of single-nucleotide polymorphisms used for genetic map construction (in 1 Mb windows). (f) Size of tandem gene arrays. Black blocks underneath show MRC regions (ref. 31). The coloured lines in the centre show links between syntenic blocks of at least five genes derived from the most recent whole-genome triplication.
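Several tracks in this figure are densities in 1 Mb windows. The sketch below shows one simple way such a track could be tallied, assuming 0-based feature start positions on a single pseudomolecule and non-overlapping windows; the function name `window_counts` and the toy coordinates are hypothetical, and the authors' actual procedure is not described here.

```python
WINDOW = 1_000_000  # 1 Mb, the window size named in the caption


def window_counts(positions, chrom_length, window=WINDOW):
    """Count features per fixed-size, non-overlapping window.

    positions    -- 0-based start coordinates of features (e.g. genes or SNPs)
    chrom_length -- length of the pseudomolecule in bp
    Returns a list with one count per window, in chromosomal order.
    """
    if chrom_length <= 0 or window <= 0:
        raise ValueError("chrom_length and window must be positive")

    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"position {pos} is outside the sequence")
        counts[pos // window] += 1
    return counts


# Toy example (hypothetical coordinates on a 3.2 Mb sequence).
toy_counts = window_counts([100, 999_999, 1_000_000, 2_500_000], 3_200_000)
```

The last window is shorter than 1 Mb whenever the sequence length is not a multiple of the window size, so a plotted density may need to be normalized by the actual window length.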
Figure 3. Detection of ancient polyploidization events.
(a) Syntenic dotplot of L. sativa versus V. vinifera (x axis: L. sativa chromosomes; y axis: V. vinifera chromosomes). (b) Density distribution of estimated synonymous substitution rate (ds) of syntelog pairs for intragenomic comparisons and for L. sativa against a panel of Asterid species; insets are enlargements of the main plot. (c) RAxML phylogenetic tree of the Asterid clade; scale is estimated nucleotide substitutions per site. Numbers represent inferred positions of the whole-genome duplication/triplication events observed in the syntelog data in (b). (d) Distributions of triplicated paralogous genes within the lettuce genome. Chromosomal pseudomolecules are arranged progressively along the three axes: x: LG1, LG4, LG7; y: LG2, LG5, LG8; z: LG3, LG6, LG9. WGT, whole-genome triplication.
Figure 4. Genome-wide distribution of triplicated regions detected in L. sativa relative to genomic features and specific gene families.
Each plot represents a single chromosomal pseudomolecule showing the location of the different triplicated regions represented as coloured bars. Repeat, genic and exonic densities are displayed above each chromosomal bar. Due to the large differences in scale, repeat content is scaled on the left axis and genic/exonic content is scaled on the right axis. Directly below the chromosomal bar are the MRC regions (ref. 31). The lower four tracks show the distribution of different types of genes.