Remco Stam, Tetyana Nosenko, Anja C Hörger, Wolfgang Stephan, Michael Seidel, José M M Kuhn, Georg Haberer, Aurelien Tellier.
Abstract
Wild tomato species, like Solanum chilense, are important germplasm resources for enhanced biotic and abiotic stress resistance in tomato breeding. S. chilense also serves as a model to study adaptation of plants to drought and the evolution of seed banks. The absence of a well-annotated reference genome in this compulsory outcrossing, very diverse species limits in-depth studies on the genes involved. We generated ∼134 Gb of DNA and 157 Gb of RNA sequence data for S. chilense, which yielded a draft genome with an estimated length of 914 Mb, encoding 25,885 high-confidence predicted gene models that show homology to known protein-coding genes of other tomato species. Approximately 71% of these gene models are supported by RNA-seq data derived from leaf tissue samples. Benchmarking with Universal Single-Copy Orthologs (BUSCO) analysis of predicted gene models retrieved 93.3% of BUSCO genes. To further verify the completeness and accuracy of the genome annotation, we manually inspected the NLR resistance gene family and assessed its assembly quality. We find subfamilies of NLRs unique to S. chilense. Synteny analysis suggests a significant degree of gene order conservation between the S. chilense, S. lycopersicum and S. pennellii genomes. We generated the first genome and transcriptome sequence assemblies for the wild tomato species Solanum chilense and demonstrated their value in comparative genomics analyses. These data are an important resource for studies on adaptation to biotic and abiotic stress in Solanaceae, on the evolution of self-incompatibility and for tomato breeding.
Keywords: Evolutionary Genomics; Genome sequence assembly; NLR genes; Tomato; Transcriptome
Year: 2019 PMID: 31604826 PMCID: PMC6893187 DOI: 10.1534/g3.119.400529
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Solanum chilense populations in their natural habitat (by R. Stam). The top panels show coastal and lowland habitats; the lower panels show typical mountain habitats. LA3111 originates from a mountainous habitat similar to the last panel.
Figure 2. A) Maximum Likelihood (ML) phylogenetic tree constructed from six CT loci (nuclear genes) extracted from the S. chilense LA3111 sample sequenced in this study (in bold; marked with a gray rectangle) and from previously sequenced S. peruvianum and S. chilense samples. For the samples with whole genome data, sequences were aligned to the S. pennellii reference genome and the sequence data for the CT loci were extracted. Single-gene alignments were concatenated; the resulting super alignment was used as input for RAxML to construct the ML tree (1000 bootstrap replicates). The branch length is shown as the expected number of substitutions per site and bootstrap values are reported at each tree node. S. ochranthum was used as an outgroup. The sequence IDs containing chil and peru represent Sanger sequences from S. chilense and S. peruvianum individuals, respectively, followed by the accession/individual number. The sequences with IDs containing SRR- and ERR-numbers followed by the accession number were extracted from previously generated whole genome data. B) Phylogeny constructed from chloroplast SNP data extracted from S. chilense LA3111 (in bold and marked with a gray rectangle) and previously sequenced S. peruvianum whole genome sequence data. Chloroplast sequences were aligned to the S. pennellii reference genome. The tree was built from the resulting alignments using PhyML (GTR, NNI, BioNJ, 1000 bootstrap replicates). The branch length is shown as the expected number of substitutions per site and bootstrap values are reported at each tree node. (a) Sequences of individuals with IDs containing SRR1572692, SRR1572693, SRR1572694, SRR1572695 and SRR1572696 were obtained from Lin. (b) Sequences with IDs containing ERR418084, ERR418094, ERR418097 and ERR418098 originate from the 100 Tomato Genome Sequencing Consortium (2014). Individual SRR1572696 was reported as S. chilense in the main text of the paper (ref) and as S. peruvianum in the supplementary material, which contains all original data. The original classification of the sequences with IDs ERR418097 and ERR418098 as S. chilense was later withdrawn from the CGN database.
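The concatenation step mentioned in the Figure 2 caption can be illustrated with a minimal sketch. It assumes each single-gene alignment is a mapping from sample ID to an aligned sequence, and it pads samples missing from a locus with gap characters; the record does not say how missing samples were handled, so that padding is an assumption. The function name, sample IDs and sequences are hypothetical.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-locus alignments into one super alignment.

    `alignments` is a list of dicts mapping sample ID -> aligned sequence.
    Samples absent from a locus are padded with `gap` (an assumption,
    not something stated in the source).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            # Use the aligned sequence if present, otherwise a run of gaps.
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy loci; real CT-locus alignments would be far longer.
locus1 = {"LA3111": "ATGC", "peru1": "ATGA"}
locus2 = {"LA3111": "GG", "ochr": "GC"}
supermatrix = concatenate_alignments([locus1, locus2])
```

Every sequence in the result has the same length, which is what tree-building programs expect from a concatenated alignment.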
S. chilense genome assembly
| Total size (Mbp) | 913.89 |
| Scaffolds | 81,307 |
| N50 Scaffolds (bp) | 70,632 |
| Max Scaffold length (bp) | 1,123,112 |
| High confidence gene loci | 25,885 |
S. chilense de novo transcriptome assemblies
| Total contig number | 41,666 | 35,470 |
| Minimum length (bp) | 123 | 123 |
| Maximum length (bp) | 16,476 | 16,473 |
| Average length (bp) | 831 | 943 |
| Median length (bp) | 504 | 684 |
| N50 (bp) | 1,383 | 1,458 |
| N90 (bp) | 351 | 432 |
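For readers unfamiliar with the N50 and N90 rows in the tables above, the following sketch shows the usual definition: the length L such that contigs (or scaffolds) of length at least L together cover at least 50% (or 90%) of the total assembled length. The function name and the toy lengths are hypothetical, and the exact tool used to compute the reported values is not stated in this record.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a list of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated until
    they cover at least `fraction` of the total length; the length of the
    sequence that crosses that threshold is returned.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Hypothetical toy assembly, not data from the study.
toy_contigs = [100, 80, 60, 40, 20]
n50 = nx(toy_contigs, 0.5)  # 100 + 80 = 180 >= 150, so N50 is 80
n90 = nx(toy_contigs, 0.9)  # 100 + 80 + 60 + 40 = 280 >= 270, so N90 is 40
```

Because N90 requires covering more of the assembly, it is never larger than N50, which matches the pattern in the table.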
Figure 3. Maximum Likelihood (ML) phylogenetic tree of the NLR genes identified in S. chilense. The tree was made as described in Stam et al. 2016. Clades with high (>80%) bootstrap values are collapsed. Most previously described clades can be identified and are indicated as such. The TNL family is highlighted in yellow. Several previously identified NLR genes from different species are included for comparison, and Apaf1.1 and Ced4 are used as an outgroup, as in Andolfo and Stam. A list of these genes and their species of origin can be found in Table S8. Clades marked with an asterisk are NRC-dependent. NLRs with orthologs (based on reciprocal best BLAST hits) in S. pennellii are in bold. The branch length is shown as the expected number of substitutions per site. Clades CNL20 and CNL21 are new for S. chilense.
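The reciprocal best BLAST hit criterion named in the Figure 3 caption can be sketched as follows. The sketch assumes hits have already been parsed into (query, subject, score) tuples for each search direction and that the best hit is the one with the highest score; the record does not specify the score or thresholds used, so both are assumptions. Function names and gene IDs are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical parsed hits: (query, subject, score).
chilense_vs_pennellii = [
    ("chilA", "penX", 250.0),
    ("chilA", "penY", 180.0),
    ("chilB", "penY", 300.0),
    ("chilC", "penY", 120.0),
]
pennellii_vs_chilense = [
    ("penX", "chilA", 245.0),
    ("penY", "chilB", 310.0),
    ("penY", "chilC", 100.0),
]
pairs = reciprocal_best_hits(chilense_vs_pennellii, pennellii_vs_chilense)
```

In this toy input, chilC's best hit is penY, but penY's best hit is chilB, so chilC is not reported as having an ortholog.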