Zoltan Kevei, Robert C King, Fady Mohareb, Martin J Sergeant, Sajjad Z Awan, Andrew J Thompson.
Abstract
A recombinant inbred line population derived from a cross between Solanum lycopersicum var. cerasiforme (E9) and S. pimpinellifolium (L5) has been used extensively to discover quantitative trait loci (QTL), including those that act via rootstock genotype; however, high-resolution single-nucleotide polymorphism genotyping data for this population are not yet publicly available. Next-generation resequencing of parental lines allows the vast majority of polymorphisms to be characterized and used to progress from QTL to causative gene. We sequenced the E9 and L5 genomes to 40- and 44-fold depth, respectively, and mapped the reads to the reference Heinz 1706 genome. In L5 there were three clear regions, on chromosome 1, chromosome 4, and chromosome 8, with increased rates of polymorphism. Two other regions, on chromosome 1 and chromosome 10, were highly polymorphic when we compared Heinz 1706 with both E9 and L5, suggesting that the reference sequence contains a divergent introgression in these locations. We also identified a region on chromosome 4 consistent with an introgression from S. pimpinellifolium into Heinz 1706. A large dataset of polymorphisms for use in fine-mapping QTL in a specific tomato recombinant inbred line population was created, including a high density of InDels validated as simple size-based polymerase chain reaction markers. By carefully filtering and interpreting the output of the SnpEff prediction tool, we have created a list of genes that are predicted to have highly perturbed protein functions in the E9 and L5 parental lines.
Keywords: InDel; S. pimpinellifolium; SL2.50; SNP; Solanum lycopersicum; introgression; large effect polymorphisms; recombinant inbred lines
Year: 2015 PMID: 25809074 PMCID: PMC4426381 DOI: 10.1534/g3.114.016121
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Relative distribution of single-nucleotide polymorphisms (SNPs) and insertion−deletion mutations (InDels) on the E9 and L5 chromosomes. The number of SNPs and InDels per unit length of chromosome is given for each chromosome. For both genomes, values are expressed relative to the chromosome with the maximum linear density of polymorphisms.
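The normalization described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `relative_density` and the counts and chromosome lengths are hypothetical placeholders, not values from the paper.

```python
def relative_density(counts, lengths_bp):
    """Polymorphisms per Mb for each chromosome, scaled so that the
    chromosome with the highest linear density has the value 1.0."""
    density = {chrom: counts[chrom] / (lengths_bp[chrom] / 1e6) for chrom in counts}
    peak = max(density.values())
    return {chrom: d / peak for chrom, d in density.items()}


# Hypothetical polymorphism counts and chromosome lengths (illustration only).
counts = {"chr1": 9000, "chr2": 2500, "chr3": 1500}
lengths = {"chr1": 90_000_000, "chr2": 50_000_000, "chr3": 60_000_000}

rel = relative_density(counts, lengths)
print(rel)
```

Scaling to the densest chromosome within each genome lets E9 and L5 be compared on the same 0 to 1 axis even when their absolute polymorphism counts differ widely.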
Figure 2. Linear density of single-nucleotide polymorphisms (SNPs) along the chromosomes of E9 and L5. The number of SNPs within each 1-Mb window (Y axis) is plotted against chromosomal position for each of the 12 chromosomes (X axis). Blue: E9; red: L5. Blue stars mark E9-specific SNP accumulations; red stars mark L5-specific increases. Black stars mark a similar SNP score in both lines; green stars show the regions with a zero SNP value because of gaps in the SL2.50 reference sequence. Gene-rich regions (more than 50 genes in 0.5 Mb) are marked with a brown line.
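Counting variants in fixed 1-Mb windows, as plotted in this figure, might look like the sketch below. It assumes 1-based base-pair positions on a single chromosome; the function name `variants_per_window` and the example positions are hypothetical.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1-Mb windows, as in the figure


def variants_per_window(positions, window_bp=WINDOW_BP):
    """Count variants per fixed-size window on one chromosome.

    positions: iterable of 1-based base-pair coordinates.
    Returns a Counter mapping window index (0 = first window) to count.
    """
    return Counter((pos - 1) // window_bp for pos in positions)


# Hypothetical SNP positions on one chromosome (illustration only).
positions = [1, 999_999, 1_000_000, 1_000_001, 2_500_000]
window_counts = variants_per_window(positions)
print(sorted(window_counts.items()))
```

Windows that contain no variants are simply absent from the result, so a plotting step would need to fill them with zero; such empty windows can also arise from gaps in the reference sequence, as the caption notes.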
Figure 3. Linear density of insertion−deletion mutations (InDels) along the chromosomes of E9 and L5. The number of InDels within each 1-Mb window (Y axis) is plotted against chromosomal position for each of the 12 chromosomes (X axis). Blue: E9; red: L5. Blue stars mark E9-specific single-nucleotide polymorphism (SNP) accumulation; red stars mark L5-specific increases. Black stars mark a high SNP score in both lines; green stars show the regions with a zero InDel value because of gaps in the SL2.50 reference sequence. Gene-rich regions (more than 50 genes in 0.5 Mb) are marked with a brown line.
Figure 4. Filtering process for frame shift (FS) mutated genes. The steps of the flowchart show the process of establishing which FS insertion−deletion mutations (InDels) led to genuine, large-effect changes in the protein structure. SNP, single-nucleotide polymorphism.
Frame shift-altered proteins in E9 compared with the reference genome
| Chr | Gene Number | Gene Annotation | Sequence and Structural Change in E9 |
|---|---|---|---|
| 3 | Solyc03g115650 | Translation initiation factor 5A-1 | Missing most of the protein, including the S1-like RNA recognition motif |
| 12 | Solyc12g038920 | Serine/threonine-protein kinase 16 | Missing most of the protein, truncated kinase domain |
| 12 | Solyc12g100290 | Histone-lysine | Missing most of the protein, truncated SET domain |
The table explains the effects of the frame shift on protein structure.
Frame shift-altered proteins in L5 compared with the reference genome
| Chr | Gene Number | Gene Annotation | Sequence and Structural Change in L5 |
|---|---|---|---|
| 1 | Solyc01g005290 | SEC14 cytosolic factor protein | Altered C-terminal sequence |
| 1 | Solyc01g017050 | PG1 protein like | Altered C-terminal sequence |
| 1 | Solyc01g050040 | C3HC4-type RING finger protein | Altered N- and C-terminal sequence |
| 1 | Solyc01g058160 | Agenet domain-containing protein | Altered C-terminal sequence |
| 1 | | | Missing most of the protein, including the prolyl 4-hydroxylase α subunit |
| 1 | Solyc01g091150 | Golgi SNAP receptor complex member 1-2-like | Altered C-terminal sequence |
| 1 | Solyc01g095620 | Hydroquinone glucosyltransferase-like | Altered C-terminal sequence |
| 1 | Solyc01g095680 | Root primordium defective 1-like | Altered C-terminal sequence |
| 1 | Solyc01g103200 | Conserved uncharacterized protein | Has a longer, new protein sequence at the C-terminus |
| 2 | Solyc02g064630 | Telomere repeat-binding factor 1-like | Altered C-terminal sequence |
| 3 | | | Missing most of the protein, the glutamate decarboxylase domain is truncated |
| 3 | Solyc03g111720 | Peptide methionine sulfoxide reductase | Altered C-terminal sequence |
| 3 | Solyc03g121000 | Zinc finger CCCH domain-containing protein 4-like | Altered C-terminal sequence |
| 3 | Solyc03g121720 | Succinic semialdehyde reductase isoform 2 | Altered C-terminal sequence |
| 4 | | | Missing most of the protein, including the RNA recognition motif |
| 4 | Solyc04g016350 | 40S ribosomal protein S4-like isoform 1 | Altered C-terminal sequence |
| 5 | Solyc05g009260 | Transport inhibitor response 1 | Altered C-terminal sequence |
| 5 | Solyc05g013390 | Unknown protein | Altered C-terminal sequence |
| 5 | Solyc05g017900 | EamA-like transporter membrane protein | Altered C-terminal sequence |
| 6 | Solyc06g005080 | Vacuolar protein sorting-associated protein 18 | Altered N-terminal sequence |
| 6 | Solyc06g005450 | NAD-specific glutamate dehydrogenase | Altered C-terminal sequence |
| 6 | Solyc06g065440 | C2H2-type zinc finger family protein | Altered C-terminal sequence |
| 6 | Solyc06g066210 | Unknown protein | Altered C-terminal sequence |
| 6 | Solyc06g066570 | Peroxisome biogenesis protein 2-like | Altered C-terminal sequence |
| 6 | Solyc06g084160 | Serine/threonine-protein kinase BUD32 homolog | Altered N-terminal sequence |
| 7 | Solyc07g039330 | WD-40 repeat-containing protein MSI4-like | Missing most of the protein, the WD40 repeat is truncated |
| 8 | Solyc08g006070 | AIG2-like protein-like | Altered C-terminal sequence |
| 9 | | | Missing most of the protein, the TLC domain is truncated |
| 9 | Solyc09g082630 | 1,2-dihydroxy-3-keto-5-methylthiopentene dioxygenase 1 | Altered N-terminal sequence |
| 9 | | | Missing most of the protein, including the prolyl 4-hydroxylase α subunit |
| 10 | Solyc10g037910 | Shaggy-related protein kinase eta | Altered C-terminal sequence |
| 10 | Solyc10g050080 | Serine/threonine-protein kinase C01C4.3-like | Missing most of the protein |
| 11 | Solyc11g005420 | Unknown protein | Altered N-terminal sequence |
| 11 | Solyc11g006510 | Nuclear transport factor 2 (NTF2) family protein | Missing most of the protein |
| 11 | | | Missing most of the protein, including the PTK catalytic domain |
| 11 | Solyc11g056680 | DNA-damage-repair/toleration protein DRT100-like | Altered C-terminal sequence |
| 11 | | | Missing most of the protein, including the STK catalytic domain |
| 12 | Solyc12g041980 | Breast cancer susceptibility 1 homolog | Altered C-terminal sequence |
| 12 | Solyc12g055850 | Lecithin retinol acyltransferase | Altered C-terminal sequence |
The table explains the effects of the frame shift on protein structure. The altered N- or C-terminal category marks proteins in which up to ~25% of the sequence is changed at one end. Proteins that likely result from incorrect gene predictions in Heinz 1706, and partial gene sequences, are marked. Important genes with large effects are in bold.
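One possible reading of the ~25% rule used for these categories is sketched below. This is an assumption-laden illustration rather than the authors' filtering procedure (Figure 4): the function `classify_change`, its arguments, and the example coordinates are all hypothetical.

```python
def classify_change(protein_len, altered_start, altered_end, max_terminal_fraction=0.25):
    """Classify a frame shift by how much of the protein it alters.

    altered_start, altered_end: 1-based inclusive residue range that differs
    from the reference protein. The 0.25 threshold mirrors the "~25%" in the
    table note; how the paper measured it is an assumption here.
    """
    altered_fraction = (altered_end - altered_start + 1) / protein_len
    if altered_fraction > max_terminal_fraction:
        return "missing most of the protein"
    if altered_start == 1:
        return "altered N-terminal sequence"
    if altered_end == protein_len:
        return "altered C-terminal sequence"
    return "altered internal sequence"


# Hypothetical 400-residue protein (illustration only).
c_term = classify_change(400, 351, 400)   # last 50 residues changed
n_term = classify_change(400, 1, 60)      # first 60 residues changed
large = classify_change(400, 101, 400)    # 75% of the protein changed
print(c_term, n_term, large, sep="\n")
```

A frame shift near the 3' end of the coding sequence changes only a short C-terminal tail, which is why most entries in the table fall into the C-terminal category, whereas an early frame shift removes most of the protein.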
Frame shift-altered proteins where E9 and L5 are alike and both differ from the reference genome
| Chr | Gene Number | Gene Annotation | Sequence and Structural Changes Both in E9 and L5 or in Heinz 1706 |
|---|---|---|---|
| 2 | | | Missing most of the protein, including the RNA recognition motif, in both E9 and L5 |
| 5 | Solyc05g051870 | Pollen olee1-like protein | Altered C-terminal sequence in Heinz 1706 |
| 5 | | | Missing most of the protein, including the thiamine pyrophosphate and pyrimidine binding domain, in both E9 and L5 |
| 5 | Solyc05g055680 | Putative adenosylhomocysteinase | Missing most of the protein in both E9 and L5 |
| 5 | Solyc05g055990 | Aquaporin PIP2-7-like | Altered C-terminal sequence in Heinz 1706 |
| 6 | Solyc06g005210 | Cytochrome P450 like | Missing most of the protein in both E9 and L5 |
| 7 | Solyc07g062310 | Plant protein of unknown function, DUF641 domain | Altered N-terminal sequence in Heinz 1706 |
| 7 | | | Missing most of the protein in Heinz 1706 |
| 7 | | | Missing most of the protein, including the catalytic domain and E2 ubiquitin-conjugating enzyme interaction site, in both E9 and L5 |
| 9 | | | Missing most of the protein, including the RDX domain, in Heinz 1706 |
| 9 | | | Altered N-terminal sequence in both E9 and L5 |
| 10 | | | Altered C-terminal sequence in both E9 and L5 |
| 10 | | | Missing most of the protein, including the ankyrin repeat, in both E9 and L5 |
| 12 | Solyc12g010020 | Leucine aminopeptidase | Altered C-terminal sequence in Heinz 1706 |
| 12 | | | Altered internal (exonic) sequence in Heinz 1706 |
The table explains the effects of the frame shift on protein structure. The altered N- or C-terminal category indicates proteins that differ by up to ~25% at one of the termini. Proteins likely to have an incorrectly annotated prediction in the Heinz 1706 SL2.40 reference, or a partial gene sequence, are marked. Important genes with large effects are indicated in bold.
The occurrence of selected E9 and L5 frame shift alterations in other tomato accessions
| Gene Number |
|---|
| Solyc03g115650 |
| Solyc12g038920 |
| Solyc12g100290 |
| Solyc01g090610 |
| Solyc03g098240 |
| Solyc04g010040 |
| Solyc09g018670 |
| Solyc09g089690 |
| Solyc11g012050 |
| Solyc11g067080 |
| Solyc02g085420 |
| Solyc05g054640 |
| Solyc07g065220 |
| Solyc07g065630 |
| Solyc09g005580 |
| Solyc09g007770 |
| Solyc10g083190 |
| Solyc10g083870 |
| Solyc12g011030 |
The table shows a nonexhaustive list of the tomato accessions from the 150 tomato genome resequencing project that have the same FS InDels as E9 and/or L5.