Genome-wide mutations induced by ethyl methanesulfonate (EMS) and gamma irradiation in the tomato Micro-Tom genome were identified by a whole-genome shotgun sequencing analysis to estimate the spectrum and distribution of whole-genome DNA mutations and the frequency of deleterious mutations. A total of ~370 Gb of paired-end reads for four EMS-induced mutants and three gamma-ray-irradiated lines as well as a wild-type line were obtained by next-generation sequencing technology. Using bioinformatics analyses, we identified 5920 induced single nucleotide variations and insertion/deletion (indel) mutations. The predominant mutations in the EMS mutants were C/G to T/A transitions, while in the gamma-ray mutants, C/G to T/A transitions, A/T to T/A transversions, A/T to G/C transitions and deletion mutations were equally common. Biases in the base composition flanking mutations differed between the mutagenesis types. Regarding the effects of the mutations on gene function, >90% of the mutations were located in intergenic regions, and only 0.2% were deleterious. In addition, we detected 1,140,687 spontaneous single nucleotide polymorphisms and indel polymorphisms in wild-type Micro-Tom lines. We also found copy number variation, deletions and insertions of chromosomal segments in both the mutant and wild-type lines. The results provide helpful information not only for mutation research, but also for mutant screening methodology with reverse-genetic approaches.
Tomato (Solanum lycopersicum) is one of the most important vegetables worldwide; therefore, several cultivars exhibiting various fruit colours, sizes and shapes, as well as tastes and flavours, have been bred to be suitable for local environments and cuisines. The wild relatives of tomato, for example S. pennellii, S. peruvianum and S. pimpinellifolium, have been used as breeding materials, conferring beneficial traits such as disease resistance and particular fruit characters (Shirasawa and Hirakawa, 2013). The genome sequences of S. arcanum, S. habrochaites, S. lycopersicum, S. pennellii and S. pimpinellifolium have been published (Bolger et al., 2014; The 100 Tomato Genome Sequencing Consortium et al., 2014; The Tomato Genome Consortium, 2012), and genome‐wide single nucleotide polymorphisms (SNPs) within S. lycopersicum, as well as polymorphisms shared between closely related species, have been identified using a whole‐genome resequencing strategy with next‐generation sequencing (NGS) technology (Bolger et al., 2014; Causse et al., 2013; Ercolano et al., 2014; Kobayashi et al., 2014; Lin et al., 2014; Shirasawa et al., 2013; The 100 Tomato Genome Sequencing Consortium et al., 2014). These studies have confirmed previous reports that indicate low genetic diversity in S. lycopersicum (Shirasawa et al., 2010a).
Tomato mutant libraries therefore possess great potential to increase genetic diversity and would contribute to breeding programmes to generate attractive varieties. As developing, maintaining and managing mutant libraries requires money, time and labour, gene banks and genetic resource centres have contributed to the distribution of mutant lines to the community (Menda et al., 2004; Minoia et al., 2010; Saito et al., 2011). The mutants are usually screened using forward‐ and reverse‐genetic approaches based on observations of phenotypic and genotypic variations, respectively.
While forward genetics requires a large number of plants in fields, greenhouses or chambers to identify phenotypic variants, reverse genetics requires information on gene function, which can be inferred from model plant studies. Several molecular techniques are used for reverse genetics, for example single‐strand conformation polymorphism, denaturing‐gradient gel electrophoresis, temperature‐gradient gel electrophoresis and high‐resolution melting analysis. Targeting induced local lesions in genomes (TILLING) (McCallum et al., 2000) is a popular technology for reverse genetics in tomato (Minoia et al., 2010; Okabe et al., 2011), and a targeted‐sequencing strategy has also been developed (Rigola et al., 2009).
Mutant libraries in plants are often generated by chemical or physical mutagenesis. Ethyl methanesulfonate (EMS) is a popular chemical mutagen, and N‐methyl‐N‐nitrosourea (MNU) is also used for this purpose (Suzuki et al., 2008). In EMS and MNU mutageneses, guanines are alkylated, and thymines often mispair with the O‐6‐ethyl G (instead of cytosine) in the complementary strand. This mispairing introduces base substitutions from C to T or from G to A (abbreviated as C/G to T/A, hereafter), which are the most frequent mutations observed for chemical mutagenesis (Cooper et al., 2008; Greene et al., 2003). Alternatively, gamma rays, X‐rays, UV and heavy ions are used as physical mutagens. X‐rays were the first artificial mutagen used to induce mutations in Drosophila (Muller, 1927), and UV causes heritable mutations via cyclobutane pyrimidine dimers (CPDs) and (6‐4) pyrimidine photoproducts (Ikehata and Ono, 2011). Gamma and heavy‐ion irradiation produce reactive oxygen species (ROS), which cause base substitutions as well as genome rearrangements, for example insertions, deletions, inversions and translocations (Morita et al., 2009). The ROS convert guanines into 8‐oxo‐Gs, resulting in mispairings with adenine.
The irradiation‐induced genome rearrangements are caused by double‐strand breaks (DSBs) followed by error‐prone nonhomologous end joining (NHEJ) rather than error‐free homologous recombination. DSBs occurring at two or more genomic regions and subsequent NHEJ of one end with another noncounterpart end result in copy number variations (CNVs), presence/absence variations (PAVs) and translocations (Morita et al., 2009; Naito et al., 2005).
In the reverse‐genetics strategy with NGS, understanding which mutations are preferentially induced by each mutagen is important for identifying mutations accurately. In addition, induced mutations should be distinguished from spontaneous mutations or standing variations. The cultivar Micro‐Tom is a popular tomato line for mutant studies because it is a model system for tomato genetics (Meissner et al., 1997); however, polymorphisms are known to exist among the Micro‐Tom lines (Hirakawa et al., 2013; Kobayashi et al., 2014; Shirasawa et al., 2010a). In this study, a whole‐genome resequencing analysis was performed for tomato Micro‐Tom mutants derived from EMS treatment and gamma‐ray irradiation to characterize the induced mutations. Subsequently, the effects of the mutations on gene function were also investigated to determine the frequencies of knockdown and knockout mutations.
Results
Whole‐genome resequencing of the Micro‐Tom lines
In this study, to distinguish induced mutations from spontaneous mutations (that is, naturally occurring polymorphisms), we refer to artificially induced single nucleotide substitutions and insertions/deletions (indels) as single nucleotide variations (SNVs) and indel mutations, respectively. We describe spontaneous substitutions and indels as SNPs and indel polymorphisms, respectively.
To identify genome‐wide EMS‐ and gamma‐induced mutations, we obtained a total of 369.8 Gb of paired‐end reads for eight Micro‐Tom lines, including a wild‐type line, four EMS mutants and three gamma‐ray mutants, by Illumina sequencing. The data corresponded to approximately 48.7× coverage of the tomato genome of ca. 950 Mb per line (Table S1). In the subsequent bioinformatics data processing, the high‐quality reads were mapped and covered 85.8% of the tomato genome (SL2.40) on average with a mean coverage depth of 37.1× per mutant (Table S1). In addition, we included the published genome sequences of two wild‐type Micro‐Tom lines (Kobayashi et al., 2014), tentatively named Micro‐Tom NBRP(k) and Micro‐Tom MM(k), in our analysis (see Plant materials in Experimental procedures for details). We discovered a total of 3 609 640 candidate mutations, including small (≤17 bp) indels, after the mapping alignment to the reference sequence of the tomato genome SL2.40 without any filters. We validated a total of 336 randomly selected candidate mutations using a pyrosequencing method of Roche sequencing technology. Of the 336 candidate loci, 260 were successfully scored, and 93 and 167 were identified as actual sequence variations and false positives, respectively. We set threshold values (quality scores of >50, read depths between 10 and 100 and genotyping scores of ≥20) that eliminated 92.2% (154/167) of the false positives while retaining 86.0% (80/93) of the actual mutations among the candidates.
Using these threshold values, the indel candidates were 95.7% accurate; in other words, 90 of 94 indels were verified with specific PCR amplification. Finally, applying the threshold values, we identified a total of 1 217 567 single nucleotide substitution and indel candidates.
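The threshold filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Candidate` fields and function names are hypothetical, and treating the read‐depth bounds of 10 and 100 as inclusive is an assumption. The second function reproduces the retention and elimination rates from the validation counts given in the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate mutation."""
    quality: float         # variant quality score
    depth: int             # read depth at the site
    genotype_score: float  # genotyping score


def passes_thresholds(c, min_quality=50, min_depth=10, max_depth=100, min_genotype=20):
    """Apply the three thresholds from the text (depth bounds assumed inclusive)."""
    return (
        c.quality > min_quality
        and min_depth <= c.depth <= max_depth
        and c.genotype_score >= min_genotype
    )


def validation_rates(true_total, true_kept, false_total, false_removed):
    """Percentage of validated mutations retained and of false positives removed."""
    return {
        "true_retained_pct": 100 * true_kept / true_total,
        "false_removed_pct": 100 * false_removed / false_total,
    }


rates = validation_rates(true_total=93, true_kept=80, false_total=167, false_removed=154)
print({name: round(value, 1) for name, value in rates.items()})
```

With the counts reported above (80 of 93 and 154 of 167), the two rates round to 86.0% and 92.2%.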
Identification of single nucleotide substitutions and indels
Of the 1 217 567 candidates, 1112 were commonly scored over the six wild‐type Micro‐Tom lines, which we called Micro‐Tom AM, Micro‐Tom MM(h), Micro‐Tom TGRC, Micro‐Tom KDRI, Micro‐Tom NBRP(h) (Hirakawa et al., 2013) and Micro‐Tom Brazil (see Plant materials in Experimental procedures for details). They were subjected to a diversity analysis together with the eight lines and two control lines. Pairwise genetic distances among lines were calculated from the 1112 mutations commonly genotyped across the 16 total lines, and the resultant dendrogram identified four major groups within the wild‐type Micro‐Tom lines: clusters NIVTS (Micro‐Tom AM), FRA (Micro‐Tom MM(h), Micro‐Tom MM(k) and Micro‐Tom Brazil), USA (Micro‐Tom TGRC) and JPN (Micro‐Tom NBRP(h), Micro‐Tom NBRP(k) and Micro‐Tom KDRI) (Figure S1). The seven mutant lines were classified into three clusters: NIVTS (TOMJPG1679_1, TOMJPG2893_1, EMS bulk set C_1 and EMS bulk set C_3), FRA (EMS bulk set C_2) and JPN (TOMJPE6018_1 and TOMJPG1269_1). Therefore, the 1 217 567 candidates were further filtered using the following criteria: (i) common homozygous mutations within the NIVTS, FRA and/or JPN clusters were filtered out as spontaneous SNPs/indel polymorphisms and (ii) variants, resulting from probable sequencing and/or alignment errors, that were common to more than two lines were removed; however, we cannot exclude the possibility that they may also be the result of mutations that seldom occur. In total, 1 140 687 and 70 960 candidates were removed from the data set by criteria 1 and 2, respectively. The remaining 5920 loci were considered induced mutation candidates. 
Indeed, only seven of the 5920 (0.1%) candidates were identical to the approximately 1.4 million spontaneous SNPs previously discovered in six tomato cultivars (Shirasawa et al., 2013).
However, of the 5920 candidates, including the seven previously identified mutations, 71 mutations in EMS bulk set C_1 and 704 in EMS bulk set C_2 were located in particular genomic regions: a 167‐kb region in chromosome 12 (from position 1 641 739 to 1 808 578, 1 mutation/2350 bp) and a 475‐kb region in chromosome 9 (from position 961 065 to 1 435 621, 1 mutation/674 bp), respectively. Because the frequency of mutations in the two regions was remarkably higher than in other regions (see the Characterization of the SNVs and indels section), the 775 mutations were presumed to be introgressed from anonymous relatives of tomato and were removed from the data set. As a result, the remaining 5145 mutations were identified as highly confident SNV/indel mutations caused by mutagenesis. The average heterozygosity of the EMS mutants and gamma‐ray lines was 31.9% and 32.7%, respectively.
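The densities quoted for the two putatively introgressed regions follow directly from the coordinates and mutation counts. The sketch below recomputes them and compares one region with a background density; the function names are hypothetical, and using the whole‐genome density of EMS bulk set C_2 (1 mutation/2.3 Mb, from the next section) as the background is an assumption made for illustration only.

```python
def bp_per_mutation(start, end, n_mutations):
    """Average spacing between mutations in the inclusive interval [start, end]."""
    return (end - start + 1) / n_mutations


def fold_enrichment(region_bp_per_mutation, background_bp_per_mutation):
    """How many times denser the region is than the background."""
    return background_bp_per_mutation / region_bp_per_mutation


chr12 = bp_per_mutation(1_641_739, 1_808_578, 71)   # region in EMS bulk set C_1
chr09 = bp_per_mutation(961_065, 1_435_621, 704)    # region in EMS bulk set C_2
print(round(chr12), round(chr09))

# Assumed background: 1 mutation per 2.3 Mb across the genome of EMS bulk set C_2.
print(round(fold_enrichment(chr09, 2_300_000)))
```

A region whose mutation density exceeds the background by several orders of magnitude, as here, is difficult to explain by random mutagenesis, which is the reasoning behind excluding these 775 sites.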
Characterization of the SNVs and indels
Among the four EMS mutants, we detected a total of 4690 mutations, including 4629 SNVs (98.7%), 47 deletion mutations (1.0%) and 14 insertions (0.3%), throughout the entire genome (Figure 1 and Table S2). The mean densities of the SNVs, deletion mutations and insertions were 1 SNV/0.8 Mb, 1 deletion/81 Mb and 1 insertion/271 Mb. Except for 92 mutations on SL2.40ch00, 3836 and 762 were located in heterochromatin (1 mutation/0.6 Mb) and euchromatin regions (1 mutation/1.0 Mb), respectively (Figure 2A–D). The SNVs consisted of 3209 C/G to T/A transitions (69.3%), 596 A/T to T/A transversions (12.9%), 437 C/G to A/T transversions (9.4%), and 246 A/T to G/C transitions (5.3%), followed by 86 A/T to C/G transversions (1.9%) and 55 C/G to G/C transversions (1.2%). The ratio of transitions to transversions was 2.9 (Figure 1). The most frequent indel observed was a 1‐bp deletion (23/47, 48.9%), followed by a 2‐bp deletion (8/47, 17.0%), and the longest was a 17‐bp deletion in EMS bulk set C_1 (Figure 3). The number of mutations in each mutant genome ranged from 409 in EMS bulk set C_2 to 2720 in TOMJPE6018_1 with densities ranging from 1 mutation/2.3 Mb to 1 mutation/349 kb, distributed throughout the genome (Table S2).
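The six substitution classes used above merge each change with its complementary‐strand equivalent (for example, C to T and G to A are both counted as C/G to T/A). The following sketch shows one way to assign a class to a reference/alternative base pair and to compute the transition/transversion ratio from the class counts reported in the text; the function names are hypothetical and this is not the authors' code.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C/G to T/A", "A/T to G/C"}


def substitution_class(ref, alt):
    """Collapse a substitution and its strand complement into one of six classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("G", "T"):
        # Re-express the change on the strand where the reference base is A or C.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}/{COMPLEMENT[ref]} to {alt}/{COMPLEMENT[alt]}"


def ti_tv_ratio(class_counts):
    """Ratio of transitions to transversions from per-class counts."""
    ti = sum(n for cls, n in class_counts.items() if cls in TRANSITIONS)
    tv = sum(n for cls, n in class_counts.items() if cls not in TRANSITIONS)
    return ti / tv


# SNV counts for the four EMS mutants, as reported in the text.
ems_counts = {
    "C/G to T/A": 3209,
    "A/T to T/A": 596,
    "C/G to A/T": 437,
    "A/T to G/C": 246,
    "A/T to C/G": 86,
    "C/G to G/C": 55,
}
print(substitution_class("G", "A"), round(ti_tv_ratio(ems_counts), 1))
```

For the EMS counts this gives 3455 transitions and 1174 transversions, that is, a ratio of 2.9.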
Figure 1
Numbers and proportions of induced mutations in the mutant genomes.
Figure 2
Genome positions of mutations of the mutant lines. The boxes represent the genomes of EMS bulk set C_1 (A), EMS bulk set C_2 (B), EMS bulk set C_3 (C), TOMJPE6018_1 (D), TOMJPG1269_1 (E), TOMJPG1679_1 (F) and TOMJPG2893_1 (G). Mutations with high‐, moderate‐, modifier‐ and low‐impact effects are shown with red, black, blue and grey lines. Vertical solid bars indicate heterochromatin regions.
Figure 3
Size distribution of indel mutations.
We found 455 mutations, including 354 SNVs (77.8%), 92 deletions (20.2%) and 9 insertions (2.0%), in the genomes of the gamma‐ray mutants, corresponding to densities of 1 SNV/8 Mb, 1 deletion/31 Mb and 1 insertion/317 Mb on average (Figure 1). Other than eight mutations on SL2.40ch00, 344 and 103 were located in the heterochromatin (1 mutation/5.0 Mb) and euchromatin regions (1 mutation/5.6 Mb), respectively (Figure 2E–G and Table S2). The SNVs consisted of 93 C/G to T/A transitions (26.3%), 82 A/T to T/A transversions (23.2%), 71 A/T to G/C transitions (20.1%), 56 C/G to A/T transversions (15.8%), 35 A/T to C/G transversions (9.9%) and 17 C/G to G/C transversions (4.8%). The ratio of transitions to transversions was 0.9 (Figure 1). The most frequent deletion size was 1 bp (42/91, 46.2%), followed by 2 bp (19/91, 20.9%) and 3 bp (11/91, 12.1%) (Figure 3). The number of mutations per line ranged from 142 (1 mutation/7 Mb in TOMJPG2893_1) to 157 (1 mutation/6 Mb in TOMJPG2893_1) (Table S2).
To investigate biases in the base composition flanking SNVs, we examined the base types at the flanking ±2 positions of SNVs (Figure 4). We observed significant biases in the base composition of the loci neighbouring SNVs compared with genome‐wide averages. In the flanking sequences of A (and those of T in the complementary strands) of the EMS mutations, T, C, G and A were overrepresented in the −2, −1, +1 and +2 positions, respectively.
In the flanking sites of C (and G) of the EMS mutations, A and T were observed more frequently than expected at ±2 flanking positions, especially in the C/G to T/A transition sites. On the other hand, ±1 flanking bases of A and C SNVs showed a significant bias in the gamma‐ray mutations, and no biases were observed for the −2 and +2 positions.
Figure 4
Frequency of the preferential sequences flanking the induced mutations. Significant differences with respect to the genome background are indicated with * (P < 0.01) and ** (P < 0.05).
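Counting flanking bases as described above requires that each mutated site be read on the strand where the reference base is A or C, so that the −2 to +2 offsets are comparable between complementary sites. The sketch below illustrates this orientation step on a plain sequence string. It is a simplified illustration under stated assumptions: positions are 0‐based, the function names are hypothetical, and the statistical comparison with the genome background is not reproduced here.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_base_counts(genome, snv_positions, flank=2):
    """Count bases at offsets -flank..+flank around each SNV (0-based positions).

    Each context is oriented so that the mutated reference base is A or C.
    """
    offsets = [off for off in range(-flank, flank + 1) if off != 0]
    counts = {off: Counter() for off in offsets}
    for pos in snv_positions:
        if pos - flank < 0 or pos + flank >= len(genome):
            continue  # not enough sequence on one side
        context = genome[pos - flank: pos + flank + 1].upper()
        if set(context) - set("ACGT"):
            continue  # skip contexts containing N or other ambiguity codes
        if context[flank] in "GT":
            context = reverse_complement(context)
        for off in offsets:
            counts[off][context[flank + off]] += 1
    return counts


# A C site and its complementary G site yield the same oriented context.
print(flanking_base_counts("TTCAG", [2]) == flanking_base_counts("CTGAA", [2]))
```

The per‐offset counts can then be converted to frequencies and compared with the genome‐wide base composition using any suitable test.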
Effects of the mutations on gene function
The functional effects of the 5145 induced mutations on 5152 gene models were predicted; seven mutations were annotated on two overlapping gene models. The effects were roughly classified into the following four predefined impact categories using the SnpEff program (Cingolani et al., 2012): high impact (e.g. nonsense and frameshift mutations), moderate impact (e.g. missense mutations), modifier (e.g. intron and intergenic mutations) and low impact (e.g. synonymous mutations) (see http://snpeff.sourceforge.net for details). Of the mutations, 11 (0.2%), 108 (2.1%), 4984 (96.7%) and 47 (0.9%) had possible high, moderate, modifier and low impacts on gene function, respectively (Table 1 and Table S3). The predominant mutations were modifier intergenic variants (4662, 90.5%), followed by modifier intron (302, 5.9%), moderate missense (104, 2.0%) and low synonymous variants (47, 0.9%). The mutations with high and moderate impacts were more common in the gamma‐ray mutants (4.2%) than in the EMS mutants (2.2%).
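A tally of impact categories like the one above can be sketched from SnpEff‐annotated VCF records. The example assumes the `ANN` INFO field written by current SnpEff releases, in which the third pipe‐separated subfield of each annotation is the impact (HIGH, MODERATE, LOW or MODIFIER); the SnpEff version used in this study may have produced a different field layout, so the parsing is an assumption. The function names, the `NOT_PREDICTED` label and the choice of counting only the most severe impact per variant are likewise illustrative.

```python
from collections import Counter

SEVERITY = ["HIGH", "MODERATE", "LOW", "MODIFIER"]  # most to least severe


def impacts_from_info(info):
    """Extract the impact of every annotation in a VCF INFO string (ANN field assumed)."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            return [ann.split("|")[2] for ann in field[4:].split(",")]
    return []


def most_severe(impacts):
    """Return the most severe impact present, or a placeholder if none was predicted."""
    for level in SEVERITY:
        if level in impacts:
            return level
    return "NOT_PREDICTED"


def tally(info_fields):
    """Count variants by their most severe predicted impact."""
    return Counter(most_severe(impacts_from_info(info)) for info in info_fields)


def percentages(counts):
    """Convert category counts to percentages rounded to one decimal place."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Combined EMS and gamma-ray annotation counts from Table 1.
table1 = {"HIGH": 11, "MODERATE": 108, "MODIFIER": 4984, "LOW": 47, "NOT_PREDICTED": 2}
print(percentages(table1))
```

Applied to the combined counts from Table 1, the percentages agree with those quoted in the text (0.2, 2.1, 96.7 and 0.9%).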
Table 1
Number of sequence variations in the mutant and wild‐type genomes
                            EMS mutations       Gamma‐ray mutations   Spontaneous SNPs
Category                    Number    %         Number    %           Number       %
High‐impact mutations       8         0.2       3         0.7         976          0.1
Moderate‐impact mutations   92        2.0       16        3.5         14 753       1.3
Modifier‐impact mutations   4551      96.9      433       95.2        1 112 192    97.4
Low‐impact mutations        44        0.9       3         0.7         13 315       1.2
Not predicted               2         0.0       0         0.0         280          0.0
Total                       4697      100.0     455       100.0       1 141 516    100.0
Of the 11 high‐impact mutations, six were nonsense mutations, two were frameshift mutations, two were splice‐site acceptor variants and one was a missense mutation at the initial codon (Table 2). They were found in TOMJPE6018_1 (Solyc02g077450, Solyc03g006520, Solyc06g009290, Solyc07g044750 and Solyc08g065910), EMS bulk set C_3 (Solyc07g015860 and Solyc12g009110), EMS bulk set C_1 (Solyc01g006110), TOMJPG1269_1 (Solyc05g055020), TOMJPG1679_1 (Solyc01g107170) and TOMJPG2893_1 (Solyc03g118360); EMS bulk set C_2 did not contain any high‐impact SNVs or indel mutations.
Table 2
Genes with high‐impact mutations in the mutant genes
Mutant line        Mutation                  Transcript ID        Description
EMS bulk set C_1   Splice acceptor variant   Solyc01g006110.2.1   [PDF1A] PP2A regulatory subunit TAP46
EMS bulk set C_2   Gene deletion             Solyc01g006580.2.1   2‐oxoglutarate‐dependent dioxygenase
EMS bulk set C_3   Frameshift (Δ216)         Solyc07g015860.2.1   Peptide deformylase
EMS bulk set C_3   Nonsense (W340*)          Solyc12g009110.1.1   O‐methyltransferase‐like protein
TOMJPE6018_1       Nonsense (Q30*)           Solyc02g077450.2.1   One zinc finger protein
TOMJPE6018_1       Splice acceptor variant   Solyc03g006520.2.1   Splicing factor 3b subunit 2
TOMJPE6018_1       Nonsense (C1046*)         Solyc06g009290.2.1   Lipid A export ATP‐binding/permease protein msbA
TOMJPE6018_1       Nonsense (Q297*)          Solyc07g044750.2.1   ATPase BadF/BadG/BcrA/BcrD‐type family
TOMJPE6018_1       Nonsense (E104*)          Solyc08g065910.1.1   [Bli5] Myb‐related transcription factor
TOMJPG1269_1       Nonsense (K134*)          Solyc05g055020.2.1   Light‐dependent short hypocotyls 1
TOMJPG1269_1       Gene deletion             Solyc06g053720.1.1   Ramosa1 C2H2 zinc finger transcription factor
TOMJPG1269_1       Gene deletion             Solyc06g053730.1.1   Serine/threonine protein kinase
TOMJPG1269_1       Gene deletion             Solyc06g053740.2.1   Ubiquitin carboxyl‐terminal hydrolase (Fragment)
TOMJPG1269_1       Gene deletion             Solyc06g053750.2.1   Ubiquitin carboxyl‐terminal hydrolase family protein expressed
Spontaneous polymorphisms in the Micro‐Tom lines
As described, we identified 1 140 687 SNPs and indel polymorphisms that varied between the NIVTS, FRA and JPN clusters and the reference sequence, SL2.40. While 849 236 polymorphisms (74.4%) were shared among all three clusters (NIVTS, FRA and JPN), 23 083 (2.0%), 50 103 (4.4%) and 16 842 (1.5%) were shared between two of the three clusters, specifically NIVTS and FRA, NIVTS and JPN, and FRA and JPN, respectively (Figure S2). The remaining 173 552 (15.2%), 25 039 (2.2%) and 2832 (0.2%) were specific to the NIVTS, FRA and JPN clusters, respectively (Figure S2). The number of polymorphisms between NIVTS and FRA, NIVTS and JPN, and FRA and JPN was estimated to be 265 536 (173 552 + 50 103 + 25 039 + 16 842), 216 309 (173 552 + 23 083 + 2832 + 16 842) and 101 057 (25 039 + 23 083 + 2832 + 50 103), respectively.
Most of the spontaneous polymorphisms in the Micro‐Tom lines were located in chromosome 5, followed by chromosomes 11 and 2 (Figure S3). On the other hand, the polymorphisms specific to the FRA cluster were more common in chromosome 6 (0–2 Mb) and chromosome 7 (2–5, 54–56 and 61–63 Mb) in comparison with the NIVTS and JPN clusters, while those specific to the NIVTS cluster were predominantly in chromosomes 4 (58–64 Mb) and 12 (whole chromosome).
A total of 976 high‐impact SNPs with respect to gene function were predicted in 821 genes of the three Micro‐Tom clusters. While 674 high‐impact SNPs in 577 genes were shared among the three Micro‐Tom clusters, the other 302 in 244 genes were specific to one or two Micro‐Tom clusters (Figure S2 and Table S3). In addition, a total of 14 753 moderate‐impact SNPs, including 14 151 missense SNPs and 602 inframe indels, as well as 1 112 192 modifier‐impact SNPs, 13 315 low‐impact SNPs and 280 SNPs unassigned to the categories, were randomly distributed across the three Micro‐Tom clusters (Table S3).
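The pairwise polymorphism counts between clusters given above can be derived from the seven sharing categories: a site distinguishes two clusters when exactly one of them carries the variant relative to SL2.40. A minimal sketch of that bookkeeping follows, using the counts reported in the text; the data structure and function name are hypothetical.

```python
# Number of polymorphisms (relative to SL2.40) carried by each combination of clusters.
sharing = {
    frozenset({"NIVTS", "FRA", "JPN"}): 849_236,
    frozenset({"NIVTS", "FRA"}): 23_083,
    frozenset({"NIVTS", "JPN"}): 50_103,
    frozenset({"FRA", "JPN"}): 16_842,
    frozenset({"NIVTS"}): 173_552,
    frozenset({"FRA"}): 25_039,
    frozenset({"JPN"}): 2_832,
}


def pairwise_polymorphisms(sharing, a, b):
    """Count sites where exactly one of clusters a and b carries the variant."""
    return sum(n for members, n in sharing.items() if (a in members) != (b in members))


for a, b in [("NIVTS", "FRA"), ("NIVTS", "JPN"), ("FRA", "JPN")]:
    print(a, b, pairwise_polymorphisms(sharing, a, b))
```

The seven categories sum to the 1 140 687 spontaneous polymorphisms, and the three pairwise totals reproduce the values of 265 536, 216 309 and 101 057.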
Spontaneous and induced CNVs in the Micro‐Tom lines
Using the clusters defined in the SNP analysis (Figure S1), we analysed CNVs in the wild‐type and mutant lines belonging to FRA, NIVTS and JPN using the wild‐type line TOMJPF00001_1 as a reference. As expected, we did not observe any obvious CNVs in the lines of JPN, as the reference belonged to the cluster. On the other hand, we found eight (four insertions and four deletions) and six (one insertion and five deletions) spontaneous CNV candidates in chromosomes 4, 5, 6, 7 and 10 of FRA and chromosomes 4, 5, 6, 7 and 9 of NIVTS, respectively (Figure S4 and Table S4). Moreover, we observed multiple significant CNVs in chromosome 12 of the NIVTS cluster.
In addition, we found CNVs that were probably induced by mutagenesis in chromosome 6 of TOMJPG1269_1 and chromosome 1 of EMS bulk set C_2 (Figure 5a,b, Figure S4 and Table S4). Subsequent PCR and sequencing analyses confirmed the deletion of a 37 390‐bp fragment (from positions 33 060 224 to 33 097 613 of SL2.40ch06) in the PCR product of TOMJPG1269_1 (Figure 5c; DDBJ accession number: LC003260). The deleted fragment consisted of four genes (Solyc06g053720, Solyc06g053730, Solyc06g053740 and the first exon of Solyc06g053750). The break points of the deletion had possible 10‐bp inverted repeat sequences, suggesting that this deletion was generated from a DSB around position 33 097 613, probably caused by gamma‐ray irradiation, followed by homologous recombination between the inverted repeats. The deletion in chromosome 1 of EMS bulk set C_2 was presumed to be approximately 23.5 kb, including one gene (Solyc01g006580), and a break point was not identified (Figure 5d; DDBJ accession number: LC003259).
While the nucleotide sequences of both ends of the PCR products, 898 and 279 bases, showed significant similarities to those of chromosome 1 (from 1 165 002 to 1 165 899 and from 1 196 378 to 1 196 656), the internal sequences were highly similar to entire sequences of contigs that were unassigned to the tomato chromosomes, for example SL2.40ct26763 (GenBank accession number: AEKE02026766; 3121 bp in length) and SL2.40ct23375 (GenBank accession number: AEKE02023381; 2199 bp), and had no significant similarities for 1690 bases. Although candidates for a ~100‐kb deletion were observed in chromosome 0, they were excluded from the analysis due to unassigned and probable discontinuous sequences.
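Read‐depth‐based CNV detection of the kind described above compares, window by window, the depth in a test line with the depth in a reference line. The sketch below shows only this core idea as a normalized log2 ratio with a fixed cut‐off. It is a deliberate simplification and not the statistical model of CNV‐seq: the window depths, pseudocount, threshold and function names are all illustrative assumptions.

```python
import math


def log2_depth_ratios(test_depths, ref_depths, pseudocount=1.0):
    """Per-window log2(test/reference) after scaling both to the same total depth."""
    scale = sum(ref_depths) / sum(test_depths)
    return [
        math.log2((t * scale + pseudocount) / (r + pseudocount))
        for t, r in zip(test_depths, ref_depths)
    ]


def call_windows(ratios, threshold=1.0):
    """Label each window as a copy-number loss, gain or normal (assumed cut-off)."""
    return [
        "loss" if r <= -threshold else "gain" if r >= threshold else "normal"
        for r in ratios
    ]


# Toy example: the third window has no reads in the test line, as for a deletion.
test = [100, 100, 0, 100]
reference = [100, 100, 100, 100]
print(call_windows(log2_depth_ratios(test, reference)))
```

Candidate windows found in this way still need confirmation, for example by PCR and sequencing across the break points as was done for TOMJPG1269_1.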
Figure 5
Copy number variations detected by CNV‐seq and the genome sequences. (a and b) CNVs in chromosome 6 of TOMJPG1269_1 and chromosome 1 of EMS bulk set C_2. Pentagons above the panels show the gene models predicted in the tomato genome, and those that were deleted from the mutant genomes are filled and named. (c) Genome sequences of the deleted site of the TOMJPG1269_1 (DDBJ accession number LC003260) and the corresponding position from 33 060 224 to 33 097 613 of the reference genome of SL2.40. The deleted regions in the TOMJPG1269_1 genome and the corresponding region in the reference tomato genome (SL2.40) are represented with colons and red letters, respectively, and the possible 10‐bp inverted repeat sequences are indicated in italics accompanied by arrows. (d) Genome structures of the deleted site of EMS bulk set C_2 (DDBJ accession number LC003259) and the corresponding region from 1 165 002 to 1 196 656 of the reference genome of SL2.40. Black, grey and white boxes indicate sequences conserved between the two genomes, showing significant similarities to the unassigned contigs to the tomato genome (names of the contigs are shown in the boxes), and lacking in the mutant genome, respectively. Dashed lines in the SL2.40 genome show undetermined sequences represented by N bases.
Discussion
We identified artificial mutations, including SNVs, indels and CNVs, induced by EMS and gamma rays by a whole‐genome resequencing analysis. We determined the preferential mutations caused by the chemical and physical mutagens, the sequences that were sensitive to the mutagens and the causative effects of the mutations on gene function.
Of the SNVs in the EMS mutants, C/G to T/A transitions were the most frequent (Figure 1). This result was consistent with previous studies and fits the proposed theory that alkylated guanines readily pair with thymines, which leads to adenines replacing guanines (Greene et al., 2003; Cooper et al., 2008). Therefore, C/G to T/A transitions are targeted in reverse‐genetic mutant screenings, and other types of mutations are filtered out (Rigola et al., 2009; Uchida et al., 2009). However, in the present study, we also observed substitutions other than C/G to T/A transitions in the EMS mutants, as reported in barley and rice as well as tomato (Caldwell et al., 2004; Minoia et al., 2010; Till et al., 2011). The mechanisms underlying these base substitutions are not clear. Because their flanking sequences are less biased, as is the case for substitutions in the gamma‐ray mutants, they are thought to result from the error‐prone NHEJ pathway repairing DSBs occasionally induced by chemical mutagens; substitutions in gamma‐ray mutants are mainly derived from DSBs and NHEJ rather than from base modifications. This is supported by the fact that the gamma‐ray mutants have very frequent small deletions (Sato et al., 2006), which are generated by the same NHEJ mechanism (Morita et al., 2009; Naito et al., 2005).
The induced mutations were evenly distributed throughout the genome regardless of the chromatin nature of gene‐rich euchromatin and repeat‐rich gene‐poor heterochromatin (Figure 2 and Table S2), indicating that artificial mutations induced by EMS and gamma‐ray irradiation occurred randomly throughout the genome.
The mutations were, however, biased with respect to the flanking bases, especially in the C/G to T/A transition sites (Figure 4). This might indicate that the efficiency of base modification by chemical mutagens and/or the mispairing of noncomplementary bases opposite the modified bases is influenced by the flanking sequences. On the other hand, only the −1 and +1 positions were significantly biased in the flanking sequences of mutations other than the C/G to T/A transitions of the EMS mutants and of mutations of the gamma‐ray mutants, which are presumed to be derived from DSBs and the NHEJ pathway (Morita et al., 2009; Naito et al., 2005). Given the wild‐type nucleotide sequences of the genes, desirable mutated codons could therefore be predicted and selected from mutant pools on the basis of the sequence biases at the flanking sites of mutated bases after EMS and gamma‐ray mutagenesis.
One to five high‐impact variations were discovered in each mutant line by whole‐genome sequencing and the subsequent SNV, indel and CNV analyses (Table 2). This result suggests that the whole‐genome NGS technique is a convenient approach to identify high‐impact‐mutated genes associated with phenotypic variation. For instance, TOMJPE6018_1 had a nonsense mutation in the gene Solyc08g065910 encoding a Myb‐related transcription factor protein, which is known as Blind‐like5 (Bli5), one of the homologs of a key regulator of leaf dissection in tomato (Busch et al., 2011). Because TOMJPE6018_1 indeed possesses abnormal leaf morphology (http://tomatoma.nbrp.jp/strainDetailAction.do?mutantNo=982), the Solyc08g065910 gene is a highly likely candidate gene for this mutant phenotype.
To confirm this prediction, further functional analysis is required, for example complementation tests and expression analysis, as TOMJPE6018_1 had a number of moderate and modifier variations with potential phenotypic effects.
We discovered a large number of spontaneous sequence variations in the wild‐type Micro‐Tom lines. To the best of our knowledge, there has been no report about such variations in other tomato varieties. The distribution of polymorphism over the genome agreed with previous studies (Asamizu et al., 2012; Kobayashi et al., 2014). In general, spontaneous polymorphisms should be distinguished from induced sequence variations in mutant studies. Therefore, SNP information on the three clusters (JPN, FRA and NIVTS) would be useful to eliminate background noise due to spontaneous variations in the genomes of Micro‐Tom‐derived mutants. Although approximately 28 000 SNPs have been reported between Micro‐Tom NBRP (cluster JPN) and Micro‐Tom MM (cluster FRA), the genome positions have not been released (Kobayashi et al., 2014). To facilitate mutant studies of Micro‐Tom, the polymorphism data from this study, as well as the induced mutation data for the mutant lines, were released in the Kazusa Tomato Genomic DataBase (KaTomicsDB: http://www.kazusa.or.jp/tomato).
Furthermore, information on both spontaneous and induced mutations would be useful not only for background noise elimination in mutant studies, but also for the development of DNA markers in linkage analyses. Crossing Micro‐Tom mutants with genetically distant lines such as wild relatives often leads to phenotyping errors in the segregants because of differences in many genes with minor effects and QTLs conferring some agriculturally important traits, for example fruit size and shape.
Chromosome segment substitution lines, in which only targeted genomic regions are substituted with sequences of wild relatives, would help avoid this problem, because many genomic regions are identical or similar to parental lines (Eshed and Zamir, 1994). MutMap is an alternative technique to identify genes with minor effects and QTLs by whole‐genome sequencing of bulked F2 segregants derived from a cross between a mutant and the original wild‐type line (Abe et al., 2012). In addition to those, the mutant lines sequenced in this study would be potential parents for crosses with Micro‐Tom mutants of interest. The progeny could be genotyped and the identified mutations could be used as DNA markers, so that the phenotypic analyses are not influenced by other minor‐effect genes.
In conclusion, we clarified the frequency and types of genome‐wide mutations induced by EMS treatment and gamma‐ray irradiation. The results are useful for developing mutant screens with reverse‐genetic methods, and for further studies on mutations in a genetics and genomics framework.
Experimental procedures
Plant materials
The tomato mutant lines, that is three gamma‐ray irradiation lines (TOMJPG1269, TOMJPG1679 and TOMJPG2893), one EMS line (TOMJPE6018) and EMS bulk set C, derived from a cultivar Micro‐Tom, were obtained from the National BioResource Project (NBRP), Japan, through the University of Tsukuba (Saito et al., 2011). A single plant was randomly selected from each line for sequencing analysis except for three plants from EMS bulk set C, which were denoted by a branch number following an underscore. The phenotypes, plant ontology categories and photographs of the lines are available at the TOMATOMA database (http://tomatoma.nbrp.jp). A wild‐type Micro‐Tom (TOMJPF00001) was used as a control.
In addition, data from eight wild‐type Micro‐Tom lines obtained from six distributors were employed:
Micro‐Tom AM (Hirakawa et al., 2013; Shirasawa et al., 2010a, 2013): NARO Institute of Vegetable and Tea Sciences (NIVTS), Japan;
Micro‐Tom MM(h) (Hirakawa et al., 2013; Shirasawa et al., 2010a, 2013): Institut National de la Recherche Agronomique (INRA Bordeaux), France;
Micro‐Tom MM(k) (Kobayashi et al., 2014): INRA Bordeaux, France;
Micro‐Tom NBRP(h) (Hirakawa et al., 2013; Shirasawa et al., 2013): NBRP, University of Tsukuba, Japan (accession number: TOMJPF00001);
Micro‐Tom NBRP(k) (Kobayashi et al., 2014): NBRP, University of Tsukuba, Japan (accession number: TOMJPF00001);
Micro‐Tom KDRI (Hirakawa et al., 2013; Shirasawa et al., 2013): Kazusa DNA Research Institute (KDRI), Japan;
Micro‐Tom TGRC (Hirakawa et al., 2013; Shirasawa et al., 2013): Tomato Genetics Resource Center (TGRC), University of California, Davis, USA (accession number: LA3911); and
Micro‐Tom Brazil: Dr Lázaro Eustáquio Pereira Peres, Universidade de São Paulo, Brazil. The SNP data are registered in Sol Genomics Network (Mueller et al., 2005) produced by Drs Martin Sargent (University of Birmingham, UK), Sajjad Awan (University of Warwick, UK) and Andrew Thompson (Cranfield University, UK).
Illumina sequencing analysis
Genomic DNA was extracted from leaves with the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The DNA was sheared into 100‐ to 1000‐bp fragments with Acoustic Solubilizer (Covaris, Woburn, MA) and subjected to the SPRIworks System I for the Illumina sequencer (Beckman Coulter, Brea, CA) to polish the ends of the fragments, ligate indexed adapters (Bioo Scientific, Austin, TX) and select 300–600‐bp DNAs. The resultant DNAs were further separated on agarose gels for size selection of 450–550 bp, purified with the QIAquick Gel Extraction Kit (Qiagen) and subjected to real‐time quantitative PCR with KAPA Library Quantification Kits (KAPA Biosystems, Wilmington, MA) and ABI‐7900HT (Life Technologies, Carlsbad, CA) as described in the manufacturer's protocol. The nucleotide sequences were determined using massively parallel sequencing by synthesis with HiSeq 1000 (Illumina, San Diego, CA) in paired‐end, 101‐bp mode and MiSeq (Illumina) in paired‐end, 251‐bp mode.
Identification of SNVs and CNVs
The sequence reads were trimmed with fastq_quality_filter (-q 10 -p 10), fastq_quality_trimmer (-t 10 -l 250 for the MiSeq reads or 101 for the HiSeq reads), fastx_artifacts_filter, and fastx_clipper (-a AGATCGGAAGAGC -l 250 or 101 -M 10 -n) in FASTX‐Toolkit (http://hannonlab.cshl.edu/fastx_toolkit; version 0.10.1), and the filtered reads were mapped onto the tomato reference genome sequence (SL2.40) with Bowtie 2 (Langmead and Salzberg, 2012; version 2.1.0; parameters of --minins 100 --no-mixed -k 2). From the generated SAM files, only paired reads with distances of 100–500 bases (excluding adaptor sequences) were collected and converted to BAM files. The BAM files were subjected to SNV calling with the mpileup command of SAMtools (Li et al., 2009; version 0.1.19; parameters of -Duf) and the view command of BCFtools (parameters of -vcg) and filtered with VCFtools (Danecek et al., 2011; version 0.1.11; parameters of --minQ 50, --minGQ 20, --minDP 10 and --maxDP 100). The resultant VCF files were categorized based on the criteria described in the Results section with VCFtools. The effects of mutations on gene function were predicted with SnpEff (Cingolani et al., 2012; version 3.4i; parameters of -no-downstream and -no-upstream), and Release‐21 files for genome assembly (Solanum_lycopersicum.SL2.40.21.dna.genome.fa), gene annotation (Solanum_lycopersicum.SL2.40.21.gtf) and peptide sequences (Solanum_lycopersicum.SL2.40.21.pep.all.fa) obtained from the Ensembl Plants database (Kersey et al., 2012: http://plants.ensembl.org) were used. In the SnpEff analysis, mutations were automatically assigned to impact categories predefined by the program, that is high‐, moderate‐, modifier‐ and low‐impact variations. 
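The quality and depth thresholds quoted for the VCFtools step can be illustrated with a minimal Python sketch. It is not the VCFtools implementation: it assumes a single‐sample VCF whose FORMAT column carries GQ and DP, treats all four bounds as inclusive, and discards a record outright when a genotype‐level threshold fails (VCFtools itself applies --minGQ, --minDP and --maxDP per genotype). The function names and the example records are hypothetical.

```python
from typing import Iterable, Iterator

# Thresholds taken from the VCFtools parameters quoted in the text.
MIN_QUAL = 50.0  # --minQ
MIN_GQ = 20.0    # --minGQ
MIN_DP = 10      # --minDP
MAX_DP = 100     # --maxDP


def passes_filters(vcf_line: str) -> bool:
    """Return True if a single-sample VCF data line meets all four thresholds.

    Assumption: bounds are inclusive and a missing QUAL, GQ or DP fails the record.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10 or fields[5] == ".":
        return False
    qual = float(fields[5])
    # Pair the FORMAT keys (column 9) with the sample values (column 10).
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    try:
        gq = float(sample["GQ"])
        dp = int(sample["DP"])
    except (KeyError, ValueError):
        return False
    return qual >= MIN_QUAL and gq >= MIN_GQ and MIN_DP <= dp <= MAX_DP


def filter_vcf(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only those data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line):
            yield line


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "SL2.40ch01\t1000\t.\tC\tT\t80\t.\t.\tGT:GQ:DP\t1/1:60:25",   # kept
    "SL2.40ch01\t2000\t.\tG\tA\t30\t.\t.\tGT:GQ:DP\t1/1:60:25",   # QUAL too low
    "SL2.40ch01\t3000\t.\tA\tT\t90\t.\t.\tGT:GQ:DP\t0/1:15:25",   # GQ too low
    "SL2.40ch01\t4000\t.\tC\tT\t90\t.\t.\tGT:GQ:DP\t1/1:60:150",  # depth too high
]
kept = [line for line in filter_vcf(example) if not line.startswith("#")]
```

In this example only the record at position 1000 is retained; the upper depth bound is what removes calls in regions where repeats inflate the read depth.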
Raw sequence data for Micro‐Tom MM(k) and Micro‐Tom NBRP(k) (Kobayashi et al., 2014; DRR000741, ERR340383 and ERR340384) were obtained from the DDBJ Sequence Read Archive (http://trace.ddbj.nig.ac.jp/dra), and the data were processed using the same methods described above.
In addition, CNVs were detected with CNV‐seq (Xie and Tammi, 2009; version 0.2.7; parameter of --genome-size 781666411) using the BAM files. The seven mutant lines as well as the two control lines (Kobayashi et al., 2014) were subjected to the CNV analysis as test lines using the wild‐type TOMJPF00001_1 as a reference line.
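The idea behind read‐depth CNV detection, comparing a test line against a reference line, can be sketched as follows. This is a simplified illustration and not the CNV‐seq algorithm: it uses non‐overlapping windows, normalizes only by total read count and computes no significance values. The function names, the window size and the read positions are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def window_counts(read_starts: Sequence[int], window_size: int) -> Counter:
    """Count reads per non-overlapping window, keyed by window index."""
    return Counter(pos // window_size for pos in read_starts)


def log2_ratios(test_starts: Sequence[int],
                ref_starts: Sequence[int],
                window_size: int) -> Dict[int, float]:
    """Return the log2(test/reference) read-count ratio for each window.

    Counts are scaled by total library size so that lines sequenced to
    different depths remain comparable. Windows without reference reads
    are skipped; windows without test reads are reported as -inf.
    """
    test = window_counts(test_starts, window_size)
    ref = window_counts(ref_starts, window_size)
    scale = len(ref_starts) / len(test_starts)
    ratios: Dict[int, float] = {}
    for window in sorted(set(test) | set(ref)):
        if ref[window] == 0:
            continue  # ratio undefined without reference coverage
        if test[window] == 0:
            ratios[window] = float("-inf")  # candidate homozygous deletion
        else:
            ratios[window] = math.log2(test[window] * scale / ref[window])
    return ratios


# Hypothetical leftmost mapping positions on one chromosome (0-based).
reference_reads = [10, 200, 500, 900,
                   1100, 1300, 1600, 1900,
                   2100, 2400, 2700, 2950]
test_reads = [50, 300, 600, 800,
              1000, 1100, 1200, 1300, 1500, 1700, 1800, 1990]

ratios = log2_ratios(test_reads, reference_reads, window_size=1000)
# Window 0 has equal coverage (ratio 0.0), window 1 has doubled coverage
# (ratio 1.0) and window 2 has no test reads (-inf).
```

A positive ratio marks a candidate copy number gain in the test line relative to the reference, and a negative ratio a candidate loss.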
Validation of the SNV, indel and CNV candidates
The template DNAs for the seven Micro‐Tom mutants and the wild‐type sample were prepared by amplifying 336 randomly selected SNV candidate loci with 336 primer pairs (Table S5) in 84 sets of 4‐plex PCR with the Multiplex PCR Kit (Qiagen). The primers were designed with Primer3 (version 2.2.3, parameters of PRIMER_PRODUCT_SIZE_RANGE = 401–500, PRIMER_OPT_SIZE = 23, PRIMER_MIN_SIZE = 20 and PRIMER_MAX_SIZE = 25) so that the SNV candidates lay 200 bases from the primer binding sites, and the thermal cycling conditions were as follows: a 15‐min initial denaturation at 95 °C; 35 cycles of a 30‐s denaturation at 94 °C, a 90‐s annealing at 60 °C and a 90‐s extension at 72 °C; and a 10‐min final extension at 72 °C. The amplicons were pooled, and 400‐ to 600‐bp fragments were selected on a 2% agarose gel and purified with the Gel Extraction Kit (Qiagen). The DNA samples were used for library preparation with the GS Titanium Rapid Library Preparation Kit and GS Titanium Rapid Library MID Adaptors Kit, and for sequencing using pyrosequencing technology with GS Junior (Roche, Basel, Switzerland). The sequence reads were mapped to the tomato reference sequences (SL2.40) to detect mutations with gsMapper in Newbler (Roche; version 2.9, parameters of -ud).
The indel candidates were validated by a fluorescent fragment length analysis with a fluorescent fragment analyser, ABI‐3730xl, as described by Shirasawa et al. (2010b). The sequence information for the primer pairs is available in Table S5. The data were analysed with GeneMarker (SoftGenetics, State College, PA).
The CNV candidates were validated by PCR with primer pairs designed at 1‐kb or 10‐kb intervals in the candidate regions (Table S5). PCR followed by gel electrophoresis was performed as described by Shirasawa et al. (2010b). 
In the subsequent sequencing analysis, PCR fragments of 8 and 2.3 kb were amplified from EMS bulk set C_2 with the primer pair 5′‐TCTTAACGGACGTCGACAAA‐3′ and 5′‐TCGATCGGAAGTCATTTCAC‐3′, and from TOMJPG1269_1 with the primer pair 5′‐AAAAATGAACAAAGAGAAAGGTGA‐3′ and 5′‐TTTTCCATCCCCAATCATGT‐3′, respectively. The amplified fragments were fragmented with the HydroShear DNA Shearing System (Digilab, Marlborough, MA) and ligated into the pGEM‐T Easy Vector (Promega, Madison, WI) or pUC118 vector (Agilent Technologies, Santa Clara, CA). Nucleotide sequences of the plasmids were determined with the BigDye Terminator v3.1 Cycle Sequencing Kit (Life Technologies) on an ABI‐3730xl DNA sequencer (Life Technologies), and the sequence data were processed with Phred/Phrap (Ewing and Green, 1998a; Ewing et al., 1998b) and Sequencher (Gene Codes Corporation, Ann Arbor, MI) and subjected to similarity searches against the tomato reference genome sequence (SL2.40) using BLAST (Altschul et al., 1997).
Clustering analysis of the Micro‐Tom lines
In addition to the SNP genotyping data of eight mutant and control lines, as well as those of Micro‐Tom MM(k) and Micro‐Tom NBRP(k) (Kobayashi et al., 2014), the data of six lines, that is Micro‐Tom AM, Micro‐Tom MM(h), Micro‐Tom NBRP(h), Micro‐Tom KDRI, Micro‐Tom TGRC and Micro‐Tom Brazil, were obtained from the Tomato Functional SNP DataBase (Hirakawa et al., 2013; http://www.kazusa.or.jp/tomato) and Sol Genomics Network (Mueller et al., 2005; http://www.sgn.cornell.edu). Genetic distances between all pairwise combinations of lines were calculated with Jaccard's method implemented in GGT2 (van Berloo, 2008), and a dendrogram was established using the neighbour‐joining method in MEGA5 (Tamura et al., 2011).
Figure S1 A dendrogram of Micro‐Tom lines based on genetic distances calculated by the neighbor‐joining method.
Figure S2 Number of sequence variations shared across the three Micro‐Tom lines of JPN, FRA, and NIVTS.
Figure S3 Numbers of sequence variations specific to the three Micro‐Tom lines within non‐overlapping windows of 1 Mb size.
Figure S4 Genome positions of copy number variations.
Table S1 Statistics of the resequencing data.
Table S2 Number of sequence variations in the mutant genomes.
Table S3 Number of sequence variations for each sequence ontology.
Table S4 Copy number variations in the mutant and wild‐type genomes.
Table S5 Primer sequences for SNV, indel, and CNV validations.