Ji-Eun Kim, Sang-Keun Oh, Jeong-Hee Lee, Bo-Mi Lee, Sung-Hwan Jo.
Abstract
The tomato (Solanum lycopersicum L.) is a model plant for genome research in the Solanaceae and for studies of crop breeding. Genome-wide single nucleotide polymorphisms (SNPs) are a valuable resource in genetic research and breeding. However, most methods for discovering genome-wide SNPs require expensive high-depth sequencing. Here, we describe a method for SNP calling that uses a modified version of SAMtools with improved sensitivity. We analyzed 90 Gb of raw next-generation sequencing data from two resequencing and seven transcriptome data sets covering several tomato accessions. Our study identified 4,812,432 non-redundant SNPs. Moreover, the SNP calling workflow was improved by aligning the reference genome with its own raw data. Using this approach, 131,785 SNPs were discovered from the transcriptome data of seven accessions. In addition, 4,680,647 SNPs were identified from the genome of S. pimpinellifolium, more than 60 times the 71,637 SNPs identified from the PI212816 transcriptome. SNP distribution was compared between the whole genome and the transcriptome of S. pimpinellifolium. We also surveyed the location of SNPs within genic and intergenic regions. Our results indicate that sufficient genome-wide SNP markers and a highly sensitive SNP calling method allow for the application of marker-assisted breeding and genome-wide association studies.
Year: 2014 PMID: 24552708 PMCID: PMC3907006 DOI: 10.14348/molcells.2014.2241
Source DB: PubMed Journal: Mol Cells ISSN: 1016-8478 Impact factor: 5.034
Summary of sequencing data and statistics obtained from mapping against the tomato reference genome
| Platform | Accession name | Accession no. (SRX#) | Total raw bases | Total raw reads | Reads after trimming | Reads mapped |
|---|---|---|---|---|---|---|
| HiSeq 2000 | | 118405 | 40,049,336,282 | 396,528,082 | 378,384,282 | 366,065,700 |
| GAII | | 032869 | 39,527,019,832 | 391,356,632 | 320,611,344 | 285,949,799 |
| GAIIx | PI212816 SE1 | 111861 | 921,292,920 | 15,354,882 | 12,807,056 | 12,223,430 |
| GAIIx | PI212816 SE2 | 111862 | 1,554,019,740 | 18,500,235 | 15,596,070 | 14,707,350 |
| GAIIx | PI114490 | 111858 | 1,809,075,540 | 30,151,259 | 26,055,403 | 25,613,628 |
| GAIIx | T5 | 111853 | 1,708,971,720 | 28,482,862 | 25,350,357 | 24,903,243 |
| GAIIx | OH9242 | 111849 | 1,353,093,900 | 22,551,565 | 20,993,463 | 20,369,579 |
| GAIIx | NC84173 | 111845 | 1,383,236,700 | 23,053,945 | 21,504,517 | 20,862,511 |
| GAIIx | FL7600 | 111557 | 1,702,107,120 | 28,368,452 | 25,681,817 | 25,130,637 |
| GS FLX | | 036616 | 39,349,666 | 150,688 | 47,513 | 19,093 |
| GS FLX | | 036614 | 41,558,631 | 209,378 | 206,577 | 23,963 |
| GS FLX | | 036612 | 23,888,768 | 46,661 | 45,909 | 21,949 |
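The mapping rate of each data set follows from the last two columns of the table. A minimal sketch in Python, using three rows copied from the table; the `mapping_rate` helper and the row labels are illustrative assumptions, not part of the authors' pipeline:

```python
# (reads after trimming, reads mapped), copied from three rows of the table.
rows = {
    "SRX118405 (HiSeq 2000)": (378_384_282, 366_065_700),
    "SRX032869 (GAII)": (320_611_344, 285_949_799),
    "SRX111861 (GAIIx, PI212816 SE1)": (12_807_056, 12_223_430),
}


def mapping_rate(trimmed: int, mapped: int) -> float:
    """Percentage of trimmed reads that mapped to the reference genome."""
    return 100.0 * mapped / trimmed


rates = {name: mapping_rate(t, m) for name, (t, m) in rows.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of trimmed reads mapped")
```

The same helper can be applied to the remaining rows to compare platforms.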
Summary of the genetic diversity of the tomato reference genome according to sequence coverage
| Coverage | Method | Homo-type SNPs (%) | Hetero-type SNPs (%) | Total SNPs |
|---|---|---|---|---|
| 2× | Pileup | 4,492 (5.1%) | 84,441 (94.9%) | 88,933 |
| 2× | New_Pileup | 12,874 (17.5%) | 60,645 (82.5%) | 73,519 |
| 10× | Pileup | 5,550 (4.1%) | 129,732 (95.9%) | 135,282 |
| 10× | New_Pileup | 11,727 (4.7%) | 237,800 (95.3%) | 249,527 |
| 20× | Pileup | 5,753 (5.1%) | 108,110 (94.9%) | 113,863 |
| 20× | New_Pileup | 11,100 (6.0%) | 172,871 (94.0%) | 183,971 |
| 40× | Pileup | 6,052 (7.3%) | 77,110 (92.7%) | 83,162 |
| 40× | New_Pileup | 10,801 (6.3%) | 159,585 (93.7%) | 170,386 |
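To compare the two callers at each coverage level, the total SNP counts can be turned into a New_Pileup-to-Pileup ratio. The sketch below uses only the totals reported in the table; the variable names are illustrative:

```python
# (Pileup total, New_Pileup total) per coverage level, copied from the table.
totals = {
    "2x": (88_933, 73_519),
    "10x": (135_282, 249_527),
    "20x": (113_863, 183_971),
    "40x": (83_162, 170_386),
}

# Ratio above 1 means New_Pileup reported more SNPs than Pileup.
ratios = {cov: new / old for cov, (old, new) in totals.items()}

for cov, ratio in ratios.items():
    print(f"{cov}: New_Pileup / Pileup = {ratio:.2f}")
```

On these figures New_Pileup reports fewer SNPs in total at 2× coverage but more at 10×, 20× and 40×, with the largest ratio at 40×.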
Fig. 1. Venn diagram of SNPs according to the sequence coverage of the raw data. (A) Homo-type SNPs in pileup, (B) homo-type SNPs in new pileup, (C) hetero-type SNPs in pileup, (D) hetero-type SNPs in new pileup. Colored lowercase letters a, b, c, and d indicate raw data sets representing 2×, 10×, 20× and 40× genome coverage, respectively. Numbers under the colored lowercase letters represent the number of SNPs.
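The regions of a four-set Venn diagram like the one in Fig. 1 can be computed with plain set operations. The sketch below is a generic illustration: the SNP positions are made up, and `venn_regions` is a hypothetical helper rather than the authors' code.

```python
from itertools import combinations

# Hypothetical SNP calls as (chromosome, position) per coverage level.
calls = {
    "2x": {("ch01", 100), ("ch01", 250)},
    "10x": {("ch01", 100), ("ch01", 250), ("ch02", 40)},
    "20x": {("ch01", 100), ("ch02", 40), ("ch03", 7)},
    "40x": {("ch01", 100), ("ch02", 40)},
}


def venn_regions(sets):
    """Map each combination of set names to the items found in exactly those sets."""
    names = list(sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions


regions = venn_regions(calls)
for combo, snps in regions.items():
    print(" & ".join(combo), "->", len(snps))
```

Each SNP lands in exactly one region, so the region sizes sum to the size of the union of all four sets.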
Statistics of SNPs called from one resequencing and seven transcriptome data sets
| Accession name | Total # of SNP | Homo-type: total | Homo-type: intergenic | Homo-type: exon | Homo-type: intron | Hetero-type: total | Hetero-type: intergenic | Hetero-type: exon | Hetero-type: intron |
|---|---|---|---|---|---|---|---|---|---|
| S. pimpinellifolium (genome) | 4,680,647 | 4,210,454 (89.9%) | 3,853,232 (91.5%) | 108,637 (2.6%) | 248,585 (5.9%) | 470,193 (10.1%) | 432,796 (92.0%) | 17,491 (3.8%) | 19,906 (4.2%) |
| PI212816 | 71,637 | 66,410 (92.7%) | 14,568 (21.9%) | 48,987 (73.8%) | 2,855 (4.3%) | 5,227 (7.3%) | 1,129 (21.6%) | 4,008 (76.7%) | 90 (1.7%) |
| PI114490 | 23,902 | 17,868 (74.8%) | 4,211 (23.6%) | 12,877 (72.1%) | 780 (4.4%) | 6,034 (25.2%) | 1,344 (22.3%) | 4,557 (75.5%) | 133 (2.2%) |
| T5 | 9,544 | 4,780 (50.1%) | 1,210 (25.3%) | 3,339 (69.9%) | 231 (4.8%) | 4,764 (49.9%) | 1,090 (22.9%) | 3,593 (75.4%) | 81 (1.7%) |
| OH9242 | 8,313 | 5,712 (68.7%) | 1,222 (21.4%) | 4,254 (74.5%) | 236 (4.1%) | 2,601 (31.3%) | 552 (21.2%) | 1,989 (76.5%) | 60 (2.3%) |
| NC84173 | 7,744 | 5,203 (67.2%) | 1,218 (23.4%) | 3,766 (72.4%) | 219 (4.2%) | 2,541 (32.8%) | 508 (20.0%) | 1,977 (77.8%) | 56 (2.2%) |
| FL7600 | 10,466 | 6,501 (62.1%) | 1,665 (25.6%) | 4,537 (69.8%) | 299 (4.6%) | 3,965 (37.9%) | 844 (21.3%) | 3,048 (76.9%) | 73 (1.8%) |
| M82 | 179 | 80 (44.7%) | 10 (12.5%) | 68 (85.0%) | 2 (2.5%) | 99 (55.3%) | 16 (16.2%) | 82 (82.8%) | 1 (1.0%) |
Intergenic region is defined as DNA sequences located between genes within the genome.
Genic region consists of exons and introns.
Exon includes the 3′-UTR, 5′-UTR, and coding regions.
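The per-accession totals in this table can be checked against the figures quoted in the abstract. The sketch below uses only the "Total # of SNP" column and the genome-derived count of 4,680,647; the variable names are illustrative:

```python
# Total SNPs per transcriptome accession, copied from the table.
transcriptome = {
    "PI212816": 71_637,
    "PI114490": 23_902,
    "T5": 9_544,
    "OH9242": 8_313,
    "NC84173": 7_744,
    "FL7600": 10_466,
    "M82": 179,
}
genome = 4_680_647  # SNPs called from the resequencing (genome) data set

transcriptome_total = sum(transcriptome.values())
combined = genome + transcriptome_total
fold = genome / transcriptome["PI212816"]

print(f"transcriptome SNPs: {transcriptome_total:,}")
print(f"genome + transcriptome SNPs: {combined:,}")
print(f"genome vs. PI212816 transcriptome: {fold:.1f}-fold")
```

The two sums reproduce the 131,785 transcriptome SNPs and the 4,812,432 SNPs overall that the abstract reports.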
Fig. 2. SNP distribution and density in S. pimpinellifolium. (A) The distribution of total SNPs across the 12 chromosomes of S. pimpinellifolium, shown as homo- and hetero-type SNPs per chromosome. (B) The density of SNPs in the 12 chromosomes of S. pimpinellifolium. The density was calculated as the average number of SNPs within a 1 kb region of each chromosome.
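The density in panel (B) is described as the average number of SNPs per 1 kb of each chromosome. A minimal sketch of that calculation follows; the chromosome names, SNP counts and lengths are hypothetical placeholders, since the record does not list them.

```python
def snp_density_per_kb(snp_count: int, chrom_length_bp: int) -> float:
    """Average number of SNPs per 1 kb over a whole chromosome."""
    return snp_count / (chrom_length_bp / 1_000)


# Hypothetical (SNP count, chromosome length in bp) pairs for illustration only.
chromosomes = {
    "chrA": (400_000, 90_000_000),
    "chrB": (250_000, 50_000_000),
}

densities = {
    name: snp_density_per_kb(count, length)
    for name, (count, length) in chromosomes.items()
}
for name, density in densities.items():
    print(f"{name}: {density:.2f} SNPs per kb")
```

A windowed variant that counts SNPs in consecutive 1 kb bins would show local variation, but the caption only describes the per-chromosome average.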
Fig. 3. The distribution of SNPs detected with (A) resequencing and (B) transcriptome data along the 12 chromosomes of S. pimpinellifolium. Homo- and hetero-type SNPs show varied distributions across chromosomes. The left y-axis represents the number of SNPs, while the right y-axis indicates the gene count. The x-axis represents the length (Mb) of each chromosome. Gray shaded boxes in (B) mark regions with low gene counts.