Hyeonso Ji, Yunji Shin, Chaewon Lee, Hyoja Oh, In Sun Yoon, Jeongho Baek, Young-Soon Cha, Gang-Seob Lee, Song Lim Kim, Kyung-Hwan Kim.
Abstract
Next-generation sequencing technologies have enabled the discovery of numerous sequence variations among closely related crop varieties. We analyzed genome resequencing data from 24 Korean temperate japonica rice varieties and discovered 954,233 sequence variations, including 791,121 single nucleotide polymorphisms (SNPs) and 163,112 insertions/deletions (InDels). On average, there was one variant per 391 base pairs (bp), corresponding to a variant density of 2.6 per 1 kbp. Of the InDels, 10,860 were longer than 20 bp, long enough to be converted into markers resolvable on an agarose gel. The effect of each variant on gene function was predicted using the SnpEff program. The variants were categorized into four groups according to their impact: high, moderate, low, and modifier. These groups contained 3524 (0.4%), 27,656 (2.9%), 24,875 (2.6%), and 898,178 (94.1%) variants, respectively. To test the accuracy of these data, eight InDels from a pre-harvest sprouting resistance QTL (qPHS11) target region, four highly polymorphic InDels, and four functional sequence variations in known agronomically important genes were selected and successfully developed into markers. These results will be useful for developing markers for marker-assisted selection, selecting candidate genes in map-based cloning, and producing efficient high-throughput genome-wide genotyping systems for Korean temperate japonica rice varieties.
Keywords: InDel; SNP; japonica; marker; resequencing; rice; variation
Year: 2021 PMID: 34828355 PMCID: PMC8623644 DOI: 10.3390/genes12111749
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Number of variants per chromosome.
| Chromosome | No. of SNP | No. of InDels | No. of Variants 1 | Variant Rate 2 | Variant Density 3 |
|---|---|---|---|---|---|
| 1 | 51,577 | 12,651 | 64,228 | 673.7 | 1.5 |
| 2 | 40,194 | 9828 | 50,022 | 718.4 | 1.4 |
| 3 | 24,705 | 6669 | 31,374 | 1160.6 | 0.9 |
| 4 | 51,392 | 12,223 | 63,615 | 558.1 | 1.8 |
| 5 | 16,300 | 4302 | 20,602 | 1454.2 | 0.7 |
| 6 | 89,193 | 15,998 | 105,191 | 297.1 | 3.4 |
| 7 | 41,475 | 10,382 | 51,857 | 572.7 | 1.7 |
| 8 | 103,970 | 18,501 | 122,471 | 232.2 | 4.3 |
| 9 | 40,190 | 8938 | 49,128 | 468.4 | 2.1 |
| 10 | 74,963 | 13,057 | 88,020 | 263.7 | 3.8 |
| 11 | 168,447 | 33,650 | 202,097 | 143.6 | 7.0 |
| 12 | 88,715 | 16,913 | 105,628 | 260.6 | 3.8 |
| All | 791,121 | 163,112 | 954,233 | 391.1 | 2.6 |
1 Sum of SNPs and InDels; 2 mean base pair length within which a variant occurs; 3 mean number of variants per 1 kbp.
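The two derived columns in the table above are reciprocals of each other: variant density (variants per 1 kbp) is 1000 divided by variant rate (bp per variant). The following is a minimal sketch that recomputes the density column from the rate column. The variable and function names are illustrative, and the implied genome length at the end is a back-calculation from the "All" row, not a figure stated in the source.

```python
# Variant rate (mean bp per variant), copied from the per-chromosome table.
variant_rate_bp = {
    "1": 673.7, "2": 718.4, "3": 1160.6, "4": 558.1,
    "5": 1454.2, "6": 297.1, "7": 572.7, "8": 232.2,
    "9": 468.4, "10": 263.7, "11": 143.6, "12": 260.6,
    "All": 391.1,
}


def density_per_kbp(rate_bp: float) -> float:
    """Convert a variant rate (bp per variant) into variants per 1 kbp."""
    return 1000.0 / rate_bp


# Recompute the density column, rounded to one decimal as in the table.
densities = {
    chrom: round(density_per_kbp(rate), 1)
    for chrom, rate in variant_rate_bp.items()
}

# Back-calculation (an assumption, not reported in the source):
# total variants x mean spacing approximates the length of the
# sequence over which variants were counted.
implied_genome_bp = 954_233 * variant_rate_bp["All"]

for chrom, density in densities.items():
    print(f"chr {chrom}: {density} variants per kbp")
print(f"implied length: {implied_genome_bp / 1e6:.1f} Mbp")
```

Chromosome 11 stands out in this view, with roughly ten times the variant density of chromosome 5.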
Figure 1. Distributions of sequence variation and nucleotide diversity per 100 kbp on each of the 12 rice chromosomes. The X-axis shows the physical distance along each chromosome in megabase pairs (Mbp). The left-hand Y-axis shows the common logarithm of the number of variations; blue bars show variation frequency. The right-hand Y-axis shows nucleotide diversity (π) within 100 kbp windows, represented by the orange line. The positions of well-known agronomically important genes harboring sequence variations in the 24 Korean temperate japonica rice varieties are indicated by red arrows.
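The source does not state how π was calculated for each window. As a hedged illustration only, the sketch below uses the standard estimator of nucleotide diversity: per-site expected heterozygosity with the n/(n-1) sample correction, summed over variant sites and divided by the window length. It assumes one allele call per variety (treating inbred lines as homozygous); the function names and the example calls are hypothetical.

```python
from collections import Counter


def site_diversity(alleles):
    """Unbiased per-site diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    if n < 2:
        return 0.0
    counts = Counter(alleles)
    sum_p_squared = sum(c * c for c in counts.values()) / (n * n)
    return (n / (n - 1)) * (1.0 - sum_p_squared)


def window_pi(sites, window_bp=100_000):
    """Nucleotide diversity of one window.

    `sites` holds one allele list per variant site in the window;
    invariant positions contribute zero, so only variant sites are summed.
    """
    return sum(site_diversity(alleles) for alleles in sites) / window_bp


# Hypothetical window with two variant sites across 24 varieties.
balanced_site = ["A"] * 12 + ["G"] * 12   # allele split 12:12
rare_site = ["C"] * 23 + ["T"]            # allele split 23:1
pi = window_pi([balanced_site, rare_site])
print(f"pi = {pi:.3e}")
```

A site split 12:12 contributes 12/23 and a singleton contributes 1/12, so common variants dominate the window value.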
Figure 2. Distribution of InDel sizes. Negative values are deletions, and positive values are insertions.
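The sign convention in Figure 2 and the "longer than 20 bp" criterion from the abstract can be sketched as follows. This is an illustration under assumptions: it takes VCF-style REF and ALT allele strings, defines size as the ALT length minus the REF length, and uses made-up records. It is not the authors' pipeline.

```python
def indel_size(ref: str, alt: str) -> int:
    """Signed InDel size: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)


def gel_candidates(variants, min_abs_size=21):
    """Keep InDels whose absolute size exceeds 20 bp.

    `variants` is an iterable of (chrom, pos, ref, alt) tuples.
    Returns (chrom, pos, signed_size) for each retained InDel.
    """
    selected = []
    for chrom, pos, ref, alt in variants:
        size = indel_size(ref, alt)
        if abs(size) >= min_abs_size:
            selected.append((chrom, pos, size))
    return selected


# Hypothetical records for illustration only.
records = [
    ("chr11", 1000, "A", "A" + "T" * 25),   # 25 bp insertion
    ("chr11", 2000, "GTC", "G"),            # 2 bp deletion, too small
    ("chr08", 3000, "C" + "AT" * 15, "C"),  # 30 bp deletion
]
candidates = gel_candidates(records)
print(candidates)
```

The threshold is exposed as a parameter because the size difference that can be resolved depends on the gel concentration and amplicon length.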
Figure 3. Development of markers based on sequence variation between 24 Korean japonica rice varieties. (a) Development of InDel markers in the qPHS11 region. (b) Development of markers based on highly polymorphic InDels. (c) Development of gene-based markers; gene names are given on the right-hand side of the photograph. M: standard size markers; 1–24 represent the varieties Cheongho, Dami, Dongan, Dongjin, Giho, Haechanmulgyeol, Hiami, Hwacheong, Hwayeong, Ilpum, Jinbu43, Jopyeong, Joun, Junam, Nampyeong, Odae, Saeilmi, Saenuri, Samgwang, Seogan, Seomyeong, Sindongjin, Sobi, and Unbong40, respectively.
Classification of variants by predicted effects on gene function.
| Chromosome | High Impact | Moderate Impact | Low Impact | Modifier Impact | Sum 1 |
|---|---|---|---|---|---|
| 1 | 350 | 2747 | 2360 | 58,771 | 64,228 (6.7%) |
| 2 | 254 | 2088 | 1779 | 45,901 | 50,022 (5.2%) |
| 3 | 76 | 632 | 591 | 30,075 | 31,374 (3.3%) |
| 4 | 310 | 2405 | 2119 | 58,781 | 63,615 (6.7%) |
| 5 | 61 | 557 | 581 | 19,403 | 20,602 (2.2%) |
| 6 | 238 | 2057 | 1792 | 101,104 | 105,191 (11.0%) |
| 7 | 187 | 1618 | 1489 | 48,563 | 51,857 (5.4%) |
| 8 | 393 | 2698 | 2530 | 116,850 | 122,471 (12.8%) |
| 9 | 198 | 1429 | 1322 | 46,179 | 49,128 (5.1%) |
| 10 | 266 | 2250 | 2009 | 83,495 | 88,020 (9.2%) |
| 11 | 903 | 6906 | 6074 | 188,214 | 202,097 (21.2%) |
| 12 | 288 | 2269 | 2229 | 100,842 | 105,628 (11.1%) |
| Total 2 | 3524 (0.4%) | 27,656 (2.9%) | 24,875 (2.6%) | 898,178 (94.1%) | 954,233 |
1 Number (percentage of chromosome); 2 number (percentage of impact).
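The percentages quoted for the impact classes and for individual chromosomes follow directly from the counts in this table. The short sketch below recomputes them; the dictionary names are illustrative, and only two chromosomes are included to keep it brief.

```python
# Totals per impact class, from the "Total" row of the table.
impact_counts = {
    "HIGH": 3524,
    "MODERATE": 27_656,
    "LOW": 24_875,
    "MODIFIER": 898_178,
}
total_variants = sum(impact_counts.values())

# Share of each impact class among all variants, in percent.
impact_pct = {
    impact: round(100 * count / total_variants, 1)
    for impact, count in impact_counts.items()
}

# Share of all variants found on selected chromosomes, in percent.
chromosome_sums = {"5": 20_602, "11": 202_097}
chromosome_pct = {
    chrom: round(100 * count / total_variants, 1)
    for chrom, count in chromosome_sums.items()
}

print(total_variants)
print(impact_pct)
print(chromosome_pct)
```

The four class totals add up to the 954,233 variants reported in the abstract, which is a quick consistency check on the table.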
Classification of variants by their effects.
| Impact of SNP Effect | SNP Effect | No. |
|---|---|---|
| HIGH | Frameshift | 2518 |
| | Stop_gained | 478 |
| | Stop_lost | 147 |
| | Splice_acceptor_variant | 143 |
| | Splice_donor_variant | 127 |
| | Start_lost | 74 |
| | Gene_fusion | 34 |
| | Feature_ablation | 3 |
| | Sum | 3524 |
| MODERATE | Missense_variant | 25,436 |
| | Inframe_insertion/deletion | 2219 |
| | 5_prime_UTR_truncation&exon_loss_variant | 1 |
| | Sum | 27,656 |
| LOW | Synonymous_variant | 19,629 |
| | Splice_region_variant | 3481 |
| | 5_prime_UTR_premature_start_codon_gain_variant | 1736 |
| | Stop_retained_variant | 25 |
| | initiator_codon_variant | 4 |
| | Sum | 24,875 |
| MODIFIER | upstream_gene_variant | 361,453 |
| | intergenic_region | 301,015 |
| | downstream_gene_variant | 144,322 |
| | intron_variant | 48,461 |
| | 3_prime_UTR_variant | 24,980 |
| | 5_prime_UTR_variant | 13,281 |
| | non_coding_transcript_exon_variant | 4663 |
| | intragenic_variant | 3 |
| | Sum | 898,178 |
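A tally like the one above can be produced from a SnpEff-annotated VCF, in which each variant carries an `ANN` entry in the INFO column with pipe-separated sub-fields (allele, annotation, impact, and so on). The sketch below counts one impact per variant using the first `ANN` entry. That choice is an assumption: the source does not say how variants with several annotations were assigned to a class. The example records are made up and their annotation strings are shortened.

```python
from collections import Counter
from typing import Iterable, Optional


def first_impact(info: str) -> Optional[str]:
    """Return the impact of the first ANN entry in a VCF INFO string, if present."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            first_annotation = field[len("ANN="):].split(",")[0]
            parts = first_annotation.split("|")
            # ANN sub-fields start: Allele | Annotation | Annotation_Impact | ...
            return parts[2] if len(parts) > 2 else None
    return None


def tally_impacts(vcf_lines: Iterable[str]) -> Counter:
    """Count variants per impact class, skipping headers and unannotated records."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue
        impact = first_impact(columns[7])
        if impact:
            counts[impact] += 1
    return counts


# Hypothetical, shortened records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "chr11\t100\t.\tA\tG\t50\tPASS\tANN=G|missense_variant|MODERATE|geneA",
    "chr11\t200\t.\tC\tT\t50\tPASS\tDP=30;ANN=T|stop_gained|HIGH|geneB,T|upstream_gene_variant|MODIFIER|geneC",
    "chr08\t300\t.\tG\tC\t50\tPASS\tANN=C|intergenic_region|MODIFIER|geneA-geneB",
    "chr08\t400\t.\tT\tA\t50\tPASS\tDP=12",
]
impact_tally = tally_impacts(example_vcf)
print(impact_tally)
```

Counting every annotation instead of the first would inflate the MODIFIER class, since a single variant is often upstream or downstream of several genes.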
Summary of sequence variations in well-known agronomically important genes.
| Gene Name | Gene ID | Trait | No. of Variation Sites 1 | Reference |
|---|---|---|---|---|
| | Os01g0620100 | Cold tolerance | 2 | |
| | Os01g0633500 | Grain color | 3 | |
| | Os01g0846450 | Days to heading | 1 | |
| | Os01g0883800 | Culm length | 2 | |
| | Os02g0787300 | Grain size | 1 | |
| | Os03g0407400 | Grain size | 1 | |
| | Os03g0762000 | Days to heading | 1 | |
| | Os04g0401000 | Blast disease resistance | 1 | |
| | Os04g0615000 | Leaf width | 1 | |
| | Os06g0142600 | Days to heading | 1 | |
| | Os06g0275000 | Days to heading | 5 | |
| | Os07g0580500 | Plant architecture | 1 | |
| | Os08g0143400 | Days to heading | 1 | |
| | Os11g0187200 | Days to heading | 4 | |
| | Os11g0225100 | Blast disease resistance | 20 | |
| | Os11g022530 | Blast disease resistance | 21 | |
| | Os11g0598500 | Blast disease resistance | 29 | |
| | Os12g0285100 | Blast disease resistance | 16 | |
1 Number of high- or moderate-impact effect variation sites.
Figure 4. Structure and phylogeny analysis of 24 Korean temperate japonica rice varieties. (a) Assignment of 24 Korean temperate japonica rice varieties into three populations (A, B, and C) using STRUCTURE 2.3.4 software. The different colors represent different populations. (b) Phylogenetic tree of 24 Korean temperate japonica rice varieties. The phylogenetic tree was inferred using the maximum likelihood method and Tamura–Nei model. The tree with the highest log likelihood is shown.