Roven Rommel Fuentes, Dmytro Chebotarov, Jorge Duitama, Sean Smith, Juan Fernando De la Hoz, Marghoob Mohiyuddin, Rod A Wing, Kenneth L McNally, Tatiana Tatarinova, Andrey Grigoriev, Ramil Mauleon, Nickolai Alexandrov.
Abstract
Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 million allelic variants. We found enrichment of long SVs in promoters and an excess of shorter variants in 5' UTRs. Across the rice genomes, we identified regions of high SV frequency enriched in stress response genes. We demonstrated how SVs may help in finding causative variants in genome-wide association analysis. These new insights into rice genome biology are valuable for understanding the effects SVs have on gene function, with the prospect of identifying novel agronomically important alleles that can be utilized to improve cultivated rice.
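The abstract does not say how individual SV calls were grouped into allelic variants. As a minimal sketch of one common approach, the Python below groups same-type calls whose intervals share a high reciprocal overlap. The `SVCall` type, the `cluster_calls` function, the 0.8 threshold, and the greedy comparison against each cluster's first member are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SVCall:
    """One SV call in one sample (hypothetical record layout)."""
    sample: str
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    sv_type: str  # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Overlap length divided by the longer interval; 0.0 if disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def cluster_calls(calls: Iterable[SVCall], min_ro: float = 0.8) -> List[List[SVCall]]:
    """Greedily assign each call to the first cluster whose representative
    (its first member) has the same SV type and reciprocal overlap >= min_ro.
    The threshold is an assumed value chosen for illustration."""
    clusters: List[List[SVCall]] = []
    ordered = sorted(calls, key=lambda c: (c.sv_type, c.chrom, c.start, c.end))
    for call in ordered:
        for cluster in clusters:
            rep = cluster[0]
            if rep.sv_type == call.sv_type and reciprocal_overlap(rep, call) >= min_ro:
                cluster.append(call)
                break
        else:
            # No existing cluster matched, so this call starts a new one.
            clusters.append([call])
    return clusters
```

Under these assumptions, the number of distinct samples in a cluster gives a per-variant frequency, and the number of clusters gives the count of candidate allelic variants. The scan over all clusters is quadratic in the worst case, so a data set of this size would need an indexed or sweep-line implementation.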
Year: 2019 PMID: 30992303 PMCID: PMC6499320 DOI: 10.1101/gr.241240.118
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Distribution of structural variants per SV type
Figure 1. Distribution and classification of SVs. (A) Frequency of observations per SV cluster. Only 562 high-coverage samples were used for insertion detection. (B) Distribution of variant sizes by SV type. (C) Classification of variants in each peak (cluster frequency > 10 samples). (D) Frequencies of events with 98% sequence identity to known or potentially active TEs in rice.
Figure 2. Structure analysis based on selected CNVs and assuming K = [2, …, 9] subpopulations.
Figure 3. SVs in genome features. (A) Enrichment/depletion of deletions (green) and insertions (orange) in various genomic regions. As expected, genic regions have fewer SVs than intergenic ones, with CDSs and exons being the most conserved regions. (B) Distribution of deletion and insertion clusters near the transcription start site (TSS). Although the total number of SNPs is much larger than SV clusters, SVs affect more positions. The bump at about −366 bp just before the core promoter is explained by longer SVs associated with transposons. (C) Distribution of the number of deletions in the vicinities of start and end of transcription and translation (Supplemental Fig. S16). (D) P-values of the independence tests between predicted TFBS and deletions. Strong anti-correlation is observed at the TSS and ∼100 bp upstream. Distribution of P-values shows that in the core promoter area ([TSS-200, TSS]), deletions and TFBS are not independent.
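The caption for panel D does not name the independence test that was used. As a hedged illustration of how such a P-value could be obtained at a single position relative to the TSS, the sketch below applies Pearson's chi-square test (1 degree of freedom, no continuity correction) to a 2x2 table of TFBS presence versus deletion presence across promoters. The function name `chi2_independence_2x2`, the boolean per-promoter inputs, and the choice of test are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def chi2_independence_2x2(
    has_tfbs: Sequence[bool], has_deletion: Sequence[bool]
) -> Tuple[float, float]:
    """Pearson chi-square test of independence for two binary variables.

    Each index is one promoter observed at a fixed offset from the TSS
    (an assumed data layout). Returns (chi-square statistic, P-value).
    """
    if len(has_tfbs) != len(has_deletion):
        raise ValueError("inputs must have the same length")
    n = len(has_tfbs)
    a = sum(1 for t, d in zip(has_tfbs, has_deletion) if t and d)          # TFBS and deletion
    b = sum(1 for t, d in zip(has_tfbs, has_deletion) if t and not d)      # TFBS only
    c = sum(1 for t, d in zip(has_tfbs, has_deletion) if not t and d)      # deletion only
    d_ = sum(1 for t, d in zip(has_tfbs, has_deletion) if not t and not d) # neither
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    if denom == 0:
        # A row or column is empty, so the test is undefined; report no evidence.
        return 0.0, 1.0
    chi2 = n * (a * d_ - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Repeating such a test at each offset in a window around the TSS would yield a profile of P-values comparable in shape to the one described, although the test alone does not give the direction of the association; the sign of `a * d_ - b * c` would be needed to distinguish anti-correlation from co-occurrence.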
Figure 4. Deleted genes in variety groups. (A) Percentage of deleted genes in each variety group. (B) Number of deleted genes (frequency ≥ 5) that are unique or shared between variety groups. Note that the lower number of deleted genes in Japonica can be explained by the bias introduced by using the Nipponbare genome as a reference.