Mateusz Kołomański, Joanna Szyda, Magdalena Frąszczak, Magda Mielczarek.
Abstract
Copy number variants (CNVs) may cover up to 12% of the whole genome and have a substantial impact on phenotypes. We used 5,867 duplications and 33,181 deletions available from the 1000 Genomes Project to characterise genomic regions vulnerable to CNV formation and to identify sequence features characteristic of those regions. GC content was lower in deletions and higher in duplications than in randomly selected regions. In regions flanking deletions and downstream of duplications, GC content was higher than in the random sequences, but upstream of duplications it was lower. In duplications and in regions downstream of deletions, the percentage of low-complexity sequences did not differ from the randomised data. In deletions and upstream of CNVs it was higher, while downstream of duplications it was lower than in random sequences. The majority of CNVs intersected with genic regions, mainly with introns. GC content may be associated with CNV formation, and CNVs, especially duplications, are initiated in low-complexity regions. Moreover, the location of CNVs within introns, or their overlap with introns, indicates a role in shaping intron variability. Genic CNV regions were enriched in many essential biological processes such as cell adhesion, synaptic transmission, transport, cytoskeleton organization, immune response and metabolic mechanisms, which indicates that these large-scale variants play important biological roles.
Keywords: 1000 Genomes Project; Copy number variants; DNA sequence complexity; GC content
Year: 2022 PMID: 35590085 PMCID: PMC9365719 DOI: 10.1007/s13353-022-00704-0
Source DB: PubMed Journal: J Appl Genet ISSN: 1234-1983 Impact factor: 2.653
Fig. 1 Duplication and deletion length (bp)
Guanine-cytosine pair content (%) in the investigated regions
| Region | Min | Mean | Max | SD |
|---|---|---|---|---|
| Set 1 (randomised duplications) | 31.74 | 41.59 | 65.74 | 5.63 |
| Upstream duplications | 7.00 | 41.24 | 83.00 | 11.59 |
| Downstream duplications | 6.00 | 42.60 | 86.00 | 10.73 |
| Set 3 (randomised upstream and downstream duplications) | 1.00 | 41.42 | 84.00 | 10.54 |
| Set 2 (randomised deletions) | 20.56 | 41.54 | 77.19 | 6.53 |
| Upstream deletions | 0.00 | 41.82 | 84.00 | 10.53 |
| Downstream deletions | 0.00 | 42.05 | 89.00 | 10.47 |
| Set 4 (randomised upstream and downstream deletions) | 0.00 | 41.41 | 89.00 | 10.66 |
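
The GC percentages in the table above are per-region values. A minimal sketch of how such a value could be computed for a single sequence is given below. The function name `gc_content` is hypothetical, and excluding ambiguous bases (such as `N`) from the denominator is an assumption, since the record does not say how the authors handled them.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage.

    Assumption: only A, C, G and T are counted; ambiguous bases such
    as N are ignored rather than treated as non-GC.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        # No unambiguous bases: report 0.0 instead of dividing by zero.
        return 0.0
    return 100.0 * gc / total


if __name__ == "__main__":
    print(gc_content("ATGCGC"))
```

Applying such a function to each CNV, to each flanking region and to each randomised region would give distributions whose minimum, mean, maximum and standard deviation could be summarised as in the table.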
Fig. 2 Number of LCRs overlapping duplications and deletions
Content of low-complexity regions (LCR) within CNV-related regions
| Regions | Overlapped LCRs (min) | Overlapped LCRs (mean) | Overlapped LCRs (max) | LCR content, % (min) | LCR content, % (mean) | LCR content, % (max) |
|---|---|---|---|---|---|---|
| Set 1 (randomised duplications) | 0 | 58 | 114 | 0.00 | 4.49 | 59.44 |
| Upstream duplications | 0 | 0 | 3 | 0.00 | 4.73 | 100.00 |
| Downstream duplications | 0 | 0 | 2 | 0.00 | 3.47 | 100.00 |
| Set 3 (randomised upstream and downstream duplications) | 0 | 0 | 3 | 0.00 | 4.19 | 100.00 |
| Set 2 (randomised deletions) | 0 | 6 | 25 | 0.00 | 4.55 | 98.07 |
| Upstream deletions | 0 | 0 | 3 | 0.00 | 4.44 | 100.00 |
| Downstream deletions | 0 | 0 | 3 | 0.00 | 4.16 | 100.00 |
| Set 4 (randomised upstream and downstream deletions) | 0 | 0 | 4 | 0.00 | 4.36 | 100.00 |
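
The two quantities in the table above, the number of overlapped LCRs and the LCR content of a region, can be illustrated with a short interval calculation. The sketch below is not the authors' pipeline. The function name `lcr_overlap`, the half-open coordinate convention and the merging of overlapping LCRs before measuring coverage are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def lcr_overlap(start: int, end: int,
                lcrs: Iterable[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (number of overlapping LCRs, percent of region covered).

    Assumptions: the region and every LCR are half-open intervals
    [start, end) on the same chromosome, and overlapping LCRs are
    merged so that no base is counted twice.
    """
    length = end - start
    if length <= 0:
        raise ValueError("region must have positive length")

    # Keep only LCRs that intersect the region, clipped to its bounds.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in lcrs
        if s < end and e > start
    )

    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start

    return len(clipped), 100.0 * covered / length


if __name__ == "__main__":
    print(lcr_overlap(100, 200, [(90, 110), (105, 120), (150, 160)]))
```

Under these assumptions a region lying entirely inside one LCR yields 100% content, which is consistent with the maxima of 100.00 reported for the flanking regions.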
Fig. 3 Low-complexity sequence content in duplications and deletions (CNVs not overlapping any region are not presented)
Functional annotation of duplications and deletions
| Consequences for duplications | SO accession | Percent of variants |
|---|---|---|
| Intron variant | SO:0001627 | 25 |
| Transcript amplification | SO:0001889 | 24 |
| Coding sequence variant | SO:0001580 | 12 |
| Feature elongation | SO:0001907 | 8 |
| Non-coding transcript exon variant | SO:0001792 | 8 |
| 5-prime UTR variant | SO:0001623 | 7 |
| 3-prime UTR variant | SO:0001624 | 6 |
| Upstream gene variant | SO:0001631 | 3 |
| Downstream gene variant | SO:0001632 | 3 |
| NMD transcript variant | SO:0001621 | 2 |
| Non-coding transcript variant | SO:0001619 | 1 |
| Other | - | 1 |

| Consequences for deletions | SO accession | Percent of variants |
|---|---|---|
| Feature truncation | SO:0001906 | 31 |
| Intron variant | SO:0001627 | 31 |
| Non-coding transcript variant | SO:0001619 | 10 |
| Upstream gene variant | SO:0001631 | 4 |
| Non-coding transcript exon variant | SO:0001792 | 4 |
| Downstream gene variant | SO:0001632 | 4 |
| Transcript ablation | SO:0001893 | 4 |
| Intergenic variant | SO:0001628 | 3 |
| Coding sequence variant | SO:0001580 | 3 |
| NMD transcript variant | SO:0001621 | 3 |
| 5-prime UTR variant | SO:0001623 | 1 |
| 3-prime UTR variant | SO:0001624 | 1 |
| Stop lost | SO:0001578 | 1 |