Jing Chen, Jun-Tao Guo.
Abstract
Insertions and deletions (indels) represent one of the major variation types in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated the indels from two large samples of cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from the cancer whole exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), the key elements for regulation of gene expression that have been found in both coding and non-coding sequences. Compared to the germline indels in healthy genomes, somatic indels in cancer genomes contain more coding indels, with a higher than expected proportion of frame-shift (FS) indels. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, these somatic indels in cancer genomes tend to be located in sequences with important functions, where they can affect the core secondary structures of proteins, and they have a bigger overlap with predicted TFBSs in coding regions than the germline indels. The somatic CDS indels are also enriched in highly conserved nucleotides when compared with germline CDS indels.
Year: 2021 PMID: 34707120 PMCID: PMC8551294 DOI: 10.1038/s41598-021-00583-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Somatic transcript indels in BRCA and LUAD.
| Cancer type | # of total indels | Overlap with germline indels | # of somatic transcript indels* | Deletions | Insertions | # of transcripts with indels |
|---|---|---|---|---|---|---|
| BRCA | 109,856 | 18,391 (16.74%) | 61,543 | 36,109 (58.67%) | 25,434 (41.33%) | 14,519 |
| LUAD | 91,159 | 17,900 (19.64%) | 43,684 | 27,148 (62.15%) | 16,536 (37.85%) | 13,593 |
| BRCA ∩ LUAD | 16,909 | 3916 (23.16%) | 9988 | 5330 (53.36%) | 4658 (46.64%) | 6600 |
| Germline | 1,267,008 | – | 498,938 | 284,597 (57.04%) | 214,341 (42.96%) | 17,278 |
*The numbers of somatic transcript indels in BRCA and LUAD are indels on transcripts after removing the ones that overlap with germline indels.
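The footnote describes a filter-then-count step. The following is a minimal sketch of that logic, assuming each indel can be keyed by a hypothetical `(chromosome, position, ref, alt)` tuple and that "overlap" means an exact key match. The study's actual overlap criterion is not stated here, so this is an illustration only. The `percent` helper recomputes two percentages from the BRCA row of the table.

```python
def remove_germline(indels, germline):
    """Keep only indels whose key does not appear in the germline set."""
    germline_keys = set(germline)
    return [indel for indel in indels if indel not in germline_keys]


def percent(part, total):
    """Percentage rounded to two decimals, matching the table's precision."""
    return round(100.0 * part / total, 2)


# Toy, made-up records for illustration (not data from the study).
tumor = [
    ("chr1", 100, "A", "AT"),
    ("chr2", 200, "GC", "G"),
    ("chr3", 300, "T", "TA"),
]
germline = [("chr2", 200, "GC", "G")]
somatic = remove_germline(tumor, germline)

# Counts taken from the BRCA row of the table above.
brca_overlap_pct = percent(18_391, 109_856)   # overlap with germline indels
brca_deletion_pct = percent(36_109, 61_543)   # deletions among somatic transcript indels
```

Note that the deletion and insertion percentages use the number of somatic transcript indels as the denominator, whereas the germline-overlap percentage uses the total number of indels.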
Somatic coding (CDS) indels in BRCA and LUAD.
| Cancer type | # of transcripts with CDS indels | CDS indels* | Deletions | Insertions | FS indels | NFS indels |
|---|---|---|---|---|---|---|
| BRCA | 3979 | 5320 (8.64%) | 3078 (57.86%) | 2242 (42.14%) | 3947 (74.19%) | 1373 (25.81%) |
| LUAD | 5458 | 7813 (17.89%) | 5526 (70.73%) | 2287 (29.27%) | 6387 (81.75%) | 1426 (18.25%) |
| BRCA ∩ LUAD | 798 | 835 (8.36%) | 364 (43.59%) | 471 (56.41%) | 496 (59.40%) | 339 (40.60%) |
| Germline | 1180 | 1370 (0.62%) | 885 (64.60%) | 485 (35.40%) | 679 (49.56%) | 691 (50.44%) |
*The percentages are calculated against the number of transcript indels.
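The FS and NFS columns rest on the standard reading-frame rule: a coding indel shifts the frame when its net length change is not a multiple of three. The sketch below illustrates that rule, assuming VCF-style `ref` and `alt` allele strings that lie entirely within a coding sequence. The function names are hypothetical and this is not the authors' pipeline.

```python
def is_frameshift(ref, alt):
    """True when the net length change is not divisible by 3."""
    return abs(len(alt) - len(ref)) % 3 != 0


def indel_type(ref, alt):
    """Label an indel by whether the alternate allele is longer or shorter."""
    return "insertion" if len(alt) > len(ref) else "deletion"


def percent(part, total):
    """Percentage rounded to two decimals, matching the table's precision."""
    return round(100.0 * part / total, 2)


# Toy alleles for illustration (not data from the study).
one_base_insertion = is_frameshift("A", "AT")      # +1 base: frame-shift
three_base_deletion = is_frameshift("ACGT", "A")   # -3 bases: non-frame-shift

# Counts taken from the BRCA row of the table above.
brca_cds_pct = percent(5_320, 61_543)  # CDS indels against somatic transcript indels
brca_fs_pct = percent(3_947, 5_320)    # FS indels against CDS indels
```

The rule ignores cases such as indels that span an exon boundary, which would need transcript annotation to classify.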
Top ten genes with multiple somatic CDS indels in BRCA and LUAD.
| BRCA* | LUAD* | BRCA ∩ LUAD* | Germline |
|---|---|---|---|
| SSPOP | |||
| PABPC3 | HLA-DRB1 | ||
| TTN | MUC5B | TEKT4 | |
| MUC16 | HAVCR1 | OR4C5 | |
| DSPP | PABPC1 | SCYGR8 | |
| RBM10 | ACIN1 | MAML3 | |
| RYR2 | ZFHX4 | ZFPM1 | |
| SPEN | CSMD3 | ABCA10 | |
| ZAN | KRT14 | ||
| TTN | FAM71D | MYO15B |
*The genes are ranked by the number of somatic CDS indels and the genes in bold are the ones in the 125 SMG list.
Figure 1. Distribution of secondary structure types of somatic NFS indels and germline NFS indels.
Figure 2. Distributions of the phyloP scores in somatic and germline CDS and UTR indels. (a) BRCA CDS indels; (b) LUAD CDS indels; (c) BRCA UTR indels; and (d) LUAD UTR indels. Blue is for somatic indels and red is for germline indels. The dashed line represents insertions and the solid line represents deletions.
Somatic transcript indels overlapping with TFBSs.
| Cancer type | Transcript indels | Overlapping with TFBSs | CDS indels | CDS indels overlapping with TFBSs | Non-CDS transcript indels | Non-CDS indels overlapping with TFBSs |
|---|---|---|---|---|---|---|
| BRCA | 61,543 | 16,646 (27.05%) | 5320 | 2367 (44.49%) | 56,223 | 14,279 (25.40%) |
| LUAD | 43,684 | 12,830 (29.37%) | 7813 | 3140 (40.19%) | 35,871 | 9690 (27.01%) |
| BRCA ∩ LUAD | 9988 | 2977 (29.81%) | 835 | 332 (39.76%) | 9153 | 2645 (28.90%) |
| Germline | 498,938 | 87,156 (17.47%) | 1370 | 520 (37.96%) | 497,568 | 86,636 (17.41%) |
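One way to summarise the somatic versus germline difference in TFBS overlap is an odds ratio computed from the counts in the table. The odds ratio is chosen here purely for illustration and is not necessarily the statistic used in the study. Values above 1 indicate a higher overlap rate in the somatic set.

```python
def odds_ratio(hits_a, total_a, hits_b, total_b):
    """Odds of overlap in group A divided by odds of overlap in group B."""
    odds_a = hits_a / (total_a - hits_a)
    odds_b = hits_b / (total_b - hits_b)
    return odds_a / odds_b


# Transcript indels overlapping TFBSs, counts taken from the table above.
germline_hits, germline_total = 87_156, 498_938
brca_vs_germline = odds_ratio(16_646, 61_543, germline_hits, germline_total)
luad_vs_germline = odds_ratio(12_830, 43_684, germline_hits, germline_total)

print(f"BRCA vs germline odds ratio: {brca_vs_germline:.2f}")
print(f"LUAD vs germline odds ratio: {luad_vs_germline:.2f}")
```

The same function can be applied to the CDS-only or non-CDS columns by swapping in the corresponding counts.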
Top ten genes with multiple high phyloP scores (> 5) in somatic CDS indels.
| BRCA: CDS overlapping with TFBS | BRCA: CDS not overlapping with TFBS | LUAD: CDS overlapping with TFBS | LUAD: CDS not overlapping with TFBS | Germline: CDS overlapping with TFBS | Germline: CDS not overlapping with TFBS |
|---|---|---|---|---|---|
| ADGRL3 | LZTR1 | RFX7 | |||
| CDH8 | ZEB2 | CLTCL1 | |||
| LRFN5 | CDK8 | RBBP6 | |||
| TAF2 | ATP2B1 | GJB7 | CHD9 | ||
| YTHDF2 | TTN | DBX1 | NEDD4 | ||
| TM9SF2 | PCDH9 | KCNH7 | GMNC | RERE | |
| TM9SF4 | PLCE1 | MEAF6 | CNOT1 | OR5AU1 | CARD11 |
| EPHB3 | TUBB8B | DHX9 | SRRM3 | HYDIN | |
| TTN | ADCYAP1R1 | DOCK5 | PSEN2 | ZNF730 | TMCC1 |
| PDE11A | TAPT1 | COG2 | SPON1 | DNAJC28 | |
*The genes in bold are in the 125 SMG list.
Somatic transcript indels in 125 SMGs of BRCA and LUAD.
| Cancer type | Indels in SMGs | SMGs with indels in CDS regions | Indels in SMGs’ CDS | SMG CDS indels overlapping with TFBSs | Indels in SMGs’ non-CDS regions | SMG non-CDS indels overlapping with TFBSs |
|---|---|---|---|---|---|---|
| BRCA | 1032 | 70 (56.00%) | 349 (33.82%) | 172 (49.28%) | 683 | 154 (22.55%) |
| LUAD | 685 | 71 (56.80%) | 267 (38.98%) | 132 (49.44%) | 418 | 105 (25.12%) |
| BRCA ∩ LUAD | 129 | 12 (9.6%) | 19 (14.73%) | 9 (47.37%) | 110 | 35 (31.82%) |
| Germline | 4818 | 9 (7.2%) | 11 (0.23%) | 5 (45.45%) | 4807 | 620 (12.90%) |