Diamanto Skopelitou1,2, Aayushi Srivastava1,2, Beiping Miao1, Abhishek Kumar1,3,4, Dagmara Dymerska5, Nagarajan Paramasivam6, Matthias Schlesner7, Jan Lubinski5, Kari Hemminki1,8, Asta Försti1, Obul Reddy Bandapalli9,10.
Abstract
About 15% of colorectal cancer (CRC) patients have first-degree relatives affected by the same malignancy. However, for most families the cause of familial aggregation of CRC is unknown. To identify novel high-to-moderate-penetrance germline variants underlying CRC susceptibility, we performed whole exome sequencing (WES) on four CRC cases and two unaffected members of a Polish family without any mutation in known CRC predisposition genes. After WES, we used our in-house developed Familial Cancer Variant Prioritization Pipeline and identified two novel variants in the solute carrier family 15 member 4 (SLC15A4) gene. The heterozygous missense variant, p.Y444C, was predicted to affect the phylogenetically conserved PTR2/POT domain and to have a deleterious effect on the function of the encoded peptide/histidine transporter. The other variant was located in the upstream region of the same gene (GRCh37.p13, 12_129308531_C_T; 43 bp upstream of the transcription start site, ENST00000266771.5) and was annotated to affect the promoter region of SLC15A4 as well as binding sites of 17 different transcription factors. Our findings of two distinct variants in the same gene may indicate a synergistic up-regulation of SLC15A4 as the underlying genetic cause and implicate this gene for the first time in genetic inheritance of familial CRC.
Keywords: Familial colorectal cancer; Germline variant; SLC15A4; Whole exome sequencing
Year: 2022 PMID: 35562597 PMCID: PMC9250485 DOI: 10.1007/s00438-022-01896-0
Source DB: PubMed Journal: Mol Genet Genomics ISSN: 1617-4623 Impact factor: 2.980
Fig. 1 a Pedigree of the studied family with CRC aggregation over three generations and the presence of the missense and upstream variants in the SLC15A4 gene. b Graphical overview of the filtering process according to the Familial Cancer Variant Prioritization Pipeline version 2 (FCVPPv2)
Overview of the top exonic variants prioritized in the studied CRC family
| Gene name | Chromosomal position | Exonic classification | Pedigree segregation | NFE allele frequency (ExAC) | NFE allele frequency (gnomAD) | CADD score | Conservational score: GERP++ | Conservational score: PhyloP | Conservational score: PhastCons | Intolerance scores (%) | Deleteriousness scores^a (%) | Amino acid change | Snap2 effect score | Snap2 accuracy (%) | Protein function |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PTGES | 9_132501952_C_T | Nonsyn SNV | III2, III3, III4, III5, IV2 | 2.10 × 10⁻⁴ | 8.43 × 10⁻⁵ | 34 | 4.67 | 7.723 | 1 | 75 | 80 | A133T | 16 | 59 | Glutathione-dependent prostaglandin E synthase, involved in inflammatory responses, fever, pain |
| SLC15A4 | 12_129285482_T_C | Nonsyn SNV | III2, III3, III4, III5, IV2 | 0 | 0 | 23.7 | 5.49 | 5.609 | 1 | 100 | 90 | Y444C | 44 | 71 | Proton-dependent peptide/histidine transporter, regulation of innate immune responses |
Chromosomal position, classification, pedigree segregation, allele frequency in the Non-Finnish European (NFE) population, PHRED-like CADD score, conservational scores and the percentage of reached intolerance and deleteriousness scores are summarized for each variant. Snap2 results for the predicted amino acid changes are included with calculated effect scores and accuracies given in %. Respective protein functions of the encoded gene products are derived from Genecards (Stelzer et al. 2016). Nonsyn SNV: non-synonymous single nucleotide variant
a The following predictions given by deleteriousness scores were considered favorable in our analysis: SIFT–Damaging (D); Polyphen2_HumDiv, Polyphen2_HumVar–Probably damaging (D) and Possibly damaging (P); LRT–Deleterious (D); MutationTaster–Disease causing (D) and Disease causing automatic (A); MutationAssessor–High (H) and Medium (M); FATHMM–Damaging (D); MetaSVM–Damaging (D); MetaLR–Damaging (D); Reliability Index ≥ 5; VEST3 ≥ 0.5; PROVEAN–Damaging (D)
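The footnote above lists which tool outputs count as favorable, and the table reports a percentage of "reached" deleteriousness scores. The following is a minimal sketch of how such a percentage could be tallied. It assumes the percentage is the share of tools with an available prediction that return a favorable call; the source does not state the exact formula. The function name, the dictionary keys and the example input are hypothetical and are not taken from the FCVPP code.

```python
# Favorable categorical calls per tool, as listed in the table footnote.
FAVORABLE_CALLS = {
    "SIFT": {"D"},
    "Polyphen2_HumDiv": {"D", "P"},
    "Polyphen2_HumVar": {"D", "P"},
    "LRT": {"D"},
    "MutationTaster": {"D", "A"},
    "MutationAssessor": {"H", "M"},
    "FATHMM": {"D"},
    "MetaSVM": {"D"},
    "MetaLR": {"D"},
    "PROVEAN": {"D"},
}

# Numeric scores with the minimum value counted as favorable.
NUMERIC_MINIMUM = {"ReliabilityIndex": 5, "VEST3": 0.5}


def deleteriousness_percent(predictions):
    """Percentage of tools with a prediction that give a favorable call.

    ASSUMPTION: tools without a prediction (value None) are excluded from
    the denominator; the original pipeline may handle them differently.
    """
    favorable = 0
    evaluated = 0
    for tool, value in predictions.items():
        if value is None:
            continue  # no prediction available for this tool
        if tool in FAVORABLE_CALLS:
            is_favorable = value in FAVORABLE_CALLS[tool]
        elif tool in NUMERIC_MINIMUM:
            is_favorable = value >= NUMERIC_MINIMUM[tool]
        else:
            raise ValueError(f"unknown tool: {tool}")
        evaluated += 1
        favorable += is_favorable
    if evaluated == 0:
        raise ValueError("no predictions available")
    return 100.0 * favorable / evaluated


# Illustrative input only; these are NOT the predictions for any study variant.
example = {
    "SIFT": "D",
    "Polyphen2_HumDiv": "P",
    "LRT": "N",
    "VEST3": 0.62,
    "ReliabilityIndex": 4,
    "PROVEAN": None,
}
example_percent = deleteriousness_percent(example)
```

Keeping the categorical and numeric criteria in separate lookup tables makes it easy to compare the sketch against the footnote entry by entry.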
Fig. 2 In silico analysis results of the SLC15A4 variant p.Y444C. a Graphical overview of the SLC15A4 protein with the PTR2 domain. Somatic mutations identified in CRC were extracted from cBioPortal (www.cbioportal.org) on 13 December 2020 using the TCGA PanCancer data and are represented by dark pins. The germline missense variant identified in the studied CRC family is highlighted in the form of a yellow pin. b Snap2 heatmap depicting the functional impact of amino acid substitutions. The missense mutation p.Y444C is highlighted by grey boxes. c Extract of multiple sequence alignment of amino acids 430–460 of SLC15A4 and orthologs. The mutation site is highlighted by a yellow box
Fig. 3 In silico analysis results of the PTGES variant p.A133T. a Graphical overview of the PTGES protein with the MAPEG domain. Somatic mutations identified in CRC were extracted from cBioPortal (www.cbioportal.org) on 13 December 2020 using the TCGA PanCancer data and are represented by dark pins. The germline missense variant identified in the studied CRC family is highlighted in the form of a yellow pin. b Snap2 heatmap depicting the functional impact of amino acid substitutions. The missense mutation p.A133T is highlighted by grey boxes. c Extract of multiple sequence alignment of amino acids 120–150 of PTGES and orthologs. The mutation site is highlighted by a yellow box
Analysis results of the SLC15A4 upstream variant identified in the studied CRC family
| Gene name | Chromosomal position | Variant annotation | Pedigree segregation | NFE allele frequency (ExAC) | NFE allele frequency (gnomAD) | CADD v1.6: CADD score | CADD v1.6: ChromHMM^a state | CADD v1.6: ChromHMM^a score | CADD v1.6: Segway^b | CADD v1.6: TFBS | CADD v1.6: TFBSPeaks^c | Bedtools intersect: promoter start | Bedtools intersect: promoter end | Bedtools intersect: promoter strand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SLC15A4 | 12_129308531_C_T | upstream | III2, III3, III4, III5, IV2 | 0 | 3.75 × 10⁻³ | 11.38 | TssA | 0.969 | TSS | 52 | 115 | 129,308,487 | 129,308,588 | – |
Chromosomal position, variant annotation, pedigree segregation and allele frequency in the Non-Finnish European (NFE) population are listed. The PHRED-like CADD score, annotation of the chromatin state and location within transcription factor binding sites (TFBS) are derived from CADD v1.6. The affected promoter region, identified with the Bedtools intersect function and the SEA and FANTOM5 databases, is included with its start and end positions (Lizio et al. 2015; Wei et al. 2016)
a ChromHMM: The ChromHMM score shows the proportion of 127 cell types of the Roadmap Epigenomics project in a particular chromatin state, with scores closer to 1 indicating more cell types in that chromatin state. The 15 chromatin states are defined as follows: TssA–Active transcription start site (TSS), TssAFlnk–Flanking active TSS, TxFlnk–Transcribed at gene 5′ and 3′, Tx–Strong transcription, TxWk–Weak transcription, EnhG–Genic enhancers, Enh–Enhancers, ZNF/Rpts–ZNF genes and repeats, Het–Heterochromatin, TssBiv–Bivalent/Poised TSS/Enhancers, BivFlnk–Flanking bivalent TSS/Enhancer, EnhBiv–Bivalent enhancers, ReprPC–Repressed PolyComb, ReprPCWk–Weak Repressed PolyComb, Quies–Quiescent/low (Ernst and Kellis 2012; Roadmap Epigenomics et al. 2015)
b Segway: Segway uses a genomic segmentation method to annotate the chromatin state based on multiple datasets of ChIP-seq experiments. The chromatin states can be annotated as follows: D–dead, F0/1–FAIRE, R0/1/2/4/5–Repressed Region, H3K9me1–histone 3 lysine 9 monomethylation, L0/1–Low zone, GE0/1/2–Gene body (end), TF0/1/2–Transcription factor activity, C0–CTCF, GS–Gene body (start), E/GM–Enhancer/gene middle, GM0/1–Gene body (middle), TSS–Transcription start site, ZnfRpts–zinc finger repeats (Hoffman et al. 2012)
c TFBS peaks: The number of overlapping ChIP TFBS peaks summed over different cell types/tissues
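The promoter annotation in the table above comes down to an interval-overlap question: does the variant position fall inside the listed promoter region? The sketch below illustrates that check on the coordinates reported in the table. It is not a reimplementation of Bedtools, and it assumes the listed start and end are 1-based inclusive positions on GRCh37; BED files themselves use 0-based half-open intervals, so real input may need an offset. The `Variant` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(key: str) -> Variant:
    """Split a 'chrom_pos_ref_alt' key as used in the tables above."""
    chrom, pos, ref, alt = key.split("_")
    return Variant(chrom, int(pos), ref, alt)


def in_region(variant: Variant, chrom: str, start: int, end: int) -> bool:
    """True if the variant lies within [start, end] on the same chromosome.

    ASSUMPTION: start and end are 1-based and inclusive.
    """
    return variant.chrom == chrom and start <= variant.pos <= end


# Upstream variant and promoter coordinates as reported in the table.
upstream = parse_variant("12_129308531_C_T")
in_promoter = in_region(upstream, "12", 129_308_487, 129_308_588)
```

For genome-wide annotation an interval tree or Bedtools itself is the practical choice; a direct comparison is enough for a single variant and a single region.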
Summary of transcription factors exclusively targeting either the wild type (WT) or the mutant sequence (MUT) of SLC15A4 upstream region
| Transcription factor | Targeting | Matrix ID | Relative scorea | Start | End | Strand | Predicted sequence |
|---|---|---|---|---|---|---|---|
| MEIS2 | WT | MA0774.1 | 0.84 | 116 | 123 | + | gggacAGG |
| NR1D2 | WT | MA1532.1 | 0.81 | 108 | 122 | + | tgggttctgggacAG |
| RARA::RXRG | WT | MA1149.1 | 0.80 | 109 | 126 | + | gggttctgggacAGGTGA |
| RBPJ | WT | MA1116.1 | 0.86 | 113 | 122 | + | tctgggacAG |
| RORC | WT | MA1151.1 | 0.82 | 110 | 121 | + | ggttctgggacA |
| SREBF1 | WT | MA0595.1 | 0.80 | 118 | 127 | – | GTCACCTgtc |
| STAT1 | WT | MA0137.2 | 0.84 | 109 | 123 | – | CCTgtcccagaaccc |
| STAT1 | WT | MA0137.3 | 0.88 | 111 | 121 | + | gttctgggacA |
| TGIF2LX | WT | MA1571.1 | 0.81 | 117 | 128 | – | GGTCACCTgtcc |
| TGIF2LX | WT | MA1571.1 | 0.81 | 117 | 128 | + | ggacAGGTGACC |
| TGIF2LY | WT | MA1572.1 | 0.82 | 117 | 128 | – | GGTCACCTgtcc |
| TGIF2LY | WT | MA1572.1 | 0.82 | 117 | 128 | + | ggacAGGTGACC |
| GRHL2 | MUT | MA1105.2 | 0.83 | 116 | 127 | + | ggaacAGGTGAC |
| MYF6 | MUT | MA0667.1 | 0.82 | 118 | 127 | + | aacAGGTGAC |
| NFATC2 | MUT | MA0152.1 | 0.90 | 115 | 121 | – | Tgttcca |
| PRDM4 | MUT | MA1647.1 | 0.81 | 114 | 124 | – | ACCTgttccag |
| SCRT1 | MUT | MA0743.1 | 0.83 | 114 | 128 | + | ctggaacAGGTGACC |
| SCRT1 | MUT | MA0743.2 | 0.85 | 113 | 128 | + | tctggaacAGGTGACC |
| SCRT2 | MUT | MA0744.1 | 0.85 | 114 | 126 | + | ctggaacAGGTGA |
| SCRT2 | MUT | MA0744.2 | 0.85 | 113 | 128 | + | tctggaacAGGTGACC |
| TEF | MUT | MA0843.1 | 0.80 | 110 | 121 | – | Tgttccagaacc |
| ZBTB26 | MUT | MA1579.1 | 0.92 | 107 | 121 | – | Tgttccagaacccag |
Respective transcription factor binding sites (TFBS) were identified with JASPAR 2020 using the default relative profile score threshold of 80%. Matrix ID, relative scores, start and end positions, strand information and the respective binding sequences are included
a A relative score of 1 represents the maximum-likelihood sequence for the motif
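The relative score in the table can be read as a min-max normalization of a position weight matrix (PWM) score: the raw score of a candidate site is rescaled between the lowest and highest scores the matrix can produce, so the best-matching sequence scores 1. The sketch below illustrates this with a toy three-position matrix and the 0.8 threshold mentioned above. The matrix values are invented for illustration and do not correspond to any JASPAR profile, the function names are hypothetical, and only the forward strand is scanned.

```python
def relative_score(pwm, site):
    """Min-max normalized PWM score of a site with the same length as the PWM.

    pwm is a list of per-position dicts mapping base -> log-odds score.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    raw = sum(column[base] for column, base in zip(pwm, site.upper()))
    lowest = sum(min(column.values()) for column in pwm)
    highest = sum(max(column.values()) for column in pwm)
    return (raw - lowest) / (highest - lowest)


def scan(pwm, sequence, threshold=0.8):
    """Return (start, end, relative_score) for forward-strand windows at or
    above the threshold, using 1-based inclusive positions."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = relative_score(pwm, sequence[i:i + width])
        if score >= threshold:
            hits.append((i + 1, i + width, score))
    return hits


# Toy matrix (hypothetical values); its best-scoring site is "ACG".
TOY_PWM = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -2.0},
    {"A": -2.0, "C": 2.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -2.0, "G": 2.0, "T": -1.0},
]

toy_hits = scan(TOY_PWM, "TTACGT")
```

Running the same scan on a wild-type and a mutant sequence and comparing the two hit lists is one way to obtain factors that are predicted for only one of the alleles, which is how the table above is organized.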