John M Darlow1,2, Mark G Dobson3,4, Andrew J Green3,5, Prem Puri4,5,6, David E Barton3,5.
Abstract
ROBO2 gene disruption causes vesicoureteric reflux (VUR) amongst other congenital anomalies. Several VUR patient cohorts have been screened for variants in the ubiquitously expressed transcript, ROBO2b, but, apart from low levels in a few adult tissues, ROBO2a expression is confined to the embryo, and might be more relevant to VUR, a developmental disorder. ROBO2a has an alternative promoter and two alternative exons which replace the first exon of ROBO2b. We screened probands from 251 Irish VUR families for DNA variants in these. The CpG island of ROBO2a, which includes the non-coding first exon, was found to contain a run of six variants abolishing/creating CpG dinucleotides, including a novel variant, present in the VUR cases in one family, that was not present in 592 healthy Irish controls. In three of these positions, the CpG was created by the non-reference allele, and the reference allele was not the nucleotide that would result from spontaneous deamination of methylcytosine to thymine, suggesting that there might have been selection for variability in number of CpGs in this island. This is in marked contrast to the CpG island at the start of ROBO2b, which only contained a single variant that abolishes a CpG.
Year: 2020 PMID: 32041992 PMCID: PMC7010700 DOI: 10.1038/s41598-020-58818-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. ROBO2a RNA transcripts and protein sequences. Exon 2 of ROBO2a codes for the 36 amino-acids of the ROBO2a leader sequence. Exon 1 of ROBO2b codes for the 20 amino-acids of the ROBO2b leader sequence. The rest of the protein is coded by the other 25 exons from Exon 3 of ROBO2a, which is Exon 2 of ROBO2b. After cleavage of the leader sequences, the mature ROBO2a protein has just 4 more amino-acids than ROBO2b. Diagram modified from Yue et al. [8]; the transcript displays in our figure are from the University of California Santa Cruz (UCSC) Genome Browser.
Table 1. DNA variants upstream and in the first two exons of ROBO2a and their flanking regions in VUR index cases.
| hg19 co-ord | hg38 co-ord | Region | Position | DNA change | Creates/removes CpG | HV (index) | Het (index) | HR (index) | Allele freq (index) | dbSNP No. | Cons. RS (GERP) | VUR + var (family) | VUR − var (family) | No. of fams | Allele freq (control) | HV (control) | Het (control) | HR (control) | Source (control) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 75,954,739–41 | 75,905,588–90 | Upstream | c.-1386_-1384del | delATC TT | No | 0 | 1 | 237 | 0.0021 | rs748917584 | 0.714 | 1 | 1 | 1 | 0.0005 | 0 | 7 | 14,994 | gnomAD-NFE |
| 75,954,807 | 75,905,656 | Promoter | c.-1318 | AGG → AAG | No | 0 | 6 | 232 | 0.0126 | rs150510983 | 0.452 | — | — | — | 0.0165 | 0 | 3 | 88 | 1000 GP-GBR |
| 75,955,183 | 75,906,032 | Promoter | c.-942 | AGA → ATA | No | 0 | 2 | 237 | 0.0042 | rs151063300 | −0.468 | — | — | — | 0.0055 | 0 | 1 | 90 | 1000 GP-GBR |
| 75,955,760 | 75,906,609 | Promoter | c.-365 | | | 78 | 113 | 60 | 0.5359 | rs35862955 | 0.328 | — | — | — | 0.5330 | 25 | 47 | 19 | 1000 GP-GBR |
| 75,955,805 | 75,906,654 | Promoter | c.-320 | | | 0 | 2 | 249 | 0.0040 | rs754253590 | 1.27 | 2 | 4 | 2 | 0.0068 | 0 | 8 | 584 | This study |
| 75,955,894 | 75,906,743 | 5′UTR | c.-231 | | | 0 | 10 | 241 | 0.0199 | rs34678735 | 2.49 | — | — | — | 0.0304 | 1 | 34 | 557 | This study |
| 75,956,014 | 75,906,863 | 5′UTR | c.-111 | | | 0 | 1 | 250 | 0.0020 | — | 1.56 | 2 | 0 | 1 | 0 | 0 | 0 | 592 | This study |
| 75,956,047 | 75,906,896 | 5′UTR | c.-78 | | | 0 | 2 | 249 | 0.0040 | rs532303838 | 1.55 | 2 | 3 | 2 | 0.0036 | 0 | 54 | 15,002 | gnomAD-NFE |
| 75,956,058 | 75,906,907 | 5′UTR | c.-67 | | | 13 | 91 | 147 | 0.2331 | rs13073919 | −4.74 | — | — | — | 0.2308 | 2 | 38 | 51 | 1000 GP-GBR |
| 75,956,283 | 75,907,087 | IVS1 | c.-14+127 | | | 0 | 2 | 249 | 0.0040 | rs182521620 | 0.303 | 3 | 0 | 2 | 0.0017 | 0 | 2 | 590 | This study |
| 75,986,568 | 75,937,417 | IVS1 | c.-13-64 | T | Removes | 0 | 17 | 234 | 0.0339 | rs539728114 | 0.236 | — | — | — | 0.0330 | 0 | 6 | 85 | 1000 GP-GBR |
‘Position’ is numbered from the ‘A’ of the initiation codon methionine (‘ATG’), which lies at position 300 of NCBI Reference Sequence NM_001128929.3 (GI: 586946395) and is counted as nucleotide 1. However, this reference sequence begins at c.-299, and positions 5′ to that can be found on the genomic reference sequence NC_000003.12 (GRCh38.p7 Primary Assembly) using any genome browser and the hg38 co-ordinates given in column 2 above. Variants are described as recommended by HGVS (www.hgvs.org/mutnomen). ‘DNA change’ means the change relative to the allele currently held to be the reference allele. The first variant listed is a deletion of 3 nucleotides. In the gnomAD database, this variant is described as residing one position 5′ of the HGVS starting position, i.e. at 75,954,738 (hg19), with alleles Reference ‘TATC’ and Variant ‘T’, whereas we have used the HGVS-recommended description, delATC. To illustrate the change, we have shown the two reference nucleotides on either side of the 3 deleted. All the other variants listed are single-nucleotide substitutions, for which we show one nucleotide on either side of the variant position. The reference sequence is on the left; the variant sequence is on the right. ‘Creates/removes CpG’ also applies relative to the reference allele, and the CpG removed or created by the variant allele is underlined. Bold and italics indicate the variants within the 549-bp ROBO2a CpG island (see main text).
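The relationship between the gnomAD description of the deletion (anchor base plus deleted bases) and the HGVS-style span used here can be illustrated with a minimal Python sketch. The function name `vcf_deletion_to_span` is hypothetical, and the sketch assumes a simple left-anchored deletion; it does not apply the HGVS 3′-shifting rule for repeated sequence.

```python
def vcf_deletion_to_span(pos, ref, alt):
    """Convert an anchored deletion (as in a VCF/gnomAD record) to the
    1-based genomic span of the deleted bases and the deleted sequence.

    pos : position of the first base of `ref` (the anchor base)
    ref : reference allele, anchor base(s) followed by the deleted bases
    alt : variant allele, i.e. the anchor base(s) that remain
    """
    if not (len(ref) > len(alt) and ref.startswith(alt)):
        raise ValueError("expected a simple deletion sharing a left anchor")
    deleted = ref[len(alt):]          # bases actually removed
    start = pos + len(alt)            # first deleted base
    end = start + len(deleted) - 1    # last deleted base
    return start, end, deleted


# gnomAD-style record quoted in the legend: hg19 75,954,738, 'TATC' > 'T'
start, end, deleted = vcf_deletion_to_span(75_954_738, "TATC", "T")
print(f"g.{start}_{end}del{deleted}")
```

For the record quoted in the legend, the span runs from 75,954,739 to 75,954,741, matching the hg19 co-ordinates in the first row of the table.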
Abbreviations: HV, homozygous variant; Het, heterozygote; HR, homozygous RefSeq allele; Cons., conservation in mammals by ‘rejected substitution’ (RS) score by Genomic Evolutionary Rate Profiling (GERP); VUR + var, number of VUR cases found to have the variant; VUR − var, number of VUR cases not having the variant in the same families; gnomAD-NFE, allele and genotype frequencies in non-Finnish Europeans in the gnomAD database; 1000 GP-GBR, allele and genotype frequencies in the ‘British in England and Scotland (GBR)’ subset of the 1000 Genomes Project data.
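As an illustration of how the allele-frequency columns relate to the genotype counts, the following Python sketch uses the standard diploid count: each homozygous-variant individual contributes two variant alleles and each heterozygote one. The function name `variant_allele_freq` is hypothetical, and the sketch is not taken from the paper's methods.

```python
def variant_allele_freq(hv, het, hr):
    """Variant allele frequency from diploid genotype counts.

    hv  : number of homozygous-variant individuals
    het : number of heterozygotes
    hr  : number of individuals homozygous for the reference allele
    """
    total_alleles = 2 * (hv + het + hr)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return (2 * hv + het) / total_alleles


# Index-case counts for the c.-365 and c.-67 rows of the table
print(round(variant_allele_freq(78, 113, 60), 4))
print(round(variant_allele_freq(13, 91, 147), 4))
```

With the index-case counts for c.-365 (78, 113, 60) and c.-67 (13, 91, 147), this gives the tabulated index-case frequencies of 0.5359 and 0.2331.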
Figure 2. Comparison of the arrangement of CpG island, 5′ untranslated region (□) and coding (■) of the protein leader sequence in the ROBO2a isoform with the arrangement of the same elements of the ROBO2b isoform.
Table 2. DNA variants previously reported in our VUR index cases in the equivalent region of ROBO2b to the variants reported in ROBO2a in Table 1.
| hg19 co-ord | hg38 co-ord | Region | Position | DNA Change | Changes CpG | dbSNP No. |
|---|---|---|---|---|---|---|
| 77,088,313 | 77,039,162 | Promoter | c.-1624 | GCT → GTT | No | — |
| 77,088,402 | 77,039,251 | Promoter | c.-1535 | CCA → | Creates | rs73110408 |
| 77,088,753 | 77,039,602 | Promoter | c.-1084 | | | rs11919722 |
| 77,088,759 | 77,039,608 | Promoter | c.-1078 | | | rs77461867 |
| 77,089,095 | 77,039,944 | Promoter | c.-842 | | | rs9835590 |
| 77,089,241 | 77,040,090 | Promoter | c.-696 | TCC → TTC | No | — |
| 77,089,395 | 77,040,244 | 5′UTR | c.-592 | TTC → TCC | No | rs3923745* |
| 77,089,478 | 77,040,327 | 5′UTR | c.-459 | GGG → GAG‡ | No | — |
| 77,089,698 | 77,040,547 | 5′UTR | c.-239 | AGT → AAT | No | — |
| 77,089,699 | 77,040,548 | 5′UTR | c.-238 | GTG → GGG | No | rs3923744 |
| 77,090,112 | 77,040,961 | IVS1 | c.61 G + 115 | TCC → TTC | No | rs114060047 |
All variants recorded in this table are single-nucleotide substitutions, but one nucleotide on either side of the variant position is also shown, as in Table 1, in order to show whether a CpG dinucleotide is present or not.
*This variant was not recorded in dbSNP at the time of publication of our ROBO2b paper (ref. [4]). The variants without rs numbers, which were found in single index cases and are reported in Appendix 2 of ref. [4], are still not in dbSNP and none is present in gnomAD either.
‡This variant was erroneously reported in ref. [4] as having been found in only one PCR. On checking chromatograms for this paper, we find that it was present in two overlapping amplicons using different primer-pairs, so is unequivocal.
‘Position’ is numbered from the ‘A’ of the initiation codon methionine (‘ATG’), which lies at position 644 of NCBI Reference Sequence NM_002942.4 (GI: 299116179) and is counted as nucleotide 1. However, this reference sequence begins at c.-643, and positions 5′ to that can be found on the genomic reference sequence NC_000003.12 (GRCh38.p7 Primary Assembly) using any genome browser and the hg38 co-ordinates given in column 2 above. Bold and italics indicate the variants within the 700-bp ROBO2b CpG island (see main text). ‘Moves left M’ means that the base-change shifts a CpG by one position to 5′ (it changes CpCpG to CpGpG) and that the cytosine of the reference CpG is methylated in some cell-lines of ENCODE. For other notes and abbreviations, see legend to Table 1.
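The CpG annotations used in both tables, and the deamination argument made in the abstract, can be illustrated with a short Python sketch that works on the reference and variant sequence contexts (for substitutions, the variant base plus one flanking nucleotide on each side). The function names and the labels ‘Creates’, ‘Removes’, ‘Moves’ and ‘No’ are choices made for this illustration, and the trinucleotides other than those quoted from Table 2 are hypothetical examples, not table entries.

```python
def cpg_positions(seq):
    """Return the set of indices at which a CpG dinucleotide starts."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def classify_cpg_change(ref, var):
    """Classify the effect of a substitution on CpGs in its local context."""
    if len(ref) != len(var):
        raise ValueError("contexts must have equal length (substitutions only)")
    before, after = cpg_positions(ref), cpg_positions(var)
    if before == after:
        return "No"
    if after > before:       # proper superset: only gains
        return "Creates"
    if after < before:       # proper subset: only losses
        return "Removes"
    return "Moves"           # one CpG lost and another gained, e.g. CCG -> CGG


def is_deamination_product(with_cpg, without_cpg):
    """True if `without_cpg` is what deamination of methylcytosine in a CpG
    of `with_cpg` would give: CpG -> TpG on the strand shown, or CpG -> CpA
    when the deaminated cytosine is on the opposite strand."""
    with_cpg, without_cpg = with_cpg.upper(), without_cpg.upper()
    for i in cpg_positions(with_cpg) - cpg_positions(without_cpg):
        for product in ("TG", "CA"):
            if without_cpg == with_cpg[:i] + product + with_cpg[i + 2:]:
                return True
    return False


print(classify_cpg_change("GCT", "GTT"))   # Table 2, c.-1624
print(classify_cpg_change("CCG", "CGG"))   # the 'moves left' pattern
print(is_deamination_product("ACG", "ATG"))  # hypothetical context
```

In the abstract's terms, a position where the non-reference allele carries the CpG and `is_deamination_product(variant_context, reference_context)` is false is one where the reference allele cannot be explained as a simple deamination product of the CpG allele.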
Figure 3. Variation in numbers of CpG dinucleotides in the CpG islands at the beginnings of the ROBO2 ‘a’ and ‘b’ transcripts.