Sandra Steyaert1, Wim Van Criekinge2, Ayla De Paepe2, Simon Denil2, Klaas Mensaert2, Katrien Vandepitte3, Wim Vanden Berghe4, Geert Trooskens2, Tim De Meyer2.
Abstract
Monoallelic gene expression is typically initiated early in the development of an organism. Dysregulation of monoallelic gene expression has already been linked to several non-Mendelian inherited genetic disorders. In humans, DNA-methylation is deemed to be an important regulator of monoallelic gene expression, but only a few examples are known. One important reason is that current, cost-affordable, truly genome-wide methods to assess DNA-methylation are based on sequencing post-enrichment. Here, we present a new methodology based on classical population genetic theory, i.e. the Hardy-Weinberg theorem, that combines methylomic data from MethylCap-seq with associated SNP profiles to identify monoallelically methylated loci. Applied to 334 MethylCap-seq samples of very diverse origin, this resulted in the identification of 80 genomic regions characterized by monoallelic DNA-methylation. Of these 80 loci, 49 are located in genic regions, of which 25 have already been linked to imprinting. Further analysis revealed statistically significant enrichment of these loci in promoter regions, further establishing the relevance and usefulness of the method. Additional validation was done using both 14 whole-genome bisulfite sequencing data sets and 16 mRNA-seq data sets. Importantly, the developed approach can be easily applied to other enrichment-based sequencing technologies, like the ChIP-seq-based identification of monoallelic histone modifications.
Year: 2014 PMID: 25237057 PMCID: PMC4227762 DOI: 10.1093/nar/gku847
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
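The Hardy-Weinberg reasoning in the abstract can be illustrated with a minimal sketch. It assumes (this is not stated in the record above) that a monoallelically methylated locus shows up as a shortage of apparent heterozygotes, because enrichment would capture reads from only the methylated allele. The function names, the simple binomial simulation and the example numbers are hypothetical and are not the authors' implementation.

```python
import random


def hwe_expected(p):
    """Expected genotype frequencies (AA, AB, BB) under Hardy-Weinberg
    equilibrium for an allele A with population frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def het_deficit_pvalue(n_het_observed, n_samples, p, iterations=1000, seed=0):
    """Monte Carlo P-value for a heterozygote deficit (illustrative only).

    Simulates how many of n_samples would be heterozygous under
    Hardy-Weinberg and counts how often the simulated number is at most
    the observed number of apparent heterozygotes.
    """
    rng = random.Random(seed)
    _, p_het, _ = hwe_expected(p)
    hits = 0
    for _ in range(iterations):
        simulated = sum(1 for _ in range(n_samples) if rng.random() < p_het)
        if simulated <= n_het_observed:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (iterations + 1)


if __name__ == "__main__":
    print(hwe_expected(0.5))               # (0.25, 0.5, 0.25)
    print(het_deficit_pvalue(0, 50, 0.5))  # no heterozygotes among 50 samples
```

With an allele frequency of 0.5, half of all samples are expected to be heterozygous, so observing none among 50 samples is essentially impossible under Hardy-Weinberg and the simulated P-value hits its lower bound of 1/(iterations + 1).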
Figure 1. Overview of the bioinformatics pipeline to detect putative monoallelically methylated SNP loci starting from MethylCap-seq data. After mapping with BOWTIE, the non-duplicate, uniquely mapped reads are screened for SNPs using dbSNP. To reduce the computational load, SNP loci with too high a MAF and/or too low an overall coverage are filtered out. In this reduced data set, an additional sequencing error correction was performed with two iterations. The corrected data were next put into the newly developed data-analytical framework, which was run twice with 1000 and 1 000 000 iterations, respectively. Only loci that obtained a P-value smaller than or equal to 0.005 after the first iteration were kept as input for the second iteration. If the P-value obtained for a locus was smaller than the P-value corresponding to an FDR of 0.1, the monoallelic methylation at this locus was called significant. After determining the functional annotation of these SNP positions, an enrichment analysis was performed. Finally, the resulting loci were validated using both literature and WGBS data.
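The two-stage screening described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions: `pvalue_fn` is a hypothetical callable standing in for the data-analytical framework, and only the iteration counts (1000 and 1 000 000) and the 0.005 cut-off come from the caption.

```python
def two_stage_screen(loci, pvalue_fn, first_iters=1000,
                     second_iters=1_000_000, first_cutoff=0.005):
    """Cheap first pass over all loci, costly second pass over survivors.

    pvalue_fn(locus, iterations) is a hypothetical function returning a
    simulation-based P-value for one locus; it is not part of the source.
    """
    # First pass: few iterations, so every locus can be screened quickly.
    survivors = [locus for locus in loci
                 if pvalue_fn(locus, first_iters) <= first_cutoff]
    # Second pass: many iterations, but only for the loci that passed.
    return {locus: pvalue_fn(locus, second_iters) for locus in survivors}


if __name__ == "__main__":
    toy_pvalues = {"locus_a": 0.001, "locus_b": 0.2, "locus_c": 0.005}
    refined = two_stage_screen(toy_pvalues, lambda locus, iters: toy_pvalues[locus])
    print(sorted(refined))  # ['locus_a', 'locus_c']
```

A plausible reason for this design is resolution: a simulation with 1000 iterations cannot resolve P-values much below 0.001, so a finer second pass is needed before very small thresholds can be applied, and restricting it to the survivors keeps the cost manageable.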
Monoallelic DNA-methylation per chromosome
| Chr | Input entries | Loci with P ≤ 0.005 (first iteration) | Loci with P ≤ 0.000016 (second iteration, FDR = 0.1) |
|---|---|---|---|
| 1 | 37 259 | 98 | 3 |
| 2 | 34 184 | 113 | 10 |
| 3 | 22 065 | 73 | 4 |
| 4 | 26 442 | 68 | 3 |
| 5 | 20 500 | 54 | 1 |
| 6 | 25 708 | 112 | 3 |
| 7 | 31 450 | 153 | 8 |
| 8 | 20 868 | 63 | 0 |
| 9 | 19 908 | 78 | 0 |
| 10 | 30 138 | 108 | 7 |
| 11 | 19 255 | 82 | 8 |
| 12 | 20 484 | 51 | 0 |
| 13 | 11 709 | 88 | 3 |
| 14 | 12 297 | 40 | 1 |
| 15 | 11 829 | 47 | 2 |
| 16 | 29 432 | 64 | 4 |
| 17 | 24 824 | 158 | 3 |
| 18 | 11 783 | 29 | 1 |
| 19 | 25 066 | 96 | 8 |
| 20 | 16 201 | 45 | 7 |
| 21 | 11 977 | 40 | 2 |
| 22 | 15 012 | 70 | 2 |
| X | 7699 | 27 | 0 |
| TOTAL | 486 090 | 1757 | 80 |
The first two columns show the chromosome (Chr) and the number of input entries for the statistical analysis. The third and fourth columns show the number of loci that obtained a P-value smaller than or equal to 0.005 (after the first iteration) and 0.000016 (after the second iteration, corresponding to FDR = 0.1), respectively.
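The relation between FDR = 0.1 and the P-value threshold of 0.000016 can be checked with a short calculation. This sketch assumes a Benjamini-Hochberg style cut-off computed over all 486 090 input entries with 80 significant loci; the record above does not state which FDR procedure was used, so the agreement is consistent with that assumption rather than a confirmation of it.

```python
def bh_cutoff(rank, n_tests, fdr):
    """Benjamini-Hochberg P-value cut-off for the hypothesis at a given rank:
    the rank-th smallest P-value must not exceed rank * fdr / n_tests."""
    return rank * fdr / n_tests


# Totals taken from the table: 486 090 input entries, 80 significant loci.
cutoff = bh_cutoff(rank=80, n_tests=486_090, fdr=0.1)

if __name__ == "__main__":
    print(f"{cutoff:.6f}")  # 0.000016
```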
Figure 2. Genomic distribution of the 80 loci for which the monoallelic DNA-methylation was called significant. Chromosomes are shown in a circular representation and divided into regions of 5 000 000 bp. The inner circle shows the histogram of all SNPs found in a specific region, whereas the outer circle shows the histogram of the significant SNPs in that same region, normalized to the number of SNPs found in that region.
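The binning and normalization behind the two histograms in Figure 2 can be sketched as below. The function names and the toy coordinates are hypothetical; only the 5 000 000 bp region size and the normalization to the number of SNPs per region come from the caption.

```python
from collections import Counter

BIN_SIZE = 5_000_000  # region size in bp, as stated in the caption


def bin_counts(snps):
    """Count SNPs per (chromosome, region index) for (chrom, position) pairs."""
    return Counter((chrom, pos // BIN_SIZE) for chrom, pos in snps)


def normalized_significant(all_snps, significant_snps):
    """Fraction of SNPs in each region that are significant."""
    total = bin_counts(all_snps)
    significant = bin_counts(significant_snps)
    return {region: significant[region] / total[region]
            for region in significant if total[region] > 0}


if __name__ == "__main__":
    # Toy input: four SNPs in the first region of chromosome 11, two significant.
    all_snps = [("11", 2019496), ("11", 2721568), ("11", 3000000),
                ("11", 4999999), ("20", 57427132)]
    significant = [("11", 2019496), ("11", 2721568)]
    print(normalized_significant(all_snps, significant))  # {('11', 0): 0.5}
```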
SNPs featured by monoallelic methylation located in a gene-associated region
| GeneID | Gene symbol | Description | Biotype | Location |
|---|---|---|---|---|
| ENSG00000183929 | | Dual specificity phosphatase 5 pseudogene | Pseudogene | 1:228757936 |
| ENSG00000200624 | | RNA, 5S ribosomal 6 | rRNA | 1:228757936 |
| ENSG00000169604 | | Anthrax toxin receptor 1 | Protein coding | 2:69347244 |
| ENSG00000233786 | | Cell division cycle 27 homolog (S.cerevisiae) pseudogene 1 | Pseudogene | 2:133018988, 133020085 |
| ENSG00000163975 | | Antigen p97 (melanoma associated) | Protein coding | 3:196722009 |
| ENSG00000184985 | | Sortilin-related VPS10 domain containing receptor 2 | Protein coding | 4:7635629 |
| ENSG00000138641 | | Hect domain and RLD 3 | Protein coding | 4:89618837 |
| ENSG00000177432 | | Nucleosome assembly protein 1-like 5 | Protein coding | 4:89618837 |
| ENSG00000087116 | | ADAM metallopeptidase with thrombospondin type 1 motif, 2 | Protein coding | 5:178650557 |
| ENSG00000145945 | | Family with sequence similarity 50, member B | Protein coding | 6:3849305 |
| ENSG00000238158 | | Processed transcript | Processed transcript | 6:3849305 |
| ENSG00000184465 | | WD repeat domain 27 | Protein coding | 6:170055316 |
| ENSG00000223838 | | lncRNA | lncRNA | 7:19534519 |
| ENSG00000155093 | | Protein tyrosine phosphatase, receptor type, N polypeptide 2 | Protein coding | 7:158041459, 158041458, 157923845 |
| ENSG00000075826 | | SEC31 homolog B (S.cerevisiae) | Protein coding | 10:102279295, 102279294 |
| ENSG00000255339 | | NADH dehydrogenase (ubiquinone) 1 beta subcomplex subunit 8, mitochondrial | Nonsense mediated decay | 10:102279295, 102279294 |
| ENSG00000166136 | | NADH dehydrogenase (ubiquinone) 1 beta subcomplex 8, 19kDa | Protein coding | 10:102279295, 102279294 |
| ENSG00000053918 | | Potassium voltage-gated channel, KQT-like subfamily, member 1 | Protein coding | 11:2721568 |
| ENSG00000258492 | | KCNQ1 opposite strand/antisense transcript 1 | Antisense | 11:2721568 |
| ENSG00000211502 | | microRNA 675 | miRNA | 11:2019496, 2019618 |
| ENSG00000130600 | | H19, imprinted maternally expressed transcript | Processed transcript | 11:2021164, 2019496, 2019618, 2021206, 2021980, 2022023 |
| ENSG00000102802 | | Chromosome 13 open reading frame 33 | Protein coding | 13:31481030 |
| ENSG00000226317 | | Long intergenic non-protein coding RNA 351 | lncRNA | 13:85969909, 85969941 |
| ENSG00000258807 | | lncRNA | lncRNA | 14:88237822 |
| ENSG00000214265 | | SNRPN upstream reading frame | Protein coding | 15:25201659 |
| ENSG00000128739 | | Small nuclear ribonucleoprotein polypeptide N | Protein coding | 15:25201659, 25123472 |
| ENSG00000122390 | | N(alpha)-acetyltransferase 60, NatF catalytic subunit | Protein coding | 16:3493495 |
| ENSG00000167981 | | Zinc finger protein 597 | Protein coding | 16:3493495 |
| ENSG00000175643 | | RecQ mediated genome instability 2, homolog (S.cerevisiae) | Protein coding | 16:11415785 |
| ENSG00000207986 | | miRNA ncRNA | miRNA | 16:33960762 |
| ENSG00000108684 | | Amiloride-sensitive cation channel 1, neuronal | Protein coding | 17:31340444 |
| ENSG00000074181 | | Notch 3 | Protein coding | 19:15279411 |
| ENSG00000251948 | | miRNA ncRNA | miRNA | 19:24184564 |
| ENSG00000198300 | | Zinc finger, imprinted 2 | Protein coding | 19:57350463 |
| ENSG00000259486 | | Zinc finger, imprinted 2 | Protein coding | 19:57350463 |
| ENSG00000130844 | | Zinc finger protein 331 | Protein coding | 19:54057515, 54057777, 54041242, 54057156, 54040861 |
| ENSG00000235590 | | GNAS antisense RNA 1 | Antisense | 20:57427132, 57414110, 57426449, 57426726 |
| ENSG00000087460 | | GNAS complex locus | Protein coding | 20:57427132, 57414110, 57426449, 57426726, 57431165 |
| ENSG00000160183 | | Transmembrane protease, serine 3 | Protein coding | 21:40757887 |
| ENSG00000182093 | | Tryptophan rich basic protein | Protein coding | 21:40757887 |
| ENSG00000183486 | | Myxovirus (influenza virus) resistance 2 (mouse) | Protein coding | 21:44011806 |
| ENSG00000100138 | | NHP2 non-histone chromosome protein 2-like 1 (S.cerevisiae) | Protein coding | 22:42078666 |
| ENSG00000219438 | | Family with sequence similarity 19, member A5 | Protein coding | 22:49077801 |
The following parameters are indicated: (Ensembl) Gene ID, Gene symbol, Description, Biotype and Location (chromosome:position). *known imprinted gene; **predicted imprinted gene.
Outcome of the additional validation of the putative loci with 14 WGBS data sets
| Chr: SNP position | Global | Colon | Colon tumour | Cortex Normal1 | Cortex Normal2 | Cortex AD1 | Cortex AD2 | fFF | IMR90 | HepG2 | HSCP | bcell | dMesenchy | dNPC | dMEEN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1:228757936 | 0 | 0.0195 | - | 0.387 | - | 0.2695 | 0 | 0 | 0 | 0 | 0 | 0.122 | 0 | 0 | 0 |
| 2:133018988 | 0.248 | 1 | 0.5405 | 0.525 | 1 | 1 | 0.1735 | - | - | - | - | - | 0.0685 | - | 0.482 |
| 2:133020085 | 0.0005 | 0.002 | 0.7665 | 0.7875 | 0.117 | 0.2655 | 1 | 1 | 1 | - | 0.029 | 0.124 | 0.0925 | - | 0.3705 |
| 2:207122438** | 0 | 0.1345 | 0.325 | 0.002 | 0.4405 | - | 0.0125 | 0.3495 | 0.1685 | 0.0015 | 0.5 | - | 0.03 | 0.155 | 0.0015 |
| 2:69347244 | 0.582 | - | - | 1 | - | 0.541 | - | - | - | - | - | - | - | - | - |
| 2:133033524 | 0.2535 | - | - | - | - | - | - | 0.28 | - | - | 0.6135 | - | - | - | - |
| 2:133029769 | 0.4335 | 1 | - | - | - | - | 1 | 0.175 | - | - | - | - | - | - | - |
| 2:133032580 | 0 | 0 | 0 | 0.7585 | 0.119 | 0.5695 | 0.034 | 0 | 0 | 0 | 0 | 0 | 0 | - | 0 |
| 3:162561619 | 0 | - | - | - | 0 | - | - | 0 | - | - | 0.0265 | - | - | - | 0.5525 |
| 4:7635629 | 0 | 0.0355 | - | 0.6355 | - | - | 0 | 0.004 | - | - | - | - | 0.00555 | - | 0.0005 |
| 4:89618837* | 0 | - | - | - | 0.0005 | 0 | 0.001 | 0.33 | 0 | - | - | 0.0035 | 0 | - | 0.0035 |
| 4:49099668 | 0 | 0.002 | 0.0015 | 0.034 | 0.0755 | 0.151 | 0 | 0 | 0.0595 | 0.0015 | 0.004 | 0.3295 | 0.089 | - | - |
| 5:178650557 | 0 | - | - | - | 0 | - | - | - | - | - | 0.3215 | - | 0.0045 | 0.226 | - |
| 6:170055316 | 0 | - | - | 0 | - | - | - | 0.002 | - | 0.2125 | 0.4965 | 0.566 | 0 | - | 0.2345 |
| 6:3849305* | 0 | 0 | 0 | 0 | 0.0065 | 0 | 0.0025 | 0.6925 | 0.48 | 0.0025 | - | 0 | 1 | 0.1505 | - |
| 6:168784228 | 0 | 0 | 0.04 | - | - | 0.0315 | 0 | 0.0005 | 0.0025 | 0 | - | - | 0.003 | - | - |
| 7:64895556 | 0.0265 | - | 0.38 | - | - | - | 1 | 0.0005 | - | - | - | - | - | 0.075 | - |
| 7:157923845 | 0.3995 | - | - | - | - | - | - | 0.3995 | - | - | - | - | - | - | - |
| 7:61080848 | 1 | - | - | - | - | - | - | - | - | - | 1 | 1 | - | - | - |
| 7:56437045 | 0 | - | - | 0 | - | - | - | 0 | - | - | - | - | - | 0.001 | 0.0775 |
| 7:19534519 | 0.307 | - | - | - | - | - | - | 1 | - | - | 0.3905 | - | - | - | 0.33 |
| 7:57554497 | 0.0205 | - | - | - | - | - | - | 0.0275 | - | - | 0.269 | - | - | - | 0.574 |
| 10:42800026 | 0.5405 | - | - | - | - | - | - | 1 | - | - | - | 1 | - | - | 0.27 |
| 11:2721568* | 0 | 0 | 0 | 0 | 0 | - | 0 | 0 | - | - | 0 | 0.1275 | - | - | - |
| 11:51579458 | 0.5285 | - | - | - | - | - | - | 0.5285 | - | - | - | - | - | - | - |
| 13:31481030 | 0.096 | 0.1905 | - | - | - | - | - | - | 0.174 | - | - | - | - | - | - |
| 14:88237822 | 0.17 | 0.184 | - | - | 0.2815 | 1 | - | 0.1725 | - | - | - | 0.0445 | - | - | 0.6605 |
| 15:25123472* | 0 | 0 | 0 | 0.0475 | 0.003 | 0 | 0 | 0.4215 | 0.314 | - | 0.0045 | 0 | - | - | - |
| 15:25201659* | 0 | - | - | 0 | 0 | - | - | 0.5625 | - | - | 0.4805 | 0.012 | 0 | 0 | 0 |
| 16:46411729 | 0 | 1 | 0.6475 | - | - | - | - | 0 | - | 0.546 | - | - | - | - | - |
| 16:3493495* | 0 | - | - | 0 | 0 | - | 0.0355 | 0 | 0 | - | 0.861 | 0.0095 | 0 | - | 0.001 |
| 16:11415785 | 0.002 | - | - | - | - | - | - | 0.0475 | 0 | - | - | - | - | - | - |
| 17:22252007 | 0.74 | 1 | - | 0.6095 | - | - | 0.627 | - | - | - | 0.6155 | - | - | - | - |
| 17:22259640 | 0.7215 | 0.8215 | 0.7795 | 0.3425 | 1 | 1 | 0.613 | 0.3785 | 0.6265 | - | 0.212 | 0.7165 | 1 | - | - |
| 18:18517029 | 0 | - | 0.0415 | - | 0.017 | 0.1795 | 0.0045 | 0 | - | - | 0 | 0.307 | - | - | 0 |
| 19:15279411 | 0 | - | - | 0.299 | 0.054 | 0 | 0.0005 | 0 | - | 0.568 | 0.839 | 0.0155 | 0 | - | 0.016 |
| 19:57350463* | 0 | 0 | 0.0005 | 0 | 0 | 0 | - | 0.004 | - | 0 | 0.2945 | 0.0095 | 0 | 0 | 0 |
| 19:24184564 | 0 | - | - | - | - | - | - | 1 | - | - | - | - | - | 0 | 0.0005 |
| 20:57415110* | 0 | - | - | 0.016 | 0 | - | 0 | 0 | 0.0095 | 0.4675 | - | 0 | 0 | - | 0 |
| 20:57431165* | 0.0625 | - | - | 0.5545 | - | - | - | 0.011 | 0.8245 | - | - | - | - | - | - |
| 21:44011806 | 0.0145 | - | - | - | - | - | - | 0.0145 | - | - | - | - | - | - | - |
| 21:40757887 | 0 | 0 | 0 | - | - | - | - | 0.0255 | - | - | 0 | - | 1 | - | - |
| 22:42078666 | 0 | 0 | 0.005 | - | 0 | - | - | - | - | - | - | - | 0 | - | 0.013 |
| 22:49077801 | 0.715 | - | - | - | - | - | - | 0.715 | - | - | - | - | - | - | - |
Global and sample-specific P-values are shown for the 44 SNP loci (Chr:SNP position) that were covered by at least one heterozygous WGBS sample. A ‘-’ in a sample column indicates that the sample did not cover, or was not heterozygous for, the corresponding SNP locus. *known imprinted genomic region; **predicted imprinted genomic region. Samples: colon adjacent normal (Colon), colon primary tumour (Colon tumour), mid frontal cortex normal (Cortex Normal1/2), mid frontal cortex Alzheimer (Cortex AD1/2), newborn foreskin fibroblasts (fFF), human foetal lung cell line (IMR90), human liver carcinoma cell line (HepG2), hematopoietic stem cell progenitors (HSCP), human B cells (bcell), H1-derived mesenchymal stem cells (dMesenchy), H1-derived neuronal progenitor cells (dNPC) and H1+BPM4-derived mesendoderm cells (dMEEN).
Figure 3. Pie charts representing the relative number of significant SNPs in the different functional classes. (A) Functional classification (i.e. promoter, exon, intron and intergenic) of the significant SNPs (i.e. loci with significant monoallelic DNA-methylation). (B) Functional classification of random SNPs resulting from 1000 iterations.
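The comparison between panels A and B can be sketched as a resampling test for enrichment of one functional class. This is only an illustration under assumptions: the record above states that random SNPs were drawn in 1000 iterations but not how the enrichment statistic was computed, so the function name, the one-sided counting rule and the toy data are all hypothetical.

```python
import random


def enrichment_pvalue(significant_classes, background_classes,
                      target="promoter", iterations=1000, seed=0):
    """Resampling P-value for enrichment of `target` among significant SNPs.

    Draws random SNP sets of the same size from the background and counts
    how often they contain at least as many `target` SNPs as observed.
    """
    rng = random.Random(seed)
    observed = significant_classes.count(target)
    size = len(significant_classes)
    hits = 0
    for _ in range(iterations):
        sample = rng.sample(background_classes, size)
        if sample.count(target) >= observed:
            hits += 1
    return (hits + 1) / (iterations + 1)


if __name__ == "__main__":
    # Toy data: 1% promoter SNPs in the background, 80% among significant SNPs.
    background = ["promoter"] * 10 + ["intergenic"] * 990
    significant = ["promoter"] * 8 + ["intergenic"] * 2
    print(enrichment_pvalue(significant, background))
```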
Validation of ASE
| GeneID | Gene symbol | # Tissues | Tissues | Type of expression |
|---|---|---|---|---|
| ENSG00000219438 | | 1 | testis | ASE |
| ENSG00000184985 | | 1 | brain | ASE |
| ENSG00000169604 | | 1 | adipose | ASE |
| ENSG00000255339 | | 1 | heart | ASE |
| ENSG00000166136 | | 10 | adipose, adrenal, brain, breast, colon, heart, ovary, prostate, testis, thyroid | ASE |
| ENSG00000130600 | | 10 | adipose, adrenal, breast, colon, kidney, ovary, prostate, skeletal muscle, testis, thyroid | ASE |
| ENSG00000175643 | | 1 | testis | ASE |
| | | 1 | ovary | BE |
| ENSG00000122390 | | 1 | brain | ASE |
| | | 10 | adipose, adrenal, breast, colon, heart, kidney, liver, lymph node, ovary, thyroid | BE |
| ENSG00000138641 | | 1 | brain | ASE |
| | | 4 | heart, leukocyte, ovary, prostate | BE |
| ENSG00000155093 | | 1 | brain | ASE |
| | | 1 | prostate | BE |
| ENSG00000182093 | | 2 | brain, liver | ASE |
| | | 7 | adrenal, heart, leukocyte, ovary, skeletal muscle, testis, thyroid | BE |
| ENSG00000198300 | | 2 | brain, ovary | ASE |
| | | 1 | testis | BE |
| ENSG00000183486 | | 2 | adipose, leukocyte | ASE |
| | | 4 | breast, ovary, testis, thyroid | BE |
| ENSG00000130844 | | 2 | brain, ovary | ASE |
| | | 1 | lung | BE |
| ENSG00000074181 | | 4 | adipose, adrenal, breast, testis | ASE |
| | | 6 | colon, heart, lymph node, ovary, skeletal muscle, thyroid | BE |
| ENSG00000214265 | | 4 | brain, lymph node, prostate, testis | ASE |
| | | 2 | heart, thyroid | BE |
| ENSG00000128739 | | 7 | adrenal, colon, leukocyte, lymph node, ovary, prostate, testis | ASE |
| | | 1 | brain | BE |
| ENSG00000100138 | | 8 | adrenal, brain, heart, kidney, leukocyte, ovary, prostate, testis | ASE |
| | | 7 | adipose, breast, colon, liver, lung, lymph node, thyroid | BE |
| ENSG00000087460 | | 13 | adipose, adrenal, brain, breast, heart, kidney, leukocyte, lung, lymph node, ovary, prostate, testis, thyroid | ASE |
| | | 1 | colon | BE |
| ENSG00000235590 | | 1 | testis | BE |
| ENSG00000087116 | | 3 | adipose, breast, ovary | BE |
Results are shown for the 21 genes that have one or more monoallelically methylated SNPs in their genic regions and that reached the thresholds to investigate putative allele-specific expression (ASE). Six genes exclusively show ASE in one or multiple tissues, 13 genes show both ASE and biallelic expression (BE) in different tissues and 2 genes only show BE. The following columns are indicated: (Ensembl) Gene ID, Gene symbol, number and annotation of tissues for which ASE/BE could be examined (# Tissues and Tissues, respectively) and the Type of expression found for these tissues (ASE or BE). *known imprinted genomic region; **predicted imprinted genomic region.
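The split into 6, 13 and 2 genes can be recomputed from the table with a short tally. The dictionary below only transcribes which expression types are listed per Ensembl gene ID in the table; the variable and function names are hypothetical.

```python
from collections import Counter

# Expression types listed per gene in the table (tissue lists omitted).
EXPRESSION_TYPES = {
    "ENSG00000219438": {"ASE"}, "ENSG00000184985": {"ASE"},
    "ENSG00000169604": {"ASE"}, "ENSG00000255339": {"ASE"},
    "ENSG00000166136": {"ASE"}, "ENSG00000130600": {"ASE"},
    "ENSG00000175643": {"ASE", "BE"}, "ENSG00000122390": {"ASE", "BE"},
    "ENSG00000138641": {"ASE", "BE"}, "ENSG00000155093": {"ASE", "BE"},
    "ENSG00000182093": {"ASE", "BE"}, "ENSG00000198300": {"ASE", "BE"},
    "ENSG00000183486": {"ASE", "BE"}, "ENSG00000130844": {"ASE", "BE"},
    "ENSG00000074181": {"ASE", "BE"}, "ENSG00000214265": {"ASE", "BE"},
    "ENSG00000128739": {"ASE", "BE"}, "ENSG00000100138": {"ASE", "BE"},
    "ENSG00000087460": {"ASE", "BE"},
    "ENSG00000235590": {"BE"}, "ENSG00000087116": {"BE"},
}


def classify(types):
    """Label a gene by the set of expression types observed across tissues."""
    if types == {"ASE"}:
        return "ASE only"
    if types == {"BE"}:
        return "BE only"
    return "ASE and BE"


counts = Counter(classify(types) for types in EXPRESSION_TYPES.values())

if __name__ == "__main__":
    print(dict(counts))  # {'ASE only': 6, 'ASE and BE': 13, 'BE only': 2}
```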