Hafid Laayouni (1,2), Yolanda Espinosa-Parrilla (3,4,5), Pablo Villegas-Mirón (6), Alicia Gallego (7), Jaume Bertranpetit (6).
Abstract
The occurrence of natural variation in human microRNAs has been the focus of numerous studies over the last 20 years. Most have focused on the role of specific mutations in disease, while a smaller proportion have analysed microRNA diversity in the genomes of human populations. We analyse the latest human microRNA annotations in light of the most up-to-date catalogue of genetic variation provided by the 1000 Genomes Project. Through in silico analysis of microRNA genetic variation, we show that the level of evolutionary constraint on these sequences is governed by the interplay of different factors, such as their evolutionary age or genomic location. The role of mutations in shaping microRNA-driven regulatory interactions is emphasized by the finding that, while the microRNA sequence as a whole is highly conserved, the seed region shows higher genetic diversity, which appears to be caused by dramatic allele frequency shifts in a fraction of human microRNAs. We highlight the participation of these microRNAs in population-specific processes by showing that both the seed and the loop are particularly differentiated regions among human populations. A quantitative computational comparison of signatures of population differentiation showed that the candidate microRNAs with the largest differences are enriched in variants implicated in gene expression levels (eQTLs), selective sweeps and pathological processes. We explore the implication of these evolutionarily driven microRNAs and their SNPs in human diseases, such as different types of cancer, and discuss their role in population-specific disease risk.
Year: 2022 PMID: 35249174 PMCID: PMC9522702 DOI: 10.1007/s00439-021-02423-8
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 5.881
Fig. 1 Description of human miRNAs in terms of genomic context, evolutionary age groups, expression levels and clustering. a Description of the miRNA hairpin regions identified and analysed in the study. Not all primary sequences have two mature sequences annotated in miRBase. When only one mature sequence is annotated (incomplete annotation), the precursor region is extended from the annotated mature sequence to the opposite flanking (Flank) region. b Frequencies of TE-derived miRNAs across conservation groups (Primates, 1; Eutherians, 2; Metatheria and Prototheria, 3; Conserved beyond mammals, 4). c Integrated hosting of miRNAs, showing the combinations of hosting elements that overlap with miRNA sequences. The “Others” group comprises the minor categories (PC + LNC and PC + LNC + TE) that represent less than 1% of the total dataset (Supplementary Table S2). d Number of tissues in which each miRNA is expressed, across evolutionary ages. e Mean miRNA expression level in reads per million (RPM) across evolutionary ages. f Genome-wide clustering patterns of miRNAs. The upper plot shows the frequency of miRNAs that belong to a cluster in each chromosome (Members) and the frequency of clusters in the whole genome (Clusters). The lower plot shows the miRNA clusters per chromosome, according to their number of members and their frequency among clustered miRNAs. g Fraction of clustered and isolated miRNAs across evolutionary ages. Intg (intergenic), LNC (long non-coding RNA), TE (transposable element), PC (protein-coding)
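Panels f and g separate clustered from isolated miRNAs, but the caption does not state the distance criterion used. A minimal sketch, assuming a hypothetical 10 kb maximum gap between consecutive miRNA start positions on the same chromosome (a commonly used window, not necessarily the study's), groups loci by single linkage. The input format and function names are illustrative, not the authors' pipeline.

```python
from typing import List, Optional, Tuple

# Hypothetical maximum distance (bp) between consecutive miRNAs in one cluster.
# This is an assumption for illustration, not the study's stated criterion.
MAX_GAP_BP = 10_000

Locus = Tuple[str, int, str]  # (chromosome, start coordinate, miRNA ID)


def cluster_mirnas(loci: List[Locus], max_gap: int = MAX_GAP_BP) -> List[List[str]]:
    """Single-linkage grouping: a miRNA joins the current group when it lies on the
    same chromosome and starts within max_gap bp of the previous miRNA."""
    groups: List[List[str]] = []
    prev_chrom: Optional[str] = None
    prev_start = 0
    for chrom, start, mirna in sorted(loci):  # sorts by chromosome, then start
        if chrom == prev_chrom and start - prev_start <= max_gap:
            groups[-1].append(mirna)  # extend the current group
        else:
            groups.append([mirna])  # start a new group
        prev_chrom, prev_start = chrom, start
    return groups


def split_clustered_isolated(groups: List[List[str]]) -> Tuple[List[List[str]], List[str]]:
    """Groups with two or more members are clusters; singletons are isolated miRNAs."""
    clusters = [g for g in groups if len(g) > 1]
    isolated = [g[0] for g in groups if len(g) == 1]
    return clusters, isolated


if __name__ == "__main__":
    demo = [("1", 100, "mirA"), ("1", 5_000, "mirB"), ("1", 50_000, "mirC"), ("2", 100, "mirD")]
    clusters, isolated = split_clustered_isolated(cluster_mirnas(demo))
    print(clusters, isolated)  # [['mirA', 'mirB']] ['mirC', 'mirD']
```

Single linkage lets a cluster grow along the chromosome as long as each neighbour is within the gap; a fixed-window definition would give different clusters for long miRNA arrays.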
Fig. 2 Nucleotide diversity differences between miRNAs in different annotation categories and functional regions. a Differences between the genomic contexts in which human miRNAs are found. Wilcoxon pairwise comparisons (Bonferroni corrected) show that TEs present significantly higher diversity than other contexts (TE vs LNC, p = 0.022; TE vs Intg, p = 0.022). b Differences across miRNA conservation groups. Primate-specific miRNAs (group 1) show significantly higher diversity than the other groups (1 vs 2, p = 0.00057; 1 vs 3, p = 0.0178; 1 vs 4, p = 3.93e−10; Wilcoxon pairwise comparisons, Bonferroni corrected). Significant differences are also seen for the miRNAs conserved beyond mammals (group 4) (4 vs 3, p = 0.0178; 4 vs 2, p = 2.6e−05; Wilcoxon pairwise comparisons, Bonferroni corrected). c Differences between isolated and clustered miRNAs. Isolated miRNAs show significantly higher diversity than cluster members (Wilcoxon pairwise comparison, p = 3.663e−10). d Diversity comparison between the functional regions identified in the miRNA hairpins. Mean values (right axis) are indicated by a coloured diamond. The seed region (nucleotides 2–8) shows significantly higher diversity than other conserved regions (seed vs loop, p = 0.0011; seed vs mat, p = 0.0056; Wilcoxon pairwise comparisons, Bonferroni corrected). e SNP density per functional region calculated across the whole miRNA dataset. Mean values (right axis) are indicated by a coloured diamond. f Mean nucleotide diversity of the miRNA functional regions across the SNP minor allele frequency (MAF) range. g Mean nucleotide diversity at each relative position of the precursor miRNA. The zoomed region corresponds to the per-position diversity in the mature sequence. Intg (intergenic), LNC (long non-coding RNA), TE (transposable element), PC (protein-coding), flank (flanking region), pre (precursor), mat (mature)
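The panels above compare nucleotide diversity (π) and SNP density across miRNA regions, but the caption does not give the estimator. A minimal sketch, assuming biallelic SNPs summarised as alternative-allele counts among sampled chromosomes, computes π for a region as the per-site probability that two chromosomes differ, summed over SNPs and divided by region length. Function names and the example counts are hypothetical.

```python
from typing import List


def site_diversity(alt_count: int, n_chrom: int) -> float:
    """Per-site diversity of a biallelic SNP: the probability that two chromosomes
    drawn without replacement carry different alleles, 2k(n-k) / (n(n-1))."""
    k, n = alt_count, n_chrom
    return 2.0 * k * (n - k) / (n * (n - 1))


def region_pi(alt_counts: List[int], n_chrom: int, length_bp: int) -> float:
    """Nucleotide diversity of a region: per-site diversity summed over its SNPs and
    divided by region length. Invariant sites contribute 0, so only SNPs are listed."""
    return sum(site_diversity(k, n_chrom) for k in alt_counts) / length_bp


def snp_density(alt_counts: List[int], length_bp: int) -> float:
    """Number of SNPs per base pair of the region."""
    return len(alt_counts) / length_bp


if __name__ == "__main__":
    # Hypothetical seed region (nucleotides 2-8, i.e. 7 bp) with two SNPs,
    # genotyped on 5008 chromosomes.
    seed_snps = [2504, 50]
    print(region_pi(seed_snps, 5008, 7), snp_density(seed_snps, 7))
```

Dividing by region length makes short regions such as the 7 nt seed comparable with the longer precursor and flanking regions.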
Fig. 3 Analysis of Fst values across miRNA regions and candidates. a Mean Fst values per miRNA region across all population comparison groups. Fst values were calculated for all variant regions. b Distributions of Combined Annotation Dependent Depletion (CADD) scores, a measure of the predicted deleteriousness of variants, across miRNA regions. c Manhattan plot showing the mean Fst value per miRNA mature sequence in the three reference comparisons. Two Fst thresholds (top 1% and top 5%) were used to extract potential miRNA candidates under positive selection. d Heatmap of per-SNP Fst values for the variants found in the mature-outside-seed (14) and seed (10) regions of the top 5% miRNA candidates; columns correspond to SNPs and rows to all 243 possible population comparisons
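Panel c ranks mature sequences by mean Fst and keeps the top 1% and 5%, but the caption does not name the Fst estimator. A minimal sketch, assuming a simple Nei-style estimator Fst = (HT - HS) / HT computed from two populations' allele frequencies without sample-size correction (the study may have used another estimator, such as Weir and Cockerham's), averages per-SNP values per miRNA and then selects the top fraction. Function names and frequencies are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def snp_fst(p1: float, p2: float) -> float:
    """Nei-style Fst = (HT - HS) / HT for one biallelic SNP in two populations.

    p1 and p2 are allele frequencies; no sample-size correction is applied.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    return 0.0 if ht == 0 else (ht - hs) / ht


def top_candidates(mean_fst: Dict[str, float], top_fraction: float) -> List[str]:
    """IDs whose mean Fst ranks in the top `top_fraction` (e.g. 0.05 or 0.01); keeps at least one."""
    ranked = sorted(mean_fst, key=mean_fst.get, reverse=True)
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Hypothetical (pop A, pop B) allele frequencies for SNPs in three mature sequences
    snps = {"mirX": [(0.1, 0.9)], "mirY": [(0.5, 0.5), (0.2, 0.3)], "mirZ": [(0.4, 0.6)]}
    per_mirna = {m: mean(snp_fst(a, b) for a, b in freqs) for m, freqs in snps.items()}
    print(top_candidates(per_mirna, 0.34))  # ['mirX']
```

Taking the maximum of such per-miRNA means over several reference comparisons would give a value analogous to the table's Max. Fst column.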
Top 5% miRNA candidates under putative positive selection
| Chr | Mature ID | Mature SNP | Seed SNP | Evolutionary age | Genomic context | Max. Fst | Max. iHS (mature/seed) | Max. nSL (mature/seed) | CADD (mature/seed) | Disease association |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | hsa-miR-4781-3p | – | rs74085143 | Primate | PC;TE | 0.21 | –/1.51 | –/1.28 | –/7.85 | PD1, AD2 |
| 2 | hsa-miR-6071 | rs56790095 | – | Primate | PC;TE | 0.21 | 0.67/– | 0.37/– | 5.64/– | GB3, CRC4,5 |
| 2 | hsa-miR-6811-3p | rs2292879 | – | Primate | PC;TE | 0.26 | 2.73/– | 1.73/– | 2.71/– | – |
| 3 | hsa-miR-6826-5p | rs115693266 | rs6771809 | Primate | PC | 0.27 | 0.22/1.80 | 0.88/2.22 | 2.92/1.01 | CRC6, BC7 |
| 4 | hsa-miR-1269a | rs73239138 | – | Primate | TE | 0.22 | 1.84/– | 2.12/– | 0.70/– | GC8, HC9,10,16, CC11, BC12, LC13,14, CRC15 |
| 6 | hsa-miR-10524-5p | – | rs77651740 | Non-classified | TE | 0.30 | –/1.69 | –/1.40 | –/NA | – |
| 8 | hsa-miR-1322 | rs59878596 | – | Non-classified | PC | 0.23 | 1.33/– | 2.25/– | NA/– | HC17, ESC18 |
| 8 | hsa-miR-4472 | – | rs28655823 | Primate | Intg | 0.36 | –/2.02 | –/1.38 | –/2.87 | BC19,20,21, PC21, CC21 |
| 8 | hsa-miR-8084 | rs404337 | – | Non-classified | TE | 0.27 | 1.45/– | 1.89/– | NA/– | BC22, OC23 |
| 10 | hsa-miR-938 | – | rs12416605 | Primate | PC | 0.21 | –/0.93 | –/1.78 | –/7.68 | GC24,25 |
| 11 | hsa-miR-1304-3p | rs2155248 | – | Primate | PC;TE | 0.23 | 1.72/– | 1.13/– | 4.44/– | GC26, HC27, HNC28, EM29,LC30 |
| 12 | hsa-miR-196a-3p | rs11614913 | – | Eutherians | PC | 0.24 | 1.74/– | 1.21/– | 18.77/– | LC31,38, HC31,33, HNC31, GM32, OC33, BC33,35,37,81, DM134, CAD36, CRC76, GC77,78,79,80 |
| 14 | hsa-miR-412-3p | rs61992671 | – | Eutherians | LNC | 0.43 | 1.44/– | 1.33/– | 15.52/– | OS39, CC40 |
| 14 | hsa-miR-4707-3p | – | rs2273626 | Eutherians | PC | 0.57 | –/2.33 | –/1.22 | –/10.85 | POAG41, ESC42 |
| 15 | hsa-miR-4513 | – | rs2168518 | Meta/Prototheria | PC | 0.59 | –/2.09 | –/1.40 | –/5.01 | CAD43,46, LC44,45, GC47, BC48, OSCC49 |
| 17 | hsa-miR-548h-5p | rs9913045 | – | Primate | PC;TE | 0.24 | 2.56/– | 3.28/– | 1.31/– | GM50 |
| 17 | hsa-miR-1269b | rs12451747 | rs7210937 | Primate | PC;TE | 0.33 | 1.67/1.11 | 2.23/1.27 | 0.31/0.39 | OPSCC51, LC52 |
| 17 | hsa-miR-4739 | rs73410309 | – | Primate | LNC;TE | 0.37 | 2.01/– | 3.08/– | 12.73/– | PF53, PC54, DM155,56, GC57, AML58 |
| 18 | hsa-miR-4741 | – | rs7227168 | Eutherians | PC | 0.27 | –/3.37 | –/1.74 | –/13.31 | MY59, HC60, CRC61, CC61 |
| 19 | hsa-miR-6796-3p | – | rs3745198 | Primate | PC | 0.35 | –/2.39 | –/1.63 | –/3.67 | UR62 |
| 20 | hsa-miR-646 | rs6513497 | – | Primate | LNC;TE | 0.22 | 2.99/– | 3.06/– | 6.33/– | GC63,68, HC64, LAC65, LC66,71, BC67, CRC69, RC70, OS72 |
| 22 | hsa-miR-3928-5p | rs5997893 | – | Non-classified | TE | 0.23 | 1.36/– | 2.01/– | NA/– | HD73, HNC74, OS75 |
The Max. Fst value represents the maximum mean Fst of the mature sequence among the three reference comparisons. The selection test values (iHS and nSL) correspond to the population that exhibits the maximum value for the mature SNP (left) and seed SNP (right). The CADD column provides the predicted deleteriousness scores of the mature SNP (left) and seed SNP (right). Disease associations for most candidates are indicated in the disease column, and some examples are described in the main text: PD Parkinson disease, AD Alzheimer’s disease, GB glioblastoma, CRC colorectal cancer, ESC esophageal squamous cell carcinoma, BC breast cancer, GC gastric cancer, HC hepatocellular carcinoma, CC colon cancer, HNC head and neck squamous cell carcinoma, EM endometriosis, LC lung cancer, POAG primary open-angle glaucoma, GM glioma, OC ovarian cancer, DM1 type 1 diabetes mellitus, CAD coronary artery disease, OSCC oral squamous cell carcinoma, OPSCC oral and pharyngeal squamous cell carcinoma, PF pleural fibrosis, PC prostate cancer (in the disease column; in the genomic context column PC denotes protein-coding), AML acute myeloid leukemia, MY myeloma, UR urolithiasis, LAC laryngeal carcinoma, RC renal carcinoma, OS osteosarcoma, HD Huntington disease.
(1) Beecham et al. 2015, (2) Satoh et al. 2015, (3) Zhou et al. 2020, (4,5) Slattery et al. 2018a, b, (6) Kijima et al. 2017, (7) Danková et al. 2020, (8) Li et al. 2017, (9) Min et al. 2017, (10) Xiong et al. 2015, (11) Mao et al. 2017, (12) Sarabandi et al. 2021, (13) Jin et al. 2018, (14) Wang et al. 2020a, b, c, d, (15) Bu et al. 2015, (16) Wang et al. 2019a, b, (17) Zhao et al. 2020, (18) Zhang et al. 2013, (19) Li et al. 2020, (20) Wang et al. 2018, (21) Kim et al. 2012, (22) Gao et al. 2018, (23) Chong et al. 2015, (24) Torruella-Loran et al. 2019, (25) Arisawa et al. 2012, (26) Kurata and Lin 2018, (27) Oura et al. 2019, (28) Petronacci et al. 2020, (29) Xu et al. 2017, (30) Othman et al. 2013, (31) Liu et al. 2018, (32) Yang et al. 2020a, b, (33) Choupani et al. 2019, (34) Ibrahim et al. 2019, (35) Ahmad and Shah 2020, (36) Fragoso et al. 2019, (37) Zhao et al. 2016, (38) Wang et al. 2017, (39) Martin-Guerrero et al. 2018, (40) Zhu et al. 2020a, b, (41) Ghanbari et al. 2017a, b, (42) Bi et al. 2020, (43) Mir et al. 2019, (44) Ghanbari et al. 2014, (45) Ghanbari et al. 2017, (46) Li et al. 2015, (47) Ding et al. 2019, (48) Li et al. 2019, (49) Xu et al. 2019, (50) Ji et al. 2020, (51) Chen et al. 2016, (52) Yang et al. 2020a, b, (53) Wang et al. 2019a, b, (54) Wang et al. 2020a, b, c, d, (55) Delić et al. 2016, (56) Li et al. 2018, (57) Dong et al. 2015, (58) Cattaneo et al. 2015, (59) Zhang et al. 2019, (60) Liu et al. 2019, (61) Cojocneanu et al. 2020, (62) Liang et al. 2019, (63) Cai et al. 2016, (64) Wang et al. 2014, (65) Yuan et al. 2020, (66) Wang et al. 2020a, b, c, d, (67) Darvishi et al. 2020, (68) Zhang et al. 2017, (69) Dai et al. 2017, (70) Li et al. 2014, (71) Pan et al. 2016, (72) Sun et al. 2015, (73) Reed et al. 2018, (74) Fadhil et al. 2020, (75) Xu et al. 2014, (76) Yan et al. 2017, (77) Ni et al. 2015, (78) Yan et al. 2017, (79) Peng et al. 2010, (80) Wang et al. 2013, (81) Qi et al. 2015
TargetScanHuman-predicted target genes for the seed-variant miRNA candidates
| Mature ID | SNP | AA | DA | Targets (AA) | Targets (DA) | Overlapping targets | Cosine similarity |
|---|---|---|---|---|---|---|---|
| hsa-miR-938 | rs12416605 | C | T | 2678 | 2594 | 573 | 0.22 |
| hsa-miR-4472 | rs28655823 | G | C | 3257 | 835 | 322 | 0.19 |
| hsa-miR-4513 | rs2168518 | G | A | 2532 | 2693 | 2118 | 0.81 |
| hsa-miR-1269b | rs7210937 | G | C | 2437 | 3167 | 626 | 0.23 |
| hsa-miR-4707-3p | rs2273626 | C | A | 1167 | 2592 | 356 | 0.20 |
| hsa-miR-4741 | rs7227168 | C | T | 3665 | 2231 | 676 | 0.23 |
| hsa-miR-4781-3p | rs74085143 | A | G | 2339 | 2724 | 558 | 0.22 |
| hsa-miR-6796-3p | rs3745198 | C | G | 2331 | 2855 | 484 | 0.19 |
| hsa-miR-6826-5p | rs6771809 | C | T | 3191 | 2032 | 517 | 0.20 |
| hsa-miR-10524-5p | rs77651740 | G | T | 2853 | 3332 | 2234 | 0.72 |
Two sets of target genes were predicted for each candidate: one for the ancestral allele (AA) and one for the derived allele (DA). The overlap between the two target lists is given, and their similarity is estimated with the cosine similarity.
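The footnote does not say how the target lists were turned into vectors. Assuming each list is encoded as a binary presence/absence vector over genes, the dot product is the overlap size and each norm is the square root of the list size, so the cosine reduces to |A ∩ B| / sqrt(|A| · |B|). The sketch below is illustrative only (function and gene names are hypothetical); under this assumption the table's counts reproduce most rows to two decimals (e.g. 0.81 for hsa-miR-4513), though a few rows differ by 0.01, so the authors' exact vectorisation may differ.

```python
import math
from typing import Set


def target_cosine(targets_aa: Set[str], targets_da: Set[str]) -> float:
    """Cosine similarity of two target-gene lists encoded as 0/1 vectors over genes.

    With binary vectors the dot product equals the overlap size and each vector's
    norm is the square root of its list size, so cos = |A & B| / sqrt(|A| * |B|).
    """
    if not targets_aa or not targets_da:
        return 0.0  # cosine is undefined for an empty list; report no similarity
    overlap = len(targets_aa & targets_da)
    return overlap / math.sqrt(len(targets_aa) * len(targets_da))


def cosine_from_counts(n_aa: int, n_da: int, n_overlap: int) -> float:
    """The same quantity computed from list sizes and overlap alone."""
    return n_overlap / math.sqrt(n_aa * n_da)


if __name__ == "__main__":
    print(target_cosine({"GENE1", "GENE2"}, {"GENE2", "GENE3"}))  # 0.5
    # Counts from the hsa-miR-4513 row (AA 2532, DA 2693, overlap 2118)
    print(round(cosine_from_counts(2532, 2693, 2118), 2))  # 0.81
```

Unlike a raw overlap count, the cosine corrects for list size, which matters when one allele gains or loses many targets (as for hsa-miR-4472).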
Fig. 4 Analysis of signatures of positive selection in the candidate SNP rs2273626. a Worldwide minor allele frequency (MAF) distribution of rs2273626. b Extended haplotype homozygosity (EHH) decay for the ancestral and derived alleles of rs2273626 (upper plot) and haplotype patterns around the ancestral and derived alleles (lower plot) in the Utah European (CEU), Han Chinese (CHB) and Peruvian (PEL) populations.
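Panel b plots EHH decay around each allele of rs2273626. Under the standard definition, EHH at a given distance from a core SNP is the probability that two randomly chosen chromosomes carrying the same core allele are identical across the whole interval. A minimal sketch, assuming phased haplotypes encoded as equal-length strings with one character per site and using marker index rather than genetic distance, is shown below. The function names are hypothetical, and iHS and nSL, which summarise such decay curves, are not implemented here.

```python
from collections import Counter
from math import comb
from typing import List, Sequence


def ehh(haplotypes: Sequence[str], core: int, core_allele: str, end: int) -> float:
    """EHH between site `core` and site `end` for chromosomes carrying `core_allele`."""
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carrier chromosomes
    lo, hi = min(core, end), max(core, end)
    # Count each distinct haplotype over the interval [lo, hi]
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Pairs sharing an identical haplotype, divided by all pairs of carriers
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ehh_decay_right(haplotypes: Sequence[str], core: int, core_allele: str) -> List[float]:
    """EHH from the core site to each site on its right (core site included)."""
    return [ehh(haplotypes, core, core_allele, j) for j in range(core, len(haplotypes[0]))]


if __name__ == "__main__":
    haps = ["0000", "0001", "0011", "1011", "1111"]  # hypothetical phased haplotypes
    print(ehh_decay_right(haps, 0, "0"))  # EHH of allele "0" decays: 1, 1, 1/3, 0
    print(ehh_decay_right(haps, 0, "1"))
```

A selected allele that rose in frequency quickly tends to sit on long shared haplotypes, so its EHH curve decays more slowly than that of the other allele.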