
PopTargs: a database for studying population evolutionary genetics of human microRNA target sites.

Andrea Hatlen, Mohab Helmy, Antonio Marco.

Abstract

There is an increasing interest in the study of polymorphic variants at gene regulatory motifs, including microRNA target sites. Understanding the effects of selective forces at specific microRNA target sites, together with other factors like expression levels or evolutionary conservation, requires the joint study of multiple datasets. We have compiled information from multiple sources and compared it with predicted microRNA target sites to build a comprehensive database for the study of microRNA targets in human populations. PopTargs is a web-based tool that allows the easy extraction of multiple datasets and the joint analyses of them, including allele frequencies, ancestral status, population differentiation statistics and site conservation. The user can also compare the allele frequency spectrum between two groups of target sites and conveniently produce plots. The database can be easily expanded as new data becomes available and the raw database as well as code for creating new custom-made databases is available for downloading. We also describe a few illustrative examples.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 31608947      PMCID: PMC6790967          DOI: 10.1093/database/baz102

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


Introduction

As genome sequencing costs continue to decrease, the interest in population genetics increases. In particular, the analysis of variation at regulatory sites is becoming critical to understand how non-coding sequences emerge and evolve (1). MicroRNAs are important gene regulators that target gene transcripts by partial complementarity (2). The fact that their targets can be predicted from their primary sequence has been exploited to study the potential impact of single-nucleotide polymorphisms at their target sites. Indeed, a number of studies have reported selective pressures at these target sites by investigating the variation in populations (3–6). A number of databases for analyzing polymorphic microRNA target sites exist (e.g. (7)). However, these databases are designed to explore the functional and biomedical implications of single-nucleotide polymorphisms. Despite the interest in population genetics at microRNA target sites, there is currently no dedicated platform to study evolutionary and population genetics at canonical microRNA target sites. Here we aim to fill this gap. We have developed a database that cross-links allele frequencies and other variables of evolutionary interest at predicted microRNA target sites with expression and evolutionary conservation information from other sources, permitting the analysis of frequency spectra and population differentiation at target sites.

Methods

Source of data

The human 3′UTRs were downloaded with BioMart (8) and the biomaRt R package (9) from Ensembl database version 96 (human genome assembly GRCh38), keeping only 3′UTRs from protein-coding transcripts. All mature human microRNAs were downloaded from miRBase version 22 (10). SNPs were retrieved from the 1000 Genomes Project (11) as compiled in dbSNP Build 151 [Ensembl Variation 96] (12). Genes were classified as ‘over-’ or ‘under-expressed’ by tissue according to the Bgee database, version 14.0 (13). MicroRNA tissue expression information was obtained from five RNA-Seq datasets from Meunier et al. (14) and 46 datasets cataloged in miRmine (15) (accession numbers are listed in Supplementary Table 1). MicroRNAs were classified into four groups for analysis, based on their expression in each tissue: (i) zero RPM (reads per million), (ii) broad expression (>50 RPM), (iii) high expression (>500 RPM) and (iv) specifically expressed in one tissue (highly expressed compared to the other tissues: above 1.5 times the interquartile range plus the upper quartile across tissues). Target and near-target sites (a near-target differs from a target by one nucleotide) were found using seedVicious 1.1 (16), which predicts canonical target sites without filtering for sequence conservation. Only SNP locations in which one allele was a target and another allele was a near-target were further considered. This important feature allows the study of target sites that are not in the reference genome, but that can be targets in some populations (see ‘Results and Discussion’ section).
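The four expression groups described above can be illustrated with a minimal Python sketch. It is not the code used to build PopTargs: the function name, the class labels, the choice to let a microRNA belong to several groups at once, and the quartile interpolation method are all assumptions made for illustration.

```python
from statistics import quantiles


def expression_classes(rpm_by_tissue, tissue):
    """Return the expression groups a microRNA falls into for one tissue.

    rpm_by_tissue maps tissue name -> reads per million (RPM) and needs
    at least two tissues. Groups are not treated as mutually exclusive
    here (an assumption).
    """
    rpm = rpm_by_tissue[tissue]
    classes = set()
    if rpm == 0:
        classes.add("not_detected")
    if rpm > 50:
        classes.add("broad")
    if rpm > 500:
        classes.add("high")
    # Upper-outlier rule: upper quartile + 1.5 * interquartile range,
    # computed across all tissues for this microRNA.
    q1, _, q3 = quantiles(rpm_by_tissue.values(), n=4, method="inclusive")
    if rpm > q3 + 1.5 * (q3 - q1):
        classes.add("tissue_specific")
    return classes


# Hypothetical RPM values, for illustration only.
example = {"testis": 2000, "brain": 10, "liver": 12, "heart": 8, "kidney": 11}
print(expression_classes(example, "testis"))
```

Different quartile conventions give slightly different outlier thresholds, so the exact set of tissue-specific microRNAs may differ from the one stored in the database.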

Access and implementation

The database is built in MySQL, and it is freely accessible via a dedicated web portal at https://poptargs.essex.ac.uk/. The database provides three main options to explore microRNA target sites. First, users can search (Search tab) for specific microRNAs or genes, or compare the allele frequencies between two lists of microRNAs or genes (User lists tab). The web form also gives the option to plot the allele frequencies side by side for a fast visual inspection of results. In the computation of these plots, only unique SNPs are used to avoid duplicated results, and P values from the two one-tailed Kolmogorov–Smirnov tests are provided for convenience. Alternatively, users may browse the database (Browse tab) and select microRNAs with specific expression profiles and/or sequence conservation. These data can be retrieved for all or for specific human populations. The database also provides computations for target sites in the reverse complement strand to the transcript, which can be used as background distributions for statistical purposes. Finally, the user has the option to download the whole MySQL database (Downloads tab). Researchers can also create their own databases with custom sequences, as we provide the source code and full instructions at https://github.com/ash8/PopTargs.
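The comparison of two allele frequency distributions can be sketched as follows. This is an illustration only, not the PopTargs implementation: the function names are hypothetical, and the P values use the classical large-sample approximation exp(-2·mn/(m+n)·D²) for the one-sided statistic, which can be inaccurate for small samples.

```python
import math
from bisect import bisect_right


def unique_by_snp(records):
    """Keep one frequency per SNP ID so that duplicated SNPs count once."""
    return list({snp_id: freq for snp_id, freq in records}.values())


def one_tailed_ks(sample_a, sample_b):
    """Two one-tailed two-sample Kolmogorov-Smirnov tests.

    Returns (d_plus, p_plus, d_minus, p_minus), where
    d_plus = max(F_a - F_b) and d_minus = max(F_b - F_a) over the
    empirical distribution functions. A large d_plus means sample_a
    tends to take lower values than sample_b.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    m, n = len(a), len(b)
    d_plus = d_minus = 0.0
    for x in set(a) | set(b):
        diff = bisect_right(a, x) / m - bisect_right(b, x) / n
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    scale = m * n / (m + n)

    def p_value(d):
        # Asymptotic one-sided approximation (an assumption of this sketch).
        return math.exp(-2.0 * scale * d * d)

    return d_plus, p_value(d_plus), d_minus, p_value(d_minus)
```

For instance, `one_tailed_ks(unique_by_snp(list_a), unique_by_snp(list_b))` would test both directions for two user-provided lists of `(snp_id, frequency)` pairs; for real analyses an exact or library implementation is preferable.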

Results and Discussion

The basic search function of PopTargs is the ‘Search’ form. Users can look up microRNAs or genes (Ensembl unique IDs) to find potential polymorphic sites in which one of the alleles is a target site. For each target site, the output reports the following features: (i) gene, transcript and SNP accession numbers, with links to the data source; (ii) SNP chromosome and position with a link to the UCSC Genome Browser (17); (iii) ancestral and target alleles, together with allele and derived allele frequencies; and (iv) PhyloP scores (averaged over the whole target site) as pre-computed in UCSC (18). In addition, for each microRNA, the database reports whether it is catalogued as ‘high-quality’ in miRBase, whether it exists in MirGeneDB (19) and whether the mature sequence is the same in these two databases. Users can also provide lists of mature microRNA names and gene names in the User lists form. As we considered near-targets (see above) during the database assembly, the user will also find target sites that are not in the reference genome, yet one of the alleles is associated with a target site. This feature can be exploited to detect putative target sites not present in the current reference genome sequence (see discussion at the end of this section). The table provides the population frequencies of the target allele and also reports which allele is ancestral to human populations.

Lists of microRNAs of interest can be obtained from miRBase (10) but also from curated databases that may allow the filtering of microRNAs based on evolutionary conservation or other features (e.g. MirGeneDB (19)). The possibility of providing lists of both microRNAs and genes helps to narrow down the targets of interest when a specific subset of experimentally validated interactions (for instance, from TarBase (20) or miRTarBase (21)) is to be explored. The database also allows plotting allele frequencies for the queried microRNA/gene interactions. In this case, one can plot the allele frequencies at target sites and compare them with the allele frequencies of either an alternative list of microRNAs or an alternative list of genes. This is particularly handy when visually exploring large amounts of data (see below).

To explore variation at target sites in pre-computed lists, the ‘Browse’ form allows the study of microRNAs with different levels of expression, expression breadth, evolutionary conservation and even sub-population structure. For instance, we recently reported that in human populations there is detectable selection against microRNA target sites (6). We can explore some specific cases with PopTargs. If we use the Browse option, we can compare target sites for microRNAs highly expressed in testis (for instance) versus microRNAs not detected in testis. PopTargs will produce an allele frequency and a derived allele frequency plot, showing that the frequency of the target allele is significantly lower for the targets of highly expressed microRNAs (Figure 1). This result suggests that when a target site for a testis microRNA randomly appears in a testis-expressed gene, there will be selective pressures to remove this allele from the population.
Figure 1

Allele frequency distributions as generated from the PopTargs web server. The left panel shows the target allele frequency distribution for microRNAs highly expressed in testes (grey bars) and for microRNAs whose expression was not detected in testes (white bars). Likewise, the right panel shows the target allele frequency distribution of derived alleles, that is, where the ancestral allele is a non-target. The latter plot is also often called the site frequency spectrum.
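The two distributions in Figure 1 can be approximated from a table of target sites with a few lines of Python. This is a minimal sketch under assumed inputs: each site is a hypothetical `(target_allele_frequency, target_is_ancestral)` pair, and the ten equal-width bins are an arbitrary choice, not necessarily the binning used by the web server.

```python
def derived_target_frequencies(sites):
    """Frequencies of target alleles that are derived (ancestral allele is a non-target)."""
    return [freq for freq, target_is_ancestral in sites if not target_is_ancestral]


def frequency_histogram(freqs, bins=10):
    """Proportion of sites in each equal-width frequency bin over [0, 1]."""
    counts = [0] * bins
    for f in freqs:
        # A frequency of exactly 1.0 is placed in the last bin.
        counts[min(int(f * bins), bins - 1)] += 1
    total = len(freqs)
    return [c / total for c in counts] if total else counts


# Hypothetical sites, for illustration only.
sites = [(0.05, False), (0.95, True), (0.15, False), (1.0, False)]
all_targets = frequency_histogram([freq for freq, _ in sites])
derived_only = frequency_histogram(derived_target_frequencies(sites))
```

Here `all_targets` corresponds to the left panel (all target alleles) and `derived_only` to the right panel (the site frequency spectrum of derived target alleles).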

We can download a full table with the results, which will contain allele and derived allele frequencies but also the target allele frequencies for different human populations and the estimated Fst (22). From the results produced, we can detect 12 unique segregating target:non-target allele pairs for microRNAs highly expressed in testis (Table 1) that have a high degree of population differentiation (Fst > 0.7). For instance, transcripts from the MTAP gene have a conserved target site for let-7a-5p, but this target site is not detected in the reference genome. Indeed, the loss of the ancestral target site happened in European populations while other human groups mostly maintain the target allele (dbSNP entry rs6912739, Table 1). This result illustrates how population dynamics can be used to detect target sites that are not in the reference genome and, therefore, escape most target prediction programs (23).
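The idea of a segregating target:non-target allele pair can be made concrete with a short sketch. It is a simplification, not the seedVicious algorithm: it assumes a single site type (perfect complementarity to microRNA positions 2-8), RNA-alphabet sequences, and hypothetical function names. The let-7a-5p mature sequence is included for illustration and should be checked against miRBase; the 3′UTR fragment is invented.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Transcript sequence (5'->3') complementary to microRNA positions 2-8."""
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def classify_alleles(utr, snp_pos, alleles, mirna):
    """Map each allele to True if it creates a seed site overlapping the SNP.

    snp_pos is a 0-based index into utr; alleles is an iterable of
    single nucleotides.
    """
    site = seed_site(mirna)
    k = len(site)
    result = {}
    for allele in alleles:
        seq = utr[:snp_pos] + allele + utr[snp_pos + 1:]
        first = max(0, snp_pos - k + 1)
        last = min(snp_pos, len(seq) - k)
        result[allele] = any(seq[s:s + k] == site for s in range(first, last + 1))
    return result


LET7A_5P = "UGAGGUAGUAGGUUGUAUAGUU"
HYPOTHETICAL_UTR = "AAGCUACCUCAGG"  # invented fragment, SNP at index 5

calls = classify_alleles(HYPOTHETICAL_UTR, 5, "AG", LET7A_5P)
is_segregating_pair = sum(calls.values()) == 1
```

If the reference genome carried the G allele in this invented example, a predictor scanning only the reference sequence would miss the site, whereas scanning both alleles recovers it.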
Table 1

Target sites for testis-expressed microRNAs with a high degree of population differentiation

MicroRNA       Gene      SNP         Target is ancestral  EAS     AMR     AFR     EUR     SAS     All     Fst
miR-202-5p     ATP1A1    rs1885802   yes                  0.0427  0.1268  0.8109  0.0338  0.0450  0.2559  0.7914
miR-130a-3p    SLC30A9   rs12511999  yes                  0.0486  0.2363  0.9130  0.2495  0.2219  0.3773  0.7302
miR-513a       TCERG1    rs3822506   no                   0.7183  0.1167  0.0325  0.0915  0.2198  0.2307  0.7224
miR-151a-3p    MTAP      rs12003714  no                   0.9921  0.9380  0.2042  0.9950  0.8824  0.7567  0.8638
let-7a-5p      MTAP      rs7875199   yes                  0.0536  0.0576  0.7958  0.0070  0.1288  0.2555  0.7997
miR-24-3p      SCN2B     rs624328    no                   0.9405  0.8905  0.2519  0.9513  0.9673  0.7601  0.7674
miR-192-5p     C12orf65  rs1533703   yes                  0.9980  0.7262  0.1490  0.7694  0.7945  0.6512  0.7074
miR-25/92a-3p  PTK6      rs186332    no                   0.9643  0.8732  0.0469  0.9284  0.5930  0.6304  0.8487

The target allele frequencies are provided for East Asian (EAS), Mixed American (AMR), African (AFR), European (EUR) and South Asian (SAS) populations, as described in the 1000 Genomes Project (see ‘Methods’ section).
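A downloaded results table such as Table 1 can be explored further with a small script. The sketch below copies the rows of Table 1 and, for each site above an Fst threshold, reports the population whose target allele frequency deviates most from the mean of the other four. This heuristic and the function names are assumptions for illustration; they are not part of PopTargs and are not how Fst is estimated.

```python
POPULATIONS = ("EAS", "AMR", "AFR", "EUR", "SAS")

# microRNA, gene, SNP, target is ancestral, EAS, AMR, AFR, EUR, SAS, All, Fst
TABLE_1 = [
    ("miR-202-5p", "ATP1A1", "rs1885802", True, 0.0427, 0.1268, 0.8109, 0.0338, 0.0450, 0.2559, 0.7914),
    ("miR-130a-3p", "SLC30A9", "rs12511999", True, 0.0486, 0.2363, 0.9130, 0.2495, 0.2219, 0.3773, 0.7302),
    ("miR-513a", "TCERG1", "rs3822506", False, 0.7183, 0.1167, 0.0325, 0.0915, 0.2198, 0.2307, 0.7224),
    ("miR-151a-3p", "MTAP", "rs12003714", False, 0.9921, 0.9380, 0.2042, 0.9950, 0.8824, 0.7567, 0.8638),
    ("let-7a-5p", "MTAP", "rs7875199", True, 0.0536, 0.0576, 0.7958, 0.0070, 0.1288, 0.2555, 0.7997),
    ("miR-24-3p", "SCN2B", "rs624328", False, 0.9405, 0.8905, 0.2519, 0.9513, 0.9673, 0.7601, 0.7674),
    ("miR-192-5p", "C12orf65", "rs1533703", True, 0.9980, 0.7262, 0.1490, 0.7694, 0.7945, 0.6512, 0.7074),
    ("miR-25/92a-3p", "PTK6", "rs186332", False, 0.9643, 0.8732, 0.0469, 0.9284, 0.5930, 0.6304, 0.8487),
]


def most_divergent(freqs):
    """Population whose frequency is furthest from the mean of the others."""
    def deviation(pop):
        others = [f for p, f in freqs.items() if p != pop]
        return abs(freqs[pop] - sum(others) / len(others))
    return max(freqs, key=deviation)


def summarize(rows, fst_min=0.7):
    """List (microRNA, gene, SNP, most divergent population) for Fst > fst_min."""
    summary = []
    for mirna, gene, snp, _ancestral, *pop_freqs, _all_freq, fst in rows:
        if fst > fst_min:
            freqs = dict(zip(POPULATIONS, pop_freqs))
            summary.append((mirna, gene, snp, most_divergent(freqs)))
    return summary


for row in summarize(TABLE_1):
    print(*row, sep="\t")
```

Raising `fst_min` narrows the list to the most differentiated sites; combining the reported population with the ‘Target is ancestral’ column indicates whether the difference reflects a gain or a loss of the target allele.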

We provide all scripts used to generate the original database, together with full documentation, so that interested users can generate their own database. As the number of available genome sequences increases, this feature can be of use to those interested in expanding the current database.
References

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions.

Authors:  Chih-Hung Chou; Sirjana Shrestha; Chi-Dung Yang; Nai-Wen Chang; Yu-Ling Lin; Kuang-Wen Liao; Wei-Chi Huang; Ting-Hsuan Sun; Siang-Jyun Tu; Wei-Hsiang Lee; Men-Yee Chiew; Chun-San Tai; Ting-Yen Wei; Tzi-Ren Tsai; Hsin-Tzu Huang; Chung-Yu Wang; Hsin-Yi Wu; Shu-Yi Ho; Pin-Rong Chen; Cheng-Hsun Chuang; Pei-Jung Hsieh; Yi-Shin Wu; Wen-Liang Chen; Meng-Ju Li; Yu-Chun Wu; Xin-Yi Huang; Fung Ling Ng; Waradee Buddhakosai; Pei-Chun Huang; Kuan-Chun Lan; Chia-Yen Huang; Shun-Long Weng; Yeong-Nan Cheng; Chao Liang; Wen-Lian Hsu; Hsien-Da Huang
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

3.  A map of human genome variation from population-scale sequencing.

Authors:  Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal:  Nature       Date:  2010-10-28       Impact factor: 49.962

4.  28-way vertebrate alignment and conservation track in the UCSC Genome Browser.

Authors:  Webb Miller; Kate Rosenbloom; Ross C Hardison; Minmei Hou; James Taylor; Brian Raney; Richard Burhans; David C King; Robert Baertsch; Daniel Blankenberg; Sergei L Kosakovsky Pond; Anton Nekrutenko; Belinda Giardine; Robert S Harris; Svitlana Tyekucheva; Mark Diekhans; Thomas H Pringle; William J Murphy; Arthur Lesk; George M Weinstock; Kerstin Lindblad-Toh; Richard A Gibbs; Eric S Lander; Adam Siepel; David Haussler; W James Kent
Journal:  Genome Res       Date:  2007-11-05       Impact factor: 9.043

5.  Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt.

Authors:  Steffen Durinck; Paul T Spellman; Ewan Birney; Wolfgang Huber
Journal:  Nat Protoc       Date:  2009-07-23       Impact factor: 13.491

6.  Human polymorphism at microRNAs and microRNA target sites.

Authors:  Matthew A Saunders; Han Liang; Wen-Hsiung Li
Journal:  Proc Natl Acad Sci U S A       Date:  2007-02-20       Impact factor: 11.205

7.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

8.  miRBase: annotating high confidence microRNAs using deep sequencing data.

Authors:  Ana Kozomara; Sam Griffiths-Jones
Journal:  Nucleic Acids Res       Date:  2013-11-25       Impact factor: 16.971

9.  PolymiRTS Database 3.0: linking polymorphisms in microRNAs and their target sites with human diseases and biological pathways.

Authors:  Anindya Bhattacharya; Jesse D Ziebarth; Yan Cui
Journal:  Nucleic Acids Res       Date:  2013-10-24       Impact factor: 16.971

10.  1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans.

Authors:  Marc Pybus; Giovanni M Dall'Olio; Pierre Luisi; Manu Uzkudun; Angel Carreño-Torres; Pavlos Pavlidis; Hafid Laayouni; Jaume Bertranpetit; Johannes Engelken
Journal:  Nucleic Acids Res       Date:  2013-11-25       Impact factor: 16.971
