Ei-Wen Yang, Jae Hoon Bahn, Esther Yun-Hua Hsiao, Boon Xin Tan, Yiwei Sun, Ting Fu, Bo Zhou, Eric L Van Nostrand, Gabriel A Pratt, Peter Freese, Xintao Wei, Giovanni Quinones-Valdez, Alexander E Urban, Brenton R Graveley, Christopher B Burge, Gene W Yeo, Xinshu Xiao.
Abstract
Allele-specific protein-RNA binding is an essential aspect that may reveal functional genetic variants (GVs) mediating post-transcriptional regulation. Recently, genome-wide detection of in vivo binding of RNA-binding proteins has been greatly facilitated by the enhanced crosslinking and immunoprecipitation (eCLIP) method. We developed a new computational approach, called BEAPR, to identify allele-specific binding (ASB) events in eCLIP-Seq data. BEAPR takes into account crosslinking-induced sequence propensity and variations between replicated experiments. Using simulated and actual data, we show that BEAPR largely outperforms often-used count analysis methods. Importantly, BEAPR overcomes the inherent overdispersion problem of these methods. Complemented by experimental validations, we demonstrate that the application of BEAPR to ENCODE eCLIP-Seq data of 154 proteins helps to predict functional GVs that alter splicing or mRNA abundance. Moreover, many GVs with ASB patterns have known disease relevance. Overall, BEAPR is an effective method that helps to address the outstanding challenge of functional interpretation of GVs.
Year: 2019 PMID: 30902979 PMCID: PMC6430814 DOI: 10.1038/s41467-019-09292-w
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview of BEAPR (Binding Estimation of Allele-specific Protein–RNA interaction) and its performance. a Overall workflow of BEAPR. See Methods for details. b Crosslinking-induced bias in the SMInput sample of RBFOX2 (HepG2 cells). Y-axis shows the relative % of each nucleotide observed at each position relative to the crosslinking site (x = 0). c The square of the coefficient of variation (CV²) plotted as a function of the observed allelic read counts (mean of the two replicates) in the RBFOX2 enhanced crosslinking and immunoprecipitation (eCLIP) data in HepG2 cells (see Methods). d Performance comparison of three methods using simulated data (with simulated crosslinking-induced biases) and a true allelic ratio of 0.8 for allele-specific binding (ASB). Data derived from 1000 simulation experiments, each encompassing 5000 single-nucleotide variants (SNVs) (10% of which are ASB). FET: Fisher’s exact test; CHI: χ² test; AUC: area under the precision-recall curve; SEN95: sensitivity at 95% specificity; SPE95: specificity at 95% sensitivity. e Percentage of ASB events among all tested SNVs by the three methods using simulated data as in d. The x-axis shows different read coverage bins (using average read coverage of each SNV in two simulated replicates). The red dashed line corresponds to the 10% value, that is, the percentage of true ASB events in the simulation. f Box plots of p values calculated by the three methods at different levels of read coverage. Boxplot center lines indicate the median and the boxes extend to lower and upper quartiles with whiskers depicting 1.5× the interquartile range (IQR). The discrete points are the outliers. (Source data are provided as a Source Data file)
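The two quantities named in panels c and d can be illustrated with a minimal Python sketch. It computes CV² (variance divided by the squared mean) across replicate allelic counts, and a χ² goodness-of-fit p value against a 50:50 allelic ratio. The function names are hypothetical, and the 50:50 goodness-of-fit formulation is an assumption about how a count-based χ² baseline might be applied; the caption does not state the exact formulation used, and this is not BEAPR's own model (see Methods).

```python
import math
from statistics import mean, variance


def squared_cv(counts):
    """Squared coefficient of variation (sample variance / mean^2) of replicate counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean read count is zero; CV^2 is undefined")
    return variance(counts) / (m * m)


def chi2_allelic_pvalue(ref_count, alt_count):
    """Chi-square goodness-of-fit p value against a 50:50 allelic ratio (1 d.f.).

    Assumption: the null hypothesis is equal binding to both alleles.
    """
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this SNV")
    expected = total / 2
    stat = ((ref_count - expected) ** 2 + (alt_count - expected) ** 2) / expected
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))
```

For example, `squared_cv([8, 12])` describes the replicate-to-replicate variation of one allele's read count at a single SNV, and `chi2_allelic_pvalue(80, 20)` tests pooled counts that match the 0.8 allelic ratio used in the simulation. A test of this kind treats pooled reads as independent draws and ignores variation between replicates, which is the overdispersion issue the abstract refers to.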
Fig. 2 Allele-specific binding (ASB) events identified in the enhanced crosslinking and immunoprecipitation (eCLIP) data of HepG2 and K562 cells. a Number of heterozygous single-nucleotide variants (SNVs) identified via whole-genome DNA sequencing or eCLIP. Those that are common to both methods are illustrated. b Percentage of ASB events among all testable SNVs in each read coverage bin. The average read coverage of each SNV in the two replicated eCLIP experiments is shown here. A minimum read coverage of 10 was required. c Number of ASB events identified for each RNA-binding protein (RBP) in HepG2 and K562 cells. Only RBPs with ≥50 ASB events are shown. The number of usable eCLIP-Seq reads (in millions (M)) is shown for each RBP. d Number of ASB SNVs associated with one or more RBPs. e The overlap of ASB SNVs between HepG2 and K562. Only heterozygous SNVs common to the two cell lines are included. P value was calculated by Fisher’s exact test. (Source data are provided as a Source Data file)
Fig. 3 Bioinformatic and experimental validation of allele-specific binding (ASB). a–f Enrichment of RNA Bind-n-Seq (RBNS) motifs in the regions around ASB single-nucleotide variants (SNVs) (x = 0) of each RNA-binding protein (RBP) (upper panel). Y-axis shows fold change in the enrichment relative to randomly chosen control regions (Methods). Ten sets of controls were constructed, with the regression curve and 95% confidence interval of the average fold change shown in the panel. RBNS motifs used in the analysis are shown (middle panel). The relative frequency of the ASB SNVs overlapping each motif position is shown in the bar graph (lower panel). g Electrophoretic mobility shift assay (EMSA) results of PTBP1 binding to its ASB targets. Alternative alleles of the ASB SNVs were synthesized, as labeled above the gel images. Read count (sum of two replicates) for each allele in the enhanced crosslinking and immunoprecipitation (eCLIP) data of PTBP1 (HepG2 cells) is shown. (Images are cropped, with uncropped images in Supplementary Figure 16.) The sequences of the synthetic RNA fragments are shown below each gel image, where the ASB SNV is highlighted in red. The arrow indicates the RNA–protein complex. Increasing concentrations of PTBP1 were used in different lanes of the gel image (from left to right: 0, 0.6, 1.2, 2.5, and 5 μg). (Source data are provided as a Source Data file)
Fig. 4 Functional relevance of allele-specific binding (ASB) single-nucleotide variants (SNVs) in splicing regulation. a Distance of intronic ASB SNVs to the nearest splice sites. Controls consist of randomly chosen SNVs in the same introns. A total of 100 sets of controls were constructed, with the average and standard deviation shown in the plot. b Absolute change in the percent spliced-in (PSI) values of exons associated with ASB events of splicing factors upon knockdown of the respective splicing factor in HepG2 or K562 cells. Controls were random intronic SNVs in the same introns. P value was calculated by the Kolmogorov–Smirnov test. c Overlap between ASB SNVs of splicing factors and heterozygous SNVs associated with genetically modulated alternative splicing (GMAS) events in the genes harboring ASB SNVs. P values were calculated by the hypergeometric test (see Methods). d Fraction of ASB SNVs located in GTEx splicing quantitative trait loci (sQTL) exons or within 500 nucleotides (nt) in their flanking introns among the union of ASB SNVs of all splicing factors in the HepG2 or K562 data. This fraction was calculated for each sample individually, with the distribution of all samples in a tissue shown in the box plots. Control: fraction of randomly chosen heterozygous SNVs within genes in the above regions in each sample. e Splicing reporter validation of the function of ASB events. Three exon skipping events and two intron retention events are included. The gene names with the ASB events, the associated RNA-binding proteins (RBPs), RNA alleles of the ASB SNV, and their read counts in enhanced crosslinking and immunoprecipitation (eCLIP) are shown. (Images are cropped, with uncropped images in Supplementary Figure 16.) The red arrows in the exon–intron diagrams indicate positions of PCR primers. Inclusion level (three biological replicates) of the exon or intron is shown below each gel image. P values were calculated by Student’s t-test. Note that the IL17RB minigene had alternative splice sites in the intron, which led to the extra bands (black arrows). Boxplot center lines indicate the median and the boxes extend to lower and upper quartiles with whiskers depicting 1.5× the interquartile range (IQR). (Source data are provided as a Source Data file)
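The hypergeometric overlap test mentioned for panel c can be sketched as an upper-tail probability. The mapping below is an assumption made for illustration: `population` is the number of testable heterozygous SNVs, `successes` the number of GMAS-associated SNVs, `draws` the number of ASB SNVs, and `observed` the size of their overlap. The background set actually used is defined in the paper's Methods and is not reproduced here; the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    X counts how many of the `draws` items, sampled without replacement,
    fall among the `successes` marked items in the population.
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when k exceeds n, so impossible outcomes add nothing.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(max(observed, 0), upper + 1)
    )
    return tail / total
```

A small p value from `hypergeom_upper_tail` indicates that the overlap is larger than expected if ASB SNVs were drawn at random from the background set.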
Fig. 5 Functional relevance of allele-specific binding (ASB) single-nucleotide variants (SNVs) in regulating messenger RNA (mRNA) abundance. a Fraction of differentially expressed genes (up- or down-regulated, false discovery rate (FDR) <10%) upon UPF1 knockdown in HepG2 or K562 cells. n.s.: not significant. Data for genes with ASB SNVs of UPF1 in their 3′-untranslated regions (3′-UTRs) and control genes are shown, where the controls were chosen as genes without UPF1 enhanced crosslinking and immunoprecipitation (eCLIP) peaks and with expression levels similar to those of UPF1 targets (within ±30% of RPKM). P values were calculated to test the null hypothesis that UPF1 ASB target genes are not enriched with up-regulated expression upon UPF1 knockdown (KD), compared to controls (binomial test). b Fraction of ASB SNVs located in expression quantitative trait loci (eQTL) genes among the union of ASB SNVs of all RNA-binding proteins (RBPs) of each cell line. eQTL genes were extracted from The Cancer Genome Atlas (TCGA) project for liver hepatocellular carcinoma (LIHC) and acute myeloid leukemia (LAML), respectively, to match the cell type of HepG2 and K562. This fraction was calculated for each sample individually, with the distribution of all samples shown in the box plots. Control: fraction of control SNVs located in eQTL genes, where control SNVs were randomly chosen from heterozygous SNVs located within genes. P values were calculated by pair-wise t-test. c Genomic context of ASB SNVs that are heterozygous in TCGA samples and located in the eQTL genes. Exon: coding exons; no non-coding transcripts existed among eQTL genes included in this analysis. d Similar to (b), for eQTL genes in GTEx tissues. e Expression of minigenes carrying alternative alleles of ASB SNVs in the 3′-UTR of mCherry. mCherry expression measured via real-time quantitative reverse transcription PCR (qRT-PCR) was normalized by that of eYFP (driven by bi-directional promoters). Three biological replicates were analyzed. P values were calculated by Student’s t-test. Box plots (top) show the normalized gene expression values of the host genes in selected GTEx tissues, grouped by genotypes of the ASB SNV (coordinates shown above box plots). The expression values and eQTL p values were obtained from the GTEx portal[33]. Boxplot center lines indicate the median and the boxes extend to lower and upper quartiles with whiskers depicting 1.5× the interquartile range (IQR). (Source data are provided as a Source Data file)
Fig. 6 Allele-specific binding (ASB) events inform functional interpretation of disease-associated variants. a Numbers of ASB single-nucleotide variants (SNVs) that are also disease-related SNVs annotated by different databases. b Genomic context of genome-wide association study (GWAS) single-nucleotide polymorphisms (SNPs) (stacked bars) located in the same or different genes as ASB SNVs. Most GWAS SNPs are located in introns, whose function was elusive. NC exon: exons in non-coding transcripts. Splicing-related: those located in splice site signals. c Splicing reporter validation of two ASB SNVs, similar to Fig. 4e. (Images are cropped, with uncropped images in Supplementary Figure 16.) d Minigene reporter validation of one SNP for its function in modulating messenger RNA (mRNA) abundance, similar to Fig. 5e. Boxplot center lines indicate the median and the boxes extend to lower and upper quartiles with whiskers depicting 1.5× the interquartile range (IQR). (Source data are provided as a Source Data file)