
High-Throughput Assays to Assess the Functional Impact of Genetic Variants: A Road Towards Genomic-Driven Medicine.

J Ipe¹, M Swart¹, K S Burgess¹,², T C Skaar¹.

Substances：
Genetic Markers

Year: 2017 PMID： 28213901 PMCID： PMC5355973 DOI： 10.1111/cts.12440

Source DB: PubMed Journal: Clin Transl Sci ISSN： 1752-8054 Impact factor: 4.689

INTRODUCTION

Genome‐wide genotyping and DNA sequencing have led to the identification of large numbers of genetic variants that are associated with many clinical phenotypes. The functional impacts of most of the variants are unknown. In this article, we review high‐throughput assays that have been developed to assess a variety of the functional impacts of the variants. A better understanding of their functions should facilitate the implementation of many more variants in genomic‐driven medicine. A cornerstone of precision medicine is the incorporation of genetic information into healthcare decisions. This approach relies on understanding genome complexity, the genetic differences that exist between individuals, and the functional consequences of the genetic variants. In the personal genome era, improvements in sequencing technologies are leading to the continuous identification of new variants and are further illustrating the complexity of the human genome and the genetic diversity between populations.

Genomic variants across individuals

Large‐scale high‐throughput sequencing studies, such as the 1000 Genomes and NHLBI GO Exome Sequencing Projects, have already identified millions of genetic variants among individuals from different populations and have established a comprehensive resource on human genetic variation.1, 2 The genetic variants are cataloged in public databases, such as dbSNP (https://www.ncbi.nlm.nih.gov/snp/) and dbVar (https://www.ncbi.nlm.nih.gov/dbvar/) (Table 1). The current build of dbSNP (build 147, updated on 14 April 2016) contains ∼154 million single nucleotide variants (SNVs), of which about 101 million have been validated and nearly 89 million are within genes. dbVar (updated on 28 September 2016) contains ∼5 million structural variants, of which ∼2.3 million, 1.3 million, and 1.2 million are copy number variants, short tandem repeats, and insertions, respectively.

Table 1

Databases that catalog human genetic variation and phenotypic relationships

Databases	Description	Link
1000 Genomes Project	Comprehensive catalog of human genetic variation	http://browser.1000genomes.org/index.html
ClinVar	Information about genomic variation and its relationship to human health	https://www.ncbi.nlm.nih.gov/clinvar/
Catalog of somatic mutations in cancer (COSMIC)	Comprehensive resource for exploring the impact of somatic mutations in human cancer	http://cancer.sanger.ac.uk/cosmic
DBASS	Database of new exon boundaries induced by pathogenic mutations in human disease genes	http://www.dbass.org.uk
dbGaP	NCBI's database of genotypes and phenotypes	https://www.ncbi.nlm.nih.gov/gap
dbSNP	NCBI's database of single nucleotide polymorphisms (SNPs) and multiple small‐scale variations (including insertions/deletions, microsatellites and nonpolymorphic variants)	https://www.ncbi.nlm.nih.gov/snp
dbVar	NCBI's database of genomic structural variation	https://www.ncbi.nlm.nih.gov/dbvar
DECIPHER	Web‐based database incorporating a suite of tools designed to aid the interpretation of genomic variants	https://decipher.sanger.ac.uk
Exome Aggregation Consortium	Aggregate of exome sequencing data from a variety of large‐scale sequencing projects	http://exac.broadinstitute.org
GTEx Portal	Data repository for genotype and tissue‐specific gene expression data	http://www.gtexportal.org/home/
miRBase	Database of published microRNA sequences and annotation	http://mirbase.org
MirSNP	Collection of human SNPs in predicted microRNA target sites	http://bioinfo.bjmu.edu.cn/mirsnp/search/
NHGRI‐EBI GWAS Catalog	Catalog of published genome‐wide association studies	http://www.ebi.ac.uk/gwas/
NHLBI Exome Sequencing Project	Data repository for exome sequence variants related to heart, lung and blood disorders	http://evs.gs.washington.edu/EVS/
Online Mendelian Inheritance in Man	Catalog of human genes and genetic disorders for relationships between phenotype and genotype	http://omim.org/
PharmGKB	Pharmacogenomics knowledge resource encompassing clinical information	https://www.pharmgkb.org
PolymiRTS	Database of naturally occurring DNA variations in microRNA seed regions and microRNA target sites	http://compbio.uthsc.edu/miRSNP/
SPHINX – A resource of the eMERGE Network	A web‐based tool to access the pharmacogenetics gene sequence data of the eMERGE‐PGx project	https://www.emergesphinx.org
The Human Gene Mutation Database	Collection of published gene lesions responsible for human inherited disease	http://www.hgmd.cf.ac.uk/ac/index.php

In the 1000 Genomes Project, sequencing was carried out on 2,504 individuals from 26 populations in Africa, East Asia, Europe, South Asia, and the Americas. More than 88 million variants were identified, of which 84.7 million were single nucleotide polymorphisms (SNPs), 3.6 million were short indels, and 60,000 were structural variants. Only 8 million of the identified autosomal variants were observed in more than 5% of individuals, while 64 million rare variants (frequency of <0.5%) were identified. Substantial differences exist in the distribution of variants between populations, with 762,000 variants being rare (frequency of <0.5%) in the global population, but more common (>5%) in at least one population group. Eighty‐six percent of variants were only present in a single continental group. Sequencing of individuals from South Asian and African populations contributed 24% and 28%, respectively, of the novel variants discovered.1 Sudmant et al.3 reported the identification of 68,818 structural variants when analyzing sequencing data from the 1000 Genomes Project. The majority of these structural variants are deletions (42,279) with a median site size of 2,455 bp and median alleles per individual of 2,788. The nucleotide substitution rate is an important factor underlying the degree of genetic variation between individuals. Scally4 reported a present‐day germline mutation rate of 0.5 × 10⁻⁹ bp⁻¹ year⁻¹. This mutation rate translates into ∼30 de novo variants in each offspring that are absent in the parents. The introduction of 30 new DNA variants with every meiosis event over a period of 3.7–6.6 million years (evolution of the human species) and rapid expansion of the human population during the last 10,000 years resulted in the observed enormous diversity of the human genome.
For most of the genetic variants, the impact on gene function and the effect on disease susceptibility remain unknown.

Genomic variants within individuals

High‐throughput sequencing continues to produce a more accurate estimation of how much genetic variation exists within and between genomes of individuals of different ethnicities. Typically, each genome has 4–5 million sites that differ from the reference human genome; the greatest number of variant sites was observed among individuals of African ancestry. Although SNPs and indels account for >99.9% of variants, the typical genome contains 2,100–2,500 structural variants that affect about 20 million bases of sequence. Deep sequencing allows for the identification of rare variants, and an estimated 1–4% of variants (40,000–200,000) observed in a genome are rare (frequency of <0.5%). A typical genome reportedly contained 149–182 sites with protein truncation variants, 10,000–12,000 sites with nonsynonymous variants, and 459,000–565,000 variant sites within regulatory regions (untranslated regions, promoters, insulators, enhancers, and transcription factor binding sites). A typical genome contains 24–30 ClinVar variants (those associated with clinical phenotypes).1 Tennessen et al.2 suggested that 2.3% of the SNVs in an individual exome disrupt the protein function of about 313 of the 23,500 protein‐coding genes and that nearly 96% of SNPs predicted to affect gene function are rare. Figure 1 is a representation of functionally important regulatory and gene regions with the number of variants within these regions for a typical genome.

Figure 1

Representation of important gene regions with the number of genetic variants within a typical European or African human genome shown in brackets. Number of genetic variants within upstream enhancer regions#, transcription factor binding sites†, promoter regions‡, 5′‐ and 3′‐untranslated regions±, and intronic regions*. »Number of nonsynonymous; synonymous genetic variants within coding regions.1

Genomic variants associated with phenotypes

Genome‐wide association studies (GWAS) have been used to determine which of the identified variants are associated with diseases. To date, more than 3,200 GWA studies have been conducted (http://www.ebi.ac.uk/gwas) and ∼10,000 common SNPs have been associated with human traits and diseases through GWA studies.5 Gusev et al.6 estimated that ∼80% of the phenotypic heritability of common diseases and traits is explained by variants in noncoding regulatory regions. Approximately 2,000 variants per genome have been associated with complex traits in GWA studies.1 However, testing such a large number of SNPs in a GWA study requires correction for multiple testing, using very stringent significance thresholds, to decrease the number of false‐positive associations. Bonferroni correction for multiple testing (0.05/number of tests) is often used, but it can result in overcorrection and, thus, miss SNPs that really are associated with the phenotype.7 A large number of study participants is also needed to identify rare causal variants with the use of GWAS.8 Genetic variants impact drug metabolism, efficacy, and adverse event risk and are especially relevant to precision medicine. Fujikura et al.9 analyzed sequencing data from the 1000 Genomes and the NHLBI GO Exome Sequencing Projects; they reported a total of 6,165 SNVs in the 57 cytochrome P450 (CYP) genes. Eighty‐three percent of the 4,025 SNPs within the coding regions were very rare (frequency of <0.1%) and 65% were nonsynonymous substitutions. The calculated total number of genetic variations in CYP genes of 1 million Europeans and Africans was 3.4 × 10⁴ and 4.8 × 10⁴, respectively.9 Furthermore, every individual of European descent carries on average 94.6 SNVs in CYP genes, of which 24.6 are nonsynonymous, within splice sites, or affect stop codons.9 In the recent PGRN‐seq study, 82 genes of pharmacogenomics relevance were sequenced among 5,639 individuals and 40,549 SNVs were identified.
Of the identified variants, 8,126 were in coding regions (4,858 missense, 3,169 synonymous, and 99 stop gain variants) and 19,923 were in noncoding regions (5,231 intronic, 5,981 upstream, 3,444 downstream, 4,165 3′UTR, 903 5′UTR, and 199 other variants). The majority (∼96%) of individuals had one or more Clinical Pharmacogenetics Implementation Consortium Level A actionable variants, while ∼23% (n = 1,273) of individuals had a single Level A actionable variant.10 The Human Gene Mutation Database (HGMD) is a repository of mutations associated with diseases; its entries are based on published literature, including GWA studies, and as of June 2013, it had 141,161 germline mutation entries in 5,700 unique genes. Missense substitutions, nonsense substitutions, splicing substitutions, and substitutions within regulatory elements account for 44%, 11%, 9%, and 2% of the total disease‐associated mutations, respectively.11 Exome sequencing of healthy individuals revealed that each of these individuals carried 40–110 disease‐causing mutations as classified by HGMD.12
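The Bonferroni correction mentioned in this section (0.05/number of tests) can be illustrated with a minimal Python sketch. The SNP identifiers and P values in the usage note are hypothetical, and the function names are introduced here purely for illustration; this is not taken from any of the cited studies.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def significant_snps(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return the SNPs whose P value falls below the Bonferroni threshold.

    The number of tests is taken to be the number of SNPs supplied
    (an assumption of this sketch).
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(snp for snp, p in p_values.items() if p < threshold)
```

With one million tested SNPs, `bonferroni_threshold(0.05, 1_000_000)` gives a per-SNP threshold of 5 × 10⁻⁸. The stringency of this threshold is what can cause truly associated SNPs with modest effect sizes to be missed.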

Combining genomic variants for phenotypic prediction

The large amount of genomic variation data now available has created a clear need for the functional characterization of many genetic variants; this should help to distinguish disease‐causing variants from passenger mutations. In most cases, genotype–phenotype association studies are impractical for assessing the role of rare genetic variants, as very large patient cohorts are needed to include enough patients with the rare variants to achieve statistical significance. An alternative is to determine the functional impact of the rare variants and then combine variants with similar functional effects into one group. This has been done in previous studies focusing on CYP2D6 by assigning to each patient an “activity score” that is derived from the patient's genotype. Many variants, including rare variants, are classified as functional, partially functional, or nonfunctional. The activity score is then calculated and translated into a predicted CYP2D6 phenotype.13, 14 For any other gene, if the functional impact of the variants is known, this approach could be used to simplify the genotype interpretation and facilitate genotype–phenotype association studies. The remainder of this review will discuss the current status of many high‐throughput functional screening assays (Table 2). These assays should help distinguish the functional variants from passenger variants, which will provide valuable information for the successful implementation of genomic‐driven medicine.
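The activity‐score approach described above for CYP2D6 can be sketched in a few lines of Python. The numeric value assigned to each functional class and the phenotype cut‐offs below are illustrative assumptions made for this sketch, not values taken from the cited studies, and should not be read as clinical guidance.

```python
# Assumed per-allele values for each functional class (illustrative only).
FUNCTION_VALUES = {
    "functional": 1.0,
    "partially functional": 0.5,
    "nonfunctional": 0.0,
}


def activity_score(allele_functions: list[str]) -> float:
    """Sum the assumed values of all alleles carried by one patient."""
    return sum(FUNCTION_VALUES[function] for function in allele_functions)


def predicted_phenotype(score: float) -> str:
    """Translate an activity score into a phenotype label.

    The cut-offs are hypothetical and chosen only to show the mapping step.
    """
    if score == 0:
        return "poor metabolizer"
    if score <= 0.5:
        return "intermediate metabolizer"
    if score <= 2.0:
        return "normal metabolizer"
    return "ultrarapid metabolizer"
```

For example, a patient carrying one functional and one nonfunctional allele would receive a score of 1.0 under these assumptions. Because rare variants are grouped by functional class rather than tested individually, the same scoring could in principle be applied to any gene whose variants have been functionally characterized.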

Table 2

Summary of high‐throughput assays discussed in this review

Genomic region	Name of the assay	Description
Coding	Deep mutational scanning	Mutagenesis method where protein expression and mutant selection are coupled with high‐throughput sequencing to determine various functions of variants in the coding region.
Regulatory	Massively parallel reporter assay (MPRA)	Barcoded luciferase plasmids containing variants in cis‐regulatory elements are inserted into animal model/cell culture and analyzed by RNA‐seq.
	CRISPR Cas9‐mediated in situ saturating mutagenesis	CRISPR‐Cas9 mutagenesis to disrupt all sequences within an enhancer region. Cells are sorted and sequenced.
	Multiplexed editing regulatory assay (MERA)	CRISPR‐Cas9 based mutational tool to generate variations in the cis‐regulatory region of genes. Cells are sorted and sequenced.
	Self‐transcribing active regulatory region sequencing (STARR‐seq)	Ectopic, plasmid‐based assay that allows for active enhancers to self‐transcribe and analyzed by RNA‐seq.
Splicing	High‐throughput mini‐gene reporter assay	A modified mini‐gene assay where a pool of wildtype and variant splice sites are transfected into cells and splice products analyzed by RNA‐seq.
	In vitro splicing assay	Radiolabeled RNAs are incubated in nuclear extracts after which the splice products are separated and analyzed by RNA‐seq.
	Modified systematic evolution of ligands by exponential enrichment (SELEX)	A method that identifies altered RNA‐protein interactions in test splice‐sites that contain genetic variations.

GENETIC VARIANTS WITHIN CODING REGIONS

The exome is ∼1% of the human genome and contains 23,500 protein‐coding genes with roughly 180,000 exons. Large‐scale sequencing studies have focused on identifying genetic variants within the exome, as these variants could alter protein function.1, 2 The number of identified genetic variants differs significantly between populations, with individuals of African descent being more genetically diverse.1 African individuals typically carry 12,200 nonsynonymous and 13,800 synonymous variants. Fewer nonsynonymous (10,200) and synonymous (11,200) variants were identified among individuals of East Asian or European ancestry (Figure 1).1 Despite the identification of numerous variants in regions coding for proteins, only a portion of these variants disrupt protein function and are likely disease‐causing. The average number of loss‐of‐function variants per genome ranged from 149 in Europeans to 182 among Africans.1 Missense or nonsense substitutions in protein‐coding genes contribute ∼55% of variants implicated in disease.11 Missense and nonsense nucleotide substitutions and frameshift indels alter the amino acid sequences of the proteins, which can lead to altered secondary, tertiary, and quaternary structures of proteins. These alterations can change many characteristics of the proteins, such as thermodynamic stability and cellular localization and, consequently, cellular functions of the protein, such as enzymatic activity, cell signaling, and ligand binding.15 Therefore, it is necessary to distinguish loss‐of‐function variants that could be disease‐causing from neutral indels or nucleotide substitutions.
Multiple computational tools have been developed to predict if variants affect protein structure and stability and whether variants in conserved regions are neutral, deleterious, or hyperactivating.16, 17 These currently available tools lack accuracy and, thus, cannot be used in a clinical setting.18 The models used for these prediction tools are limited by the accuracy of annotated variant effects and evolutionary measures in the training data sets. The complex relationship between evolution and phenotypic effect could also result in high false‐positive and false‐negative prediction rates by these tools.17, 19, 20 The limitations of computational prediction tools can be mitigated by combining this approach with experimental functional characterization of variants. Large data sets with experimental measures of the phenotypic effects of variants can also be used to confirm predictions or act as training data sets to improve prediction algorithms.21 Mutagenesis studies are commonly used to assess the effect of genetic variants on protein function. A forward genetics approach is time‐consuming, because random mutations are created and genes are then identified based on the phenotype that develops. A reverse genetics approach involves the mutagenesis of defined genes, which is followed by functional assays to characterize protein sequence–function relationships; however, these are low‐throughput, as the effect of only a small number of variants is assessed. These traditional approaches are also especially laborious as they require the use of a wide variety of techniques and personnel expertise, depending on the function of the specific protein being tested.22

High‐throughput functional assays

To improve the rate of functional testing, deep mutational scanning has been developed as a high‐throughput assay to characterize the function of thousands of variants simultaneously.21, 22, 23, 24 This method can be used to test the functional impact of multiple variant types, including SNPs, indels, and larger structural variants. Design of a deep mutational scanning experiment depends on the type of protein functional assay that is used. For example, for genes that encode enzymes, an enzyme reporter assay may be used. For proteins without known functions or if the measurement of interest is protein stability, then quantifying the protein levels may be desirable. The deep mutational scanning process can be divided into several steps roughly defined as mutagenesis, protein expression, mutant selection, high‐throughput sequencing, and statistical analysis. A detailed deep mutational scanning protocol has recently been published by Fowler and Fields.23 Several studies have used deep mutational scanning with different mutagenesis methods, expression systems, and selection approaches to create sequence‐function maps. The first step involves the synthesis of a systematic or random library of mutants that target a specific site in a protein. This step can be performed by creating oligos either designed with defined mutations or mutations introduced randomly through polymerase chain reaction (PCR) amplification. Mingo et al.25 recently developed a one‐tube‐only standardized site‐directed mutagenesis approach. Oligo synthesis is followed by the introduction of the mutant oligo pool into an expression system. Forsyth et al.26 demonstrated the use of mammalian cell‐based assays where a protein is expressed from a plasmid or viral vector. 
Alternatively, M13 or T7 phage‐display systems can be used to display up to 10¹² clones, compared with about 10¹⁰ clones in bacteria, 10⁶ in yeast expression systems, or more than 10¹² proteins in ribosome display systems.27, 28, 29, 30 The choice of protein expression system depends not only on the number of possible clones but also on features of the system, such as how well the expressed protein variant represents the phenotype and the ability of the system to perform appropriate post‐translational modifications. A moderate selection pressure, appropriate to the protein function being assessed, is then applied. Variant effects have been assessed by testing their impact on protein structure, mechanism of action, catalytic or enzymatic activity (for example, phosphorylation or ubiquitination), thermodynamic stability, protein interaction, peptide binding, DNA or RNA binding, ligand binding, epitope binding, protein aggregation, or expression of a fluorescent protein.18, 21, 22, 23 Multiple studies have suggested that additional selection rounds improve the accuracy of the fitness estimate for each variant.31, 32 High‐throughput sequencing is then used to identify the variants with the altered phenotypic activity. During the creation of the pool of plasmids, a barcoding strategy, in which ∼20 bp barcodes specific to each mutation are added outside of the open reading frame, is useful for massive parallelization and for correcting library amplification biases and the sequencing error rate.33 A library of 100,000 variants requires about 10⁷ sequence reads for adequate coverage (i.e., at least 100 reads per variant).22, 23 It is important to determine the initial frequency of each mutant within the pool before selection. Enrichment of beneficial variants or depletion of deleterious variants is calculated by comparing the frequency (again determined by sequencing) of each mutant after one or multiple rounds of selection to the initial frequency.
Statistical analysis is used to identify mutants that are significantly increased or decreased in frequency during selection.18, 21, 22, 23, 24, 34 Data analysis is straightforward when direct selection is used for a protein property; for example, assessment of thermodynamic stability by thermal denaturation. Fowler et al.35 developed a freely available software package, Enrich, to convert high‐throughput sequencing data into a functional score for each variant and create a sequence‐function map. Recently, another software package called dms_tools was developed to infer mutation impact from mutation count data by using a likelihood‐based analysis.36 However, standards for deep mutational scanning data analysis have not yet been developed.22, 23 The context in which mutations occur further complicates the prediction of phenotype from genotype. Deep mutational scanning offers an unbiased technique to test the effect of a combination of mutants at once. Mutants can display epistasis when the observed effect is different from the expected additive effect of the mutants. Mutants might together either cause an unexpected large change in activity or one variant might rescue the destabilizing effect of another.18 Hemani et al.37 estimated that pairwise epistasis explains approximately one‐tenth the amount of phenotypic variance that additive effects do. Wu et al.38 developed a method to calculate the estimated mutational stability effect from double‐substitution functional fitness profiles to account for the effect of variant combinations. Several challenges and potential improvements to the current deep mutational scanning approach have been suggested. 
Assessment of variant function is difficult for proteins for which the function is unknown; selection in those experiments may be limited to assays assessing thermostability or degradation rate of the gene product.22, 23, 34 Selection assays used during deep mutational scanning are often specific to the protein and function being tested. Designing these assays frequently remains a challenge. For example, coupling cell‐based properties such as protein localization with high‐throughput sequencing is difficult, and such assays might not reflect the complexity of human disease. Paired‐end sequencing and inclusion of replicate samples can be used to correct for the average per‐base sequencing error rate of ∼1%. Furthermore, inclusion of known completely nonfunctional variants is useful for estimating error rates and can improve the accuracy of fitness estimates.21 Kowalsky et al.39 developed a standardized protocol to resolve sequence–function relationships for full‐length proteins by using a gene tiling technique to divide long gene sequences into different sequencing libraries to overcome the disadvantages of short sequencing reads.
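The read‐depth requirement and the enrichment calculation described in this section can be sketched as follows. This is a minimal Python illustration and not the algorithm used by Enrich or dms_tools: normalizing each variant to the wild‐type sequence, the optional pseudocount, and all function and variable names are assumptions made for this sketch.

```python
import math


def required_reads(n_variants: int, reads_per_variant: int = 100) -> int:
    """Reads needed so that each variant is covered reads_per_variant times on average."""
    return n_variants * reads_per_variant


def log2_enrichment(
    pre_counts: dict[str, int],
    post_counts: dict[str, int],
    wild_type: str,
    pseudocount: float = 0.5,
) -> dict[str, float]:
    """Log2 change in each variant's abundance relative to wild type.

    pre_counts holds read counts before selection and post_counts the counts
    after selection. A pseudocount (an assumption of this sketch) avoids
    division by zero for variants that drop out during selection.
    """
    pre_wt = pre_counts[wild_type] + pseudocount
    post_wt = post_counts[wild_type] + pseudocount
    scores = {}
    for variant, pre in pre_counts.items():
        if variant == wild_type:
            continue
        post = post_counts.get(variant, 0)
        ratio_before = (pre + pseudocount) / pre_wt
        ratio_after = (post + pseudocount) / post_wt
        scores[variant] = math.log2(ratio_after / ratio_before)
    return scores
```

Under these assumptions, positive scores indicate variants enriched by selection and negative scores indicate depleted variants. `required_reads(100_000)` reproduces the figure of about 10⁷ reads quoted above for a library of 100,000 variants at 100 reads per variant.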

Deep mutational scanning in precision medicine

Functional characterization of genetic variants with the use of deep mutational scanning, in addition to genotype–phenotype association studies, is valuable in diagnosing, treating, and understanding disease risk or prognosis.18 For example, variants of unknown significance are continuously identified in BRCA1 in cancer patients by DNA sequencing. Recently, deep mutational scanning was used in a prospective manner to measure the effects of nearly 2,000 missense substitutions in the RING domain of BRCA1 on its E3 ubiquitin ligase activity and its binding to the BARD1 RING domain. The resulting variant functional scores were used to create a prediction model of variant effect on homology‐directed DNA repair. This model will likely improve the interpretation of variants observed in the clinical sequencing of the BRCA1 gene.40 In addition to understanding the function of variants of uncertain significance, it is also important to be able to discriminate driver mutations from passenger mutations in protein domains or entire cancer‐related proteins; furthermore, it is useful to understand the impact of the mutation on the protein's function, its effect on cellular function, and its druggability. The study by Wagenaar et al.41 is another example of how deep mutational scanning can be used in precision medicine. Mutant selection with vemurafenib exposure in mammalian cells and mouse xenografts was used to identify variants in the kinase active site of BRAF that are involved in resistance to treatment of BRAF V600E‐positive melanomas. The kinase activity of the BRAF V600E/L505H mutation combination was higher than that of the well‐characterized V600E mutation alone. The increased kinase activity of the BRAF V600E/L505H mutation combination could result in this mutation combination being moderately resistant to mitogen‐activated extracellular signal‐regulated kinase (MEK) inhibition.
Additional crystal structure comparisons suggest that other BRAF inhibitors will be more effective than a MEK inhibitor in eliciting a response in BRAF V600E/L505H‐containing melanomas. This method could also be applied to other proteins for evaluating resistance to inhibitors.41 A deep mutational scanning method was also developed for antibody complementarity‐determining regions to simultaneously determine the effect of every possible single amino acid substitution on antigen binding. This method was then applied to a humanized version of the anti‐epidermal growth factor receptor antibody cetuximab. Although the majority of complementarity‐determining region substitutions are neutral or deleterious to antibody interaction, 67 of the 1,060 tested point mutations increased its affinity. This approach will likely be useful in the future for the development of additional antibody therapies that target cells with specific genetic mutations or variants.26 Cystic fibrosis is an autosomal recessive genetic disorder that affects chloride transport. A genetic variant, G551D, exists in the coding region of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The functional consequence of this variant was discovered with the use of high‐throughput assays. Although the variant does not impact transport of the CFTR protein to the cell surface, it impairs the ability of the membrane channel to open. This phenotypic effect is associated with abnormalities in the respiratory, endocrine, gastrointestinal, and reproductive systems. In cystic fibrosis patients, this phenotype can be improved with the therapeutic agent ivacaftor. Ivacaftor is a CFTR potentiator because it can alter the activity of the channel by increasing the opening probability and flow of ions.
The development of ivacaftor for use in G551D variant carriers provides a further example of how high‐throughput functional assays can facilitate identification of actionable drug targets and development of targeted therapeutics.42, 43

GENETIC VARIANTS WITHIN REGULATORY REGIONS

Endogenous gene expression is controlled by a variety of regulatory regions in the DNA. These regions serve as binding sites for several activator and repressor proteins and RNAs that alter gene expression. Genetic variations in these enhancer elements, transcription factor binding sites, promoter regions, and untranslated regions (UTRs) can alter the binding of these proteins and RNAs, leading to changes in gene expression.44, 45, 46 The ultimate level of gene expression is the combination of the effects of all of these binding sites together. According to the 1000 Genomes Project Consortium, the median numbers of variants among continental population groupings range from 288,000–354,000 in enhancers, 748–927 in transcription factor binding sites, 82,000–102,000 in promoters, and 30,000–37,200 in the UTRs per typical human genome (see Figure 1). Estimates indicate that ∼500,000 of these variant sites are likely to be functional.1 Traditionally, the gold standard for studying this process has been to use individual reporter assays that have enhancer elements inserted upstream of a minimal promoter. The strength and effects of the enhancer elements are measured by determining the expression of a reporter gene (e.g., LacZ, luciferase, or green fluorescent protein (GFP)) that is driven by the minimal promoters and enhancers.47, 48 In addition, these assays are also standard assays for assessing the effect of genetic alterations in miRNA binding sites.45, 49 The large number of regulatory SNPs that need to be tested has driven advances towards higher‐throughput functional testing. Massively parallel reporter assays are one of the high‐throughput functional assays that have been used to assess genetic variants in regulatory regions. For example, using this technique, enhancer sequences containing variations at many positions in the cis core regulatory elements were synthesized using programmable microarrays and inserted into a promoter.
Tagging barcodes were also inserted into the expressed sequence. These regulatory variants were then transcribed in vitro and the expression of each barcode was measured by RNA‐sequencing; the expression of each barcode reflected the relative activity of the promoter variants that each barcode was tagging.50 This method was then modified by synthesizing over 100,000 enhancer variants that were cloned upstream of a minimal promoter luciferase plasmid with the barcode precloned in the 3′UTR to allow for random barcoding. The library of plasmids was injected into mice via tail vein injection. The plasmids were taken up and expressed in the liver and the reporter expression was measured by RNA‐seq.51 Variations of massively parallel reporter assays have been developed and utilized by several groups involving different cell lines and animal models, as well as adaptations to increase the throughput of the assay.52, 53, 54, 55, 56, 57, 58, 59, 60 CRISPR‐Cas9‐mediated in situ saturating mutagenesis has been used to assess the effects of genetic variants in the BCL11A gene enhancer. This gene is a repressor of fetal hemoglobin levels and a therapeutic target for β‐hemoglobin disorders. Using the CRISPR‐Cas9 nuclease system, the investigators deleted the 12‐kb enhancer of the BCL11A gene using a pair of guide RNAs (gRNAs) to create paired double‐strand breaks.61 To further assess single nucleotide changes and a complete knockout of the enhancer in BCL11A, they then synthesized a saturating gRNA library tiling the enhancer region. They disrupted almost all the sequences within the enhancer with Cas9 cleavage and nonhomologous end joining repair. The library was cloned into a lentiviral vector and transduced into HUDEP‐2 cells at low multiplicity to achieve a single gRNA per cell. After expansion and differentiation, cells were sorted by fetal hemoglobin levels, which have previously been validated to be regulated by BCL11A.
DNA was isolated, sequenced, and mapped back to the genome to assess variations in the enhancer region associated with the high‐ and low‐fetal hemoglobin pools.61

Multiplexed editing regulatory assay (MERA) is another high‐throughput functional screen for genomic variant effects on gene expression. This technique is a CRISPR‐Cas9‐based mutational screening tool in which adaptations have allowed for one regulatory element to be targeted per cell. This is performed through integration of a single gRNA expression construct into a universally accessible ROSA locus of mouse embryonic stem cells. This construct, in which a U6 promoter drives the expression of a dummy gRNA, was inserted into the stem cells using CRISPR‐Cas9‐mediated homologous recombination. A library of over 3,900 gRNAs tiling the cis‐regulatory region was created for each of the four genes of interest, Nanog, Rpp25, Tdgf1, and Zfp42. Homologous recombination was then used to replace the dummy gRNA with a gRNA from the library, which occurred in ∼30% of cells and created a functional gRNA expression construct. GFP knock‐in lines that were generated for these four stem cell‐specific genes were sorted based on GFP expression, and the gRNA‐induced mutations were deep‐sequenced and analyzed to assess which mutations induced loss of GFP expression. A linear regression model was developed in order to detect statistically significant gRNAs that are expressed in the different GFP populations, using the GFP‐targeting gRNAs as the positive control and dummy gRNAs as a negative control.62

Another high‐throughput functional assay that can be adapted to study SNP effects in enhancers is STARR‐seq. Self‐transcribing active regulatory region sequencing (STARR‐seq) involves cloning enhancer elements downstream of a minimal promoter and into the 3′UTR of reporter genes.
This ectopic, plasmid‐based assay allows for these active enhancers to self‐transcribe and become part of the reporter transcripts when transfected into cells. Expression of the transcripts, which include the inserted enhancer sequences, is measured by RNA‐seq. This method was first developed and assessed using the Drosophila melanogaster genome and has the capacity to identify and quantify enhancer activity in humans.63 It has been applied to several enhancer elements, such as hormone‐responsive enhancers, and has been extended in a modified capture approach (CapSTARR‐seq) in which DNA fragments are captured on a custom‐designed microarray and cloned into the STARR‐seq vector.64, 65

Collectively, there are multiple high‐throughput technologies for assessing the impact of genetic variants on regulatory motifs. The diversity of regulatory mechanisms requires that each type of motif has a specific technology. The variations in regulatory elements that alter gene expression can be detected using assays that sort cells with high and low expression of reporter genes, such as the green fluorescent protein. The separate pools can then be sequenced to determine which variants resulted in the change in activity. Since regulatory domains are the sites of many clinically important genetic variants, these assays will be critical for identifying the variants with functional implications.
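
As a rough illustration of how barcode counts from a massively parallel reporter assay could be turned into per‐variant activities, the following minimal Python sketch sums RNA reads over all barcodes tagging the same variant and compares them with the barcode counts in the input plasmid library. The function name, the toy counts, the pseudocount, and the normalization to plasmid DNA counts are all assumptions made for illustration; they are not taken from the studies cited above.

```python
from collections import defaultdict
from math import log2


def mpra_activity(barcode_to_variant, rna_counts, dna_counts, pseudocount=1.0):
    """Return log2(RNA/DNA) per variant, pooling all barcodes that tag it.

    Assumption: plasmid (DNA) barcode counts are available to normalize
    the RNA-seq barcode counts; the pseudocount avoids division by zero.
    """
    rna_total = defaultdict(float)
    dna_total = defaultdict(float)
    for barcode, variant in barcode_to_variant.items():
        rna_total[variant] += rna_counts.get(barcode, 0)
        dna_total[variant] += dna_counts.get(barcode, 0)
    return {
        variant: log2((rna_total[variant] + pseudocount)
                      / (dna_total[variant] + pseudocount))
        for variant in dna_total
    }


# Hypothetical toy data: two barcodes tag each allele of one enhancer.
barcode_to_variant = {
    "AAAC": "enh1_ref", "CCGT": "enh1_ref",
    "GGTA": "enh1_alt", "TTCG": "enh1_alt",
}
rna = {"AAAC": 79, "CCGT": 80, "GGTA": 19, "TTCG": 20}
dna = {"AAAC": 20, "CCGT": 19, "GGTA": 20, "TTCG": 19}

activity = mpra_activity(barcode_to_variant, rna, dna)
# Allelic effect: difference in log2 activity between alternate and reference.
effect = activity["enh1_alt"] - activity["enh1_ref"]
```

In this toy example the alternate allele has a lower log2 activity than the reference allele, which would flag it as a candidate loss‐of‐activity variant; a real analysis would also need replicates and a statistical test.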

GENETIC VARIANTS THAT ALTER SPLICING

The transcribed regions of most eukaryotic genes are made up of introns (noncoding regions) and exons (coding regions). Following the transcription of the DNA into RNA, a diverse group of trans‐acting ribonucleoproteins interact with cis‐sequences in the pre‐mRNA to remove the introns and join the exons to form the mature mRNA. This process of mRNA splicing is not a perfect reaction and the majority of human transcripts (∼95%) exist in multiple isoforms due to alternative splicing.66, 67 Alternative splicing can have large impacts on the functions of the proteins by altering the amino acid sequences of the translated proteins, the RNA sequences of regulatory RNAs, or the regulatory domains within the RNAs. Core sequences that are involved in splicing are the exon–intron junction (5’ splice site), the intron–exon junction (3’ splice site), and a branch point within the intron. These sequences determine how frequently a given splice site is used. Complete conservation of splice site sequences is limited to a GU at the 5’ end, an AG at the 3’ end, and an A at the branch point of the intron. In contrast, there is 35–80% variation in the other positions around the splice junctions and in the branch points that create variability in the effectiveness of the splice site (Figure 1). In addition to the core sequences, auxiliary sequences both within the exon and intron influence the effectiveness of a given splice site. Exonic splicing enhancers (ESEs) within the exons and intronic splicing enhancers (ISEs) within the introns increase the probability of an adjacent splice site being used. 
Conversely, exonic splicing silencers (ESSs) and intronic splicing silencers (ISSs) suppress splicing.68, 69 Genetic variations within any of the core or auxiliary sequences can influence splice site selection.70, 71 Since the alternatively spliced transcripts can function differently than the normal transcripts, it is not surprising that variations in splice junctions can impact many phenotypes. About 14% of hereditary diseases are caused by SNPs in noncoding regions, 90% of which are due to splice altering variations.72 Approximately 30% of disease‐causing variants in HGMD are due to variants that perturb normal splicing; around 15% are due to mutations within the conserved dinucleotide motifs.73 Variants within the highly conserved dinucleotide motifs at the beginning and end of the introns reduce the spliceosome binding and reduce the mRNA splicing.71 There have also been several reports of diseases attributed to variants around branch points,74, 75, 76 although the identification of disease‐causing variants in the branch points has been limited by the low number of known branch point sequences.77 The majority of the splice‐altering variants are located in auxiliary splice sequences. The resulting changes in the levels of aberrant transcripts or the ratio of splice variants have been implicated in a variety of diseases.78 Using next‐generation sequencing and array‐based genome wide screens, large numbers of variants have been associated with clinical phenotypes. Many of these variants are located in noncoding regions of the genome and are predicted to alter splicing. Understanding their role in altering clinical phenotypes requires understanding the function of the variants. However, functionally testing thousands of variants individually is not practical due to the time and resources needed. Typically, computational methods are employed on identified variants to predict functional significance or causality. 
Examples of widely used computational methods are reviewed by Soemedi et al.79 In addition, newer methods, such as ExonImpact are also continually being developed.80 Most algorithms that predict the functional significance of SNPs in splice sites scan for disruption of protein‐binding motifs in the RNA. RNA‐protein binding predictions are similar to DNA‐protein predictions, but RNA also has a secondary structure element that may result in altered accessibility and protein binding at sites far away from the site of the variation.81 Although the rapidly evolving computational methods help narrow down potential functional variants from large data sets, experimental approaches to test these predictions are important to validate the predictions. While human in vivo studies provide the best system in terms of the context of the physiological background, the results may be difficult to interpret amidst the other complex genetic variability and environmental factors in humans. In addition, some of the variants are rare, so it can be difficult to find subjects that carry the variants of interest. Lastly, the analysis of the mRNA splicing often requires tissue samples to evaluate tissue‐specific mRNA splicing that are too invasive to be biopsied. Thus, in vitro bioassays are most commonly used to determine the impact of variants in splice junctions. Mini‐gene reporter assays are the gold standard for functionally testing sequences and variants predicted to alter splicing.82 In this assay, a target genomic region containing a cis‐sequence is inserted between known splicing reporters (usually exons) within an expression plasmid. The ability of the target region to induce splicing is measured by expressing the transcript either in vitro (in a nuclear extract) or in vivo (transfected into cells) and determining the frequency of splicing in the mature mRNA transcript. 
The spliced transcripts are usually detected by quantitative PCR (qPCR) assays, which detect specific splice products and lariat fragments. By comparing the splicing of wildtype and variant sequences, one can determine the functional impact of the genetic variants on mRNA splicing. This assay can be used to compare allele‐specific splicing by comparing the splice products of the wildtype and variant versions of the test sequence. Although this assay is useful for low‐throughput testing and validation of high‐throughput assays, it is too resource‐intensive to individually test hundreds or thousands of variants. Since most functional variants in splice junctions result in gain or loss of RNA–protein interactions, assays that measure RNA–protein interactions can also be used to functionally test variants in splice junctions. There are several low‐throughput assays that detect differences in protein–RNA interactions. Some examples include electrophoretic mobility shift assay (EMSA), nitrocellulose membrane binding assay, and immunoprecipitation methods. These assays are commonly performed as in vitro assays that involve incubation of nucleic acid constructs with a known protein or a protein pool followed by the separation of protein‐bound from free nucleic acids. Although these techniques are useful in low‐throughput assays to determine RNA–protein interactions, they will need to be modified to be employed in high‐throughput applications. These assays may provide some insights into the mechanism of altered splicing; however, they cannot replace direct measures of mRNA splicing. These assays may be more useful after the functional significance of a variant has been established by assays, such as the mini‐gene reporter assay. With high volumes of data being generated through genomic technologies and shared through databases, the use of low‐throughput assays is a major bottleneck in testing large numbers of variants in splice junctions.
There are thousands of variants that have been computationally predicted to alter splicing. Many of those variants are in high linkage disequilibrium with other potentially functional variants, making it necessary to test large numbers of variants. High‐throughput bioassays that isolate individual variants and test for intrinsic splicing differences in alleles are critical in identifying the causal variants. Once functional variants are identified, investigation into the mechanism of action of these variants will also require high‐throughput methods. Although significant strides have been made in computational methods that predict these functional variants, progress towards developing high‐throughput experimental approaches has been lacking until recently. Pioneering work has recently been done to develop high‐throughput functional assays where several low‐throughput functional assays were modified into cost‐effective high‐throughput bioassays. These high‐throughput methods use pooled oligonucleotides containing the target sequences that are synthesized in highly parallel reactions on arrays. Those pools were originally generated by heat‐treating custom oligonucleotide arrays.83 More recently, pools of oligos have become commercially available in which thousands of custom oligonucleotides ranging from 10–200 bp in length are electrochemically synthesized on an array and chemically released into a tube to create the custom pool (e.g., OligoMix, LC Biosciences, Houston, TX, USA). Such pools have been used in massively parallel reporter assays, as capture probes, in CRISPR‐Cas9 applications, and for gene synthesis.84, 85 Pooled‐oligonucleotide‐based assays are ideal for testing the functional significance of SNPs and short indels where the test sequences are of a similar length.
In high‐throughput studies investigating variants in splice motifs, these pooled oligonucleotides have been used in modified mini‐gene splicing assays,79 in vitro splicing assays,83 and RNA‐binding assays.86 In a modified mini‐gene reporter assay, a large number of target sequences are inserted into the mini‐gene reporter to create a pool of reporter plasmids. The targets are synthesized as a pool of single‐stranded DNA oligonucleotides that contain the target sequence along with the endogenous flanking region (≤200 nucleotides total length). For each target site, a wildtype and a variant version of the oligonucleotide are included in the pool. They are synthesized with universal primer binding sequences on each end, which are used to amplify the targets by PCR. In a single reaction, the oligonucleotides are cloned into the mini‐gene vector to create a pool of plasmids that are amplified in bacteria. The pool of plasmids is transfected into cultured cells to be transcribed and spliced. Spliced and unspliced products are quantified by RNA‐seq. Following normalization to the amount of each plasmid in the pool, the ratio of spliced to unspliced product is compared between the wildtype and variant sequences; this determines the effect of the variant on the splicing effectiveness. A limitation of this methodology is the poor representation of the input sequences in the final pool of plasmids.87 Soemedi et al. found that fewer than 35% of the allele pairs in the oligo pool were represented in the plasmid pool.79 An alternate approach is the use of PCR ligation of oligo fragments with overlap to create a PCR library that contains all the intrinsic features required for transcription and splicing.
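
The comparison of spliced and unspliced products between allele pairs can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the count layout, the pseudocount, and the read threshold are hypothetical. Because a spliced:unspliced ratio is computed within each allele, a per‐plasmid scaling factor would cancel out, so the plasmid counts are used here only to skip allele pairs that are poorly represented in the plasmid pool.

```python
from math import log2


def splice_log_ratio(spliced, unspliced, pseudocount=1.0):
    """log2 ratio of spliced to unspliced reads for one allele."""
    return log2((spliced + pseudocount) / (unspliced + pseudocount))


def compare_allele_pairs(counts, min_plasmid_reads=10):
    """Return {site: log2 ratio (variant) - log2 ratio (wildtype)}.

    counts maps each site to {"wt": {...}, "var": {...}}, where each allele
    has "spliced", "unspliced", and "plasmid" read counts (hypothetical layout).
    Pairs in which either allele is under-represented in the plasmid pool
    are skipped rather than scored.
    """
    results = {}
    for site, alleles in counts.items():
        wt, var = alleles["wt"], alleles["var"]
        if min(wt["plasmid"], var["plasmid"]) < min_plasmid_reads:
            continue
        results[site] = (splice_log_ratio(var["spliced"], var["unspliced"])
                         - splice_log_ratio(wt["spliced"], wt["unspliced"]))
    return results


# Hypothetical toy counts for two target sites.
counts = {
    "siteA": {
        "wt": {"spliced": 127, "unspliced": 15, "plasmid": 200},
        "var": {"spliced": 31, "unspliced": 31, "plasmid": 180},
    },
    "siteB": {
        "wt": {"spliced": 50, "unspliced": 50, "plasmid": 150},
        "var": {"spliced": 5, "unspliced": 40, "plasmid": 2},  # dropped out
    },
}
delta = compare_allele_pairs(counts)
```

A negative value for a site suggests that the variant allele is spliced less efficiently than the wildtype allele; sites missing from the result were not scored because one allele was lost during cloning.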
Another approach to testing splice sites is to use an in vitro splicing assay.88, 89 This high‐throughput splicing assay uses a pool of pre‐mRNAs that have been constructed by synthesizing oligonucleotides with the mini‐gene component and target splicing sequence preceded by an upstream T7 promoter. The oligonucleotides are made double stranded by PCR and transcribed in the presence of [α‐32P]UTP. The radiolabeled pre‐mRNA pool is incubated in a splicing‐competent nuclear extract.90 The products of the splicing reactions are the free 5′ exon and lariat intermediate from the first splicing step, and the spliced exons and the shorter lariat product from the second splicing step. To identify and isolate each product, the radiolabeled products are separated by denaturing polyacrylamide gel electrophoresis. The RNA products are purified and analyzed by RNA‐seq to quantify the extent of splicing. This method not only identifies differences in splicing between wildtype and variant sequences, but it also helps identify the steps of splicing that are affected by the variation.

These methods of testing genetic variants for altered splicing also lend themselves well to determining if the mechanism is related to altered protein binding. Since the major reason for altered splicing effectiveness is altered binding to spliceosome proteins, the following approach can be used to identify which proteins may be impacted. This is accomplished using a modified binding motif detection assay, SELEX,91 to develop a protein binding assay in which a pool of RNAs is tested for its ability to bind protein.83 The pool of double‐stranded oligonucleotides, as described above, is transcribed in vitro and the resulting RNA pool is incubated with nuclear extracts to facilitate RNA–protein interactions. The RNA fragments that are bound to proteins are isolated by physically separating the bound and unbound fractions.
This can be done with a nonspecific method, such as nitrocellulose‐based protein binding, or for specific proteins by immunoprecipitation. The RNA populations in the bound and unbound fractions are reverse‐transcribed and analyzed by RNA‐seq. Differences in the representation of wildtype and variant RNAs in the bound fraction indicate altered RNA–protein interactions.
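
One simple way to summarize such a binding experiment is to compare each RNA's share of the reads in the bound fraction with its share in the unbound fraction. The Python sketch below assumes hypothetical read counts, a pseudocount of 1, and normalization by total library size; none of these choices comes from the cited work, and a real analysis would need replicates and statistical testing.

```python
from math import log2


def read_fractions(counts, pseudocount=1.0):
    """Convert raw read counts to fractions of the (pseudocounted) library."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {name: (n + pseudocount) / total for name, n in counts.items()}


def binding_enrichment(bound_counts, unbound_counts, pseudocount=1.0):
    """log2(bound share / unbound share) for every RNA in the pool.

    Assumes both dictionaries contain exactly the same RNA names.
    """
    bound = read_fractions(bound_counts, pseudocount)
    unbound = read_fractions(unbound_counts, pseudocount)
    return {name: log2(bound[name] / unbound[name]) for name in bound}


# Hypothetical toy counts for one wildtype/variant RNA pair.
bound_counts = {"siteA_wt": 299, "siteA_var": 99}
unbound_counts = {"siteA_wt": 199, "siteA_var": 199}

enrichment = binding_enrichment(bound_counts, unbound_counts)
# Negative values mean the variant RNA is less enriched in the bound fraction.
binding_change = enrichment["siteA_var"] - enrichment["siteA_wt"]
```

Here the variant RNA is depleted from the bound fraction relative to the wildtype RNA, which is the pattern expected if the variant disrupts a protein‐binding motif.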

Therapies that target alternative splicing

The impact of aberrant splicing on a variety of human health parameters has stimulated the pursuit of individualized therapies that specifically target the splicing process. Variants in cis‐motifs that affect nearby splicing events are likely the best candidate targets for designing therapies. Most defects in splicing are due to aberrant splice site selection as a result of altered spliceosome‐factor binding. Therapeutic approaches aim to restore normal splice site selection by blocking the interaction between cryptic sites and the spliceosome.92 Modified oligonucleotides and RNA‐binding small molecules have been used with some success in animal models and are currently under clinical investigation.93 For example, the most common form of spinal muscular atrophy is caused by the deficiency of the SMN protein; the deficiency is due to a splice site mutation that results in the skipping of exon 7. 2′‐O‐methyl phosphorothioate‐modified oligonucleotides that bind to the binding site of hnRNP A1 (a splicing silencer) lead to the inclusion of exon 7 and, thus, increase the amount of functional SMN protein.94 Subcutaneous administration of the modified oligo has been shown to be effective for an extended period of time in animal models.94 Similarly, partial rescue of aberrant splicing that leads to Duchenne muscular dystrophy has been observed using oligonucleotides that target dystrophin pre‐mRNA.93 This approach of using oligonucleotides could also be expanded by the addition of splicing regulatory sequences or factors to antisense targeting.95 The therapeutic potential of small molecule regulators of splicing has also been explored with significant success. Using high‐throughput drug screens, compounds that promote the inclusion of exon 7 of SMN have been identified.96 One of the compounds stabilizes the interaction between the core spliceosomal small nuclear ribonucleoprotein U1 and the SMN exon 7/intron junction.
Other compounds such as kinetin,97 cardiac glycosides,98 and RECTAS99 have been shown to improve the recognition of mutated splice sites in the IKBKAP gene involved in familial dysautonomia. The improved recognition in both of these genes is proposed to be due to stabilization of base pairing between the binding site and the RNA component of the ribonucleoprotein. Spliceosome‐mediated RNA trans‐splicing (SMaRT) therapies are another approach to treat unwanted splicing aberrations. These are compounds that modify the secondary structure of pre‐mRNAs and, thereby, regulate splicing factor accessibility.100 Trans‐splicing is a process that is observed in a variety of organisms ranging from protozoa to mammalian cells. In this process, exons from different pre‐mRNAs are spliced together to generate a single transcript. As a therapeutic approach, splicing aberrations that result in a nonfunctional protein are repaired by inducing trans‐splicing of endogenous mutated pre‐mRNA with an exogenous pre‐trans‐splicing molecule. The exogenous molecule contains the desirable sequence to replace the aberration, resulting in a chimeric transcript that encodes a functional protein. Despite poor in vivo efficacy of the trans‐splicing process, therapies for diseases such as cystic fibrosis,101 spinal muscular atrophy,102 Duchenne muscular dystrophy,103 and retinitis pigmentosa104 are being tested.

CONCLUSIONS AND FUTURE DIRECTIONS

Genome sequencing and genotyping technologies have uncovered enormous genetic variation in the human population. Every person has a large number of variants, but the minor allele frequencies of most individual variants are relatively low. In addition, many new germline variants are created in every individual. This low frequency of many of the variants makes population genotype–phenotype associations impractical for most variants if their functional impact is not known. Thus, for the majority of the genetic variants, an important first step towards translating this information into clinically usable information is to determine the impact of the variant on the function of the gene product. For genes with known functions and clinical utility, these variants can then be used to guide risk assessment and therapies based on their effect on the host gene. Understanding the functional alterations in genes that have already been associated with clinical phenotypes may also help elucidate the etiology of specific phenotypes and, thus, lead to future curative therapies. For genes that are being tested in clinical association studies, the rare variants within a gene can be grouped together using gene activity scores that can be used for associations with the clinical phenotypes. In order to functionally classify variants and assign activity scores to the large number of variants that currently have unknown functional consequences, we need more and better high‐throughput functional assays. As we have described in this review, there are several relatively high‐throughput assays for a variety of functions. However, many of them still lack the scalability needed to assess the large number of variants economically. Advances are also needed to improve accuracy and turnaround time for many of them, so that, as new variants are discovered, they can be clinically implemented.
Furthermore, there are many functions that still do not have high‐throughput assays. For those assays that do exist, and as new ones are developed, centralized databases would be useful to simplify the collection and comparison of data from multiple laboratories and to make the functional data easily accessible to others. These improvements will be critical to maximize the clinical utility of the large amount of existing genomic data.
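
The idea of grouping rare variants through gene activity scores can be illustrated with a small Python sketch. The scoring scheme here is an assumption chosen for illustration, not a method described in this review: each variant carries a hypothetical functional score (fraction of wildtype activity, as might come from a high‐throughput assay), the activity of an allele is the product of the scores of its variants, and the gene score for a person is the sum over both alleles. Variants without a functional score make the gene score undefined, which mirrors the problem of variants of unknown function.

```python
from math import prod


def allele_activity(variants, variant_effect):
    """Activity of one allele as a fraction of wildtype (1.0 if no variants).

    Returns None when any variant on the allele has no functional score.
    """
    if any(v not in variant_effect for v in variants):
        return None
    return prod(variant_effect[v] for v in variants)


def gene_activity_score(allele_a, allele_b, variant_effect):
    """Sum of the two allele activities, or None if either is unknown."""
    scores = [allele_activity(a, variant_effect) for a in (allele_a, allele_b)]
    if None in scores:
        return None
    return sum(scores)


# Hypothetical functional scores from an assay: 0.0 = no function, 1.0 = wildtype.
variant_effect = {"varA": 0.0, "varB": 0.5}

score_one_reduced = gene_activity_score(["varB"], [], variant_effect)
score_null_plus_reduced = gene_activity_score(["varA"], ["varB"], variant_effect)
score_unknown = gene_activity_score(["varC"], [], variant_effect)  # untested variant
```

With such scores, carriers of different rare variants in the same gene can be pooled by activity level for association with a clinical phenotype, instead of testing each rare variant separately.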

Author contributions

J.I., M.S., K.S.B., and T.C.S. wrote the article. The first three authors contributed equally to this work.

Conflict of interest

The authors declared no conflicts of interest.

104 in total

Review 1. Mechanisms of RNA-mediated disease.

Authors: Jason R O'Rourke; Maurice S Swanson
Journal: J Biol Chem Date: 2008-10-28 Impact factor: 5.157

2. Widespread recognition of 5' splice sites by noncanonical base-pairing to U1 snRNA involving bulged nucleotides.

Authors: Xavier Roca; Martin Akerman; Hans Gaus; Andrés Berdeja; C Frank Bennett; Adrian R Krainer
Journal: Genes Dev Date: 2012-05-15 Impact factor: 11.361

Review 3. Genetic variation and RNA binding proteins: tools and techniques to detect functional polymorphisms.

Authors: Rachel Soemedi; Hugo Vega; Judson M Belmont; Sohini Ramachandran; William G Fairbrother
Journal: Adv Exp Med Biol Date: 2014 Impact factor: 2.622

4. Hormone-responsive enhancer-activity maps reveal predictive motifs, indirect repression, and targeting of closed chromatin.

Authors: Daria Shlyueva; Christoph Stelzer; Daniel Gerlach; J Omar Yáñez-Cuna; Martina Rath; Łukasz M Boryń; Cosmas D Arnold; Alexander Stark
Journal: Mol Cell Date: 2014-03-27 Impact factor: 17.970

5. Kinetin improves IKBKAP mRNA splicing in patients with familial dysautonomia.

Authors: Felicia B Axelrod; Leonard Liebes; Gabrielle Gold-Von Simson; Sandra Mendoza; James Mull; Maire Leyne; Lucy Norcliffe-Kaufmann; Horacio Kaufmann; Susan A Slaugenhaupt
Journal: Pediatr Res Date: 2011-11 Impact factor: 3.756

6. Normal and mutant human beta-globin pre-mRNAs are faithfully and efficiently spliced in vitro.

Authors: A R Krainer; T Maniatis; B Ruskin; M R Green
Journal: Cell Date: 1984-04 Impact factor: 41.582

7. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model.

Authors: Robin P Smith; Leila Taher; Rupali P Patwardhan; Mee J Kim; Fumitaka Inoue; Jay Shendure; Ivan Ovcharenko; Nadav Ahituv
Journal: Nat Genet Date: 2013-07-28 Impact factor: 38.330

8. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay.

Authors: Alexandre Melnikov; Anand Murugan; Xiaolan Zhang; Tiberiu Tesileanu; Li Wang; Peter Rogov; Soheil Feizi; Andreas Gnirke; Curtis G Callan; Justin B Kinney; Manolis Kellis; Eric S Lander; Tarjei S Mikkelsen
Journal: Nat Biotechnol Date: 2012-02-26 Impact factor: 54.908

9. High-resolution sequence-function mapping of full-length proteins.

Authors: Caitlin A Kowalsky; Justin R Klesmith; James A Stapleton; Vince Kelly; Nolan Reichkitzer; Timothy A Whitehead
Journal: PLoS One Date: 2015-03-19 Impact factor: 3.240

10. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962
