
An integrated DNA and RNA variant detector identifies a highly conserved three base exon in the MAP4K5 kinase locus.

Małgorzata Kurkowiak1, Giuseppa Grasso2, Jakub Faktor1,3, Lisa Scheiblecker4, Małgorzata Winniczuk1, Marcos Yebenes Mayordomo1,2, J Robert O'Neill5, Bodil Oster6, Borek Vojtesek3, Ali Al-Saadi2, Natalia Marek-Trzonkowska1,7, Ted R Hupp1,2.   

Abstract

RNA variants that emerge from editing and alternative splicing form important regulatory stages in protein signalling. In this report, we apply an integrated DNA and RNA variant detection workbench to define the range of RNA variants that deviate from the reference genome in a human melanoma cell model. The RNA variants can be grouped into (i) classic ADAR-like or APOBEC-like RNA editing events and (ii) multiple-nucleotide variants (MNVs), including three- and six-base-pair in-frame, non-canonical, unmapped exons. We focus on validating representative genes of these classes. First, clustered non-synonymous RNA edits (A-I) in the CDK13 gene were validated by Sanger sequencing to confirm the integrity of the RNA variant detection workbench. Second, we detected a highly conserved RNA variant in the MAP4K5 gene that most likely results from the splicing of a non-canonical three-base exon. The two RNA variants produced from the MAP4K5 locus deviate from the genomic reference sequence and produce V569E or V569del isoform variants. Low doses of splicing inhibitors demonstrated that the MAP4K5-V569E variant emerges from an SF3B1-dependent splicing event. Mass spectrometry of pull-downs of the recombinant SBP-tagged MAP4K5-V569E and MAP4K5-V569del proteins from transfected cells was used to identify the protein-protein interactions of these two MAP4K5 isoforms and to propose possible functions. Together, these data highlight the utility of this integrated DNA and RNA variant detection platform for detecting RNA variants in cancer cells and support future RNA variant detection in cancer tissue.

Keywords: Cancer; RNA editing; mass spectrometry; proteogenomics; splicing

Year:  2021        PMID: 34190025      PMCID: PMC8632122          DOI: 10.1080/15476286.2021.1932345

Source DB:  PubMed          Journal:  RNA Biol        ISSN: 1547-6286            Impact factor:   4.652


Introduction

Proteogenomics platforms aim to define the expressed and mutated genome in a diseased state to better determine the mechanisms by which signal transduction is re-wired through mutant protein signalling [1]. These approaches have been implemented in many cancer types including ovarian, colorectal, endometrial, lung, prostate and renal tissues [2-6]. Although disease-specific signalling maps can be constructed, whole-genome cancer sequencing has revealed a patient-specific cancer barcode [7,8] that complicates stratification of patients based on gene mutation status. In addition, although the vast majority of anti-cancer medicines target wild-type proteins, there is a growing number of successes in targeting mutated enzymes (kinases) with effective drug leads [9,10]. This provides proof of concept that drugging mutated signalling networks has therapeutic value and presents an opportunity to develop precision, personalized therapeutics based on expressed, mutant proteins. Understanding the expression of mutant proteins in a diseased state could form a platform for the development of a range of mutation-dependent therapeutics [11]. For example, mutant neoantigen vaccines could be developed using scaffolds such as synthetic mutant proteins [12], activated dendritic cells [13], nucleic acids including RNA [14] or synthetic viral vectors [15]. Common sets of mutated neoantigens, arising from mutations within microsatellite regions, can be identified in patients with microsatellite instability cancers [16]. Nevertheless, the study of mutant proteome landscapes is still in its infancy. The task is complicated because building mutant proteomes requires integrating methodologies from informatics, mass spectrometry and cancer biology. There are major challenges in integrating DNA sequencing, RNA sequencing, and mass spectrometric datasets [17]. 
Although packages and platforms for identifying mutations and variants in DNA sequences are arguably more mature [18], the algorithms for defining mutations in RNA sequencing datasets are very complex because the RNA mutation landscape is more diverse than that of DNA. RNA mutations can be encoded by exons and by introns from pre-spliced RNA [19,20]. In addition, cancer-specific RNA edits [21], tumour-specific spliced mRNAs that generate tumour-specific proteins [22-24], and even translated UTRs can provide novel mutant protein signalling landscapes [25]. Because the many available software and computational tools tend not to be benchmarked against each other, there is as yet no unified, validated roadmap. In a previous report, we benchmarked an integrated DNA and RNA variant identification software platform (CLC) to define expressed, p53-dependent single nucleotide variants (SNVs) in a human melanoma cell model [26]. The expressed SNVs detected in mRNA from the cell lines were validated using mass spectrometry to establish the integrity of the variant identification software. This approach yielded mutant protein signal transduction maps enriched in either the wt-p53 or the p53-null isogenic cell model. Beyond genetically encoded RNA mutations, it is of interest to expand on the types of RNA variants, including RNA edits, that deviate from the genomic DNA sequence in tumour cell models. Emerging software tools are improving the detection of A-to-I RNA editing events [27,28], including the development of libraries of edited targets [29-32]. Here we extend the use of the CLC integrated DNA and RNA variant detection software to define both classic RNA edits and multiple-nucleotide variants (MNVs) that reflect small exons not previously annotated in the reference genome. 
Prior to our study, the CLC integrated DNA and RNA variant detection software had been used to identify RNA editing events in 34 protein-coding mitochondrial transcripts of four Populus species, a genus with relatively few RNA editing sites compared with other angiosperms [33]. Here we define sets of classical RNA edits as well as MNVs in the tumour models. We focus on validating one of these MNVs: the differential splicing of a previously non-annotated, highly conserved three-base-pair exon in the MAP4K5 gene that results in the insertion of one amino acid in the protein. Together, these data establish the utility of this integrated DNA and RNA variant identification software for discovering novel RNA variant landscapes in cell models, and further highlight that understanding proteome regulation requires more accurate tools to define the RNA reference set.

Results

Defining the global RNA variant landscape in A375 melanoma cells

In a prior report [26], the CLC variant detection platform was benchmarked against VarScan 2 to identify DNA-encoded, expressed RNA variants in an isogenic wt-p53 and p53-null A375 melanoma cell model. A total of 1468 mutated mRNA species encoded by 989 mutated genes were defined [26]. These data were then used as a reference database to define the baseline p53-dependent and p53-independent mutant proteome networks using mass spectrometry, comprising over 300 mutant proteins [26]. Here we extend the analysis to RNA variants that deviate from the reference genomic DNA sequence in wt-p53 and p53-null A375 melanoma cells, with and without interferon treatment. Four biological states with highly similar genomes were used in order to focus on common variant pathways that reflect the robustness of each RNA variant type. Such RNA variants might be derived from: (i) classical A-I RNA editing events; (ii) non-canonical splicing events generating novel MNVs (exons) that have not been annotated in the reference transcriptome; or (iii) possible pseudogene expression incorrectly mapped to a reference gene and not reflecting true RNA edits (Fig. 1(a)). The RNA variants were filtered using Fisher’s exact test, which has previously been used to assess the likelihood that RNA edits are true variants (Fig. 1(b)) [34,35].
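The design choice above, focusing on variants that recur across four closely related biological states, reduces to a set intersection over variant calls. The Python sketch below illustrates this under stated assumptions: each call is keyed by a hypothetical (chromosome, position, reference base, observed base) tuple, the helper names are illustrative and not part of the CLC software, and the example call sets are invented (the coordinates come from Tables 3 and 4, but their distribution across states is hypothetical).

```python
from functools import reduce

# Hypothetical key for a variant call: (chromosome, position, reference base, observed base).
Variant = tuple[str, int, str, str]


def shared_variants(calls_by_state: dict[str, set[Variant]]) -> set[Variant]:
    """Return the variants called in every biological state."""
    return reduce(set.intersection, calls_by_state.values())


def state_specific(calls_by_state: dict[str, set[Variant]], state: str) -> set[Variant]:
    """Return the variants called in `state` and in no other state."""
    others = set().union(*(calls for name, calls in calls_by_state.items() if name != state))
    return calls_by_state[state] - others


# Hypothetical example: which states contain which site is invented for illustration.
example_calls = {
    "wt-p53": {("7", 39990548, "A", "G"), ("12", 58223322, "A", "G")},
    "p53-null": {("7", 39990548, "A", "G")},
    "wt-p53 + IFN": {("7", 39990548, "A", "G"), ("2", 198361863, "C", "T")},
    "p53-null + IFN": {("7", 39990548, "A", "G")},
}

print(shared_variants(example_calls))           # the CDK13 site, present in all four states
print(state_specific(example_calls, "wt-p53"))  # the CTDSP2 site, seen only in wt-p53
```

Requiring presence in all four states favours robust variant types at the cost of discarding state-specific events, which `state_specific` recovers separately.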
Figure 1.

RNA variant detection processes

The numbers of nonsynonymous, synonymous and non-coding variants that pass Fisher’s exact test are presented in Supplementary Tables 1A-G. Across the four biological states, the number of non-synonymous variants supported by 2 or more variant RNA reads ranged from 3392 to 6377 (Table 1(a), i). The number of total non-synonymous variants supported by 10 or more variant RNA reads ranged from 102 to 559 (Table 1(a), iii), and the number of non-synonymous SNVs supported by 10 or more variant RNA reads ranged from 29 to 197 (Table 1(a), iii sublevel). Specific RNA variant types (i.e. A-G and C-T) detected with read thresholds from 2 to 10 or more in these four biological states are highlighted in Table 1(b) (i, ii, iii). For example, with a minimum of 10 variant RNA reads, the number of ADAR-like A-I variants (defined as A/G sequencing reads) that passed the Fisher’s exact test filter was 154, and the number of APOBEC-like C-U variants (defined as C/T sequencing reads) was 163 (Table 1(b), iii). G-to-A variants (Table 1(b)) might arise from APOBEC3A-dependent processes [36], whilst U-C variants (Table 1(b)) might arise from transamination and transglycosylation [37]. Classic non-synonymous RNA editing events (A-I + C-U; 720) account for 23.05% of the total non-synonymous RNA variants in Supplementary Table 1A (3,123).
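The read-count thresholds and the percentage above can be expressed in a few lines. This Python sketch assumes variants are available as hypothetical (reference base, observed base, variant read count) records and follows the text’s read-level convention (A/G for ADAR-like, C/T for APOBEC-like). It ignores strand, which is a simplification: an A-to-I edit on a minus-strand transcript would appear as T>C against the plus-strand reference. Only the 720 and 3,123 totals come from the text; the function names are illustrative.

```python
from collections import Counter


def classify(ref: str, alt: str) -> str:
    """Label a substitution using the read-level convention in the text:
    A-to-I editing is read as A>G and C-to-U editing as C>T.
    Strand is ignored (a simplifying assumption)."""
    if (ref, alt) == ("A", "G"):
        return "ADAR-like (A-I)"
    if (ref, alt) == ("C", "T"):
        return "APOBEC-like (C-U)"
    return "other"


def count_by_class(variants, min_variant_reads: int = 10) -> Counter:
    """Count (ref, alt, variant_reads) records per class, keeping only
    records supported by at least `min_variant_reads` variant RNA reads."""
    return Counter(
        classify(ref, alt)
        for ref, alt, reads in variants
        if reads >= min_variant_reads
    )


def percent_of_total(part: int, total: int) -> float:
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / total, 2)


# Classic non-synonymous edits (A-I + C-U) over all non-synonymous variants.
print(percent_of_total(720, 3123))  # 23.05
```

Raising `min_variant_reads` from 2 to 10 mirrors the move from Table 1(b) i to iii: fewer variants survive, but each is supported by more evidence.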
Table 1.

Different types of RNA variants that pass the Fisher’s exact test. (A) Number of variant types in isogenic A375 cell lines (p53 WT and null, treated and not treated with interferon (IFN)). (B) Number of A-G and C-T variants in A375 p53-WT

Examples of specific SNVs of interest are summarized in Table 2. One non-coding RNA variant we include (from Supplementary Table 1 F) lies in the miRNA host gene MIR663AHG, which has two A-G variants, each represented by over 1,000 reads (Table 2; Supplementary Table 1 F). MIR663AHG has not previously been shown to be a target of the RNA editing machinery; however, other miRNAs have been shown to be edited by ADAR-dependent mechanisms [38,39]. Several protein kinases are highlighted, including CDK13, CDK10, CDK11, and CDK12, which exhibit A-G variants in the sequencing reads (Table 2). For example, CDK13 exhibits the A-G variant in 17 out of 22 RNA reads, against a total of 43 wt-DNA reads, yielding a Fisher’s exact test p-value of 1.4e-11. CDK10 exhibits the A-G variant in 12 out of 20 RNA reads, against a total of 45 wt-DNA reads, yielding a Fisher’s exact test p-value of 3.1e-08. These A-G variants are most likely ADAR-dependent A-I RNA editing events. A representative single-nucleotide insertion (C) in the RNA encoding SEPT7 would theoretically produce a frameshift; SEPT7 exhibits the C insertion in 103 out of 146 RNA reads, against a total of 19 wt-DNA reads, yielding a Fisher’s exact test p-value of 1.1e-09. A gene with a representative MNV (YTHDF3) exhibits the variant in 13 out of 15 RNA reads, against a total of 66 wt-DNA reads, yielding a Fisher’s exact test p-value of 2.7e-13. This representative MNV most likely reflects an alternative or non-canonical short exon, similar to MAP4K5 (see below).
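For readers wishing to check the p-values quoted above, the following Python sketch computes a two-sided Fisher’s exact test on a 2×2 table of variant versus reference reads in RNA and in DNA. It is a minimal illustration, assuming this standard formulation, and is not necessarily identical to the CLC implementation; under that assumption it reproduces the values listed for CDK13 and CDK10 in Table 3 and for MAP4K5 in Table 5 to the printed precision. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(rna_variant: int, rna_coverage: int,
                           dna_variant: int, dna_coverage: int) -> float:
    """Two-sided Fisher's exact test on the 2x2 table

                 variant       reference
        RNA   rna_variant   rna_coverage - rna_variant
        DNA   dna_variant   dna_coverage - dna_variant

    Row and column totals are held fixed, so the number of variant reads in
    the RNA row follows a hypergeometric distribution.
    """
    n = rna_coverage + dna_coverage            # all reads at the site
    variant_total = rna_variant + dna_variant  # all reads carrying the variant
    denom = comb(n, variant_total)

    def table_prob(x: int) -> float:
        # Probability of x variant reads in RNA (and the remainder in DNA).
        return comb(rna_coverage, x) * comb(dna_coverage, variant_total - x) / denom

    p_observed = table_prob(rna_variant)
    lo = max(0, variant_total - dna_coverage)
    hi = min(rna_coverage, variant_total)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in (table_prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-7))


# CDK13 (Table 3): 17 of 22 RNA reads carry the variant, 0 of 43 DNA reads do.
print(f"{fisher_exact_two_sided(17, 22, 0, 43):.3g}")  # 1.41e-11
```

Because the DNA reads at these sites are all wild-type, the observed table sits at the extreme of the distribution, which is why strong RNA support with even modest DNA coverage yields very small p-values.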
Table 2.

Examples of specific single nucleotide variants (SNV) of interest. SNVs in miRNA are highlighted in green, while SNVs in kinases are highlighted in yellow

Table 3 summarizes genes with high-confidence, canonical ADAR-like A-I editing events (shown as A-G changes in the sequencing results). Several genes of interest from the stress-activated signalling and cancer fields emerge, and we focus generally on protein kinases, which are druggable targets in oncology. We first focus on the previously identified edits in the cyclin-dependent kinase family member CDK13 [40]. That study linked this editing event to poor prognosis in hepatocellular cancer patients, but did not perform biochemical characterization to provide a potential mechanism. The identification of this CDK13 mRNA editing event in the A375 cell line using the CLC software (predicted amino acid change Gln103Arg) serves as an internal validation, suggesting that this informatics tool has value for identifying RNA variants from a reference set. In addition to the ADAR-like editing events, we also present a list of the top genes with canonical APOBEC-like C-U editing events (presented as C-T changes in the sequencing data) (Table 4). For example, HSPD1 mRNA exhibited a variant rate of 70.31% (45 out of 64 RNA sequencing reads are C to T). APOBEC RNA editing is generally thought to form a smaller landscape than ADAR editing, but in our analysis C-U variants were as widespread as A-I variants (Table 1(b)). Future research will determine whether these C-U variants are APOBEC-dependent. We also include non-canonical RNA variants (A-T) of unknown origin that emerge from this software. When only variants present in more than 10% of the total RNA reads are included, A-T variants represent only a small proportion relative to classical RNA editing events (Table 5). 
If all A-T RNA variants from this analysis are included, hundreds fall below the 10% threshold (Supplementary Table 1 G). These low-frequency A-T RNA variants might be either hotspots of RNA polymerase errors [41] or hotspot sequencing errors arising from the next-generation shotgun sequencing methodology [42]. Although the origin of such A-T variants is not known, in this manuscript we validate one of the high-frequency variants and attribute it to a non-annotated small exon in MAP4K5 (Table 5).
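The RNA frequency column in Tables 3 to 5 is the share of RNA reads at a site that carry the variant, and the 10% cut-off discussed above is a simple filter on it. The Python sketch below assumes records of the hypothetical form (gene, variant reads, RNA coverage); the read counts are taken from Table 5 except for one invented low-frequency site, included only to show the filter at work. Function names are illustrative.

```python
def rna_frequency(variant_reads: int, coverage: int) -> float:
    """Percentage of RNA reads at a site that carry the variant allele."""
    return 100 * variant_reads / coverage


def above_threshold(variants, threshold_pct: float = 10.0):
    """Keep (gene, variant_reads, coverage) records whose RNA frequency
    exceeds `threshold_pct`, sorted from highest to lowest frequency."""
    kept = [(gene, rna_frequency(n, cov)) for gene, n, cov in variants
            if rna_frequency(n, cov) > threshold_pct]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Read counts from Table 5, plus one hypothetical low-frequency site.
a_to_t = [("MAP4K5", 22, 23), ("RPL8", 7, 36), ("CTDSP2", 23, 111), ("hypothetical", 3, 60)]
for gene, freq in above_threshold(a_to_t):
    print(f"{gene}: {freq:.2f}%")
```

Run as written, this keeps MAP4K5, CTDSP2 and RPL8 (in that order) and drops the invented 5% site, which would fall among the low-frequency A-T variants of Supplementary Table 1 G.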
Table 3.

Summarized groups of genes with high-confidence A-G variants

Chr | Region | Type | Presence in REDIportal | Ref | Allele | Count (RNA) | Coverage (RNA) | Frequency (RNA) [%] | Gene | AA change | Non-synonymous mutation | Count (DNA) | Coverage (DNA) | Fisher P-value
19 | 14,883,288 | SNV | No | A | G | 50 | 51 | 98.04 | EMR2 | ENSP00000319883:p.Leu74Pro | Yes | 0 | 54 | 1.82112894190718e-29
1 | 149,400,160 | SNV | No | A | G | 49 | 74 | 66.22 | RP5-998N21.10, RP5-998N21.7, HIST2H3PS2 | ENSP00000476960:p.Val128Ala | Yes | 0 | 57 | 1.22483931860094e-17
1 | 225,974,614 | SNV | Yes (also in ATLAS, RADAR, DARNED) | A | G | 43 | 96 | 44.79 | SRP9 | ENSP00000355804:p.Ile64Met | Yes | 0 | 7 | 0.039579993607879
6 | 32,557,461 | SNV | No | A | G | 33 | 66 | 50 | HLA-DRB1 | ENSP00000353099:p.Met20Thr | Yes | 0 | 20 | 1.26361088993366e-05
17 | 7,479,846 | SNV | No | A | G | 32 | 35 | 91.43 | SENP3-EIF4A1, EIF4A1, SNORA67 | ENSP00000293831:p.Gln117Arg | Yes | 0 | 32 | 5.71510505664653e-16
12 | 58,223,322 | SNV | No | A | G | 20 | 102 | 19.61 | CTDSP2 | ENSP00000381148:p.Leu41Pro | Yes | 0 | 48 | 0.000435700500593407
7 | 39,990,548 | SNV | Yes (also in ATLAS, RADAR) | A | G | 17 | 22 | 77.27 | CDK13 | ENSP00000181839:p.Gln103Arg | Yes | 0 | 43 | 1.40982069224812e-11
12 | 58,223,310 | SNV | No | A | G | 17 | 100 | 17 | CTDSP2 | ENSP00000381148:p.Phe45Ser | Yes | 0 | 45 | 0.00155609502793716
7 | 56,183,824 | SNV | No | A | G | 15 | 16 | 93.75 | NUPR1L | ENSP00000455442:p.Trp62Arg | Yes | 0 | 75 | 2.91788201938187e-16
13 | 42,887,184 | SNV | No | A | G | 15 | 32 | 46.88 | AKAP11 | ENSP00000025301:p.Asn1759Asp | Yes | 0 | 73 | 1.01637496378703e-09
3 | 155,643,079 | SNV | No | A | G | 14 | 217 | 6.45 | GMPS | ENSP00000419851:p.Gln495Arg | Yes | 0 | 109 | 0.0063405658448091
16 | 89,754,046 | SNV | No | A | G | 12 | 20 | 60 | CDK10 | ENSP00000426264:p.Gln46Arg | Yes | 0 | 45 | 3.12750563793603e-08
3 | 37,067,261 | SNV | No | A | G | 12 | 146 | 8.22 | MLH1 | ENSP00000231790:p.Gln391Arg | Yes | 0 | 47 | 0.041180490531472
7 | 56,183,781 | SNV | No | A | G | 11 | 15 | 73.33 | NUPR1L | ENSP00000455442:p.Leu76Pro | Yes | 0 | 46 | 3.26481485245141e-09
12 | 123,879,645 | SNV | No | A | G | 11 | 194 | 5.67 | SETD8 | ENSP00000384629:p.Gln114Arg | Yes | 0 | 144 | 0.00309856873055919
12 | 58,221,358 | SNV | No | A | G | 11 | 114 | 9.65 | CTDSP2 | ENSP00000381148:p.Cys77Arg | Yes | 0 | 86 | 0.00276332413384754
4 | 41,668,668 | SNV | No | A | G | 10 | 152 | 6.58 | LIMCH1 | ENSP00000316891:p.Asn743Asp | Yes | 0 | 86 | 0.0152486973238669
18 | 669,140 | SNV | No | A | G | 10 | 145 | 6.9 | TYMS | ENSP00000315644:p.Arg175Gly | Yes | 0 | 91 | 0.00775777613161119
16 | 74,516,925 | SNV | No | A | G | 10 | 145 | 6.9 | GLG1 | ENSP00000405984:p.Trp557Arg | Yes | 0 | 105 | 0.00588521683893524
12 | 32,891,213 | SNV | No | A | G | 9 | 114 | 7.89 | DNM1L | ENSP00000415131:p.Arg538Gly | Yes | 0 | 62 | 0.0277227730166222
12 | 121,855,472 | SNV | No | A | G | 9 | 130 | 6.92 | RNF34 | ENSP00000355137:p.Asn131Asp | Yes | 0 | 94 | 0.0112941289917348
17 | 40,714,905 | SNV | No | A | G | 9 | 148 | 6.08 | COASY | ENSP00000377406:p.Asn89Asp | Yes | 0 | 118 | 0.00513004249124648
X | 55,103,084 | SNV | No | A | G | 8 | 8 | 100 | PAGE2B | ENSP00000364110:p.Glu56Gly | Yes | 0 | 65 | 7.43929938132365e-11
2 | 27,464,094 | SNV | No | A | G | 8 | 148 | 5.41 | CAD | ENSP00000264705:p.Gln1936Arg | Yes | 0 | 167 | 0.00214257228352634
12 | 133,588,015 | SNV | No | A | G | 7 | 28 | 25 | ZNF26 | ENSP00000437420:p.Lys497Arg | Yes | 0 | 78 | 4.85858311809867e-05
1 | 1,653,093 | SNV | No | A | G | 7 | 24 | 29.17 | CDK11A, RP1-283E3.8 | ENSP00000348529:p.Cys23Arg | Yes | 0 | 67 | 4.27606152065275e-05
11 | 111,631,628 | SNV | No | A | G | 7 | 91 | 7.69 | PPP2R1B | ENSP00000437193:p.Trp152Arg | Yes | 0 | 56 | 0.0443114605110571
1 | 32,622,469 | SNV | No | A | G | 7 | 102 | 6.86 | KPNA6 | ENSP00000362728:p.Asn52Asp | Yes | 0 | 70 | 0.0423127006073164
19 | 21,301,076 | SNV | Yes (also in ATLAS, RADAR) | A | G | 7 | 38 | 18.42 | ZNF714 | ENSP00000472368:p.Lys536Glu | Yes | 0 | 29 | 0.0163066351077906
5 | 68,390,093 | SNV | No | A | G | 7 | 79 | 8.86 | SLC30A5 | ENSP00000379836:p.Lys4Arg | Yes | 0 | 96 | 0.00328207326350711
15 | 56,134,222 | SNV | No | A | G | 7 | 80 | 8.75 | NEDD4 | ENSP00000424827:p.Leu1002Ser | Yes | 0 | 105 | 0.00242154665846764
10 | 127,530,479 | SNV | No | A | G | 7 | 70 | 10 | BCCIP, DHX32 | ENSP00000284690:p.Leu459Ser | Yes | 0 | 98 | 0.001815531799968
1 | 171,509,868 | SNV | No | A | G | 7 | 61 | 11.48 | PRRC2C | ENSP00000356716:p.Lys1088Arg | Yes | 0 | 105 | 0.000719618475390669
1 | 197,111,808 | SNV | No | A | G | 7 | 129 | 5.43 | ASPM | ENSP00000356379:p.Val525Ala | Yes | 0 | 244 | 0.000530692684709058
11 | 120,335,968 | SNV | No | A | G | 7 | 81 | 8.64 | ARHGEF12 | ENSP00000380942:p.Lys879Arg | Yes | 0 | 173 | 0.000279281870195161
5 | 176,942,758 | SNV | No | A | G | 6 | 100 | 6 | DDX41 | ENSP00000422753:p.Tyr167His | Yes | 0 | 72 | 0.0409477737535999
3 | 182,591,758 | SNV | No | A | G | 6 | 103 | 5.83 | ATP11B | ENSP00000321195:p.Lys736Arg | Yes | 0 | 87 | 0.0320681506824627
5 | 76,732,198 | SNV | No | A | G | 6 | 104 | 5.77 | WDR41 | ENSP00000296679:p.Phe372Ser | Yes | 0 | 92 | 0.030601355098557
19 | 45,153,108 | SNV | No | A | G | 6 | 61 | 9.84 | PVR | ENSP00000402060:p.Gln152Arg | Yes | 0 | 62 | 0.0130666255617345
19 | 49,303,354 | SNV | No | A | G | 6 | 99 | 6.06 | BCAT2 | ENSP00000385161:p.Phe99Leu | Yes | 0 | 127 | 0.00647390928657926
4 | 38,879,797 | SNV | No | A | G | 5 | 66 | 7.58 | FAM114A1 | ENSP00000351740:p.His33Arg | Yes | 0 | 75 | 0.0206732093679803
17 | 25,633,838 | SNV | No | A | G | 5 | 79 | 6.33 | WSB1 | ENSP00000262394:p.Gln214Arg | Yes | 0 | 94 | 0.0185003944601614
X | 134,986,647 | SNV | No | A | G | 5 | 31 | 16.13 | SAGE1 | ENSP00000445959:p.Thr78Ala | Yes | 0 | 38 | 0.0151186371364254
3 | 50,138,033 | SNV | No | A | G | 5 | 56 | 8.93 | RBM5 | ENSP00000343054:p.Asn160Asp | Yes | 0 | 80 | 0.0106125718158434
6 | 74,446,113 | SNV | No | A | G | 5 | 86 | 5.81 | CD109 | ENSP00000388062:p.Lys172Arg | Yes | 0 | 132 | 0.00888927610736033
19 | 53,856,722 | SNV | No | A | G | 5 | 16 | 31.25 | ZNF845 | ENSP00000388311:p.Ile932Val | Yes | 0 | 29 | 0.00357517317245054
19 | 53,385,141 | SNV | No | A | G | 5 | 39 | 12.82 | ZNF320 | ENSP00000473091:p.Tyr80His | Yes | 0 | 127 | 0.00058248128157573
Table 4.

Summarized groups of genes with high-confidence C-T variants

Chr | Region | Type | Ref | Allele | Count (RNA) | Coverage (RNA) | Frequency (RNA) [%] | Gene | AA change | Non-synonymous mutation | Count (DNA) | Coverage (DNA) | Fisher P-value
2 | 198,361,863 | SNV | C | T | 45 | 64 | 70.31 | HSPD1 | ENSP00000441296:p.Gly143Asp | Yes | 0 | 59 | 1.25874384373074e-18
20 | 17,639,820 | SNV | C | T | 38 | 443 | 8.58 | RRBP1 | ENSP00000367044:p.Ala445Thr | Yes | 0 | 59 | 0.0149732630163734
20 | 17,639,790 | SNV | C | T | 31 | 410 | 7.56 | RRBP1 | ENSP00000246043:p.Ala455Thr | Yes | 0 | 60 | 0.0226710686326192
1 | 241,767,881 | SNV | C | T | 28 | 37 | 75.68 | OPN3 | ENSP00000355512:p.Gly125Glu | Yes | 0 | 57 | 1.89876937313551e-16
7 | 151,933,014 | SNV | C | T | 28 | 87 | 32.18 | KMT2C | ENSP00000347325:p.Arg886His | Yes | 0 | 20 | 0.00149079919815577
7 | 151,932,997 | SNV | C | T | 28 | 92 | 30.43 | KMT2C | ENSP00000262189:p.Gly892Arg | Yes | 0 | 19 | 0.00304242853041439
12 | 58,223,346 | SNV | C | T | 27 | 93 | 29.03 | CTDSP2 | ENSP00000448951:p.Arg33His | Yes | 0 | 53 | 1.14233434634688e-06
20 | 3,765,524 | SNV | C | T | 22 | 415 | 5.30 | CENPB | ENSP00000369075:p.Gly536Asp | Yes | 0 | 81 | 0.0343572449619834
9 | 138,758,382 | SNV | C | T | 22 | 63 | 34.92 | CAMSAP1 | ENSP00000374183:p.Val196Ile | Yes | 0 | 23 | 0.000490132112142799
1 | 16,890,676 | SNV | C | T | 20 | 57 | 35.09 | NBPF1 | ENSP00000474456:p.Ser1061Asn | Yes | 0 | 11 | 0.0261283744915462
12 | 58,223,305 | SNV | C | T | 17 | 97 | 17.53 | CTDSP2 | ENSP00000381148:p.Ala47Thr | Yes | 0 | 44 | 0.00150595341736571
17 | 16,097,786 | SNV | C | T | 16 | 93 | 17.20 | NCOR1 | ENSP00000387727:p.Arg33His | Yes | 0 | 46 | 0.00134445382694989
12 | 58,218,102 | SNV | C | T | 15 | 104 | 14.42 | CTDSP2 | ENSP00000381148:p.Val138Met | Yes | 0 | 49 | 0.00283758574140028
19 | 6,741,282 | SNV | C | T | 15 | 127 | 11.81 | TRIP10 | ENSP00000469360:p.Pro63Ser | Yes | 0 | 68 | 0.00145574724301353
13 | 36,909,801 | SNV | C | T | 14 | 189 | 7.41 | SPG20 | ENSP00000414147:p.Gly56Glu | Yes | 0 | 72 | 0.0130234368145754
11 | 102,953,493 | SNV | C | T | 14 | 203 | 6.90 | DCUN1D5 | ENSP00000260247:p.Gly109Arg | Yes | 0 | 68 | 0.0243485205933667
4 | 6,711,266 | SNV | C | T | 13 | 139 | 9.35 | MRFAP1L1 | ENSP00000318154:p.Val31Ile | Yes | 0 | 55 | 0.0212807038888119
X | 102,508,880 | SNV | C | T | 13 | 113 | 11.50 | TCEAL8 | ENSP00000353093:p.Gly10Arg | Yes | 0 | 75 | 0.00188230611527362
11 | 104,820,408 | SNV | C | T | 12 | 159 | 7.55 | CASP4 | ENSP00000388566:p.Gly215Arg | Yes | 0 | 60 | 0.039548573622814
X | 64,754,507 | SNV | C | T | 12 | 165 | 7.27 | LAS1L | ENSP00000473471:p.Gly30Glu | Yes | 0 | 190 | 8.14320230043981e-05
7 | 158,445,167 | SNV | C | T | 12 | 61 | 19.67 | NCAPG2 | ENSP00000388326:p.Gly918Arg | Yes | 0 | 41 | 0.00143797023710544
11 | 71,715,090 | SNV | C | T | 12 | 114 | 10.53 | NUMA1 | ENSP00000260051:p.Gly924Glu | Yes | 0 | 85 | 0.00140855481829364
16 | 14,859,247 | SNV | C | T | 12 | 12 | 100 | NPIPA2 | ENSP00000432029:p.Arg363Trp | Yes | 0 | 3 | 0.0021978021978022
7 | 140,154,980 | SNV | C | T | 11 | 189 | 5.82 | MKRN1 | ENSP00000255977:p.Gly384Glu | Yes | 0 | 69 | 0.0397674501449123
16 | 68,191,775 | SNV | C | T | 11 | 31 | 35.48 | NFATC3 | ENSP00000454451:p.Pro36Leu | Yes | 0 | 48 | 9.36951954107799e-06
7 | 127,973,362 | SNV | C | T | 10 | 117 | 8.55 | RBM28 | ENSP00000223073:p.Gly334Glu | Yes | 0 | 89 | 0.00555186538955707
10 | 88,259,939 | SNV | C | T | 10 | 129 | 7.75 | WAPAL | ENSP00000298767:p.Arg354Gln | Yes | 0 | 189 | 9.73518126101261e-05
5 | 143,543,718 | SNV | C | T | 10 | 145 | 6.90 | YIPF5 | ENSP00000397704:p.Gly129Asp | Yes | 0 | 88 | 0.01481321402193
11 | 47,505,962 | SNV | C | T | 10 | 163 | 6.14 | CELF1 | ENSP00000436864:p.Gly169Arg | Yes | 0 | 84 | 0.0175118335014926
15 | 58,913,707 | SNV | C | T | 10 | 163 | 6.14 | ADAM10 | ENSP00000260408:p.Gly492Arg | Yes | 0 | 187 | 0.000412333791144602
20 | 377,219 | SNV | C | T | 10 | 174 | 5.75 | TRIB3 | ENSP00000415416:p.Pro348Leu | Yes | 0 | 180 | 0.000719031949974406
12 | 76,804,402 | SNV | C | T | 10 | 100 | 10 | OSBPL8 | ENSP00000261183:p.Gly77Glu | Yes | 0 | 49 | 0.0308234990088144
X | 122,766,729 | SNV | C | T | 9 | 101 | 8.91 | THOC2 | ENSP00000347959:p.Gly767Arg | Yes | 0 | 76 | 0.0108213950102794
12 | 58,217,833 | SNV | C | T | 9 | 103 | 8.74 | CTDSP2 | ENSP00000381148:p.Val182Met | Yes | 0 | 88 | 0.00400185600488213
10 | 101,997,834 | SNV | C | T | 9 | 143 | 6.29 | CWF19L1 | ENSP00000326411:p.Gly400Glu | Yes | 0 | 71 | 0.0312683693697371
10 | 97,447,031 | SNV | C | T | 9 | 65 | 13.85 | TCTN3 | ENSP00000265993:p.Gly255Arg | Yes | 0 | 78 | 0.000599795019212057
1 | 222,732,066 | SNV | C | T | 8 | 88 | 9.09 | TAF1A | ENSP00000327072:p.Gly430Glu | Yes | 0 | 53 | 0.0248719435348009
3 | 111,304,184 | SNV | C | T | 8 | 118 | 6.78 | CD96 | ENSP00000283285:p.Pro272Ser | Yes | 0 | 95 | 0.00927347172924355
2 | 55,844,280 | SNV | C | T | 8 | 129 | 6.20 | SMEK2 | ENSP00000272313:p.Gly48Arg | Yes | 0 | 123 | 0.00708427312593516
3 | 112,280,348 | SNV | C | T | 8 | 151 | 5.30 | ATG3 | ENSP00000420259:p.Gly10Arg | Yes | 0 | 98 | 0.0238355685119133
15 | 28,391,439 | SNV | C | T | 8 | 42 | 19.05 | HERC2 | ENSP00000261609:p.Arg3651His | Yes | 0 | 89 | 6.82328197605092e-05
13 | 23,928,934 | SNV | C | T | 8 | 64 | 12.5 | SACS | ENSP00000371735:p.Gly606Glu | Yes | 0 | 89 | 0.000715857797288114
5 | 10,417,482 | SNV | C | T | 8 | 73 | 10.96 | MARCH6 | ENSP00000274140:p.Pro750Leu | Yes | 0 | 43 | 0.0250897173719705
13 | 37,619,420 | SNV | C | T | 8 | 79 | 10.13 | SUPT20H | ENSP00000419754:p.Gly87Arg | Yes | 0 | 91 | 0.00178233562399633
11 | 57,076,547 | SNV | C | T | 7 | 86 | 8.14 | TNKS1BP1 | ENSP00000350990:p.Gly1213Glu | Yes | 0 | 96 | 0.00460131725368528
15 | 56,134,226 | SNV | C | T | 7 | 88 | 7.96 | NEDD4 | ENSP00000424827:p.Gly1001Arg | Yes | 0 | 107 | 0.00332772079909333
17 | 28,599,835 | SNV | C | T | 7 | 89 | 7.87 | BLMH | ENSP00000261714:p.Gly295Glu | Yes | 0 | 130 | 0.001583509804342
6 | 155,581,413 | SNV | C | T | 7 | 103 | 6.80 | TFB1M | ENSP00000356134:p.Gly263Glu | Yes | 0 | 59 | 0.0486704448703024
10 | 101,915,939 | SNV | C | T | 7 | 103 | 6.80 | ERLIN1 | ENSP00000410964:p.Met236Ile | Yes | 0 | 115 | 0.00470385687118436
15 | 101,718,348 | SNV | C | T | 7 | 107 | 6.54 | CHSY1 | ENSP00000254190:p.Gly552Arg | Yes | 0 | 95 | 0.0151457707745357
9 | 21,333,906 | SNV | C | T | 7 | 108 | 6.48 | KLHL9 | ENSP00000351933:p.Arg318Gln | Yes | 0 | 174 | 0.00106812510654314
9 | 125,585,299 | SNV | C | T | 7 | 109 | 6.42 | PDCL | ENSP00000259467:p.Gly117Glu | Yes | 0 | 67 | 0.0451669622310842
6 | 86,246,557 | SNV | C | T | 7 | 109 | 6.42 | SNX14 | ENSP00000313121:p.Gly521Arg | Yes | 0 | 135 | 0.00318115735029092
1 | 28,857,124 | SNV | C | T | 7 | 12 | 58.33 | RCC1 | ENSP00000362937:p.Pro55Ser | Yes | 0 | 99 | 2.33190502692993e-08
17 | 57,775,101 | SNV | C | T | 7 | 124 | 5.65 | PTRH2 | ENSP00000387180:p.Gly81Glu | Yes | 0 | 160 | 0.00274293160122872
1 | 246,729,361 | SNV | C | T | 7 | 130 | 5.38 | TFB2M | ENSP00000355471:p.Gly27Glu | Yes | 0 | 80 | 0.0457812335157669
1 | 200,522,681 | SNV | C | T | 7 | 132 | 5.30 | KIF14 | ENSP00000356319:p.Trp1594* | Yes | 0 | 105 | 0.0184523225925478
20 | 33,969,744 | SNV | C | T | 7 | 132 | 5.30 | UQCC1 | ENSP00000398531:p.Gly118Arg | Yes | 0 | 111 | 0.0166866857610569
16 | 11,827,898 | SNV | C | T | 7 | 42 | 16.67 | TXNDC11 | ENSP00000283033:p.Gly170Glu | Yes | 0 | 40 | 0.0120002120619595
10 | 51,123,969 | SNV | C | T | 7 | 47 | 14.89 | PARG | ENSP00000384408:p.Trp92* | Yes | 0 | 45 | 0.0123589350527848
9 | 86,495,276 | SNV | C | T | 7 | 10 | 70 | KIF27 | ENSP00000297814:p.Arg860Gln | Yes | 0 | 193 | 4.72665418421337e-11
5 | 98,221,290 | SNV | C | T | 6 | 61 | 9.84 | CHD1 | ENSP00000284049:p.Gly854Arg | Yes | 0 | 66 | 0.0107412065230634
5 | 132,227,887 | SNV | C | T | 6 | 61 | 9.84 | AFF4 | ENSP00000265343:p.Gly869Glu | Yes | 0 | 72 | 0.00809811748698665
1 | 70,650,500 | SNV | C | T | 6 | 61 | 9.84 | LRRC40 | ENSP00000359990:p.Gly169Arg | Yes | 0 | 85 | 0.00458033379491009
3 | 23,934,608 | SNV | C | T | 6 | 63 | 9.52 | NKIRAS1 | ENSP00000396063:p.Gly186Glu | Yes | 0 | 72 | 0.00904512436566907
16 | 68,380,182 | SNV | C | T | 6 | 65 | 9.23 | PRMT7 | ENSP00000454776:p.Thr397Ile | Yes | 0 | 46 | 0.0406165295081866
10 | 119,043,909 | SNV | C | T | 6 | 66 | 9.09 | PDZD8 | ENSP00000334642:p.Gly779Arg | Yes | 0 | 89 | 0.00520287864187153
12 | 116,453,025 | SNV | C | T | 6 | 68 | 8.82 | MED13L | ENSP00000281928:p.Gly355Glu | Yes | 0 | 64 | 0.0281686928835132
1 | 235,377,180 | SNV | C | T | 6 | 69 | 8.70 | ARID4B | ENSP00000264183:p.Gly582Glu | Yes | 0 | 180 | 0.000384796464701192
8 | 59,502,088 | SNV | C | T | 6 | 80 | 7.5 | NSMAF | ENSP00000411012:p.Gly743Arg | Yes | 0 | 108 | 0.00531158603809772
5 | 891,357 | SNV | C | T | 6 | 84 | 7.14 | BRD9 | ENSP00000419765:p.Gly105Arg | Yes | 0 | 66 | 0.0347863388668885
21 | 19,704,462 | SNV | C | T | 6 | 84 | 7.14 | TMPRSS15 | ENSP00000284885:p.Trp531* | Yes | 0 | 72 | 0.0309839093122811
12 | 76,767,173 | SNV | C | T | 6 | 95 | 6.32 | OSBPL8 | ENSP00000261183:p.Gly623Glu | Yes | 0 | 126 | 0.00575144655643513
12 | 58,217,838 | SNV | C | T | 6 | 104 | 5.77 | CTDSP2 | ENSP00000381148:p.Cys180Tyr | Yes | 0 | 87 | 0.0324661227483437
12 | 89,992,916 | SNV | C | T | 6 | 108 | 5.56 | ATP2B1 | ENSP00000261173:p.Gly1110Asp | Yes | 0 | 105 | 0.0291612251146297
9 | 117,844,992 | SNV | C | T | 6 | 113 | 5.31 | TNC | ENSP00000443478:p.Trp742* | Yes | 0 | 92 | 0.0338353295201289
1 | 6,659,126 | SNV | C | T | 6 | 116 | 5.17 | KLHL21 | ENSP00000366886:p.Gly470Arg | Yes | 0 | 81 | 0.0437866381921444
10 | 88,260,312 | SNV | C | T | 6 | 119 | 5.04 | WAPAL | ENSP00000298767:p.Gly230Arg | Yes | 0 | 149 | 0.007134150552398
1 | 31,501,660 | SNV | C | T | 6 | 119 | 5.04 | PUM1 | ENSP00000362846:p.Gly175Arg | Yes | 0 | 154 | 0.00637857742112903
19 | 5,679,241 | SNV | C | T | 6 | 13 | 46.15 | C19orf70 | ENSP00000465739:p.Gly109Glu | Yes | 0 | 19 | 0.0018936384342391
12 | 112,743,983 | SNV | C | T | 6 | 21 | 28.57 | HECTD4 | ENSP00000449784:p.Ser263Asn | Yes | 0 | 62 | 0.00014376582334118
13 | 42,352,178 | SNV | C | T | 6 | 44 | 13.64 | VWA8 | ENSP00000281496:p.Met764Ile | Yes | 0 | 66 | 0.00329577076425277
5 | 176,025,977 | SNV | C | T | 6 | 46 | 13.04 | GPRIN1 | ENSP00000305839:p.Gly287Arg | Yes | 0 | 54 | 0.00785772420742579
10 | 75,888,943 | SNV | C | T | 6 | 53 | 11.32 | AP3M1 | ENSP00000361831:p.Trp242* | Yes | 0 | 37 | 0.0406066012293993
X | 76,937,786 | SNV | C | T | 6 | 55 | 10.91 | ATRX | ENSP00000362441:p.Gly988Arg | Yes | 0 | 53 | 0.0271469474581715
11 | 66,473,160 | SNV | C | T | 6 | 57 | 10.53 | SPTBN2 | ENSP00000433593:p.Gly601Glu | Yes | 0 | 90 | 0.00287126633204993
12 | 112,744,038 | SNV | C | T | 6 | 20 | 30 | HECTD4 | ENSP00000366783:p.Val245Met | Yes | 0 | 46 | 0.000426596143148232
X | 76,854,946 | SNV | C | T | 6 | 50 | 12 | ATRX | ENSP00000362441:p.Gly1964Arg | Yes | 0 | 103 | 0.000984972978224279
14 | 31,598,133 | SNV | C | T | 6 | 60 | 10 | HECTD1 | ENSP00000450697:p.Gly1482Arg | Yes | 0 | 109 | 0.00169242132906296
14 | 45,711,445 | SNV | C | T | 6 | 75 | 8 | MIS18BP1 | ENSP00000309790:p.Gly312Glu | Yes | 0 | 90 | 0.00787641260466534
16 | 2,983,473 | SNV | C | T | 5 | 51 | 9.80 | FLYWCH1 | ENSP00000253928:p.Pro380Leu | Yes | 0 | 94 | 0.0047152135544559
3 | 25,773,866 | SNV | C | T | 5 | 52 | 9.62 | NGLY1 | ENSP00000280700:p.Gly457Arg | Yes | 0 | 73 | 0.0110815071465415
3 | 25,761,636 | SNV | C | T | 5 | 52 | 9.62 | NGLY1 | ENSP00000280700:p.Trp553* | Yes | 0 | 78 | 0.00907953366294329
5 | 33,616,115 | SNV | C | T | 5 | 56 | 8.93 | ADAMTS12 | ENSP00000422554:p.Gly736Arg | Yes | 0 | 175 | 0.000727926763242569
8 | 67,547,235 | SNV | C | T | 5 | 56 | 8.93 | VCPIP1 | ENSP00000309031:p.Gly1057Glu | Yes | 0 | 182 | 0.000626185005446896
13 | 23,910,175 | SNV | C | T | 5 | 57 | 8.77 | SACS | ENSP00000371729:p.Gly2614Arg | Yes | 0 | 147 | 0.00149414096748599
4 | 140,468,098 | SNV | C | T | 5 | 60 | 8.33 | SETD7 | ENSP00000427300:p.Gly49Glu | Yes | 0 | 67 | 0.0214824130461269
15 | 35,273,591 | SNV | C | T | 5 | 61 | 8.20 | ZNF770 | ENSP00000348673:p.Gly682Glu | Yes | 0 | 76 | 0.0159252501523542
17 | 37,565,419 | SNV | C | T | 5 | 64 | 7.81 | MED1 | ENSP00000300651:p.Gly1019Arg | Yes | 0 | 76 | 0.018285713057795
1 | 179,095,556 | SNV | C | T | 5 | 66 | 7.58 | ABL2 | ENSP00000427562:p.Gly215Arg | Yes | 0 | 69 | 0.025777100835717
2 | 219,529,098 | SNV | C | T | 5 | 68 | 7.35 | RNF25 | ENSP00000295704:p.Gly321Glu | Yes | 0 | 79 | 0.0195201998292625
4 | 187,557,906 | SNV | C | T | 5 | 68 | 7.35 | FAT1 | ENSP00000406229:p.Glu1269Lys | Yes | 0 | 169 | 0.00174550457677628
X | 67,742,699 | SNV | C | T | 5 | 69 | 7.25 | YIPF6 | ENSP00000417573:p.Pro178Ser | Yes | 0 | 74 | 0.0242049264822056
4 | 39,304,691 | SNV | C | T | 5 | 83 | 6.02 | RFC1 | ENSP00000371321:p.Gly732Arg | Yes | 0 | 132 | 0.00794774435001324
2 | 9,645,403 | SNV | C | T | 5 | 84 | 5.95 | ADAM17 | ENSP00000309968:p.Gly479Glu | Yes | 0 | 142 | 0.00656975877421133
5 | 43,299,016 | SNV | C | T | 5 | 85 | 5.88 | HMGCS1 | ENSP00000322706:p.Gly18Arg | Yes | 0 | 165 | 0.00419616037612641
10 | 104,853,028 | SNV | C | T | 5 | 90 | 5.56 | NT5C2 | ENSP00000339479:p.Gly343Arg | Yes | 0 | 145 | 0.00768056708153503
7 | 152,007,061 | SNV | C | T | 5 | 90 | 5.56 | KMT2C | ENSP00000453752:p.Gly280Glu | Yes | 0 | 226 | 0.00172785254115418
13 | 77,581,337 | SNV | C | T | 5 | 92 | 5.44 | FBXL3 | ENSP00000347834:p.Trp410* | Yes | 0 | 96 | 0.0265119978875076
16 | 75,448,509 | SNV | C | T | 5 | 92 | 5.44 | CFDP1, RP11-77K12.1 | ENSP00000457654:p.Gly107Glu | Yes | 0 | 112 | 0.0175485315174974
17 | 80,543,923 | SNV | C | T | 5 | 94 | 5.32 | FOXK2 | ENSP00000335677:p.His475Tyr | Yes | 0 | 152 | 0.00761675302767957
13 | 108,863,054 | SNV | C | T | 5 | 30 | 16.67 | LIG4 | ENSP00000349393:p.Arg188Gln | Yes | 0 | 118 | 0.000257841011265669
2 | 220,467,278 | SNV | C | T | 5 | 36 | 13.89 | STK11IP | ENSP00000295641:p.Thr187Ile | Yes | 0 | 56 | 0.00766600278080494
2 | 220,467,196 | SNV | C | T | 5 | 37 | 13.51 | STK11IP | ENSP00000295641:p.Leu160Phe | Yes | 0 | 40 | 0.0220620043258832
1 | 231,081,143 | SNV | C | T | 5 | 39 | 12.82 | TTC13 | ENSP00000355621:p.Gly191Arg | Yes | 0 | 55 | 0.0104890931335979
2 | 64,144,056 | SNV | C | T | 5 | 45 | 11.11 | VPS54 | ENSP00000272322:p.Gly736Arg | Yes | 0 | 114 | 0.00153729294019848
6 | 116,966,945 | SNV | C | T | 5 | 46 | 10.87 | ZUFSP | ENSP00000357565:p.Gly541Arg | Yes | 0 | 76 | 0.00661279945558258
10 | 73,912,705 | SNV | C | T | 5 | 47 | 10.64 | ASCC1 | ENSP00000339404:p.Gly251Glu | Yes | 0 | 66 | 0.0109282521598832
13 | 60,565,350 | SNV | C | T | 5 | 20 | 25 | DIAPH3 | ENSP00000383178:p.Gly435Arg | Yes | 0 | 78 | 0.00022829926004181
Table 5.

A-T variants observed

Chr | Region | Type | Ref | Allele | Count (RNA) | Coverage (RNA) | Frequency (RNA) [%] | Gene | AA change | Non-synonymous mutation | Count (DNA) | Coverage (DNA) | Fisher P-value
14 | 50,904,729 | SNV | A | T | 22 | 23 | 95.65 | MAP4K5 | ENSP00000013125:p.Val569Glu | Yes | 0 | 22 | 5.82989055086052e-12
12 | 54,740,397 | SNV | A | T | 8 | 9 | 88.89 | COPZ1 | ENSP00000449341:p.Thr107Ser | Yes | 0 | 47 | 6.3358236816299e-09
14 | 24,684,868 | SNV | A | T | 37 | 43 | 86.05 | MDP1, NEDD8-MDP1 | ENSP00000474249:p.Val63Glu | Yes | 0 | 75 | 1.03839186240207e-24
1 | 120,263,791 | SNV | A | T | 5 | 6 | 83.33 | PHGDH | ENSP00000358415:p.Gln12Leu | Yes | 0 | 89 | 1.0355447454656e-07
11 | 33,733,049 | SNV | A | T | 11 | 14 | 78.57 | CD59 | ENSP00000436737:p.Leu58Met | Yes | 0 | 13 | 3.39011780659378e-05
1 | 10,386,409 | SNV | A | T | 5 | 7 | 71.43 | KIF1B | ENSP00000366290:p.Leu972Phe | Yes | 0 | 51 | 4.58303543602999e-06
19 | 40,325,306 | SNV | A | T | 5 | 9 | 55.56 | FBL | ENSP00000472419:p.*230Arg | Yes | 0 | 82 | 2.70941766486129e-06
10 | 95,157,002 | SNV | A | T | 5 | 9 | 55.56 | MYOF | ENSP00000360544:p.*446Arg | Yes | 0 | 9 | 0.0294117647058823
2 | 70,528,734 | SNV | A | T | 6 | 11 | 54.55 | FAM136A, AC022201.5 | ENSP00000391468:p.Val30Glu | Yes | 0 | 144 | 2.64556738491617e-08
8 | 144,900,361 | SNV | A | T | 7 | 16 | 43.75 | PUF60 | ENSP00000432091:p.Val66Glu | Yes | 0 | 147 | 2.14908810100266e-08
9 | 75,769,255 | SNV | A | T | 17 | 52 | 32.69 | ANXA1 | ENSP00000412489:p.Arg6Trp | Yes | 0 | 54 | 1.12413403199044e-06
3 | 75,832,479 | SNV | A | T | 9 | 41 | 21.95 | ZNF717 | ENSP00000417902:p.Leu12Gln | Yes | 0 | 25 | 0.010718368892246
11 | 244,159 | SNV | A | T | 3 | 14 | 21.43 | PSMD13 | ENSP00000396937:p.Arg72Trp | Yes | 0 | 23 | 0.0468468468468469
12 | 58,221,366 | SNV | A | T | 23 | 111 | 20.72 | CTDSP2 | ENSP00000381148:p.Leu74Gln | Yes | 0 | 88 | 5.2180189878547e-07
9 | 78,790,200 | SNV | A | T | 11 | 56 | 19.64 | PCSK5 | ENSP00000365958:p.Glu685Asp | Yes | 0 | 27 | 0.0135143112551926
8 | 146,016,146 | SNV | A | T | 7 | 36 | 19.44 | RPL8 | ENSP00000433703:p.*168Arg | Yes | 0 | 74 | 0.000262325862231386
Using the totality of these RNA variants, we evaluated the dominant gene family groups that exhibit overlapping or distinct RNA variant expression across the four biological states (Supplementary Figure 1A). In general, for transcription gene products (‘Transcription, DNA-templated’ and ‘Regulation of Transcription’; Supplementary Figure 1A), the untreated wt-p53 cells exhibited a similar extent of RNA variants to the interferon-treated p53-null cells. Here we summarize the ‘DNA repair’ gene family, whose RNA variant levels are highest in the untreated p53-null cells (Supplementary Figure 1A). The individual gene products that exhibit elevated variant mRNA production are listed in Supplementary Figure 1B. Together, these data highlight the feasibility of using the CLC software to study changes in the RNA variant landscape. Below, we focus on validating two key RNA variants that represent different types of measurable variants: an RNA editing event in CDK13 and a non-canonical splicing event in MAP4K5.

Sanger-sequencing validation of the dominant RNA editing events in the CDK13 mRNA

The genome browser view of the CDK13 region that is edited at the RNA level is shown in Fig. 2(a). CDK13 shows A-to-I editing within exon 1 in A375 melanoma cells as well as in SiHa cervical carcinoma cells. These data, along with prior reports of CDK13 RNA editing [40,43], suggest that editing at this position is widespread. The dominant edit would change a glutamine to an arginine (Q to R) at codon 103 if the mRNA were translated. This high level of editing in the CDK13 transcript suggests a functional role for the predominant edited variant of the gene product [44]. We also detect a second A-G variant near codon 103, K96R, although it is supported by substantially fewer reads; the two edits occur on the same transcripts (i.e. these two RNA editing events do not appear to be mutually exclusive [40]) (Fig. 2(a)).
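Both edits discussed here, Q103R and K96R, are Gln- or Lys-to-Arg changes produced by a single A-to-G (A-to-I) substitution. The Python sketch below, which assumes the standard genetic code and does not use the actual CDK13 sequence (not given in this section), shows why such a change can only occur at the second codon position: Gln is encoded by CAA/CAG and Lys by AAA/AAG, and only the second-position A-to-G change yields an Arg codon (CGA/CGG or AGA/AGG). The helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def edit_codon(codon: str, position: int, new_base: str = "G") -> tuple[str, str, str]:
    """Replace the base at 1-based `position` (default G, i.e. A-to-I read as G)
    and return (edited codon, original amino acid, edited amino acid)."""
    edited = codon[:position - 1] + new_base + codon[position:]
    return edited, CODON_TABLE[codon], CODON_TABLE[edited]


def positions_giving(codon: str, target_aa: str, new_base: str = "G") -> list[int]:
    """1-based positions where changing an A to `new_base` yields `target_aa`."""
    return [p for p in (1, 2, 3)
            if codon[p - 1] == "A" and edit_codon(codon, p, new_base)[2] == target_aa]


for codon in ("CAA", "CAG", "AAA", "AAG"):
    print(codon, CODON_TABLE[codon], "-> R at positions", positions_giving(codon, "R"))
```

Each Gln and Lys codon reports position 2 only, which is why Q-to-R and K-to-R recoding are characteristic outcomes of A-to-I editing.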
Figure 2.

RNA variants that represent potential A-to-I editing events

Although CDK13 has previously been shown to be edited at codon 103 [40], we validated this target further by Sanger sequencing because it is the most abundant editing event. Because of the GC-rich nature of CDK13, nested PCR primers were first used to amplify the CDK13 fragment from cDNA libraries (Fig. 2(b)). Using these optimized primers, we could detect edited (I, read as G) and non-edited (A) species under a range of conditions, as defined by the ratio of G and A sequencing reads (Fig. 2(c)). We were unable to detect changes in the G/A ratio in this cell model in response to UV irradiation (Fig. 2(c) i), splicing inhibitor treatment (Fig. 2(c) ii), or culture of cells in serum-free medium (Fig. 2(c) iii). However, long-term incubation of cells with a control siRNA reduced the G/A ratio so that the non-edited sequence predominated after 4 days of incubation (Fig. 2(d) ii vs. Fig. 2(d) i). These latter data suggest that editing of CDK13 can be regulated by signalling changes. Future work will define the function of the edited version of the CDK13 gene product and how this editing event is regulated. Sanger validation of the minor RNA editing event relative to the major one is shown in Fig. 2(e-g): the K96R variant exhibits less editing than the Q103R variant based on the peak intensities of the G and A sequencing reads (Fig. 2(e) vs. 2(f)), consistent with the number of reads in the CLC browser (Fig. 2(a) and Supplementary Table 1A). There are seven other non-synonymous RNA editing events localized in this small exonic region, and their editing is variable (Fig. 2(g)). In principle, translation of these mRNAs carrying multiple editing events would yield single, double, triple, quadruple, and higher-order mutant CDK13 proteins. 
Because of this potentially large number of distinct mutant protein products, functional characterization of these multi-mutant proteins was not attempted in this study. Nevertheless, our data validate the utility of the software for identifying RNA editing events. CDK13 is an interesting target for future validation because the kinase can regulate splicing by controlling the phosphorylation status and activity of splicing factors [45]. Furthermore, CDK13 is localized in the nucleus, particularly in speckles, which are storage sites for splicing factors [46]. More recently, CDK13 depletion was shown to cause defects in RNA processing [47]. In recent years, CDK13 has been recognized as an oncogene with potent activity in various cancer types, affecting cell cycle regulation, proliferation, and chromosome stability [48]. In addition, pathway network analysis has linked upregulated CDK13 to p53 in pancreatic disease [49].
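The scale of the multi-mutant problem for the CDK13 exonic region can be made concrete with a simple count. This is a minimal sketch, assuming nine independently editable non-synonymous sites (Q103R, K96R, and the seven others in Fig. 2(g)) and assuming every combination could in principle be translated; it says nothing about which combinations actually occur. The function name is hypothetical.

```python
from math import comb


def edited_combinations(n_sites: int) -> dict:
    """Number of distinct edited transcripts carrying exactly k edits, for k = 1..n_sites.

    Treats each site as independently edited or unedited, so every non-empty
    subset of sites is a possible (not necessarily observed) edited isoform.
    """
    return {k: comb(n_sites, k) for k in range(1, n_sites + 1)}


# Assumption: Q103R and K96R plus seven further sites.
counts = edited_combinations(9)
total = sum(counts.values())  # equals 2**9 - 1
print(counts[1], counts[2], counts[3], counts[4], total)  # 9 36 84 126 511
```

Even under this simplified model there are 511 possible edited products, of which 126 carry four edits.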

Validation of the non-canonical RNA variant in MAP4K5 mRNA

The CLC RNA variant detection software identifies RNA species that deviate from the reference sequence. These include not only single-nucleotide variants typical of classic RNA editing (A-I or C-U) (Table 1(b)) but also MNVs and indels (Fig. 1(a); Supplementary Tables 1C-E). Focusing again on protein kinases and using Ensembl RefSeq v74 as the reference sequence, the A-T RNA variant in MAP4K5 mRNA maps to an exon-intron boundary (Fig. 3(c)) that is spliced at the downstream intron-exon boundary to create an in-frame single amino acid insertion. However, an updated version of the reference (Ensembl v91) maps the change to a non-canonical three-base exon (Fig. 3(d)). These data highlight the difficulty of mapping very small non-canonical RNA variants to a reference dataset. We think it more likely that the ‘A-T variant’ is not mapped correctly to the reference genome, and that these RNA reads instead derive from a non-canonical three-base exon that deviates from the genomically predicted valine codon at position 569 (Fig. 3(a,b)). In addition, it is unlikely that this three-base non-canonical exon is mapped to the correct intronic position (as reported in Fig. 3(d)), because we can identify several GAA (CTT) motifs across the 2,001 bp intron 23–24.
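The point about multiple candidate positions can be made quantitative. Under a simple model that assumes uniform, independent base composition (real introns are not uniform, so this is only an order-of-magnitude guide), any specific trinucleotide such as GAA is expected roughly 31 times on one strand of a 2,001 bp intron. A 3 bp exon sequence alone therefore cannot pinpoint its genomic origin. The helper names below are hypothetical.

```python
def count_motif(seq: str, motif: str) -> int:
    """Count possibly overlapping occurrences of motif in seq (case-insensitive)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def expected_count(seq_len: int, motif_len: int, alphabet: int = 4) -> float:
    """Expected occurrences of one specific k-mer on one strand of a random
    sequence with uniform, independent base composition (an assumption)."""
    return (seq_len - motif_len + 1) / alphabet ** motif_len


# Intron 23-24 length taken from the text.
print(round(expected_count(2001, 3), 1))          # 31.2  (1999 / 64)
print(count_motif("ttGAAcGAAGAAt", "GAA"))        # 3, on a toy sequence
```

Scanning the reverse-complement motif (CTT) in the same way would cover the other strand.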
Figure 3.

RNA variants in the MAP4K5 gene

Regardless of the mechanism generating the RNA variant in MAP4K5 mRNA, the genomic annotation of MAP4K5 at this position is not resolved in public data from different vertebrate species (Fig. 3(b)). Human MAP4K5 in Ensembl and NCBI has three annotated transcripts that differ at the codon carrying the RNA variant (Fig. 3(b) Human). One transcript has the reference sequence encoding valine (GTA). The second has the variant codon (GAA), which encodes glutamic acid. The third has this codon deleted. Chimpanzee (Pan) has the same three transcripts as human (Fig. 3(b) Chimpanzee). Mouse has the codon GCA, encoding alanine, in place of GTA in one transcript (Fig. 3(b) Mouse); the other two mouse transcripts resemble the second and third human transcripts, carrying the variant GAA codon and the deleted codon, respectively. Zebrafish (Danio) has only two transcripts: one with the variant GAA codon and one with the deleted codon (Fig. 3(b) Zebrafish). Thus, this RNA variant is conserved in many species, suggesting that selection pressures act on this specific variant in vertebrates. We next validated the RNA variant in MAP4K5 mRNA and asked whether we could find evidence that it is regulated by splicing. First, based on the Ensembl genomic reference, the reference cDNA sequence predicts a valine at position 569, encoded by a GTA at the exon 23-intron 23-24 boundary (Fig. 4(a,b)). If the V569 or E569 allele derives from alternative splicing of an exon encoding glutamate (E) or valine (V), the splicing pattern would be as predicted in Fig. 4(c). We next developed plasmids expressing the V569E and 569del MAP4K5 proteins. Sanger sequencing of the expression plasmid DNAs using the primers depicted in Fig. 4(a) yielded the SLSGKT amino acid stretch derived from the 569del allele (Fig. 4(d)) or the SLSEGKT stretch derived from the V569E allele (Fig. 4(e)). Having optimized primers for quantifying the RNA variant region, we purified RNA from a range of cells and tissues to determine whether these variants can be detected and quantified. Supplementary Figure 2A shows that the V569E allele dominates in RNA derived from muscle, adipose tissue, normal human fibroblasts (NHF), or A549 lung cancer cells. In addition, starvation or overconfluence of A549 lung cancer cells does not alter expression of the V569E allele (Supplementary Figure 2B). Thus, we see no evidence of the annotated valine, although it remains possible that other conditions or cell models incorporate a different 3 bp exon encoding valine in place of glutamate.
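The effect of including or skipping the 3 bp exon can be checked directly against the peptide stretches above. This is a minimal sketch: the flanking coding sequence is taken from the MAP4K5 mutagenesis (F) primer listed in the Methods, read in the frame that reproduces the LSGKT part of the reported SLSGKT stretch (the first serine codon lies outside the primer). The restricted codon table and all names are illustrative assumptions.

```python
# Minimal codon table covering only the codons used in this example.
CODONS = {"TTA": "L", "TCA": "S", "GGA": "G", "AAA": "K", "ACC": "T",
          "GAA": "E", "GTA": "V"}

# In-frame coding-strand context of the 569del allele, from the
# mutagenesis (F) primer: TTA TCA | GGA AAA ACC
LEFT, RIGHT = "TTATCA", "GGAAAAACC"


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence with the minimal table above."""
    assert len(cds) % 3 == 0, "sequence must be a whole number of codons"
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, len(cds), 3))


def isoform(exon: str) -> str:
    """Peptide obtained when a 0 or 3 nt micro-exon is spliced between the flanks."""
    return translate(LEFT + exon + RIGHT)


for label, exon in [("569del", ""), ("V569E", "GAA"), ("reference V569", "GTA")]:
    print(label, isoform(exon))
# 569del LSGKT
# V569E LSEGKT
# reference V569 LSVGKT
```

Because the inserted unit is exactly one codon, inclusion or skipping keeps the downstream frame intact, which is why both isoforms encode full-length kinases.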
Figure 4.

Developing Sanger sequencing of two MAP4K5 isoforms

We next asked whether we could drive production of the 569del mRNA species by inhibiting RNA splicing. We used the SF3B1-specific inhibitor Pladienolide B (Fig. 5 and Supplementary Figure 3) [50]. At 24 h, the lower dose of Pladienolide B marginally increased the amount of the 569del allele (Supplementary Figure 3C vs. 3B and 3A). By 48 h, however, the lower dose of inhibitor (10 nM) produced a substantial increase in the 569del allele and a reduction in the V569E allele (Supplementary Figure 3F vs. 3E and 3D). The higher dose of Pladienolide B (100 nM) did not inhibit splicing as effectively as the lower dose. Quantification of the Sanger sequencing showed that the ratio of non-spliced:V569E-spliced mRNA was 0.32 in untreated conditions but 4.06 in the presence of Pladienolide B (10 nM; Fig. 5(a)). A second splicing inhibitor that also targets SF3B1, Herboxidiene [50], was titrated to determine its effects on the splicing ratio. This inhibitor also increased the level of the 569del allele, which lacks the 3 bp glutamate-encoding exon (Supplementary Figure 4(C,F) vs. 4A and 4D). Quantification of the Sanger sequencing bases showed that the ratio of non-spliced:V569E-spliced mRNA was 0.33 in untreated conditions but 0.67 in the presence of Herboxidiene (10 nM; Fig. 5(b)). These data confirm that the RNA variant arises from splicing of the 3 bp exon encoding glutamate, although further research will be required to map the location of this exon and to determine whether cells might use a valine-encoding 3 bp exon under different physiological conditions.
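The reported ratios can be re-expressed as the fraction of 569del transcript and as a fold change. This is a worked calculation on the numbers given in the text, assuming the 569del and V569E species are the only two contributing to the Sanger signal; the function name is hypothetical.

```python
def fraction_from_ratio(ratio: float) -> float:
    """Convert a non-spliced:spliced ratio r into the non-spliced fraction r / (1 + r).

    Assumes the two alleles are the only species contributing to the signal.
    """
    return ratio / (1.0 + ratio)


# Ratios reported in the text: (untreated, treated).
reported = {
    "Pladienolide B (10 nM)": (0.32, 4.06),
    "Herboxidiene (10 nM)": (0.33, 0.67),
}

for drug, (untreated, treated) in reported.items():
    fold = treated / untreated
    print(f"{drug}: 569del fraction {fraction_from_ratio(untreated):.0%} -> "
          f"{fraction_from_ratio(treated):.0%}, ratio fold change {fold:.1f}x")
```

On this reading, Pladienolide B shifts the 569del share from about 24% to about 80% (a 12.7-fold change in the ratio), whereas Herboxidiene shifts it from about 25% to about 40% (roughly 2-fold).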
Figure 5.

MAP4K5 Sanger sequencing results quantitation after Pladienolide B and Herboxidiene treatment

Having established that the 569del and V569E alleles are the two dominant isoforms detected at the RNA level, we asked whether these two variants differ in clonogenic activity and/or interaction partners. Because the small exon variant is highly conserved within vertebrates, we would expect some shift in the equilibrium of the interactomes of these two proteins. First, the 569del and V569E alleles were subcloned into pEXPR-IBA105, which contains an SBP tag for affinity purification (Fig. 6(a)). When these two plasmids were transfected into wt-p53 and p53-null melanoma cell lines, expression of both could be quantified (Fig. 6(b)), and both were expressed at equivalent levels in p53-null cells. These data suggest that the variations do not dramatically alter steady-state protein levels. Upon transfection and dilution to a limited cell number to measure clonogenicity, there were two notable observations. First, both alleles induced growth suppression rather than growth stimulation, suggesting that they function more like tumour suppressors than oncogenes. Second, the two alleles differed modestly in growth-suppressive activity, with V569E suppressing growth by 19% and 569del by 33% relative to the control vector (Fig. 6(c)). Finally, interactomic immunoprecipitation (IP) identified shared protein–protein interactions but also differences in the quantitative capture of certain protein targets (Fig. 6(d)). The aggregate of all significant interacting proteins (above log 0.5) that form a network in STRING is shown in Fig. 6(e). Based on these data, a core function of both MAP4K5 isoforms may involve interactions with protein disulphide isomerase family members and peroxiredoxin (Fig. 6(d), green upper right quadrant). 
Peroxiredoxin can regulate oxidative stress responses, including protein folding [51]. Related to this, RAD23 was also detected in the pull-down experiments; this protein has been shown to function in protein degradation during ERAD (endoplasmic reticulum-associated degradation) quality control [52] through interactions with ubiquitin. Based on this core function and the differential quantitative detection of common interacting proteins, we speculate that the 569del form of MAP4K5 interacts more strongly with the ubiquitin regulator UBA1, the protein disulphide isomerase PDIA6, and the peroxiredoxin PRDX3 (Fig. 6(d), blue quadrant). By contrast, the V569E isoform shows stronger interactions with secretory or barrier function proteins such as filaggrin, XP32, and PRB2 (Fig. 6(d), red quadrants). Together, these data suggest that inclusion of the 3 bp exon shifts the equilibrium of the kinase towards distinct protein networks, and its conservation from fish to mammals suggests that this small non-canonical exon plays an important role in MAP4K5 protein function.
Figure 6.

Biochemical evaluation of MAP4K5 variants

Discussion

Cancer genome sequencing has revolutionized our understanding of the genetic basis of cancer, the classes of mutagenic events that drive cancer development, and the identification of genetic drivers [53,54]. However, we do not yet know how this mutated code is translated into an expressed phenotype. Characterizing the expressed cancer genome using RNAseq and protein quantitation methods such as mass spectrometry provides a more accurate view of the state of cancer tissue at the time of clinical presentation. Although mass spectrometry software such as Proteome Discoverer or MaxQuant [55,56] provides a coding-independent user interface that broadens the impact of mass spectrometry, most next-generation sequencing analysis using DNA variant detectors such as Varscan or Mutect requires computational coding [57,58]. We used an integrated DNA and RNA variant detection software tool (CLC) that, like the Proteome Discoverer software used by mass spectrometrists, does not require computational coding [59]. This democratizes DNA and RNA variant detection for the life-science community, from the plant sciences through to human disease [60-65]. In our prior study, we benchmarked CLC and Varscan2 as two independent variant detection platforms to define the overlap in their mutation detection and their dual utility in creating a mutant genomic reference database for optimizing mutant peptide detection by mass spectrometry [66]. That comparison confirmed the similarity of Varscan and CLC at DNA variant identification in a cancer cell model, and confirmed mRNA SNVs at the proteome level by using mass spectrometry to identify mutated peptides [66]. Mass spectrometry has also been used previously to identify peptides derived from RNA editing events [21]. 
Building on data obtained by the plant science community on RNA editing using the CLC tools, we have now applied this RNA variant detection platform to our human melanoma cell model to determine whether the RNA variant landscape can be defined. The output can be exported as the number of synonymous or non-synonymous RNA reads that deviate from or match a reference DNA read set (Supplementary Table 1A and Tables 3 and 4). RNA sequence deviations from the reference genome can be further annotated as SNVs, MNVs, indels, and replacements (Supplementary Tables 1B-E). In addition, the overall SNV landscape can be stratified into A-G/G-A or C-T/T-C changes (Table 1(b)). In general, the numbers of A-G and C-T changes in RNA are within an order of magnitude of each other. Because most RNA editing software focuses on a single type of RNA editing (such as A-I), it is not clear whether the C-U or A-I RNA editing landscape defined by CLC is representative of other platforms or other cell models. For example, even studies in which APOBEC-dependent RNA editing is amplified [67,68] do not usually compare the ratio of ADAR to APOBEC events. In the current study, we first validated one of the A-I RNA editing events we detected that had been reported previously. CDK13 mRNA over-editing has been reported in liver cancer [40] using RNAEditor [69]. That study identified two RNA editing events that could give rise to the Q103R and K96R mutations in CDK13. We also observe both A-I RNA editing events, with G sequencing reads detected at codons 103 and 96 (blue and red lines, Fig. 2(a i)), although the Q103R variant predominates (Fig. 2(a)). In the SiHa cell model, RNA editing was detected only at codon 103 (Fig. 2(a)). 
Notably, the CDK paralogues CDK10, CDK11, and CDK12, which share GC-rich regions similar to those of CDK13, also display a degree of A-I editing in their transcripts (Table 2: CDK10 (Gln46Arg), CDK11 (Cys23Arg), and CDK12 (Gly1425Arg)). In future it will be interesting to examine the editing regulation of these CDK paralogues, to determine whether mutant proteins are produced as a result of the edits, and to define how the function of the edited gene products might change. In addition to classic A-I or C-U RNA edits, the CLC software also defines RNA variations in a cDNA sequence that does not match the reference genome. Genes such as MAP4K5 or DIAPH2 (Supplementary Table 1C) have small in-frame exons (3–6 bp) that represent ‘unmapped’ or non-canonical exons, which are difficult to annotate within large introns. We focus here on validating one representative with a small in-frame exon, MAP4K5. The genomic sequence of MAP4K5 predicts a GTA sequence at the exon 23-intron 23-24 boundary that encodes a valine at codon 569 (Figs. 3 and 4). However, the RNA sequencing data contain a variant, as defined by the CLC software (Fig. 3), that replaces GTA with GAA, leading to a glutamate at codon 569. The genome sequence cannot accommodate this unless an A-T edit is annotated (Fig. 3(c)) or a downstream GAA exon (CTT on the reverse strand) is annotated within the intron (Fig. 3(d)). Splicing inhibitors reduce the GAA-containing species (Fig. 5 and Supplementary Figures 3 and 4), suggesting that the deviation comes from a 3 bp exon within the intron. However, because a 1 bp exon has been reported previously [70], we cannot be certain whether the GAA arises from a 3 bp exon within intron 23–24, or from a 1 bp exon fused to the terminal end of exon 23 together with deletion of a T nucleotide at the beginning of exon 24. Nevertheless, so-called microexons as short as 3 bp have been detected previously [71]. 
In conclusion, the integrated DNA and RNA variant detection software described in our study can open the door to more routine analysis of these splicing phenomena by the life-science community and support future analysis of RNA variant detection in cancer tissue.

Methods

Cells and reagents

All chemicals were obtained from Sigma Aldrich unless otherwise indicated. A375 cells have been reported previously [72]. The p53-specific gRNA sequence was 5ʹ-CTGAGCAGCGCTCATGGTGGNGG-3ʹ and was used to develop the isogenic p53-null cell line as reported previously [73]. A549 cells have been described previously [74]. The cell line from ATCC was cultured in DMEM (Gibco) + 10% FBS (Gibco). Cells were split every 2 days, using 0.05% Trypsin-EDTA (Gibco) to detach them. The GI50 for cell growth inhibition for the splicing inhibitor Herboxidiene (Focus Biomolecules) [50,75,76]. The mouse monoclonal anti-SBP-tag antibody was from IBA. The HRP-conjugated anti-mouse (PO260) and HRP-conjugated anti-rabbit (PO217) antibodies were from Dako.
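The gRNA target above is written as a 20 nt protospacer followed by an NGG protospacer-adjacent motif. A small layout check is sketched below; the assumption that Streptococcus pyogenes Cas9 (which recognizes NGG) was used is ours, inferred from the NGG suffix, and the function names are hypothetical.

```python
import re


def split_spcas9_target(target: str, protospacer_len: int = 20) -> tuple:
    """Split a target written as protospacer + PAM and check for an NGG PAM."""
    target = target.upper()
    protospacer, pam = target[:protospacer_len], target[protospacer_len:]
    if not re.fullmatch(r"[ACGTN]GG", pam):
        raise ValueError(f"expected an NGG PAM after {protospacer_len} nt, got {pam!r}")
    return protospacer, pam


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


spacer, pam = split_spcas9_target("CTGAGCAGCGCTCATGGTGGNGG")
print(spacer, pam, gc_fraction(spacer))  # CTGAGCAGCGCTCATGGTGG NGG 0.65
```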

DNA and RNA sequencing

Exome sequencing of DNA derived from A375 cells was performed using the Agilent V5+ UTR Exome Capture Kit (75 Mb), and 100 bp paired-end reads were acquired at 100x coverage (performed by Otogenetics, USA). The paired fastq files from A375 exome sequencing (available upon request) were imported into the CLC Biomedical Genomics Workbench. Adaptor sequences and low-quality bases were trimmed, and DNA sequencing reads were mapped to the human reference genome hg19. Variants were detected in the exome data with the CLC Probabilistic Variant Caller using the following parameters: minimum coverage (number of reads) = 5; minimum frequency = 5%; minimum number of variants = 2; variants in normal germline DNA = 0, with coverage in the germline DNA of at least 5 reads at the variant site. RNA sequencing was performed on the wt-p53 and p53-null A375 cell panel, untreated or treated with IFNγ (1 ng/ml for 24 hours), to generate biological replicates referenced to the common genomic DNA file from the parental A375 cell line. Total RNA was depleted of ribosomal RNA and reverse transcribed by random priming to generate cDNA. From this template, approximately 20 million paired-end reads were generated on an Illumina HiSeq2500. Paired fastq files (available upon request) from the RNAseq reads were imported into the CLC Biomedical Genomics Workbench, and the RNA sequencing reads were mapped to the human reference genome hg19. Paired, de-multiplexed fastq files from the RNAseq libraries were trimmed of adapter sequences and joined into a single read, followed by quality trimming using commands from the CLC Assembly Cell. The fastq files were then imported into the CLC Biomedical Genomics Workbench (version 2.5). 
Sequences were mapped to the A375 cancer genome sequence where at least 2 mutant RNA reads were identified that do not match the reference genomic DNA.
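The calling thresholds listed above can be read as a simple per-site filter. The sketch below re-expresses those parameters in Python, assuming that "minimum number of variants" means the minimum number of reads supporting the variant allele. It does not reproduce the probabilistic model of the CLC Probabilistic Variant Caller, and the `Candidate` record and `passes_filters` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-site summary; not the CLC Workbench export format."""
    coverage: int                # reads covering the site in the sample
    variant_reads: int           # reads supporting the alternative allele
    germline_coverage: int       # reads covering the site in germline DNA
    germline_variant_reads: int  # variant-supporting reads in germline DNA


def passes_filters(c: Candidate, min_cov: int = 5, min_freq: float = 0.05,
                   min_var_reads: int = 2, min_germline_cov: int = 5,
                   max_germline_var_reads: int = 0) -> bool:
    """Apply thresholds mirroring the parameters listed in the text."""
    if c.coverage < min_cov or c.variant_reads < min_var_reads:
        return False
    if c.variant_reads / c.coverage < min_freq:
        return False
    # The site must be adequately covered, and variant-free, in germline DNA.
    return (c.germline_coverage >= min_germline_cov
            and c.germline_variant_reads <= max_germline_var_reads)


print(passes_filters(Candidate(40, 4, 30, 0)))   # True
print(passes_filters(Candidate(100, 3, 30, 0)))  # False: 3% allele frequency
```

The same `min_var_reads=2` default matches the requirement of at least 2 mutant RNA reads used for the RNA data.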

MAP4K5 immunoprecipitation and SWATH-MS

A plasmid containing the MAP4K5-V569E gene was acquired from Addgene (Addgene plasmid #23611), and the gene was cloned into the pEXPR-IBA105 vector containing a Streptavidin Binding Peptide (SBP) tag. The MAP4K5-V569E expression plasmid was subsequently mutagenized using the DpnI method [77] to delete 3 base pairs at codon 569, yielding the MAP4K5-569del form (Fig. 6(a)). The primers were: MAP4K5 cloning (F, with EcoRI restriction site): 5ʹ-GTCCCGAATTCGATGGAGGCCCCGCTG-3ʹ; MAP4K5 cloning (R, with BamHI restriction site): 5ʹ-CCCGGGGATCCCTTAGTAACTATTTTCATGTCCAGCCAAGAT-3ʹ; MAP4K5 mutagenesis (F): 5ʹ-ATTATCAGGAAAAACCTTTCAGC-3ʹ; and MAP4K5 mutagenesis (R): 5ʹ-AGCTGAAAGGTTTTTCCTGATAAT-3ʹ. The MAP4K5 isoforms (Fig. 6(a-b)) were transfected into A375 cells (as in Fig. 6(b)), and immunoprecipitation (IP) was carried out with the MAP4K5 expression vectors and the SBP empty vector (pEXPR-IBA105). Cells were transfected at about 70–80% confluency using Attractene, harvested the next day, and lysed with Triton lysis buffer (100 mM KCl, 20 mM HEPES pH 7.5, 1 mM EDTA, 1 mM EGTA, 0.5 mM Na3VO4, 10% glycerol, 0.5X Protease Inhibitor Mix, 10 mM NaF, 0.1% Triton X-100). 30 μl of streptavidin agarose conjugate beads (Millipore) were washed three times with 500 μl of PBS, and lysate was added to the beads and incubated on a rotor wheel for 2 hours at room temperature. After incubation, the sample was washed once with 500 μl of Triton lysis buffer (without Triton X-100) and twice with 500 μl of PBS. Finally, the sample was eluted in 120 μl of elution buffer (8 M urea, 2 mM DTT, and 20 mM HEPES pH 8) by incubating at 85 °C for 5 min. The eluates were processed by FASP [78] to obtain tryptic peptides. FASP-processed tryptic peptides from the streptavidin bead pull-down were separated on an Eksigent Ekspert nanoLC 400 (SCIEX, California, USA) coupled online to a TripleTOF 5600+ (SCIEX, Toronto, Canada) mass spectrometer. 
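The primer sequences can be sanity-checked in a few lines: each cloning primer should carry its stated restriction site, and the two mutagenesis primers that create the 3 bp deletion should be reverse complements over their shared length. The sketch assumes plain uppercase DNA strings; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# EcoRI and BamHI recognition sites (both palindromic, so one strand suffices).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

primers = {
    "cloning_F": "GTCCCGAATTCGATGGAGGCCCCGCTG",
    "cloning_R": "CCCGGGGATCCCTTAGTAACTATTTTCATGTCCAGCCAAGAT",
    "mut_F": "ATTATCAGGAAAAACCTTTCAGC",
    "mut_R": "AGCTGAAAGGTTTTTCCTGATAAT",
}


def find_sites(seq: str) -> dict:
    """0-based position of the first occurrence of each site (-1 if absent)."""
    return {enzyme: seq.find(site) for enzyme, site in SITES.items()}


print(find_sites(primers["cloning_F"]))  # {'EcoRI': 5, 'BamHI': -1}
print(find_sites(primers["cloning_R"]))  # {'EcoRI': -1, 'BamHI': 5}

# The deletion primers should anneal to each other across their shared 23 nt.
mut_overlap_ok = revcomp(primers["mut_R"]).startswith(primers["mut_F"])
print(mut_overlap_ok)  # True
```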
Data acquisition was performed in technical triplicate. A cartridge trap column (300 μm i.d. × 5 mm) packed with C18 PepMap100 sorbent of 5 μm particle size (Thermo Fisher Scientific, Waltham, MA, USA) was used to concentrate and wash peptides. Peptides were washed in 0.05% trifluoroacetic acid in 5% acetonitrile and 95% water for 10 minutes. Peptides were then separated using an acetonitrile/water gradient (300 nL/minute) on a PicoFrit® analytical capillary emitter column (75 μm × 210 mm; New Objective, Massachusetts, USA) self-packed with ProntoSIL 120-3-C18 AQ sorbent with 3 µm particles (Bischoff, Leonberg, Germany). The analytical gradient was formed from mobile phase A, composed of 0.1% (v/v) formic acid in water, and mobile phase B, composed of 0.1% (v/v) formic acid in acetonitrile. Gradient elution started at 5% mobile phase B for the first 30 minutes, after which the proportion of mobile phase B increased linearly to 40% over the following 120 minutes. Output from the separation column was directly coupled to a nano-electrospray ion source. The spectral library sample was prepared by pooling equal volumes (10 µl) of all samples, followed by data-dependent shotgun measurement in positive mode (IDA). The precursor range in MS was set from m/z 400 to m/z 1250, while MS/MS spectra were acquired from m/z 200 to m/z 1600. The IDA method acquired and fragmented the 20 most intense precursor ions in each cycle. The cycle time was 2.3 seconds. Measured precursors were then excluded for 12 seconds. Protein identification was performed using the Protein Pilot 4.5 (SCIEX, Toronto, Canada) search engine. Acquired MS and MS/MS spectra were searched against the Uniprot+Swissprot database (02. 2016, 69,987 entries) restricted to Homo sapiens taxonomy. Cysteine alkylation with iodoacetamide was specified as a fixed modification, and trypsin as the digestion enzyme. A decoy database was generated to perform FDR analysis. 
A spectral library was generated in Peakview 1.2.0.3 (SCIEX, Toronto, Canada) from the IDA search files, importing only proteins with FDR below 1%. SWATH measurements were operated in positive high-sensitivity mode. Precursor ions from m/z 400 to m/z 1200 were measured in a windowed manner: the precursor mass range was divided into 67 precursor windows of 12 Da width with 1 Da overlap. An accumulation time of 50 ms was set for each SWATH precursor window, resulting in a 3.0 second cycle time. The product ion mass range was set from m/z 400 to m/z 1600. SWATH data extraction was performed in Peakview 1.2.0.3 (SCIEX, Toronto, Canada) with the spectral library. The retention time window for extraction was manually set to 10 minutes. Up to 4 peptides and 6 product ions per peptide were used to quantify each protein. Only non-modified, high-confidence peptides (peptide confidence > 99%) were used for quantitation. Protein summed peak areas were determined from the sum of the corresponding transition peak areas. Normalization was performed using the total area sums option in MarkerView 1.2.1.1 (SCIEX, Toronto, Canada). Extracted quantitative data from three technical replicates were statistically evaluated in MarkerView 1.2.1.1 (AB-SCIEX, Canada). A pairwise t-test was performed to determine protein fold changes and the P values of the fold changes for all proteins listed in the spectral library.
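The downstream statistics can be sketched in plain Python. This is an illustration under stated assumptions, not MarkerView's implementation: it scales each sample to a common total protein area (a simplified reading of "total area sums"), then reports a log2 fold change and a t statistic per protein. Welch's unequal-variance t statistic is our substitution, since the text specifies only a pairwise t-test; a p-value would follow from the t distribution (for example via `scipy.stats.ttest_ind` with `equal_var=False`). Protein names and areas are invented.

```python
import math
from statistics import mean, variance


def total_area_normalize(samples):
    """Scale each sample (dict protein -> area) so its summed area equals the
    mean total across all samples."""
    totals = [sum(s.values()) for s in samples]
    target = mean(totals)
    return [{p: a * target / t for p, a in s.items()} for s, t in zip(samples, totals)]


def welch_t(a, b):
    """Welch's t statistic for two groups of replicate measurements."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def log2_fold_change(a, b):
    """log2 of the ratio of group means."""
    return math.log2(mean(a) / mean(b))


# Invented areas: three bait and three empty-vector technical replicates.
bait = [{"protein_A": 300.0, "protein_B": 100.0},
        {"protein_A": 320.0, "protein_B": 115.0},
        {"protein_A": 280.0, "protein_B": 95.0}]
empty = [{"protein_A": 100.0, "protein_B": 100.0},
         {"protein_A": 110.0, "protein_B": 105.0},
         {"protein_A": 95.0, "protein_B": 98.0}]

norm = total_area_normalize(bait + empty)
for protein in ("protein_A", "protein_B"):
    a = [s[protein] for s in norm[:3]]
    b = [s[protein] for s in norm[3:]]
    print(protein, round(log2_fold_change(a, b), 2), round(welch_t(a, b), 2))
```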

Pladienolide B and Herboxidiene treatment

A549 cells were treated with Pladienolide B (10 or 100 nM) or Herboxidiene (5 or 10 nM) in complete medium for 24 and 48 hours (Pladienolide B) or for 48 and 72 hours (Herboxidiene). Control samples contained DMSO alone. After treatment, cells were harvested and total RNA was isolated using the Universal RNA Purification kit (EURex, cat. no. E3598) according to the manufacturer’s protocol. RNA concentration was measured using a NanoReady (Life Real) device. Reverse transcription was performed in a 20 µl reaction volume using the High Capacity cDNA Reverse Transcription Kit (REF 4368814, Thermo Fisher Scientific) according to the manufacturer’s protocol, with 500 ng of RNA per reaction.
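For the reverse transcription step, the RNA volume needed for a fixed 500 ng input follows from the measured concentration. This is a minimal sketch, assuming the 20 µl reaction is assembled as 10 µl of 2X RT master mix plus 10 µl of RNA template topped up with nuclease-free water (check the kit protocol before relying on this layout). The function name is hypothetical.

```python
def rt_template_volumes(rna_ng_per_ul: float, input_ng: float = 500.0,
                        template_ul: float = 10.0) -> tuple:
    """Return (RNA volume, nuclease-free water volume) in µl for one RT reaction.

    Assumes the RNA plus water must fill template_ul of the 20 µl reaction.
    """
    rna_ul = input_ng / rna_ng_per_ul
    if rna_ul > template_ul:
        raise ValueError("RNA too dilute for the requested input in this volume")
    return rna_ul, template_ul - rna_ul


print(rt_template_volumes(250.0))  # (2.0, 8.0)
```

Under these assumptions, samples below 50 ng/µl cannot supply 500 ng within the template volume and raise an error instead of silently under-loading.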

PCR amplification and purification

Amplification of the MAP4K5 fragment was performed using the Phusion High-Fidelity DNA Polymerase kit (Thermo Fisher Scientific) with 100 ng cDNA as template and the following primers: M4K5e23F: TCCACGGAAGTGTACTTGGC; M4K5e26R: TCCAGACTGTAAAGCTCCACA. The thermal cycler program for MAP4K5 amplification was: 1. Denaturation: 95°C, 3 min; 2. Denaturation: 98°C, 20 sec; 3. Annealing: 60°C, 15 sec; 4. Elongation: 72°C, 20 sec; steps 2–4 were repeated 29 times; 5. Elongation: 72°C, 2 min; and 6. Hold: 4°C, indefinitely. Amplification of the CDK13 fragment was performed using the Phusion MM in HF buffer kit (Thermo Fisher Scientific) with 100 ng cDNA as template and the following primers for the first PCR: CDK13 F3.3: GAGATGGCCAGGATCTGAC; CDK13 R1: GTGGAATACGAGGATGTGAGC. The template for the second PCR was 100 ng of first-PCR product purified with the QIAquick PCR Purification Kit (QIAGEN), and the primers were CDK13 F1: CTGCTCTTCCTGGCTGCTC and CDK13 R2: CAGGAGGCGGAGAAGCGTC. The thermal cycler programs were as follows. First PCR: 1. Denaturation: 98°C, 2 min; 2. Denaturation: 98°C, 45 sec; 3. Annealing: 63°C, 30 sec; 4. Elongation: 72°C, 30 sec; steps 2–4 were repeated 10 times; 5. Elongation: 72°C, 10 min; and 6. Hold: 4°C, indefinitely. Second PCR: 1. Denaturation: 98°C, 2 min; 2. Denaturation: 98°C, 45 sec; 3. Annealing: 61°C, 30 sec; 4. Elongation: 72°C, 30 sec; steps 2–4 were repeated 25 times; 5. Elongation: 72°C, 10 min; and 6. Hold: 4°C, indefinitely. PCR products were visualized on a 1.5% agarose gel alongside the 1kb Gene Ruler (Thermo Fisher Scientific). PCR products were purified using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel), and their concentration was measured on a Nanodrop or NanoReady device. Purified PCR products were sent to Eurofins for sequencing. The results (chromatograms) were visualized in Chromas v2.6.6. 
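The thermal cycling programs above can be written as data, which makes the cycle counts and hold times explicit. The sketch below, with hypothetical names, computes the time spent at set temperatures. It assumes "steps 2–4 were repeated N times" means N cycles in total, and it ignores ramp times and the final 4°C hold.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def hold_time_s(initial, cycle, n_cycles, final):
    """Seconds spent at set temperatures; ramping and the 4°C hold are ignored."""
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


# MAP4K5 program; n_cycles=29 assumes "repeated 29 times" means 29 cycles in total.
map4k5_s = hold_time_s(
    initial=[Step("initial denaturation", 95, 180)],
    cycle=[Step("denaturation", 98, 20), Step("annealing", 60, 15),
           Step("elongation", 72, 20)],
    n_cycles=29,
    final=[Step("final elongation", 72, 120)],
)


def cdk13_round(annealing_c, n_cycles):
    """One round of the nested CDK13 PCR; only annealing temperature and cycles differ."""
    return hold_time_s(
        initial=[Step("initial denaturation", 98, 120)],
        cycle=[Step("denaturation", 98, 45), Step("annealing", annealing_c, 30),
               Step("elongation", 72, 30)],
        n_cycles=n_cycles,
        final=[Step("final elongation", 72, 600)],
    )


first_s, second_s = cdk13_round(63, 10), cdk13_round(61, 25)
print(map4k5_s, first_s, second_s)  # 1895 1770 3345
```

This gives roughly 32 min for MAP4K5 and 30 plus 56 min for the two nested CDK13 rounds, before instrument ramping is added.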