Ernesto Picardi, Angela Gallo, Federica Galeano, Sara Tomaselli, Graziano Pesole.
Abstract
RNA editing is a post-transcriptional process occurring in a wide range of organisms. In the human brain, A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. Because it modulates gene expression, RNA editing is essential for cellular homeostasis; indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next-generation sequencing technologies that combine massive transcriptome sequencing with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widely available in public databases and represent a relevant source of as yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-step mapping procedure that requires only RNA-Seq data, with neither a priori knowledge of RNA editing characteristics nor genomic reads. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and by performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual.
Year: 2012 PMID: 22957051 PMCID: PMC3434223 DOI: 10.1371/journal.pone.0044184
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Working hypothesis and computational strategy.
Overview of the working hypothesis and computational strategy adopted to detect significant A-to-G substitutions in an RNA-Seq experiment.
Figure 2. Graphical overview of the mapping strategy.
Short reads are mapped onto the assembled transcriptome, comprising more than 370,000 variants from RefSeq, UCSC, ASPicDB and Ensembl, using the GSNAP tool. Aligned reads are then mapped onto the complete genome. Finally, transcriptome and genome alignments are compared, providing a SAM file of unique and concordant mappings.
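The final comparison step can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that both alignment passes have already been parsed and that transcriptome hits have been projected to genome coordinates. The `Hit` tuple, the `unique_concordant` function, and the toy reads are hypothetical names and values introduced only for illustration.

```python
from typing import Dict, List, Tuple

# One alignment in genome coordinates: (chromosome, 1-based start, strand).
# Hypothetical representation; a real pipeline would parse SAM records.
Hit = Tuple[str, int, str]


def unique_concordant(
    transcriptome_hits: Dict[str, List[Hit]],
    genome_hits: Dict[str, List[Hit]],
) -> Dict[str, Hit]:
    """Keep reads with a single locus in each pass and the same locus in both."""
    kept: Dict[str, Hit] = {}
    for read_id, t_hits in transcriptome_hits.items():
        g_hits = genome_hits.get(read_id, [])
        # A read can hit several isoforms of one gene; once projected to the
        # genome those hits collapse to one locus, so deduplicate first.
        t_loci = set(t_hits)
        g_loci = set(g_hits)
        if len(t_loci) == 1 and len(g_loci) == 1 and t_loci == g_loci:
            kept[read_id] = next(iter(t_loci))
    return kept


# Toy input (made-up reads, for illustration only).
transcriptome_hits = {
    "read1": [("chr4", 1000, "+"), ("chr4", 1000, "+")],  # two isoforms, one locus
    "read2": [("chr1", 500, "-")],
    "read3": [("chr2", 42, "+"), ("chr7", 99, "+")],      # multi-mapping read
}
genome_hits = {
    "read1": [("chr4", 1000, "+")],
    "read2": [("chr1", 900, "-")],                        # discordant locus
    "read3": [("chr2", 42, "+")],
}

result = unique_concordant(transcriptome_hits, genome_hits)
print(result)
```

Only `read1` survives in this toy case: `read2` maps to different loci in the two passes, and `read3` is not unique on the transcriptome.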
List of significant A-to-G changes detected in the SRP002274 study.
| Position | Gene | Ref | ST | CC | AAC | CodP | CovR | BCR [A, C, G, T] | BCR-F [A, C, G, T] | %Editing | Pvalue | FDR |
| chr4:57670991 | IGFBP7 | T | TC | AAG → AGG | K → R | 2 | 263 | [0, 167, 0, 96] | [0, 189, 0, 173] | 63.5 | 2.62E-66 | 1.41E-63 |
| chr5:156669386 | CYFIP2 | A | AG | AAG → GAG | K → E | 1 | 153 | [63, 0, 90, 0] | [66, 0, 113, 0] | 58.82 | 2.75E-34 | 7.40E-32 |
| chr1:116739332 | ATP1A1 | A | AG | ATC → GTC | I → V | 1 | 478 | [414, 0, 64, 0] | [486, 0, 94, 1] | 13.39 | 2.00E-19 | 3.58E-17 |
| chr4:158500744 | GRIA2 | A | AG | AGA → GGA | R → G | 1 | 75 | [33, 0, 42, 0] | [36, 0, 50, 0] | 56 | 2.00E-15 | 2.69E-13 |
| chr19:59638596 | TTYH1 | A | AG | CTA → CTG | L → L | 3 | 228 | [180, 0, 48, 0] | [245, 2, 66, 0] | 21.05 | 6.21E-15 | 6.69E-13 |
| chr1:224041237 | SRP9 | A | AG | ATA → ATG | I → M | 3 | 57 | [26, 0, 31, 0] | [27, 0, 41, 0] | 54.39 | 3.47E-11 | 3.12E-09 |
| chr4:158477325 | GRIA2 | A | AG | CAG → CGG | Q → R | 2 | 22 | [1, 0, 21, 0] | [1, 0, 28, 0] | 95.45 | 2.31E-10 | 1.77E-08 |
| chrX:122426643 | GRIA3 | A | AG | AGA → GGA | R → G | 1 | 22 | [3, 0, 19, 0] | [3, 0, 24, 0] | 86.36 | 1.94E-08 | 1.31E-06 |
| chr15:73433139 | NEIL1 | A | AG | AAA → AGA | K → R | 2 | 15 | [0, 0, 15, 0] | [0, 0, 33, 0] | 100 | 1.03E-07 | 6.18E-06 |
| chr6:44228327 | TMEM63B | A | AG | CAG → CGG | Q → R | 2 | 92 | [70, 0, 22, 0] | [95, 0, 24, 0] | 23.91 | 7.73E-07 | 4.17E-05 |
| chrX:153233144 | FLNA | T | TC | CAG → CGG | Q → R | 2 | 67 | [0, 20, 0, 47] | [0, 22, 0, 50] | 29.85 | 2.30E-06 | 1.12E-04 |
| chr11:105309904 | GRIA4 | A | AG | AGA → GGA | R → G | 1 | 19 | [4, 0, 15, 0] | [6, 0, 17, 0] | 78.95 | 3.35E-06 | 1.51E-04 |
| chr2:267003 | ACP1 | A | AG | CAA → CGA | Q → R | 2 | 44 | [26, 0, 18, 0] | [26, 0, 18, 0] | 40.91 | 5.24E-06 | 2.17E-04 |
| chr6:102479282 | GRIK2 | A | AG | CAG → CGG | Q → R | 2 | 13 | [2, 0, 11, 0] | [2, 0, 11, 0] | 84.62 | 1.06E-04 | 4.09E-03 |
| chr21:33845189 | SON | A | AG | CTA → CTG | L → L | 3 | 35 | [21, 0, 14, 0] | [26, 0, 17, 0] | 40 | 1.17E-04 | 4.21E-03 |
| chr19:8484035 | ZNF414 | T | TC | CAG → CGG | Q → R | 2 | 54 | [0, 13, 0, 41] | [0, 14, 0, 55] | 24.07 | 4.51E-04 | 1.52E-02 |
| chr14:25987370 | NOVA1 | T | TC | AGC → GGC | S → G | 1 | 30 | [0, 12, 0, 18] | [0, 14, 0, 23] | 40 | 5.25E-04 | 1.67E-02 |
| chr16:357758 | MRPL28 | T | TC | TAT → TGT | Y → C | 2 | 104 | [0, 13, 0, 91] | [0, 14, 0, 104] | 12.5 | 6.50E-04 | 1.95E-02 |
| chr20:35580977 | BLCAP | T | TC | CAG → CGG | Q → R | 2 | 110 | [0, 12, 0, 98] | [0, 12, 1, 124] | 10.91 | 1.31E-03 | 3.71E-02 |
Positions already known to be edited from literature.
Positions validated in spinal cord by exome sequencing. Ref: reference nucleotide; ST: substitution type; CC: codon change; AAC: amino acid change; CodP: codon position; CovR: RNA-Seq coverage; BCR: RNA-Seq base count, i.e. the distribution of supporting RNA-Seq bases; BCR-F: RNA-Seq base count before all filtering steps; FDR: false discovery rate.
List of significant A-to-G substitutions in known protein-coding regions detected in the SRP002274 study, sorted by ascending Pvalue. For each position, the table reports the gene name, chromosome position and potential editing extent, together with the corrected Pvalue (FDR), the RNA-Seq bases supporting the editing event, and the codon and amino acid change (if any).
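Two columns of the table can be recomputed from the others, as the sketch below shows. %Editing is the edited-base count divided by CovR, with G counted for AG sites and C counted for TC sites (the same change read on the opposite strand). The source does not name the multiple-testing correction behind the FDR column; the Benjamini-Hochberg procedure used here is an assumption, although the tabulated FDR/Pvalue ratios (about 538 for the first row and about 269 for the second) are consistent with it. The function names are hypothetical.

```python
from typing import List, Sequence


def editing_percent(bcr: Sequence[int], substitution: str) -> float:
    """Percent of edited bases given a [A, C, G, T] count vector."""
    a, c, g, t = bcr
    if substitution == "AG":    # edit read on the forward strand
        edited = g
    elif substitution == "TC":  # same edit read on the reverse strand
        edited = c
    else:
        raise ValueError(f"unsupported substitution type: {substitution}")
    coverage = a + c + g + t    # corresponds to the CovR column
    if coverage == 0:
        raise ValueError("no coverage at this position")
    return 100.0 * edited / coverage


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the source does not state which correction was applied.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the adjusted values of all larger ranks (step-up procedure).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Base counts taken from the CYFIP2 and IGFBP7 rows of the table.
print(round(editing_percent([63, 0, 90, 0], "AG"), 2))   # 58.82
print(round(editing_percent([0, 167, 0, 96], "TC"), 2))  # 63.5

# Made-up p-values, only to show the call.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Applying `benjamini_hochberg` to the 19 tabulated p-values alone would not reproduce the FDR column, because the correction depends on the total number of tested positions, which the table does not list.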
Figure 3. Evaluation of the effectiveness of three different mapping strategies in editing detection.
Venn diagram comparing the effectiveness of three different mapping strategies in the detection of editing sites, reporting overlaps and differences in the number of predicted and literature-validated (in brackets) editing events. Mapping strategies are: M1) Bowtie against the reference genome; M2) TopHat against the reference genome; and M3) GSNAP against transcriptome and reference genome.
Figure 4. Sanger confirmation of editing candidates.
RNA editing events identified within coding regions of candidate genes from the SRP002274 study were validated by classical Sanger sequencing, comparing genomic and corresponding cDNA portions: (a) editing event within TTYH1 at position chr19:59638596 in human brain; (b) editing in TMEM63B at position chr6:44228327 in astrocytoma cell lines over-expressing ADAR2; (c) editing in SON at position chr21:33845189 in human brain; (d) editing in NOVA1 at position chr14:25987370 in human brain; (e) the editing event predicted within the ATP1A1 gene did not reveal any editing or SNP when gDNA and cDNA sequences isolated from human brain were compared; (f) a SNP was identified within ZNF414. Chromosome coordinates refer to human genome assembly hg18.
Significant A-to-G substitutions detected in RNA-Seq of human spinal cord.
| Position | Gene | Ref | ST | CovR | BCR [A, C, G, T] | BCR-F [A, C, G, T] | CovE | BCE [A, C, G, T] | %Editing | Pvalue | LLR |
| chr9:35688080 | TLN1 | T | TC | 196 | [0, 79, 0, 117] | [0, 150, 0, 122] | 21 | [0, 0, 0, 21] | 40.31 | 4.05E-27 | 321.98 |
| chr9:139026134 | ABCA2 | T | TC | 377 | [0, 76, 0, 301] | [0, 152, 0, 508] | 0 | [0, 0, 0, 0] | 20.16 | 8.51E-24 | ND |
| chr16:357758 | MRPL28 | T | TC | 116 | [0, 60, 0, 56] | [0, 63, 0, 58] | 6 | [0, 0, 0, 6] | 51.72 | 1.02E-21 | 176.42 |
| chr1:1299268 | AURKAIP1 | T | TC | 93 | [0, 43, 0, 50] | [0, 58, 0, 55] | 1 | [0, 0, 0, 1] | 46.24 | 5.54E-15 | 120.94 |
| chr17:19624930 | ULK2 | T | TC | 116 | [0, 42, 0, 74] | [0, 86, 0, 80] | 6 | [0, 0, 0, 6] | 36.21 | 6.36E-14 | 111.03 |
| chr4:78198704 | CCNI | T | TC | 174 | [0, 35, 0, 139] | [0, 54, 0, 151] | 28 | [0, 0, 0, 28] | 20.11 | 8.89E-11 | 125.07 |
| chr4:191101378 | FRG1 | A | AG | 71 | [54, 0, 17, 0] | [56, 0, 20, 0] | 0 | [0, 0, 0, 0] | 23.94 | 2.74E-05 | ND |
| chrX:153233144 | FLNA | T | TC | 76 | [0, 17, 0, 59] | [0, 51, 0, 69] | 5 | [0, 0, 0, 5] | 22.37 | 2.95E-05 | 78.04 |
| chr11:61481492 | BEST1 | A | AG | 106 | [89, 0, 17, 0] | [95, 0, 18, 0] | 4 | [4, 0, 0, 0] | 16.04 | 3.89E-05 | 39.64 |
| chr12:6043772 | VWF | T | TC | 133 | [0, 15, 0, 118] | [0, 15, 0, 133] | 2 | [0, 0, 0, 2] | 11.28 | 1.80E-04 | 29.77 |
| chr4:191115593 | FRG1 | A | AG | 59 | [45, 0, 14, 0] | [45, 0, 15, 0] | 159 | [159, 0, 0, 0] | 23.73 | 2.28E-04 | 35.87 |
| chr17:37235419 | NT5C3L | T | TC | 45 | [0, 13, 0, 32] | [0, 20, 1, 67] | 2 | [0, 0, 0, 2] | 28.89 | 3.82E-04 | 37.26 |
| chr3:58116831 | FLNB | A | AG | 105 | [92, 0, 13, 0] | [101, 0, 15, 0] | 11 | [11, 0, 0, 0] | 12.38 | 6.52E-04 | 28.84 |
| chr1:224041237 | SRP9 | A | AG | 30 | [19, 0, 11, 0] | [21, 0, 22, 0] | 0 | [0, 0, 0, 0] | 36.67 | 1.23E-03 | ND |
| chr15:20421437 | TUBGCP5 | A | AG | 53 | [42, 0, 11, 0] | [46, 0, 16, 0] | 8 | [8, 0, 0, 0] | 20.75 | 1.96E-03 | 25.85 |
Positions already known to be edited from literature. Ref: reference nucleotide; ST: substitution type; CovR: RNA-Seq coverage; BCR: RNA-Seq base count, i.e. the distribution of supporting RNA-Seq bases; BCR-F: RNA-Seq base count before all filtering steps; CovE: exome coverage; BCE: exome base count; LLR: log-likelihood ratio.
List of significant A-to-G substitutions in known protein-coding regions detected in transcriptome reads from spinal cord. Exome support is also reported for each position, together with the Pvalue and the log-likelihood ratio [49] [39].
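The exome columns can be read with a simple count-based check, sketched below. This is not the log-likelihood ratio reported in the table, whose formula is not given here; it only separates positions without exome coverage (where LLR is ND) from positions where every exome read carries the reference base, the pattern expected when an RNA-level change is absent from the DNA. The function name and category labels are hypothetical.

```python
from typing import Sequence

BASES = "ACGT"  # order of the [A, C, G, T] count vectors in the table


def exome_support(ref: str, bce: Sequence[int]) -> str:
    """Classify exome evidence at one candidate position.

    Simplified, count-based stand-in for illustration; it does not
    compute the log-likelihood ratio (LLR) reported in the table.
    """
    coverage = sum(bce)  # corresponds to the CovE column
    if coverage == 0:
        return "no_exome_coverage"
    ref_count = bce[BASES.index(ref)]
    if ref_count == coverage:
        return "reference_only"      # DNA shows no variant at this site
    return "variant_in_exome"        # a genomic variant cannot be ruled out


# Rows taken from the table: TLN1, ABCA2 and BEST1.
print(exome_support("T", [0, 0, 0, 21]))  # reference_only
print(exome_support("T", [0, 0, 0, 0]))   # no_exome_coverage
print(exome_support("A", [4, 0, 0, 0]))   # reference_only

# Made-up counts, only to show the third category.
print(exome_support("A", [5, 0, 5, 0]))   # variant_in_exome
```

A position with only one or two exome reads, such as AURKAIP1 (CovE = 1), falls into `reference_only` under this check, which is one reason a likelihood-based score is more informative than a plain count.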