Jia-Yu Chen, Zhiyu Peng, Rongli Zhang, Xin-Zhuang Yang, Bertrand Chin-Ming Tan, Huaying Fang, Chu-Jun Liu, Mingming Shi, Zhi-Qiang Ye, Yong E Zhang, Minghua Deng, Xiuqin Zhang, Chuan-Yun Li.
Abstract
Next-generation sequencing has considerably broadened our understanding of RNA editing; however, several issues regarding this regulatory step remain unresolved: the strategies to accurately delineate the editome, the mechanism by which its profile is maintained, and its evolutionary and functional relevance. Here we report an accurate and quantitative profile of the RNA editome for rhesus macaque, a close relative of human. By combining genome and transcriptome sequencing of multiple tissues from the same animal, we identified 31,250 editing sites, of which 99.8% are A-to-G transitions. We verified 96.6% of editing sites in coding regions and 97.5% of randomly selected sites in non-coding regions, as well as the corresponding levels of editing, by multiple independent means, demonstrating the feasibility of our experimental paradigm. Several lines of evidence support the notion that adenosine deamination is associated with the macaque editome: A-to-G editing sites were flanked by sequences with the attributes of ADAR substrates, and both the sequence context and the expression profile of ADARs are relevant factors in determining the quantitative variance of RNA editing across different sites and tissue types. In support of the functional relevance of some of these editing sites, a substitution valley of decreased divergence was detected around the editing site, suggesting evolutionary constraint in maintaining some of these editing substrates with their double-stranded structure. These findings thus complement the "continuous probing" model, which postulates tinkering-based origination of a small proportion of functional editing sites. In conclusion, the macaque editome reported here highlights RNA editing as a widespread functional regulation in primate evolution and provides an informative framework for further understanding RNA editing in human.
Year: 2014 PMID: 24722121 PMCID: PMC3983040 DOI: 10.1371/journal.pgen.1004274
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Genome-wide identification and verification of the RNA editome in one rhesus macaque.
(A) Overview of the experimental design: genome-wide identification, followed by medium- or low-throughput verification of RNA-editing sites. (B) An example showing the genotyping results for the genomic DNA (gDNA) and cDNA of one verified RNA-editing site (chr11:5028364, KCNA1). The levels of RNA editing were estimated from high-throughput, medium-throughput and low-scale data on the basis of read number, signal intensity contrast and peak height ratio between the edited and wild-type alleles, respectively. The primer peak and the genotype peak on the mass spectrum are indicated by red dotted lines. (C) Comparison of the levels of RNA editing estimated by the high-throughput (H), medium-throughput (M) and low-scale (L) platforms. The example in (B) is highlighted in red. Pearson correlation coefficients between different platforms are shown on the right.
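The read-number estimate in panel (B) and the platform comparison in panel (C) can be illustrated with a minimal sketch. It assumes the editing level is the edited-allele reads divided by all reads covering the site, which is the usual definition but is not spelled out in the caption. The function names and all counts below are hypothetical and are not data from the study.

```python
from math import sqrt


def editing_level(edited_reads: int, wildtype_reads: int) -> float:
    """Fraction of reads at one site that carry the edited allele (assumed definition)."""
    total = edited_reads + wildtype_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return edited_reads / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical (edited, wild-type) read counts for three sites.
counts = [(30, 70), (5, 95), (60, 40)]
high_throughput = [editing_level(g, a) for g, a in counts]

# Hypothetical editing levels for the same sites from a second platform.
second_platform = [0.28, 0.07, 0.63]

r = pearson(high_throughput, second_platform)
```

A high `r` here only says that the two platforms rank and scale the sites similarly; it does not by itself show that either platform is unbiased.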
Statistics of deep sequencing for one rhesus macaque.
| Tissue Type | Total Reads | Length | Q20 | GC Content | Genome Mapping: Uniquely Aligned Reads | Genome Mapping: % | Transcriptome Mapping: Uniquely Aligned Reads | Transcriptome Mapping: % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | | | | | | | |
| | 100.7M | 90 bp×2 | 97.1% | 48.8% | 67.6M | 67.1% | 38.7M | 38.4% |
| | 142.1M | 90 bp×2 | 97.0% | 47.7% | 95.2M | 67.0% | 60.8M | 42.8% |
| | 120.0M | 90 bp×2 | 96.9% | 49.8% | 75.7M | 63.1% | 53.0M | 44.2% |
| | 129.0M | 90 bp×2 | 96.4% | 47.5% | 92.5M | 71.7% | 42.6M | 33.0% |
| | 113.6M | 90 bp×2 | 96.8% | 48.9% | 84.1M | 74.0% | 35.0M | 30.8% |
| | 123.7M | 90 bp×2 | 97.1% | 46.6% | 72.9M | 58.9% | 72.2M | 58.4% |
| | 95.7M | 90 bp×2 | 97.3% | 48.0% | 60.0M | 62.7% | 45.1M | 47.1% |
| | | | | | | | | |
| | 2173.2M | 90 bp×2 | 94.3% | 41.7% | 1763.3M | 81.1% | - | - |
| | | | | | | | | |
| | 83.9M | 90 bp×2 | 95.9% | 49.1% | 75.2M | 89.6% | - | - |
Figure 2. Experimental and computational strategies for accurate editome identification in rhesus macaque.
Potential false positives in the RNA-editing calling workflow were minimized by a more thorough pipeline design. (A) Two discrepancies between RNA and genomic-DNA sequences (highlighted by blue boxes) were located in a cis-natural antisense region where both DNA strands could be transcribed. Strand-specific RNA-Seq clearly distinguished the sequence reads transcribed from the two strands and correctly assigned this site as A-to-G editing, as no discrepancy was detected in the plus-strand transcribed gene. (B) Based on the macaque gene structures defined in-house (RhesusBase Structure), one of the exon-intron boundaries of ENSMMUT00000021567 was incorrectly defined by a previous annotation (Ensembl Structure). With the RNA-Seq reads aligned to the mis-annotated transcript structure, the two T-to-A DNA-RNA discrepancies highlighted by blue boxes would be incorrectly identified as T-to-A RNA editing. (C) The genotype of the site highlighted in the blue boxes was incorrectly recognized as homozygous in DNA and heterozygous in RNA, since only 1 out of 28 sequence reads supported the mutant allele T in DNA, leading to incorrect assignment of a C-to-T editing event. Both Sequenom mass array and Sanger sequencing validations excluded such false positives, which may arise due to low sequencing coverage and biased allele capture efficiency in the exome-Seq assay.
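Two of the pitfalls in this caption, strand assignment in panel (A) and weak alternative-allele support in DNA in panel (C), can be sketched as simple checks. This is an illustrative sketch, not the authors' pipeline: the function names, the zero-tolerance rule for alternative-allele reads in DNA, and the `min_coverage` threshold are all assumptions made for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def transcript_change(ref_base: str, rna_base: str, strand: str) -> str:
    """Express a DNA-RNA mismatch relative to the transcribed strand.

    ref_base and rna_base are given on the plus (reference) strand;
    strand is "+" or "-" for the gene the read was transcribed from.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "-":
        ref_base, rna_base = COMPLEMENT[ref_base], COMPLEMENT[rna_base]
    return f"{ref_base}-to-{rna_base}"


def dna_is_clean_homozygous(dna_ref_reads: int, dna_alt_reads: int,
                            min_coverage: int = 10) -> bool:
    """True only if the DNA site is well covered and no read carries the
    alternative allele (hypothetical, deliberately strict rule)."""
    coverage = dna_ref_reads + dna_alt_reads
    return coverage >= min_coverage and dna_alt_reads == 0


# A T-to-C mismatch on the plus strand, read from a minus-strand gene,
# is an A-to-G change in the transcript.
change = transcript_change("T", "C", "-")

# Panel (C): 1 of 28 DNA reads supports the alternative allele, so the
# site is not treated as cleanly homozygous under this rule.
panel_c_passes = dna_is_clean_homozygous(dna_ref_reads=27, dna_alt_reads=1)
```

A strict rule like this trades sensitivity for specificity: it discards sites with a single discordant DNA read even when that read is a sequencing error, which is why independent validation of the kind described in the caption remains necessary.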
28 verified editing sites in the macaque coding regions.
| Position | Form | Host Gene | Recoding Type | Function of the Host Gene |
| --- | --- | --- | --- | --- |
| | A→G | COPA | I→V non-synonymous change | nucleoside-triphosphatase regulator, ion binding, protein binding |
| | A→G | ARIH2 | K→K synonymous change | ion binding, nucleic acid binding, protein binding |
| | A→G | RICTOR | R→G non-synonymous change | protein binding |
| | A→G | NOVA1 | S→G non-synonymous change, protein stability | nucleic acid binding, protein binding |
| | A→G | GABRA3 | I→M non-synonymous change, trafficking | ion binding, protein binding, substrate-specific & transmembrane transporter, neurotransmitter binding, signal transducer |
| | A→G | BLCAP | Y→C non-synonymous change, cancer biomarker | - |
| | A→G | BLCAP | Q→R non-synonymous change, cancer biomarker | - |
| | A→G | BLCAP | K→R non-synonymous change, cancer biomarker | - |
| | A→G | KCNA1 | affinity for blocking particle, kinetics of channel inactivation | ion binding, protein binding, substrate-specific & transmembrane transporter |
| | A→G | UNC80 | S→G non-synonymous change | - |
| | A→G | COG3 | I→V non-synonymous change | protein binding, substrate-specific transporter |
| | A→G | IGFBP7 | K→R non-synonymous change, proteolytic cleavage | protein binding |
| | A→G | CYFIP2 | K→E non-synonymous change, biomarker for ALS | protein binding |
| | A→G | FLNA | Q→R non-synonymous change, physiological properties | protein binding, signal transducer |
| | A→G | ASIC1 | T→A non-synonymous change | - |
| | A→G | NEIL1 | K→R non-synonymous change, nucleotide removal efficiency | ion binding, nucleic acid binding, protein binding, hydrolase |
| | A→G | NEIL1 | K→K synonymous change, nucleotide removal efficiency | ion binding, nucleic acid binding, protein binding, hydrolase |
| | A→G | GRIA2 | Q→R non-synonymous change, Ca-permeability | protein binding, substrate-specific & transmembrane transporter, signal transducer |
| | A→G | GRIA2 | Q→Q synonymous change | protein binding, substrate-specific & transmembrane transporter, signal transducer |
| | A→G | SMG5 | R→G non-synonymous change | protein binding |
| | A→G | SON | L→L synonymous change | nucleic acid binding, protein binding |
| | A→G | PDCD7 | Q→R non-synonymous change | - |
| | A→G | FLNB | M→V non-synonymous change | protein binding |
| | A→G | GRM4 | Q→R non-synonymous change | metabotropic glutamate, GABA-B-like receptor, signal transducer |
| | A→G | TMEM63B | Q→R non-synonymous change | - |
| | A→G | XKR6 | R→G non-synonymous change | - |
| | A→G | GRIA2 | R→G non-synonymous change | protein binding, substrate-specific & transmembrane transporter, signal transducer |
| | C→T | NOL11 | T→I non-synonymous change | - |
Figure 3. Characteristics of the rhesus macaque editome.
(A) For editing sites in each type of tissue, the distribution of the levels of RNA editing is shown as a boxplot. (B) Hierarchical clustering of editing levels of all editing sites across multiple macaque tissues and animals. Editing levels were estimated on the basis of RNA-Seq data in this study (Testis, Lung, Kidney, Heart, Muscle, Prefrontal cortex) and other public RNA-Seq data [Brain (1–6), Cerebellum (1–2), Muscle (1–8), Heart (1–5), Kidney (1–3), Lung (1–3), Testis (1–3)], with missing data shown in dark cyan. (C) Hierarchical clustering of editing levels is shown for selected RNA editing sites located in coding regions. Editing levels were estimated on the basis of mass array-based genotyping in seven macaque tissues derived from the same macaque (Testis, Lung, Kidney, Heart, Muscle, Cerebellum, Prefrontal Cortex), as well as five muscle and four brain samples obtained from different macaque animals [Muscles (A–E), Whole Brains (A–D)], with missing data shown in dark cyan. (D) The distribution of pair-wise comparisons of intra-population and cross-tissue coefficient of variation (CV) values is shown as a boxplot.
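The CV comparison in panel (D) can be illustrated with a short sketch. It assumes CV is the sample standard deviation divided by the mean of a site's editing levels, computed once across tissues and once across individuals of the same tissue; the caption does not state which standard-deviation estimator was used. The function name and all values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(levels):
    """Sample standard deviation divided by the mean, ignoring missing values (None)."""
    values = [v for v in levels if v is not None]
    if len(values) < 2:
        raise ValueError("need at least two non-missing editing levels")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean editing level is zero")
    return stdev(values) / m


# Hypothetical editing levels of one site.
across_tissues = [0.10, 0.40, 0.25, 0.05, None, 0.30]  # one animal, several tissues
across_animals = [0.22, 0.25, 0.24, 0.27]              # one tissue, several animals

cv_cross_tissue = coefficient_of_variation(across_tissues)
cv_intra_population = coefficient_of_variation(across_animals)
```

Pairing the two CV values per site, as the caption describes, compares how much a site's editing level varies between tissues with how much it varies between individuals.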
Figure 4. ADAR-mediated enzymatic reactions are associated with the macaque editome.
(A) The enriched (above the top line) and depleted (below the bottom line) nucleotides near the focal editing sites are displayed in a Two-Sample Logo, with the level of preference/depletion shown as height proportional to the scale. (B) The editing sites were divided into four categories on the basis of the local sequence context near the editing site, as described in . For each category, levels of RNA editing are shown in boxplots according to the tissue types. (C) Distribution of the percentages of editing sites whose tissue distribution of editing levels is positively correlated with the expression of ADARs (Spearman's rank correlation coefficient ≥0.5), for 10,000 permutation datasets in which the tissue relationships of the tissue expression profile were neglected. The percentage for the real data is indicated by the arrow, with the Monte Carlo p-value. (D) Distributions of R values in models assuming association of editing level with ADAR expression are shown for the Real Data and for the Background, which corresponds to randomly shuffled profiles. (E, F) The tissue expression profiles of ADAR1 or ADAR2 were ordered based on RNA expression levels, and normalized editing levels of A-to-G sites were aligned accordingly. These A-to-G editing sites showed similar trends in the distribution of editing levels along the ordered tissue expression profile of ADAR1 (E) or ADAR2 (F).
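The permutation test in panel (C) can be sketched as follows. This is a sketch under stated assumptions, not the authors' code: it shuffles the tissue order of the ADAR expression profile, recomputes the fraction of sites with Spearman's coefficient at or above the cutoff, and reports a Monte Carlo p-value with a +1 correction (a common convention that the caption does not mention). All function names and example values are hypothetical.

```python
import random
from math import sqrt


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def fraction_correlated(site_levels, adar_expression, cutoff=0.5):
    """Fraction of sites whose editing levels track ADAR expression."""
    hits = sum(spearman(levels, adar_expression) >= cutoff for levels in site_levels)
    return hits / len(site_levels)


def monte_carlo_p(site_levels, adar_expression, n_perm=10_000, cutoff=0.5, seed=0):
    """Observed fraction and the share of shuffled profiles that match or exceed it."""
    rng = random.Random(seed)
    observed = fraction_correlated(site_levels, adar_expression, cutoff)
    shuffled = list(adar_expression)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fraction_correlated(site_levels, shuffled, cutoff) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical editing levels of three sites across six tissues.
sites = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [0.2, 0.1, 0.4, 0.3, 0.6, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1, 0.0],
]
# Hypothetical ADAR expression in the same six tissues.
adar = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

observed, p_value = monte_carlo_p(sites, adar, n_perm=1000)
```

Shuffling the expression profile keeps its values but breaks the pairing with tissues, so the shuffled fractions approximate what would be expected if editing levels were unrelated to ADAR expression.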
Figure 5. Contribution of purifying selection to the RNA editome in primates.
(A) The percentages of macaque editing sites with corresponding editing sites in human and/or chimpanzee (red bars), or genomically encoded in the two species (blue bars), are shown for the total editome (top), or for editing sites in different genomic regions (bottom). (B) The genomic sequences near the macaque editing sites were compiled according to their distances to the editing sites. For each 6-nucleotide window, the proportion of divergent sites between human and rhesus macaque is shown for different genomic categories. (C) Distribution of human-macaque synonymous divergent sites near the A-to-G editing sites. The codons with RNA-editing sites are highlighted in yellow and each synonymous divergent site in purple. The distribution of synonymous divergence (dS) values near the RNA-editing site, calculated using a 6-codon window, is shown in the lower panel, with the genome-wide dN and dS between human and rhesus macaque indicated by the dotted line.
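The windowed divergence in panel (B) can be sketched as below. The sketch assumes pre-aligned, equal-length human and macaque sequences around each editing site, pools all sites, and uses non-overlapping 6-nucleotide windows; whether the original windows overlap is not stated in the caption. The function name and the sequences are hypothetical.

```python
def window_divergence(aligned_pairs, window=6):
    """Proportion of divergent positions per window, pooled over all sites.

    aligned_pairs: list of (human_seq, macaque_seq) strings of equal length,
    each covering the same offsets relative to an editing site.
    Alignment gaps ("-") are skipped.
    """
    if not aligned_pairs:
        raise ValueError("no aligned sequences given")
    length = len(aligned_pairs[0][0])
    if any(len(h) != length or len(m) != length for h, m in aligned_pairs):
        raise ValueError("all aligned sequences must have the same length")

    proportions = []
    for start in range(0, length - window + 1, window):
        divergent = compared = 0
        for human, macaque in aligned_pairs:
            for h, m in zip(human[start:start + window], macaque[start:start + window]):
                if h == "-" or m == "-":
                    continue
                compared += 1
                divergent += h != m
        proportions.append(divergent / compared if compared else float("nan"))
    return proportions


# Two hypothetical 12-nt aligned regions, giving two 6-nt windows.
pairs = [
    ("ACGTACGGGTTT", "ACGAACGGGTTA"),
    ("TTTTTTCCCCCC", "TTATTTCCCCCC"),
]
divergence = window_divergence(pairs)
```

A run of windows with lower divergence close to the editing site than further away would correspond to the "substitution valley" that the abstract describes.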