Literature DB >> 27687780

Single-cell RNA sequencing reveals dynamic changes in A-to-I RNA editome during early human embryogenesis.

Si Qiu^1,2, Wenhui Li², Heng Xiong², Dongbing Liu², Yali Bai^2,3, Kui Wu^2,4, Xiuqing Zhang², Huanming Yang^2,5, Kun Ma⁶, Yong Hou^7,8, Bo Li^9,10.

Abstract

BACKGROUND: A-to-I RNA-editing mediated by ADAR (adenosine deaminase acting on RNA) enzymes that converts adenosine to inosine in RNA sequence can generate mutations and alter gene regulation in metazoans. Previous studies have shown that A-to-I RNA-editing plays vital roles in mouse embryogenesis. However, the RNA-editing activities in early human embryonic development have not been investigated.
RESULTS: Here, we characterized genome-wide A-to-I RNA-editing activities during human early embryogenesis by profiling 68 single cells from 29 human embryos spanning from oocyte to morula stages. We demonstrate dynamic changes in genome-wide RNA-editing during early human embryogenesis in a stage-specific fashion. In parallel with ADAR expression level changes, the genome-wide A-to-I RNA-editing levels in cells remained relatively stable until 4-cell stage, but dramatically decreased at 8-cell stage, continually decreased at morula stage. We detected 37 non-synonymously RNA-edited genes, of which 5 were frequently found in cells of multiple embryonic stages. Moreover, we found that A-to-I editings in miRNA-targeted regions of a substantial number of genes preferably occurred in one or two sequential stages.
CONCLUSIONS: Our single-cell analysis reveals dynamic changes in genome-wide RNA-editing during early human embryogenesis in a stage-specific fashion, and provides important insights into early human embryogenesis.

Entities: CellLine Chemical Disease Gene Mutation Species

Keywords: Embryogenesis; RNA-editing; Single cell transcriptome

Year: 2016 PMID： 27687780 PMCID： PMC5043600 DOI： 10.1186/s12864-016-3115-2

Source DB: PubMed Journal: BMC Genomics ISSN： 1471-2164 Impact factor: 3.969

Background

A-to-I RNA-editing mediated by ADAR (adenosine deaminase acting on RNA) enzymes is the major RNA-editing that post-transcriptionally modifies nucleotide sequences on RNA molecules in metazoans [1]. RNA-editing can alter protein sequences, influence RNA stability and miRNA regulations in multiple biological processes including development and carcinogenesis [2]. The mammalian ADAR proteins include ADAR, ADARB1, and ADARB2 [3]. Recent studies have demonstrated that most mice with a null allele of ADAR died before E14 due to defects in the hematopoietic system [4], and most mice with editing deficient ADAR mutation knock-in died at E13.5 as a result of unedited transcripts activating interferon and dsRNA sensing pathway [5]. Shtrichman et al. [6] found that editing levels of various target genes are substantially greater in most adult tissues than corresponding fetal ones and that ADAR protein is substantially regulated in undifferentiated pluripotent hESCs. These findings suggest that RNA-editing plays important roles in embryogenesis. Although early human embryonic transcriptome profiles have been studied [7-9], no research on RNA-editing activities before blastocyst stage during human embryogenesis has been conducted. To investigate RNA-editing activities during early embryogenesis in humans, we profiled the RNA editome from 68 single cells from 29 human embryos ranging from oocyte to morula stages using published human embryonic single cell transcriptome data [8, 9].

Results

Characteristics of RNA editome during early human embryogenesis

By analyzing 68 single cells from 29 human embryos spanning from oocyte to morula stages in early embryogenesis (Additional file 1: Figure S1) using our RNA identification pipeline, we identified 14,049 candidate RNA mismatches, including 9,795 in Alu and 4,254 in non-Alu regions. Of the 9,795 mismatches in Alu regions we identified, A-to-G was the most prevalent mismatch type (account for 88.04 %), followed by T-to-C mismatches (account for 11.61 %), of which the majority were thought to be incorrect annotation of A-to-I editing because the RNA-seq libraries were not strand-specific [10]. The A-to-G and T-to-C mismatches together account for 99.65 % of the sites identified in Alu region (Fig. 1a). A typical ADAR-mediated editing is characterized by under-representation of G base in position −1 next to the edited site and over-representation of G base in position +1 next to the edited site [11]. Indeed, this characteristic was seen at our identified A-to-G sites and at the complementary strand of T-to-C sites (Fig. 1b). Of the 4,254 candidate RNA editing sites identified in non-Alu regions 2,247 were A-to-G mismatches (52.82 %) and 488 were T-to-C mismatches (11.47 %). The proportion of A-to-G/T-to-C mismatches in non-Alu regions is smaller than those in Alu regions (Fig. 1a). Similar findings have been reported by Fumagalli et al. [12] when they studied RNA-DNA single nucleotide differences (RDDs) in breast cancer. Considering that the proportion of validated RDDs was 90 % in Alu region and below 40 % outside Alu region in their study, and that the majority of editing events are in Alu elements in human [13], we only retain non-Alu A-to-G mismatches recorded in RADAR database [14] and Alu A-to-G mismatches as candidate A-to-I RNA-editing sites in our further analyses.

Fig. 1

Characteristics of candidate RNA-editings in cells during early human embryogenesis. a Percentages of mismatch types in each cell. Each column represents one cell. b Neighbor preferences for A-to-G (top) and T-to-C (bottom) sites in each cell. A-to-G sites and the complementary strand of T-to-C sites display the motif signature of ADAR mediated A-to-I editing. Each pair of dots and triangles represents the G/C base % at position −1 (blue) and +1 (red) in one cell. c Percentages of A-to-I RNA-editing sites in functional genome elements at different stages. ncRNA, noncoding RNA. Each column represents one cell. d Changes in candidate A-to-I RNA-editing sites at different stages. Each bar represents one cell. O: oocyte; P: pronucleus; Z: zygote; 2: 2-cell; 4: 4-cell; 8: 8-cell; M: morula. e RNA-editing frequency at each editing site (column) in each cell (row). Blank regions, uniquely mapped reads < 4, not qualified for RNA-editing determination Totally, we identified 8,813 candidate A-to-I RNA-editing sites (Additional file 2: Table S1), of which 97.84 % located in Alu regions and 3,253 sites were present in the RADAR database. We annotated A-to-I RNA-editing sites and found that the majority were located in 3’-UTR regions (47.12 %), followed by intronic (33.77 %), non-coding RNA (ncRNA, 17.12 %), coding (0.98 %), and 5’-UTR regions (1.01 %), respectively (Fig. 1c). We noticed that the proportion of intronic A-to-I editing sites identified in our study is much smaller than that (94.03 %) in RADAR database, which may due to the low-coverage of the single cell RNA-seq data we analyzed. We will discuss this issue later on. On average, we detected 819 ± 609, 702 ± 64, 953 ± 377, 1,057 ± 557, and 743 ± 330 A-to-I RNA-editing sites in each cell at oocyte, pronucleus, zygote, 2-cell, and 4-cell stages, respectively, whereas in cells at 8-cell and morula stages the A-to-I RNA-editing sites sharply dropped to only 190 ± 97 and 86 ± 35 per cell, respectively (Fig. 1d). To investigate the changes in RNA-editing patterns during human early embryogenesis, we clustered single cells based on editing frequencies (defined as the variant-supporting reads divided by all reads mapped to a specific RNA-editing site), and found that cells at 8-cell and morula stages were clustered together and separated from cells at earlier stages (Fig. 1e). The heatmap shows that ~15 % of A-to-I RNA-editing sites present in cells crossing from oocyte to 4-cell stages, but not in cells at 8-cell and morula stages, despite of expression of transcripts at moderate levels (Fig. 1e).

A-to-I RNA-editing levels are sharply decreased in cells at 8-cell and morula stages

To profile genome-wide A-to-I RNA-editing changing patterns in the embryonic development, we defined A-to-I RNA-editing level as edited bases per million mapped bases in each cell. Under this definition, A-to-I RNA-editing levels are not affected by the number of mapped bases (Pearson’s correlation test, P = 2.11E-01, r = −0.15; Fig. 2a).

Fig. 2

ADARs govern changes in A-to-I RNA-editing levels during early human embryogenesis. a No correlation between editing level and mapped bases. b Changes in RNA-editing levels during early human embryogenesis. * P < 0.05; ** P < 0.01; *** P < 0.001 (Wilcoxon rank sum test). c Changes in gene expression of ADAR, ADARB1, and ADARB2 during early human embryogenesis. Each colored dot (red, black, and blue) represents the gene expression levels of ADAR, ADARB1, and ADARB2 in a cell. d and e Correlation between editing levels and the expression levels of ADAR and ADARB1. Each dot represents one cell We observed that on average the RNA-editing levels remained relatively stable until 4-cell stage, but dramatically decreased (68 %) from 4-cell to 8-cell stages (Wilcoxon rank sum test, P = 7.92E-6), and continually decreased (41 %) at morula stage (Wilcoxon rank sum test, P = 1.48E-02; Fig. 2b and Additional file file 1: Figure S2). We then investigated the gene expression of ADAR, ADARB1, and ADARB2 in cells at different stages, and found that ADARB1 and ADARB2 expression remained at low level (~1 Reads Per Kilobase per Million mapped reads, RPKM) across all embryonic stages investigated (Fig. 2c). The substantially lower expression of ADARB1 than ADAR is also seen in many tissues in adult humans (Table S2) [15]. Amazingly, the changes in ADAR expression levels were almost in parallel with the changes in editing levels in cells at all stages investigated (Fig. 2b and Additional file 1: Figure S3). Correlation tests indicated that RNA-editing levels were strongly correlated with ADAR expression levels (Pearson’s correlation test, P = 5.99E-13, r = 0.74; Fig. 2d). Interestingly, we noticed the largest decreases in ADAR expression level and in A-to-I RNA-editing level occurred in the cells at 8-cell stage. It is worth noting that although the ADARB1 expression levels remained low in cells of all stages investigated, we detected a moderate correlation between the ADARB1 expression levels and the A-to-I RNA-editing levels (Pearson’s correlation test, P = 3.38E-4, r = 0.42; Fig. 2e).

Non-synonymous A-to-I RNA-editing events frequently occurred in cells before 8-cell stage

We detected 324 A-to-I RNA editing events at 54 non-synonymous sites on 37 genes, of which 292 non-synonymous A-to-I RNA-editing events on 36 genes occurred in cells before 8-cell stage (Fig. 3a). Among the 54 nonsynonymous sites, 27 sites were deposited in RADAR (Additional file 2: Table S3). In addition, we observed that 7 genes were each non-synonymously edited at a specific embryonic stage.

Fig. 3

Recoding RNA-editing events in cells during early embryonic stages. a Heatmap of non-synonymous RNA-editing events in cells during early embryonic stages. Each row represents a cell, and each column represents a non-synonymous A-to-I editing site. Frequently edited genes are highlighted with dashed lines. Blank regions, uniquely mapped reads < 4, not qualified for RNA-editing determination. b Changes in non-synonymous editing frequency are negatively associated with the gene expression of AZIN1. Left, changes in AZIN1’s editing frequency; middle, changes in AZIN1’s expression; right, editing frequency is negatively associated with the AZIN1 expression level, each open circle represents one single cell. * P < 0.05; ** P < 0.01; *** P < 0.001 We found that AZIN1, DCAF16, SLC3A1, TTF1, and NSMCE2 were non-synonymously edited in more than 25 % of cells, almost exclusively confined to stages before 8-cell. Previous study showed that the non-synonymous RNA-editing sites in AZIN1 (A1099G, ENST00000347770) and DCAF16 (A486G, ENST00000382247) are clinically relevant in cancers [16]. We observed that the A1099G editing events on AZIN1 occurred in 3 of 6 oocytes, 3 of 3 pronuclei, 5 of 5 zygotes, 6 of 9 2-cell cells, 14 of 15 4-cell cells, 4 of 20 8-cell cells and none in morulae cells, respectively, and the A486G editing events on DCAF16 occurred in 3 of 6 oocytes, 3 of 3 pronuclei, 4 of 5 zygotes, 7 of 9 2-cell cells, 6 of 15 4-cell cells, none in 20 8-cell cells and 2 of 10 morulae cells, respectively. In 32 cells, we detected 41 non-synonymous A-to-I RNA-editing events at three sites (A1630G, A1646G, and A1691G, ENST00000409387) in the extracellular topological domain of SLC3A1, which encodes a renal amino acid transporter [17]. We found non-synonymous A-to-I RNA-editing events at three sites (A2651G, A2677G, and A2678G, ENST00000334270) altering the TTF1 C-terminal amino acid sequence, which was suggested to mediate the termination of ribosomal gene transcription [18]. In addition, we found non-synonymous A-to-I RNA-editing events frequently altered the amino acid sequence of NSMCE2 (A262G and A268G, ENST00000519712), which is required for DNA repair [19].

Editing frequencies at the exonic RNA editing sites of four genes are negatively associated with their expression

We conducted Spearman’s correlation analysis between gene expression and editing frequency on 163 genes which were edited in the exon regions and found that A-to-I RNA-editing frequencies were negatively associated with the expression of four frequently edited genes (adjusted P < 0.1, Benjamini-Hochberg method). For instance, the frequency of non-synonymous editing A1099G on AZIN1 dramatically dropped from 4-cell to 8-cell stages (Wilcox rank sum test, P = 2.67E-6; Fig. 3b) while the AZIN1 expression substantially increased (edgeR, P = 3.24E-17; Fig. 3b). We found that AZIN1 expression was highly negatively associated with editing frequency of A1099G (Spearman’s correlation test, r = −0.60, adjusted P = 2.02E-05, Fig. 3b). This pattern was also seen in protein coding genes DCAF16 (Spearman’s correlation test, r = −0.43, adjusted P = 1.99E-02, Additional file 1: Figure S4A), and in long intergenic noncoding RNAs RPL23AP53 (Spearman’s correlation test, r = −0.58, adjusted P = 2.63E-03, Additional file 1: Figure S4B) and SNHG16 (Small Nucleolar RNA Host Gene 16) which was suggested to be associated with cell proliferation [20] (Spearman’s correlation test, r = −0.57, adjusted P = 4.66E-05, Additional file 1: Figure S4C).

A-to-I RNA-editing events in miRNA-targeted mRNA regions of many genes preferably occur at specific stages

It has been suggested that RNA-editing in miRNA-targeted regions could alter miRNA-mediated post-transcriptional gene silencing [21] and gene expression [22]. We identified 609 A-to-I RNA-editing sites in miRNA-targeted regions on 298 genes (Additional file 2: Table S4), and noticed that the editing sites are relatively stable in cells from oocyte to 4-cell stages (57 ± 51, 25 ± 5, 62 ± 39, 68 ± 43, 61 ± 29 in oocyte, pronucleus, zygote, 2-cell, and 4-cell stages, respectively). However, the editing sites in cells at 8-cell and morula stages sharply dropped to 13 ± 8 and 4 ± 2, respectively (Additional file 1: Figure S5). We noticed that the editing events in miRNA-targeted regions on 114 genes, enriched in cell cycle (GSEA, adjusted P = 5.94E-08, Additional file 2: Table S5), occurred preferably in cells crossing oocyte to 4-cell stages (Fisher’s exact test, P < 0.05). In addition to the differences in quantities of editing sites at different stages, A-to-I RNA-editing in miRNA-targeted regions on a substantial number of genes appears to be stage related. By calculating the percentage of cells with editing events in miRNA-targeted regions on each gene at each stage, we found that a group of 19 genes were more frequently edited in oocytes than in cells of other stages (Fisher’s exact test, P < 0.05), whereas a different group of 13 genes were more frequently edited in zygotes (Fisher’s exact test, P < 0.05), and another group of 36 genes were more frequently edited in cells at 2-cell stage (Fisher’s exact test, P < 0.05). Interestingly, the number of genes that were more frequently edited in miRNA-targeted regions increased to 42 in cells at 4-cell stage but suddenly decreased to 4 in cells at 8-cell stage (Fig. 4a). In addition, we observed a small number of genes that were more frequently edited in miRNA-targeted regions in cells crossing two sequential stages than other stages. For instance, one group of 5 genes were more frequently edited at zygote and 2-cell stages, while another 9 genes were more frequently edited at 2-cell and 4-cell stages (Fig. 4a). We noticed that 16 genes involved in generic transcription pathway and 18 genes involved in cell cycle, were edited in miRNA-targeted regions at most stages (Fig. 4b and c). Interestingly, NUP43 and MCM4 involved in cell cycle were frequently edited in miRNA-targeted regions in cells across zygote to 4-cell stages (Fig. 4b).

Fig. 4

Genes are edited in miRNA-targeted mRNA regions in a stage-specific fashion. a Heatmaps of the percentage of cells edited at each stage (left) and P-values by Fisher’s exact tests (excluding pronucleus stage for insufficient cells) on the number of edited and unedited cells between a specific stage and other stages (right). Vertical bars highlight the genes that are more frequently edited in miRNA-targeted regions at one (black) or two sequential stages (blue) than other stages. Arabic numeral denotes the number of genes in each group. b and c Heatmaps of the frequencies of cells edited on genes involved in cell cycle and in generic transcription pathway, respectively. Each column represents an embryonic stage, and each row represents a gene

Discussion

In this study, we detected genome-wide A-to-I RNA-editing in cells ranging from oocyte to morula stages, and defined editing-level to reflect RNA-editing activities in a cell. We showed that A-to-I RNA-editing levels dramatically decreased at 8-cell stage. By looking at the impacts of A-to-I RNA-editing on protein recoding, we found that seven genes such as AZIN1 were frequently non-synonymously edited in cells of multiple embryonic stages. We noticed that the proportion of intronic A-to-I editing sites identified in our study (33.70 %) is much smaller than that (94.03 %) in RADAR. Because introns are retained only in premature mRNA of genes, the sequencing depths in intron regions are usually lower than those in exon region of expressed genes in RNA-seq. Because the single cell RNA-seq data we analyzed are low-coverage, thus, the depths of many intronic regions are less than 4, which do not meet our requirement for determination of RNA editing events. This leads to a substantial reduction in A-to-I RNA editing sites being identified in intron regions, consequently largely reducing the percentage of intronic A-to-I RNA editing sites among the total A-to-I RNA editing sites identified. Human embryogenesis is a complex and genetically well-programmed developmental process that is controlled by cascades of genes. Our results indicated that A-to-I RNA-editing acted in a stage-specific fashion during human early embryogenesis. We noticed that genome-wide A-to-I editing level suddenly and dramatically dropped in cells at 8-cell stage, suggesting that this sudden drop of A-to-I RNA-editing level may have an important biological significance during human early embryogenesis. Although the biological function of this sudden drop of editing level is yet to be discovered, we consider this event particularly interesting. Previous studies suggested that the 8-cell stage is a turning point because many important biological events occur at this stage. For example, a recent study on human early embryogenesis showed that there was a dramatic change in gene expression in cells at 8-cell stage as compared to the previous stages. At 8-cell stage, the expression of 3037 genes, enriched with genes involved in regulation of transcription and regulation of RNA metabolic process, substantially up-regulated while 1941 genes, enriched with genes involved in regulation of transcription and cell cycle, substantially down-regulated [9]. The X chromosome inactivation (XCI) is an important mechanism that compensates for the difference in gene dosage between XX females and XY males in mammals. Interestingly, the XCI in humans appears to start from the 8-cell stage [23]. Moreover, embryonic left-right separation is an important event during embryogenesis of bilaterians. In a recent study, by analyzing multiple lines of molecular and cell biology evidence, Ma concluded that embryos of bilaterians are divided into left and right lateral halves at or shortly after 8-cell stage [24]. AZIN1 plays a role in cell growth and proliferation by maintaining polyamine homeostasis within cells [25]. A non-synonymous editing (A1099G) on AZIN1 was found to be conserved in human and mouse [11]. Previous study showed that the A1099G edited AZIN1 resulted in substantially enhancing cell proliferation in cultured liver cell lines [25]. In addition, previous studies showed that the editing frequencies at this site were significantly higher in tumors than in matched nontumorous tissues in hepatocellular carcinoma and esophageal squamous cell carcinoma [25, 26]. Besides, AZIN1 expression was significantly increased in esophageal tumors compared with their matched nontumor specimens [26]. We observed that A1099G edited AZIN1 are present in 31 out of 38 cells crossing from oocyte to 4-cell stages, and 4 out of 20 cells at 8-cell stages. Intriguingly, the editing frequency at this site dramatically dropped from 4-cell to 8-cell stage while the AZIN1 expression substantially increased. We found that AZIN1 expression was highly and negatively correlated with the A1099G editing frequencies. As either increase in A1099G editing or high expression of AZIN1 may promote cell proliferation, we speculate that the frequent A1099G editing in earlier stages may be a compensation of the low level of AZIN1 expression. When embryos produce more AZIN1, a decrease in edited AZIN1 could keep the stability of overall AZIN1 activity. We believe that the underlying biological significance of the frequent A1099G editing on AZIN1 and the function of the A1099G edited AZIN1 during human early embryogenesis requires further investigation. We also believe that the negative association between the AZIN1 expression and the A1099G editing frequencies during early human embryogenesis is particularly interesting, deserving more comprehensive investigation.

Conclusions

Taken together, our study indicates that human embryos undergo dynamic changes in genome-wide A-to-I RNA-editing during human early embryogenesis. Our findings underscore the importance of A-to-I RNA-editing during early human embryogenesis. It is worth noting that our findings are based on the observation of 68 cells from 29 embryos across 7 embryonic stages. Therefore we believe it is necessary to conduct more comprehensive studies to verify our findings and to understand the biological significance of the dynamic changes in A-to-I editing during human early embryogenesis.

Methods

Reads mapping and pre-processing

We downloaded single cell RNA-seq data of human embryos spanning from oocyte to morula stages from two previous studies [8, 9] from NCBI database. In both studies, RNA libraries were sequenced on Illumina platform but not strand-specific. Xue et al., sequenced the libraries as 90 bp-long pair-end reads, while Yan et al., obtained 100 bp-long single-end reads. We noticed that the sequencing quality near the 3’-ends of some single-end reads were not satisfactory. Therefore, we trimmed off 15 bp from the 3’-end of every single-end sequencing read to eliminate sequencing errors. We used SOAPnuke to filter out reads from both studies that contained adapters and low quality using default parameter. We downloaded the hg19 (GRCh37) genome sequences from the UCSC Genome Browser (http://genome.ucsc.edu). We aligned the filtered reads with Tophat2 [27] using command: Tophat2 --read-gap-length 3 --read-edit-dist 3 --no-novel-juncs --no-novel-indels --transcriptome-index = transcriptome-index hg19. The transcriptomes-index is generated using ensemble gene set. The command is: Tophat2 -G Homo_sapiens.GRCh37.75.gtf --transcriptome-index = transcriptome-index hg19. Then we removed PCR-induced duplications using Picard (v1.84; http://broadinstitute.github.io/picard/) and recalibrated base quality using GATK (v2.8-1).

RNA-editing detection

We only selected the data of 68 single cells from 29 embryos each with over 0.5 Gb uniquely mapped bases for further analyses. We summarized the base calls of pre-processed aligned RNA-reads to the human reference in pileup format. To identify candidate RNA-editing sites, we only used sequencing bases with base-quality ≥ 20. We determined RNA-editing sites as follows: We perform statistical tests based on binomial distribution B (n, p) to distinguish true variants from sequencing errors on every mismatch site, where p denotes the background mismatch rate of each transcriptome, and n denotes sequencing depth on this site. On a given specific site with k reads supporting variant in all n mapped reads, we use B (k, n, p) to calculate the probability that the k mismatches are all due to sequencing errors. This probability is adjusted using the Benjamini-Hochberg method. We only retain candidate sites with adjusted P-value < 0.01. In addition, the candidate variant sites should be with mapped reads ≥ 4, variant-supporting reads ≥3, and mismatch frequencies (variant-supporting-reads/mapped-reads) ≥ 0.1. We filter out variants with strand bias, referring to abnormal distribution of sense strand variant supporting reads and antisense strand variant supporting reads in sequencing data. This sequencing bias may introduce false positive. We estimated strand bias and filtered out variants with strand bias as follows. (a) We performed a two-tailed Fisher’s exact test (FET) using the following two-by-two table: (b) We estimated variant strand frequency (sense-strand variant-supporting reads divided by total variant-supporting reads), variant strand preference [absolute (variant strand frequency minus 0.5)], reference strand frequency (sense-strand reference minus supporting read number divided by total reference-supporting read number), and reference strand preference [absolute (reference strand frequency minus 0.5)]. (c) We filtered out variant sites displaying significant strand bias, defined as either FET P-value < 0.005 plus variant strand preference > reference strand preference, or variant strand frequency > 0.9, or variant strand frequency < 0.1. Because the mismatches at ends of sequencing reads are less credible than the ones in the middle of reads, therefore, we filter out variants with position bias, i.e. the majority of variant supporting bases are located at read ends. Read end is defined as 10 bp at 3’-end or 5 bp at 5’-end. Position bias are defined as either FET P-value < 0.05 plus read end frequency > read middle frequency, or read end frequency > 0.9. To reduce false positives introduced by misalignment of sequencing reads to high similarity regions of the reference sequences, we perform a realignment filtering. Specifically, we extract variant-supporting reads on candidate variant sites and realign them against a combination reference (hg19 genome + ensemble transcript reference v75) by bwa-0.6.2. We retain a candidate variant site if at least 90 % of its variant-supporting reads are realigned to this site. We remove variant sites that are either in simple repeat regions (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/), or in homopolymer regions (runs of ≥ 5 bp). For sites located in non-Alu regions, we additionally remove the sites within 5 bp from a splicing site. To filter out common DNA SNPs, we build combined DNA SNP datasets from dbSNP (V138), 1000 Genome SNP (Phase 3), and human populations of Dutch [28], Mongolian (unpublished), and Dai (unpublished). Candidate variants are filtered out if they are found in the combined DNA SNP datasets. For rare SNPs filtering, we filter out RNA variants that are only detected in one embryo, because true editing sites are often present in different individuals, whereas rare SNPs are most likely not, as Ramaswami suggested [10]. We also filter out candidate RNA variants if their mismatch frequencies > 0.95. Finally, we tried to filter out private DNA-induced variants using the exon DNA sequencing data from the father of 12 embryos in Xue’s study. We found no additional paternal exonic DNA variants in the RNA mismatches we identified (Additional file 1: Table S6), suggesting that the false positive rate of DNA-induced RNA variants within the RNA mismatches we identified is considerably low.

Hierarchical clustering

In cluster analysis, we created a matrix C (i, k) to store the editing frequency (0 for no edit and −1 for depth < 4) on site i in sample k. We calculated the euclidean distances among cells based on this matrix using the dist() function and performed hierarchical clustering using the hclust() function with default parameters. Both functions could be found in R package stats. The heatmap was drawn using R package pheatmap.

Gene expression analyses

We calculate the number of reads aligned to each gene using the featureCounts function in R package Rsubread [29]. We performed differential gene expression analyses using the R package edgeR [30]. The RPKM values and differential expression P-values of genes used in this study were reported by edgeR.

RNA-editing sites annotation

We used ANNOVAR [31, 32] to functionally annotate RNA-editing sites. The gene set used in annotation including ensemble gene set v75 and NONCODE v4 [33]. When studying the editing in miRNA-target regions, we focused on regions that were complementary to miRNA seed within 3’ untranslated regions based on the predicted miRNA targeted regions downloaded from http://www.microrna.org/microrna/getDownloads.do (August 2010 release, good mirSVR scores).

Gene set enrichment analysis

The gene set enrichment analysis is performed on website http://software.broadinstitute.org/gsea/msigdb/annotate.jsp [34, 35]. The gene sets we used to compute overlaps included Canonical pathways (CP), BioCarta gene sets (CP:BIOCARTA), KEGG gene sets (CP:KEGG) and Reactome gene sets (CP:REACTOME).

Sense strand variant supporting reads	Sense strand reference supporting reads
Antisense strand variant supporting reads	Antisense strand reference supporting reads

34 in total

1. The Genomic Landscape and Clinical Relevance of A-to-I RNA Editing in Human Cancers.

Authors: Leng Han; Lixia Diao; Shuangxing Yu; Xiaoyan Xu; Jie Li; Rui Zhang; Yang Yang; Henrica M J Werner; A Karina Eterovic; Yuan Yuan; Jun Li; Nikitha Nair; Rosalba Minelli; Yiu Huen Tsang; Lydia W T Cheung; Kang Jin Jeong; Jason Roszik; Zhenlin Ju; Scott E Woodman; Yiling Lu; Kenneth L Scott; Jin Billy Li; Gordon B Mills; Han Liang
Journal: Cancer Cell Date: 2015-10-01 Impact factor: 31.743

Review 2. Molecular diversity through RNA editing: a balancing act.

Authors: Sanaz Farajollahi; Stefan Maas
Journal: Trends Genet Date: 2010-04-13 Impact factor: 11.639

3. Recoding RNA editing of AZIN1 predisposes to hepatocellular carcinoma.

Authors: Leilei Chen; Yan Li; Chi Ho Lin; Tim Hon Man Chan; Raymond Kwok Kei Chow; Yangyang Song; Ming Liu; Yun-Fei Yuan; Li Fu; Kar Lok Kong; Lihua Qi; Yan Li; Na Zhang; Amy Hin Yan Tong; Dora Lai-Wan Kwong; Kwan Man; Chung Mau Lo; Si Lok; Daniel G Tenen; Xin-Yuan Guan
Journal: Nat Med Date: 2013-01-06 Impact factor: 53.440

4. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

5. ncRAN, a newly identified long noncoding RNA, enhances human bladder tumor growth, invasion, and survival.

Authors: Yuyan Zhu; Meng Yu; Zhenhua Li; Chuize Kong; Jianbin Bi; Jun Li; Zeshou Gao; Zeliang Li
Journal: Urology Date: 2010-12-13 Impact factor: 2.649

Review 6. The ADAR protein family.

Authors: Yiannis A Savva; Leila E Rieder; Robert A Reenan
Journal: Genome Biol Date: 2012-12-28 Impact factor: 13.583

7. Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants.

Authors: Robert Petryszak; Maria Keays; Y Amy Tang; Nuno A Fonseca; Elisabet Barrera; Tony Burdett; Anja Füllgrabe; Alfonso Muñoz-Pomer Fuentes; Simon Jupp; Satu Koskinen; Oliver Mannion; Laura Huerta; Karine Megy; Catherine Snow; Eleanor Williams; Mitra Barzine; Emma Hastings; Hendrik Weisser; James Wright; Pankaj Jaiswal; Wolfgang Huber; Jyoti Choudhary; Helen E Parkinson; Alvis Brazma
Journal: Nucleic Acids Res Date: 2015-10-19 Impact factor: 16.971

8. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors: Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal: Bioinformatics Date: 2009-11-11 Impact factor: 6.937

9. Adenosine deamination in human transcripts generates novel microRNA binding sites.

Authors: Glen M Borchert; Brian L Gilmore; Ryan M Spengler; Yi Xing; William Lanier; Debashish Bhattacharya; Beverly L Davidson
Journal: Hum Mol Genet Date: 2009-09-23 Impact factor: 6.150

10. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions.

Authors: Daehwan Kim; Geo Pertea; Cole Trapnell; Harold Pimentel; Ryan Kelley; Steven L Salzberg
Journal: Genome Biol Date: 2013-04-25 Impact factor: 13.583

9 in total

Review 1. ADAR RNA editing in human disease; more to it than meets the I.

Authors: Angela Gallo; Dragana Vukic; David Michalík; Mary A O'Connell; Liam P Keegan
Journal: Hum Genet Date: 2017-09-14 Impact factor: 4.132

2. Inosine RNA modifications are enriched at the codon wobble position in mouse oocytes and eggs†.

Authors: Pavla Brachova; Nehemiah S Alvarez; Xiaoman Hong; Sumedha Gunewardena; Kailey A Vincent; Keith E Latham; Lane K Christenson
Journal: Biol Reprod Date: 2019-11-21 Impact factor: 4.285

3. Accurate identification of RNA editing sites from primitive sequence with deep neural networks.

Authors: Zhangyi Ouyang; Feng Liu; Chenghui Zhao; Chao Ren; Gaole An; Chuan Mei; Xiaochen Bo; Wenjie Shu
Journal: Sci Rep Date: 2018-04-16 Impact factor: 4.379

Review 4. Antibodies specific for nucleic acid modifications.

Authors: Regina Feederle; Aloys Schepers
Journal: RNA Biol Date: 2017-02-23 Impact factor: 4.652

5. Spatiotemporal mapping of RNA editing in the developing mouse brain using in situ sequencing reveals regional and cell-type-specific regulation.

Authors: Elin Lundin; Chenglin Wu; Albin Widmark; Mikaela Behm; Jens Hjerling-Leffler; Chammiran Daniel; Marie Öhman; Mats Nilsson
Journal: BMC Biol Date: 2020-01-14 Impact factor: 7.431