Literature DB >> 30202058

COMRADES determines in vivo RNA structures and interactions.

Omer Ziv^1,2, Marta M Gabryelska³, Aaron T L Lun⁴, Luca F R Gebert⁵, Jessica Sheu-Gruttadauria⁵, Luke W Meredith⁶, Zhong-Yu Liu^7,8, Chun Kit Kwok⁹, Cheng-Feng Qin⁷, Ian J MacRae⁵, Ian Goodfellow⁶, John C Marioni^4,10,11, Grzegorz Kudla¹², Eric A Miska^13,14,15.

Abstract

The structural flexibility of RNA underlies fundamental biological processes, but there are no methods for exploring the multiple conformations adopted by RNAs in vivo. We developed cross-linking of matched RNAs and deep sequencing (COMRADES) for in-depth RNA conformation capture, and a pipeline for the retrieval of RNA structural ensembles. Using COMRADES, we determined the architecture of the Zika virus RNA genome inside cells, and identified multiple site-specific interactions with human noncoding RNAs.

Entities: Chemical

Mesh：

Substances：

Year: 2018 PMID： 30202058 PMCID： PMC6168409 DOI： 10.1038/s41592-018-0121-0

Source DB: PubMed Journal: Nat Methods ISSN： 1548-7091 Impact factor: 28.547

RNA conformational flexibility is essential for splicing, translation, and posttranscriptional regulation. Recent methods have utilized proximity ligation to reveal RNA base-pairing within cells[1-6], but because of insufficient probing depth and lack of appropriate computational algorithms, it has been difficult to assess the in vivo structural dynamics of RNAs. We developed a method—Crosslinking Of Matched RNAs And Deep Sequencing (COMRADES)—that couples in vivo probing of RNA base-pairing with selective RNA capturing. We additionally established an algorithm for assessing the structural complexity of RNA inside cells. (Fig 1a).

Fig. 1:

COMRADES methodology.

a, Outline of COMRADES experimental workflow and the associated computational pipeline. B: biotin. b, % of chimeric reads in COMRADES and control datasets. Mean and s.d. of 3 independent experiments are shown. c, Probed interactions among cytoplasmic and mitochondrial ribosomal RNA subunits. Mean and s.d. of 3 independent experiments are shown. rRNA: cytoplasmic ribosomal RNA; mtrRNA: mitochondrial ribosomal RNA. d, Heat map of ZIKV RNA-RNA interactions. Each dot represents an interaction between the genomic coordinates on the x and y axes. Chimeras ligated in 5'-3' and 3'-5' orientations are plotted above and below the diagonal respectively. e, Zoom-in on a selected 1,500 nt region from (d).

COMRADES utilizes a cell-permeable, azide modified psoralen derivative (Psoralen-triethylene glycol azide) to facilitate coupling of two effective affinity capturing steps, while overcoming the limited cell permeability of biotin labelled psoralen (Supplementary Fig. 1a). The azide group does not affect the psoralen crosslinking properties (Supplementary Fig. 1a). Following in vivo crosslinking, an RNA of interest is selectively captured, allowing nearly 1,000-fold enrichment (Supplementary Fig. 1b-c). The RNA is then fragmented and a copper free click-chemistry reaction links a biotin moiety to in vivo crosslinked regions, enabling a second streptavidin-based affinity selection of crosslinked regions (Supplementary Fig. 1d). Half of resulting RNA is proximity ligated to create RNA chimeras, following reversal of the crosslink to enable high-throughput sequencing and assessment of the base-pairing (Supplementary Fig. 1e). The other half is used as a control, in which reversal of the crosslink precedes the proximity ligation. COMRADES and control samples contain essentially identical RNA composition, ensuring accurate assessment of artificial chimeric reads originated from random ligation or reverse transcription errors. COMRADES’s dual enrichment substantially increases structure probing depth of selected RNA, thus enabling an unbiased and global view of coexisting conformations. COMRADES yields high levels of ligated chimeric reads, whereas these are kept 4 fold lower in the control, as well as in non-crosslinked samples (Fig. 1b). We successfully reported on the known ribosomal RNA structure with high sensitivity (Supplementary Fig. 2), while spurious interactions between cytoplasmic and mitochondrial ribosomal RNA subunits occurred at a very low level (Fig. 1c). The robustness of COMRADES is further demonstrated by its high reproducibility (Fig. 1d-e). RNA viruses utilize RNA base-pairing to regulate various aspects of their life cycle[7-12]. Inside the host cell however, the full-length architecture of RNA genomes and their interactions with the host transcriptome are largely unknown. We used COMRADES to determine RNA base-pairing along the 10.8 kilobases-long single-stranded RNA genome of Zika virus (ZIKV) from the Flavivirus genus inside human cells. We identified 1.7 million non-redundant chimeric reads corresponding to the structure of the ZIKV genome (Fig. 1d-e, Supplementary Fig. 3). This high probing coverage is valuable for analysing multiple coexisting conformations. Previous work mainly identified RNA structures in the untranslated regions (UTRs) of flaviviruses, while 95% of the genome remained unexplored[7-10]. COMRADES identified base-pairing along the entire genome and between the open reading frame (ORF) and the UTRs (Supplementary Fig. 4a). Nearly 80% of the identified interactions spanned a distance of less than 1,000 nucleotides (nt), implying local structure with a certain degree of three-dimensional compaction (Supplementary Fig. 4b). Both short- and long-range interactions were supported by reproducible, well-defined clusters of chimeric reads, ligated in 5'-3' and 3'-5' orientations (Supplementary Data 1), and showed strong evidence of base-pairing when analyzed with the hybrid-min RNA-folding algorithm as compared to a shuffled-chimeras control (Wilcoxon test p-value < 0.0001). COMRADES therefore enables a deep and comprehensive analysis of RNA base-pairing inside cells. During replication, the genome of flaviviruses undergoes a global conformational change mediated by a long-distance base-pairing between the 5' and 3' cyclization sequences[7,13] (5' CS and 3' CS respectively). Additional elements contributing to genome cyclization are the upstream and downstream of AUG regions (UAR and DAR respectively)[8,9]. COMRADES detected extensive and highly specific base-pairing between the known cyclization elements, therefore demonstrating genome cyclization inside cells (Fig. 2a-c, Supplementary Fig. 5a). COMRADES further refined the nature of the base-pairing associated with genome cyclization, by identifying contact regions upstream of the 5' UAR and downstream of the 3' UAR (Fig. 2a,c). We confirmed the existence of previously defined functional RNA pseudoknots including the dumbbell (DB) pseudoknot[14], the downstream of 5' cyclization sequence-pseudoknot (DCS-PK)[15], and the SL1 pseudoknot[16] (Supplementary Fig. 5b). We additionally detected an alternative 5' UTR conformation where stem-loops A and B (SLA and SLB respectively) are not formed but rather engaged in long-distance base-pairing with the downstream envelope coding sequence (Supplementary Fig. 5c-e). The essential role of SLA during replication suggests that this structure is more likely involved in virus translation or packaging. Overall, COMRADES identified nearly all previously known flavivirus RNA structures and has further defined critical base-pairing involving the UTRs.

Fig. 2:

The ZIKV genomic structure inside human cells.

a, Heatmap of RNA-RNA interactions between cyclization elements. Exp: experiment; cont: control. b, Viewpoint histograms showing binding positions of the cyclization sequences along the ZIKV genome. Viewpoint regions are marked by dashed red lines. c, Probed interactions along the circular genome conformation. Colour code indicates the number of non-redundant chimeric reads supporting each base-pair. New: newly identified base-pairing. d, Folding energy (dG) and experimentally supporting evidence (chimera reads) for each of the 1,000 computationally predicted structures corresponding to ZIKV genome nt 2,288 - 3,323. r: Pearson correlation coefficient. e, Clustering and prediction of alternative conformations for the region shown in (d). Colour code as described in (c).

Our intra-viral RNA-RNA interaction map revealed the presence of multiple mutually exclusive RNA structures, where one region alternately base-pairs with several other regions. The averaged Shannon entropy per nucleotide was 5.9 bits, implying high folding plasticity (Supplementary Fig. 6a). We found a strong inverse correlation between the degree of experimental support for base paired regions and their entropy (i.e., strong base-pairing correlates with low entropy, Supplementary Fig. 6b-d, 7). To explore the ensemble of alternative structures, we developed an algorithm to computationally fold ~1,000 nucleotide-long regions using randomly selected subsets of high-confidence mutually compatible folding constraints derived from COMRADES data. For each region, a set of 1,000 structures was generated (Supplementary Data 2-11). The validity of this approach is demonstrated by the clear correlation between the thermodynamic stability and the number of reads supporting each structure (Fig 2d, Supplementary Fig. 8). Nevertheless, the most thermodynamically favoured structures gained only moderate experimental support, implying the additional impact of the cellular environment on RNA folding[17-19]. We further computed the degree of similarity between all pairs of structures and applied multidimensional scaling to cluster structures based on their similarity (Supplementary Data 12). The presence of separated well defined clusters reflects the occurrence of alternative conformations (Fig. 2e, Supplementary Fig. 9). As a control, we randomly shuffled the interacting RNA partners between the chimeric reads; the resulting shuffled structures clustered separately from the structures recovered by COMRADES (Supplementary fig. 10a,b). A single ZIKV structure typically accounted for ~30% of the in vivo observed interactions, whereas a reduced set of 5 structures was sufficient to capture 80-90% of the in vivo data (Supplementary Fig. 10c). Our analysis suggests that the intracellular folding complexity of the ZIKV genome might be explained by postulating the coexistence of a small set of alternative conformations. Viral RNAs have an inherent capacity to form specific interactions through base-pairing with host RNAs[20], but little is known about the prevalence of such interactions. COMRADES revealed multiple interactions between the ZIKV genome and human small regulatory RNAs (Fig. 3a). We found site-specific interactions between the ZIKV ORF and the U1 small nuclear RNA (snRNA, Supplementary Fig. 11a), plausibly affecting host splicing. We similarly detected site-specific interactions with certain human tRNAs (Supplementary Fig. 11b). We identified several interactions between the ZIKV genome and human microRNAs such as miR-21, miR-19, miR-512, miR-515, and miR-1323, while no interactions with microRNAs were detected in control datasets. (Fig. 3b, Supplementary Fig. 11c). COMRADES indicated non-canonical base-pairing between the 5' CS of ZIKV and the seed region of miR-21 (Fig. 3c, Benjamini-Hochberg adjusted p-value 1.0E−13). The miR-21 interaction with the ZIKV genome was further proved significant using an independent analysis pipeline (FDR 3.0E−25, quasi-likelihood moderated F-test). In vitro synthesized miR-21 failed to bind the ZIKV 5' CS on its own, while preloading miR-21 onto purified Argonaute 2 (AGO2) facilitated a strong and sequence specific interaction (Supplementary Fig. 12), supporting the involvement of AGO2 in this base-pairing. CRISPR-Cas9 deletion of MIR21 or antisense inhibition of mature miR-21 in human cells decreased the intracellular level of the ZIKV genome (Supplementary Fig. 13a-d). Abrogating the miR-21 binding ability of a ZIKV replicon through point mutations rendered it insensitive to miR-21 antisense inhibition (Supplementary Fig. 13e), indicating that miR-21 acts through direct interaction with the 5' CS. The ZIKV envelope protein was similarly affected in the MIR21 deficient cells (Supplementary Fig. 14). Although the effect size was relatively small, the strong evidence for the miR-21 - 5' CS interaction presented here suggests that miR-21 might assume a greater pro-viral role in the physiological context.

Fig. 3:

Host-virus RNA-RNA interactions.

a, Human RNA species interacting with the ZIKV genome. Mean and s.d. of 3 independent experiments are shown. b, Probed interactions between the ZIKV genome and specific miRNAs in COMRADES and control samples. c, COMRADES determined base-pairing between ZIKV and miR-21. d, A model of the ZIKV 5' CS engaged in three separate functions. Ribosome and nascent polypeptide are marked in green.

COMRADES revealed the highly dynamic nature of an RNA genome inside cells, and its ability to engage in base-pairing with multiple host regulatory RNAs. The involvement of the conserved 5' CS of ZIKV in genome cyclization, capsid translation and miR-21 binding further demonstrates the intracellular structural complexity of viral RNA genomes. The general applicability of COMRADES provides an opportunity to undertake an unbiased analysis of the dynamic nature of RNA inside cells and can be utilized to investigate the structure and interacting partners of any cellular or foreign RNA in any species.

Online methods

COMRADES.

Each independent experiment was carried out on a different day, and included 3 sequencing-libraries: COMRADES, control, and a non-crosslinked sample.

Psoralen crosslink.

JEG-3 cells (~50 million cells per experiment) were inoculated with ZIKV isolate PE243, at MOI: 2 TCID50/cell. 20 hours post inoculation cells were washed 3 times in PBS and were incubated for 20 minutes with 0.4 mg/ml Psoralen-triethylene glycol azide (psoralen-TEG azide, Berry & Associates) dissolved in PBS and diluted in OptiMEM I with no phenol-red (Gibco). Cells were irradiated on ice with 365 nM UV for 10 minutes using a CL-1000 crosslinker (UVP). Prolonged UVA irradiation should be avoided as it might decompose the azide moiety. Cells were lysed using RNeasy lysis buffer. Proteins were degraded by proteinase K (NEB) and RNA was purified using RNeasy midi kit (Qiagen).

Viral RNA enrichment.

Total RNA was mixed with an array of 50 biotinylated DNA oligos, 20 nucleotides-long each (IDT) designed to capture the ZIKV genomic RNA and was maintained at 37 °C for 6 hours rotating in the following hybridization buffer: 500 mM NaCl, 0.7% SDS, 33 mM Tris-Cl pH 7, 0.7 mM EDTA, 10% Formamide. Hybridization and wash conditions were adapted from[21]. At the end of incubation Dynabeads MyOne Streptavidin C1(Invitrogen) were added and the RNA was incubated for additional 1 hour at 37 C. Beads were captured on a magnet and were washed 5 times with 2x SSC buffer containing 0.5% SDS. RNA was released from beads by degrading the DNA-probes with 0.1 units/μl Turbo DNase (Invitrogen) at 37 °C for 30 minutes. RNA was cleaned by RNA Clean & Concentrator (Zymo Research).

Crosslink pulldown.

RNA was fragmented to an average size of 100 nucleotides using RNase III (Ambion) and was cleaned by RNA clean & concentrator (Zymo Research). Copper-free Click reaction was carried at 37 °C for 90 minutes in the presence of 150 μM Click-IT Biotin DIBO Alkyne (Life technologies) and 0.5 units/μl Superase-In (Invitrogen). Reaction was terminated by RNA Clean & Concentrator (Zymo Research). Biotinylated RNA was pulldown using Dynabeads MyOne Streptavidin C1(Invitrogen) at the following reaction conditions: 100 mM Tris-Cl pH 7.5, 10 mM EDTA, 1 M NaCl, 0.1% Tween-20, 0.5 unit/μl Superase-In. Beads were captured on a magnet and were washed 5 times with 100 mM Tris-HCl pH 7.5, 10 mM EDTA, 3.5M NaCl, 0.1% Tween-20. RNA was eluted by adding 95% Formamide, 10 mM EDTA solution and incubating at 65oC for 5 minutes. To avoid enrichment of small RNA chimeric reads that cannot be double-aligned to the reference ZIKV genome / Human transcriptome, RNA was size fractionated on 10% TBE-Urea gel and fragments corresponding to a size of 100-200 nucleotides were eluted overnight at 4 °C in 10 mM Tris-HCl pH 7.5, 1 mM EDTA, 250 mM NaCl, 0.1% SDS. RNA was concentrated using RNA Clean & Concentrator (Zymo Research).

Proximity ligation and crosslink reversal.

At this stage, the RNA sample was divided in two. One half was used for proximity ligation and then crosslink reversal (i.e. COMRADES sample), while in the other half, crosslink reversal was done before proximity ligation (i.e. control sample). We included an additional control containing an equimolar concentration (albeit a non-similar composition) of non-psoralen treated, non-crosslinked enriched RNA (i.e., Non-crosslinked sample). Before proximity ligation, the RNA was heated to 85 °C for 2 minutes and was cooldown rapidly on ice. Proximity ligation was done under the following conditions: 1 unit/μl RNA ligase 1 (New England Biolabs), 1x RNA ligase buffer, 50 mM ATP, 1 unit/μl Superase-in (Invitrogen), final volume: 200 μl. Reaction was incubated for 16 hours at 16 °C and was terminated by cleaning with RNA Clean & Concentrator (Zymo Research). Crosslink reversal was done by irradiating the RNA on ice with 2.5 KJ/m2 UVC.

Sequencing library preparation.

Library preparation was done as described in[22] with the following modifications: 6N unique molecular identifiers were added to the 5' end of the 3' sequencing adapter; primers and adapters concentrations were lowered to match the low RNA input; Agencourt RNAClean XP beads (Beckman Coulter) were used for clean-up and size separation; pre-adenylated 5' and 3' adapters were used and all ligation reactions were carried without ATP to reduce ligation artefacts. All libraries and controls went through 13 PCR cycles using KAPA HiFi HotStart Ready Mix (KAPA Biosystems). PCR products were size-selected on a 1.8% agarose gel before loading on a HiSeq 1500 sequencer (Illumina).

Cell culture.

JEG-3 placental trophoblasts (ATCC) and Hela cells (ATCC) were cultured in Minimum Essential Medium supplemented with 10% fetal bovine serum, 1 mM sodium pyruvate, GlutaMAX, non-essential amino acids and penicillin-streptomycin. Vero cells (Sigma-Aldrich) were cultured in Dulbecco’s modified Eagle’s medium supplemented with 10% fetal bovine serum, GlutaMAX and penicillin-streptomycin. All cell lines were cultured in a humidified CO2 incubator at 37 °C and were regularly examined to exclude mycoplasma contamination.

Virus inoculation.

ZIKV isolate PE243 was originated from Recife, Brazil in 2015. The virus was propagated in Vero cells and titer was determined by measuring the 50% Tissue Culture Infective Dose (TCID50) in JEG-3 cells. For measurements of virus replication, JEG-3 or Hela cells were inoculated with ZIKV at MOI: 0.1 TCID50/cell for three hours, after which cells were washed 3 times with PBS and supplemented with fresh growth medium. 24 hours post inoculation medium was removed, cells were washed 3 times with PBS and RNA was extracted using RNeasy kit (Qiagen). Virus copy number was determined using a TaqMan real-time PCR Assay (Primerdesign) and was normalized to GAPDH and ribosomal RNA. All virus work was handled in a containment level 2 facility registered with the HSE under COSHH.

Replicon assay.

ZIKV wildtype and 5' CS - 3' CS double mutated replicons were described previously[23]. Replicon RNA was synthesized in vitro using MEGAscript T7 Transcription Kit (Ambion). Replicon RNA was Capped using the ScriptCap m7G Capping System (Cellscript) and transfected to Hela cells using the TransIT-mRNA Transfection Kit (Mirus). Replicon levels were analyzed after 24-48 hours using a microplate luminometer (Promega) and normalized to baseline luminescence values measured at 6 hours post transfection.

MIR21 knockout.

JEG-3 cells were transfected with CAS9-gRNA riboprotein complexes using Lipofectamine RNAiMAX (Life technologies) according to the Alt-R CRISPR-Cas9 user guide (IDT). MIR21 knockout clone1 was generated using the following guide RNA: 5'-TCATGGCAACACCAGTCGATGGG-3' and contains a homozygous deletion at the positions 59841310-59841326 on chromosome 17 (GRCh38/hg38 Assembly). MIR21 knockout clone2 was generated using a mixture of two guide RNAs: 5'-ATGTCAGACAGCCCATCGACTGG-3', 5'-CTACCATCGTGACATCTCCATGG-3', and contain a homozygous deletion at positions 59841249-59841321 on chromosome 17. MIR21 knockout and control clones were validated by sanger sequencing and by TaqMan Advanced miRNA Assay targeting the mature miR-21 (Life Technologies).

miR-21 inhibition.

Hela cells were transfected with inhibitors targeting human miR-21 or non-targeting control A (Power inhibitors, Exiqon) at a final concentration of 25 nM using Lipofectamine RNAiMAX. 6 hours post transfection medium was replaced and cells were inoculated with ZIKV or re-transfected with ZIKV replicons as described above. miR-21 inhibition was validated using a psiCHECK-2 reporter (Promega) carrying a fully complementary miR-21 site at the 3' UTR of a Renilla luciferase reporter along with a Firefly reporter to normalise transfection efficiency. The miR-21 psiCHECK-2 reporter was deposited in addgene (plasmid number 114206). Luminescence was assessed using the Dual-reporter assay (Promega) and normalized to control psiCHECK-2 without the miR-21 binding site.

Gel-based Reverse Transcription Stalling (RTS) assay.

RTS assay was performed as previously described[24] using a Cy5-labeled primer targeting the human 5.8S ribosomal RNA: 5'-Cy5-AAGCGACGCTCAGACAGG-3'.

Dot blot analysis.

50 ng crosslinked RNA, or the indicated amount of 50 nt-long biotinylated standards were spotted on to a Biodyne B Nylon Membrane (Life technologies) and dried by baking at 80 °C for 10 minutes. Biotinylated RNA was detected using the chemiluminescent nucleic acid detection module Kit (Life technologies) and visualized using ChemiDoc MP Imaging System (Biorad).

Purification of human AGO2 loaded with miR-21.

Human AGO2 homogeneously loaded with miR-21 was prepared according to a published protocol[25]. Human AGO2 was expressed in Sf9 cells using the Bac-to-Bac baculovirus expression system (Thermo Fisher Scientific). Sf9 cells were lysed and human AGO2 was purified by Ni-NTA affinity chromatography using a His tag. Human AGO2 was loaded with synthetic 5'-phosphorylated miR-21 (IDT), and the His tag was removed using Tobacco etch virus protease. Human AGO2 loaded with miR-21 was captured using an antisense oligonucleotide (IDT), eluted, and purified by size exclusion chromatography on an ÄKTA FPLC (GE Healthcare Life Science). Protein concentration was measured using absorption at 280 nM with extinction coefficients obtained from the protparam tool (www.expasy.org) and from the ribotask oligocalculator (www.ribotask.com).

Target RNA labelling.

Synthetic RNA oligonucleotides (IDT) were radiolabelled at the 5'-end using gamma 32P ATP (Perkin Elmer) and T4 polynucleotide kinase (NEB), and purified by denaturing polyacrylamide gel and ethanol precipitation. RNA concentration was determined from absorption at 260 nM using extinction coefficients calculated with the ribotask oligocalculator (www.ribotask.com).

Electrophoretic mobility shift assay (EMSA).

Binding reactions were prepared in reaction buffer (28 mM Tris pH 8.0, 20 mM KCl, 80 mM KOAc, 1.6 mM Mg(OAc)2, 0.5 mM TCEP, 0.004% NP-40, 0.01 g/l baker’s yeast tRNA) with a final volume of 20 μl, and a final concentration of the labeled RNAs of 10 nM and of the non-labeled RNA or AGO2-miR-21 of 100 nM. Reactions were incubated for 10 minutes at room temperature and analyzed on a 15% acrylamide native gel in 0.5x TBE.

Kd measurements.

Binding experiments were conducted according to the protocol published in[25]. AGO2-miR-21 (0-200 nM) was incubated with 0.1 nM radiolabaled target in reaction buffer (28 mM Tris pH 8.0, 20 mM KCl, 80 mM KOAc, 1.6 mM Mg(OAc)2, 0.5 mM TCEP, 0.004% NP-40) with a total volume of 25 μl for 45 minutes at room temperature. Filter-binding was performed using a dot-blot apparatus (GE Healthcare Life Sciences) with Protran nitrocellulose membrane (Amersham, GE Healthcare Life Sciences) and Hybond N+ nylon membrane (Amersham, GE Healthcare Life Sciences). Samples were applied with vacuum and washed with 50 μl wash buffer (30 mM Tris pH 8.0, 100 mM KOAc, 2 mM Mg(OAc)2, 0.5 mM TCEP). After air drying, the membrane strips were used to expose phosphor screens (GE Healthcare Life Sciences) for visualization. Screens were imaged on a Typhoon phosphorimager (GE Healthcare Life Science) and signals were quantified with ImageQuant (GE Healthcare Life Sciences). Dissociation constants were calculated by fitting the data to a single site binding equation: F = fraction target RNA bound, Bmax = maximal number of binding sites, [AGO2] = total concentration of the AGO2-miR-21 complex, and KD = calculated dissociation constant, using Prism (GraphPad Software). For weakly binding RNAs Bmax was constrained to ≤1.

Data analysis and statistical testing

Processing and visualization of sequencing data.

Sequencing data were pre-processed to combine FASTQ files of two sequencing lanes (cat) and to remove adapters (cutadapt). Paired end reads were merged by paired-end read merger (pear). UMIs were collapsed by collapse.py (T.D. Domenico, https://github.com/tdido). Chimeric reads were called and annotated with the hyb package[26], using the command: Hyb uses bowtie2[27] in local mapping mode to map reads to a transcriptome database and to identify chimeras, and it annotates the chimeras with RNA base-pairing information generated by hybrid-min[28]. The transcriptome database used by hyb, “hOH7_and_Zika”, consists of human spliced mRNAs and noncoding RNAs described in[4], and the genome sequence of the Zika virus (Zika virus isolate ZIKV/H.sapiens/Brazil/PE243/2015, complete genome). To evaluate the folding energy of chimeric reads, we used hybrid-min[28] with default settings. We then randomly reassigned (shuffled) pairs of fragments found in chimeric reads, and repeated the folding energy analysis. The folding energies of experimentally identified and shuffled chimeras were compared by Wilcoxon test. Virus interaction heatmaps were plotted using Java Treeview[29], such that color intensity represents the coverage of chimeric reads at every pair of positions. The first read of each pair is plotted along the X axis, and the second read along the Y axis. As a result, chimeras found in the 5'-3' orientation are shown above the diagonal, and chimeras in the 3'-5' orientation are below the diagonal. Viewpoint histograms were plotted with gnuplot, and arc plots were plotted with R-chie[30]. For every pair of positions (i, j) along the virus genome we calculated the COMRADES score, C: the number of chimeric reads that, when analyzed with the program hybrid-min with default settings, indicated base-pairing between positions i and j. We used COMRADES scores to calculate per-base Shannon entropy for each nucleotide position along the virus. Shannon entropy of position i is defined as: where n is the length of the genome (10,807 nt); and P(Cij) is: High entropy indicates flexible positions that may form multiple alternative base-pairs, whereas low entropy indicates positions that always pair with the same nucleotide partner. We visualized RNA structures using VARNA[31], where the colour scale represents the COMRADES score for each base pair.

RNA structure prediction.

For RNA structure predictions, we collected all potential base pairs with a non-zero C value, assembled sets of adjacent base pairs into uninterrupted stem structures, and calculated the base-pairing score of each stem as the sum of C values of individual base pairs. We then ranked these stem elements by their scores. In a preliminary analysis, we folded the 10,807 nt virus genome in a set of 50 overlapping 1,000 nt fragments, using the hybrid-ss-min program[28]. Each fragment was folded using a set of 250 top-ranked in vivo probed stem elements as folding constraints. Based on this preliminary analysis, we identified high-scoring stem-loop structures that were reproducibly predicted across multiple fragments, and we defined new fragment boundaries to prevent the disruption of these reproducible structural elements. As a result, we obtained fragment sizes that vary in size, but are approximately 1,000 nt long each. We then performed full folding analysis using the following fragment boundaries: 5'UTR: 1-107, F1: 108-1275, F2: 1276-2287, F3: 2288-3323, F4: 3324-4521, F5: 4522-5551, F6: 5552-6810, F7: 6811-7757, F8: 7758-8755, F9: 8756-9543, F10: 9544-10379, 3'UTR: 10380-10807. For each fragment, we assembled a set of folding constraints that represented the 75 top-scoring stem elements within that fragment, we randomly shuffled this set of constraints 1,000 times, and we used the shuffled constraints for folding prediction by hybrid-ss-min. The resulting individual structures typically incorporate 25%-40% of these constraints. We recorded the folding energy of each structure, as predicted by hybrid-ss-min, and we used the sum of C values to calculate an overall score for each structure. To assemble the top-scoring full-genome structure shown in Supplementary Fig. 4a, we assembled the top-scoring structures for each coding sequence fragment (F1-F10), and the previously proposed structures of the 5' and 3' UTRs. An additional analysis of folding within and between the 5' and 3' UTRs is shown in Fig. 2a-c and Supplementary Fig. 4c. We also repeated the folding analysis with shuffled sets of 50-250 top-scoring constraints per fragment. This yielded similar results, but we found that either reducing or increasing the numbers of constraints tended to reduce the number of high-scoring structures. To explore the sets of alternative structures, we computed pairwise distances between structures as the number of positions with discordant base-pairing. This resulted in a 1,000 × 1,000 matrix of distances, which we then represented on a two-dimensional surface using multidimensional scaling (using the R function cmdscale). Multidimensional scaling, also known as Principal Coordinate Analysis[32], maps multidimensional objects (in this case, RNA structures) to a set of points on a plane, such that the distances between RNA structures are well-approximated by Euclidean distances between points, by minimization of a stress function: The statistical significance of host-virus RNA-RNA interactions was calculated using DESeq2[33], by comparing counts of chimeric reads from 3 COMRADES and 3 control datasets.

Discovery of ZIKV miR-21 interaction with an independent analysis pipeline.

Alignment.

The first read of each pair was processed using UMI-tools[34] to extract the 6 nucleotide unique molecular identifier (UMI) at the start of the read. Processed reads were aligned using the STAR aligner[35], reporting all reads in their original order (--outSAMtype BAM Unsorted --outSAMunmapped Within); only reporting unique alignments (--outFilterMultimapNmax 1); and reporting alignments to individual segments of chimeric reads (--chimOutType WithinBAM --chimSegmentMin 20 --chimScoreJunctionNonGTAG 0 --chimMainSegmentMultNmax 1). The reference consisted of the hg38 build of the human genome, combined with the genome sequence of the PE243 strain of the Zika virus. Each read of the pair was aligned separately to avoid preferencing alignment to the same genomic locus. For each library, the pair of BAM files were collated and pair information was fixed using samtools[36]. PCR duplicates were removed on the basis of their UMIs, using UMI-tools in paired mode.

Detecting significant interactions.

We considered the "interaction space" between the human and Zika genomes, consisting of pairs of 1 kbp bins (one on each genome). For each replicate library in each condition (COMRADES and control), we counted the number of read pairs with one read in each bin using diffHic[37]. This yielded a count matrix that was normalized using the trimmed mean-of-M-values method[38] to correct for composition biases, under the assumption that most read pairs mapping across the Zika and human genomes was caused by non-specific ligation. We then applied the quasi-likelihood framework in edgeR[39] with 2 residual degrees of freedom for dispersion estimation to test for significant differences between the read pair counts for COMRADES and control. This was performed using an additive design matrix that blocked on the batch to reflect the paired-sample design of the experiment. Robust empirical Bayes shrinkage[40] was also used to stabilise the dispersion estimates in the presence of limited replication. Bin pairs were aggregated into clusters based on whether they overlapped the same human gene Test statistics were combined for each gene-Zika interaction using Simes' method[41] prior to applying the Benjamini-Hochberg method. Interactions that were significantly enriched in COMRADES over the control were defined at a false discovery rate threshold of 5%.

Data availability.

All sequencing data sets have been deposited in ArrayExpress under accession number: E-MTAB-6427. Base-pairing prediction, structure prediction and clustering data are available as Supplementary Data files. Additional data that support the findings of this study are available from the corresponding authors upon request. A step-by-step protocol is available as a Supplementary protocol and an open resource in Protocol Exchange[41].

Code availability.

The RNA structure prediction pipeline can be downloaded from: https://github.com/gkudla/comrades.

Reporting Summary

Further information on experimental design is available in the Life Sciences Reporting Summary linked to this article.

39 in total

1. De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly.

Authors: Aaron T L Lun; Gordon K Smyth
Journal: Nucleic Acids Res Date: 2014-05-22 Impact factor: 16.971

2. Novel cis-acting element within the capsid-coding region enhances flavivirus viral-RNA replication by regulating genome cyclization.

Authors: Zhong-Yu Liu; Xiao-Feng Li; Tao Jiang; Yong-Qiang Deng; Hui Zhao; Hong-Jiang Wang; Qing Ye; Shun-Ya Zhu; Yang Qiu; Xi Zhou; E-De Qin; Cheng-Feng Qin
Journal: J Virol Date: 2013-04-10 Impact factor: 5.103

3. Fast gapped-read alignment with Bowtie 2.

Authors: Ben Langmead; Steven L Salzberg
Journal: Nat Methods Date: 2012-03-04 Impact factor: 28.547

Review 4. Virus meets host microRNA: the destroyer, the booster, the hijacker.

Authors: Yang Eric Guo; Joan A Steitz
Journal: Mol Cell Biol Date: 2014-07-21 Impact factor: 4.272

5. Characterization of cis-Acting RNA Elements of Zika Virus by Using a Self-Splicing Ribozyme-Dependent Infectious Clone.

Authors: Zhong-Yu Liu; Jiu-Yang Yu; Xing-Yao Huang; Hang Fan; Xiao-Feng Li; Yong-Qiang Deng; Xue Ji; Meng-Li Cheng; Qing Ye; Hui Zhao; Jian-Feng Han; Xiao-Ping An; Tao Jiang; Bo Zhang; Yi-Gang Tong; Cheng-Feng Qin
Journal: J Virol Date: 2017-10-13 Impact factor: 5.103

6. Structural basis for microRNA targeting.

Authors: Nicole T Schirle; Jessica Sheu-Gruttadauria; Ian J MacRae
Journal: Science Date: 2014-10-31 Impact factor: 47.728

7. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding.

Authors: Aleksandra Helwak; Grzegorz Kudla; Tatiana Dudnakova; David Tollervey
Journal: Cell Date: 2013-04-25 Impact factor: 41.582

8. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo.

Authors: Silvi Rouskin; Meghan Zubradt; Stefan Washietl; Manolis Kellis; Jonathan S Weissman
Journal: Nature Date: 2013-12-15 Impact factor: 49.962

9. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy.

Authors: Tom Smith; Andreas Heger; Ian Sudbery
Journal: Genome Res Date: 2017-01-18 Impact factor: 9.043

10. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data.

Authors: Aaron T L Lun; Gordon K Smyth
Journal: BMC Bioinformatics Date: 2015-08-19 Impact factor: 3.169

46 in total

Review 1. lncRedibly versatile: biochemical and biological functions of long noncoding RNAs.

Authors: Emily J Shields; Ana F Petracovici; Roberto Bonasio
Journal: Biochem J Date: 2019-04-10 Impact factor: 3.857

2. RNA base-pairing complexity in living cells visualized by correlated chemical probing.

Authors: Anthony M Mustoe; Nicole N Lama; Patrick S Irving; Samuel W Olson; Kevin M Weeks
Journal: Proc Natl Acad Sci U S A Date: 2019-11-19 Impact factor: 11.205

3. Zika Virus Subgenomic Flavivirus RNA Generation Requires Cooperativity between Duplicated RNA Structures That Are Essential for Productive Infection in Human Cells.

Authors: Horacio M Pallarés; Guadalupe Soledad Costa Navarro; Sergio M Villordo; Fernando Merwaiss; Luana de Borba; Maria M Gonzalez Lopez Ledesma; Diego S Ojeda; Annabelle Henrion-Lacritick; Maria A Morales; Cintia Fabri; María Carla Saleh; Andrea V Gamarnik
Journal: J Virol Date: 2020-08-31 Impact factor: 5.103