Protocol for using TRIBE to study RNA-protein interactions and nuclear organization in mammalian cells.

Jeetayu Biswas¹, Michael Rosbash², Robert H Singer¹,³,⁴, Reazur Rahman².

Abstract

Targets of RNA-binding proteins discovered by editing (TRIBE) determines RNA-protein interactions and nuclear organization with minimal false positives. We detail the steps for performing mammalian cell RBP-TRIBE, to determine the targets of RNA-binding proteins, and MS2-TRIBE, to determine RNA-RNA interactions within the nucleus. The protocol starts with plasmid and cell line generation, cellular transfection and RNA sequencing library preparation, and concludes with bioinformatic analysis of RNA editing sites and identification of target RNAs. For complete details on the use and execution of this protocol, please refer to Biswas et al. (2020).

Entities: Chemical

Keywords: Bioinformatics; Flow Cytometry/Mass Cytometry; Gene Expression; Genomics; Molecular Biology; RNAseq; Sequencing

Substances:
RNA-Binding Proteins
RNA

Year: 2021 PMID: 34258595 PMCID: PMC8255943 DOI: 10.1016/j.xpro.2021.100634

Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667

Before you begin

Clone the RBP of interest into the TRIBE backbone plasmid and confirm expression. Identify cell lines of interest for RBP-ADAR expression, or prepare an MS2-integrated cell line for MCP-ADAR expression. Install and compile the dependencies for computational analysis, or use the prepared virtual machine (Zenodo: https://doi.org/10.5281/zenodo.4567690).

Clone RNA-binding protein of interest into TRIBE plasmid backbone

Timing: 1–2 weeks

Obtain the necessary TRIBE plasmids containing hyperactive ADAR.
A control mammalian expression plasmid containing mCherry-ADAR and a p2A-GFP reporter can be obtained from Addgene (Plasmid #154786).
For RBP-TRIBE, PCR amplify the RBP of interest from a cDNA library, synthesize the sequence of interest, or purchase a plasmid containing the RBP from Addgene.
For MS2-TRIBE, the MCP-ADAR plasmid can be obtained from Addgene (Plasmid #154787).
Hyperactive ADAR is highly recommended (Kuttan and Bass, 2012; Xu et al., 2018; Herzog et al., 2020; Nguyen et al., 2020). It detects editing events with higher efficiency, which minimizes the false negative rate and reduces the minimum sequencing depth required per sample.
Clone the RBP of interest into the TRIBE plasmid using standard restriction enzyme-based cloning or Gibson assembly.
mCherry-ADAR has NotI and BamHI sites flanking mCherry, so mCherry can be replaced by the RBP.
MCP-ADAR has NotI and KpnI sites flanking MCP.
Confirm proper insertion and reading frame by Sanger sequencing.
Prepare high-quality, transfection-grade plasmid using an appropriate midiprep or maxiprep kit.
After transient transfection into the target cells of interest, confirm expression of the RBP-ADAR fusion construct by GFP fluorescence (Figure 1).
Perform a western blot for the ADAR fusion protein using antibodies against the RBP or against the V5 tag at the C terminus of ADAR.

Figure 1

Experimental design and validation of TRIBE constructs

Top Left - Samples required and example of transient transfection to validate expression of TRIBE constructs. Bottom Left - Control and experimental plasmid design, highlighting features of the mammalian HyperTRIBE plasmids. Right - Microscope images of cells transfected with the mCherry-ADAR control plasmid to validate expression and calculate transfection efficiency. Scale bar is 200 pixels wide (35 μm).

CRITICAL: The RBP must be in frame with the downstream catalytic domain of ADAR, which means that the endogenous stop codon of the RBP of interest must be excluded. Any error in the cloning frame will disrupt the downstream p2A-GFP, and GFP fluorescence will not be observed.
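The in-frame requirement can be checked in silico before sending constructs for Sanger sequencing. The following is a minimal Python sketch. It assumes the input is the RBP coding sequence from the ATG through the endogenous stop codon. `prepare_rbp_insert` is a hypothetical helper and is not part of the HyperTRIBE scripts. The sketch does not check linker or restriction-site bases added during cloning, and those must also preserve the frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def prepare_rbp_insert(cds):
    """Return the RBP coding sequence with its stop codon removed, ready to be
    fused in frame upstream of ADAR. Raises ValueError if the frame is broken.

    Assumes `cds` runs from the ATG through the endogenous stop codon; linker
    and restriction-site bases added during cloning are NOT checked here.
    """
    seq = "".join(cds.split()).upper()
    if not seq.startswith("ATG"):
        raise ValueError("coding sequence does not start with ATG")
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Drop the endogenous stop codon so translation reads through into ADAR.
    if codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    # Any remaining in-frame stop would truncate the fusion before ADAR and p2A-GFP.
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            raise ValueError(f"in-frame stop codon {codon} at codon {index + 1}")
    return "".join(codons)


if __name__ == "__main__":
    # Toy sequence for illustration only: ATG GCT AAA followed by a TGA stop.
    print(prepare_rbp_insert("ATGGCTAAATGA"))  # prints ATGGCTAAA
```

If the helper raises an error, a GFP-negative transfection is the expected outcome for that construct.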

Install and compile software dependencies

Timing: 1 day

A Unix-based system is necessary for analysis. A dedicated RNA sequencing analysis server running Red Hat Enterprise Linux or CentOS is recommended; however, any Linux distribution can be used. For convenience, all scripts necessary for processing can be found in the supplement (Data S1). We have also created a CentOS virtual machine with all dependencies pre-installed (Zenodo: https://doi.org/10.5281/zenodo.4567690).

After downloading the virtual machine files, the .vmdk or .ovf files can be opened with the following software, which is free for noncommercial use:
VMware Workstation Player: https://www.vmware.com/products/workstation-player/workstation-player-evaluation.html
Once installed, go to the menu and select Player -> File -> Open.
Select the "Centos_sequencing.ovf" file.
Allocate resources to the virtual machine depending on what is available on the host computer (see below).
Alternative software includes VirtualBox: https://www.virtualbox.org/

For analysis with the virtual machine, we recommend allocating 8 vCPUs, 64 GB of RAM and 4 TB of disk space. At least 32 GB of RAM, 2 TB of free disk space and a quad-core processor are necessary. More resources are highly recommended to process multiple samples efficiently.
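Before allocating resources, the minimums above can be compared against the host. The following is a minimal Python sketch for a Linux host. The helper names are hypothetical, and RAM and disk are reported in decimal GB/TB, so a nominal 32 GB machine should pass.

```python
import os
import shutil

# Minimum host resources stated in the protocol.
MIN_CPUS = 4          # quad-core processor
MIN_RAM_GB = 32       # GB of RAM
MIN_DISK_TB = 2       # TB of free disk space


def resource_shortfalls(cpus, ram_gb, free_disk_tb):
    """Return a list describing each minimum that is not met (empty if all are met)."""
    shortfalls = []
    if cpus < MIN_CPUS:
        shortfalls.append(f"CPU cores: {cpus} < {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        shortfalls.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if free_disk_tb < MIN_DISK_TB:
        shortfalls.append(f"free disk: {free_disk_tb:.2f} TB < {MIN_DISK_TB} TB")
    return shortfalls


def host_resources(path="~"):
    """Read core count, physical RAM (decimal GB) and free disk (decimal TB)."""
    cpus = os.cpu_count() or 1
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(os.path.expanduser(path)).free
    return cpus, ram_bytes / 1e9, free_bytes / 1e12


if __name__ == "__main__":
    problems = resource_shortfalls(*host_resources())
    print("OK" if not problems else "\n".join(problems))
```

Run it inside the virtual machine after allocation to confirm that the guest, not just the host, meets the minimums.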
To use the pipeline on a non-virtual machine, the following software must be installed in a Linux/Unix environment according to the software-specific instructions:
bedtools suite (v2.16.2, v2.26.0 or later): http://bedtools.readthedocs.io/
Bowtie2 (v2.1.0, v2.2.9): http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Cutadapt: https://cutadapt.readthedocs.io/en/stable/installation.html
HyperTRIBE software: https://github.com/rosbashlab/HyperTRIBE
MariaDB (v10.1 or later): https://downloads.mariadb.org/
Perl (v5.8.8, v5.12.5, v5.22.1): https://www.perl.org/get.html
Perl module DBI.pm (v1.631, v1.636)
Perl module DBD::mysql (v4.042)
Picard (v2.8.2): https://broadinstitute.github.io/picard/
Python (v2.7.2 or later): https://www.python.org/downloads/
SAMtools (v1.11): http://samtools.sourceforge.net/
SRA Toolkit (v2.10.8): https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
STAR (v2.7.3a): https://github.com/alexdobin/STAR
StringTie: https://ccb.jhu.edu/software/stringtie/
Trimmomatic (v0.36): http://www.usadellab.org/cms/?page=trimmomatic

CRITICAL: Software installation on Linux machines should be performed by someone who is comfortable and knowledgeable with the command line. Improper commands, or careless copying and pasting, can cause irreversible damage to the operating system. We highly recommend using the prepared virtual machine (Zenodo: https://doi.org/10.5281/zenodo.4567690), but we have also included software installation instructions below.

Detailed software installation instructions:
Open the terminal and create an "RNA" subfolder in your home directory (e.g., /home/jbiswas/RNA). All the software and scripts will be installed here. Navigate to that folder.
Download HyperTRIBE.tar.gz from the supplement (Data S1) or the Zenodo archive into this folder and unpack it:
cd ~/RNA
tar -xvzf HyperTRIBE.tar.gz

Update instructions for Ubuntu:
apt-get update
apt-get upgrade
apt-get install build-essential
Update instructions for CentOS/Red Hat:
yum check-update
yum update
yum groupinstall 'Development Tools'

Alternatively, the scripts can be downloaded from the HyperTRIBE GitHub (https://github.com/rosbashlab/HyperTRIBE):
git clone https://github.com/rosbashlab/HyperTRIBE
Git may need to be installed first. For Ubuntu:
apt-get install git
For CentOS/Red Hat:
yum install git

Perl (v5.8.8, v5.12.5, v5.22.1): https://www.perl.org/get.html
All tested versions of Perl are listed here, but newer versions should work as well. Open the terminal and type the following:
curl -L http://xrl.us/installperlnix | bash
The following may be needed before installing Perl. For Ubuntu:
apt install curl
apt install make
For CentOS/Red Hat:
yum install curl
yum install make

Python (v2.7.2 or later): https://www.python.org/downloads/
For Ubuntu:
apt update
apt install software-properties-common
add-apt-repository ppa:deadsnakes/ppa
apt install python3.8
For CentOS/Red Hat:
yum update -y
yum install -y python3

Perl modules DBI.pm (v1.631, v1.636) and DBD::mysql (v4.042):
cpan DBI (continue with the default parameters)
In CentOS/Red Hat, the following may need to be installed before proceeding:
yum install perl-CPAN
yum install gcc
cpan YAML
cpan DBD::mysql (this may need to be run after installing MariaDB)

bedtools suite (v2.16.2, v2.26.0 or later): http://bedtools.readthedocs.io/
For Ubuntu, the following command can be used:
apt install bedtools
Alternatively, build from source:
wget https://github.com/arq5x/bedtools2/releases/download/v2.29.1/bedtools-2.29.1.tar.gz
tar -zxvf bedtools-2.29.1.tar.gz
cd bedtools2
make
The following may be needed. For Ubuntu:
apt-get install g++
For CentOS/Red Hat:
yum groupinstall 'Development Tools'
yum install bzip2-devel
yum install xz-devel
yum install zlib-devel

SAMtools (v1.11): http://www.htslib.org/
Download samtools 1.11 from the htslib website and unpack it:
tar -vxjf samtools-1.11.tar.bz2
cd samtools-1.11
In Ubuntu, the following may be needed before moving forward:
apt-get install libncurses-dev
apt-get install zlib1g-dev
apt-get install libbz2-dev
apt-get install liblzma-dev
In CentOS/Red Hat, the following may be needed before moving forward:
yum install ncurses-devel ncurses
yum install bzip2-devel
yum install xz-devel
yum install curl-devel
Then compile and install:
make
make install

Cutadapt: https://cutadapt.readthedocs.io/en/stable/installation.html
On Ubuntu:
apt install cutadapt
On CentOS/Red Hat (requires python3):
yum install epel-release
yum install python3-pip
python3 -m pip install --upgrade pip
python3 -m pip install --user --upgrade cutadapt

Trimmomatic (v0.36): http://www.usadellab.org/cms/?page=trimmomatic
Download the binary release and unzip it. Trimmomatic is distributed as a Java .jar file and does not need to be compiled.

STAR (v2.7.3a): https://github.com/alexdobin/STAR
On Ubuntu, the following may be needed before proceeding:
apt-get update
apt-get install g++
apt-get install make
apt-get install libz-dev
Then download and compile:
wget https://github.com/alexdobin/STAR/archive/2.7.3a.tar.gz
tar -xzf 2.7.3a.tar.gz
cd STAR-2.7.3a/source
make STAR

Bowtie2 (v2.1.0, v2.2.9): http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Download the source file and unzip the folder into the RNA directory. Move into the folder with the cd command and run "make" to compile everything. The following may be needed before installing Bowtie2:
apt-get install libbz2-dev

SRA Toolkit (v2.10.8): https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
Untar the archive (tar -xvf sratool…), cd into the directory, then cd bin and configure the toolkit.
On Ubuntu:
vdb-config --interactive
On CentOS/Red Hat:
vdb-config -i
Other versions of this software should work as well.

Picard (v2.8.2): https://broadinstitute.github.io/picard/
Install Java and confirm the installation:
apt install default-jre
java -version
Download the .jar file from the website and place it in the RNA folder.
StringTie: https://ccb.jhu.edu/software/stringtie/
git clone https://github.com/gpertea/stringtie
cd stringtie
make release

MariaDB (v10.1 or later): https://downloads.mariadb.org/ (see also https://hypertribe.readthedocs.io/en/latest/mariadb.html)
For CentOS/Red Hat:
yum update
yum install mariadb-server
yum install mariadb-devel
systemctl enable mariadb
systemctl start mariadb
mysql_secure_installation
For Ubuntu:
apt-get install software-properties-common
apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
add-apt-repository 'deb [arch=amd64,i386,ppc64el] http://mirror.jmu.edu/pub/mariadb/repo/10.1/ubuntu xenial main'
apt update -y
apt install -y mariadb-server
mysql -V (confirms installation)
apt-get install -y libmariadbclient-dev
systemctl start mariadb.service
systemctl enable mariadb.service
/usr/bin/mysql_secure_installation
Answer the mysql_secure_installation prompts as follows:
Enter current password for root (enter for none): (press Enter)
Set root password? [Y/n]: Y
New password: !htribe01
Re-enter new password: !htribe01
Remove anonymous users? [Y/n]: Y
Disallow root login remotely? [Y/n]: Y
Remove test database and access to it? [Y/n]: Y
Reload privilege tables now? [Y/n]: Y
Confirm that you can log in to MySQL:
mysql -h localhost -u root -p
While in MySQL, create a database for each genome:
CREATE DATABASE hg38;
CREATE DATABASE mm10;

Add the software directories to your PATH, replacing the username and directories with those present on your own system.
Open the terminal and edit the shell configuration file:
gedit ~/.bashrc
Add the following lines to the end of the file. Change the path to each program and replace jbiswas with your username. The changes take effect after you exit and re-open the terminal (or run source ~/.bashrc).
Bedtools:
export PATH=$PATH:/home/jbiswas/RNA/bedtools2/bin
Bowtie2, for Ubuntu:
export PATH=$PATH:/home/jbiswas/RNA/bowtie2-2.4.2-linux-x86_64
Bowtie2, for CentOS/Red Hat:
export PATH=$PATH:/home/jbiswas/RNA/bowtie2-2.2.9
Picard:
export PICARD=/home/jbiswas/RNA/picard.jar
alias picard="java -jar $PICARD"
STAR:
export PATH=$PATH:/home/jbiswas/RNA/STAR-2.7.3a/bin/Linux_x86_64/
SAMtools:
export PATH=$PATH:/home/jbiswas/RNA/samtools-1.11
StringTie:
export PATH=$PATH:/home/jbiswas/RNA/stringtie
SRA Toolkit, for Ubuntu:
export PATH=$PATH:/home/jbiswas/RNA/sratoolkit.2.10.8-ubuntu64/bin
SRA Toolkit, for CentOS/Red Hat:
export PATH=$PATH:/home/jbiswas/RNA/sratoolkit.2.10.9-centos_linux64/bin

The following annotation files will be used for downstream analysis and can be downloaded from their corresponding databases. For convenience, Illumina hosts the latest annotation files on its iGenomes website. We used UCSC hg38 for Homo sapiens and UCSC mm10 for Mus musculus: https://support.illumina.com/sequencing/sequencing_software/igenome.html
In your RNA directory, create an index subfolder:
mkdir index
Once the iGenomes file has been downloaded, move it to the index subfolder of your RNA directory and unpack it:
tar -xvf "nameoffilegoeshere.tar.gz"
Note the following locations for use in the later scripts. Mouse paths are shown; for human, replace Mus_musculus with Homo_sapiens and mm10 with hg38.
Genome sequence file: /home/jbiswas/RNA/index/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa
Genome index file: /home/jbiswas/RNA/index/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa.fai
Gene features file: /home/jbiswas/RNA/index/Mus_musculus/UCSC/mm10/Annotation/Archives/archive-current/Genes/genes.gtf

Create a STAR index from the genome sequence file and gene features file above. Because the full paths to genome.fa and genes.gtf are given, these files do not need to be copied. If needed, first raise the open file limit, then create the output directory and generate the index.
ulimit -n 30000
For mouse:
mkdir -p /home/jbiswas/RNA/index/mm10_star_index
STAR --runMode genomeGenerate --runThreadN 8 --genomeDir /home/jbiswas/RNA/index/mm10_star_index --genomeFastaFiles /home/jbiswas/RNA/index/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa --sjdbGTFfile /home/jbiswas/RNA/index/Mus_musculus/UCSC/mm10/Annotation/Archives/archive-current/Genes/genes.gtf --outFileNamePrefix /home/jbiswas/RNA/index/mm10_star_index/
For human:
mkdir -p /home/jbiswas/RNA/index/hg38_star_index
STAR --runMode genomeGenerate --runThreadN 8 --genomeDir /home/jbiswas/RNA/index/hg38_star_index --genomeFastaFiles /home/jbiswas/RNA/index/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa --sjdbGTFfile /home/jbiswas/RNA/index/Homo_sapiens/UCSC/hg38/Annotation/Archives/archive-current/Genes/genes.gtf --outFileNamePrefix /home/jbiswas/RNA/index/hg38_star_index/
Adjust --runThreadN to the number of cores available on your machine.

Useful path definitions for human samples:
star_indices="/home/jbiswas/RNA/index/hg38_star_index"
TRIMMOMATIC_JAR="/home/jbiswas/RNA/TRIMMOMATIC/0.39/trimmomatic.jar"
PICARD_JAR="/home/jbiswas/RNA/picard.jar"
GENOME_FAI_FILE="/home/jbiswas/RNA/index/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa.fai"
gtf_file="/home/jbiswas/RNA/index/Homo_sapiens/UCSC/hg38/Annotation/Archives/archive-current/Genes/genes.gtf"
Useful path definitions for mouse samples:
star_indices="/home/jbiswas/RNA/index/mm10_star_index"
TRIMMOMATIC_JAR="/home/jbiswas/RNA/TRIMMOMATIC/0.39/trimmomatic.jar"
PICARD_JAR="/home/jbiswas/RNA/picard.jar"
GENOME_FAI_FILE="/home/jbiswas/RNA/index/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa.fai"
gtf_file="/home/jbiswas/RNA/index/Mus_musculus/UCSC/mm10/Annotation/Archives/archive-current/Genes/genes.gtf"
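The path definitions above follow a single iGenomes layout, so they can be generated rather than typed. This also avoids curly quotes slipping into shell variables. The following is a minimal Python sketch that prints the definitions for one build and reports missing files and missing executables. `ROOT` is the example user's directory from this protocol. `fasterq-dump` is assumed to be the SRA Toolkit executable to check for, and the helper names are hypothetical.

```python
import os
import shutil

# Layout used in this protocol; replace ROOT with your own RNA directory.
ROOT = "/home/jbiswas/RNA"
SPECIES = {"hg38": "Homo_sapiens", "mm10": "Mus_musculus"}
# Executables expected on PATH after the ~/.bashrc edits.
TOOLS = ["bedtools", "bowtie2", "STAR", "samtools", "stringtie",
         "cutadapt", "fasterq-dump", "java", "perl", "mysql"]


def path_definitions(build, root=ROOT):
    """Return the path variables for one genome build (hg38 or mm10)."""
    base = f"{root}/index/{SPECIES[build]}/UCSC/{build}"
    return {
        "star_indices": f"{root}/index/{build}_star_index",
        "TRIMMOMATIC_JAR": f"{root}/TRIMMOMATIC/0.39/trimmomatic.jar",
        "PICARD_JAR": f"{root}/picard.jar",
        "GENOME_FAI_FILE": f"{base}/Sequence/WholeGenomeFasta/genome.fa.fai",
        "gtf_file": f"{base}/Annotation/Archives/archive-current/Genes/genes.gtf",
    }


def shell_lines(defs):
    """Format definitions as shell assignments with straight double quotes."""
    return [f'{name}="{value}"' for name, value in defs.items()]


if __name__ == "__main__":
    defs = path_definitions("mm10")
    print("\n".join(shell_lines(defs)))
    missing_paths = [p for p in defs.values() if not os.path.exists(p)]
    missing_tools = [t for t in TOOLS if shutil.which(t) is None]
    print("missing paths:", missing_paths or "none")
    print("missing tools:", missing_tools or "none")
```

Running it in a fresh terminal, after `~/.bashrc` has been reloaded, shows whether the PATH edits took effect.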

Identify cell line of interest or generate MS2 tagged cell line

Timing: 1 day to 1 month

The generation of stable cell lines constitutively expressing RBP-E488Q-ADAR fusions has been challenging (see limitations section). This can be circumvented by transient expression, using lipid-based transfection, nucleofection or other equivalent methods. In the future, inducible systems may be employed to facilitate stable line generation.
If performing TRIBE on RBPs other than MCP, we recommend using cell lines that are readily transfectable.
For MS2-TRIBE, cell lines containing the MS2 stem loops are required. The available lines are detailed below.
Use previously published MS2 cell lines or mouse models, which are available on request from the respective publications and laboratories:
β-Actin (Lionnet et al., 2011) and Arc (Das et al., 2018) MS2 mouse models have been generated.
Random integration libraries have been generated using retroviral central dogma (CD) tagging in human osteosarcoma (U2OS) cells (Sheinberger et al., 2017) and a splice acceptor gene trap in human bronchial epithelial (HBEC) cells (Wan et al., 2021).
Other stable cell lines have been developed in which the MS2 stem loops were targeted to a specific gene or reporter:
Immortalized mouse embryonic fibroblasts from the β-Actin MS2 mouse (Lionnet et al., 2011).
Human osteosarcoma (U2OS) cells with MS2-tagged p21 (Carvajal et al., 2018) or a doxycycline-inducible reporter RNA (Janicki et al., 2004).
Mouse embryonic stem cells (mESCs) with MS2-tagged Esrrb (Cho et al., 2018; Spille et al., 2019).
Generate an MS2-tagged cell line with CRISPR-Cas9. Previously, mouse models and cell lines were generated with gene targeting and homologous recombination; this process has been made easier by the widespread use of CRISPR-Cas9 homology-directed repair (Spille et al., 2019).
For genome-integrated MS2 lines, a modified genome index should be generated.
This makes the MS2 stem loops and surrounding gene regions available for mapping during RNA sequencing analysis.
CRITICAL: The nucleotide sequence of the MS2 integration site must be known for downstream analysis. The integration site should be sequenced for confirmation. Both FASTA and gene transfer format (GTF) files containing the sequence of the integration site and the surrounding 2 kb should be generated.
Use the published sequence of the MS2 stem loops and the surrounding 2 kb to generate a FASTA file. See expected outcomes for an example of the FASTA file format.
Several MS2 sequences are available as plasmids on Addgene (Plasmid #s: 31865, 40651, 84561, 102718, 104391). MS2V5 (Addgene Plasmid #s 84561, 102718) or MS2V6 (Tutucci et al., 2018; Addgene Plasmid #104391) are recommended for use in mammalian cells.
If using a previously published MS2-integrated cell line, use the specific stem loop sequence and location of integration.
If generating a novel MS2-integrated cell line, MS2 sequence integration can be validated with either 3′/5′ RACE or next-generation DNA sequencing.
Use the known gene annotations (start codon, exon, CDS, stop codon) to generate a GTF file for the MS2-integrated locus. See expected outcomes Table 1 for an example.
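Writing the custom chromosome FASTA is simple, but the header name must match the seqname used in the GTF (chrZ in Table 1). The following is a minimal Python sketch. It assumes the flanking sequence and stem loops are joined in plus-strand order and wraps the sequence at 60 bases per line. The sequences in the example are placeholders, not real MS2 or genomic sequence, and `format_fasta` is a hypothetical helper.

```python
def format_fasta(name, sequence, width=60):
    """Return a single-record FASTA string with fixed-width sequence lines."""
    seq = "".join(sequence.split()).upper()
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholders only: substitute the sequenced 2 kb flanks and the
    # published MS2 stem-loop sequence for your integration site.
    upstream_2kb = "ACGT" * 5
    ms2_loops = "NNNN" * 5
    downstream_2kb = "TTGCA" * 4
    print(format_fasta("chrZ", upstream_2kb + ms2_loops + downstream_2kb, width=20))
```

Save the output as chrZ.fa so that it pairs with chrZ_genes.gtf in the concatenation steps.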

Table 1

Example Gene Transfer Format (GTF) file for custom genomes (tab separated text file without headers)

Seqname	Source	Feature	Start	End	Score	Strand	Frame	Attribute
chrZ	unknown	start_codon	1	3	.	+	.	gene_id "Actb_ms2"; gene_name "Actb_ms2"; p_id "P111304811"; transcript_id "NM_111304811"; tss_id "TSS111304811";
chrZ	unknown	CDS	5	15587	.	+	0	gene_id "Actb_ms2"; gene_name "Actb_ms2"; p_id "P111304811"; transcript_id "NM_111304811"; tss_id "TSS111304811";
chrZ	unknown	exon	5	15587	.	+	.	gene_id "Actb_ms2"; gene_name "Actb_ms2"; p_id "P111304811"; transcript_id "NM_111304811"; tss_id "TSS111304811";
chrZ	unknown	stop_codon	15591	15593	.	+	.	gene_id "Actb_ms2"; gene_name "Actb_ms2"; p_id "P111304811"; transcript_id "NM_111304811"; tss_id "TSS111304811";
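Each GTF record needs exactly nine tab-separated columns, and the attribute pairs must be written as `key "value";`. A hand-edited file can easily lose a tab. The following is a minimal Python sketch that rebuilds the Table 1 rows, using the same tss_id on every row. `gtf_line` is a hypothetical helper, and coordinates are 1-based and inclusive, as in GTF.

```python
def gtf_line(seqname, feature, start, end, strand, frame, attributes,
             source="unknown", score="."):
    """Build one tab-separated GTF record (9 columns, 1-based inclusive coordinates)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates {start}-{end}")
    if strand not in {"+", "-", "."}:
        raise ValueError(f"invalid strand {strand!r}")
    attribute_text = " ".join(f'{key} "{value}";' for key, value in attributes.items())
    fields = [seqname, source, feature, str(start), str(end), score, strand, frame, attribute_text]
    return "\t".join(fields)


if __name__ == "__main__":
    ids = {"gene_id": "Actb_ms2", "gene_name": "Actb_ms2", "p_id": "P111304811",
           "transcript_id": "NM_111304811", "tss_id": "TSS111304811"}
    records = [
        gtf_line("chrZ", "start_codon", 1, 3, "+", ".", ids),
        gtf_line("chrZ", "CDS", 5, 15587, "+", "0", ids),
        gtf_line("chrZ", "exon", 5, 15587, "+", ".", ids),
        gtf_line("chrZ", "stop_codon", 15591, 15593, "+", ".", ids),
    ]
    print("\n".join(records))
```

Redirecting the output to chrZ_genes.gtf produces a file without a header line, as the Table 1 caption specifies.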

After running the rnaedit_wtRNA_RNA.sh script, individual files containing editing sites from each replicate are generated:

Download the latest genome index and GTF files from Illumina iGenomes (mouse mm10, human hg38).
Concatenate the MS2 FASTA file (chrZ.fa) with the reference genome FASTA file from iGenomes (e.g., mm10.fa) to generate a modified genome FASTA (mm10_chrZ.fa):
cat mm10.fa chrZ.fa > mm10_chrZ.fa
Concatenate the MS2 GTF file (chrZ_genes.gtf) to the reference genome GTF file (e.g., mm10_genes.gtf) to generate a modified GTF (mm10_chrZ_genes.gtf):
cat mm10_genes.gtf chrZ_genes.gtf > mm10_chrZ_genes.gtf
Create a bowtie index:
nohup bowtie-build mm10_chrZ.fa mm10_chrZ &
Create a bowtie2 index:
nohup bowtie2-build mm10_chrZ.fa mm10_chrZ &
Create an fai index using SAMtools:
samtools faidx mm10_chrZ.fa
Create a STAR index:
STAR --runMode genomeGenerate --runThreadN 8 --genomeDir star_indices --genomeFastaFiles mm10_chrZ.fa --sjdbGTFfile mm10_chrZ_genes.gtf
Preview the files in the Integrative Genomics Viewer (IGV) to confirm that the new MS2 chromosome appears and has associated annotations.
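Alongside the IGV check, the combined files can be compared directly. Every seqname used in the modified GTF should appear in the modified genome. Reading names from the samtools .fai index (first column) is much faster than scanning the whole FASTA. The following is a minimal Python sketch that uses the file names from the commands above; the helper names are hypothetical.

```python
import os


def fai_names(lines):
    """Sequence names from a samtools .fai index (first tab-separated column)."""
    return {line.split("\t", 1)[0] for line in lines if line.strip()}


def gtf_seqnames(lines):
    """Sequence names used in a GTF file, skipping comments and blank lines."""
    return {line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")}


def unmatched_seqnames(fai_lines, gtf_lines):
    """GTF sequence names that are absent from the genome FASTA index."""
    return sorted(gtf_seqnames(gtf_lines) - fai_names(fai_lines))


if __name__ == "__main__":
    fai_path, gtf_path = "mm10_chrZ.fa.fai", "mm10_chrZ_genes.gtf"
    if os.path.exists(fai_path) and os.path.exists(gtf_path):
        with open(fai_path) as fai, open(gtf_path) as gtf:
            fai_lines, gtf_lines = fai.readlines(), gtf.readlines()
        print("chrZ in genome:", "chrZ" in fai_names(fai_lines))
        print("GTF names missing from genome:",
              unmatched_seqnames(fai_lines, gtf_lines) or "none")
```

An unmatched name usually means a FASTA header and a GTF seqname disagree (for example, chrZ versus chr_Z).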

Key resources table

Materials and equipment

FACS buffer should be made fresh in a sterile tissue culture hood and stored at 4°C until the cells are ready to harvest.

Step-by-step method details

Cell culture and transfection

Timing: 1 day

ADAR fusion plasmids are introduced into cells by transient transfection. The following steps were optimized for adherent mouse embryonic fibroblasts and human osteosarcoma cells but may need to be adjusted for other cell types. We recommend transient transfection of 0.5–1 million cells and FACS sorting of 10k GFP+ DAPI– cells. This yields an adequate amount of RNA (>100 ng) per sample before moving on to library preparation.
For each sample, prepare the following 10 cm dishes:
3 × 10 cm dishes for the mCherry-ADAR control.
3 × 10 cm dishes for the RBP-ADAR fusion.
One dish of un-transfected or mock-transfected cells as a negative control. This final dish is useful for FACS gating and for downstream analysis of SNPs present in the parental cell line.
RNA isolation is performed in triplicate for each sample. The extra replicate serves as a backup in case one sample has low RNA integrity or library preparation fails. While the extra sample can be omitted, we have found it helpful for those attempting the protocol for the first time.
All dishes should be passaged in the same manner and for an equal number of passages, which minimizes dish-to-dish variability. This consistency allows RNA editing and gene expression to be compared confidently between dishes (and, eventually, between RNA sequencing libraries).
Culture the cells of interest using standard sterile tissue culture protocols; refer to American Type Culture Collection (ATCC) standards if needed. For our TRIBE experiments, mammalian cells are cultured and maintained in a humidified 37°C incubator with 5% CO2. Mouse embryonic fibroblasts (MEFs) and human osteosarcoma (U2OS) cells are cultured in DMEM (4.5 g/L glucose, with glutamine) supplemented with 1% penicillin-streptomycin (P/S) and 10% heat-inactivated fetal bovine serum. After thawing cell lines, passage the cells at least twice before using them for experiments.
Cell passages should be tracked, and cells with minimal passage number should be used.
For each RBP of interest, transfect the appropriate number of cells (see notes above). For adherent MEFs or U2OS cells, we use triplicate 10 cm dishes (70–80% confluence gives the best results). Follow best practices for the transfection reagent of choice, and refer to troubleshooting problem 1 if issues occur.
Before beginning the next step, check the transfection efficiency of each dish using GFP fluorescence. For mCherry-ADAR control plasmids, also check RFP fluorescence. The ADAR p2A-GFP design helps confirm that the reading frame of the RBP upstream of ADAR is correct.
If transfection efficiency is high (>75%, e.g., HEK cell transfection), it is possible to omit the following FACS sorting step and proceed directly to RNA isolation. If the cells are not uniformly transfected, unedited RNA from un-transfected cells will dilute the RNA editing levels below the threshold of detection.
Prolonged expression of RNA editing constructs can lead to increasing cell death. Therefore, transfection should be performed the day before RNA isolation, and cells should be sorted within 24 h of transfection, because editing reaches steady state within that period (Biswas et al., 2020).
The following transfection reagents have been tested and found to be effective:
HEK293T cells (70–90% transfection efficiency): calcium phosphate, Lipofectamine 2000, Lipofectamine 3000, JetPrime (Polyplus), JetOptimus (Polyplus).
SV40 mouse embryonic fibroblasts (5–10% transfection efficiency) and human U2OS cells (25–50% transfection efficiency): JetPrime (Polyplus), JetOptimus (Polyplus).
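The dish layout and the low efficiencies of some cell types determine whether the 10,000-cell sorting target is realistic. The following is a minimal Python planning sketch. It assumes one shared set of mCherry-ADAR control dishes per experiment. `live_fraction` is a hypothetical viability estimate that is not given in this protocol, and the helper names are also hypothetical.

```python
TARGET_CELLS = 10_000   # sorting target per sample
MINIMUM_CELLS = 1_000   # minimum per sample


def dishes_needed(n_rbps, replicates=3):
    """10 cm dishes: mCherry-ADAR control plus each RBP-ADAR fusion in replicate,
    plus one un-transfected (or mock) dish. Assumes a single shared control set."""
    return replicates + replicates * n_rbps + 1


def expected_sortable_cells(cells_plated, transfection_efficiency, live_fraction):
    """Rough count of GFP+ DAPI- cells available to sort from one dish."""
    return round(cells_plated * transfection_efficiency * live_fraction)


def sort_status(cells):
    """Compare an expected cell count with the sorting target and minimum."""
    if cells >= TARGET_CELLS:
        return "meets target"
    if cells >= MINIMUM_CELLS:
        return "above minimum only"
    return "insufficient"


if __name__ == "__main__":
    print(dishes_needed(n_rbps=2))
    # 1 million MEFs at 5% efficiency, assuming 80% remain alive at sorting.
    n = expected_sortable_cells(1_000_000, 0.05, 0.8)
    print(n, sort_status(n))
```

Measured GFP+ and DAPI– fractions from the FACS gates (Figure 2) can replace the assumed values in later experiments.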

FACS sorting

Timing: 4 h

For cell lines with lower transfection efficiency, FACS sorting allows isolation of live cells (DAPI negative) transfected with ADAR (GFP positive). Sorting is performed directly into TRIzol or an equivalent RNA isolation buffer. The following steps were optimized for adherent mouse embryonic fibroblasts (MEFs) and human osteosarcoma cells (U2OS) but may need to be adjusted for other cell types. We recommend FACS sorting of ∼10k GFP+ DAPI– cells so that >100 ng of RNA can be acquired per sample.
CRITICAL: TRIzol can cause serious chemical burns and is toxic upon inhalation. Take appropriate precautions and handle the reagent within a chemical hood or class II biosafety cabinet.
Before removing cells (U2OS, MEFs) from the incubator, prepare materials:
Place FACS buffer inside the tissue culture hood.
Pre-warm tissue culture materials (tissue culture media, DPBS, trypsin) to 37°C.
Pre-label 15 mL Falcon tubes, one corresponding to each FACS sample.
Prepare sterile 1.5 mL Eppendorf tubes corresponding to the number of FACS samples. Place 500 μL of TRIzol in each tube and keep the tubes on ice until ready to use.
For each 10 cm dish, place one sterile 15 mL Falcon tube in the tissue culture hood.
Dissociate cultured cells from their 10 cm plates:
Wash each plate twice with 7 mL pre-warmed DPBS.
Use 1 mL pre-warmed trypsin to dissociate cells.
Place the cells in the incubator to complete dissociation; gently tap the dish to promote detachment.
Re-suspend the dissociated cells in complete media (with FBS) to neutralize trypsin activity, and place each sample into a pre-labeled 15 mL conical Falcon tube.
Preparation of FACS samples: prepare three samples for each RBP-ADAR fusion, three samples for the mCherry-ADAR control and one sample of un-transfected cells as a negative control (Figure 1).
Spin the re-suspended cells gently for 5 min at 500 × g.
Remove the supernatant, being careful not to disturb the cell pellets.
Re-suspend the cell pellets in 200–500 μL of FACS buffer.
Pass the re-suspended cells through a mesh strainer to promote dissociation. The density of cells and the size of the cell pellet depend on cell type. It may be necessary to optimize the volume of FACS buffer so that sorting proceeds efficiently.
Pause point: Cells can remain on ice for one hour prior to sorting. However, prolonged incubation on ice may increase cell death, decreasing RNA yield and quality.
Keep FACS samples on ice and proceed immediately to FACS sorting. Aim for 10,000 cells per sample, with 1,000 cells as a minimum requirement.
Use the un-transfected negative control to set gates that exclude un-transfected cells (Figure 2).
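The 500 × g spin is specified as relative centrifugal force, but many benchtop centrifuges are set in rpm. The standard conversion is RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The following is a minimal Python sketch. The 15 cm radius in the example is hypothetical; use the radius listed for your rotor.

```python
import math

# Standard relation between relative centrifugal force (x g), rotor radius and speed:
#   RCF = 1.118e-5 * r_cm * rpm**2
K = 1.118e-5


def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed (rpm) that gives `rcf` x g at `radius_cm` from the rotor axis."""
    return math.sqrt(rcf / (K * radius_cm))


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) at `rpm` for a rotor of `radius_cm`."""
    return K * radius_cm * rpm ** 2


if __name__ == "__main__":
    # Hypothetical 15 cm swing-bucket radius; use your rotor's specified radius.
    print(round(rcf_to_rpm(500, 15)))  # prints 1727
```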

Figure 2

Representative FACS plots

Plots show stepwise selection of DAPI negative, GFP positive cell population for downstream RNA isolation and library preparation. Top plots show gate definition; it is important to identify gates that will prevent un-transfected cells from being included in the final population. Associated population counts can also be used to estimate transfection efficiency.

Select single cells using forward scatter and side scatter ranges.
Select cells that are DAPI negative (alive) and GFP positive (transfected).
Ensure that the sorted FACS solution is no more than 10% of the TRIzol volume. Start with at least 500 μL of TRIzol in each tube and add more TRIzol after the sort if necessary.
Cap the tubes containing the samples in TRIzol, mix well and store on ice until ready to proceed to RNA isolation.
After sorting, TRIzol-containing cell lysates can be stored at −80°C until ready to proceed with RNA isolation.
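The 10% rule sets how much TRIzol is needed after a sort. The following is a minimal Python sketch. It reads the rule as "sorted volume ≤ 10% of the TRIzol volume". The sorted volume must be estimated from your sorter (for example, from the collected droplet volume), and the helper names are hypothetical.

```python
def min_trizol_ul(sorted_volume_ul, starting_ul=500, max_percent=10):
    """Smallest TRIzol volume (uL) such that the sorted buffer volume is at most
    `max_percent` % of the TRIzol volume, never going below the starting volume."""
    return max(starting_ul, sorted_volume_ul * 100 / max_percent)


def trizol_to_add_ul(sorted_volume_ul, starting_ul=500, max_percent=10):
    """Extra TRIzol (uL) to add to the tube after the sort."""
    return min_trizol_ul(sorted_volume_ul, starting_ul, max_percent) - starting_ul


if __name__ == "__main__":
    for sorted_ul in (40, 80):
        print(sorted_ul, min_trizol_ul(sorted_ul), trizol_to_add_ul(sorted_ul))
```

For example, 80 μL of sorted buffer needs 800 μL of TRIzol, so 300 μL is added to the starting 500 μL.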

RNA isolation and quality control

Timing: 4 h

High-quality RNA is crucial for library preparation. We routinely isolate RNA with TRIzol; however, other methods such as RNA minipreps, spin columns and direct lysis can be used to isolate total RNA. The protocol is also compatible with nascent RNA isolation (McMahon et al., 2016).
CRITICAL: TRIzol and chloroform can cause serious chemical burns. Take appropriate precautions and handle the reagents within a chemical hood.
RNA isolation using TRIzol is the preferred method when dealing with FACS-sorted cells. If material is not limiting, an RNA miniprep is an acceptable alternative.
Perform RNA isolation according to the manufacturer's instructions. Place the isolated RNA into PCR tubes.
Perform DNase digestion of the RNA samples using the TURBO DNA-free Kit (Ambion AM1907):
Add 0.1 volume of 10× DNase buffer to the isolated RNA.
Add 1 μL of DNase to the RNA and mix gently.
Incubate at 37°C (in a thermocycler) for 30 min. While incubating, thaw the DNase inactivation resin.
Inactivate with 0.1 volume of the inactivation resin (3.5 μL resin for 30 μL of RNA) and leave at room temperature for 2 min.
Spin the PCR tubes for 30 s, being careful not to disturb the resin.
Transfer as much of the RNA as possible to a fresh tube, and set aside aliquots for Qubit (2 μL into 500 μL Eppendorf tubes) and Bioanalyzer (3 μL into a PCR strip).
Measuring RNA concentration with a NanoDrop will overestimate the amount in the sample. We highly recommend more specific assays such as Qubit, so that libraries can be prepared with equal amounts of RNA input.
Measure the RNA concentration using the Qubit RNA high sensitivity assay according to the manufacturer's instructions.
Perform RNA integrity analysis using an RNA Bioanalyzer (Figure 3).
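The DNase step scales with the RNA volume. The following is a minimal Python sketch of the arithmetic. It assumes that "0.1 volume" of resin refers to the whole reaction (RNA + buffer + enzyme), which gives 3.4 μL for 30 μL of RNA; the 3.5 μL quoted above is this value rounded up. `dnase_reaction_ul` is a hypothetical helper; follow the kit insert if it differs.

```python
def dnase_reaction_ul(rna_ul, dnase_ul=1.0):
    """Volumes (uL) for the TURBO DNA-free digestion described above.

    Assumes 0.1 volume of 10x buffer relative to the RNA, a fixed 1 uL of DNase,
    and 0.1 volume of inactivation resin relative to the whole reaction.
    """
    buffer_ul = rna_ul / 10
    reaction_ul = rna_ul + buffer_ul + dnase_ul
    resin_ul = reaction_ul / 10
    return {
        "buffer_10x": round(buffer_ul, 1),
        "dnase": dnase_ul,
        "reaction": round(reaction_ul, 1),
        "inactivation_resin": round(resin_ul, 1),
    }


if __name__ == "__main__":
    print(dnase_reaction_ul(30))
```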

Figure 3

Examples of RNA integrity measured after RNA isolation

High quality RNA is required (Left). RNA integrity is evaluated using ribosomal RNA peaks which constitute the majority of RNAs within the cell. Do not proceed with samples that have a low RIN score and are degraded (Right). X-axis, fragment size in nucleotides, y-axis, arbitrary fluorescence units (FU).

After sending out samples for quality control (Qubit and Bioanalyzer), approximately 30 μL of RNA remains. Ideally, 100 ng of total RNA is used for library preparation; however, as little as 10 ng can be used. Proceed only with samples that have RIN scores > 8 (Figure 3). An Agilent TapeStation can also be used for RNA integrity analysis. Pause point: Once RNA samples have been purified, 2 μL and 3 μL aliquots can be made for Qubit and Bioanalyzer, respectively. All aliquots can be stored at −80°C until the next step.
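The input rules above (equal input across libraries, about 100 ng ideal, 10 ng minimum, RIN > 8) can be combined into a small planning helper. This is a hypothetical sketch, not part of the published scripts; the sample names and Qubit readings in the usage lines are invented.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    name: str
    conc_ng_per_ul: float  # Qubit RNA HS reading
    rin: float             # Bioanalyzer RIN
    volume_ul: float       # volume left after the QC aliquots (~30 uL)


def plan_library_input(samples, ideal_ng=100.0, min_ng=10.0, min_rin=8.0):
    """Pick one common RNA input for all samples passing the RIN cutoff (hypothetical helper).

    Returns (input_ng, {sample name: uL to use}, [rejected sample names]).
    """
    passing = [s for s in samples if s.rin > min_rin]
    rejected = [s.name for s in samples if s.rin <= min_rin]
    if not passing:
        raise ValueError("no sample passes the RIN cutoff")
    # Equal input across libraries: the sample with the least total RNA sets the cap.
    available_ng = min(s.conc_ng_per_ul * s.volume_ul for s in passing)
    input_ng = min(ideal_ng, available_ng)
    if input_ng < min_ng:
        raise ValueError(f"only {input_ng:.1f} ng available, below {min_ng} ng")
    volumes = {s.name: round(input_ng / s.conc_ng_per_ul, 2) for s in passing}
    return input_ng, volumes, rejected


samples = [
    RnaSample("RBP_rep1", 5.0, 9.1, 30),
    RnaSample("mCherry_rep1", 2.5, 8.6, 30),
    RnaSample("RBP_rep2", 10.0, 6.2, 30),  # degraded: fails the RIN cutoff
]
input_ng, volumes, rejected = plan_library_input(samples)
print(input_ng, volumes, rejected)
# 75.0 {'RBP_rep1': 15.0, 'mCherry_rep1': 30.0} ['RBP_rep2']
```

Capping at the most limited sample keeps inputs uniform, as recommended; a rejected sample would need to be re-isolated rather than carried forward.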

RNA sequencing library preparation

Timing: 1–3 days CRITICAL: Do not proceed with library preparation until sufficient, high-quality RNA samples are available. Library preparation times can vary widely depending on the number of samples and the individual pause points within the protocol. Stranded, paired-end RNA-seq library preparation kits are ideal and allow accurate mapping of transcripts. When input amounts are limiting, we have had satisfactory performance from kits with a lower required input (such as the NEBNext Ultra II Directional RNA Library Prep Kit, which can use as little as 10 ng of total RNA). Either rRNA depletion or poly(A) selection can be used to enrich mRNA from total RNA before library preparation. The choice between rRNA depletion and poly(A) selection should be guided by the model organism and the biological question. Caveats of rRNA depletion include that the depletion probes are often organism specific and that finished libraries may require greater sequencing depth to achieve equivalent coverage. Alternative mRNA-seq library preparation kits that have been used for TRIBE in the literature include the TruSeq Stranded mRNA kit from Illumina and the KAPA Stranded mRNA-Seq Kit from Roche. Depending on the amount of RNA available, different library preparation methods may be used. We have found that NEB provides robust kits and routinely use their Ultra II Directional RNA Library Prep Kit; however, many equivalent kits are available. Using the NEBNext Ultra II kit with sample purification beads, prepare stranded RNA-seq libraries according to the manufacturer's instructions. In addition to the kit, the following are required: NEB index primers, a poly(A) isolation module or rRNA depletion reagents, and a magnet suitable for PCR tubes. Isolate mRNA with oligo(dT) beads (or perform rRNA depletion). Fragment the RNA. CRITICAL: Fragmentation time is critical; follow the manufacturer's instructions to obtain the proper library size.
If using PE 150 bp reads, aim for fragments of ∼300 bp. RNA fragment size can be checked using a Bioanalyzer or TapeStation. Perform first- and second-strand synthesis (after this step the DNA is stable). Pause point: There are multiple pause points, and the protocol can be split over two or three days. DNA can be stored after second-strand synthesis, after end prep and adapter ligation, and after library cleanup. Perform end prep and adapter ligation, and choose index primers for each sample. NEB index primers each contain a specific six-nucleotide barcode sequence, and each index primer must be chosen so as not to interfere with downstream demultiplexing. If libraries are going to be sequenced in a single pool, each library within that pool must receive a unique index for downstream demultiplexing of the sequence data. Library preparation kit manufacturers sell index oligonucleotides in sets (of 12 or 24 indices). Perform PCR amplification, followed by a double library cleanup to remove primer dimers. We recommend using the minimal number of PCR cycles necessary to minimize the number of PCR duplicates in the final library. Check proper fragmentation and DNA library size using a Bioanalyzer (Figure 4).
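Index collisions within a pool can be caught before sequencing with a pairwise comparison. The sketch below is a hypothetical helper: with min_distance=1 it only enforces the uniqueness required above, while a larger minimum Hamming distance is a stricter, optional choice that this protocol does not specify. The six-nucleotide sequences are illustrative and are not a statement of which NEB index carries which barcode.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def check_pool(pool: dict, min_distance: int = 1) -> list:
    """Return (library_a, library_b, distance) for index pairs closer than min_distance."""
    clashes = []
    for (lib_a, idx_a), (lib_b, idx_b) in combinations(pool.items(), 2):
        distance = hamming(idx_a.upper(), idx_b.upper())
        if distance < min_distance:
            clashes.append((lib_a, lib_b, distance))
    return clashes


pool = {
    "RBP_rep1": "ATCACG",
    "RBP_rep2": "CGATGT",
    "mCherry_rep1": "ATCACG",  # accidental reuse of an index
}
print(check_pool(pool))  # [('RBP_rep1', 'mCherry_rep1', 0)]
```

An empty list means every library in the pool can be demultiplexed by its index.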

Figure 4

Proper cDNA library preparation

DNA libraries should be correctly sized and free of adapter dimer peaks. X-axis, fragment size in nucleotides; y-axis, arbitrary fluorescence units (FU). See troubleshooting, problem 2, for an example of libraries requiring extra purification.

An Agilent TapeStation can also be used for DNA library size analysis.

RNA sequencing and mapping

Timing: 1 day The major data generation step for any TRIBE experiment is RNA sequencing. We currently use Illumina next-generation sequencers; in the future, this approach may be extended to other platforms. For mammalian cells, it is important to achieve the sequencing depth required for high-confidence SNP calling. We have found that individual lanes of a HiSeq 4000, or NovaSeq equivalents, running in PE150 mode provide sufficient depth (40 Gb per library) at a reasonable cost for a single experiment, where a single experiment contains duplicate RBP samples, duplicate mCherry controls and an untransfected library. Please refer to Figure 5 for a summary of the downstream bioinformatic analysis.
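The depth target translates into read pairs by simple arithmetic: each PE150 pair contributes 2 × 150 bases. The sketch below counts raw bases only; trimming, unique mapping and PCR-duplicate removal reduce the usable fraction (Figure 7), so the numbers are a lower bound. The function names are hypothetical.

```python
import math


def read_pairs_needed(target_gb: float, read_length: int = 150) -> int:
    """Read pairs needed for 2 x read_length bases per pair to reach target_gb gigabases."""
    bases_per_pair = 2 * read_length
    return math.ceil(target_gb * 1e9 / bases_per_pair)


def experiment_output_gb(n_libraries: int, gb_per_library: float = 40.0) -> float:
    """Total raw output for one experiment at a fixed per-library target."""
    return n_libraries * gb_per_library


# Duplicate RBP samples, duplicate mCherry controls and one untransfected library.
libraries = 2 + 2 + 1
print(read_pairs_needed(40))            # 133333334 pairs at PE150
print(read_pairs_needed(40, 100))       # 200000000 pairs at PE100
print(experiment_output_gb(libraries))  # 200.0 Gb for the whole experiment
```

The PE100 line illustrates why shorter reads require more reads to reach the same number of bases.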

Figure 5

Overview of HyperTRIBE computational analysis

Flow chart showing steps of data analysis and script paths that need to be changed throughout the protocol. Shell scripts are designated “.sh” and highlighted in bold.

The total number of bases output is important for TRIBE (ideally 40 Gb per mammalian library). Longer paired-end reads and higher depth are useful for mapping and SNP calling, respectively. We recommend paired-end reads no shorter than 100 bp for sequencing and analysis. Long paired-end libraries that are sequenced to an appropriate depth will facilitate accurate identification of RNA editing. Sequencing depth should be at least 50 million 150 bp PE reads per sample when working with human and mouse samples. More reads will be necessary if using 100 bp or shorter read lengths. Operate the sequencer according to the manufacturer's instructions. Demultiplex the sequencing reads using the index primers that were added during library preparation, as per the manufacturer's instructions. Concatenate data from different lanes together, if applicable, for example: cat lane1_read1.fq.gz lane2_read1.fq.gz > data1_read1.fq.gz (and likewise for read 2). Multiple mCherry-ADAR control samples should be concatenated together to make a single, high-coverage background file. When concatenated, control editing sites found in either sample are used to call background RNA editing events.

Quality control and mapping of sequenced reads using STAR:

Know the paths to the previously downloaded genome annotations (refer to the before you begin section). Prepare the shell scripts (Data S1) by setting the following variables to the previously defined paths (refer to the install and compile software dependencies section): Star_indices, TRIMMOMATIC_JAR, PICARD_JAR and GENOME_FAI_FILE. Move the modified trim_and_align_PE.sh file (Data S1) into the directory with your paired-end RNA-seq data. Use the trim_and_align script to perform quality control, adapter trimming and mapping of the paired-end data using STAR. For each piece of data:
nohup ./trim_and_align_PE.sh data1_read1.fq.gz data1_read2.fq.gz &
Repeat the above command for each sample.
Note that this process may be limited by the amount of system memory; for human or mouse samples, ensure that the machine has 32 GB of RAM for each sample that is run. Please refer to troubleshooting, problem 3, for common issues running shell scripts. While the alignment is running, the output is stored in the newly created “nohup.out” file; you can track the progress of the alignment using the following command: tail nohup.out

The following files will be created by a successful alignment:
Log.out – log of the alignment process.
Log.final.out – contains important mapping quality metrics.
Sort.bam and Sort.bam.bai – when loaded together into IGV, can be used to visualize coverage.
Sort.sam – used for downstream TRIBE analysis.

Loading alignments into MySQL database:

nohup /path_from_root/HyperTRIBE/CODE/load_table.sh sam_filename mysql_tablename expt_name replicate/timepoint &
example: nohup /home/jbiswas/RNA/HyperTRIBE/hg38/load_table_hg38.sh wtRNA.sort.sam testRNA rnalibs 2 &
Wait for the nohup.out file to be updated; this may take several hours depending on the machine resources and sequencing depth. Refer to troubleshooting, problem 3, if you run into issues with script permissions.

Determination of RNA editing sites from alignments

Navigate to your working directory and copy over the script (Data S1) to call RNA editing sites:
cd /directory_of_choice/
cp /home/jbiswas/RNA/HyperTRIBE/hg38/rnaedit_wtRNA_RNA_hg38.sh .
Modify the heading of the copied script file (Data S1) to point to the appropriate path to the CODE as well as the path to the genome annotation:
HyperTRIBE_DIR="/home/jbiswas/RNA/HyperTRIBE/hg38"
Annotationfile="/home/jbiswas/RNA/index/Homo_sapiens/UCSC/hg38/Annotation/Archives/archive-current/Genes/refFlat.txt"
Modify the script (Data S1) to include your choice of table name, control sample (wtRNA), and different experimental samples or timepoints:
wtRNAtablename="testRNA"
wtRNAexp="rnalibs"
wtRNAtp="2"
RNAtablename="testRNA"
RNAexp="rnalibs"
timepoint=(3 4 5)
Editing thresholds can be changed within the script. The recommended thresholds, 1% for the control sample and 5% for each of the experimental samples, provide stringent filtering of RNA editing sites. Run the modified script to generate files containing output from a single replicate (Expected Outcomes, Table 2).

Table 2

Example of output from a single sample

Chr	Edit coord	Name	Type	A count	T count	C count	G count	Total count	A_count gDNA/wtRNA	T_count gDNA/wtRNA
chr14	1.03E+08	Cln5	INTRON	18	0	0	2	20	34	0
chr14	14118481	Psmd6	INTRON	0	20	3	0	23	0	38
chr14	25696325	Ppif	INTRON	18	0	0	2	20	43	0
chr14	1.02E+08	Lmo7	EXON	18	0	0	2	20	19	0
chr14	87417269	Tdrd3	INTRON	24	0	0	4	28	28	0
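The editing percentage at a site can be recomputed from the base counts in Table 2. This sketch assumes that A-to-I editing is scored as A→G on plus-strand genes and as T→C on minus-strand genes, that the percentage is edited / (reference + edited) reads, and that the 20-read coverage and 5% thresholds described in this protocol apply; the HyperTRIBE scripts may define these details differently. The strand values below are inferred from the counts for illustration only, not taken from the annotation.

```python
from typing import NamedTuple


class SiteCounts(NamedTuple):
    name: str
    strand: str  # "+" or "-"; assumed to come from the gene annotation
    a: int
    t: int
    c: int
    g: int


def edit_percent(site: SiteCounts) -> float:
    """Percent A-to-I editing at one site (assumed definition: edited / (ref + edited))."""
    # A-to-I reads as A->G against the reference; on minus-strand genes as T->C.
    ref, edited = (site.a, site.g) if site.strand == "+" else (site.t, site.c)
    depth = ref + edited
    return 100.0 * edited / depth if depth else 0.0


def passes_thresholds(site: SiteCounts, min_coverage: int = 20,
                      min_percent: float = 5.0) -> bool:
    """Coverage and editing-percentage filter for an experimental sample."""
    coverage = site.a + site.t + site.c + site.g
    return coverage >= min_coverage and edit_percent(site) >= min_percent


# Rows from Table 2; strand inferred from which reference base dominates.
rows = [
    SiteCounts("Cln5", "+", 18, 0, 0, 2),
    SiteCounts("Psmd6", "-", 0, 20, 3, 0),
    SiteCounts("Tdrd3", "+", 24, 0, 0, 4),
]
for r in rows:
    print(r.name, round(edit_percent(r), 1), passes_thresholds(r))
# Cln5 10.0 True
# Psmd6 13.0 True
# Tdrd3 14.3 True
```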

After the process_editing_results.sh script is run (see TRIBE analysis – Postprocessing of data output), an Excel file containing the background-subtracted intersection of replicates from both samples is generated (Table 3).

To run the editing script in the background: nohup ./rnaedit_wtRNA_RNA.sh &

TRIBE analysis – Postprocessing of data output

Timing: 1 day Once generated, the mapped files can also be used for other standard analyses, such as differential gene expression. This section of the protocol discusses further processing of the mapped data to identify SNPs specific to RNA editing events in both control and experimental samples. Copy the process_editing_results.sh file into your current working directory:
cp /home/jbiswas/RNA/HyperTRIBE/process_editing_results.sh .
Modify the location of the HyperTRIBE files and the file prefix:
HyperTRIBE_DIR="/home/jbiswas/RNA/HyperTRIBE/"
file_prefix="MCP_TRIBE"
Use the process_editing_results.sh script to find the intersection of both experimental replicates and remove the background sites (Expected Outcomes, Table 3).

Table 3

Example of output from intersection of two biological replicates

Gene Name	Num Edit Sites	Average Editing	Features	Edit_percent_read_str	Identifier_str
Arid1a	4	12.1	EXON,EXON; …	12.3%_154r,10.4%_134r; …	chr4_133685443; …
Alg13	3	10.7	EXON,EXON; …	6.7%_89r,5.0%_100r; …	chrX_144352673; …
Dhx34	3	9.1	EXON,EXON; …	8.9%_123r,10.2%_137r; …	chr7_16198986; …
Katna1	3	12.2	INTRON,INTRON; …	7.1%_42r,6.8%_44r; …	chr10_7744094; …
Med22	3	24.1	EXON,EXON; …	11.5%_192r,17.0%_165r; …	chr2_26907730; …
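For downstream analysis it can be convenient to parse the Edit_percent_read_str column of Table 3 back into numbers. The field layout used here (";" between sites, "," between the two replicates of a site, tokens of the form percent%_readsr) is inferred from the example rows, not from the script documentation, and the parser name is hypothetical.

```python
import re

# One token such as "12.3%_154r": editing percent, then read count (inferred format).
TOKEN = re.compile(r"^([\d.]+)%_(\d+)r$")


def parse_edit_field(field: str):
    """Parse an Edit_percent_read_str value into per-site lists of (percent, reads).

    Assumed layout: ';' separates sites, ',' separates replicates of one site.
    A trailing '…' (truncated output) is skipped.
    """
    sites = []
    for site_str in field.split(";"):
        site_str = site_str.strip()
        if not site_str or site_str == "…":
            continue
        replicates = []
        for token in site_str.split(","):
            match = TOKEN.match(token.strip())
            if match is None:
                raise ValueError(f"unexpected token: {token!r}")
            replicates.append((float(match.group(1)), int(match.group(2))))
        sites.append(replicates)
    return sites


sites = parse_edit_field("12.3%_154r,10.4%_134r; …")
print(sites)  # [[(12.3, 154), (10.4, 134)]]
```

If the parsed layout does not match the real output file, the ValueError points at the first token that breaks the assumption.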

Three inputs are required for the script:
Bedgraph file containing WT (control) editing sites with 1% editing or more.
Bedgraph file containing Replicate 1 editing sites with 5% editing or more.
Bedgraph file containing Replicate 2 editing sites with 5% editing or more.
Alternatively, bedtools intersect can be used to perform these functions: to intersect the replicates (for example, with the -u option, which reports entries in A that overlap B) and to remove control editing sites (with the -v option, which reports entries in A that have no overlap in B). Visualize the editing sites by importing the editing bedgraph files into IGV (Figure 6). It is often useful to simultaneously visualize coverage using the .bam and .bam.bai files generated after alignment.
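The logic of this step (keep sites found in both replicates, discard any site edited in the pooled control, then summarize per gene) can be sketched in a few lines. The helper and the toy coordinates are hypothetical. The minimum of two sites per gene follows the threshold described under Quantification and statistical analysis, and averaging the two replicates per site is an assumption about how the Average Editing column is computed.

```python
from collections import defaultdict


def tribe_targets(rep1, rep2, control_sites, min_sites=2):
    """Intersect replicates, drop background sites, and summarize per gene (hypothetical).

    rep1, rep2: {(chrom, pos): (gene, percent)} for sites at >= 5% editing.
    control_sites: set of (chrom, pos) edited >= 1% in the mCherry-ADAR control.
    Returns {gene: (number of sites, mean editing %)} for genes with >= min_sites.
    """
    shared = (rep1.keys() & rep2.keys()) - control_sites
    per_gene = defaultdict(list)
    for site in shared:
        gene, pct1 = rep1[site]
        _, pct2 = rep2[site]
        per_gene[gene].append((pct1 + pct2) / 2)  # assumed: mean of the two replicates
    return {
        gene: (len(pcts), round(sum(pcts) / len(pcts), 1))
        for gene, pcts in per_gene.items()
        if len(pcts) >= min_sites
    }


# A site seen in either control replicate counts as background.
background = {("chr1", 300)} | {("chr2", 900)}
rep1 = {("chr1", 100): ("GeneA", 10.0), ("chr1", 150): ("GeneA", 8.0),
        ("chr1", 300): ("GeneA", 20.0), ("chr2", 50): ("GeneB", 6.0)}
rep2 = {("chr1", 100): ("GeneA", 12.0), ("chr1", 150): ("GeneA", 6.0),
        ("chr1", 300): ("GeneA", 18.0)}
print(tribe_targets(rep1, rep2, background))  # {'GeneA': (2, 9.0)}
```

Here chr1:300 is removed as background and GeneB is dropped because its only site appears in a single replicate.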

Figure 6

RNA editing can be visualized with IGV and occurs both at and adjacent to the site of RNA binding

Figure modified with permission from Biswas et al., iScience 2020. β-actin gene, focusing on the MBS array (x-axis), showing (from top to bottom): MS2 sites at genomic locus (ground truth): blue boxes represent the known location of the MS2 stem loops. MCP-TRIBE alignment with multimapping: reads are depicted in gray (scale bar for number of reads on right); editing sites are depicted as red bars. Uniquely mapped read alignment: mRNA coverage without multimapping, depicted in blue (scale bar for number of reads on right). MCP-TRIBE sites, both replicates, uniquely mapped: editing events indicated by dark blue bars whose height corresponds to the average editing percentage across both replicates at that nucleotide (scale to right). Light blue shading indicates the location of the stem loop nucleotides.

Please refer to troubleshooting, problem 4, for issues with editing site discovery. Determine intersections of gene symbols with other datasets (such as Hi-C, CLIP or RIP). The gene symbols output from the TRIBE pipeline can be compared to gene symbols from other datasets using free online tools (http://genevenn.sourceforge.net/). Determine relative positions of TRIBE sites to known binding motifs or published CLIP peaks. The .bedgraph files from the TRIBE experiment can be intersected with bed files from published CLIP datasets (https://www.encodeproject.org/) using the bedtools function “intersect” (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html). Distances can be measured using the bedtools function “closest” (https://bedtools.readthedocs.io/en/latest/content/tools/closest.html).
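The bedtools "closest" step can be mimicked for a handful of sites to see what the distances mean. This sketch uses 0-based, half-open BED coordinates, treats each editing site as a 1-nt interval, ignores strand, and reports the number of bases separating a site from the nearest peak (0 when they overlap or abut); bedtools' own distance convention may differ by one in edge cases. The 300 nt cutoff reflects the reach of the ADAR fusion noted under Limitations. The helper names, peak coordinates and CLIP gene set are invented for illustration.

```python
def gap_to_peak(pos, start, end):
    """Bases between a 1-nt site [pos, pos+1) and a BED peak [start, end); 0 if overlapping/abutting."""
    return max(0, start - (pos + 1), pos - end)


def nearest_peak(chrom, pos, peaks_by_chrom):
    """Return (gap, (start, end)) for the closest peak on the same chromosome, or None."""
    peaks = peaks_by_chrom.get(chrom, [])
    if not peaks:
        return None
    return min((gap_to_peak(pos, s, e), (s, e)) for s, e in peaks)


peaks = {"chr7": [(1000, 1050), (2000, 2100)]}  # hypothetical CLIP peaks
for pos in (1020, 1100, 1400):
    gap, peak = nearest_peak("chr7", pos, peaks)
    print(pos, gap, peak, gap <= 300)
# 1020 0 (1000, 1050) True
# 1100 50 (1000, 1050) True
# 1400 350 (1000, 1050) False

# Gene-level overlap, the same idea as a Venn diagram of gene symbols.
tribe_genes = {"Arid1a", "Alg13", "Med22"}
clip_genes = {"Med22", "Arid1a", "Actb"}  # hypothetical
print(sorted(tribe_genes & clip_genes))  # ['Arid1a', 'Med22']
```

For genome-scale files, bedtools itself remains the practical tool; this only makes the distance semantics explicit.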

Expected outcomes

RNA yields can vary widely by cell type. To assist the reader, we have included estimated yields of RNA extraction from MEFs and U2OS cells:
1,000 FACS-sorted cells = 10–20 ng
10,000 FACS-sorted cells = 100–200 ng
1 × 10 cm dish of adherent cells = more than 10 μg of RNA
Input RNA should be uniform across samples, with as much RNA as possible being used. After sequencing, reads should be quality controlled, and libraries should have appropriate mapping statistics and minimal PCR duplicates (Figure 7).
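The yields above imply roughly 10–20 pg of RNA per cell, so the number of cells to sort for a given input can be estimated directly. This sketch ignores losses during DNase treatment and the QC aliquots, so sorting extra cells is prudent; the helper name is hypothetical.

```python
import math


def cells_needed(target_ng: float, ng_per_1000_cells: float) -> int:
    """Cells to sort for target_ng of RNA at a given yield per 1,000 cells (no loss allowance)."""
    return math.ceil(target_ng / ng_per_1000_cells * 1000)


# Yields quoted above: 10-20 ng of RNA per 1,000 FACS-sorted MEF or U2OS cells.
for yield_ng in (10, 20):
    print(yield_ng, cells_needed(100, yield_ng), cells_needed(10, yield_ng))
# 10 10000 1000
# 20 5000 500
```

The first column is the assumed yield, then the cells needed for the ideal 100 ng input and for the 10 ng minimum.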

Figure 7

Expected mapping parameters from mammalian HyperTRIBE

Data from Biswas et al. iScience 2020 and Van Nostrand et al. Nat. Methods 2016.

(A) Percentage of reads in FASTQ file that were uniquely mapped to the genome. Individual points represent individual experiments or biological replicates and error bars represent standard deviation of the mean. P values calculated using two-tailed Welch’s t-test. Processed CLIP data from (Van Nostrand et al., 2016).

(B) Percentage of reads in FASTQ file that are retained after complete processing. Major RNA seq processing steps include unique mapping and PCR duplicate removal. Individual points represent individual experiments or biological replicates and error bars represent standard deviation of the mean. P values calculated using two-tailed Welch’s t-test. Figure reprinted with permission from Biswas et al. iScience 2020. Processed CLIP data from (Van Nostrand et al., 2016).

If using the MS2-TRIBE protocol, a FASTA file with the modified locus must be provided. An example FASTA file for generating a custom genome includes a header line beginning with the greater-than symbol “>”, followed by a line containing the nucleotide sequence:
>chrZ
GCGGCCGCGCTCACAGCCATCTGTAATGGGATCTGATGCCTTCTTCTGTC…
If using the MS2-TRIBE protocol, a Gene Transfer Format (GTF) file with annotations for the modified locus must also be provided. Features such as start codon, CDS, exon and stop codon should be included.
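A minimal sketch of writing the two custom-genome files follows. It uses only the visible 50-nt prefix of the example sequence (the full locus is not given here), wraps FASTA lines at 60 nt by convention, and emits a single exon record. The gene and transcript IDs are hypothetical, and a real annotation would also need the start codon, CDS and stop codon features mentioned above. GTF records have nine tab-separated columns with 1-based, inclusive coordinates.

```python
def fasta_record(name: str, seq: str, width: int = 60) -> str:
    """One FASTA entry: '>' header line, then the sequence wrapped at `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def gtf_line(seqname, feature, start, end, strand, gene_id, transcript_id,
             source="custom", frame="."):
    """One GTF record: 9 tab-separated columns, 1-based inclusive coordinates."""
    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return "\t".join([seqname, source, feature, str(start), str(end),
                      ".", strand, frame, attributes])


# Only the visible prefix of the modified locus quoted above; the real sequence continues.
seq = "GCGGCCGCGCTCACAGCCATCTGTAATGGGATCTGATGCCTTCTTCTGTC"
print(fasta_record("chrZ", seq), end="")
# Hypothetical IDs; a full annotation also needs start_codon, CDS and stop_codon lines.
print(gtf_line("chrZ", "exon", 1, len(seq), "+", "MBS_reporter", "MBS_reporter.1"))
```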

Quantification and statistical analysis

Below we describe places in the protocol where editing thresholds are set. These variables are important for calling editing sites with high confidence and were defined using the MCP/MBS system and hyperactive Drosophila ADAR. By comparing cells with and without the target RNA (MBS array), requiring a minimum of two editing sites per target, each at 5% or higher editing, was found to minimize the background level of RNA editing in cells lacking the MBS. If using a different ADAR catalytic domain (WT or human) or lower read depth, the read coverage and percentage threshold levels in the RNA editing script may need to be adjusted.
Trim_and_align.sh – STAR alignment output filtering (Data S1):
Unique alignment (outFilterMultimapNmax): set at 1; only reads that map to a single place in the genome are allowed.
Percentage of mismatches allowed per read (outFilterMismatchNoverLmax): set at 0.06, or 6% of the mapped read length. More details can be found in the STAR manual.
Trim_and_align.sh – PCR deduplication (Data S1): recommended; the protocol uses PICARD on the sorted SAM file to remove duplicate reads. In the future, UMI-based deduplication methods may also prove useful.
RNAedit_wtRNA_RNA.sh – RNA editing script (Data S1):
Minimum read coverage: set in find_rnaeditsites.pl at 9 reads and in rnaedit_wtRNA_RNA.sh at 20 reads.
Percentage threshold for RNA editing events: editing events are output at both 1% and 5% editing thresholds. While the 5% editing threshold is used to find reproducible, high-confidence editing sites in both replicates, the 1% file can be used to determine subthreshold editing events.
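Because an A-to-I edit reads as a mismatch against the reference genome, the mismatch filter also bounds how many edits a single read can carry. STAR applies outFilterMismatchNoverLmax to the mapped length (taken here as both mates combined for a pair, which is an assumption) and separately applies outFilterMismatchNmax, whose default is 10. Whether Data S1 overrides that default is not stated here, so the sketch leaves the cap optional. Exact fractions avoid floating-point rounding at the boundary; the helper name is hypothetical.

```python
import math
from fractions import Fraction


def max_mismatches(mapped_length: int, ratio: str = "0.06", absolute_cap=None) -> int:
    """Largest mismatch count with mismatches / mapped_length <= ratio (hypothetical helper).

    absolute_cap models STAR's separate outFilterMismatchNmax limit; its value
    in Data S1 is not stated, so it is ignored unless given.
    """
    limit = math.floor(Fraction(ratio) * mapped_length)
    return limit if absolute_cap is None else min(limit, absolute_cap)


print(max_mismatches(300))                   # 18 for a fully mapped 2 x 150 nt pair
print(max_mismatches(200))                   # 12 for 2 x 100 nt
print(max_mismatches(300, absolute_cap=10))  # 10 if a 10-mismatch cap also applies
```

Heavily edited reads that exceed these limits are discarded at alignment, so very dense editing clusters may be under-counted.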

Limitations

Currently, TRIBE does not provide nucleotide-resolution binding profiles such as those obtained with CLIP (Ule et al., 2003), CIMS analysis (Zhang and Darnell, 2011) or eCLIP (Van Nostrand et al., 2016). Prior work has shown that the ADAR fusion can edit up to 300 nt away from a putative or known RBP binding site (Biswas et al., 2020; McMahon et al., 2016). Additionally, genetic fusion of RBPs to ADAR currently prevents researchers from profiling the targets of endogenous proteins or accessing binding site information within patient samples. While overexpression is minimized by a weak promoter, RBP fusions may be expressed at higher levels than the endogenous protein. This should be evaluated on a case-by-case basis and can be aided by expressing the RBP of interest fused to a catalytically dead (E367A) adenosine deaminase control (Macbeth et al., 2005; Nguyen et al., 2020). Although not limited to TRIBE experiments, prolonged overexpression of hyperactive ADAR and other editing enzymes may cause editing within many coding sequences and lead to proteotoxic stress within the cell. This has made stable cell lines and animal models constitutively expressing hyperactive ADAR difficult to generate. This does not appear to be the case with wtADAR (McMahon et al., 2016; Xu et al., 2018) and could be mitigated in the future with inducible ADAR fusions. In the case of MS2-TRIBE, MS2 stem loops are integrated within an RNA of interest. This has been made easier with the advent of CRISPR-Cas9-based methods for homology-directed repair (Spille et al., 2019); however, the cell lines must be validated to show that insertion of the MS2 stem loops does not cause unexpected gene expression changes at the locus of interest (Tutucci et al., 2018).

Troubleshooting

Problem 1

Low transfection efficiency (step 2).

Potential solution

There are several possible causes of low transfection efficiency; the major categories to troubleshoot are plasmid preparation, cell line compatibility and transfection reagent choice. For plasmid preparation, ensure that cloning was completed properly and confirmed by Sanger sequencing. Follow the manufacturer's recommended protocols for preparing plasmid, ensuring that high-purity DNA is obtained. As a control, optimize transfection efficiency with the control mCherry-ADAR plasmid (Addgene Plasmid #154786). Certain cells, such as MEFs, are difficult to transfect and may require testing different transfection reagents until a suitable one is found. Always ensure that the cells are low passage, have been recently plated and are at an appropriate confluency for transfection. For the transfection itself, we recommend following the manufacturer's protocols for optimizing the ratio of DNA to transfection reagent.

Problem 2

Adapter dimers (127 nt) or primers (<85 nt) cause extra peaks in the final library (Figure 8, left) that should be removed with an extra round of library purification (Figure 8, right) before proceeding to sequencing (step 14).

Figure 8

cDNA library with adapter peaks requires further library purification

X-axis, fragment size in nucleotides; y-axis, arbitrary fluorescence units (FU). Perform a second round of library purification and re-analyze the fragment size until no adapter peak is present (right).
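Peak sizes read off a Bioanalyzer trace can be screened automatically for the two contaminants described in this problem. The ±10 nt tolerance around the 127 nt adapter-dimer peak is an arbitrary assumption for illustration, as is the helper name; the peak sizes in the usage line are invented.

```python
def flag_short_products(peak_sizes_nt, dimer_nt=127, tolerance_nt=10, primer_max_nt=85):
    """Label peaks that look like free primers (< primer_max_nt) or adapter dimers (~dimer_nt)."""
    flags = []
    for size in peak_sizes_nt:
        if size < primer_max_nt:
            flags.append((size, "primer"))
        elif abs(size - dimer_nt) <= tolerance_nt:  # tolerance is an assumption
            flags.append((size, "adapter dimer"))
    return flags


peaks = [60, 128, 320]  # hypothetical Bioanalyzer peak sizes (nt)
print(flag_short_products(peaks))  # [(60, 'primer'), (128, 'adapter dimer')]
```

An empty list suggests no further cleanup is needed, although the trace itself should still be inspected.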

Problem 3

Issues running the editing pipeline (steps 17–19) may include a ‘Permission denied’ error when running shell scripts, or an error in database creation (DBD::mysql::st execute failed – unable to create SQL table). We highly recommend using the provided virtual machine, which has all dependencies pre-installed and will minimize versioning and permission issues. The permission error can be due to user permissions on the script: chmod +x script.sh can be used to make a script executable. The database error occurs when tables are uploaded to the same database and can be ignored.

Problem 4

An expected editing site is not being called by the pipeline (steps 19–22). This could be caused by several issues: there is not enough read coverage to call the site with high confidence, or the site may be present in only one replicate and not the other. There are multiple ways to investigate this. Low-coverage sites can be addressed biologically and computationally. Biologically, increase coverage at the site by increasing the experimental sequencing depth; this is the preferred solution. Computationally, the process_editing_results.sh script sets a threshold of 20 reads minimum coverage in the aligned file. This number can be lowered to as few as 9 reads by changing the values in the script. Lowering the read threshold will increase the chance of calling false-positive editing sites. The read threshold should be equal for both the control and experimental samples. It is important to visualize both the levels of editing and the coverage simultaneously for both replicates as well as for the mCherry-ADAR controls. This can be done by opening the .bam or .bedgraph files in IGV.
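The trade-off between the coverage threshold and sensitivity follows from binomial sampling. The sketch assumes that edited reads at a site are Binomial(coverage, true editing fraction), ignores sequencing error and the second replicate, and asks how likely a site is to reach the 5% threshold. It also shows why low thresholds invite false positives: at 9 reads, a single edited (or miscalled) read already reads as 11.1% editing. The helper names are hypothetical; math.comb requires Python 3.8 or later.

```python
import math
from fractions import Fraction

THRESHOLD = Fraction(5, 100)  # 5% editing threshold for experimental samples


def min_edited_reads(coverage: int, threshold: Fraction = THRESHOLD) -> int:
    """Fewest edited reads needed for edited / coverage to reach the threshold."""
    return math.ceil(threshold * coverage)


def detection_probability(coverage: int, true_fraction: float,
                          threshold: Fraction = THRESHOLD) -> float:
    """P(site reaches the threshold) if edited reads ~ Binomial(coverage, true_fraction)."""
    k = min_edited_reads(coverage, threshold)
    return sum(
        math.comb(coverage, i) * true_fraction ** i * (1 - true_fraction) ** (coverage - i)
        for i in range(k, coverage + 1)
    )


for coverage in (9, 20, 40):
    print(coverage,
          min_edited_reads(coverage),                 # edited reads needed
          round(100 / coverage, 1),                   # % editing implied by one read
          round(detection_probability(coverage, 0.10), 2))
```

For a site truly edited at 10%, the chance of reaching 5% rises from about 0.61 at 9 reads to about 0.88 at 20 reads and 0.92 at 40 reads, which is why deeper sequencing is the preferred fix.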

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Jeetayu Biswas (jeetayu.biswas@alumni.einsteinmed.org).

Materials availability

Plasmids encoding mCherry-ADAR (Addgene plasmid #154786) and MCP-ADAR (Addgene plasmid #154787) are available at Addgene.

Data and code availability

The accession number for the raw sequencing data and identified RNA editing sites reported in Biswas et al. (2020) is GEO: GSE152855 (NCBI Gene Expression Omnibus, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152855). Code used to process and analyze the data is publicly available on Zenodo (https://doi.org/10.5281/zenodo.4567690) and GitHub (https://github.com/rosbashlab/HyperTRIBE/). All other relevant data are available from the authors upon request.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Antibodies

Anti-GFP from mouse IgG1κ (clones 7.1 and 13.1)	Roche/Sigma	Cat# 11814460001, RRID:AB_390913

Bacterial and virus strains

Subcloning Efficiency DH5α Competent Cells	Thermo Fisher Scientific	Cat# 18265017

Biological samples

Mouse embryonic fibroblasts, beta-actin MBS	This study. (Biswas et al., 2020; Lionnet et al., 2011)	N/A
Mouse embryonic fibroblasts, no beta-actin MBS	This study. (Biswas et al., 2020; Lionnet et al., 2011)	N/A

Chemicals, peptides, and recombinant proteins

EDTA	Sigma	Cat# EDS-100G
Ethanol 70% vol/vol made from 100% vol/vol ethanol and ultrapure water	Thermo Fisher Scientific	Cat# P288-500
jetPRIME transfection reagent	Polyplus	Cat# 114-15
TRIzol LS Reagent	Thermo Fisher Scientific	Cat# 10296010
TURBO DNA-free Kit	Thermo Fisher Scientific	Cat# AM1907

Critical commercial assays

NEBNext Ultra II Directional RNA Library Prep Kit for Illumina with Sample Purification Beads	New England Biolabs	Cat# E7765
NEBNext Poly(A) mRNA Magnetic Isolation Module	New England Biolabs	Cat# E7490
NEBNext Multiplex Oligos for Illumina (Index Primers Set 1)	New England Biolabs	Cat# E7335
PCR Purification Kit	Thermo Fisher Scientific	Cat# K310001
Qubit RNA HS Assay Kit	Thermo Fisher Scientific	Cat# Q32852
Plasmid DNA Midiprep/Maxiprep Kit	MACHEREY-NAGEL	Cat# 740410 or Cat# 740414

Deposited data

MS2-TRIBE data, raw and analyzed data	This study. (Biswas et al., 2020)	https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152855

Experimental models: cell lines

Human osteosarcoma, U2OS	(Janicki et al., 2004)	N/A
Immortalized mouse embryonic fibroblasts, MEFs	This study. (Biswas et al., 2020; Lionnet et al., 2011)	N/A

Recombinant DNA

Plasmid: mCherry-ADAR	Addgene	https://www.addgene.org/154786/
Plasmid: MCP-ADAR	Addgene	https://www.addgene.org/154787/

Software and algorithms

BEDTools	(Quinlan and Hall, 2010)	https://github.com/arq5x/bedtools2
Bowtie2	(Langmead and Salzberg, 2012)	http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Cutadapt	(Martin, 2011)	https://cutadapt.readthedocs.io/en/stable/installation.html
FastQC	Babraham Bioinformatics	https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
GraphPad Prism, version 8	GraphPad	https://www.graphpad.com/scientific-software/prism/
HTSeq, version 0.6.1	(Anders et al., 2015)	https://github.com/htseq/htseq
HyperTRIBE software	(Biswas et al., 2020)	https://github.com/rosbashlab/HyperTRIBE; all processing scripts can be found in Data S1 and are also present in the virtual machine (Zenodo: https://doi.org/10.5281/zenodo.4567690)
ImageJ v1.52p	(Schneider et al., 2012)	https://imagej.net
Integrated Genome Viewer (IGV)	(Robinson et al., 2011)	https://software.broadinstitute.org/software/igv/
MariaDB (v10.1 or later)	MariaDB Foundation	https://downloads.mariadb.org/
Perl (v5.8.8, v5.12.5, v5.22.1)	The Perl Programming Language	https://www.perl.org/get.html
Perl Module DBI.pm (v1.631, v1.636) and DBD mysql (v4.042)	MetaCPAN	https://www.perl.org/get.html
Picard (v2.8.2)	Broad Institute	https://broadinstitute.github.io/picard/
Python (v2.7.2 or later)	Python Software Foundation	https://www.python.org/downloads/
SAMtools	(Li et al., 2009)	http://samtools.sourceforge.net/
SRA Toolkit	(Leinonen et al., 2011)	https://github.com/ncbi/sra-tools
STAR (v2.7.3a or later)	(Dobin et al., 2013)	https://github.com/alexdobin/STAR
StringTie	(Pertea et al., 2015)	https://ccb.jhu.edu/software/stringtie/
Unix based operating system	CentOS or RedHat Enterprise Linux	https://www.centos.org/

Other

Electrophoresis instrument	Agilent	Bioanalyzer 2100 or equivalent
Fluorescence-activated cell sorter (FACS)	BD Biosciences	FACSAria II or equivalent
Illumina sequencing system	Illumina	HiSeq 4000 or NovaSeq or equivalent
PCR Machine	Bio-Rad	T100 or equivalent
Fluorimeter	Thermo Fisher Scientific	Qubit 4 or equivalent
Spectrophotometer	Thermo Fisher Scientific	NanoDrop ND-1000 or equivalent

MEF FACS buffer	Final concentration	Amount
10× PBS	1×	1 mL
10% (w/v) BSA	1% (w/v)	1 mL
EDTA (0.5 M)	5 mM	100 μL
DAPI (0.1 mg/mL)	0.1 μg/mL	10 μL
ddH₂O (ultrapure)	n/a	7.89 mL
Total	n/a	10 mL

FACS buffer should be made fresh, in a sterile tissue culture hood and stored at 4°C until ready to harvest cells.
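The buffer recipe follows from C1V1 = C2V2. The sketch recomputes each stock volume and the water top-up with exact fractions; the only step that needs care is the DAPI unit conversion (0.1 mg/mL = 100 μg/mL). The helper name and the component labels are ours, not part of the protocol.

```python
from fractions import Fraction as F


def stock_volume(stock, final, total):
    """C1 x V1 = C2 x V2, so V1 = C2 x V2 / C1 (stock and final in the same units)."""
    return F(final) / F(stock) * F(total)


TOTAL_ML = 10
recipe = {                          # component: (stock, final), same units within a row
    "PBS (x)": (10, 1),
    "BSA (% w/v)": (10, 1),
    "EDTA (mM)": (500, 5),
    "DAPI (ug/mL)": (100, "0.1"),   # 0.1 mg/mL stock = 100 ug/mL
}
volumes = {name: stock_volume(s, f, TOTAL_ML) for name, (s, f) in recipe.items()}
water = TOTAL_ML - sum(volumes.values())
for name, v in volumes.items():
    print(f"{name}: {float(v)} mL")
print(f"ddH2O: {float(water)} mL")
# PBS (x): 1.0 mL
# BSA (% w/v): 1.0 mL
# EDTA (mM): 0.1 mL
# DAPI (ug/mL): 0.01 mL
# ddH2O: 7.89 mL
```

The output reproduces the table above, including the 7.89 mL of water needed to reach 10 mL.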

References

1. A CRISPR/Cas9 platform for MS2-labelling of single mRNA in live stem cells.

Authors: Jan-Hendrik Spille; Micca Hecht; Valentin Grube; Won-Ki Cho; Choongman Lee; Ibrahim I Cissé
Journal: Methods Date: 2018-09-12

2. Fast gapped-read alignment with Bowtie 2.

Authors: Ben Langmead; Steven L Salzberg
Journal: Nat Methods Date: 2012-03-04

3. Mechanistic insights into editing-site specificity of ADARs.

Authors: Ashani Kuttan; Brenda L Bass
Journal: Proc Natl Acad Sci U S A Date: 2012-11-05

4. Dual inhibition of MDMX and MDM2 as a therapeutic strategy in leukemia.

Authors: Luis A Carvajal; Daniela Ben Neriah; Adrien Senecal; Lumie Benard; Victor Thiruthuvanathan; Tatyana Yatsenko; Swathi-Rao Narayanagari; Justin C Wheat; Tihomira I Todorova; Kelly Mitchell; Charles Kenworthy; Vincent Guerlavais; D Allen Annis; Boris Bartholdy; Britta Will; Jesus D Anampa; Ioannis Mantzaris; Manuel Aivado; Robert H Singer; Robert A Coleman; Amit Verma; Ulrich Steidl
Journal: Sci Transl Med Date: 2018-04-11

5. CLIP identifies Nova-regulated RNA networks in the brain.

Authors: Jernej Ule; Kirk B Jensen; Matteo Ruggiu; Aldo Mele; Aljaz Ule; Robert B Darnell
Journal: Science Date: 2003-11-14

6. Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data.

Authors: Chaolin Zhang; Robert B Darnell
Journal: Nat Biotechnol Date: 2011-06-01

7. Integrative genomics viewer.

Authors: James T Robinson; Helga Thorvaldsdóttir; Wendy Winckler; Mitchell Guttman; Eric S Lander; Gad Getz; Jill P Mesirov
Journal: Nat Biotechnol Date: 2011-01

8. HTSeq--a Python framework to work with high-throughput sequencing data.

Authors: Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Journal: Bioinformatics Date: 2014-09-25

9. Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP).

Authors: Eric L Van Nostrand; Gabriel A Pratt; Alexander A Shishkin; Chelsea Gelboin-Burkhart; Mark Y Fang; Balaji Sundararaman; Steven M Blue; Thai B Nguyen; Christine Surka; Keri Elkins; Rebecca Stanton; Frank Rigo; Mitchell Guttman; Gene W Yeo
Journal: Nat Methods Date: 2016-03-28

10. MS2-TRIBE Evaluates Both Protein-RNA Interactions and Nuclear Organization of Transcription by RNA Editing.

Authors: Jeetayu Biswas; Reazur Rahman; Varun Gupta; Michael Rosbash; Robert H Singer
Journal: iScience Date: 2020-06-28
