Marie-Jeanne Arguel, Kevin LeBrigand, Agnès Paquet, Sandra Ruiz García, Laure-Emmanuelle Zaragosi, Pascal Barbry, Rainer Waldmann.
Abstract
Single cell RNA sequencing approaches are instrumental in studies of cell-to-cell variability. 5΄ selective transcriptome profiling approaches allow simultaneous definition of the transcription start site and have advantages over 3΄ selective approaches, which provide only internal sequences close to the 3΄ end. The only currently existing 5΄ selective approach requires costly and labor-intensive fragmentation and cell barcoding after cDNA amplification. We developed an optimized 5΄ selective workflow in which all cell indexing is done prior to fragmentation. With our protocol, cell indexing can be performed in the Fluidigm C1 microfluidic device, resulting in a significant reduction of cost and labor. We also designed optimized unique molecular identifiers that show less sequence bias and less vulnerability to sequencing errors, resulting in improved accuracy of molecule counting. We provide comprehensive experimental workflows for Illumina and Ion Proton sequencers that allow single cell sequencing in a cost range comparable to qPCR assays.
Year: 2017 PMID: 27940562 PMCID: PMC5397152 DOI: 10.1093/nar/gkw1242
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. On-chip barcoding workflow. After cell lysis in 4.5 nl, poly-adenylated RNA is reverse-transcribed in 31.5 nl with an anchored oligo-dT primer. A PCR primer sequence and unique molecular identifiers (UMIs) are added to the 3΄ end of the cDNA via reverse transcriptase template switching. The cDNA is subsequently amplified, and cell index sequences (barcodes) as well as terminal biotins are introduced by PCR in the microfluidic device. The barcoded cDNAs are pooled and fragmented by tagmentation with Tn5 transposase, and the biotinylated terminal fragments are isolated on streptavidin beads. 5΄ terminal fragments are selectively amplified, and additional sequences required for Ion Torrent sequencers are introduced by PCR. For a detailed protocol see Supplementary Data and for Illumina sequencers see Supplementary Data.
Figure 2. UMI optimization and reproducibility of the protocol. (A–C) Impact of the TSO design on UMI usage bias. We examined TSOs with either an N7N6 UMI (A) or an N4H4 UMI (HUMI) (B, C). The 3΄ terminal nucleotide of the TSO was either an LNA-guanosine (A, B) or a ribo-guanosine (C). The weblogos represent the frequency at which we found each nucleotide at the given positions of the UMI in our genome-matched sequencing reads. The bar graphs below show the percentage of the total transcript molecules associated with the top 10 and top 100 most frequently found UMI sequences. Data are from 100 pooled HEK293 cells processed in tubes. (D) Pairwise correlations of transcript (UMI) counts for three biological replicates of 100 HEK293 cells with the TSO-HUMI-rG3 (C). Data shown are log2(counts + 1). R: Pearson correlation coefficient.
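The UMI usage-bias metric described for the bar graphs in Figure 2 can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names, the input format (a mapping from UMI sequence to the number of transcript molecules carrying it), and the layout of the HUMI as four N positions followed by four H positions are assumptions made for illustration. The only fact relied on is the IUPAC convention that H stands for A, C or T (not G).

```python
import re
from collections import Counter

# Assumption: the N4H4 UMI is read as 4 x N (any base) followed by 4 x H.
# IUPAC code H means "not G", i.e. A, C or T.
HUMI_PATTERN = re.compile(r"^[ACGT]{4}[ACT]{4}$")


def is_valid_humi(umi: str) -> bool:
    """Return True if the sequence is compatible with an N4H4 design."""
    return bool(HUMI_PATTERN.match(umi))


def top_n_share(umi_molecule_counts: dict, n: int) -> float:
    """Percentage of all molecules carried by the n most frequent UMIs.

    umi_molecule_counts is a hypothetical input: UMI sequence -> number of
    transcript molecules associated with that UMI.
    """
    total = sum(umi_molecule_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in Counter(umi_molecule_counts).most_common(n))
    return 100.0 * top / total


# Toy data, not taken from the paper.
toy_counts = {"AAAAAAAA": 6, "CCCCCCCC": 3, "TTTTTTTT": 1}
share_top1 = top_n_share(toy_counts, 1)
share_top2 = top_n_share(toy_counts, 2)
```

A low share for the top 10 or top 100 UMIs indicates that molecules are spread evenly across the UMI space, which is the property the HUMI design is meant to improve.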
Figure 3. Single cell sequencing. (A) Impact of UMI error filtering strategies. Percentage of filtered UMIs for different UMI error correction strategies. Filtering strategies were: Percentile, UMIs with a read coverage of less than the indicated fraction (P1%, P10%, P20%, P50%) of the average UMI read coverage of the corresponding gene were discarded; Edit distance (ED) = 1, UMIs that differ in just one nucleotide were merged into a single UMI. The percentages of eliminated UMIs were: P1%, 0.01%; P10%, 0.20%; P20%, 14.25%; P50%, 40.43%; ED = 1, 22.61%. Data are from one cell. (B, C) Number of ERCC (B) and transcript molecules (C) detected for each cell (means (dashed lines)/c.v.: ERCCs, 3558/15.7%; transcripts, 62,841/36.7%). (D) Number of genes detected for each cell (mean = 6679 (dashed line); c.v. = 16.9%). (E) Scatter plot showing the number of input ERCCs vs. the number of detected ERCCs (means ± SD). The capture efficiency (26%) was calculated from the intercept of the regression line and the y-axis. (F) Distribution of read starts on annotated transcripts in 1% bins between the 5΄ end (0%) and the 3΄ end (100%). Data are means ± SD (red bars) for 47 cells. (G) Heatmap of the pairwise correlation of ERCC molecules for 47 cells. (H) As (G) but for mRNAs. (I) Correlation between transcript (UMI) counts (log2(counts + 1)) for pools of 100 HEK293 cells sequenced on an Ion Proton or Illumina NextSeq 500, respectively. Data are means from two pools of 100 HEK293 cells processed in tubes. (J) Correlation of HEK293 single cell transcript (UMI) counts (log2(counts + 1)) between our Fluidigm C1 data and previously published Dropseq data (16). Transcript counts are means from 47 cells (Fluidigm) or 259 cells (Dropseq). Average numbers of transcript molecules detected per cell were: Fluidigm, 62,841; Dropseq, 36,746. R: Pearson correlation coefficient.
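The two UMI error filtering strategies named in Figure 3A can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is a hypothetical per-gene mapping from UMI sequence to read count, "differ in just one nucleotide" is interpreted as a Hamming distance of 1 between equal-length UMIs, and merging is done greedily by folding each lower-coverage UMI into the first higher-coverage UMI at distance 1. The caption does not specify which UMI survives a merge, so that rule is an assumption.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def percentile_filter(umi_reads: dict, fraction: float) -> dict:
    """Discard UMIs whose read coverage is below `fraction` of the mean
    UMI read coverage of the gene (e.g. fraction=0.10 for 'P10%')."""
    if not umi_reads:
        return {}
    mean_coverage = sum(umi_reads.values()) / len(umi_reads)
    return {u: r for u, r in umi_reads.items() if r >= fraction * mean_coverage}


def merge_edit_distance_one(umi_reads: dict) -> dict:
    """Merge UMIs that differ at a single position into one UMI.

    Assumption: UMIs are visited from highest to lowest coverage, and a UMI
    at Hamming distance 1 from an already kept UMI is folded into it.
    """
    merged = {}
    ordered = sorted(umi_reads.items(), key=lambda kv: (-kv[1], kv[0]))
    for umi, reads in ordered:
        parent = next(
            (k for k in merged if len(k) == len(umi) and hamming(k, umi) == 1),
            None,
        )
        if parent is None:
            merged[umi] = reads
        else:
            merged[parent] += reads
    return merged


# Toy read counts for the UMIs of one gene (not data from the paper).
gene_umis = {"AAAAAAAA": 50, "AAAAAAAT": 2, "CCCCCCCC": 40, "GGGGGGGG": 8}
kept_p10 = percentile_filter(gene_umis, 0.10)
kept_p50 = percentile_filter(gene_umis, 0.50)
kept_ed1 = merge_edit_distance_one(gene_umis)
```

In this toy example the mean coverage is 25 reads, so the P10% filter removes only the 2-read UMI, the P50% filter also removes the 8-read UMI, and the edit-distance merge folds the 2-read UMI into its 50-read neighbor. This mirrors the trade-off in the caption: aggressive percentile cut-offs discard many UMIs, including ones that may represent genuine low-coverage molecules.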