Martin Philpott1,2, Jonathan Watson3, Anjan Thakurta2,4,5, Tom Brown3,6, Tom Brown3,6, Udo Oppermann7,8,9, Adam P Cribbs10,11.
Abstract
Here we describe single-cell corrected long-read sequencing (scCOLOR-seq), which enables error correction of barcode and unique molecular identifier oligonucleotide sequences and permits standalone cDNA nanopore sequencing of single cells. Barcodes and unique molecular identifiers are synthesized using dimeric nucleotide building blocks that allow error detection. We illustrate the use of the method for evaluating barcode assignment accuracy, differential isoform usage in myeloma cell lines, and fusion transcript detection in a sarcoma cell line.
Year: 2021 PMID: 34211161 PMCID: PMC8668430 DOI: 10.1038/s41587-021-00965-w
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 68.164
Fig. 1 Developing a strategy to error-correct barcode and UMI sequences from droplet-based sequencing.
a, Schematic bead and oligonucleotide structure using dimer blocks of nucleotides for Buc-seq. b, Cell barcode-assignment strategy. c, UMI deduplication strategy. d, Simulated data showing the number of barcodes recovered with increasing simulated sequencing error rates. e,f, Simulated data showing the difference and coefficient of variation between the deduplicated UMIs and the ground truth. Correction of the UMI counts was performed using a basic directional network-based approach after accounting for sequencing errors within homodimeric blocks of nucleotides.
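The error-detection idea behind the homodimeric blocks can be illustrated with a minimal sketch. It assumes each barcode or UMI position is synthesized as a pair of identical nucleotides, so a block whose two bases disagree signals a likely sequencing error. The function name `collapse_dimer_blocks` and the choice to mark a mismatched block as `N` are hypothetical and used here for illustration only; they are not taken from the authors' pipeline. The sketch handles substitutions only and does not deal with insertions or deletions, which shift the block frame.

```python
from typing import List, Tuple


def collapse_dimer_blocks(seq: str) -> Tuple[str, List[int]]:
    """Collapse a dimer-encoded sequence (e.g. 'AAGGTT' -> 'AGT').

    Returns the collapsed sequence and the indices of blocks whose two
    bases disagree. A disagreeing block is reported as 'N' (an
    illustrative convention, not the published method).
    """
    if len(seq) % 2 != 0:
        # An odd length means the block frame is broken (e.g. by an indel).
        raise ValueError("dimer-encoded sequence must have even length")

    collapsed = []
    error_blocks = []
    for i in range(0, len(seq), 2):
        first, second = seq[i], seq[i + 1]
        if first == second:
            collapsed.append(first)
        else:
            collapsed.append("N")
            error_blocks.append(i // 2)
    return "".join(collapsed), error_blocks


if __name__ == "__main__":
    print(collapse_dimer_blocks("AAGGTTCC"))  # error-free read
    print(collapse_dimer_blocks("AAGCTTCC"))  # substitution in block 1
```

Reads with no flagged blocks could be used directly, while flagged reads could be passed to a downstream correction step such as the network-based UMI deduplication mentioned in the caption.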
Fig. 2 scCOLOR-seq identifies transcript isoform diversity and fusion transcripts in cancer cell line models.
a,b, Human HEK293T and mouse 3T3 cells were mixed at a 1:1 ratio and approximately 1,200 cells were taken for encapsulation and cDNA synthesis followed by nanopore sequencing. a, A Barnyard plot showing the expression of mouse and human UMIs before quality filtering using an edit distance of 6. b, A UMAP plot of data after quality filtering showing the clustering of human, mouse or mixed human and mouse cells after barcode correction using an edit distance of 6. Insets: bar plots show the specificity of UMIs aligning to either the human or mouse UMAP cluster. c–h, NCI-H929, DF15 and JJN3 myeloma cell lines were mixed at a 1:1:1 ratio and approximately 1,200 cells were taken for cDNA synthesis and sequenced using a PromethION flow cell. c,d, UMAP plot of gene expression (c) and transcript isoform expression (d). e, Principal CD74 (also known as HLA-DR) splice variants showing all protein-coding transcripts. f–h, UMAP plot showing the isoform expression of detected CD74 transcripts ENST00000377775.7 (f), ENST00000353334.10 (g) and ENST00000009530.12 (h). i, A UMAP plot of total fusion transcripts in Ewing’s cells mapped as a percentage of the total RNA of the cell. j, A UMAP plot showing the expression of the EWS-FLI fusion transcript. k, A schematic showing the structure of the EWSR1 and FLI1 genes. The EWS-FLI fusion transcript consists of the 5′ end of the EWSR1 gene and the 3′ end of the FLI1 gene. Arrowheads denote known fusion events and the most common type-1 fusion transcript is shown. l, A circular representation of the fusion transcripts identified between FLI1 and EWSR1.