Charles Girardot, Jelle Scholtalbers, Sajoscha Sauer, Shu-Yi Su, Eileen E M Furlong.
Abstract
BACKGROUND: The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifiers, or UMIs) are often used to distinguish between PCR duplicates and transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end, either to increase the number of multiplexed samples in the library or to use one of the barcodes as a UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account by identifying UMIs.
Keywords: Duplicates; Genomics; Multiplexing; NGS; Software; UMI
Year: 2016 PMID: 27717304 PMCID: PMC5055726 DOI: 10.1186/s12859-016-1284-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Barcoding strategies. a Schematic view of the multiplexed library processing. A different, unique barcode (BC, white box with black stripes) is used for each sample. The barcode is either placed further down the DNA fragment and sequenced in a specific sequencing round (Illumina® TruSeq™, left), or placed directly upstream of the DNA fragment and sequenced concomitantly (custom protocol, right). After sequencing and image processing, reads of multiplexed samples are mixed together in the fastq result file. For each read, the barcoding sequence (black box with white stripes) is computationally clipped off the read end (custom protocols) or read from the additional barcode file (Illumina® TruSeq™; the index file is provided with the I1 option), and the original sample is identified by comparing this barcoding sequence to known barcodes. Finally, read sequences are saved in sample-specific fastq files. b In PE sequencing, barcodes can be added to one or both fragment ends. The Je demultiplex BPOS option indicates which read(s) contain(s) the barcode(s). c Demultiplex options for barcodes present at both read ends. A decision is needed to specify which barcode is used to identify separate samples. d Combining UMIs (BC1 and BC2, white box with black stripes) with Illumina sample indexing (white box with black dots, top) or as a composite barcode (bottom). In a composite barcode, the number of random bases upstream and downstream of the sample index is variable
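The clip-and-compare step described in panel a can be sketched in Python. This is a minimal illustration under stated assumptions, not Je's implementation: the function names (`hamming`, `assign_sample`, `demultiplex`), the barcode table, the one-mismatch tolerance, and the rule that ambiguous matches are left unassigned are all hypothetical choices made for the example.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions; strings of unequal length never match."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode, known_barcodes, max_mismatches=1):
    """Return the sample with the unique best-matching barcode, else None.

    known_barcodes maps barcode sequence -> sample name (hypothetical table).
    """
    scored = sorted(
        (hamming(read_barcode, bc), sample) for bc, sample in known_barcodes.items()
    )
    if not scored:
        return None
    best_dist = scored[0][0]
    if best_dist > max_mismatches:
        return None
    # Two known barcodes match equally well: treat the read as ambiguous.
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None
    return scored[0][1]


def demultiplex(reads, known_barcodes, barcode_len, max_mismatches=1):
    """Clip the leading barcode from each read and group reads by sample.

    reads is an iterable of (name, sequence, quality) tuples.
    """
    by_sample = defaultdict(list)
    for name, seq, qual in reads:
        barcode, insert = seq[:barcode_len], seq[barcode_len:]
        sample = assign_sample(barcode, known_barcodes, max_mismatches)
        # Keep the quality string in step with the clipped sequence.
        by_sample[sample or "unassigned"].append((name, insert, qual[barcode_len:]))
    return dict(by_sample)
```

In this sketch the barcode is assumed to sit at the start of the read (the custom-protocol layout); for the separate index read of the TruSeq™ layout, the barcode would instead be taken from the matching record of the index file and the read left unclipped.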
Fig. 2 The different modules of Je (green squared blocks) and their usage in workflows. The clip, demultiplex and demultiplex-illu modules are the three possible entry points to process barcoded fastq files (blue squared blocks). In most setups (plain arrows), clipped or demultiplexed fastq files are mapped to the genome (grey squared block) using your favorite mapper and filtered for duplicate reads by Je's markdupes module using extracted UMIs. In more complex barcoding designs (e.g. composite barcodes, Supplementary Text), additional clipping before or after the sample demultiplexing step could be required (dashed arrows)
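The idea behind UMI-aware duplicate filtering after mapping can be illustrated with a short Python sketch. It is an assumption-laden simplification rather than a description of how markdupes works internally: the record keys (`name`, `chrom`, `pos`, `strand`, `umi`) are hypothetical, UMIs are compared by exact match only, and the first read seen at a position is the one kept.

```python
def mark_duplicates(alignments):
    """Flag reads sharing mapping position, strand, and UMI with an earlier read.

    alignments: iterable of dicts with hypothetical keys
    'name', 'chrom', 'pos', 'strand', 'umi'.
    Returns a list of (name, is_duplicate) tuples in input order.
    """
    seen = set()
    flagged = []
    for aln in alignments:
        key = (aln["chrom"], aln["pos"], aln["strand"], aln["umi"])
        # A read is a duplicate only if the same molecule tag was already
        # observed at the same mapped location.
        flagged.append((aln["name"], key in seen))
        seen.add(key)
    return flagged
```

Reads that map to the same position but carry different UMIs are kept as distinct molecules, which is what allows PCR duplicates to be separated from genuine abundance.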