David Redin, Erik Borgström, Mengxiao He, Hooman Aghelpasand, Max Käller, Afshin Ahmadian.
Abstract
Data produced with short-read sequencing technologies result in ambiguous haplotyping and a limited capacity to investigate the full repertoire of biologically relevant forms of genetic variation. Haplotype-resolved sequencing data have recently gained traction as a way to reduce this ambiguity and to enable exploration of forms of genetic variation beyond nucleotide polymorphisms alone, such as compound heterozygosity and structural variation. Here we describe Droplet Barcode Sequencing, a novel approach for creating linked-read sequencing libraries by uniquely barcoding the information within single DNA molecules in emulsion droplets, without the aid of specialty reagents or microfluidic devices. Barcode generation and template amplification are performed simultaneously in a single enzymatic reaction, which greatly simplifies the workflow and minimizes assay costs compared with alternative approaches. The method has been applied to phase multiple target loci covering all exons of the highly variable Human Leukocyte Antigen A (HLA-A) gene, with DNA from eight individuals present in the same assay. Barcode-based clustering of sequencing reads confirmed analysis of over 2000 independently assayed template molecules, with an average of 753 reads in support of called polymorphisms. Our results show unequivocal characterization of all alleles present, validated against confirmed HLA database entries and haplotyping results from previous studies.
Year: 2017 PMID: 28525570 PMCID: PMC5569991 DOI: 10.1093/nar/gkx436
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Technological assay overview. DNA template molecules and degenerate barcoding oligonucleotides are encapsulated into droplets so that each droplet contains zero or one of each molecule. Primers are added to enable parallel amplification of each component until a critical concentration is reached, facilitating interaction between the two clonally amplified PCR products. A barcode sequence exclusive to each droplet is thereby coupled to each target locus from the original template molecule. After the emulsion is broken, the target products are enriched and prepared for sequencing. Read pairs then undergo barcode-based clustering, and prevalent variations are called to produce a set of alleles for each sample ID (as determined by an ID tag at the 5′ end of template molecules and target 1 amplicons).
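The barcode-based clustering step described in the Figure 1 legend can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes each read pair has already been parsed into a sample ID tag, a droplet barcode and a target sequence (the `ReadPair` type and `cluster_by_barcode` function are hypothetical names), and it groups reads by exact barcode match, whereas real data would likely need tolerance for sequencing errors within the barcode.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """One read pair, already parsed (hypothetical layout)."""
    sample_id: str  # ID tag from the 5' end of the template molecule
    barcode: str    # droplet barcode coupled to the target locus
    sequence: str   # target sequence carried by the read pair


def cluster_by_barcode(
    reads: Iterable[ReadPair], min_reads: int = 1
) -> Dict[Tuple[str, str], List[str]]:
    """Group read pairs sharing a sample ID tag and a droplet barcode.

    Each cluster is taken to represent one original template molecule.
    Barcodes are matched exactly (an assumption made for simplicity).
    """
    clusters: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for read in reads:
        clusters[(read.sample_id, read.barcode)].append(read.sequence)
    # Keep only clusters with enough reads to support a call.
    return {key: seqs for key, seqs in clusters.items() if len(seqs) >= min_reads}
```

Keying on the (sample ID, barcode) pair rather than the barcode alone keeps molecules from different individuals apart in a multiplexed reaction, even if two droplets happened to receive the same barcode sequence.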
Experimentally derived haplotypes for HLA-A
| Reaction sample | Reads† [% trc] | Identified alleles | Barcode clusters | % trc | Found in sample ID | IPD-IMGT/HLA correspondence | External references |
|---|---|---|---|---|---|---|---|
| 8-plex (ID 01–08) | 1.74 M [54.2%] | A | 667 | 14.37% | 01, 03, 04, 05, 08 | A*02:01:01:01 | A*02:01 |
| | | B | 362 | 13.38% | 02, 03, 08 | A*11:01:01:01 | A*11:01 |
| | | C | 409 | 10.23% | 02, 05, 07 | A*01:01:01:01 | A*01:01 |
| | | D | 215 | 5.64% | 04, 06 | A*29:02:01:01, A*29:02:01:02 | A*29:02 |
| | | E | 134 | 3.98% | 01 | A*03:01:01:01 | No Data |
| | | F | 142 | 3.68% | 07 | A*24:02:01:01 | A*24:02 |
| | | G | 122 | 2.74% | 06 | A*26:01:01:01, A*26:01:01:03N | A*26:01 |
| ID 01 | 862 K [53.1%] | E | 744 | 34.81% | 01 | A*03:01:01:01 | No Data |
| | | A | 486 | 18.27% | | A*02:01:01:01 | |
| ID 02 | 865 K [60.9%] | C | 563 | 33.17% | 02 | A*01:01:01:01 | A*01:01 |
| | | B | 419 | 27.44% | | A*11:01:01:01 | A*11:01 |
| ID 03 | 1.05 M [72.9%] | B | 783 | 42.37% | 03 | A*11:01:01:01 | No Data |
| | | A | 730 | 30.93% | | A*02:01:01:01 | |
| ID 04 | 833 K [57.3%] | D | 369 | 32.65% | 04 | A*29:02:01:01, A*29:02:01:02 | A*29:02 |
| | | A | 423 | 27.11% | | A*02:01:01:01 | A*02:01 |
| ID 05 | 1.06 M [63.5%] | C | 552 | 37.74% | 05 | A*01:01:01:01 | A*01:01 |
| | | A | 524 | 32.13% | | A*02:01:01:01 | A*02:01 |
| ID 06 | 1.05 M [64.1%] | D | 460 | 30.90% | 06 | A*29:02:01:01, A*29:02:01:02 | A*29:02 |
| | | G | 433 | 27.57% | | A*26:01:01:01, A*26:01:01:03N | A*26:01 |
| ID 07 | 857 K [63.6%] | F | 718 | 33.13% | 07 | A*24:02:01:01 | A*24:02 |
| | | C | 787 | 30.50% | | A*01:01:01:01 | A*01:01 |
| ID 08 | 1.02 M [69.0%] | B | 880 | 44.36% | 08 | A*11:01:01:01 | A*11:01 |
| | | A | 667 | 24.58% | | A*02:01:01:01 | A*02:01 |
Results for the eight-plex and all singleplex reactions, with correspondence to the IPD-IMGT/HLA database and to two independent sources (24,25) of haplotyping data. † Read counts of the barcode clusters used for allele classification. % trc is the percentage of total read counts. Only database entries that have been confirmed and published are included as matches. The table lists all alleles supported by at least two independent barcode clusters.
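The per-allele columns of the table (barcode clusters, % trc, and the two-cluster inclusion rule) can be reproduced from cluster-level calls with a short Python sketch. It assumes each barcode cluster has already been assigned an allele label and a read count; `summarize_alleles` and its inputs are hypothetical. It also takes the denominator of % trc to be the reaction's total read count, which agrees with the table: the per-allele percentages of each reaction sum to roughly the value in the Reads† column (for ID 01, 34.81% + 18.27% ≈ 53.1%).

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_alleles(
    cluster_calls: List[Tuple[str, int]],
    total_reads: int,
    min_clusters: int = 2,
) -> Dict[str, Tuple[int, float]]:
    """Summarize allele support from barcode clusters (illustrative only).

    cluster_calls: one (allele_label, read_count) pair per barcode cluster.
    total_reads: total read count of the reaction, used as the % trc denominator.
    Returns {allele: (n_clusters, percent_of_total_reads)} for alleles
    supported by at least `min_clusters` independent clusters.
    """
    n_clusters: Counter = Counter()
    n_reads: Counter = Counter()
    for allele, reads in cluster_calls:
        n_clusters[allele] += 1   # each cluster is one independent molecule
        n_reads[allele] += reads  # reads accumulate toward % trc
    return {
        allele: (n_clusters[allele], 100.0 * n_reads[allele] / total_reads)
        for allele in n_clusters
        if n_clusters[allele] >= min_clusters
    }
```

Counting clusters separately from reads matters because a single over-amplified molecule can contribute many reads but only one independent observation; the two-cluster minimum filters on the latter.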
Figure 2. Assay haplotyping results visualized. (A) Data from all clusters with ≥20 reads for the singleplex reaction carried out with sample ID 03. † Bases are called at each position supported by ≥5 reads with a quality score of ≥20. (B) Pedigree visualization of allele inheritance in an extended family of individuals, with a focused view (targets covering exons 2 and 3) of non-reference base calls for each allele identified in these samples.
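The thresholds in the Figure 2 legend (clusters with ≥20 reads; positions supported by ≥5 reads with a quality score of ≥20) can be expressed as a per-cluster consensus caller. This Python sketch is illustrative only. It assumes the reads in a cluster are already aligned to common coordinates and have equal length. It also calls each position by simple majority vote over the passing bases, which is an assumed rule rather than the published method. `call_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Thresholds as stated in the Figure 2 legend.
MIN_CLUSTER_READS = 20  # clusters with fewer reads are skipped
MIN_DEPTH = 5           # passing bases required to call a position
MIN_QUAL = 20           # minimum Phred quality for a base to count


def call_consensus(
    reads: Sequence[Tuple[str, Sequence[int]]],
    min_cluster_reads: int = MIN_CLUSTER_READS,
    min_depth: int = MIN_DEPTH,
    min_qual: int = MIN_QUAL,
) -> Optional[str]:
    """Call a consensus sequence for one barcode cluster.

    reads: (sequence, per-base Phred qualities) pairs, assumed pre-aligned
    and of equal length. Returns None if the cluster is too small; positions
    lacking `min_depth` bases of quality >= `min_qual` are reported as 'N'.
    """
    if len(reads) < min_cluster_reads:
        return None
    length = len(reads[0][0])
    consensus: List[str] = []
    for pos in range(length):
        # Count only bases that pass the quality threshold at this position.
        counts = Counter(
            seq[pos] for seq, quals in reads if quals[pos] >= min_qual
        )
        if sum(counts.values()) < min_depth:
            consensus.append("N")  # insufficient high-quality support
        else:
            consensus.append(counts.most_common(1)[0][0])  # majority base
    return "".join(consensus)
```

Calling a consensus per cluster before comparing alleles means that a sequencing error in one read is outvoted by the other reads from the same template molecule.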