David Redin, Tobias Frick, Hooman Aghelpasand, Max Käller, Erik Borgström, Remi-Andre Olsen, Afshin Ahmadian
Abstract
The future of human genomics is one that seeks to resolve the entirety of genetic variation through sequencing. The prospect of utilizing genomics for medical purposes requires cost-efficient and accurate base calling, long-range haplotyping capability, and reliable calling of structural variants. Short-read sequencing has led the development towards such a future but has struggled to meet the latter two of these needs. To address this limitation, we developed a technology that preserves the molecular origin of short sequencing reads, with an insignificant increase in sequencing costs. We demonstrate a novel library preparation method for high-throughput barcoding of short reads, in which millions of random barcodes can be used to reconstruct megabase-scale phase blocks.
Year: 2019 PMID: 31792271 PMCID: PMC6889410 DOI: 10.1038/s41598-019-54446-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overview of the phasing technology. High-molecular-weight DNA fragments are diluted and tagmented with bead-linked transposases. DNA-loaded beads are put into emulsion droplets with barcoding oligonucleotides and amplification primers, and the constituents of each original molecule are coupled to a unique barcode sequence through emulsion PCR. Following the removal of uncoupled molecules, the library undergoes standard short-read sequencing, and the reads are subsequently grouped according to their barcode sequence. The resulting barcode-linked reads are used for long-range DNA phasing, genome-wide haplotyping or reference-free genome assembly.
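The read-grouping step in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes reads have already been mapped and their barcodes extracted, represents each read as a hypothetical `Read(barcode, chrom, pos)` record, and uses an illustrative `max_gap` of 50 kb and `min_reads` of 2 (both assumptions) so that reads sharing a barcode but mapping far apart are treated as separate molecules. The `summarize` helper computes mean molecule length and the fraction of molecules longer than 20 kb, two of the metrics reported in the table below.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str  # barcode sequence recovered from the read (assumed pre-extracted)
    chrom: str    # chromosome the read mapped to
    pos: int      # mapping position of the read start


class Molecule(NamedTuple):
    barcode: str
    chrom: str
    start: int    # position of the first read assigned to the molecule
    end: int      # position of the last read start (slightly underestimates the true end)
    n_reads: int


def infer_molecules(reads, max_gap=50_000, min_reads=2):
    """Group reads by (barcode, chromosome), then split each group into
    molecules wherever consecutive read positions are more than max_gap apart.
    Groups with fewer than min_reads reads are discarded. Both thresholds are
    illustrative assumptions, not values taken from the paper."""
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r.pos)

    molecules = []
    for (barcode, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # Gap too large: close the current molecule and start a new one.
                if count >= min_reads:
                    molecules.append(Molecule(barcode, chrom, start, prev, count))
                start, count = p, 0
            prev = p
            count += 1
        if count >= min_reads:
            molecules.append(Molecule(barcode, chrom, start, prev, count))
    return molecules


def summarize(molecules):
    """Return (mean molecule length, fraction of molecules longer than 20 kb)."""
    lengths = [m.end - m.start for m in molecules]
    if not lengths:
        return 0.0, 0.0
    mean_len = sum(lengths) / len(lengths)
    frac_over_20kb = sum(length > 20_000 for length in lengths) / len(lengths)
    return mean_len, frac_over_20kb
```

In practice the gap threshold trades off merging two nearby molecules that happen to share a barcode against splitting one molecule that has a coverage gap; a larger `max_gap` favors longer inferred molecules.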
Phasing analysis and variant calling for internal 19X and 35X datasets, as well as for the 42X dataset from 10x Genomics.
| Library | GM24385 (19X) | GM24385 (35X) | GIAB (10x Genomics) |
|---|---|---|---|
| Sequencing reads | 641,457,522 | 1,080,294,792 | 976,557,530 |
| Mean depth | 19.1 X | 34.7 X | 41.7 X |
| SNPs Phased | 97.9% | 98.8% | 98.50% |
| N50 Phase Block | 1,832,815 bp | 2,812,019 bp | 9,657,460 bp |
| Longest Phase Block | 7,771,012 bp | 11,919,151 bp | 35,805,844 bp |
| Mean Molecule Length | 25,946 bp | 26,780 bp | 104,745 bp |
| Molecules >20 kb | 74.8% | 74.8% | 92.2% |
| Molecules >100 kb | 18.6% | 18.6% | 44.7% |
| LSV Calls* | 35 | 35 | 35 |
| Short Deletion Calls | 4,008 | 4,047 | 4,383 |
| Median Insert Size | 231 bp | 234 bp | 308 bp |
| Mapped Reads | 82.7% | 89.6% | 96.2% |
| Zero Coverage | 0.735% | 0.507% | 0.178% |
| Q30 bases, Read 1 | 82.4% | 86.2% | 100%** |
| Q30 bases, Read 2 | 64.2% | 69.3% | 100%** |
| SNV Calls (Q > 60) | 3,445,072 | 3,804,691 | 4,054,372 |
| SNV Detection Sensitivity | 76.1% | 83.3% | 85.1% |
| SNV Detection Accuracy | 95.4% | 94.6% | 90.7% |
*Large structural variant (LSV) calls included multiple heterozygous deletion calls in chromosome X, which, following correspondence with 10x Genomics, were confirmed to be erroneous (Supplementary Table S3).
**These values reflect an undisclosed pre-filtering of sequencing reads with Q < 30 in the dataset from 10x Genomics.
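Two of the table's metrics follow standard definitions that a short Python sketch can make concrete. `n50` returns the phase-block length L such that blocks of length L or longer cover at least half of the total phased length, and `q30_fraction` counts base calls with a Phred quality of at least 30, i.e. an estimated error probability of at most 1 in 1,000 (Q = -10 log10 P). The function names are hypothetical, and the Phred+33 (Sanger / Illumina 1.8+) quality encoding is an assumption; neither is taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths: the length L such that
    blocks of length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def q30_fraction(quality_strings, offset=33):
    """Fraction of base calls with Phred quality >= 30.

    Each character of a FASTQ quality string encodes Q as ord(char) - offset;
    offset=33 assumes the Phred+33 encoding."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                passing += 1
    if total == 0:
        raise ValueError("no quality values supplied")
    return passing / total
```

For example, blocks of 10, 8, 5, 3 and 2 units sum to 28; the two longest already cover 18 units (at least half), so the N50 is 8.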
For all datasets, SNV detection sensitivity and accuracy were calculated by comparison with the GIAB ‘ground truth’ callset for GM24385, restricted to SNVs with Phred score > 60. The reference dataset contained 4,756,689 SNVs, of which 4,319,399 had a minimum Phred score (Q) of 60. See Supplementary Table S5 for raw SNV counts.
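The sensitivity and accuracy figures can be checked for internal consistency with a short Python sketch. It assumes, as an interpretation rather than a statement from the paper, that sensitivity is the number of true-positive calls divided by the 4,319,399 truth-set SNVs with Q ≥ 60, and that "accuracy" means precision, i.e. true positives divided by the SNV calls (Q > 60) listed in the table. Under that reading, the true-positive count implied by the sensitivity should reproduce the reported accuracy up to rounding. The raw counts themselves are in Supplementary Table S5 and are not reproduced here.

```python
TRUTH_Q60 = 4_319_399  # GIAB GM24385 SNVs with Q >= 60 (from the text)

# dataset -> (SNV calls with Q > 60, reported sensitivity, reported accuracy), from the table
DATASETS = {
    "GM24385 (19X)": (3_445_072, 0.761, 0.954),
    "GM24385 (35X)": (3_804_691, 0.833, 0.946),
    "GIAB (10x Genomics)": (4_054_372, 0.851, 0.907),
}


def precision(tp, n_calls):
    """Fraction of calls that match the truth set (assumed meaning of 'accuracy')."""
    return tp / n_calls


for name, (calls, sens, acc) in DATASETS.items():
    # Sensitivity = TP / truth-set size, so TP can be backed out from it.
    tp = sens * TRUTH_Q60
    predicted_acc = precision(tp, calls)
    print(f"{name}: implied TP ~ {tp:,.0f}; "
          f"predicted accuracy {predicted_acc:.1%} vs reported {acc:.1%}")
```

For the 19X dataset the implied true-positive count is roughly 3.29 million, and in all three datasets the predicted accuracy rounds to the reported value, which is consistent with the precision interpretation.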
Figure 2. Whole-genome haplotyping results. (a) Sequence data of the haplotype-resolved human genome GM24385 (19X). From the center: phased SNV density and relative read coverage for haplotype 1 (red), phased SNV density and relative read coverage for haplotype 2 (orange), and total read coverage (light grey) on a scale from 0 to 25X. The locations of large structural variants are marked by grey bands (not drawn to scale). (b) Heatmaps of barcode overlap, with a window size of 250 kb, for reads spanning two called structural variants: an 86.0 kb inversion in chromosome 12 (left) and a 40.8 kb heterozygous deletion in chromosome 2 (right). Relative read coverage, collapsed across the two haplotypes, is shown in grey; the x and y axes are identical. (c) Barcode-linked reads around a heterozygous deletion identified in chromosome 4, with reads assigned to either haplotype; the reads on each line share a barcode that no other line shares. Reads in the top haplotype (shown in orange) are linked across the deletion, which spans 49.5 kb.
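The barcode-overlap heatmaps in panel (b) can be approximated with a minimal Python sketch. It assumes reads from a single chromosome supplied as hypothetical `(barcode, position)` pairs, bins them into 250 kb windows, and counts the barcodes shared by each pair of windows. The paper's exact overlap measure and any normalization are not stated here, so using the raw shared-barcode count is an assumption. Because reads from one long molecule carry the same barcode, overlap is expected to be highest near the diagonal and to fall off with distance; a structural variant would show up as a departure from that pattern, such as unexpected linkage between windows flanking the event.

```python
from collections import defaultdict


def window_barcodes(reads, window=250_000):
    """Map each window index to the set of barcodes with at least one read in it."""
    bins = defaultdict(set)
    for barcode, pos in reads:
        bins[pos // window].add(barcode)
    return bins


def overlap_matrix(reads, start, end, window=250_000):
    """Return a square matrix of shared-barcode counts for all pairs of windows
    covering [start, end). Entry [i][j] is the number of barcodes observed in
    both window i and window j; the diagonal holds each window's barcode count."""
    bins = window_barcodes(reads, window)
    empty = frozenset()
    idx = range(start // window, (end - 1) // window + 1)
    return [[len(bins.get(i, empty) & bins.get(j, empty)) for j in idx] for i in idx]
```

The matrix is symmetric by construction, which matches the identical x and y axes of the heatmaps; plotting it with any heatmap routine gives a view comparable in spirit to panel (b).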