Mikhail Shugay, Andrew R Zaretsky, Dmitriy A Shagin, Irina A Shagina, Ivan A Volchenkov, Andrew A Shelenkov, Mikhail Y Lebedin, Dmitriy V Bagaev, Sergey Lukyanov, Dmitriy M Chudakov.
Abstract
Unique molecular identifiers (UMIs) show outstanding performance in targeted high-throughput resequencing and are the most promising approach for the accurate identification of rare variants in complex DNA samples. This approach has applications in multiple areas, including cancer diagnostics, and therefore demands dedicated software and algorithms. Here we introduce MAGERI, a computational pipeline that efficiently handles all caveats of UMI-based analysis to obtain high-fidelity mutation profiles and call ultra-rare variants. Using an extensive set of benchmark datasets, including gold-standard biological samples with known variant frequencies, cell-free DNA from tumor patient blood samples, and publicly available UMI-encoded datasets, we demonstrate that our method is both robust and efficient in calling rare variants. The versatility of our software is supported by accurate results obtained for both tumor DNA and viral RNA samples in datasets prepared using three different UMI-based protocols.
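The principle that makes UMIs useful for rare-variant detection can be illustrated with a minimal sketch. It is a generic illustration of grouping reads by molecular identifier and building a per-position majority-vote consensus, not MAGERI's actual algorithm. The function name `umi_consensus`, the `min_reads` threshold and the 0.5 majority cutoff are hypothetical choices, and reads are assumed to be pre-aligned and of equal length. Reads that share a UMI are assumed to derive from the same starting molecule, so a base supported by most reads in the group is kept, while sporadic PCR or sequencing errors are outvoted.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_reads=3, min_fraction=0.5):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    Hypothetical sketch: assumes all reads are already aligned and have equal
    length; positions without a clear majority are masked with 'N'.
    """
    # Group reads by their molecular identifier.
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            # Too few reads to separate true variants from errors: skip this molecule.
            continue
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Keep the majority base only if it exceeds the required fraction.
            bases.append(base if count / len(seqs) > min_fraction else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AAGT", "ACGTA"), ("AAGT", "ACGTA"), ("AAGT", "ACCTA"),  # one read carries an error
    ("CCTA", "ACGAA"), ("CCTA", "ACGAA"), ("CCTA", "ACGAA"),  # true T>A at 0-based position 3
    ("GGAC", "ACGTA"),                                        # singleton UMI, dropped
]
print(umi_consensus(reads))
# {'AAGT': 'ACGTA', 'CCTA': 'ACGAA'}
```

The error in the first group appears in only one of three reads and is outvoted, whereas the substitution in the second group is carried by every read of its molecule and survives. A production pipeline must also deal with errors in the UMI sequence itself, indels and variable read lengths, which this sketch ignores.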
Year: 2017 PMID: 28475621 PMCID: PMC5419444 DOI: 10.1371/journal.pcbi.1005480
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Datasets used for MAGERI benchmark.
| Dataset | Source | UMI tagging method | Sequencing method | Control variants |
|---|---|---|---|---|
| Tru-Q 7 | This study | Linear PCR | Illumina HiSeq | 27 substitutions, 1 deletion. Variant frequency 1–30%, see |
| Tru-Q 7, 1:9 diluted with healthy donor PBMC DNA | This study | Linear PCR | Illumina HiSeq | 27 substitutions, 1 deletion. Variant frequency 0.1–3%, see |
| Healthy donor PBMC DNA | This study | Linear PCR | Illumina HiSeq | None, all variants are either allelic or erroneous |
| Tumor and plasma DNA from two cancer patients | This study | Linear PCR | Illumina HiSeq | BRAF V600E in tumor |
| Duplex sequencing | [ | Ligation | Illumina HiSeq | ABL1 E279K at 1% frequency |
| HIV amplicon sequencing | [ | RT-PCR | Illumina MiSeq | N/A* |
| Torrent | [ | PCR | Ion Torrent | N/A** |
*: No intrinsic control variants are available. Two samples were used: supernatant from the 8E5 cell line, which should yield unmutated HIV cDNA, and HIV cDNA from patient plasma.
**: Only sequencing data for a single template, which carries no suitable control variants, are publicly available.
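The frequency range for the diluted Tru-Q 7 sample follows from simple mixing arithmetic. Here "1:9" is read as 1 part Tru-Q 7 DNA to 9 parts healthy donor DNA, a 10-fold reduction of each variant's allele frequency. This reading is an assumption, though it is consistent with the 1–30% and 0.1–3% ranges listed in the table. The sketch also assumes equal DNA concentrations in both samples and that the healthy donor carries the reference allele at the control positions. The helper name `diluted_frequency` is hypothetical.

```python
def diluted_frequency(freq, sample_parts=1, diluent_parts=9):
    """Expected variant allele frequency after mixing a reference sample with
    variant-free DNA (hypothetical helper; assumes equal DNA concentrations
    and no variant allele in the diluent)."""
    return freq * sample_parts / (sample_parts + diluent_parts)


# Tru-Q 7 control variants span 1-30% before dilution.
for f in (0.01, 0.30):
    print(f"{f:.0%} -> {diluted_frequency(f):.1%}")
# 1% -> 0.1%
# 30% -> 3.0%
```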