Miika J Ahdesmäki, Simon R Gray, Justin H Johnson, Zhongwu Lai.
Abstract
Grafting of cell lines and primary tumours is a crucial step in the drug development process between cell line studies and clinical trials. Disambiguate is a program for computationally separating the sequencing reads of two species derived from grafted samples. Disambiguate operates on DNA or RNA-seq alignments to the two species and separates the components with very high sensitivity and specificity, as illustrated in artificially mixed human-mouse samples. This allows for maximum recovery of data from target tumours for more accurate variant calling and gene expression quantification. Given that no general-purpose open source algorithm accessible to the bioinformatics community exists for separating the data of the two species, the proposed Disambiguate tool presents a novel approach to, and an improvement in, sequence analysis of grafted samples. Both Python and C++ implementations are available, and they are integrated into several open and closed source pipelines. Disambiguate is open source and is freely available at https://github.com/AstraZeneca-NGS/disambiguate.
Keywords: NGS; disambiguation; explant; patient derived xenograft; sequencing
Year: 2016 PMID: 27990269 PMCID: PMC5130069 DOI: 10.12688/f1000research.10082.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Read pairs assigned human (hg19) and mouse (mm10) by both the disambiguate and xenome algorithms.
The ’ambiguous’ column includes reads that aligned but could not be assigned unambiguously to either species. The [†] symbol and the numbers in parentheses indicate false positive reads prior to applying the disambiguation algorithm on the raw alignments. TP denotes true positive and FP denotes false positive.
| Tool | Material | Sample | Total | Mouse mm10 | Human hg19 | ambiguous |
|---|---|---|---|---|---|---|
| disambiguate | DNA | SRR1176814 (mouse) | 47312349 | 47197650 (99.76%) TP | 26157 (0.06%) FP | 88542 (0.19%) |
| xenome | DNA | SRR1176814 (mouse) | 47312349 | 46889894 (99.11%) TP | 20031 (0.04%) FP | 339326 (0.72%) |
| disambiguate | DNA | SRR1528269 (human) | 77268164 | 11502 (0.01%) FP | 77102895 (99.79%) TP | 153767 (0.20%) |
| xenome | DNA | SRR1528269 (human) | 77268164 | 3291 (0.004%) FP | 76593625 (99.13%) TP | 521239 (0.67%) |
| disambiguate | RNA | SRR1930152 (mouse) | 24056144 | 23126086 (96.13%) TP | 80694 (0.34%) FP | 849364 (3.53%) |
| xenome | RNA | SRR1930152 (mouse) | 24056144 | 23071432 (95.91%) TP | 43294 (0.18%) FP | 625302 (2.60%) |
| disambiguate | RNA | SRR387400 (human) | 59653070 | 94289 (0.16%) FP | 49677937 (83.28%) TP | 9880844 (16.56%) |
| xenome | RNA | SRR387400 (human) | 59653070 | 83621 (0.14%) FP | 53851984 (90.28%) TP | 2043780 (3.43%) |
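The percentages in the table are each category's share of the total read pairs for that sample. As a minimal sketch of that arithmetic, the snippet below recomputes them for the disambiguate DNA row of the pure mouse sample SRR1176814. The function name `category_percentages` is hypothetical and is not part of the Disambiguate tool; only the counts are taken from the table.

```python
def category_percentages(total, mouse, human, ambiguous):
    """Return each category as a percentage of the total read pairs.

    Hypothetical helper for illustration; not part of Disambiguate.
    """
    return {
        "mouse": 100.0 * mouse / total,
        "human": 100.0 * human / total,
        "ambiguous": 100.0 * ambiguous / total,
    }


# Counts copied from the table: disambiguate, DNA, SRR1176814 (mouse).
row = {"total": 47312349, "mouse": 47197650, "human": 26157, "ambiguous": 88542}

pct = category_percentages(**row)
for name, value in pct.items():
    print(f"{name}: {value:.2f}%")
```

For this row the three categories add up exactly to the total, so the mouse share is the true positive rate and the human share is the false positive rate for a sample known to contain only mouse reads.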
Figure 1. The disambiguation process illustrated.
Alignment is first performed against both species. The disambiguation application then operates on the raw, natural name sorted BAM files to assign the read pairs into one of the two species or as ambiguous for unresolved cases.
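The assignment step in the figure can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it assumes each read pair has already been reduced to one alignment quality score per species (higher is better), that a pair aligned to only one species is assigned to that species, and that equal scores are left ambiguous. The caption does not specify the actual scoring rule, and the function `assign_pairs` and its inputs are hypothetical.

```python
def assign_pairs(human_scores, mouse_scores):
    """Assign each read pair to 'human', 'mouse', or 'ambiguous'.

    Both arguments map a read-pair name to an alignment score where a
    higher value means a better alignment (an assumption of this sketch).
    A pair missing from one mapping did not align to that species.
    """
    assignments = {}
    for name in sorted(set(human_scores) | set(mouse_scores)):
        h = human_scores.get(name)
        m = mouse_scores.get(name)
        if m is None or (h is not None and h > m):
            # Aligned to human only, or better in human.
            assignments[name] = "human"
        elif h is None or m > h:
            # Aligned to mouse only, or better in mouse.
            assignments[name] = "mouse"
        else:
            # Equal scores in both species: unresolved.
            assignments[name] = "ambiguous"
    return assignments


# Made-up scores for four read pairs, purely for illustration.
human = {"r1": 100, "r2": 50, "r3": 80}
mouse = {"r2": 90, "r3": 80, "r4": 70}
result = assign_pairs(human, mouse)
print(result)
```

Working from name-sorted inputs, as the caption describes, would let a real implementation walk both alignment files in step instead of holding all scores in memory as this sketch does.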