Navin Rustagi1,2, Oliver A Hampton3,4, Jie Li3,5, Liu Xi3, Richard A Gibbs3,4, Sharon E Plon3,4,6, Marek Kimmel3, David A Wheeler3,4.
Abstract
BACKGROUND: Detection of tandem duplication within coding exons, referred to as internal tandem duplication (ITD), remains challenging due to inefficiencies in alignment of ITD-containing reads to the reference genome. There is a critical need to develop efficient methods to recover these important mutational events.
Keywords: AML; Assembly; Cancer genetics; Clustering; Data mining; De Bruijn graphs; FLT3; Somatic mutations; Tandem duplication
Year: 2016 PMID: 27121965 PMCID: PMC4847212 DOI: 10.1186/s12859-016-1031-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 ITD Assembler workflow schema. The major ITD Assembler processing steps are depicted with transitional blue arrows decreasing in size, representing a serial reduction in the number of reads at each step. To the left of the ITD Assembler workflow is an example read set containing two repeated kmers, one 6 bases and another 7 bases apart, which are placed into bins 6 and 7. De Bruijn graph construction is performed on these three example reads using, for illustration purposes, kmer length 3. The integer values near each vertex are the coverage of each vertex in the graph. There are two cycles of length 6 and 7 in this graph representing the two independent ITDs. OLC assembly is performed on those read sets from bins 6 and 7, with resulting contigs being annotated and ITDs reported.
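The binning and graph steps described in the Fig. 1 caption can be illustrated with a minimal Python sketch. This is not the ITD Assembler implementation: the function names (`repeat_distances`, `bin_reads`, `de_bruijn`, `cycle_length`), the toy reads, and the choice of k = 5 are all assumptions made for illustration, and the OLC assembly and annotation steps are omitted. It only shows the idea that a k-mer repeated within a read at distance d places the read in bin d, and that the De Bruijn graph built from that bin contains a cycle of length d.

```python
from collections import Counter, defaultdict, deque


def repeat_distances(read, k):
    """Distances between successive occurrences of the same k-mer in one read."""
    last_seen = {}
    distances = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in last_seen:
            distances.add(i - last_seen[kmer])
        last_seen[kmer] = i
    return distances


def bin_reads(reads, k):
    """Place each read into one bin per repeat distance found in it."""
    bins = defaultdict(list)
    for read in reads:
        for d in repeat_distances(read, k):
            bins[d].append(read)
    return dict(bins)


def de_bruijn(reads, k):
    """Graph whose vertices are k-mers; edges join k-mers adjacent in a read.

    Returns (coverage, edges), where coverage counts k-mer occurrences.
    """
    coverage = Counter()
    edges = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        coverage.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            edges[a].add(b)
    return coverage, dict(edges)


def cycle_length(edges, start):
    """Length of the shortest cycle through `start` (breadth-first search), or None."""
    frontier = deque([(start, 0)])
    seen = set()
    while frontier:
        node, depth = frontier.popleft()
        for nxt in edges.get(node, ()):
            if nxt == start:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


# Hypothetical toy reads: the first carries a 6-base tandem duplication
# (ACGTTG ACGTTG); the second is the same region without the duplication.
reads = ["CCAACGTTGACGTTGTAT", "CCAACGTTGTAT"]
bins = bin_reads(reads, k=5)
coverage, edges = de_bruijn(bins.get(6, []), k=5)
```

In this toy case only the duplicated read lands in a bin, and the cycle through the repeated k-mer has the same length as the duplicated unit, which mirrors the relationship between bin number and cycle length in the caption.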
Somatic FLT3-ITD detection in TCGA data
| TCGA ID | WES Avg Cov | Read length | TCGAᵃ ITD (length) | ITD Asm (length) | ITD Asm (Mut af) | Pindelᵇ (length) | Pindel (Mut af) | Genomonᶜ (length) |
|---|---|---|---|---|---|---|---|---|
| 2853 | 182.3 | 100 | 18 | 18 | 0.011 | |||
| 2840 | 124.1 | 100 | 18 | 18 | 0.011 | |||
| 2877 | 225.2 | 75 | 18 | 18 | 0.232 | 18 | ||
| 2880 | 180.4 | 75 | 21 | 21 | 0.254 | 21 | ||
| 2918 | 76.5 | 75 | 21 | 21 | 0.113 | 21 | ||
| 2942 | 212.7 | 75 | 24 | 24 | 0.159 | 24 | ||
| 2875 | 229.9 | 75 | 30 | 30 | 0.187 | 30 | ||
| 2836 | 217.7 | 75 | 33 | 33 | 0.054 | |||
| 2879 | 186.9 | 75 | 33 | 33 | 0.355 | 33 | ||
| 2922 | 172.8 | 75 | 33 | 33 | 0.262 | 33 | ||
| 2925 | 141.5 | 75 | 42 | 42 | 0.291 | 42 | 0.024 | 42 |
| 2930 | 142.1 | 100 | 42 | 42 | ||||
| 2895 | 151.4 | 75 | 45 | 45 | 0.201 | 45 | 0.059 | 45 |
| 2812 | 206.3 | 75 | 51 | 51 | 0.412 | 51 | 0.109 | |
| 2869 | 198.8 | 100 | 54 | 54 | 0.017 | 54 | ||
| 2830 | 267.7 | 75 | 69 | 69 | 0.038 | 56 | 0.025 | 42 |
| 2921 | 119.3 | 100 | 24 | 24 | ||||
| 2871 | 210.2 | 75 | 63 | |||||
| 2913 | 100.5 | 75 | 66 | 66 | 0.085 | 66 | ||
| 2931 | 71.1 | 75 | 75 | 70 | 0.316 | |||
| 2844 | 223.5 | 75 | 87 | 87 | 0.156 | |||
| 2825 | 98.2 | 100 | 102 | 102 | 0.023 | |||
| 2809 | 113.6 | 100 | 30 | 0.019 | ||||
| 2949 | 174.1 | 75 | 39 | 0.021 | ||||
| 2915 | 75.2 | 75 | 51 | 0.058 | 49 | 0.024 | 51 | |
| 2895 | 151.4 | 75 | 51 | 0.228 | 50 | 0.051 | ||
| 2934 | 149.9 | 75 | 57 | 0.058 | 56 | 0.071 | 57 | |
| 2823 | 212.0 | 75 | 57 | 0.137 | 57 | 0.062 | ||
| 2833 | 206.1 | 75 | 75 | 0.004 | ||||
| 2862 | 207.8 | 75 | 69 | 0.200 | 69 | |||
| 2923 | 197.1 | 75 | 74 | 0.025 | 74 | |||
| 2918 | 76.48 | 75 | 88 | 0.097 | 88 | |||
| 2919 | 189.8 | 75 | 93 | 0.022 | 93 | |||
| 2959 | 164.9 | 75 | 118 | 0.049 | ||||
| 2896 | 146.6 | 75 | 153 | 0.084 | 153 | |||
| 2921 | 119.3 | 100 | 57 | |||||
| Total | 22 | 22 | 18 | 22 |
All lengths are in base pairs
ᵃ Reference 1, TCGA AML marker paper, used as the reference set for algorithmic discovery
ᵇ Reference 23, Ye et al. [18]
ᶜ Reference 24, Chiba et al. [17]