Lucas Swanson, Gordon Robertson, Karen L Mungall, Yaron S Butterfield, Readman Chiu, Richard D Corbett, T Roderick Docking, Donna Hogge, Shaun D Jackman, Richard A Moore, Andrew J Mungall, Ka Ming Nip, Jeremy D K Parker, Jenny Qing Qian, Anthony Raymond, Sandy Sung, Angela Tam, Nina Thiessen, Richard Varhol, Sherry Wang, Deniz Yorukoglu, Yongjun Zhao, Pamela A Hoodless, S Cenk Sahinalp, Aly Karsan, Inanc Birol.
Abstract
BACKGROUND: Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers.
Year: 2013 PMID: 23941359 PMCID: PMC3751903 DOI: 10.1186/1471-2164-14-550
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Chimeric transcript event types. A) A fusion in which the first two exons of gene A are joined to the last two exons of gene B. B) A partial tandem duplication in which the second exon of gene A is duplicated. NCEJ marks the non-canonical exon junction between the two copies of exon A2. C) An internal tandem duplication in which a portion of the second exon of gene A is duplicated, internal to the exon. D) A circular transcript involving only the second exon of gene A. Note that it contains the same A2-A2 NCEJ as the PTD in (B).
Figure 2. Stages of the Barnacle pipeline.
Figure 3. Details of the Barnacle pipeline. A) Contrasting a collinear alignment topology (i) with non-collinear topologies: (ii) interchromosomal, which involves alignment to two chromosomes; (iii) inversion, which involves alignment to two strands; (iv) eversion, which involves alignment with a reversal of block ordering; and (v) duplication, which involves multiple alignments to the same region. B) (i) Pieces of the contig can be aligned to different regions in the genome, with ‘q’ denoting the quality of each alignment, normalized to the range [0,1]. (ii) Alignments 1 and 5 are selected, because of their high qualities and inclusion, and their low overlap. C) Alignment selection can result in one of four cases: (i) a single ungapped alignment is selected, (ii) a single gapped alignment is selected, (iii) a pair of alignments is selected, or (iv) more than two alignments are selected. D) In gap contigs, a piece of the contig does not take part in the initial contig-to-genome alignment. Gap contigs are checked for duplications (i) by realigning the gap sequence back to the contig with the original gap location masked, and for inversions (ii) by realigning the gap sequence to a region of the genome determined by the original contig-to-genome alignment. E) Fusions can have homologous sequence near the breakpoint that makes it impossible to determine the precise breakpoint position. F) For split candidates (i), read support is calculated in the region surrounding the overlap of the two contig-to-genome alignments. For gap candidates involving a duplication (ii), read support is calculated in the region between the two copies of the duplicated sequence.
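The topologies contrasted in Figure 3A can be illustrated with a small Python sketch for a pair of alignment blocks. It is not Barnacle's code: the `Block` record, the `classify_topology` function, the coordinate convention, and the order in which the checks are applied are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One contig-to-genome alignment block (hypothetical record)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # genomic start, inclusive
    end: int     # genomic end, inclusive, start <= end


def classify_topology(first: Block, second: Block) -> str:
    """Classify two blocks given in contig order (first precedes second)."""
    if first.chrom != second.chrom:
        return "interchromosomal"   # alignment to two chromosomes
    if first.strand != second.strand:
        return "inversion"          # alignment to two strands
    if first.start <= second.end and second.start <= first.end:
        return "duplication"        # both blocks cover the same region
    # On "+", contig order should follow increasing genomic coordinates;
    # on "-", it should follow decreasing coordinates.
    if first.strand == "+":
        in_order = first.end < second.start
    else:
        in_order = second.end < first.start
    return "collinear" if in_order else "eversion"


# Hypothetical coordinates, not taken from the paper.
a = Block("chr1", "+", 100, 200)
b = Block("chr1", "+", 300, 400)
print(classify_topology(a, b))  # collinear
print(classify_topology(b, a))  # eversion
```

Checking the chromosome first and the strand second is a design choice of this sketch; a real pipeline would also need to handle more than two blocks and partial overlaps.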
Figure 4. Relative coverage of predicted chimeric transcripts in AML datasets. Graphs show the ratio of (C)himeric read depth to (T)otal read depth as a function of (T)otal read depth, where T is the sum of read depth due to chimeric and wild-type transcripts. Dots indicate predictions that pass manual review and validation; crosses indicate predictions that fail manual review (see text). Lines indicate chimeric read-to-contig support levels of 5, 10, 20, and 35. A) Fusion predictions in A08823. Each fusion is represented by a triangle pointing down that uses the minimum value of T from the two genes involved in the fusion, and a triangle pointing up that uses the maximum. B) PTD predictions in A08823. C) ITD predictions in A08823. D) Fusion predictions in A08878. Triangle directions are as in (A). E) PTD predictions in A08878. F) ITD predictions in A08878.
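The ratio plotted in Figure 4 can be written as a short calculation: C / T, with T = C + W, where W is the wild-type read depth. The sketch below is illustrative only; the function name and the example depths are hypothetical and are not taken from the paper's data.

```python
def relative_coverage(chimeric_depth: float, wild_type_depth: float) -> float:
    """Return C / T, where T = C + W is the total local read depth."""
    total_depth = chimeric_depth + wild_type_depth
    if total_depth <= 0:
        raise ValueError("total read depth must be positive")
    return chimeric_depth / total_depth


# Hypothetical depths: 30 from chimeric reads, 70 from wild-type reads.
ratio = relative_coverage(30.0, 70.0)
print(f"{ratio:.1%}")  # 30.0%
```

For a fusion, the same calculation would be applied once per parental gene, which is why the caption gives each fusion a minimum and a maximum value of T.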
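Barnacle predictions in AML datasets A08823 and A08878
| ID | Type | Gene(s) | Exons¹ | Read support² | Dataset | Validation³ | Relative coverage⁴ |
| --- | --- | --- | --- | --- | --- | --- | --- |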
| 1 | fusion | PML/RARA | e3/e3 | 192 | A08823 | WGS | 27.8%/40.0% |
| 2 | fusion | RARA/PML | e2/e4 | 276 | A08823 | WGS | 33.1%/33.2% |
| 3 | fusion | TMEM14B/TMEM14C | 3′-utr/3′-utr | 110 | A08823 | Failed MI | 18.8%/9.8% |
| 4 | PTD | MLL | e3-e6 | 80 | A08878 | WGS | 28.2% |
| 5 | PTD | SEC62 | e3-e7 | 40 / 69 | both | No WGS, RT-PCR | 5.0%/8.2%⁵ |
| 6 | ITD | ACADVL | e1 | 236 | A08823 | WGS | 27.7% |
| 7 | ITD | ACIN1 | e6 | 259 / 655 | both | WGS | 61.9%/80.4%⁵ |
| 8 | ITD | AKAP2 | e2 | 61 | A08878 | WGS | 33.6% |
| 9 | ITD | DNHD1 | e21 | 76 | A08878 | WGS | 99.1% |
| 10 | ITD | FLT3⁶ | e14 | 268 | A08823 | WGS | 21.8% |
| 11 | ITD | FLT3⁶ | e14 | 950 | A08878 | WGS | 19.6% |
| 12 | ITD | FOXP1 | 3′-utr | 64 | A08878 | WGS | 19.6% |
| 13 | ITD | HSPBP1 | e3 | 56 | A08878 | WGS | 17.8% |
| 14 | ITD | KIAA1211 | e8 | 44 | A08823 | WGS | 51.3% |
| 15 | ITD | MRPS34 | e1,i1 | 52 | A08823 | WGS | 40.5% |
| 16 | ITD | PIEZO1 | e32 | 620 | A08878 | WGS | 57.0% |
| 17 | ITD | SND1 | e1 | 370 | A08823 | WGS | 9.3% |
| 18 | ITD | SSPO | e74 | 35 | A08878 | WGS | 40.3% |
1 Exon numbers are from hg19 UCSC gene annotations.
2 For the two chimeras predicted in both datasets, read support is presented as A08823 support / A08878 support.
3 Validation. WGS: validated via whole-genome shotgun sequencing. RT-PCR: validated via RT-PCR. Failed MI: failed manual inspection.
4 Relative coverage is presented as the local coverage attributable to the chimera, as a percent of the total local coverage (see Stage 5). For fusions, relative coverage with each parental gene is ordered as in Gene(s) column.
5 Relative coverage is presented as A08823 relative coverage / A08878 relative coverage.
6 The FLT3 duplications predicted in A08823 and A08878 have different sequences.
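Characterization of Barnacle ITD predictions in A08823 and A08878
| ID | Gene | Exon¹ | Dataset | Length² (frame³) | | dbSNP IDs |
| --- | --- | --- | --- | --- | --- | --- |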
| 6 | ACADVL | e1 | A08823 | 15 (IF) | No | rs66549614, rs3835013, rs6145976 |
| 7 | ACIN1 | e6 | both | 6 (IF) | Yes | rs34293824, rs5807202, rs34870944, rs78930189, rs3077646 |
| 8 | AKAP2 | e2 | A08878 | 6 (IF) | Yes | rs77728978 |
| 9 | DNHD1 | e21 | A08878 | 11 + 1 (IF) | No | rs11270441, rs35685553, rs11268490, rs35369957 |
| 10 | FLT3 | e14 | A08823 | 48 (IF) | No | none |
| 11 | FLT3 | e14 | A08878 | 42 + 3 (IF) | No | none |
| 12 | FOXP1 | 3′-utr | A08878 | 6 (IF) | No | rs67554413 |
| 13 | HSPBP1 | e3 | A08878 | 9 (IF) | Yes | rs3040014, rs71743637, rs10701478, rs71927276 |
| 14 | KIAA1211 | e8 | A08823 | 15 + 3 (IF) | Yes | rs71921617, rs11276076, rs67121617 |
| 15 | MRPS34 | e1,i1 | A08823 | 4 (FS⁴) | No | rs4027362, rs33993627, rs34595082 |
| 16 | PIEZO1 | e32 | A08878 | 6 (IF) | Yes | rs11281795, rs71707279 |
| 17 | SND1 | e1 | A08823 | 21 (IF) | No | none |
| 18 | SSPO | e74 | A08878 | 5 (FS) | Yes | none |
1 Exon numbers are from hg19 UCSC gene annotations.
2 Length is given as either “duplication length” or “duplication length + insertion length”, when extra sequence occurs between the two copies of the duplicated sequence.
3 IF: in-frame, FS: frame-shift.
4 Event involves retention of a 152-nt intron adjacent to the duplication and is in-frame when this intron is considered as well.
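The IF/FS labels in the table are consistent with the usual rule that added sequence preserves the reading frame when its net length is a multiple of three. A minimal Python sketch of that check follows, assuming this rule is what the labels reflect; the function and parameter names are hypothetical.

```python
def frame_status(dup_len: int, ins_len: int = 0, retained_intron_len: int = 0) -> str:
    """Return "IF" (in-frame) or "FS" (frame-shift) for a tandem duplication.

    Lengths are in nucleotides. The total added sequence is the duplicated
    segment, any inserted sequence between the two copies, and any retained
    intron counted as part of the event.
    """
    total_added = dup_len + ins_len + retained_intron_len
    return "IF" if total_added % 3 == 0 else "FS"


print(frame_status(11, 1))                     # IF (DNHD1: 11 + 1)
print(frame_status(4))                         # FS (MRPS34: 4)
print(frame_status(4, retained_intron_len=152))  # IF (MRPS34 with the 152-nt intron)
```

The last call mirrors footnote 4: 4 + 152 = 156 nucleotides, which is divisible by three, so the event is in-frame once the retained intron is counted.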