Xiao Yang, Yasushi Saito, Arjun Rao, Hyunsung John Kim, Pranav Singh, Eric Scott, Matthew Larson, Wenying Pan, Mohini Desai, Earl Hubbell.
Abstract
MOTIVATION: Cell-free nucleic acid (cfNA) sequencing data require improvements to existing fusion detection methods along multiple axes: high depth of sequencing, low allele fractions, short fragment lengths and specialized barcodes, such as unique molecular identifiers.
Year: 2019 PMID: 31510681 PMCID: PMC6612805 DOI: 10.1093/bioinformatics/btz346
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 4. Computing maximum coverage of fragment f for a gene pair (g1, g2). g1 and g2 are two genes inferred to cover regions of f. Each gene covers its own set of regions of f, and each region s is defined by its start and end positions on f
Fig. 1. An overview of the AF4 workflow. Detailed descriptions are given in the main text
Fig. 2. Fusion event involving gene pair (g1, g2) and readpair (r1, r2). The green line denotes the fusion transcript derived from genes g1 and g2, with the fusion junction point denoted by x. (a) Neither r1 nor r2 spans x, (b) r1 but not r2 spans x and (c) both r1 and r2 span x
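The three cases in Fig. 2 can be illustrated with a small sketch. It is not taken from AF4: the function name `classify_readpair`, the use of half-open `(start, end)` coordinates on the fusion transcript, and the rule that a read "spans" x only when x lies strictly inside it are all assumptions made for illustration. Case (b) is generalized to "exactly one read spans x".

```python
from typing import Tuple

Span = Tuple[int, int]  # hypothetical half-open interval [start, end) on the fusion transcript


def spans_junction(read: Span, x: int) -> bool:
    """A read spans x if the junction lies strictly inside the read (assumed rule)."""
    start, end = read
    return start < x < end


def classify_readpair(r1: Span, r2: Span, x: int) -> str:
    """Return 'a', 'b' or 'c' following the cases of Fig. 2."""
    n_spanning = int(spans_junction(r1, x)) + int(spans_junction(r2, x))
    if n_spanning == 0:
        return "a"  # neither read spans the junction
    if n_spanning == 1:
        return "b"  # exactly one read spans the junction
    return "c"      # both reads span the junction
```

For example, with a junction at position 60, reads at (0, 50) and (80, 130) fall on either side of x and give case (a).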
Fig. 3. Fragment generation by stitching and overhang trimming of readpair (r1, r2). (a) r1 and r2 are represented as arrows facing each other, denoting the forward and reverse complement strands. The green bars denote one of the kmers shared between them, which serves as an anchor for suffix-prefix alignment. The stitched fragment f is a concatenation of the prefix of r1, the overlap and the suffix of r2. (b) When r1 and/or r2 extends beyond the region of the other read, the overhang is trimmed, and f is the overlap. (c) When r1 and r2 cannot be merged, f is a concatenation of r1 and the reverse complement of r2
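The stitching procedure in Fig. 3 can be sketched roughly as follows. This is a simplified illustration, not AF4's implementation: the function names, the anchor length `k=8` and the requirement that the overlap match exactly (no mismatch tolerance, no base qualities) are assumptions.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def stitch(r1: str, r2: str, k: int = 8) -> str:
    """Build fragment f from a readpair, following cases (a)-(c) of Fig. 3.

    Assumptions: exact-match overlap; k is a hypothetical anchor length.
    """
    r2rc = reverse_complement(r2)

    # Index every k-mer of the reverse-complemented r2 by its position.
    index = {}
    for j in range(len(r2rc) - k + 1):
        index.setdefault(r2rc[j:j + k], []).append(j)

    # Each shared k-mer implies a candidate offset of r2rc relative to r1.
    offsets = []
    for i in range(len(r1) - k + 1):
        for j in index.get(r1[i:i + k], ()):
            if i - j not in offsets:
                offsets.append(i - j)

    for off in offsets:
        start = max(0, off)                   # overlap bounds in r1 coordinates
        end = min(len(r1), off + len(r2rc))
        if all(r1[p] == r2rc[p - off] for p in range(start, end)):
            if off >= 0 and off + len(r2rc) >= len(r1):
                # (a) prefix of r1 + overlap + suffix of r2rc
                return r1[:off] + r2rc
            # (b) a read overhangs the other: keep only the overlap
            return r1[start:end]

    # (c) no consistent overlap: concatenate r1 and the reverse complement of r2
    return r1 + r2rc
```

Anchoring on shared k-mers limits the number of offsets that need a full overlap check, which matters when the same step runs over hundreds of millions of readpairs.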
AF4 for simulated datasets (Liu et al.)
| Length (bp) | Coverage | AF4 targeted TP | AF4 targeted FP | AF4 targeted FN | AF4 targeted F1 | AF4 de novo TP | AF4 de novo FP | AF4 de novo FN | AF4 de novo F1 | ChimeRScope F1 | STAR-Fusion F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 5× | 146 | 0 | 4 | 0.986 | 143 | 1 | 7 | 0.973 | 0.948 | 0.416 |
| 50 | 20× | 147 | 0 | 3 | 0.990 | 144 | 2 | 6 | 0.973 | 0.954 | 0.855 |
| 50 | 50× | 147 | 0 | 3 | 0.990 | 144 | 2 | 6 | 0.973 | 0.947 | 0.867 |
| 50 | 100× | 147 | 0 | 3 | 0.990 | 144 | 3 | 6 | 0.970 | 0.908 | 0.868 |
| 50 | 200× | 147 | 0 | 3 | 0.990 | 144 | 3 | 6 | 0.970 | 0.905 | 0.875 |
| 75 | 5× | 144 | 0 | 6 | 0.980 | 141 | 1 | 9 | 0.966 | 0.948 | 0.700 |
| 75 | 20× | 147 | 0 | 3 | 0.990 | 143 | 1 | 7 | 0.973 | 0.949 | 0.865 |
| 75 | 50× | 147 | 0 | 3 | 0.990 | 143 | 1 | 7 | 0.973 | 0.957 | 0.872 |
| 75 | 100× | 147 | 0 | 3 | 0.990 | 144 | 2 | 6 | 0.973 | 0.954 | 0.875 |
| 75 | 200× | 147 | 0 | 3 | 0.990 | 144 | 3 | 6 | 0.970 | 0.947 | 0.875 |
| 100 | 5× | 144 | 0 | 6 | 0.980 | 140 | 0 | 10 | 0.966 | 0.940 | 0.692 |
| 100 | 20× | 147 | 0 | 3 | 0.990 | 143 | 1 | 7 | 0.973 | 0.957 | 0.878 |
| 100 | 50× | 147 | 0 | 3 | 0.990 | 143 | 1 | 7 | 0.973 | 0.957 | 0.875 |
| 100 | 100× | 147 | 0 | 3 | 0.990 | 143 | 1 | 7 | 0.973 | 0.957 | 0.872 |
| 100 | 200× | 147 | 0 | 3 | 0.990 | 143 | 1 | 7 | 0.973 | 0.957 | 0.877 |
Note: Each dataset contains fused reads of 150 gene pairs. In the targeted mode, the names of the 655 target gene pairs (including all 150 target ones) are given in the command line, along with the transcriptome. In the de novo mode, only the transcriptome is given.
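The score that follows each TP/FP/FN triple in the table above is consistent with the F1 score, F1 = 2·TP / (2·TP + FP + FN). The sketch below recomputes it from the counts. The function name is illustrative, and treating the score column as F1 is an inference from the arithmetic rather than something stated in the extracted text.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# First data row (50 bp, 5x): the two AF4 count triples.
first_block = f1_score(146, 0, 4)
second_block = f1_score(143, 1, 7)
print(f"{first_block:.3f} {second_block:.3f}")
```

Rounded to three decimals, these two values match the scores listed in the first data row (0.986 and 0.973).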
Efficacy of filtering policies implemented in AF4
| Samples | Stage one | Low cmpl seq | Nearby genes | PCR dups | Abund partners | Final |
|---|---|---|---|---|---|---|
| T1 | 345 | 1 | 0 | 243 | 66 | 35 |
| T2 | 287 | 5 | 0 | 202 | 41 | 39 |
| T4 | 709 | 5 | 43 | 519 | 65 | 77 |
| T7 | 397 | 4 | 8 | 302 | 0 | 83 |
| T10 | 704 | 3 | 3 | 571 | 0 | 127 |
| T13 | 866 | 3 | 0 | 717 | 0 | 146 |
| T14 | 679 | 3 | 3 | 559 | 0 | 114 |
| Prc101 (T) | 29 754 | 0 | 0 | 25 592 | 0 | 4162 |
| Prc101 (D) | 296 113 | 1172 | 6038 | 257 979 | 29 838 | 1481 |
| Prc108 (T) | 284 | 0 | 28 | 203 | 0 | 53 |
| Prc108 (D) | 325 417 | 794 | 6047 | 261 012 | 56 199 | 1634 |
| HC332 (T) | 64 | 0 | 0 | 59 | 0 | 5 |
| HC332 (D) | 140 189 | 407 | 3127 | 121 074 | 14 156 | 1681 |
| HC118 (T) | 116 | 0 | 0 | 111 | 0 | 5 |
| HC118 (D) | 449 265 | 723 | 9030 | 420 528 | 17 305 | 1880 |
| HC160 (T) | 2 | 1 | 0 | 1 | 0 | 0 |
| HC160 (D) | 22 173 | 174 | 539 | 18 658 | 1693 | 1154 |
Note: ‘Stage one’ column shows the number of fusion candidate fragments found in the first stage. Columns ‘Low cmpl seq’ (low-complexity sequences), ‘Nearby genes’, ‘PCR dups’ (PCR duplicates), and ‘Abund partners’ (genes with too many partners) show the number of candidates dropped due to the named policies defined in Section 2.4. ‘Final’ column shows the final number of fusion fragments reported.
Results of AF4 and STAR-Fusion on cfRNA data
| Samples | #readpairs (millions) | Coverage | TMPRSS2-ETV4/ERG fragments: AF4 targeted, Final | TMPRSS2-ETV4/ERG fragments: AF4 targeted, Prelim | TMPRSS2-ETV4/ERG fragments: AF4 de novo, Final | TMPRSS2-ETV4/ERG fragments: AF4 de novo, Prelim | TMPRSS2-ETV4/ERG fragments: STAR-Fusion, Final | TMPRSS2-ETV4/ERG fragments: STAR-Fusion, Prelim | Fusion events: AF4 targeted | Fusion events: AF4 de novo | Fusion events: STAR-Fusion, Final | Fusion events: STAR-Fusion, Prelim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prc101 | 373.8 | 29 132 | 19 | 345 | 19 | 345 | 0 | 31 | 9 | 239 | 19 | 979 |
| Prc108 | 464.5 | 36 200 | 9 | 114 | 0 | 116 | 0 | 13 | 12 | 140 | 13 | 787 |
| HC118 | 399.8 | 31 158 | — | — | — | — | — | — | 1 | 289 | 5 | 930 |
| HC160 | 348.9 | 27 191 | — | — | — | — | — | — | 0 | 303 | 21 | 254 |
| HC332 | 443.2 | 34 540 | — | — | — | — | — | — | 1 | 350 | 18 | 840 |
Note: The coverage is calculated by the formula #readpairs (millions) × 166/2.13, where 2.13 (million bp) is the panel size and 166 (bp) is used as the average fragment length. The middle section of the table shows the number of readpairs that support either the TMPRSS2-ETV4 or the TMPRSS2-ERG fusion event. As these two fusions are not expected in the healthy controls, they are marked by ‘—’. The rightmost part of the table shows the number of unique fusion events reported.
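The coverage formula in the note can be checked with a few lines of code. The constant and function names below are illustrative. The values 166 bp and 2.13 million bp come from the note.

```python
AVG_FRAGMENT_BP = 166   # average fragment length used in the note (bp)
PANEL_SIZE_MBP = 2.13   # panel size from the note (million bp)


def coverage(readpairs_millions: float) -> float:
    """Coverage = #readpairs (millions) * 166 / 2.13; the 'millions' cancel out."""
    return readpairs_millions * AVG_FRAGMENT_BP / PANEL_SIZE_MBP


print(round(coverage(373.8)))  # Prc101
print(round(coverage(464.5)))  # Prc108
```

Rounded to the nearest integer, these reproduce the coverage values listed for Prc101 and Prc108 (29 132 and 36 200).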
Results of AF4-targeted mode and Manta on cfDNA titration data for identified fusion events
| Samples (titration) | #readpairs (millions) | Coverage | AF4 TP | AF4 FP | AF4 FN | Manta TP | Manta FP | Manta FN |
|---|---|---|---|---|---|---|---|---|
| T1 (0.001) | 1000.9 | 78 004 | 1 | 13 | 1 | 1 | 3774 | 1 |
| T2 (0.002) | 938.8 | 73 165 | 1 | 11 | 1 | 0 | 2884 | 2 |
| T4 (0.004) | 1141.2 | 88 939 | 2 | 16 | 0 | 2 | 2723 | 0 |
| T7 (0.006) | 998.1 | 77 786 | 2 | 16 | 0 | 1 | 2353 | 1 |
| T10 (0.008) | 951.4 | 74 147 | 2 | 15 | 0 | 2 | 2455 | 0 |
| T13 (0.01) | 1033.2 | 80 522 | 2 | 22 | 0 | 2 | 2832 | 0 |
| T14 (0.01) | 1014.2 | 79 041 | 2 | 17 | 0 | 2 | 3108 | 0 |
Note: The coverage is calculated by the formula #readpairs (millions) × 166/2.13, where 2.13 (million bp) is the panel size and 166 (bp) is used as the average fragment length. Some titrations have two to three replicates as shown in the first column.
Runtimes of AF4, STAR-Fusion and Manta in seconds (wall-clock time), with the same number of cores used for each tool
| Samples | #readpairs (millions) | AF4 targeted | AF4 de novo | STAR-Fusion | Manta |
|---|---|---|---|---|---|
| Prc101 | 373.8 | 362 | 481 | 8475 | — |
| Prc108 | 464.5 | 340 | 462 | 16 075 | — |
| HC118 | 399.8 | 346 | 470 | 8801 | — |
| HC160 | 348.9 | 373 | 381 | 7903 | — |
| HC332 | 443.2 | 382 | 391 | 12 666 | — |
| T1 (0.001) | 1000.9 | 1000 | — | — | 1280 |
| T2 (0.002) | 938.8 | 938 | — | — | 1160 |
| T4 (0.004) | 1141.2 | 1268 | — | — | 2016 |
| T7 (0.006) | 998.1 | 1124 | — | — | 1160 |
| T10 (0.008) | 951.4 | 1104 | — | — | 1040 |
| T13 (0.01) | 1033.2 | 1217 | — | — | 1190 |
| T14 (0.01) | 1014.2 | 1117 | — | — | 1490 |
Note: Samples are from Sections 3.2 and 3.3. Times for Manta exclude the alignment and de-duplication steps.