Winnie S Liang, Jessica Aldrich, Waibhav Tembe, Ahmet Kurdoglu, Irene Cherni, Lori Phillips, Rebecca Reiman, Angela Baker, Glen J Weiss, John D Carpten, David W Craig.
Abstract
As next-generation sequencing continues to have an expanding presence in the clinic, a cost-effective and robust strategy for identifying copy number changes and translocations in tumor genomes is needed. We hypothesized that performing shallow whole genome sequencing (WGS) of 900-1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events, compared with shallow WGS of 300-400-bp inserts. A priori analyses show that LI-WGS requires less sequencing than short insert WGS to achieve a target physical coverage, and that LI-WGS requires less sequence coverage to detect a heterozygous event with a power of 0.99. We thus developed an LI-WGS library preparation protocol based on Illumina's WGS library preparation protocol and illustrate the feasibility of performing LI-WGS. We additionally applied LI-WGS to three separate tumor/normal DNA pairs collected from patients diagnosed with different cancers to demonstrate our application of LI-WGS on actual patient samples for identification of somatic copy number alterations and translocations. With the evolution of sequencing technologies and bioinformatics analyses, we show that modifications to current approaches may improve our ability to interrogate cancer genomes.
Year: 2013 PMID: 24071583 PMCID: PMC3902897 DOI: 10.1093/nar/gkt865
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Comparison of SI- and LI-WGS. A visualization of mapped reads for SI- and LI-WGS is shown assuming a read depth of 2 for each library type. The reference human genome is shown in the middle of the figure, with a theoretical breakpoint shown in gray and its location marked by the gray line. SI (300 bp) mapped reads are displayed above the reference, and LI (900 bp) mapped reads are displayed below the reference. PE reads are represented by heavy solid lines with arrowheads, and regions between reads are denoted by a dotted line. Anomalous read pairs are shown in red. When SI- and LI-WGS libraries are sequenced to the same read depth, the LI-WGS library achieves higher physical coverage. Furthermore, because LIs interrogate a larger genomic region, the likelihood that a breakpoint will fall within that region is increased.
Figure 2. Comparison of power achieved when sequencing LI or SI libraries. Power calculations were performed to evaluate the power achieved when sequencing SI (300 bp) libraries with a 2 × 100 read length (A). These analyses were performed to determine the power of identifying a heterozygous somatic event as characterized by at least 10 anomalous read pairs under three scenarios where a tumor sample may have three different tumor cellularities (100, 50, 25% tumor). This analysis was similarly performed for LI (900 bp) libraries with a 2 × 100 read length (B). We performed additional LI analyses using the same parameters but decreased the read length from 2 × 100 to 2 × 83 (C). For all three analyses, a dotted line demarcates the sequence coverage needed for detecting a heterozygous event in a sample with 50% tumor cellularity and 0.99 power. Coverage shown is sequence coverage, and a is the expected frequency of an event given the different tumor cellularities.
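The kind of power analysis summarized in Figure 2 can be illustrated with a simple Poisson model. The sketch below is an assumption and not the authors' calculation: it treats the number of anomalous read pairs spanning a breakpoint as Poisson-distributed, with a mean equal to the physical coverage multiplied by the expected event frequency, taken here as half the tumor cellularity for a heterozygous event. The function name and its parameters are hypothetical, and the output is not expected to reproduce the published curves.

```python
import math


def poisson_power(physical_coverage, tumor_fraction, min_pairs=10, heterozygous=True):
    """Probability of seeing at least `min_pairs` anomalous read pairs.

    Assumed model (not taken from the paper): spanning pairs ~ Poisson(lam),
    lam = physical_coverage * event_frequency.
    """
    # Assumed event frequency: half the tumor fraction for a heterozygous event.
    event_frequency = tumor_fraction * (0.5 if heterozygous else 1.0)
    lam = physical_coverage * event_frequency
    # P(X <= min_pairs - 1) for a Poisson variable with mean lam.
    cdf = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(min_pairs))
    return 1.0 - cdf


if __name__ == "__main__":
    for cellularity in (1.0, 0.5, 0.25):
        for coverage in (20.0, 40.0, 80.0):
            power = poisson_power(coverage, cellularity)
            print(f"cellularity={cellularity:.2f} physical coverage={coverage:5.1f} power={power:.3f}")
```

Under this model, power depends on physical coverage, so a library that yields more physical coverage per unit of sequence coverage reaches a given power with less sequencing.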
Figure 3. LI library preparation quality control. Two examples of fragmented human genomic samples to a target of 900 bp are shown (A). Fragmented samples are run alongside Invitrogen’s 1 Kb Plus DNA ladder. An example of ligation products for the LI-WGS preparation protocol is shown in (B). Products are run alongside the same 1 Kb Plus ladder shown in (C). The same gel from (B) following size selection is shown in (C) in which multiple collections of ligation product were obtained. An example Bioanalyzer trace of a final LI-WGS library is shown in (D; FU = fluorescence units). The library peak is demarcated by an arrow; flanking peaks are Bioanalyzer marker peaks.
Sequencing metric comparison of SI and LI libraries
| Metric | SI whole genome library | LI whole genome library |
|---|---|---|
| Median insert size (bp) | 322 | 869 |
| Mean insert size (bp) | 313.90 | 869.34 |
| Insert size standard deviation | 48.50 | 64.19 |
| Number of lanes sequenced | 1 | 1 |
| Total cluster density (K/mm²) | 801 ± 70 | 798 ± 61 |
| Clusters passing filter (PF, %) | 91.5 ± 2.0 | 81.9 ± 4.8 |
| Read length | 2 × 104 | 2 × 83 |
| Total reads (M) | 221.56 | 220.59 |
| PF reads (M) | 202.48 | 180.28 |
| Read 1 error rate | 0.28 ± 0.03 | 0.43 ± 0.05 |
| Read 2 error rate | 0.48 ± 0.12 | 0.50 ± 0.12 |
| Read 1 phasing/prephasing | 0.136/0.201 | 0.184/0.252 |
| Read 2 phasing/prephasing | 0.145/0.193 | 0.183/0.268 |
| Total yield (Gb) | 33.11 | 31.12 |
| Total Q30 yield (Gb) | 29.30 | 25.30 |
| %Q30 | 88.50 | 81.30 |
| Total reads | 404 968 194 | 360 562 104 |
| Total mapped reads | 379 311 244 | 335 823 767 |
| % reads mapped | 93.66 | 93.14 |
| GC dropout | 2.91 | 5.69 |
| AT dropout | 1.22 | 2.45 |
| Median GC normalized coverage | 0.86 | 0.76 |
| Mapped sequence coverage | 12.57 | 8.78 |
| Mapped physical coverage | 37.95 | 93.06 |
Figure 4. Comparison of cluster sizes between SI and LI libraries. An example image from sequencing an SI library is shown in (A), along with a cluster density plot from Illumina’s Sequence Analysis Viewer. An example image and cluster density plot from sequencing an LI library is shown in (B). In each cluster density plot, the blue boxes represent total densities and the green boxes represent PF cluster densities. Red lines demarcate the median for the total density and the PF density.
Sequencing metrics of SI- and LI-WGS libraries for patients 1, 2 and 3
P1, P2 and P3 denote patients 1, 2 and 3.

Per library (normal and tumor combined):

| Metric | P1 SI | P1 LI | P2 SI | P2 LI | P3 SI | P3 LI |
|---|---|---|---|---|---|---|
| Total amount of data generated (GB) | 275.6 | 73.6 | 285.3 | 67.4 | 341.7 | 74.2 |
| Q30 data generated (GB) | 196.5 | 60.9 | 261.8 | 54.2 | 307.1 | 60.9 |
| Average cluster density (K/mm²) | 958.4 | 756.0 | 705.9 | 712.5 | 819.3 | 753.5 |
| Average PF cluster density (%) | 65.0 | 84.4 | 88.0 | 82.5 | 92.1 | 85.3 |

Per sample:

| Metric | P1 SI normal | P1 SI tumor | P1 LI normal | P1 LI tumor | P2 SI normal | P2 SI tumor | P2 LI normal | P2 LI tumor | P3 SI normal | P3 SI tumor | P3 LI normal | P3 LI tumor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of flowcell lanes sequenced | 5 | 5 | 1 | 1 | 4 | 4 | 1 | 1 | 4 | 4 | 1 | 1 |
| Read length | 102 | 102 | 101 | 101 | 104 | 104 | 101 | 101 | 104 | 104 | 101 | 101 |
| Total number of reads | 1 482 053 066 | 1 214 032 574 | 330 467 210 | 338 562 714 | 1 512 625 416 | 1 231 472 418 | 289 650 634 | 311 034 636 | 1 665 626 188 | 1 617 770 394 | 402 134 806 | 270 012 982 |
| Total number of mapped reads | 1 387 438 027 | 1 110 401 730 | 309 910 146 | 318 045 021 | 1 426 314 916 | 1 158 550 623 | 270 915 362 | 289 775 606 | 1 574 312 575 | 1 517 223 113 | 376 993 087 | 253 457 102 |
| % mapped reads | 93.62 | 91.46 | 93.78 | 93.94 | 94.29 | 94.08 | 93.53 | 93.17 | 94.52 | 93.78 | 93.75 | 93.87 |
| Average mapped sequence coverageᵃ | 45.11 | 36.10 | 9.98 | 10.24 | 47.28 | 38.41 | 8.72 | 9.33 | 52.19 | 50.30 | 12.14 | 8.16 |
| Average mapped physical coverageᵃ | 144.48 | 116.03 | 83.86 | 86.20 | 131.40 | 108.19 | 73.38 | 78.33 | 146.37 | 140.13 | 108.28 | 72.58 |
ᵃ Sequence and physical coverages were calculated using all data generated. SI libraries were sequenced across five flowcell lanes, whereas LI libraries were sequenced across one flowcell lane.
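The tabulated physical coverages for patient 1 are numerically consistent with sequence coverage scaled by the ratio of mean insert size to read length. The sketch below checks that relationship. It is an observation about the reported numbers, not a formula stated in the source: the function name is hypothetical, and the mean insert sizes are taken from the analysis-metrics table for the same samples.

```python
def physical_from_sequence(sequence_coverage, mean_insert_bp, read_length_bp):
    """Assumed relationship: physical = sequence * mean insert / read length."""
    return sequence_coverage * mean_insert_bp / read_length_bp


# (sample, sequence coverage, mean insert size in bp, read length, reported physical coverage)
ROWS = [
    ("Patient 1 SI normal", 45.11, 326.70, 102, 144.48),
    ("Patient 1 SI tumor", 36.10, 327.81, 102, 116.03),
    ("Patient 1 LI normal", 9.98, 848.90, 101, 83.86),
    ("Patient 1 LI tumor", 10.24, 850.22, 101, 86.20),
]

if __name__ == "__main__":
    for sample, seq_cov, insert_bp, read_len, reported in ROWS:
        estimate = physical_from_sequence(seq_cov, insert_bp, read_len)
        print(f"{sample}: estimated {estimate:.2f}, reported {reported:.2f}")
```

The estimates agree with the reported values to within rounding, which illustrates why an insert roughly three times longer yields several times more physical coverage from about one-fifth of the sequence coverage.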
Analysis metrics of SI- and LI-WGS libraries for patients 1, 2 and 3
P1, P2 and P3 denote patients 1, 2 and 3.

Per sample:

| Metric | P1 SI normal | P1 SI tumor | P1 LI normal | P1 LI tumor | P2 SI normal | P2 SI tumor | P2 LI normal | P2 LI tumor | P3 SI normal | P3 SI tumor | P3 LI normal | P3 LI tumor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tumor cellularity (%) | n/a | 50 | n/a | 50 | n/a | 50 | n/a | 50 | n/a | n/a | n/a | n/a |
| Median insert size (bp) | 328.00 | 330.00 | 865.00 | 861.00 | 285.00 | 293.00 | 852.00 | 860.00 | 274.00 | 275.00 | 901.00 | 901.00 |
| Mean insert size (bp) | 326.70 | 327.81 | 848.90 | 850.22 | 289.02 | 292.97 | 849.78 | 848.03 | 291.67 | 289.75 | 901.04 | 898.37 |
| Insert size standard deviation | 29.91 | 32.80 | 113.24 | 100.82 | 45.67 | 50.13 | 110.67 | 135.52 | 57.61 | 56.75 | 108.42 | 117.62 |
| Average mapped sequence coverage | 8.13 | 8.13 | 8.05 | 8.05 | 8.29 | 8.29 | 8.05 | 8.05 | 8.29 | 8.29 | 8.05 | 8.05 |
| Average mapped physical coverage | 26.04 | 26.13 | 67.66 | 67.76 | 23.03 | 23.35 | 67.72 | 67.58 | 23.24 | 23.09 | 71.81 | 71.59 |
| GC dropout | 5.76 | 7.46 | 4.45 | 5.25 | 2.74 | 2.69 | 3.86 | 4.40 | 2.73 | 2.76 | 5.32 | 4.95 |
| AT dropout | 2.02 | 2.78 | 2.15 | 2.58 | 0.98 | 0.86 | 2.07 | 2.17 | 0.87 | 0.92 | 2.71 | 2.44 |
| Median GC normalized coverage | 0.69 | 0.70 | 0.81 | 0.79 | 0.88 | 0.90 | 0.84 | 0.82 | 0.86 | 0.86 | 0.79 | 0.79 |
| Total number of reads | 250 023 370 | 250 031 044 | 250 027 715 | 250 019 339 | 249 978 232 | 249 998 124 | 250 002 762 | 250 009 726 | 250 002 357 | 249 991 693 | 250 009 553 | 250 001 078 |

Per library (tumor/normal pair):

| Metric | P1 SI | P1 LI | P2 SI | P2 LI | P3 SI | P3 LI |
|---|---|---|---|---|---|---|
| Power to detect eventᵃ | 0.52 | 0.85 | 0.48 | 0.86 | n/a | n/a |
| Number of somatic translocations | 4 | 16 | 3 | 5 | 3 | 15 |
| Number of somatic translocations detected that affect a COSMIC gene | 0 | 0 | 0 | 0 | 0 | 1 |
| Number of CNVs identified | 48 | 4 | 2 | 0 | 0 | 2 |
| Number of genes affected by CNVs | 752 | 12 | 0 | 0 | 0 | 12 |
| Number of COSMIC genes affected by CNVs | 16 | 0 | 0 | 0 | 0 | 2 |

Per patient:

| Metric | Patient 1 | Patient 2 | Patient 3 |
|---|---|---|---|
| Total number of common translocations | 3 | 0 | 0 |
| Total number of common genes affected by CNVs | 11 | 0 | 0 |
SI- and LI-WGS bam files were each randomly normalized to ∼250 million mapped reads using SAMtools to allow for a direct comparison across SI and LI data sets.
ᵃ Power was calculated assuming that a minimum of eight anomalous read pairs are required for detection. Because the tumor cellularity of patient 3 is not known, power calculations were not performed. n/a = not available.
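The normalization to roughly 250 million mapped reads can be sketched as a subsampling fraction passed to SAMtools. The example below is a minimal illustration under stated assumptions: the source names SAMtools but not the exact command, so the helper names, the seed, the four-digit rounding and the file names are hypothetical. It relies on the `-s` option of `samtools view`, whose integer part is a random seed and whose fractional part is the fraction of reads to keep.

```python
def subsample_fraction(total_mapped_reads, target_reads=250_000_000):
    """Fraction of reads to keep so that about `target_reads` remain."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return min(1.0, target_reads / total_mapped_reads)


def samtools_subsample_args(in_bam, out_bam, fraction, seed=42):
    """Build an argument list for `samtools view -s SEED.FRACTION` (illustrative only)."""
    rounded = round(fraction, 4)
    if not 0.0 < rounded < 1.0:
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    digits = f"{rounded:.4f}".split(".")[1]  # e.g. 0.1802 -> "1802"
    return ["samtools", "view", "-b", "-s", f"{seed}.{digits}", "-o", out_bam, in_bam]


if __name__ == "__main__":
    # Patient 1 SI normal: 1 387 438 027 mapped reads before normalization.
    fraction = subsample_fraction(1_387_438_027)
    # File names below are placeholders, not paths from the study.
    print(" ".join(samtools_subsample_args("p1_si_normal.bam", "p1_si_normal.250M.bam", fraction)))
```

Using one target for every library brings SI and LI data sets to similar read counts, so differences in detected events reflect insert size more than sequencing depth.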
CNVs affecting COSMIC genes identified using SI and LI data
| Patient | Library | Chr. | Location | CNV | Length (bp) | Log2 fold | Affected COSMIC genes |
|---|---|---|---|---|---|---|---|
| 1 | SI | 3 | 51996100:52507500 | Loss | 511 400 | −0.819 | BAP1 |
| 1 | SI | 9 | 136216500:137456500 | Loss | 1 240 000 | −0.819 | BRD3 |
| 1 | SI | 16 | 88359000:89200400 | Loss | 841 400 | −1.127 | CBFA2T3 |
| 1 | SI | 8 | 38276300:38441900 | Loss | 165 600 | −0.819 | FGFR1 |
| 1 | SI | 19 | 358300:8783100 | Loss | 8 424 800 | −1.234 | FSTL3 |
| 1 | SI | 19 | 358300:8783100 | Loss | 8 424 800 | −1.234 | GNA11 |
| 1 | SI | 19 | 358300:8783100 | Loss | 8 424 800 | −1.234 | MLLT1 |
| 1 | SI | 12 | 56035700:57163000 | Loss | 1 127 300 | −0.789 | NACA |
| 1 | SI | 8 | 70884500:71036600 | Loss | 152 100 | −0.819 | NCOA2 |
| 1 | SI | 22 | 30055000:30229500 | Loss | 174 500 | −0.819 | NF2 |
| 1 | SI | 9 | 137928600:140766000 | Loss | 2 837 400 | −0.789 | NOTCH1 |
| 1 | SI | 9 | 133827500:133983900 | Loss | 156 400 | −0.819 | NUP214 |
| 1 | SI | 19 | 358300:8783100 | Loss | 8 424 800 | −1.234 | SH3GL1 |
| 1 | SI | 19 | 358300:8783100 | Loss | 8 424 800 | −1.234 | STK11 |
| 1 | SI | 19 | 358300:8783100 | Loss | 8 424 800 | −1.234 | TCF3 |
| 1 | SI | 1 | 798600:3766500 | Loss | 2 967 900 | −0.789 | TNFRSF14 |
| 3 | LI | 3 | 186450400:187448100 | Loss | 997 700 | −0.912 | BCL6 |
| 3 | LI | 3 | 186450400:187448100 | Loss | 997 700 | −0.912 | EIF4A2 |
Chr = chromosome.
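The columns of the CNV table can be cross-checked with a little arithmetic. The sketch below assumes that the Location column is `start:end` in base pairs and that the Log2 fold column is a base-2 log ratio of tumor to normal coverage. Both readings are assumptions, as are the helper names. It recomputes segment length and converts the log2 value to a linear ratio for a few rows.

```python
def parse_interval(location):
    """Split an assumed 'start:end' location string into integer coordinates."""
    start, end = (int(part) for part in location.split(":"))
    return start, end


def copy_ratio(log2_fold):
    """Convert a log2 fold change to a linear ratio (assumed tumor/normal)."""
    return 2.0 ** log2_fold


# (gene, location, reported length in bp, reported log2 fold), copied from the table
CNV_ROWS = [
    ("BAP1", "51996100:52507500", 511_400, -0.819),
    ("BRD3", "136216500:137456500", 1_240_000, -0.819),
    ("STK11", "358300:8783100", 8_424_800, -1.234),
    ("BCL6", "186450400:187448100", 997_700, -0.912),
]

if __name__ == "__main__":
    for gene, location, reported_length, log2_fold in CNV_ROWS:
        start, end = parse_interval(location)
        print(f"{gene}: length {end - start} bp (reported {reported_length}), "
              f"ratio {copy_ratio(log2_fold):.3f}")
```

For these rows, the recomputed lengths equal the reported lengths, and a log2 fold of about −0.8 corresponds to a ratio just under 0.6.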
Genic translocations identified using SI and LI data
| Patient | Library | Breakpoint location | Affected genes |
|---|---|---|---|
| 1 | SI | −:7:133311200\|−:6:118209600 | EXOC4 |
| 1 | SI | −:3:55788800\|−:12:81208800 | ERC2, LIN7A |
| 1 | LI | +:18:29128000\|+:3:150368000 | DSG2 |
| 1 | LI | +:6:125820000\|+:7:121984000 | CADPS2 |
| 1 | LI | +:9:74810000\|+:X:11950000 | GDA |
| 1 | LI | −:3:150370000\|−:18:29126000 | DSG2 |
| 1 | LI | +:12:81208000\|+:3:55788000 | ERC2, LIN7A |
| 1 | LI | +:X:11952000\|+:9:74808000 | GDA |
| 1 | LI | −:6:118210000\|−:7:133310000 | EXOC4 |
| 1 | LI | +:14:89290000\|+:17:78272000 | TTC8, RNF213 |
| 1 | LI | +:8:140172000\|+:9:116200000 | C9orf43 |
| 1 | LI | −:4:91966000\|−:11:83130000 | FAM190A |
| 2 | SI | −:7:153790400\|−:7:149700000 | DPP6 |
| 2 | LI | +:4:130930800\|+:12:65817400 | MSRB3 |
| 2 | LI | −:5:43080400\|−:5:43269600 | NIM1 |
| 2 | LI | +:7:34837000\|+:11:57763200 | NPSR1 |
| 3 | SI | +:12:9576000\|+:12:9460000 | DDX12P, LOC642846 |
| 3 | LI | +:3:11258400\|+:3:188188800 | HRH1, LPP |
| 3 | LI | +:3:173983200\|+:3:187771200 | NLGN1 |
| 3 | LI | −:11:60480000\|−:7:25058400 | MS4A8B |
ᵃ Validated by PCR and Sanger sequencing.
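Several SI and LI rows in the translocation table appear to describe the same event with slightly different coordinates and with the two ends listed in opposite order. The sketch below shows one way to compare them. It assumes each breakpoint string reads `strand:chromosome:position|strand:chromosome:position`, and it uses a hypothetical 5 kb tolerance and hypothetical helper names; none of this is taken from the authors' pipeline.

```python
from typing import NamedTuple


class End(NamedTuple):
    strand: str
    chrom: str
    pos: int


def parse_breakpoint(text):
    """Parse an assumed 'strand:chrom:pos|strand:chrom:pos' string into two ends."""
    ends = []
    # The table uses a typographic minus sign; normalize it to ASCII first.
    for part in text.replace("\u2212", "-").split("|"):
        strand, chrom, pos = part.split(":")
        ends.append(End(strand, chrom, int(pos)))
    return tuple(ends)


def same_event(a, b, window=5000):
    """True if both ends match within `window` bp, ignoring strand and end order."""
    def close(x, y):
        return x.chrom == y.chrom and abs(x.pos - y.pos) <= window

    return (close(a[0], b[0]) and close(a[1], b[1])) or (close(a[0], b[1]) and close(a[1], b[0]))


if __name__ == "__main__":
    si_exoc4 = parse_breakpoint("−:7:133311200|−:6:118209600")
    li_exoc4 = parse_breakpoint("−:6:118210000|−:7:133310000")
    li_dsg2 = parse_breakpoint("+:18:29128000|+:3:150368000")
    print(same_event(si_exoc4, li_exoc4))
    print(same_event(si_exoc4, li_dsg2))
```

Ignoring strand is a deliberate simplification here, because the patient 1 ERC2/LIN7A rows carry opposite strand symbols in the SI and LI data while sharing near-identical coordinates.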