Maria Nattestad, Sara Goodwin, Karen Ng, Timour Baslan, Fritz J Sedlazeck, Philipp Rescheneder, Tyler Garvin, Han Fang, James Gurtowski, Elizabeth Hutton, Elizabeth Tseng, Chen-Shan Chin, Timothy Beck, Yogi Sundaravadanam, Melissa Kramer, Eric Antoniou, John D McPherson, James Hicks, W Richard McCombie, Michael C Schatz.
Abstract
The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and developed one of the most detailed maps of structural variations (SVs) in a cancer genome available, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discovered a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.
Year: 2018 PMID: 29954844 PMCID: PMC6071638 DOI: 10.1101/gr.231100.117
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Variants found in SK-BR-3 with PacBio long-read sequencing. (A) Circos (Krzywinski et al. 2009) plot showing long-range (larger than 10 kbp or inter-chromosomal) variants found by Sniffles from split-read alignments, with read coverage shown in the outer track. (B) Variant size histogram of deletions and insertions from size 50 bp up to 1 kbp found by long-read (Sniffles) and short-read (SURVIVOR 2-caller consensus) variant calling, showing similar size distributions for insertions and deletions from long reads but not for short reads, where insertions are greatly underrepresented. (C) Sniffles variant counts by type for variants above 1 kbp in size, including translocations and inverted duplications.
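The long-range criterion in panel A (larger than 10 kbp or inter-chromosomal) can be written as a simple predicate. The following is a minimal sketch, assuming a hypothetical `Variant` record with two breakpoints; the record layout and names are illustrative and are not Sniffles's own data model or output format.

```python
from dataclasses import dataclass

# Threshold taken from the Figure 1A caption ("larger than 10 kbp").
LONG_RANGE_MIN_BP = 10_000


@dataclass(frozen=True)
class Variant:
    """Hypothetical two-breakpoint record; field names are illustrative."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_long_range(v: Variant) -> bool:
    """Return True if the variant is inter-chromosomal or spans more than 10 kbp."""
    if v.chrom1 != v.chrom2:
        return True  # inter-chromosomal variants always qualify
    return abs(v.pos2 - v.pos1) > LONG_RANGE_MIN_BP
```

For example, `is_long_range(Variant("17", 100, "8", 200))` is true because the two breakpoints lie on different chromosomes, regardless of their coordinates.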
Figure 2. Comparing results of mapping and variant calling between PacBio and Illumina paired-end sequencing. (A) Venn diagram showing the intersection of structural variants between the Sniffles call set versus the SURVIVOR 2-caller consensus, with counts indicated. (B) Percentage of variant calls in each area of Venn diagram in A that have matching CNV calls within 50 kbp (the smallest segment allowed in segmentation), where a CNV is a difference in copy number (long-read sequencing) between segments of at least 28×, the diploid average. (C) Venn diagram showing the intersection of long-range variants between the Sniffles call set versus the SURVIVOR 2-caller consensus. Validation rates are shown as percentages below the counts for each category, and extrapolated overall validation rates are shown for Sniffles and SURVIVOR.
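The CNV-matching criterion in panel B can be illustrated with a short example. This is a minimal sketch under stated assumptions, not the authors' implementation: segments are hypothetical `(start, end, coverage)` tuples for a single chromosome, sorted by position; a CNV is taken to be a boundary between adjacent segments whose coverage differs by at least 28×; and "within 50 kbp" is treated as inclusive.

```python
from typing import List, Tuple

# (start, end, mean coverage) for one chromosome; hypothetical layout.
Segment = Tuple[int, int, float]

MAX_DISTANCE_BP = 50_000   # matching window from the caption
MIN_COVERAGE_DIFF = 28.0   # diploid average coverage from the caption


def cnv_boundaries(segments: List[Segment]) -> List[int]:
    """Positions where adjacent segments differ in coverage by at least 28x."""
    boundaries = []
    for (_, end, cov_a), (_, _, cov_b) in zip(segments, segments[1:]):
        if abs(cov_a - cov_b) >= MIN_COVERAGE_DIFF:
            boundaries.append(end)
    return boundaries


def has_matching_cnv(breakpoint_pos: int, segments: List[Segment]) -> bool:
    """True if any CNV boundary lies within 50 kbp of the variant breakpoint."""
    return any(
        abs(breakpoint_pos - b) <= MAX_DISTANCE_BP
        for b in cnv_boundaries(segments)
    )
```

With segments at 56×, 84×, and 90× coverage, only the first boundary counts as a CNV (a 28× step), so a breakpoint is matched only if it falls within 50 kbp of that boundary.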
Gene fusions with RNA evidence from Iso-Seq and DNA evidence from SMRT DNA sequencing where the genomic path is found using SplitThreader from Sniffles variant calls.
Figure 3. Reconstruction of the copy number amplification of the ERBB2 oncogene. (A) Copy number and translocations for the amplified region on Chr 17 that includes ERBB2, showing the relations to Chr 8. Note that Chr 8 has extensive rearrangements, shown by the green intra-chromosomal arcs. (B) Sequence of events that best explains the copy number and translocations found in this region. Segment 1 (orange) first translocated into Chr 8, followed by segment 2 (yellow) translocating to a different place on Chr 8. Then, segment 3 (green) was duplicated from segment 2 by an inversion of the piece between variants D and E along with a 1.5-Mb piece of Chr 8 that was attached at variant E, all of which then attached at variant C. The whole green segment including the 1.5 Mb of Chr 8 then underwent an inverted duplication at variant D. The purple segment could have come from the orange, yellow, or green sequences since it only shares breakpoint A. Additionally, there is a deletion of 10,305 bp between breakpoints D and E.
Figure 4. The KLHDC2-SNTB1 gene fusion in SK-BR-3 occurs through a series of three variants and is directly observed to link the two genes in several individual SMRT-seq reads (A), one of which is shown in detail in B.