Daniel L Cameron, Jonathan Baber, Charles Shale, Jose Espejo Valle-Inclan, Nicolle Besselink, Arne van Hoeck, Roel Janssen, Edwin Cuppen, Peter Priestley, Anthony T Papenfuss.
Abstract
GRIDSS2 is the first structural variant caller to explicitly report single breakends, that is, breakpoints in which only one side can be unambiguously determined. By treating single breakends as a fundamental genomic rearrangement signal on par with breakpoints, GRIDSS2 can explain 47% of somatic centromere copy number changes using single breakends to non-centromere sequence. On a cohort of 3782 deeply sequenced metastatic cancers, GRIDSS2 achieves an unprecedented 3.1% false negative rate and 3.3% false discovery rate and identifies a novel 32–100 bp duplication signature. GRIDSS2 simplifies complex rearrangement interpretation through phasing of structural variants, with 16% of somatic calls phasable using paired-end sequencing.
Year: 2021 PMID: 34253237 PMCID: PMC8274009 DOI: 10.1186/s13059-021-02423-x
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 GRIDSS2 overview. a Contigs are assembled from a single locus of reads mutually supporting the same putative break junction. If the other side cannot be uniquely determined, the contig supports a single breakend call at the break junction position. If different portions of the contig sequence uniquely align to different genomic loci, the assembly supports multiple cis-phased breakpoints. b Nearby structural variants will have discordant read pairs spanning multiple breakpoints. These generate spurious transitive calls that are collapsed into the underlying breakpoints, phasing them in cis.
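The decision described in panel a can be illustrated with a minimal sketch. It assumes a simplified model in which a contig is a list of aligned portions, the first being the anchor at the assembled locus. The `Segment` type, the `classify_contig` function, and the example loci are hypothetical and are not part of the GRIDSS2 code base; the real caller uses far richer alignment evidence.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One portion of an assembled contig (hypothetical, simplified model)."""
    locus: Optional[str]  # e.g. "chr1:1000", or None if the portion is unaligned
    unique: bool          # True if the alignment is unambiguous


def classify_contig(segments: List[Segment]) -> Tuple[str, int]:
    """Classify a contig whose first segment is the anchor at the assembled locus.

    Returns ("single_breakend", 0) when no portion beyond the anchor can be
    placed uniquely, otherwise ("breakpoint", n), where n is the number of
    uniquely placed portions beyond the anchor. Under this simplification,
    n > 1 corresponds to multiple breakpoints phased in cis by the same contig.
    """
    if not segments:
        raise ValueError("a contig needs at least an anchor segment")
    far_side = segments[1:]
    placed = [s for s in far_side if s.locus is not None and s.unique]
    if not placed:
        return ("single_breakend", 0)
    return ("breakpoint", len(placed))


# Far side is a repeat with no unique placement: a single breakend call.
repeat_contig = [Segment("chr1:1000", True), Segment("chr5:200", False)]

# Far side splits across two uniquely placed loci: two cis-phased breakpoints.
phased_contig = [
    Segment("chr1:1000", True),
    Segment("chr2:500", True),
    Segment("chr3:900", True),
]
```

Here `classify_contig(repeat_contig)` yields a single breakend, while `classify_contig(phased_contig)` yields two breakpoints supported by one contig.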
Fig. 2 Somatic benchmarks. a COLO829T/BL tumour and blood cell lines were sequenced in triplicate to 100x/40x. In-silico purity downsampling was performed at 40x normal and 60x tumour coverage. Results are compared against a PCR-validated somatic truth set generated from multiple sequencing technologies. b Simulation of somatic breakpoints from the CHM13 telomere-to-telomere assembly against hg38. Single breakend variant calling allows sensitive detection of breakpoints to satellite repeat regions such as centromeres without increasing the false discovery rate. c GRIDSS2/Manta validation results on 13 patient samples for events of 50 bp or larger. d GRIDSS2/Manta/Strelka validation results on 13 patient samples for small (32–100 bp) duplications. e GRIDSS2/Strelka validation results for 32–50 bp events. f Per-sample counts of 32–100 bp somatic tandem duplications in the Hartwig cohort. These mutations are enriched in colorectal cancer and associated with ATM driver mutations. g Size distribution of small (32–100 bp) tandem duplications across the Hartwig cohort. This is a distinct signature not associated with microsatellite expansion.
Fig. 3 Consistency with copy number on large cancer cohorts. a False negative rate (FNR) inferred from the presence of SVs at copy number transitions, broken down by magnitude of copy number change, for 1476 PCAWG samples. Comparison is between the PCAWG consensus SV/CNV call sets and GRIDSS2/PURPLE. b Inferred FNR for 3782 100x tumour samples from the Hartwig cohort. Single breakend variant calling is crucial to the low FNR in this cohort. c Comparison of expected vs actual copy number changes for the Hartwig cohort. SV-inferred and actual copy number changes are closely correlated.
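One way to read the inferred FNR in panels a and b is as the fraction of copy number transitions that have no structural variant breakend nearby. The sketch below illustrates that reading only. The function name, the `tolerance` parameter, and the toy coordinates are assumptions for illustration and are not taken from GRIDSS2 or PURPLE.

```python
from bisect import bisect_left


def inferred_fnr(cn_transitions, sv_breakends, tolerance=0):
    """Fraction of copy number transitions with no SV breakend within `tolerance` bp.

    Both inputs are iterables of (chromosome, position) tuples.
    """
    # Index breakend positions per chromosome so each lookup is a binary search.
    by_chrom = {}
    for chrom, pos in sv_breakends:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    transitions = list(cn_transitions)
    if not transitions:
        raise ValueError("no copy number transitions supplied")

    unexplained = 0
    for chrom, pos in transitions:
        positions = by_chrom.get(chrom, [])
        # First breakend at or after the left edge of the tolerance window.
        i = bisect_left(positions, pos - tolerance)
        supported = i < len(positions) and positions[i] <= pos + tolerance
        if not supported:
            unexplained += 1
    return unexplained / len(transitions)


# Toy data (invented): four transitions, two of which have a nearby breakend.
toy_transitions = [("chr1", 1000), ("chr1", 5000), ("chr2", 300), ("chr3", 10)]
toy_breakends = [("chr1", 1002), ("chr2", 300)]
toy_fnr = inferred_fnr(toy_transitions, toy_breakends, tolerance=5)
```

With the toy inputs, two of the four transitions are unexplained, so `toy_fnr` is 0.5. Tightening `tolerance` to 0 leaves only the exact match on chr2 supported.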
Fig. 4 Classification of single breakends and structural variant phasing. a RepeatMasker annotations indicate the majority of somatic single breakends are due to mobile element translocations or centromeric breaks. b Most likely centromere for single breakends containing centromeric or peri-centromeric repeats, based on realignment of breakend sequence to hg38. Shading indicates whether the prediction is consistent with the copy number change across the centromere. Chromosome 1 has an excess of inter-chromosomal breaks to centromeric sequence. Chromosomes 13, 14, 15, 21, and 22 have insufficient non-gap p-arm sequence for a centromeric copy number change to be called. c Phasing of structural variants can be determined when breakend assembly contigs span multiple breakpoints. d The majority of variants within 600 bp can be phased using breakend assembly. e Somatic SVs are highly clustered, with 22% of all SVs in the Hartwig cohort potentially involving a DNA fragment of 1 kbp or less.
Fig. 5 a Impact of false negative rate (FNR) on complex event resolution. The y-axis indicates the proportion of structural variants that form part of a resolved chain of SVs at least as long as the chain length indicated on the x-axis. LINX results for GRIDSS2 calls on the Hartwig and PCAWG cohorts are shown along with simulated results from downsampling the Hartwig cohort to the specified FNRs. A low FNR is essential to accurate complex event resolution. b Circos plot of SMAD4 driver deletion event. The interpretation of this deletion is confounded by the presence of 3 short fragments at the breakpoint. This event can be fully resolved by GRIDSS2 SV phasing. Circos tracks from innermost to outermost are: single breakends (open white circles) and breakpoints, LOH, copy number, connected segments, genes, chromosome. c Circos plot of chromothripsis overlapping centromeric sequence. This event spans the chromosome 7 centromere. A subset of the chromothriptic fragments have been inserted into chromosome 4. Each SV chain is represented in a different colour.
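The quantity plotted on the y-axis of panel a can be sketched as follows, assuming the input is simply the number of SVs in each resolved chain, with unchained SVs counted as chains of length 1. The function name and the toy chain lengths are hypothetical; the chains themselves would come from a tool such as LINX, which is not modelled here.

```python
def proportion_in_chains_at_least(chain_lengths, max_length):
    """For each x in 1..max_length, the share of SVs in chains of length >= x.

    chain_lengths: number of SVs in each resolved chain (unchained SVs as 1).
    """
    total = sum(chain_lengths)
    if total == 0:
        raise ValueError("no structural variants supplied")
    return {
        x: sum(n for n in chain_lengths if n >= x) / total
        for x in range(1, max_length + 1)
    }


# Toy data (invented): two unchained SVs, one chain of 2 and one chain of 4.
toy_curve = proportion_in_chains_at_least([1, 1, 2, 4], max_length=5)
```

For the toy input, every SV is in a chain of length at least 1, six of the eight SVs are in chains of length at least 2, and none are in a chain of length 5 or more. Missed calls break chains apart, which shifts this curve downwards at longer chain lengths.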