Adam C English, William J Salerno, Oliver A Hampton, Claudia Gonzaga-Jauregui, Shruthi Ambreth, Deborah I Ritter, Christine R Beck, Caleb F Davis, Mahmoud Dahdouli, Singer Ma, Andrew Carroll, Narayanan Veeraraghavan, Jeremy Bruestle, Becky Drees, Alex Hastie, Ernest T Lam, Simon White, Pamela Mishra, Min Wang, Yi Han, Feng Zhang, Pawel Stankiewicz, David A Wheeler, Jeffrey G Reid, Donna M Muzny, Jeffrey Rogers, Aniko Sabo, Kim C Worley, James R Lupski, Eric Boerwinkle, Richard A Gibbs.
Abstract
BACKGROUND: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods.
Year: 2015 PMID: 25886820 PMCID: PMC4490614 DOI: 10.1186/s12864-015-1479-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
HS1011 data sources
| Data source | Data type | Description | Reference |
|---|---|---|---|
| WGS Illumina HiSeq | NGS | 48X 100x100 bp paired-end | [ |
| WGS Illumina Nextera | NGS | 2X 100x100 bp 6.5 kbp mate-pair inserts | Methods |
| WGS SOLiD | NGS | 3X 35 bp fragment 10X 25x25 bp paired-end 17X 50x50 bp paired-end | [ |
| WGS PacBio | Long-Read | 10X ~10,000 bp | Methods |
| Agilent 1 M | aCGH | 1-million-probe oligo array | [ |
| NimbleGen 2.1 M | aCGH | 2.1-million-probe oligo array | [ |
| NimbleGen 4.2 M | aCGH | 4.2-million-probe oligo array | Methods |
| Custom Agilent Exon Array | aCGH | 44,000 neuropathy-specific oligo array | [ |
| BioNano Irys | Genome Mapping | Single-molecule genome architecture | Methods |
| Sanger-Validated Deletions | Manual | 42 fully resolved deletions | Methods |
Previously published HS1011 data are indicated with literature references, and data new to the present work are described in Methods.
Figure 1. Parliament workflows. The Parliament infrastructure is designed to incorporate multiple data types and multiple software tools for each data type. (a) Novel Method evaluation incorporates new data or methods into the HS1011 workflow. (b) The HS1011 workflow. (c) The Illumina Only workflow, which requires only a paired-end WGS BAM file as input.
Parliament HS1011 summary
| Method | Approach | Data source | Reference | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| BreakDancer | Paired End | Illumina HiSeq | [ | 6,474 | 5,520 | 1,740 | 335 | 194 | 104 | 82 |
| CNVnator | Read Depth | Illumina HiSeq | [ | 6,232 | 6,197 | 679 | 402 | 130 | 176 | 109 |
| Crest | Split Read | Illumina HiSeq | [ | 2,490 | 2,219 | 1,636 | 138 | 115 | 8 | 3 |
| Delly | Paired End & Split Read | Illumina HiSeq | [ | 4,465 | 3,720 | 1,150 | 323 | 196 | 109 | 97 |
| Pindel | Paired End | Illumina HiSeq | [ | 5,728 | 4,451 | 2,432 | 244 | 359 | 421 | 206 |
| SV-STAT | Reference-guided Assembly | Illumina HiSeq | Methods | 893 | 892 | 754 | 90 | 32 | 9 | 1 |
| Tiresias | Consensus Sequences | Illumina HiSeq | Methods | 1,354 | 1,347 | 269 | 36 | 112 | 76 | 110 |
| Spiral | Local Assembly | Illumina HiSeq | Methods | 1,886 | 1,881 | 1,626 | 100 | 98 | 76 | 14 |
| PBHoney | Local Error and Tail Mapping | PacBio RS | [ | 10,759 | 10,340 | 5,883 | 483 | 0 | 3,792 | 0 |
| SVachra | Discordant Read Pairs | Illumina Nextera | Methods | 6,208 | 4,785 | 490 | 454 | 211 | 96 | 211 |
| aCGH + SOLiD | Probe Intensity/Read Depth | aCGH | [ | 1,971 | 1,960 | 231 | 452 | 8 | 30 | 8 |
| BioNano Irys | Single-molecule Motif Mapping | Irys | Methods | 0 | 343 | 201 | 142 | 0 | 41 | 0 |
Descriptions and results for each SV-detection method are provided. BioNano Irys data were used only for corroboration, not initial discovery, owing to the large size of Irys calls and their propensity to span multiple events.
Figure 2. Size distribution. All HS1011 SV events larger than 100 bp and smaller than 100,000 bp were compared to events from the Venter genome (HuRef) and an Asian male (YH), both specifically characterized for SV content. In this size regime, the HS1011, HuRef, and YH samples contain 5,044, 5,127, and 5,374 deletions (panel a) and 4,482, 4,479, and 15,525 insertions (panel b), respectively. The YH SV distributions are based on de novo assembly of 35 bp single-end and paired-end data. This assembly was used to identify SVs between 1 bp and 50 kbp. Initial events larger than 50 bp were filtered using discordant paired-end mapping of ~35 bp reads. Given the relative abundance of HS1011 sequence data (including both long reads and short reads longer than the YH short reads), and given the differences in methods, it is unlikely that the ~3-fold difference in insertions between the YH set and the HS1011 and HuRef sets represents a significant lack of Parliament sensitivity.
Figure 3. DGV comparison. Each of the 31,007 reference-inconsistent loci was characterized as either an HS1011 SV or unsupported locus based on its Parliament bitflag and as either “In DGV” or “Not DGV” based on whether it shared at least 50% reciprocal overlap with a DGV event of the same type.
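The 50% reciprocal-overlap criterion in the Figure 3 caption can be illustrated with a minimal Python sketch. This is not Parliament's own code: the `Interval` type, the half-open coordinate convention, and the function names `reciprocal_overlap` and `in_dgv` are assumptions made for illustration, and the example coordinates are invented.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """Hypothetical SV record; coordinates assumed half-open [start, end)."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    # The overlap must cover this fraction of BOTH events to count.
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def in_dgv(locus: Interval, dgv_events: Iterable[Interval],
           threshold: float = 0.5) -> bool:
    """True if any same-type DGV event reaches the reciprocal threshold."""
    return any(
        event.svtype == locus.svtype
        and reciprocal_overlap(locus, event) >= threshold
        for event in dgv_events
    )


if __name__ == "__main__":
    locus = Interval("1", 100, 200, "DEL")               # invented example
    print(in_dgv(locus, [Interval("1", 150, 250, "DEL")]))  # half of each overlaps
    print(in_dgv(locus, [Interval("1", 100, 400, "DEL")]))  # covers only 1/3 of the DGV event
```

Taking the minimum of the two fractions is what makes the test reciprocal: a small call nested inside a much larger DGV event does not count as a match, even though the small call is fully covered.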
Figure 4. SNP concordance. HomozScores are reported for three classes of HS1011 deletion loci: unsupported loci, HS1011 SVs with less than 25X coverage, and HS1011 SVs with greater than 50X coverage.
Exonic SVs with assembly support absent from DGV
| Gene | Chromosome | Start | End | SV type | Supporting data types | Parliament bitflag |
|---|---|---|---|---|---|---|
| ASTN1 | 1 | 177,131,105 | 177,139,508 | INS | ARR,ILL,PAC | 7 |
| C16orf96 | 16 | 4,619,222 | 4,629,905 | INS | ARR,ILL,PAC | 7 |
| DOCK3 | 3 | 50,879,021 | 50,879,201 | INS | ILL,PAC | 7 |
| PCNXL4 | 14 | 60,575,443 | 60,575,546 | INS | ILL,PAC | 7 |
| PKD1L3 | 16 | 72,030,321 | 72,032,511 | INS | ILL,PAC | 7 |
| TBC1D3G | 17 | 34,805,579 | 34,815,084 | DEL | ILL,PAC | 7 |
| MAGEA11 | X | 148,735,694 | 148,830,894 | MIS | ILL,NEX | 6 |
| METTL21C | 13 | 103,345,467 | 103,348,148 | INS | ILL,NEX | 6 |
| RAP1GDS1 | 4 | 99,179,171 | 99,184,420 | DEL | ARR | 5 |
| ZNF826P | 19 | 20,504,301 | 20,595,300 | DEL | ILL | 5 |
| ASTN1 | 1 | 177,131,101 | 177,139,495 | MIS | ILL | 5 |
| C20orf96 | 20 | 258,785 | 260,245 | DEL | ARR | 5 |
| CSMD3 | 8 | 113,234,275 | 113,239,237 | DEL | NEX | 5 |
| GMCL1 | 2 | 70,066,401 | 70,071,781 | MIS | NEX | 5 |
| HLA-DRB1 | 6 | 32,547,848 | 32,548,158 | DEL | ILL | 5 |
| MLF1IP | 4 | 185,651,743 | 185,652,393 | INS | ILL | 5 |
| MTO1 | 6 | 74,203,747 | 74,216,637 | MIS | ILL | 5 |
| MUC2 | 11 | 1,092,829 | 1,093,579 | INS | ILL | 5 |
| OR4C6 | 11 | 55,431,550 | 55,457,289 | INS | ILL | 5 |
| PDE4DIP | 1 | 144,954,098 | 144,960,871 | MIS | ILL | 5 |
| RAB11FIP3 | 16 | 544,841 | 547,158 | INS | NEX | 5 |
| AP2A2 | 11 | 915,248 | 928,463 | INS | ARR | 4 |
| TPSB2 | 16 | 1,274,089 | 1,288,819 | INS | ARR | 4 |
| TPSG1 | 16 | 1,274,089 | 1,288,819 | INS | ARR | 4 |
| ALDH16A1 | 19 | 49,966,580 | 49,968,737 | INS | NEX | 4 |
| ASMTL | X | 1,550,501 | 1,572,400 | INS | ILL | 4 |
| C14orf39 | 14 | 60,913,981 | 60,941,067 | MIS | ILL | 4 |
| CCL24 | 7 | 75,438,336 | 75,451,936 | INS | ILL | 4 |
| CD99 | X | 2,651,401 | 2,699,500 | INS | ILL | 4 |
| CNTNAP3B | 9 | 43,844,101 | 43,866,100 | INS | ILL | 4 |
| CTNNA2 | 2 | 80,769,022 | 80,780,356 | MIS | ILL | 4 |
| DEFA1 | 8 | 6,833,701 | 6,844,000 | DEL | ILL | 4 |
| DSPP | 4 | 88,536,463 | 88,536,667 | INS | ILL | 4 |
| ENPP7 | 17 | 77,699,565 | 77,726,581 | MIS | ILL | 4 |
| EXOC6B | 2 | 72,688,192 | 72,697,903 | MIS | ILL | 4 |
| FAM186A | 12 | 50,745,742 | 50,745,861 | INS | PAC | 4 |
| FOXO6 | 1 | 41,847,826 | 41,847,932 | INS | PAC | 4 |
| FRK | 6 | 116,274,618 | 116,307,096 | MIS | ILL | 4 |
| HERC2 | 15 | 28,547,301 | 28,566,700 | INS | ILL | 4 |
| HSD17B3 | 9 | 99,057,917 | 99,063,709 | MIS | ILL | 4 |
| IFNAR1 | 21 | 34,694,683 | 34,701,442 | MIS | NEX | 4 |
| IGHV4-61 | 14 | 107,087,259 | 107,099,190 | INS | ILL | 4 |
| IL28A | 19 | 39,730,613 | 39,762,849 | MIS | NEX | 4 |
| IL28B | 19 | 39,730,613 | 39,762,849 | MIS | NEX | 4 |
| IL3RA | X | 1,494,601 | 1,510,800 | INS | ILL | 4 |
| KIAA1671 | 22 | 25,441,077 | 25,467,572 | INS | ILL | 4 |
| KRT37 | 17 | 39,579,211 | 39,595,476 | MIS | ILL | 4 |
| KRT38 | 17 | 39,579,211 | 39,595,476 | MIS | ILL | 4 |
| KRTAP4-7 | 17 | 39,240,740 | 39,240,840 | INS | PAC | 4 |
| KRTAP5-4 | 11 | 1,642,915 | 1,643,128 | INS | PAC | 4 |
| MATR3 | 5 | 138,652,720 | 138,666,146 | INS | ARR | 4 |
| NBPF15 | 1 | 148,571,852 | 148,591,725 | INS | ARR | 4 |
| OR2A7 | 7 | 143,945,501 | 143,956,800 | INS | ILL | 4 |
| OR2G6 | 1 | 248,682,789 | 248,702,341 | MIS | NEX | 4 |
| OR4C3 | 11 | 48,340,701 | 48,347,600 | INS | ILL | 4 |
| OR4C6 | 11 | 55,431,551 | 55,445,867 | INS | PAC | 4 |
| OR8U1 | 11 | 56,143,129 | 56,143,999 | INS | PAC | 4 |
| PLXNB2 | 22 | 50,723,862 | 50,724,455 | DEL | ILL | 4 |
| PPP2R3B | X | 290,201 | 300,100 | INS | ILL | 4 |
| PPP2R3B | X | 327,401 | 344,700 | INS | ILL | 4 |
| PRIM2 | 6 | 57,494,250 | 57,507,908 | INS | ILL | 4 |
| RFC1 | 4 | 39,350,151 | 39,353,407 | INS | NEX | 4 |
| RGPD3 | 2 | 107,082,401 | 107,085,300 | DEL | ILL | 4 |
| RRBP1 | 20 | 17,639,769 | 17,639,981 | INS | PAC | 4 |
| SAMD1 | 19 | 14,200,852 | 14,200,953 | INS | PAC | 4 |
| SHOX | X | 598,001 | 628,300 | INS | ILL | 4 |
| SLC25A6 | X | 1,494,601 | 1,510,800 | INS | ILL | 4 |
| SMC1B | 22 | 45,745,435 | 45,746,440 | INS | ILL | 4 |
| TMEFF2 | 2 | 192,818,912 | 192,840,053 | MIS | ILL | 4 |
| UTS2D | 3 | 190,999,450 | 191,019,577 | INS | ILL | 4 |
| XG | X | 2,651,401 | 2,699,500 | INS | ILL | 4 |
| ZNF208 | 19 | 22,156,807 | 22,156,912 | DEL | PAC | 4 |
| ZNF253 | 19 | 19,990,342 | 20,005,317 | MIS | ILL | 4 |
| ZNF346 | 5 | 176,462,777 | 176,474,191 | INS | ILL | 4 |
| ZNF519 | 18 | 14,105,158 | 14,105,265 | DEL | PAC | 4 |
Genomic location, SV type, Parliament bitflag, and supporting data types are provided for the 75 HS1011 SVs overlapping an exon but not matching a DGV event.
Figure 5. Illumina Only and PacBio comparison. The Illumina Only results are compared to the HS1011 SV subset containing Illumina and PacBio discovery. PB-ILL contains all HS1011 SVs with PacBio or Illumina discovery and hybrid assembly support. The ILLHyb workflow uses only paired-end (PE) methods for discovery but both Illumina and PacBio sequence reads for local assembly. The ILLOnly workflow uses only Illumina PE methods and reads for both discovery and assembly.
Illumina-only method comparison
| Method | Total calls | Supported | Unsupported | FDR | Sensitivity |
|---|---|---|---|---|---|
| CNVnator | 6,197 | 1,211 | 4,986 | 80.46% | 22.62% |
| BreakDancer | 5,520 | 2,269 | 3,251 | 58.89% | 42.39% |
| Delly | 3,720 | 1,669 | 2,051 | 55.13% | 31.18% |
| Crest | 2,219 | 1,889 | 330 | 14.87% | 35.29% |
| Pindel | 4,451 | 3,035 | 1,416 | 31.81% | 56.70% |
| SV-STAT | 892 | 876 | 16 | 1.79% | 16.36% |
| Tiresias | 1,347 | 417 | 930 | 69.04% | 7.79% |
| Spiral | 1,881 | 1,824 | 57 | 3.03% | 34.07% |
Performance for each Illumina-only method is summarized. Supported and Unsupported columns indicate the number of calls with and without local hybrid assembly support, respectively. False discovery rate (FDR) and sensitivity are calculated using all 17,704 Illumina Only reference-inconsistent loci and the subset of 5,584 that are supported by hybrid assembly.
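As a worked check of the FDR column, the short Python sketch below treats FDR as the fraction of a method's calls that lack hybrid assembly support. That reading is an assumption, although it does reproduce the tabulated percentages for the rows shown; the function name `false_discovery_rate` is illustrative, and sensitivity is deliberately left out.

```python
def false_discovery_rate(supported: int, unsupported: int) -> float:
    """Fraction of calls without local hybrid assembly support (assumed definition)."""
    total = supported + unsupported
    if total == 0:
        raise ValueError("method made no calls")
    return unsupported / total


# (supported, unsupported) counts copied from the table above.
calls = {
    "CNVnator": (1211, 4986),
    "BreakDancer": (2269, 3251),
    "Crest": (1889, 330),
}

if __name__ == "__main__":
    for method, (supported, unsupported) in calls.items():
        fdr = false_discovery_rate(supported, unsupported)
        print(f"{method}: FDR = {fdr:.2%}")
    # CNVnator: FDR = 80.46%
    # BreakDancer: FDR = 58.89%
    # Crest: FDR = 14.87%
```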
Figure 6. Multi-source comparison. Each cell contains the number of clusters with support from a pair of sources. The diagonal entries describe clusters with support from exactly one data source.
Figure 7. Complex rearrangement. A large-scale deletion and inverted insertion rearrangement on chromosome 11p15.5 is depicted. Through de novo assembly, the rearrangement breakpoint junctions (Jct 1, 2, and 3) were identified, and the resulting structure in the genome of HS1011 was found to be as depicted. The junction sequences of the three breakpoints are shown below.