Mei-Wei Luan, Xiao-Ming Zhang, Zi-Bin Zhu, Ying Chen, Shang-Qian Xie.
Abstract
Structural variation (SV) is a major form of genetic variation that contributes to polymorphism, human disease, and phenotypic variation in many organisms. Long-read sequencing has been used successfully to identify novel and complex SVs. However, SV detection tools for long-read sequencing datasets have not previously been compared. We therefore developed an analysis workflow that combined two alignment tools (NGMLR and minimap2) and five callers (Sniffles, Picky, smartie-sv, PBHoney, and NanoSV) to evaluate SV detection in six Saccharomyces cerevisiae datasets. The accuracy of the detected SV regions was validated by re-aligning raw reads across different alignment tools, SV callers, experimental conditions, and sequencing platforms. SV detection did not differ significantly between NGMLR and minimap2 when the same caller was used. PBHoney had the highest average accuracy (89.04%) and Picky the lowest (35.85%). The accuracies of NanoSV, Sniffles, and smartie-sv were 68.67%, 60.47%, and 57.67%, respectively. In addition, smartie-sv detected the most SVs and NanoSV the fewest, and significantly more SVs were detected from the PacBio sequencing platform than from ONT (p = 0.000173).
Keywords: PacBio and ONT; SV caller; Saccharomyces cerevisiae; long-read sequencing; structural variation
Year: 2020 PMID: 32211024 PMCID: PMC7075250 DOI: 10.3389/fgene.2020.00159
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1 Overview of the evaluation pipeline in this study. Minimap2 (Li, 2018) and NGMLR (Sedlazeck et al., 2018) were used for alignment. Minimap2 aligned reads against the reference genome with the parameters '--MD -x map-pb/map-ont -R "@RG\tID:default\tSM:SAM" -a' (map-pb for PacBio reads, map-ont for ONT reads), and NGMLR was run with default parameters. SVs were identified by five callers: Sniffles (Sedlazeck et al., 2018), Picky (Gong et al., 2018), smartie-sv (Kronenberg et al., 2018), PBHoney (English et al., 2014), and NanoSV (Cretu Stancu et al., 2017). Sniffles detected all SV types with the parameters '--genotype --skip_parameter_estimation --min_support 10' and uses a novel SV scoring scheme that excludes false SVs based on the size, position, type, and coverage of candidate SVs (Sedlazeck et al., 2018). NanoSV was run with '-s samtools' and uses clustering of split reads to identify SV breakpoint junctions in long-read sequencing data (Cretu Stancu et al., 2017). PBHoney considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify SVs (English et al., 2014). PBHoney, smartie-sv, and Picky were run with default parameters.
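The alignment and calling parameters in the caption can be assembled as argument lists for Python's `subprocess.run`. This is a minimal sketch, assuming minimap2, NGMLR, and Sniffles 1.x are installed and on `PATH`. Only the quoted parameters come from the study; the input/output flags (`-r`/`-q`/`-o` for NGMLR, `-m`/`-v` for Sniffles), the helper names, and any file names are assumptions for illustration, and the BAM sorting/indexing step that Sniffles needs is omitted.

```python
import subprocess
from typing import List, Optional

# Read-group string from the caption. A raw string keeps the literal "\t",
# which minimap2 itself expands into tab characters.
READ_GROUP = r"@RG\tID:default\tSM:SAM"

# Preset per platform, as in the caption's "-x map-pb/map-ont".
PRESETS = {"pacbio": "map-pb", "ont": "map-ont"}


def minimap2_cmd(ref: str, reads: str, platform: str) -> List[str]:
    """minimap2 with the caption's parameters; SAM is written to stdout."""
    return ["minimap2", "--MD", "-x", PRESETS[platform], "-R", READ_GROUP, "-a", ref, reads]


def ngmlr_cmd(ref: str, reads: str, out_sam: str) -> List[str]:
    """NGMLR with default parameters (only reference, reads, and output given)."""
    return ["ngmlr", "-r", ref, "-q", reads, "-o", out_sam]


def sniffles_cmd(sorted_bam: str, out_vcf: str) -> List[str]:
    """Sniffles 1.x with the caption's parameters (input/output flags assumed)."""
    return ["sniffles", "-m", sorted_bam, "-v", out_vcf,
            "--genotype", "--skip_parameter_estimation", "--min_support", "10"]


def run(cmd: List[str], stdout_path: Optional[str] = None) -> None:
    """Run a command, optionally redirecting its stdout to a file."""
    if stdout_path is None:
        subprocess.run(cmd, check=True)
    else:
        with open(stdout_path, "w") as handle:
            subprocess.run(cmd, check=True, stdout=handle)
```

For example, `run(minimap2_cmd("ref.fa", "reads.fastq", "ont"), stdout_path="aln.sam")` would align hypothetical ONT reads; building argument lists instead of shell strings avoids quoting problems with the tab-containing read-group string.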
Number of validated/detected SVs, with accuracy, specificity, and AUC, for each combination of two alignment tools and five callers.
| Sample_ID | Alignment tool | PBHoney | Acc (%) | Spec (%) | AUC (%) | NanoSV | Acc (%) | Spec (%) | AUC (%) | Picky | Acc (%) | Spec (%) | AUC (%) | smartie-sv | Acc (%) | Spec (%) | AUC (%) | Sniffles | Acc (%) | Spec (%) | AUC (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SAMD00082707 | minimap2 | 962/1440 | 66.81 | 97.7 | 64.5 | 291/492 | 59.15 | 99.69 | 52.13 | 9711/18,717 | 51.88 | 91.23 | 53.15 | 34,832/72,870 | 47.80 | 13.34 | 71.18 | 600/1010 | 59.41 | 98.04 | 51.4 |
|  | NGMLR | 254/357 | 71.15 | 99.31 | 63.12 | 321/493 | 65.11 | 99.53 | 54.72 | 8306/14,591 | 56.93 | 94.37 | 53.16 | 25,943/47,460 | 54.66 | 8.4 | 74.95 | 376/546 | 68.86 | 98.38 | 53.95 |
| SAMN08364553 | minimap2 | 973/997 | 97.59 | 96.92 | 69.42 | 44/60 | 73.33 | 99.93 | 50.65 | 4027/8104 | 49.69 | 99.03 | 52.34 | 54,681/104,164 | 52.50 | 4.48 | 73.14 | 136/255 | 53.33 | 99.64 | 50.73 |
|  | NGMLR | 2540/2680 | 94.78 | 93.8 | 67.88 | 30/43 | 69.77 | 99.95 | 50.07 | 3369/9561 | 35.24 | 97.68 | 52.35 | 53,516/81,092 | 65.99 | 8.86 | 70.72 | 184/248 | 74.19 | 99.71 | 50.43 |
| SAMN08364554 | minimap2 | 46/65 | 70.77 | 99.88 | 51.72 | 100/111 | 90.09 | 99.95 | 52.8 | 1898/4433 | 42.82 | 98.38 | 67.29 | 84,646/142,206 | 59.52 | 2.25 | 76.77 | 271/421 | 64.37 | 99.54 | 55.55 |
|  | NGMLR | 974/1025 | 95.02 | 97.49 | 56.41 | 81/90 | 90.00 | 99.96 | 50.25 | 2978/7485 | 39.79 | 97.54 | 60 | 77,101/114,629 | 67.26 | 5.35 | 28.33 | 348/402 | 86.57 | 99.65 | 55.01 |
| SAMN09475318_ont* | minimap2 | 399/467 | 85.44 | 96.25 | 62.79 | 185/299 | 61.87 | 99.73 | 53.93 | 1015/4625 | 21.95 | 98.33 | 51.55 | 10,074/19,509 | 51.64 | 8.58 | 72.74 | 283/487 | 58.11 | 97.11 | 54.83 |
|  | NGMLR | 283/324 | 87.35 | 97.33 | 63.12 | 210/253 | 83.00 | 99.4 | 53.71 | 1540/2303 | 66.87 | 97.18 | 53.87 | 8297/14,717 | 56.38 | 9.07 | 75.63 | 256/410 | 62.44 | 97.02 | 54.94 |
| SAMN09475318_rs2* | minimap2 | 395/460 | 85.87 | 98.39 | 61.01 | 267/435 | 61.38 | 99.7 | 54.38 | 4594/21,200 | 21.67 | 93.73 | 53.06 | 20,078/40,921 | 49.07 | 10.21 | 72.99 | 409/656 | 62.35 | 97.97 | 54.54 |
|  | NGMLR | 267/308 | 86.69 | 98.94 | 63.53 | 281/396 | 70.96 | 99.48 | 54.17 | 4333/21,476 | 20.18 | 96.28 | 53.14 | 16,617/30,441 | 54.59 | 6.89 | 77.31 | 321/512 | 62.70 | 98.42 | 56.47 |
| SAMN09475318_seq* | minimap2 | 463/497 | 93.16 | 98.85 | 59.16 | 243/366 | 66.39 | 99.94 | 54.16 | 1890/9254 | 20.42 | 98.83 | 57.93 | 79,413/148,540 | 53.46 | 3.23 | 77.1 | 534/1201 | 44.46 | 99.15 | 55.99 |
|  | NGMLR | 2177/2311 | 94.20 | 94.87 | 62.49 | 303/393 | 77.10 | 99.91 | 51.15 | 2738/7663 | 35.73 | 97.7 | 56.17 | 69,406/110,440 | 62.84 | 8.23 | 73.04 | 549/908 | 60.46 | 99.29 | 53.32 |
*ont, rs2, and seq denote the datasets from ONT GridION, PacBio RSII, and PacBio Sequel, respectively. Acc: accuracy of validated SVs. Spec: specificity of SV detection for each combination of aligner and caller. AUC: area under the curve.
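Each caller cell in the table has the form validated/detected, and the Acc column matches their ratio (for example, 962/1440 ≈ 66.81% for PBHoney with minimap2 on SAMD00082707). A small sketch that recomputes Acc from a cell, assuming that reading of the table; `accuracy_from_cell` is a hypothetical helper, not part of the study's workflow.

```python
def accuracy_from_cell(cell: str) -> float:
    """Return validated/detected as a percentage rounded to two decimals.

    Cells look like "962/1440" or "34,832/72,870"; thousands separators are stripped.
    """
    validated, detected = (int(part.replace(",", "")) for part in cell.split("/"))
    if detected == 0:
        raise ValueError("no SVs detected")
    return round(100 * validated / detected, 2)


# A few cells paired with the Acc (%) values reported next to them in the table.
examples = {
    "962/1440": 66.81,         # PBHoney, minimap2, SAMD00082707
    "291/492": 59.15,          # NanoSV, minimap2, SAMD00082707
    "79,413/148,540": 53.46,   # smartie-sv, minimap2, SAMN09475318_seq
}
for cell, reported in examples.items():
    assert accuracy_from_cell(cell) == reported, (cell, reported)
```

Recomputing the ratio this way is also a quick check on transcription errors in the count cells.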
Number of SVs detected from minimap2 and NGMLR alignments by each of the five callers, and the number common to both alignments.
| Callers | Minimap2 | NGMLR | Common | Common/minimap2 (%) | Common/NGMLR (%) |
|---|---|---|---|---|---|
| Sniffles | 547 | 628 | 244 | 44.61 | 38.83 |
| PBHoney | 1,836 | 1,824 | 749 | 40.80 | 41.06 |
| smartie-sv | 11,376 | 15,455 | 7,030 | 61.80 | 45.49 |
| NanoSV | 219 | 216 | 139 | 63.47 | 64.35 |
| Picky | 547 | 1,939 | 244 | 44.61 | 12.58 |
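The two percentage columns are the common count divided by each aligner's total (for PBHoney, 749/1836 = 40.80% and 749/1824 = 41.06%). The table does not say how a call was judged common to both aligners, so the sketch below assumes, purely for illustration, that two calls match when they share chromosome and SV type and their intervals have at least 50% reciprocal overlap. The `SV` record, the threshold, and the function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by the longer interval (>= f means both sides reach f)."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0  # disjoint intervals, or point events such as insertions
    return overlap / max(a.end - a.start, b.end - b.start)


def count_common(calls_a: List[SV], calls_b: List[SV], min_ro: float = 0.5) -> int:
    """Count calls in calls_a that match at least one call in calls_b."""
    return sum(
        any(a.chrom == b.chrom and a.svtype == b.svtype
            and reciprocal_overlap(a, b) >= min_ro for b in calls_b)
        for a in calls_a
    )


def common_percentages(n_minimap2: int, n_ngmlr: int, n_common: int) -> Tuple[float, float]:
    """Return (Common/minimap2 %, Common/NGMLR %) rounded to two decimals."""
    return (round(100 * n_common / n_minimap2, 2), round(100 * n_common / n_ngmlr, 2))


print(common_percentages(1836, 1824, 749))  # PBHoney row -> (40.8, 41.06)
```

Note that `count_common` counts from the first call set's side; with one-to-many matches the count can differ depending on which aligner's calls are passed first.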
Figure 2 Comparison of SVs across the PacBio and ONT sequencing platforms. (A) Distribution of SVs across chromosomes. (B) Proportions of SV types from the three instruments. (C) SVs common to the three instruments.
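Panels B and C amount to counting SV types per instrument and intersecting call sets. A minimal sketch, assuming each call is a hypothetical (chrom, start, end, type) tuple and, as a simplification not stated in the caption, that a call is shared only when all four fields match exactly; real cross-platform calls would likely need a more tolerant breakpoint-matching rule.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical call record: (chrom, start, end, svtype).
Call = Tuple[str, int, int, str]


def type_proportions(calls: Iterable[Call]) -> Dict[str, float]:
    """Fraction of calls of each SV type on one instrument (cf. panel B)."""
    counts = Counter(svtype for _, _, _, svtype in calls)
    total = sum(counts.values())
    return {svtype: n / total for svtype, n in counts.items()}


def shared_by_all(per_instrument: Dict[str, List[Call]]) -> Set[Call]:
    """Calls present on every instrument, matched on exact fields (cf. panel C)."""
    sets = [set(calls) for calls in per_instrument.values()]
    if not sets:
        return set()
    return set.intersection(*sets)
```

With the instrument keys `ont`, `rs2`, and `seq` used in the table footnote, `shared_by_all` returns the calls found on all three platforms under this exact-match assumption.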
Figure 3 Circos plots of SVs detected by NGMLR and Sniffles in BY4742 (A) and SY14 (B). From outer to inner, the first ring shows the chromosomes, and rings 2–6 show INVDUPs (purple), DUPs (green), INVs (yellow), INSs (blue), and DELs (orange). The inner lines represent BNDs, and the outer text labels genes.
SVs common to both BY4742 and SY14.
| Chrom | Start | End | Type | Gene name |
|---|---|---|---|---|
| chr3 | 12542 | 200120 | DUP | HMLALPHA2, HMLALPHA1, VAC17, MRC1, KRR1, FYV5, ADF1, MIC10, PRD1, PEX34, KAR4, RDT1, PBN1, LRE1, APA1, YCL049C, YCL048W-A, SPS22, POF1, EMC1, MGR1, YCL042W, GLK1, GID7, ATG22, SRO9, GFD2, GRX1, LSB5, MXR2, STE50, HIS4, BIK1, RNQ1, FUS1, HBN1, FRM2, AGP1, tE(UUC)C, YCL021W-A, YCL019W, YCL020W, SUP53, LEU2, NFS1, DCC1, YCL012C, GBP2, SGF29, ILV6, STP22, VMA9, snR43, LDB16, PGS1, YCL002C, RER1, YCL001W-A, YCL001W-B, YCR001W, CDC10, MRPL32, YCP4, CIT2, YCR006C, SUF2, YCR007C, tN(GUU)C, SAT4, ADY2, ADP1, PGK1, POL4, CTO1, snR33, SUF16, YCR016W, SRD1, tM(CAU)C, tK(CUU)C, MAK32, PET18, MAK31, HTL1, HSP30, YCR022C, YCR023C, SLM5, YCR024C-B, PMP1, NPP1, RHB1, tQ(UUG)C, FEN2, RIM1, SYP1, snR65, RPS14A, snR189, SNT1, ELO2, RRP43, RBK1, BUD5, MATALPHA2, CHA1, SPB1, PDI1, RRP7, KCC4, BUD3, CWH43, YCR025C, BPH1, PHO87, RVS161 |
| chr3 | 12717 | 200620 | DEL | HMLALPHA2, HMLALPHA1, VAC17, MRC1, KRR1, FYV5, ADF1, MIC10, PRD1, PEX34, KAR4, RDT1, PBN1, LRE1, APA1, YCL049C, YCL048W-A, SPS22, POF1, EMC1, MGR1, YCL042W, GLK1, GID7, ATG22, SRO9, GFD2, GRX1, LSB5, MXR2, STE50, HIS4, BIK1, RNQ1, FUS1, HBN1, FRM2, AGP1, tE(UUC)C, YCL021W-A, YCL019W, YCL020W, SUP53, LEU2, NFS1, DCC1, YCL012C, GBP2, SGF29, ILV6, STP22, VMA9, snR43, LDB16, PGS1, YCL002C, RER1, YCL001W-A, YCL001W-B, YCR001W, CDC10, MRPL32, YCP4, CIT2, YCR006C, SUF2, YCR007C, tN(GUU)C, SAT4, ADY2, ADP1, PGK1, POL4, CTO1, snR33, SUF16, YCR016W, SRD1, tM(CAU)C, tK(CUU)C, MAK32, PET18, MAK31, HTL1, HSP30, YCR022C, YCR023C, SLM5, YCR024C-B, PMP1, NPP1, RHB1, tQ(UUG)C, FEN2, RIM1, SYP1, snR65, RPS14A, snR189, SNT1, ELO2, RRP43, RBK1, BUD5, MATALPHA2, MATALPHA1, CHA1, SPB1, PDI1, RRP7, KCC4, BUD3, CWH43, YCR025C, BPH1, PHO87, RVS161 |
| chr4 | 1335446 | 1335495 | INS | PPZ2 |
| chr4 | 528917 | 537428 | DEL | ENA5, ENA2, ENA1 |
| chr7 | 530002 | 530092 | INS | MTL1 |
| chr8 | 212266 | 216250 | DUP | CUP1-1, YHR054C, RUF5-2, CUP1-2, RSC30, RUF5-1 |
| chr9 | 25585 | 25655 | INS | CSS1 |
| chr11 | 64188 | 64283 | INS | MNN4 |
| chr12 | 451417 | 467360 | DUP | RDN37-1, ETS2-1, RDN25-1, TAR1, ITS2-1, RDN58-1, ITS1-1, RDN18-1, ETS1-1, RDN5-1, RDN37-2, ETS2-2, RDN25-2, YLR154C-G, ITS2-2, RDN58-2, ITS1-2, RDN18-2, ETS1-2 |
| chr13 | 908134 | 908705 | DUP | YMR317W |
| chr14 | 415052 | 415283 | INS | DBP2 |
| chr16 | 528208 | 528340 | INS | CIP1 |
| MT | 1 | 85779 | DUP | tP(UGG)Q, 15S_RRNA, tW(UCA)Q, AI1, AI5_BETA, ATP8, ATP6, tE(UUC)Q, COB, BI4, BI3, BI2, OLI1, tS(UGA)Q2, 21S_RRNA, SCEI, tT(UGU)Q1, tC(GCA)Q, tH(GUG)Q, tL(UAA)Q, tQ(UUG)Q, tK(UUU)Q, tR(UCU)Q1, tG(UCC)Q, tD(GUC)Q, tS(GCU)Q1, tR(ACG)Q2, tA(UGC)Q, tI(GAU)Q, tY(GUA)Q, tN(GUU)Q, tM(CAU)Q1, COX2, Q0255, tF(GAA)Q, tT(UAG)Q2, tV(UAC)Q, COX3, tM(CAU)Q2, RPM1, COX1, AI5_ALPHA, AI4, AI3, AI2, VAR1 |
| MT | 41768 | 42012 | INS | COB, BI4 |