Shaoqiang Wang1,2, Jie Li3, A K Alvi Haque2, Haiyong Zhao3, Liying Yang1, Xiguo Yuan1,2,4.
Abstract
Structural variation (SV) is an important type of genome variation and confers susceptibility to human cancers. Systematic analysis of SVs has become a crucial step in exploring the mechanisms of cancer and in precision diagnosis. The central problem is how to accurately detect SV breakpoints from next-generation sequencing (NGS) data. Because multiple types of SVs co-occur in the human genome and SVs are intrinsically complex, discriminating between SV breakpoint types is a challenging task. In this paper, we propose a convolutional neural network (CNN)-based approach, called svBreak, for the detection and discrimination of common types of SV breakpoints. svBreak works in three steps. First, it extracts a set of SV-related features for each genome site from the sequencing reads aligned to the reference genome. Second, it builds a data matrix in which each row represents one site and each column represents one feature. Third, it applies a CNN model to this data matrix to predict SV breakpoints. The performance of the proposed approach is tested via simulation studies and application to a real sequencing sample. The experimental results demonstrate the merits of the proposed approach when compared with existing methods. Thus, svBreak can be expected to serve as a supplementary approach for SV analysis in human tumor genomes.
Year: 2022 PMID: 35345526 PMCID: PMC8957449 DOI: 10.1155/2022/7196040
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Seven common categories of structural variations.
Figure 2. The flowchart of svBreak.
Description of the twelve extracted features.
| Features | Description |
|---|---|
| Reversely mapped read (RMR) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is 0 |
| Reversely mapped read showing SM (RMSM) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is -1 |
| Reversely mapped read showing MS (RMMS) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is -1 |
| Mapped read showing SM (MSM) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is 0 |
| Mapped read showing MS (MMS) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is 0 |
| The mapped distance between the paired-end reads smaller than insert size (MDS) | If such paired-end reads exist at a genome site, the value of this feature at the site is 1; otherwise, it is -1 |
| The mapped distance between the paired-end reads equal to insert size (MDE) | If such paired-end reads exist at a genome site, the value of this feature at the site is 1; otherwise, it is -1 |
| The mapped distance between the paired-end reads larger than insert size (MDL) | If such paired-end reads exist at a genome site, the value of this feature at the site is 1; otherwise, it is -1 |
| Previous breakpoint site with mapped reads showing SM (PSM) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is 0 |
| Previous breakpoint site with mapped reads showing MS (PMS) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is 0 |
| Next breakpoint site with mapped reads showing SM (NSM) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is 0 |
| Next breakpoint site with mapped reads showing MS (NMS) | If such a read exists at a genome site, the value of this feature at the site is 1; otherwise, it is 0 |
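The encoding in the table can be illustrated with a minimal Python sketch that turns per-site observations into the site-by-feature matrix described in the abstract. The feature order and the "absent" values (0 or -1) are taken from the table. The names `ABSENT_VALUE`, `encode_site`, and `build_matrix` are hypothetical, and the input is assumed to be an already-computed set of observed feature abbreviations per site. How svBreak derives those observations from aligned reads is not shown here.

```python
# Value assigned to each feature when it is NOT observed at a site,
# as listed in the table; an observed feature is always encoded as 1.
ABSENT_VALUE = {
    "RMR": 0, "RMSM": -1, "RMMS": -1, "MSM": 0, "MMS": 0,
    "MDS": -1, "MDE": -1, "MDL": -1,
    "PSM": 0, "PMS": 0, "NSM": 0, "NMS": 0,
}
FEATURES = list(ABSENT_VALUE)  # column order of the data matrix


def encode_site(observed):
    """Return the 12-value feature row for one genome site.

    `observed` is a set of feature abbreviations seen at the site
    (hypothetical input format, assumed for illustration).
    """
    unknown = set(observed) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [1 if name in observed else ABSENT_VALUE[name] for name in FEATURES]


def build_matrix(sites):
    """Stack one row per site: rows are sites, columns are features."""
    return [encode_site(observed) for observed in sites]


matrix = build_matrix([{"RMR", "MDL"}, set()])
# matrix[0] == [1, -1, -1, 0, 0, -1, -1, 1, 0, 0, 0, 0]
# matrix[1] == [0, -1, -1, 0, 0, -1, -1, -1, 0, 0, 0, 0]
```

A matrix of this shape (one row per site, twelve columns) is what would be passed to the CNN. Any windowing or reshaping applied before the network is not specified in this section.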
Figure 3. The topology of the convolutional neural network.
Figure 4. Performance comparison among the three methods in terms of sensitivity, precision, and F1-score on simulation datasets.
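For reference, the three metrics named in Figure 4 can be computed from counts of true positives, false positives, and false negatives using their standard definitions. The sketch below is illustrative only: the function name `breakpoint_metrics` is hypothetical, and the rule for deciding when a predicted breakpoint counts as a true positive (for example, a distance tolerance around the true site) is not given in this section and is left to the caller.

```python
def breakpoint_metrics(tp, fp, fn):
    """Return (sensitivity, precision, f1) from raw counts.

    tp: predicted breakpoints that match a true breakpoint
    fp: predicted breakpoints with no matching true breakpoint
    fn: true breakpoints that were not predicted
    """
    # Sensitivity (recall): fraction of true breakpoints that were found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predictions that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1-score: harmonic mean of precision and sensitivity.
    total = sensitivity + precision
    f1 = 2 * sensitivity * precision / total if total else 0.0
    return sensitivity, precision, f1


sensitivity, precision, f1 = breakpoint_metrics(tp=80, fp=20, fn=20)
```

Returning 0.0 when a denominator is zero is a design choice made here so that empty prediction sets do not raise an error.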
Figure 5. Sensitivity for the seven types of structural variation on simulation data at coverages from 10x to 40x.
Figure 6. The overlap among the results of the three methods.