Andrea Manconi, Alessandro Orro, Emanuele Manca, Giuliano Armano, Luciano Milanesi.
Abstract
BACKGROUND: Single Nucleotide Polymorphism (SNP) genotyping analysis is highly sensitive to errors in SNP chromosomal positions. SNP mapping data are supplied with SNP arrays without the information needed to assess their accuracy in advance. Moreover, these mapping data refer to a given build of a genome and need to be updated when a new build becomes available. As a consequence, researchers often remap SNPs to obtain more up-to-date chromosomal positions. In this work, we present G-SNPM, a GPU (Graphics Processing Unit) based tool to map SNPs on a genome.
Year: 2014 PMID: 24564714 PMCID: PMC4015528 DOI: 10.1186/1471-2105-15-S1-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Short-read mapping tools
| Name | Mapping Strategy | Indels Support | Quality evaluation | GPU-based |
|---|---|---|---|---|
| Barracuda | BWT-based indexing of the reference | Yes | Yes | Yes |
| BWA | BWT-based indexing of the reference | Yes | Yes | No |
| Bowtie | BWT-based indexing of the reference | No | Yes | No |
| CUHSHAW2 | BWT-based indexing of the reference | Yes | Yes | Yes |
| CloudBurst | Hash the reads | Yes | No | No |
| MAQ | Hash the reads | No | Yes | No |
| RMAP | Hash the reads | Yes | Yes | No |
| SHRiMP2 | Hash the reads | Yes | Yes | No |
| SOAP2 | BWT-based indexing of the reference | Yes | Yes | No |
| SOAP3 | BWT-based indexing of the reference | No | No | Yes |
| SOAP3-dp | BWT-based indexing of the reference | Yes | No | Yes |
A summary of some of the most popular short-read mapping tools.
Figure 1. Using two sequences to represent a SNP. Two sequences are separately aligned for a SNP. After the alignment, results are analyzed to calculate the absolute position of the SNP.
Figure 2. Using a sequence to represent a SNP. Only one sequence is aligned for a SNP. After the alignment, results are analyzed to remove false positives and to calculate the absolute position of the SNP.
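The position calculation mentioned in this caption can be sketched as follows. This is a minimal illustration, not G-SNPM's actual code: it assumes a gapless alignment, a 1-based leftmost alignment coordinate on the reference, and a known 0-based offset of the SNP inside the aligned sequence. The function name and its parameters are hypothetical.

```python
def snp_position(align_start: int, snp_offset: int, seq_len: int,
                 strand: str = "+") -> int:
    """Return the 1-based reference position of a SNP (hypothetical helper).

    align_start: 1-based leftmost reference coordinate of the alignment.
    snp_offset:  0-based index of the SNP within the aligned sequence.
    seq_len:     length of the aligned sequence.
    Assumes a gapless alignment; an indel before the SNP would shift the result.
    """
    if not 0 <= snp_offset < seq_len:
        raise ValueError("snp_offset must lie inside the sequence")
    if strand == "+":
        return align_start + snp_offset
    if strand == "-":
        # On the reverse strand the sequence is read right to left.
        return align_start + (seq_len - 1 - snp_offset)
    raise ValueError("strand must be '+' or '-'")
```

With indels allowed, the offset would have to be corrected by walking the alignment, which this sketch deliberately leaves out.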
Figure 3. G-SNPM mapping strategy. G-SNPM exploits a three-stage pipeline to update the chromosomal position of a SNP. In the first stage, SOAP3-dp is used to unambiguously map a SNP against a reference sequence. Unmapped or ambiguously mapped SNPs are remapped at the second stage by exploiting SHRiMP2. At the third stage, mapped SNP sequences are analyzed to identify the SNP chromosomal position.
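The control flow of the three stages can be sketched as below. This is a hedged illustration only: the two aligners are passed in as plain callables (hypothetical stand-ins for wrappers around SOAP3-dp and SHRiMP2, whose command lines are not reproduced here), a hit is assumed to be a `(chromosome, start)` pair, and the third stage is reduced to adding a known SNP offset to the alignment start.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, int]                 # (chromosome, 1-based alignment start)
Aligner = Callable[[str], List[Hit]]  # hypothetical wrapper around an aligner


def remap_snp(seq: str, snp_offset: int,
              first_stage: Aligner, second_stage: Aligner) -> List[Hit]:
    """Return candidate SNP positions; an empty list means unmapped."""
    hits = first_stage(seq)
    if len(hits) != 1:
        # Stage 2: retry sequences that stage 1 left unmapped or ambiguous.
        hits = second_stage(seq)
    # Stage 3: turn each alignment start into a SNP position.
    return [(chrom, start + snp_offset) for chrom, start in hits]


# Toy lookup tables standing in for the two aligners (illustrative data only).
fast_hits = {"ACGTA": [("chr1", 100)], "GGGCC": [("chr2", 5), ("chr3", 9)]}
slow_hits = {"GGGCC": [("chr2", 5)], "TTTAA": [("chr4", 40)]}


def fast(seq: str) -> List[Hit]:
    return fast_hits.get(seq, [])


def slow(seq: str) -> List[Hit]:
    return slow_hits.get(seq, [])
```

A result with exactly one entry corresponds to a uniquely mapped SNP, and more than one entry to an ambiguous mapping.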
Analyzed chips
| CHIP name | hg build | SNPs | unmapped SNPs |
|---|---|---|---|
| HumanOmni 1S | 37.1 | 1.185.976 | 5.314 |
| CNV370 ver 3 | 36.1 | 373.397 | 0 |
| HH300 ver 2 | 36.1 | 318.237 | 0 |
The first column reports the name of the chips and the second the reference build of the human genome used by the chip vendor to map the SNPs. The third and fourth columns report the overall number of SNPs of the chip and the number of them left unmapped by the chip vendor, respectively.
Results obtained using G-SNPM to remap the SNPs against the same reference build used by the chip vendor
| CHIP name | hg build | mapped SNPs | uniquely mapped SNPs | unmapped SNPs | SNPs with different positions |
|---|---|---|---|---|---|
| HumanOmni 1S | 37.1 | 1.185.122 | 1.185.118 | 854 | 4.626 |
| CNV370 ver 3 | 36.1 | 373.397 | 373.382 | 0 | 14.391 |
| HH300 ver 2 | 36.1 | 318.237 | 318.237 | 0 | 1.822 |
A summary of the discrepancies observed when remapping the SNPs with G-SNPM against the same reference builds previously used by the chip vendor to detect the SNP positions. The first and the second columns report the name of the chip and its reference build, respectively. The third column reports the overall number of SNPs mapped using G-SNPM, whereas the fourth column reports the number of them that are uniquely mapped. The fifth column reports the number of SNPs for which G-SNPM did not provide any valid alignment. Finally, the sixth column reports the number of mapped SNPs for which G-SNPM provided positions different from those detected by the chip vendor.
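As a quick sanity check on these counts, the sketch below parses the dot-grouped figures and verifies that mapped plus unmapped SNPs add up to the chip totals listed in the "Analyzed chips" table. The helper names are hypothetical, and the dictionary simply transcribes the values reported in the two tables.

```python
def parse_count(text: str) -> int:
    """Parse a count written with dots as thousands separators."""
    return int(text.replace(".", ""))


# Values transcribed from the tables above (same-build remapping).
chips = {
    "HumanOmni 1S": {"total": "1.185.976", "mapped": "1.185.122",
                     "unique": "1.185.118", "unmapped": "854"},
    "CNV370 ver 3": {"total": "373.397", "mapped": "373.397",
                     "unique": "373.382", "unmapped": "0"},
    "HH300 ver 2": {"total": "318.237", "mapped": "318.237",
                    "unique": "318.237", "unmapped": "0"},
}


def is_consistent(row: dict) -> bool:
    """Mapped and unmapped SNPs should account for every SNP on the chip."""
    return (parse_count(row["mapped"]) + parse_count(row["unmapped"])
            == parse_count(row["total"]))


def ambiguous(row: dict) -> int:
    """SNPs that were mapped, but not to a single location."""
    return parse_count(row["mapped"]) - parse_count(row["unique"])
```

The difference between the mapped and uniquely mapped columns gives the number of SNPs that remain ambiguously mapped.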
Results obtained using G-SNPM to remap the SNPs against the build 37.3 of the human genome
| CHIP name | hg build | mapped SNPs | uniquely mapped SNPs | unmapped SNPs |
|---|---|---|---|---|
| HumanOmni 1S | 37.3 | 1.185.108 | 1.185.103 | 868 |
| CNV370 ver 3 | 37.3 | 373.374 | 373.371 | 23 |
| HH300 ver 2 | 37.3 | 318.217 | 318.216 | 20 |
The first and the second columns report the name of the chip and its reference build, respectively. The third column reports the overall number of SNPs mapped using G-SNPM, whereas the fourth column reports the number of them uniquely mapped. The fifth column reports the number of SNPs for which G-SNPM did not provide a valid alignment.
SNPs chromosomal regions projected with the NCBI Genome Remapping Service against the build 37.3 of the human genome
| CHIP name | projected regions | unprojected regions |
|---|---|---|
| CNV370 v. 3.0 | 373.185 | 212 |
| HH300 v. 2.0 | 318.209 | 28 |
A summary of the results observed when converting the coordinates of the regions containing the SNPs detected by the chip vendor from build 36.1 to build 37.3 of the human genome. The first column reports the name of the chip, whereas the second and the third report the number of regions successfully projected against build 37.3 and the number of regions for which the NCBI service was unable to provide any conversion, respectively.
Comparison between G-SNPM and the NCBI Genome Remapping Service
| CHIP name | regions differently remapped |
|---|---|
| CNV370 ver 3 | 7.296 |
| HH300 ver 2 | 454 |
The table shows for each analyzed chip the number of SNPs remapped with G-SNPM against the build 37.3 of the human genome whose positions did not fall inside the regions obtained with the NCBI Genome Remapping Service.
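The comparison described in this caption amounts to an interval-containment test. The sketch below is an illustration under stated assumptions, not the authors' procedure: coordinates are taken to be 1-based and inclusive, SNPs without a projected region are skipped, and the function name, type aliases, and SNP identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]     # (chromosome, position) from the remapping
Region = Tuple[str, int, int]  # (chromosome, start, end) from the projection


def differently_remapped(positions: Dict[str, Position],
                         regions: Dict[str, Region]) -> List[str]:
    """Return SNP ids whose remapped position lies outside the projected region."""
    differing = []
    for snp_id, (chrom, pos) in positions.items():
        region = regions.get(snp_id)
        if region is None:
            continue  # assumption: unprojected regions are not compared
        r_chrom, r_start, r_end = region
        if chrom != r_chrom or not (r_start <= pos <= r_end):
            differing.append(snp_id)
    return differing


# Illustrative data only.
positions = {"snp_a": ("chr1", 150), "snp_b": ("chr2", 900),
             "snp_c": ("chr3", 10), "snp_d": ("chr5", 77)}
regions = {"snp_a": ("chr1", 100, 200), "snp_b": ("chr2", 100, 200),
           "snp_c": ("chr4", 1, 50)}
```

A mismatch on the chromosome alone is counted as a difference, since the position cannot then fall inside the projected region.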
Overall analysis of mapped SNPs and running time
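| CHIP name | reference build | mapped SNPs (option D disabled) | time (option D disabled) | mapped SNPs (option D enabled) | time (option D enabled) |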
|---|---|---|---|---|---|
| HumanOmni 1S | 37.1 | 1.184.688 | 20 m | 1.185.118 | 1 h 34 m |
| HumanOmni 1S | 37.3 | 1.185.031 | 19 m | 1.185.103 | 1 h 30 m |
| CNV370 v. 3.0 | 36.1 | 373.382 | 56 m | 373.382 | 2 h 5 m |
| CNV370 v. 3.0 | 37.3 | 373.367 | 52 m | 373.371 | 2 h 2 m |
| HH300 v. 2.0 | 36.1 | 318.237 | 29 m | 318.237 | 29 m |
| HH300 v. 2.0 | 37.3 | 318.216 | 37 m | 318.216 | 37 m |
The table is divided into two parts. The first summarizes the performance of G-SNPM when only its first stage has been used to remap against the overall genome sequence those SNPs previously unmapped against the same chromosomal sequence detected by the chip vendor (option "D" disabled). The second part of the table summarizes the performance of G-SNPM when both stages have been used to remap against the overall genome sequence those SNPs previously unmapped against the same chromosomal sequence detected by the chip vendor (option "D" enabled).
Analysis of the performance at the second stage of G-SNPM
| CHIP name | reference build | sequences analyzed | time |
|---|---|---|---|
| HumanOmni 1S | 37.1 | 17 | 13 m |
| HumanOmni 1S | 37.3 | 17 | 12 m |
| CNV370 v. 3.0 | 36.1 | 56 | 41 m |
| CNV370 v. 3.0 | 37.3 | 81 | 49 m |
| HH300 v. 2.0 | 36.1 | 10 | 22 m |
| HH300 v. 2.0 | 37.3 | 36 | 27 m |
A summary of the running time at the second stage of G-SNPM. The table shows the number of sequences that G-SNPM tried to align at the second stage and the time required to align them. A considerable imbalance in processing time between the first and the second stage is evident. The table summarizes the performance with option "D" disabled.
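One way to make the second-stage cost concrete is to divide each reported time by the number of sequences analyzed. The sketch below does only that arithmetic on the values in the table; the variable and function names are hypothetical, and the result is an average that says nothing about how the time is distributed across individual sequences.

```python
# (chip, reference build) -> (sequences analyzed, minutes), from the table above.
second_stage = {
    ("HumanOmni 1S", "37.1"): (17, 13),
    ("HumanOmni 1S", "37.3"): (17, 12),
    ("CNV370 v. 3.0", "36.1"): (56, 41),
    ("CNV370 v. 3.0", "37.3"): (81, 49),
    ("HH300 v. 2.0", "36.1"): (10, 22),
    ("HH300 v. 2.0", "37.3"): (36, 27),
}


def seconds_per_sequence(sequences: int, minutes: int) -> float:
    """Average second-stage time spent on one sequence, in seconds."""
    return minutes * 60 / sequences


rates = {key: seconds_per_sequence(*value)
         for key, value in second_stage.items()}

for (chip, build), rate in rates.items():
    print(f"{chip} (build {build}): {rate:.1f} s per sequence")
```

Every run averages more than half a minute per sequence at the second stage, even though only a few dozen sequences reach it.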