Quang Tran, Shanshan Gao, Vinhthuy Phan.
Abstract
Efforts such as the International HapMap Project and the 1000 Genomes Project have produced a catalog of millions of single-nucleotide and insertion/deletion (INDEL) variants in the human population. Viewed as a reference of existing variants, this resource commonly serves as a gold standard for studying and developing methods to detect genetic variants. Our analysis revealed that this reference contains thousands of INDELs that were constructed in a biased manner. The bias arises at the stage of aligning short reads to reference genomes to detect variants. It is caused by the existence of many theoretically optimal alignments between the reference genome and reads containing alternative alleles at those INDEL locations. We examined several popular aligners and showed that they can be divided into groups whose alignments yield INDELs that either agree strongly or disagree strongly with the reported INDELs. This finding suggests that the agreement or disagreement between an aligner's called INDEL and the reported INDEL is merely a result of the arbitrary selection of one of the optimal alignments. The existence of bias in INDEL calling might seriously influence downstream analyses. As such, our finding suggests that this phenomenon should be further addressed.
Keywords: INDEL detection; Short read alignment; Variant calling
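The abstract's central point, that a read carrying an INDEL allele can have many equally good alignments, can be illustrated with a small sketch. The function name `equivalent_deletion_positions` and the toy reference sequence are hypothetical and are not taken from the paper. The uniform-choice probability at the end is only an illustrative assumption, not the authors' model of expected agreement.

```python
def equivalent_deletion_positions(ref: str, pos: int, length: int) -> list[int]:
    """Return every 0-based start at which deleting `length` bases from `ref`
    yields the same sequence as deleting ref[pos:pos + length]."""
    # Sequence of the alternative allele produced by the reported deletion.
    target = ref[:pos] + ref[pos + length:]
    # A deletion placed at `start` is indistinguishable from the reported one
    # if it produces exactly the same alternative sequence.
    return [
        start
        for start in range(len(ref) - length + 1)
        if ref[:start] + ref[start + length:] == target
    ]


# Toy reference with a CA repeat (hypothetical example data).
ref = "GGCACACACATT"

# A 2-base deletion reported at position 2 can be placed anywhere in the repeat.
positions = equivalent_deletion_positions(ref, pos=2, length=2)
print(positions)  # [2, 3, 4, 5, 6, 7, 8]

# Assumption for illustration only: if an aligner picked one of these
# placements uniformly at random, it would match the reported position
# with probability 1 / len(positions).
print(1 / len(positions))
```

Each of these placements has the same number of matched bases and a single gap of the same length, so typical scoring schemes cannot distinguish them. Which one an aligner reports depends on its tie-breaking behavior.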
Year: 2016 PMID: 27766935 PMCID: PMC5073887 DOI: 10.1186/s12859-016-1216-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Percentage of correct mapping and of actual and expected agreement, by aligner
| Aligners | Correct mapping % | Actual agreement % | Expected agreement % | |
|---|---|---|---|---|
| Bowtie2 | 96 | 99 | 30 | 0.0000546 |
| BWA | 93 | 99 | 30 | 0.0000550 |
| SHRiMP2 | 97 | 99 | 31 | 0.0001491 |
| RazerS | 88 | 75 | 31 | 0.0001631 |
| CUSHAW2 | 97 | 70 | 31 | 0.0000562 |
| GASSST | 91 | 8 | 17 | 0.0015892 |
| Smalt | 96 | 5 | 31 | 0.0003852 |
Fig. 1 Distribution of INDEL complexity across human chromosomes
Fig. 2 Density of INDEL complexity across human chromosomes