Jan Schröder, Santhosh Girirajan, Anthony T Papenfuss, Paul Medvedev.
Abstract
The uses of the Genome Reference Consortium's human reference sequence can be roughly divided into three related but distinct categories: as a representative species genome, as a coordinate system for identifying variants, and as an alignment reference for variation detection algorithms. However, using this reference sequence simultaneously as a representative species genome and as an alignment reference leads to unnecessary artifacts for structural variation detection algorithms and limits their accuracy. We show how decoupling these two references and developing a separate alignment reference can significantly improve the accuracy of structural variation detection, lead to improved genotyping of disease-related genes, and decrease the cost of studying polymorphism in a population.
Year: 2015 PMID: 26322511 PMCID: PMC4556445 DOI: 10.1371/journal.pone.0136771
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Method workflow.
a) In a traditional SV calling pipeline, the reads are first aligned against the GRC reference and the alignments are passed to an SV caller, which annotates regions of the GRC reference as being inserted/deleted. b) Our approach adds two components. BUILD_REF takes a set of sequences to be inserted and modifies the GRC reference genome (e.g. hg18) by inserting the sequences into their prescribed locations, obtaining a new genome (ref+). We next align the reads to ref+ and run an SV caller. The TRANSLATE_CALLS component then modifies the resulting calls so that they become calls relative to the GRC reference, not ref+.
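The BUILD_REF step and the coordinate bookkeeping that TRANSLATE_CALLS relies on can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_ref` and `to_ref_coord`, the 0-based coordinates, and the `(position, sequence)` input format are assumptions made for illustration, and the reference is treated as a single in-memory string.

```python
def build_ref(reference, insertions):
    """Build ref+ by inserting sequences into the reference (hypothetical helper).

    insertions: list of (pos, seq); seq is placed before 0-based index pos
    of the original reference. Returns (ref_plus, blocks), where each block
    is (start, end, ref_pos): the half-open interval the inserted sequence
    occupies in ref+ and its insertion point in the original reference.
    """
    pieces, blocks = [], []
    prev = 0    # end of the last copied reference segment
    shift = 0   # total inserted length so far
    for pos, seq in sorted(insertions):
        pieces.append(reference[prev:pos])
        pieces.append(seq)
        start = pos + shift
        blocks.append((start, start + len(seq), pos))
        shift += len(seq)
        prev = pos
    pieces.append(reference[prev:])
    return "".join(pieces), blocks


def to_ref_coord(p, blocks):
    """Map a ref+ coordinate back to the original reference.

    Positions inside an inserted block have no counterpart in the original
    reference, so they are mapped to the block's insertion point.
    """
    shift = 0
    for start, end, ref_pos in blocks:
        if p < start:
            break
        if p < end:
            return ref_pos
        shift += end - start
    return p - shift


ref_plus, blocks = build_ref("AAAACCCC", [(4, "GGG")])
print(ref_plus, blocks, to_ref_coord(10, blocks))
```

Keeping the list of inserted blocks alongside ref+ is the design choice that makes translation possible: any call made on ref+ can be shifted back by the total length of the insertions that precede it.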
Fig 2. An illustrative example.
In the top scenario, a VNA (shown in red) is present in the donor. In ref+, only concordant alignments (correct orientation and mapped distance) are present. As a result, the SV caller does not make a call in ref+, and TRANSLATE_CALLS converts this absence of a call to an insertion call in the GRC reference (hg18). In the GRC reference, however, the read pairs that originate from across the VNA junction map discordantly, with one read left unmapped or falsely mapping to a homologous region. These signals in the GRC reference are difficult to decipher for any SV algorithm. In the bottom scenario, where the VNA is absent in the donor, the pairs that span the VNA injection point in the donor align concordantly to the GRC reference. In ref+, they align discordantly with an enlarged mapped distance but bear the hallmark signature of a deletion. This is among the easiest signals that an SV caller can detect, and most algorithms show good results for this SV type.
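The two scenarios can be restated as a small decision rule, sketched below in Python. It is an illustration only, under stated assumptions: `translate_calls` is a hypothetical name, the inputs are plain tuples, and the `min_overlap` threshold used to match a deletion call to an inserted sequence is an invented parameter, not something taken from the paper.

```python
def translate_calls(blocks, deletions, min_overlap=0.5):
    """Turn ref+ deletion calls into insertion calls on the original reference.

    blocks: list of (start, end, ref_pos), the half-open interval each
        inserted sequence occupies in ref+ and its insertion point in the
        original reference.
    deletions: list of (start, end) deletion calls in ref+ coordinates.
    min_overlap: assumed matching threshold (overlap / longer interval).

    An inserted sequence with no matching deletion call is taken to be
    present in the donor, so it becomes an insertion call. A matching
    deletion call means the donor lacks the sequence, so nothing is
    reported relative to the original reference.
    """
    calls = []
    for start, end, ref_pos in blocks:
        deleted = False
        for d_start, d_end in deletions:
            overlap = min(end, d_end) - max(start, d_start)
            longer = max(end - start, d_end - d_start)
            if overlap > 0 and overlap / longer >= min_overlap:
                deleted = True
                break
        if not deleted:
            calls.append(("INS", ref_pos, end - start))
    return calls


# Two inserted sequences in ref+; the caller reports a deletion over the second.
print(translate_calls([(4, 7, 4), (20, 30, 17)], [(19, 31)]))
```

The sketch covers only calls that involve the inserted sequences; any other call made on ref+ would additionally need its coordinates shifted back to the original reference.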
Fig 3. Analysis of ref+ pipeline accuracy.
Each vertical line represents one individual, with the plus (+) point representing the ref+ pipeline and the square point representing the GRC pipeline.