Alexander Röck, Jodi Irwin, Arne Dür, Thomas Parsons, Walther Parson.
Abstract
The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally, they are presented in a difference-coded, position-based format relative to the corrected version of the first sequenced mtDNA (the revised Cambridge Reference Sequence, rCRS). This convention requires recommendations for a standardized sequence alignment, yet alignment practice is known to vary between scientific disciplines and even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can return biased results when query and database haplotypes are annotated differently. In the forensic context, this would usually lead to an underestimation of absolute and relative haplotype frequencies. To address this issue, we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. Simply applying a BLAST algorithm would not be a sufficient remedy, as BLAST uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable as well as rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org).
Year: 2010 PMID: 21056022 PMCID: PMC3064999 DOI: 10.1016/j.fsigen.2010.10.006
Source DB: PubMed Journal: Forensic Sci Int Genet ISSN: 1872-4973 Impact factor: 4.882
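The core idea of a string-based search can be illustrated with a minimal Python sketch; it is not the SAM implementation. It assumes a simple token grammar for the notations used in this record (substitutions such as 73G, insertions such as 309.1C, deletions written as 308- or 308DEL) and uses the rCRS fragment at positions 39-47 (CTCTCCATG) together with the 42.1C/44.1C pair from the indel figure. The names `to_string`, `VARIANT` and the database entry `DB-1` are hypothetical.

```python
import re

# Hypothetical token grammar: position, optional insertion suffix (.1, .2, ...),
# then a base, or "-" / "DEL" for a deletion.
VARIANT = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|-|DEL)$")


def to_string(ref: str, start: int, profile: str) -> str:
    """Rebuild the nucleotide string of a difference-coded profile.

    `ref` is a reference fragment whose first base sits at position `start`;
    variants that fall outside the fragment are ignored.
    """
    subs: dict[int, str] = {}             # position -> replacement ("" = deleted)
    ins: dict[int, dict[int, str]] = {}   # position -> {suffix: inserted base}
    for token in profile.split():
        m = VARIANT.match(token)
        if m is None:
            raise ValueError(f"unrecognised variant: {token}")
        pos, suffix, base = int(m.group(1)), m.group(2), m.group(3)
        base = "" if base in ("-", "DEL") else base
        if suffix is None:
            subs[pos] = base
        else:
            ins.setdefault(pos, {})[int(suffix)] = base
    out = []
    for offset, ref_base in enumerate(ref):
        pos = start + offset
        out.append(subs.get(pos, ref_base))
        extra = ins.get(pos, {})
        out.extend(extra[k] for k in sorted(extra))  # inserted bases in suffix order
    return "".join(out)


REF_39_47 = "CTCTCCATG"          # rCRS positions 39-47
database = {"DB-1": "42.1C"}     # hypothetical entry in irregular notation
query = "44.1C"                  # the same sequence in regular notation

# Position-based lookup compares the annotations themselves.
position_hits = [sid for sid, p in database.items()
                 if set(p.split()) == set(query.split())]

# String-based lookup indexes every entry by its position-free string.
index: dict[str, list[str]] = {}
for sid, p in database.items():
    index.setdefault(to_string(REF_39_47, 39, p), []).append(sid)
string_hits = index.get(to_string(REF_39_47, 39, query), [])

print(position_hits, string_hits)  # [] ['DB-1']
```

Indexing by string turns an exact-match query into a single dictionary lookup; counting near matches would need an additional string distance, which this sketch does not attempt.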
Query results obtained by position-based and string-based search of both the phylogenetic (55C 56T 57C 60.1T 93G 263G 309.1C 315.1C 573.1C 573.2C) and the operational (54.1C 56C 93G 263G 309.1C 315.1C 573.1C 573.2C) nomenclature of control region (CR) haplotype CN253.
| Number of differences to CN253 | Haplotypes (position-based search, phylogenetic alignment) | Haplotypes (position-based search, rule-based alignment) | Haplotypes (string-based search) | Haplogroups of haplotypes |
|---|---|---|---|---|
| 0 | 1 | 0 | 1 | H15 |
| 1 | 0 | 0 | 0 | |
| 2 | 2 | 0 | 2 | H15 |
| 3 | 6 | 31 | 37 | R0, H15, H15a1 |
| 4 | 6 | 247 | 252 | |
| 5 | 36 | 424 | 426 | |
| 6+ | 7279 | 6628 | 6612 | |
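The contrast between the two CN253 nomenclatures can be reproduced in miniature. This sketch is not the authors' code. It assumes that a position-based comparison counts the variants present in only one annotation (the symmetric difference of the two variant sets), and that rCRS positions 51-61 read TTGGTATTTTC (an assumption worth checking against the reference sequence). Every variant that differs between the two nomenclatures falls inside that window; the shared variants outside it would be rebuilt identically. `to_string` is a hypothetical helper.

```python
import re

# Hypothetical grammar: position, optional insertion suffix, base or "-"/"DEL".
VARIANT = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|-|DEL)$")


def to_string(ref: str, start: int, profile: str) -> str:
    """Apply difference-coded variants to `ref` (first base at `start`);
    variants outside the fragment are skipped."""
    subs: dict[int, str] = {}
    ins: dict[int, dict[int, str]] = {}
    for token in profile.split():
        m = VARIANT.match(token)
        if m is None:
            raise ValueError(f"unrecognised variant: {token}")
        pos, suffix, base = int(m.group(1)), m.group(2), m.group(3)
        base = "" if base in ("-", "DEL") else base
        if suffix is None:
            subs[pos] = base
        else:
            ins.setdefault(pos, {})[int(suffix)] = base
    out = []
    for offset, ref_base in enumerate(ref):
        pos = start + offset
        out.append(subs.get(pos, ref_base))
        extra = ins.get(pos, {})
        out.extend(extra[k] for k in sorted(extra))
    return "".join(out)


PHYLOGENETIC = "55C 56T 57C 60.1T 93G 263G 309.1C 315.1C 573.1C 573.2C"
OPERATIONAL = "54.1C 56C 93G 263G 309.1C 315.1C 573.1C 573.2C"
REF_51_61 = "TTGGTATTTTC"  # rCRS positions 51-61 (assumed)

# Position-based view: annotations that occur in one nomenclature only.
position_diff = len(set(PHYLOGENETIC.split()) ^ set(OPERATIONAL.split()))

# String-based view: rebuild the window holding every differing variant.
same = to_string(REF_51_61, 51, PHYLOGENETIC) == to_string(REF_51_61, 51, OPERATIONAL)
print(position_diff, same)  # 6 True
```

Six annotation-level differences collapse to zero once both profiles are written out as sequence, which is the effect the string-based column of the table reflects.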
Contrasting phylogenetic and operational nomenclature of profile CHN.ASN.000451. The phylogenetic alignment of profile CHN.ASN.000451 is obtained by taking into account the phylogenetic nomenclature of profile AF016 from [12], which is stored in the EMPOP database.
| Sample information | Haplogroup | Phylogenetic nomenclature | Operational nomenclature |
|---|---|---|---|
| AF016 | B4a1a1a | 16182C 16183C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 309.1C 315.1C | |
| CHN.ASN.000451 | B4a1a1a | 16182C 16183C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308DEL 309DEL 315.1C 523DEL 524DEL | 16182C 16183C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308T 310DEL 523DEL 524DEL |

| 10 different annotations for profile CHN.ASN.000451 |
|---|
| 16182C 16183C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308- 309- 315.1C 523- 524- |
| 16182C 16183C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308- 309T 310C 523- 524- |
| 16182C 16183C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308T 309- 310C 523- 524- |
| 16182C 16183- 16189C 16193.1C 16217C 16247G 16261T 16519C 73G 146C 263G 308- 309- 315.1C 523- 524- |
| 16182C 16183C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308T 310C 315- 523- 524- |
| 16182C 16183- 16189C 16193.1C 16217C 16247G 16261T 16519C 73G 146C 263G 308- 309T 310C 523- 524- |
| 16182C 16183- 16188.1C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308- 309- 315.1C 523- 524- |
| 16182C 16183- 16189C 16193.1C 16217C 16247G 16261T 16519C 73G 146C 263G 308T 309- 310C 523- 524- |
| 16182C 16183- 16188.1C 16189C 16217C 16247G 16261T 16519C 73G 146C 263G 308- 309T 310C 523- 524- |
| 16182C 16183- 16189C 16193.1C 16217C 16247G 16261T 16519C 73G 146C 263G 308T 310C 315- 523- 524- |
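To see why several annotations can describe a single profile, the HVII part of each CHN.ASN.000451 annotation can be rebuilt as a string. This sketch assumes rCRS positions 303-316 read CCCCCCCTCCCCCG (a seven-C tract, T at 310, a five-C tract, G at 316); the helper `to_string` is hypothetical and not SAM itself. Variants outside the window (the HVI positions and 523/524) are ignored, so only the C-tract region is compared.

```python
import re

# Hypothetical grammar: position, optional insertion suffix, base or "-"/"DEL".
VARIANT = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|-|DEL)$")


def to_string(ref: str, start: int, profile: str) -> str:
    """Apply difference-coded variants to `ref` (first base at `start`);
    variants outside the fragment are skipped."""
    subs: dict[int, str] = {}
    ins: dict[int, dict[int, str]] = {}
    for token in profile.split():
        m = VARIANT.match(token)
        if m is None:
            raise ValueError(f"unrecognised variant: {token}")
        pos, suffix, base = int(m.group(1)), m.group(2), m.group(3)
        base = "" if base in ("-", "DEL") else base
        if suffix is None:
            subs[pos] = base
        else:
            ins.setdefault(pos, {})[int(suffix)] = base
    out = []
    for offset, ref_base in enumerate(ref):
        pos = start + offset
        out.append(subs.get(pos, ref_base))
        extra = ins.get(pos, {})
        out.extend(extra[k] for k in sorted(extra))
    return "".join(out)


REF_303_316 = "CCCCCCC" + "T" + "CCCCC" + "G"  # rCRS 303-316 (assumed)

# HVII parts of the CHN.ASN.000451 annotations listed in the table.
chn_annotations = [
    "308DEL 309DEL 315.1C",  # phylogenetic nomenclature
    "308T 310DEL",           # operational nomenclature
    "308- 309- 315.1C",
    "308- 309T 310C",
    "308T 309- 310C",
    "308T 310C 315-",
]
chn_strings = {to_string(REF_303_316, 303, a) for a in chn_annotations}
af016 = to_string(REF_303_316, 303, "309.1C 315.1C")

print(chn_strings)  # {'CCCCCTCCCCCCG'}
print(af016)        # CCCCCCCCTCCCCCCG
```

All six notations reduce to one string (five Cs, T, six Cs), while AF016 carries a longer first C-tract, which is why AF016 can guide the phylogenetic alignment without being identical to CHN.ASN.000451.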
Contrasting regular and irregular indels. Insertions of all four bases have been described at position 42. The insertion of a C (42.1C) is an irregular term, as this position is followed by two C residues in rCRS. To comply with regularity, it is notated at the 3′ end of the C-tract (44.1C). The same applies to the deletion of the AC block at positions 515 and 516, which is transformed to 523- 524-. Bases in bold face denote runs.
| Sequence | 39 | 40 | 41 | 42 | 42.1 | 43 | 44 | 44.1 | 45 | 46 | 47 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rCRS | C | T | C | T | - | **C** | **C** | - | A | T | G |
| 42.1C | C | T | C | T | C | C | C | - | A | T | G |
| 44.1C | C | T | C | T | - | C | C | C | A | T | G |

| Sequence | 510 | 511 | 512 | 513 | 514 | 515 | 516 | 517 | 518 | 519 | 520 | 521 | 522 | 523 | 524 | 525 | 526 | 527 | 528 | 529 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rCRS | C | C | A | G | C | **A** | **C** | **A** | **C** | **A** | **C** | **A** | **C** | **A** | **C** | C | G | C | T | G |
| 515- 516- | C | C | A | G | C | - | - | A | C | A | C | A | C | A | C | C | G | C | T | G |
| 523- 524- | C | C | A | G | C | A | C | A | C | A | C | A | C | - | - | C | G | C | T | G |
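The regularity rule in the caption (write an indel at the 3′ end of the run it falls in) can be sketched as a sliding procedure: an indel is moved one position toward 3′ as long as the resulting sequence does not change. This minimal sketch uses the rCRS fragments from the figure; the function names are hypothetical, it covers only this one rule, and it does not reproduce the phylogenetic alignment, which can place indels differently.

```python
def shift_insertion(ref: str, start: int, pos: int, base: str) -> int:
    """Move a single-base insertion after `pos` to the 3' end of a run of
    the same base; assumes the fragment extends past the end of the run."""
    i = pos - start
    # Inserting `base` after i equals inserting it after i + 1 when ref[i + 1] == base.
    while i + 1 < len(ref) and ref[i + 1] == base:
        i += 1
    return start + i


def shift_deletion(ref: str, start: int, first: int, last: int) -> tuple[int, int]:
    """Slide a deleted block first..last toward 3' while the result is unchanged."""
    i, j = first - start, last - start
    # Deleting ref[i..j] equals deleting ref[i+1..j+1] when ref[i] == ref[j + 1].
    while j + 1 < len(ref) and ref[i] == ref[j + 1]:
        i, j = i + 1, j + 1
    return start + i, start + j


REF_39_47 = "CTCTCCATG"                         # rCRS 39-47, from the figure
REF_510_529 = "CCAGC" + "ACACACACAC" + "CGCTG"  # rCRS 510-529, from the figure

print(shift_insertion(REF_39_47, 39, 42, "C"))     # 44, i.e. 44.1C
print(shift_deletion(REF_510_529, 510, 515, 516))  # (523, 524), i.e. 523- 524-
```

An insertion of a base that does not match its right-hand neighbour (for example 42.1A, next to the C at 43) stays where it is, since sliding it would change the sequence.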
Search results for querying profile CHN.ASN.000451 over the ranges 16024-16365 and 73-340 in EMPOP 2.
Search result without ignoring indels:

| Number of differences to profile CHN.ASN.000451 | Number of haplotypes | Haplogroups of haplotypes |
|---|---|---|
| 0 | 0 | |
| 1 | 0 | |
| 2 | 0 | |
| 3 | 1 | B4a1a1a |
| 4 | 5 | B4 |
| 5 | 4 | B4, B4a1a |
| 6+ | 10889 | |
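How the position-based search counts differences is not spelled out in this record. As an assumption, the sketch below counts the variants that occur in only one of two profiles, restricted to the queried ranges, with a hypothetical `ignore_indels` switch corresponding to the "without ignoring indels" setting. Applied to the two annotations from the nomenclature table (CHN.ASN.000451 in phylogenetic nomenclature and AF016), it yields three differences, all from the placement of C-tract indels; it illustrates the counting idea and does not reproduce the EMPOP query.

```python
import re

QUERY_RANGES = [(16024, 16365), (73, 340)]  # ranges used for the query


def position(token: str) -> int:
    """Leading position number of a variant token, e.g. 309.1C -> 309."""
    m = re.match(r"\d+", token)
    if m is None:
        raise ValueError(f"unrecognised variant: {token}")
    return int(m.group())


def is_indel(token: str) -> bool:
    """Insertions carry a .n suffix; deletions end in '-' or 'DEL'."""
    return "." in token or token.endswith(("-", "DEL"))


def position_differences(a: str, b: str, ignore_indels: bool = False) -> int:
    """Count variants present in only one profile, within QUERY_RANGES."""
    def keep(token: str) -> bool:
        in_range = any(lo <= position(token) <= hi for lo, hi in QUERY_RANGES)
        return in_range and not (ignore_indels and is_indel(token))

    va = {t for t in a.split() if keep(t)}
    vb = {t for t in b.split() if keep(t)}
    return len(va ^ vb)


CHN = ("16182C 16183C 16189C 16217C 16247G 16261T 16519C "
       "73G 146C 263G 308DEL 309DEL 315.1C 523DEL 524DEL")
AF016 = ("16182C 16183C 16189C 16217C 16247G 16261T 16519C "
         "73G 146C 263G 309.1C 315.1C")

print(position_differences(CHN, AF016))                      # 3
print(position_differences(CHN, AF016, ignore_indels=True))  # 0
```

Positions 16519, 523 and 524 fall outside the queried ranges and are dropped before counting, so the remaining differences are 308DEL and 309DEL on one side and 309.1C on the other.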