Philip L Tzou, Xiaoqiu Huang, Robert W Shafer.
Abstract
BACKGROUND: Current nucleotide-to-amino acid alignment software programs were developed primarily for detecting gene exons within eukaryotic genomes and were therefore optimized for speed across long genetic sequences. We developed a nucleotide-to-amino acid alignment program, NucAmino, optimized for virus sequencing.
Keywords: Drug resistance; HIV-1; Open source; Sequence alignment; Viruses
Year: 2017 PMID: 28249562 PMCID: PMC5333393 DOI: 10.1186/s12859-017-1555-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Dynamic programming alignment implemented by NucAmino to align nucleotide sequences to a reference amino acid sequence
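The idea behind aligning nucleotides to an amino acid reference by dynamic programming can be illustrated with a minimal Python sketch. This is not NucAmino's implementation: it is a simplified codon-level global alignment that assumes an in-frame input, a linear gap penalty, and no frameshift handling. The function name `align_codons` and the scoring values (`match`, `mismatch`, `gap`) are hypothetical choices made only for illustration.

```python
from itertools import product

# Standard genetic code, built in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def align_codons(nt, ref, match=2, mismatch=-1, gap=-3):
    """Globally align in-frame codons of `nt` to the amino acid string `ref`.

    Returns a list of (ref_position, codon) pairs, where ref_position is
    1-based. An insertion is (None, codon); a deletion is (position, None).
    Scoring values are illustrative assumptions, not NucAmino's parameters.
    """
    codons = [nt[i:i + 3] for i in range(0, len(nt) - len(nt) % 3, 3)]
    n, m = len(codons), len(ref)

    # score[i][j]: best score for the first i codons vs. the first j residues.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * gap, "ins"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * gap, "del"

    for i in range(1, n + 1):
        aa = CODON_TABLE.get(codons[i - 1], "X")  # "X" for ambiguous codons
        for j in range(1, m + 1):
            s = match if aa == ref[j - 1] else mismatch
            score[i][j], back[i][j] = max(
                (score[i - 1][j - 1] + s, "sub"),  # codon aligned to residue
                (score[i - 1][j] + gap, "ins"),    # extra codon in the query
                (score[i][j - 1] + gap, "del"),    # reference residue missing
            )

    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "sub":
            pairs.append((j, codons[i - 1]))
            i, j = i - 1, j - 1
        elif move == "ins":
            pairs.append((None, codons[i - 1]))
            i -= 1
        else:
            pairs.append((j, None))
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    # One extra codon (GGG) relative to the reference "MKV".
    for position, codon in align_codons("ATGAAAGGGGTT", "MKV"):
        print(position, codon)
```

Because every step in this sketch consumes a whole codon, a three-nucleotide insertion or deletion is always reported against a single reference position. Handling frameshifts or position-specific gap penalties would require additional states that are not shown here.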
Classification of Sequences with Gaps for Which Local Alignment Program (LAP) and NucAmino Yielded Different Results
| Classification | Number of Discordances | Explanation | Example: LAP output | Example: NucAmino output |
|---|---|---|---|---|
| A. Insertions and deletions in RT β3–β4 loop region (codons 62 to 72) | 128 | For 233 insertions in this region, NucAmino placed all but 3 at codon 69. In contrast, LAP placed 113 insertions at codon 69, whereas 111 were placed at codons 62 to 70. For 99 deletions in this region, there were 3 differences between NucAmino and LAP, all of which fall under classification D. (Example shown on the right: AF315241) | 66 67 68 69 70 | 66 67 68 69 70 |
| B. Insertions in PR codons 33/41 loop region | 31 | For 151 insertions in this region, LAP and NucAmino placed 31 insertions at different positions. (Example shown on the right: HQ657812) | 32 33 34 35 36 37 | 32 33 34 35 36 37 |
| C. Different placement of indels and/or frameshifts (not in classification A or B) | 213 | For 213 sequences with indels and/or frameshifts outside of the RT β3–β4 loop region and the PR codon 33/41 loop region, gaps were placed at slightly different positions. (Example shown on the right: EF071939) | 306 307 308 309 310 311 312 | 306 307 308 309 310 311 312 |
| D. Codon alignment corrections (not in classification A, B or C) | 108 | Overall, there were 218 sequences with 110 insertions and 133 deletions for which LAP aligned 3 nucleotides across more than one codon whereas NucAmino aligned the nucleotides to a single codon. (Example: HM569289) Of these, 114 were not in the RT β3–β4 loop or in PR codon 35/41 loop region. | 200 201 202 203 | 200 201 202 203 |
| E. Large gaps | 32 | 32 sequences had large gaps presumably because the contributor excluded unsequenced regions from the GenBank submission or inserted large stretches of N’s. For 21 and 11 of these regions, LAP and NucAmino accurately reported a large deletion encompassing the missing region, respectively. | Insufficient space to provide an example. | |
Abbreviations: RT reverse transcriptase, PR protease