Shin-Lin Tu, Jeannette P Staheli, Colum McClay, Kathleen McLeod, Timothy M Rose, Chris Upton.
Abstract
Base-By-Base is a comprehensive tool for the creation and editing of multiple sequence alignments that is coded in Java and runs on multiple platforms. It can be used with gene and protein sequences as well as with large viral genomes, which themselves can contain gene annotations. This report describes new features added to Base-By-Base over the last 7 years. The two most significant additions are: (1) the recoding and inclusion of "consensus-degenerate hybrid oligonucleotide primers" (CODEHOP), a popular tool for the design of degenerate primers from a multiple sequence alignment of proteins; and (2) the ability to perform fuzzy searches within the columns of sequence data in multiple sequence alignments to determine the distribution of sequence variants among the sequences. The intuitive interface focuses on presenting results in easily understood visualizations and on providing the ability to annotate the sequences in a multiple alignment with analytic and user data.
Keywords: ASFV; BBB; Base-By-Base; MSA; bioinformatics; comparative genomics; poxvirus; software; virus
Year: 2018 PMID: 30445717 PMCID: PMC6265842 DOI: 10.3390/v10110637
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Anatomy of a “consensus-degenerate hybrid oligonucleotide primers” (CODEHOP) PCR primer. A CODEHOP is a pool of related primers containing all possible nucleotide sequences encoding 3 to 4 highly conserved amino acids within a 3′ degenerate core and a 5′ consensus clamp containing the most probable nucleotide at each position for the flanking codons. (A) Multiple alignment of protein sequences; (B) predicted CODEHOP primer pool.
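The 3′ degenerate core described in Figure 1 can be illustrated with a minimal sketch. It assumes the standard genetic code and collapses each codon position of each conserved amino acid into an IUPAC ambiguity code. The function name `degenerate_core` and the `drop_last_wobble` flag are hypothetical names chosen for this illustration; they are not part of Base-By-Base or j-CODEHOP, and the program's actual algorithm (which also builds the 5′ consensus clamp from a codon usage table) may differ.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC ambiguity codes, keyed by the alphabetically sorted set of bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
         "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}


def degenerate_core(motif, drop_last_wobble=False):
    """Return (IUPAC core, degeneracy) for a short amino acid motif.

    Each codon position is collapsed independently, so amino acids with
    split codon families (Leu, Ser, Arg) are over-counted by this sketch.
    drop_last_wobble (an assumption of this example) omits the final
    third-codon position so the primer ends on an invariant nucleotide.
    """
    codes, sizes = [], []
    for aa in motif.upper():
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        for i in range(3):
            bases = sorted({c[i] for c in codons})
            codes.append(IUPAC["".join(bases)])
            sizes.append(len(bases))
    if drop_last_wobble:
        codes, sizes = codes[:-1], sizes[:-1]
    return "".join(codes), math.prod(sizes)


print(degenerate_core("PWNY"))
print(degenerate_core("PWNY", drop_last_wobble=True))
```

Under these assumptions the motif PWNY gives the core `CCNTGGAAYTAY` with 16-fold degeneracy, or `CCNTGGAAYTA` with 8-fold degeneracy when the final wobble base is dropped.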
Figure 2. j-CODEHOP primer design output. The uracil DNA glycosylase test data set was used as input for the j-CODEHOP program and the following options were used for primer design: (1) Block making alignment tool: “MUSCLE”; (2) Codon table: “Homo sapiens”; (3) Clamp (nondegenerate 5′ region) length: “25”; (4) Core (degenerate 3′ region): max degeneracy “16”, length of degenerate core in aa “4”, strictness (%) “0”, min aa conservation (%) “80”. Default values were used for the Advanced options: (1) 3′ nucleotide: “Invariant 3′ nt”; (2) Min block length: “5”; (3) Primer concentration (nM): “50”; (4) Restrict 3′ nucleotide to G or C: “unchecked”; (5) Exclude Leu, Ser, and Arg from 3′ region: “unchecked”. (A) Initial graphical output showing the consensus amino acid sequence for the ordered blocks of multiply aligned protein sequences. Amino acids showing conservation above the chosen minimum value are capitalized. The positions of predicted CODEHOP PCR primers are indicated, showing the extent of the amino acid sequence used for primer design, the direction of the primer (forward (F) or reverse (R)), and the degeneracy of the 3′ degenerate region, i.e., 4×, 8× or 16×. A CODEHOP with 4× degeneracy is composed of a pool of 4 different primers that provide all possible sequences encoding the 3–4 highly conserved amino acid motif targeted by the CODEHOP primer. The amino acid sequence of the motif is used to name the primer, e.g., “PWNY”. (B) The output obtained by clicking on a primer of interest in the initial graphical output, in this case primer “PWNY-F 8×”. This output shows the primer sequence (5′ to 3′), with the 5′ nondegenerate consensus region in capital letters and the 3′ degenerate region in lowercase letters, using the international code for ambiguous nucleotides, i.e., “Y” (C,T), “R” (A,G), “N” (A,C,G,T), etc. The codons in the primer sequence are aligned with the block of multiply aligned protein sequences.
Amino acid positions showing conservation above the chosen minimum are indicated with an asterisk. Amino acids within the multiple alignment which are identical to the consensus sequence are indicated with a dot. The metadata for the chosen primer design criteria are indicated, as is the primer location in amino acids and base pairs. A third panel (not shown) provides a list of primers predicted from the current amino acid block to export. The primer sequence and metadata can be exported in a “comma-separated values” (CSV) spreadsheet format. The panels shown are high-resolution representations of program output.
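To make the notion of a primer pool concrete, the following sketch expands a primer written with the international ambiguity codes into every sequence it represents, so that an 8× primer yields 8 sequences. The primer string used here is hypothetical: the clamp `GATTACA` is invented for illustration and is not taken from the j-CODEHOP output in Figure 2.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC_BASES[base] for base in primer.upper()]
    return ["".join(seq) for seq in product(*choices)]


# Hypothetical primer: uppercase consensus clamp, lowercase degenerate core.
pool = expand_primer("GATTACAccntggaayta")
print(len(pool))
for seq in pool:
    print(seq)
```

The pool size equals the product of the number of bases behind each ambiguity code, which is the degeneracy value reported next to each primer name.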
Figure 3. Visual summary from BBB. Pink and pale blue boxes represent genes transcribed to the right and left, respectively, for the genomes of two poxviruses. The centre tract indicates differences between the two sequences: dark blue lines are SNPs (the abruptly dense cluster of SNPs turned out to be falsely assembled, out-of-frame sequence from another virus), and green and red blocks show insertions and deletions (erroneously transposed sequences).
Figure 4. Unique SNPs found in the genomic core of 10 cowpox viruses (using the BBB Find Differences feature) contrasted with a maximum-likelihood phylogenetic tree. Red numbers denote the number of SNPs unique to each virus, i.e., not shared with any of the others. The branch scale of the phylogenetic tree denotes the average number of nucleotide substitutions per site.
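The per-virus counts in Figure 4 can be pictured with a small sketch that counts, for each sequence in an alignment, the columns where its nucleotide occurs in no other sequence. This is not the implementation of the BBB Find Differences feature; the function name `count_unique_snps`, the toy alignment, and the choice to skip gapped columns are all assumptions made for illustration.

```python
from collections import Counter


def count_unique_snps(alignment):
    """Count columns where a sequence's base is found in no other sequence.

    alignment maps sequence names to equal-length aligned strings.
    Columns containing a gap ('-') are skipped (an assumption of this sketch).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    counts = {name: 0 for name in names}
    for col in range(length):
        column = {name: alignment[name][col] for name in names}
        if "-" in column.values():
            continue
        freq = Counter(column.values())
        for name, base in column.items():
            if freq[base] == 1:
                counts[name] += 1
    return counts


# Toy alignment with hypothetical virus names.
toy = {
    "v1": "ACGTACGT",
    "v2": "ACGTACGA",
    "v3": "ACCTACGT",
    "v4": "TCGTTCGT",
}
print(count_unique_snps(toy))
```

In the toy data, `v1` has no unique SNPs, `v2` and `v3` have one each, and `v4` has two.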
SNPs shared by CPXV-BR, CPXV-Nor1994MAN, and the strain noted in the table; all other viruses in Figure 4 have a different nucleotide. SNPs close together are grouped on a single line in the table.
| +BeaBer04/1 |
| 22,518, 22,519, 22,583 |
| 31,870 |
| +RatHei09/1 |
| 4677, 4679 |
| 4886, 4896, 4899, 4917 |
| 9401 |
| 16,480 |
| 19,731 |
| 31,573 |
| 35,003 |
| 40,781 |
| +Ge 1980 EP4 |
| 1204 |
| 10,615, 10,618 |
| 10,731, 10,747 |
| 14,442, 14,457, 14,460, 14,553, 14,574, 14,664, 14,667 |
| 15,071, 15,072, 15,076, 15,138, 15,161, 15,163, 15,171 |
| 19,409 |
| 25,381 |
| 30,528, 30,534, 30,547, 30,549, 30,556 |
| 32,797, 32,799 |
| 35,713, 35,758, 35,812 |
| 36,217 |
| 41,120 |
| +Ge 2002 MKY, |
| 19,510 |
| 47,392 |
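The kind of query summarised in the table above can be sketched as two steps: find alignment columns where a chosen group of sequences shares one nucleotide that every other sequence lacks, then group nearby positions onto one row. The function names, the toy alignment, and the `max_gap` threshold of 100 are hypothetical; the source does not state how close SNPs must be to share a line, and this is not the BBB implementation.

```python
def shared_snps(alignment, group):
    """Return 1-based columns where all sequences in `group` share a base
    that no sequence outside the group has. Gapped columns are skipped."""
    group = set(group)
    others = [name for name in alignment if name not in group]
    length = len(next(iter(alignment.values())))
    hits = []
    for col in range(length):
        inside = {alignment[name][col] for name in group}
        outside = {alignment[name][col] for name in others}
        if "-" in inside | outside:
            continue
        if len(inside) == 1 and not (inside & outside):
            hits.append(col + 1)
    return hits


def group_nearby(positions, max_gap=100):
    """Group sorted positions into rows; a new row starts when the distance
    to the previous position exceeds max_gap (an assumed threshold)."""
    rows = []
    for pos in positions:
        if rows and pos - rows[-1][-1] <= max_gap:
            rows[-1].append(pos)
        else:
            rows.append([pos])
    return rows


# Toy alignment with hypothetical names.
toy = {"a": "ACGTAC", "b": "ACGTAC", "c": "ATGTCC", "d": "ATGTGC"}
print(shared_snps(toy, ["a", "b"]))
print(group_nearby([4677, 4679, 4886, 4896, 4899, 4917, 9401]))
```

With `max_gap=100`, the second call splits the first positions listed for +RatHei09/1 into three rows, matching the row breaks shown in the table for those positions.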