Christopher Monit, Richard A Goldstein, Greg J Towers.
Abstract
BACKGROUND: Studying site-specific amino acid frequencies by eye can reveal biologically significant variability and lineage-specific adaptation. This so-called 'sequence gazing' often informs bioinformatics and experimental research. However, it is also important to account for the underlying phylogeny: similarities may be due to common descent rather than selection pressure, and founder effects must be distinguished from convergent evolution. We set out to combine phylogenetic and sequence data to produce evolutionarily insightful visualisations.
Keywords: Phylogenetics; Protein evolution; Visualisation
Year: 2019 PMID: 31615393 PMCID: PMC6792252 DOI: 10.1186/s12862-019-1518-9
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Fig. 1 ChromaClade example applications. a-c A dataset of 1331 HIV-1 group M capsid sequences containing representatives from all subtypes was downloaded from the Los Alamos HIV-1 sequence database [4] and aligned manually. The phylogeny was estimated from the nucleotide sequences using RAxML 8 [5] with substitution model GTR + Gamma and rooted using HIV-1 group O sequences as an outgroup (not shown). ChromaClade was used to annotate taxon labels with the residues found at capsid protein sites. a Site 1: proline is entirely conserved. b Site 92: alanine is mostly conserved in subtypes B, C and D, while proline is mostly conserved in the remaining subtypes. c Site 110: the wildtype threonine is found in most sequences, while the asparagine escape mutant has arisen multiple times independently. Prominent subtypes are indicated on the right. d-f A phylogeny was estimated as above for an aligned set of avian and pandemic human influenza virus PB2 gene sequences downloaded from the influenza virus resource [6] and mid-point rooted; the sampling years of the human pandemic sequences are shown on the right. Black circles indicate clades found in at least 700 of 1000 bootstrap replicates. ChromaClade was used to colour-annotate the taxon labels and branches according to the residues found at sites 627 (d), 591 (e) and 271 (f); branches where the ancestral state is unclear are coloured grey. These annotated trees were visualised using FigTree [1].
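The label annotation described in the caption can be illustrated with a minimal Python sketch. This is not ChromaClade's implementation: the function names, the colour palette, the label format and the toy alignment are all hypothetical, and the `[&!color=...]` tag is assumed to match the colour annotation that FigTree reads from NEXUS taxon labels.

```python
from collections import Counter

# Hypothetical palette: residue -> hex colour (not ChromaClade's actual scheme).
PALETTE = {"P": "#e41a1c", "A": "#377eb8", "T": "#4daf4a", "N": "#984ea3"}
DEFAULT_COLOUR = "#999999"  # grey for gaps or residues outside the palette


def residues_at_site(alignment, site):
    """Return {taxon: residue} for a 1-based alignment column."""
    return {taxon: seq[site - 1] for taxon, seq in alignment.items()}


def annotate_labels(alignment, site):
    """Return {taxon: label} with the residue appended and a colour tag.

    The tag syntax is an assumption about what the tree viewer accepts.
    """
    labels = {}
    for taxon, residue in residues_at_site(alignment, site).items():
        colour = PALETTE.get(residue, DEFAULT_COLOUR)
        labels[taxon] = f"{taxon}_{residue}[&!color={colour}]"
    return labels


def residue_counts(alignment, site):
    """Count how many sequences carry each residue at the given site."""
    return Counter(residues_at_site(alignment, site).values())


# Toy three-column alignment (made-up sequences, not data from the paper).
toy_alignment = {
    "seqA1": "PPT",
    "seqB1": "PAT",
    "seqC1": "PA-",
}

if __name__ == "__main__":
    for site in (1, 2, 3):
        print(site, dict(residue_counts(toy_alignment, site)))
        for label in annotate_labels(toy_alignment, site).values():
            print("  ", label)
```

The counts alone reproduce what 'sequence gazing' gives; attaching the residue and colour to each taxon label is what lets the same information be read against the tree, so that shared ancestry and repeated independent changes can be told apart.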