Thibaut Jombart, Frederick Archer, Klaus Schliep, Zhian Kamvar, Rebecca Harris, Emmanuel Paradis, Jérome Goudet, Hilmar Lapp.
Abstract
Genetic sequences of multiple genes are becoming increasingly common for a wide range of organisms including viruses, bacteria and eukaryotes. While such data may sometimes be treated as a single locus, in practice, a number of biological and statistical phenomena can lead to phylogenetic incongruence. In such cases, different loci should, at least as a preliminary step, be examined and analysed separately. The R software has become a popular platform for phylogenetics, with several packages implementing distance-based, parsimony and likelihood-based phylogenetic reconstruction, and an even greater number of packages implementing phylogenetic comparative methods. Unfortunately, basic data structures and tools for analysing multiple genes have so far been lacking, thereby limiting the potential for investigating phylogenetic incongruence. In this study, we introduce the new R package apex to fill this gap. apex implements new object classes, which extend existing standards for storing DNA and amino acid sequences, and provides a number of convenient tools for handling, visualizing and analysing these data. Here, we present the main features of the package and illustrate its functionalities through the analysis of a simple data set.
Keywords: R; genetics; package; phylogenies; software
Year: 2016 PMID: 27417145 PMCID: PMC5215480 DOI: 10.1111/1755-0998.12567
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
Content of multidna objects. The content of each slot can be accessed using ‘@[slot name]’, where ‘[slot name]’ is any of the values listed in the first column
| Slot name | Data stored | Description |
|---|---|---|
| | | A |
| | | A vector of labels for the individuals |
| | | The number of individuals in the data set |
| | | The total number of sequences, pooling all genes, and including gap‐only sequences |
| | | The total number of gap‐only sequences |
| | | A |
| | | A |
Slots whose content is NULL when empty.
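The slot descriptions above suggest a container that stores one alignment per gene alongside a few derived counts. The following is a minimal Python sketch of that idea, not the apex implementation: the class name `MultiGeneAlignment`, its field names and the toy sequences are hypothetical, and the padding of missing individuals with gap‐only sequences is an assumption made here so that the two sequence counts have something to report.

```python
from dataclasses import dataclass, field


@dataclass
class MultiGeneAlignment:
    """Hypothetical Python analogue of a multi-gene container (not the apex API)."""

    # gene name -> {individual label -> aligned sequence}
    genes: dict[str, dict[str, str]]
    labels: list[str] = field(init=False)

    def __post_init__(self) -> None:
        # Individuals are the union of the labels seen in any gene, sorted.
        self.labels = sorted({lab for aln in self.genes.values() for lab in aln})
        # Assumption: an individual missing from a gene receives a gap-only
        # sequence, so every gene ends up with the same set of individuals.
        for aln in self.genes.values():
            width = len(next(iter(aln.values())))
            for lab in self.labels:
                aln.setdefault(lab, "-" * width)

    @property
    def n_ind(self) -> int:
        """Number of individuals in the data set."""
        return len(self.labels)

    @property
    def n_seq(self) -> int:
        """Total number of sequences, pooling all genes, gap-only ones included."""
        return sum(len(aln) for aln in self.genes.values())

    @property
    def n_seq_miss(self) -> int:
        """Total number of gap-only sequences."""
        return sum(
            1
            for aln in self.genes.values()
            for seq in aln.values()
            if set(seq) == {"-"}
        )


# Toy data invented for illustration.
example = MultiGeneAlignment(
    genes={
        "geneA": {"ind1": "ACGT", "ind2": "ACGA"},
        "geneB": {"ind2": "TTG", "ind3": "TAG"},
    }
)
print(example.n_ind, example.n_seq, example.n_seq_miss)
```

Deriving the counts from the stored alignments, rather than storing them separately, keeps them consistent with the data in this sketch; whether the package does the same is not stated in the record.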
Content of multiphyDat objects. The content of each slot can be accessed using ‘@[slot name]’, where ‘[slot name]’ is any of the values listed in the first column
| Slot name | Data stored | Description |
|---|---|---|
| | | A |
| | | A character string indicating the type of sequences (e.g. DNA, protein) |
| | | A vector of labels for the individuals |
| | | The number of individuals in the data set |
| | | The total number of sequences, pooling all genes, and including gap‐only sequences |
| | | The total number of gap‐only sequences |
| | | A |
| | | A |
Slots whose content is NULL when empty.
Functions for importing and exporting data in apex
| Function | Input | Output | Notes |
|---|---|---|---|
| | Interleaved, sequential, clustal, fasta files | | Based on |
| | Fasta files | | Based on |
| | Interleaved, fasta | multiphyDat | Based on |
| | | | |
| | | | Only works for DNA sequences |
| | | | Extract either SNPs or haplotypes |
| | | | Extract either SNPs or haplotypes |
Base class for genetic markers in the adegenet package.
Figure 1. Individual and concatenated sequence alignments of the chickadees data. The first four panels plot a multidna object containing DNA alignments for four different loci (patr_poat43, 47, 48 and 49). The fifth panel displays the concatenated alignment.
Figure 2. Phylogenies of the chickadees data. The function getTree was used to obtain individual phylogenies for each locus (patr_poat43, 47, 48 and 49), and from the concatenated alignment (central phylogeny). Each phylogeny is an unrooted, neighbour‐joining tree based on Hamming distances between DNA sequences. Taxa are identified using colours to facilitate comparison of the trees.
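To make the distance step behind these trees concrete, the sketch below computes Hamming distances per locus and on a concatenated alignment. It is an illustration in Python rather than the package's R code: the helper names (`hamming`, `pairwise_hamming`, `concatenate_loci`) and the toy sequences are invented here, gaps are treated as ordinary characters, and the neighbour‐joining step itself is omitted.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Hamming distance for every unordered pair of taxa in one alignment."""
    return {
        (s, t): hamming(alignment[s], alignment[t])
        for s, t in combinations(sorted(alignment), 2)
    }


def concatenate_loci(loci: dict[str, dict[str, str]]) -> dict[str, str]:
    """Join per-locus sequences end to end; assumes every locus has the same taxa."""
    taxa = sorted(next(iter(loci.values())))
    return {t: "".join(loci[name][t] for name in loci) for t in taxa}


# Toy data invented for illustration (not the chickadees data set).
loci = {
    "locus1": {"a": "ACGT", "b": "ACGA", "c": "TCGA"},
    "locus2": {"a": "GGC", "b": "GTC", "c": "GGC"},
}

per_locus = {name: pairwise_hamming(aln) for name, aln in loci.items()}
pooled = pairwise_hamming(concatenate_loci(loci))

for name, dists in per_locus.items():
    print(name, dists)
print("concatenated", pooled)
```

In this toy case the two loci disagree: taxa `a` and `c` are the most distant pair at `locus1` but identical at `locus2`, while the concatenated alignment places all three pairs at the same distance. This is the kind of signal that pooling can hide and that separate per-locus analysis is meant to expose.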