Abstract
BACKGROUND: Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, a need has arisen for software capable of managing, annotating, and analyzing the large volume of diverse data accumulated in these projects.
Year: 2008 PMID: 18840282 PMCID: PMC2577119 DOI: 10.1186/1471-2105-9-420
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Main XplorSeq window. Screenshots of the XplorSeq main window. (A) Listing of imported sequences, contigs, and BLAST data. Import, export, data analysis, and data transformation options are presented in menus within the tool drawer adjacent to the main window. (B) Project-associated data fields. The Comments box provides space for recording details of the analysis.
Figure 2. Organization of basic XplorSeq data structures.
Figure 3. Clone inspector window. Summarizes information associated with a selected clone, a group of sequences (both individual reads and contigs) from the same amplified or cloned gene. The main window summarizes the top BLAST hit for the clone (i.e., the sequence or contig BLAST hit with the highest bit-score). The phylogenetic lineage of the clone can be assigned at the bottom of the main window. The drawer to the right of the window presents the contents of the meta-data dictionary associated with this clone.
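The "top BLAST hit" rule in this caption (the hit with the highest bit-score across all of a clone's reads and contigs) can be sketched in a few lines. This is an illustrative sketch only, not XplorSeq's code: the `BlastHit` record, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit (not XplorSeq's data model)."""
    query_name: str   # read or contig that produced the hit
    subject: str      # matched database entry
    bit_score: float


def top_hit_for_clone(hits: Iterable[BlastHit]) -> Optional[BlastHit]:
    """Return the hit with the highest bit-score, or None if there are no hits."""
    return max(hits, key=lambda h: h.bit_score, default=None)


# Made-up hits from two reads and one contig of the same clone.
clone_hits = [
    BlastHit("cloneA_read_F", "subject_1", 1450.0),
    BlastHit("cloneA_read_R", "subject_2", 1390.5),
    BlastHit("cloneA_contig", "subject_1", 2710.2),
]

best = top_hit_for_clone(clone_hits)
print(best.query_name, best.subject, best.bit_score)
```

Ties are resolved here by input order (the first maximal hit wins); the caption does not say how XplorSeq breaks ties.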
Figure 4. Sequence inspector window. Displays information associated with a selected sequence. Nucleotides are color-coded by quality score (the legend at the bottom of the window shows the meaning of the colors). Primers used to amplify the gene are summarized on the lower left. Basic summary statistics (length, trimmed length, and number of nucleotides with quality > Q20) are presented in the lower right. The drawer to the right of the window presents the contents of the meta-data dictionary associated with this sequence.
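The three summary statistics named in this caption are simple to compute once per-base quality scores and trim coordinates are known. The sketch below assumes the trim boundaries are supplied by the caller as a half-open interval; how XplorSeq derives them from quality scores and primers is not described here, and the function and parameter names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def summarize_read(
    sequence: str,
    quals: Sequence[int],
    trim_start: int = 0,
    trim_end: Optional[int] = None,
    threshold: int = 20,
) -> Dict[str, int]:
    """Length, trimmed length, and count of bases with quality above threshold.

    trim_start/trim_end form a half-open interval [trim_start, trim_end);
    they are assumed inputs, not computed by this sketch.
    """
    if len(sequence) != len(quals):
        raise ValueError("sequence and quality arrays differ in length")
    if trim_end is None:
        trim_end = len(sequence)
    return {
        "length": len(sequence),
        "trimmed_length": max(0, trim_end - trim_start),
        # Strictly greater than the threshold, matching "> Q20".
        "n_above_threshold": sum(1 for q in quals if q > threshold),
    }


stats = summarize_read(
    "ACGTACGTAC",
    [8, 12, 25, 30, 40, 35, 21, 20, 15, 9],
    trim_start=2,
    trim_end=7,
)
print(stats)
```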
Figure 5. Metadata editors. (A) Editing all meta-data keys in a project. (B) Editing the values associated with keys for a particular sequence object.
Summary of XplorSeq functionality
| Chromatogram... | Import DNA chromatograms (.esd, .scf, .abi, etc.) (phred) |
| PHD... | Import DNA sequences in Phd format |
| Contig... | Import DNA sequences and quality scores in FastA format |
| Blast... | Parse Blast records |
| FastA... | Import DNA sequences in FastA format |
| XplorSeq Library... | Import XplorSeq document |
| Phylogenetic Lineage... | Import phylogenetic lineage information from entrez |
| Metadata... | Import metadata in key-value format |
| Sequences... | Export DNA sequences in variety of formats |
| FastA + Qual... | Export DNA sequences and quality scores |
| Blast Info... | Export summary of Blast records |
| Cluster Table... | Enumerate OTUs belonging to groups of sequences |
| OTU Diversity... | Calculate OTU richness for set of sequences |
| Quality Scores... | Export summary of quality scores |
| Blast Accession #'s... | Export accession numbers of top Blast hits |
| Sequin Script... | Export data in format for Genbank submission (sequin) |
| Blast Database... | Create a Blast database (formatdb) |
| XML File... | Export data in XML format |
| Metadata... | Summarize and export metadata |
| Placeholder Tree... | List selected sequences in Newick format |
| Basecall->Blast... | Pipe data from chromatogram through Blast analysis |
| Contig->Blast... | Pipe data from contig assembly to Blast analysis |
| Basecall... | Perform base calling (phred or ttuner) |
| Contig... | Perform contig assembly (phrap or TIGR_Assembler) |
| Blast NCBI... | Blast query of Genbank |
| Blast Local... | Blast query of local blast database |
| Get Entrez Lineage Info. | Download entrez phylogenetic lineage information (idfetch) |
| Align... | Perform multiple sequence alignment (clustal) |
| Biodiversity (biodiv)... | Calculate biodiversity indices with random resampling (biodiv) |
| XplorSeq Doc Difference... | Generate differences between two XplorSeq documents |
| Edit Sequence Names... | Alter names of sequences |
| Edit Lineage Names... | Edit phylogenetic lineage information |
| Edit Metadata... | Edit metadata associated with sequence |
| Edit Metadata Keys... | Edit all metadata keys in document |
| Group... | Group sequences and contigs |
| UnGroup... | Ungroup sequences and contigs |
| Clean... | Delete blast information, contigs |
| Sort... | Sort records in document |
| Set Oligos... | Associate primer sequences with sequence objects |
| Trim... | Trim sequences based on quality score and primer |
| UnTrim. | Remove trimming information |
| Rev.-Complement | Reverse complement sequence |
| DNA -> RNA | Convert DNA sequence to RNA sequence |
| RNA -> DNA | Convert RNA sequence to DNA sequence |
| UPPER CASE | Convert sequence to upper case |
| lower case | Convert sequence to lower case |
| OTU clustering | Cluster Operational Taxonomic Units (sortx) |
| Clearcut NJ Tree... | Fast neighbor joining trees (clearcut) |
| Phylip distance matrix... | Calculate distance matrix (dnadist) |
| Phylip NJ Tree... | Calculate Neighbor joining or UPGMA trees (neighbor) |
| Phylip seqboot... | Generate bootstrap replicates of alignment (seqboot) |
| Phylip consense... | Generate consensus of multiple trees (consense) |
| RAxML... | Generate Maximum Likelihood tree (raxmlHPC) |
¹ Command line executables are listed in parentheses.
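The sequence transformations listed in the table (Rev.-Complement, DNA -> RNA, RNA -> DNA) are standard string operations. As a minimal sketch, and not XplorSeq's own code, they might look like this in Python; the function names are hypothetical, and only unambiguous bases (A, C, G, T/U) are remapped, so IUPAC ambiguity codes pass through unchanged.

```python
# Translation tables that preserve letter case.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_DNA_TO_RNA = str.maketrans("Tt", "Uu")
_RNA_TO_DNA = str.maketrans("Uu", "Tt")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def dna_to_rna(dna: str) -> str:
    """Replace thymine with uracil."""
    return dna.translate(_DNA_TO_RNA)


def rna_to_dna(rna: str) -> str:
    """Replace uracil with thymine."""
    return rna.translate(_RNA_TO_DNA)


print(reverse_complement("ATGCCG"))  # CGGCAT
print(dna_to_rna("ATGCCG"))          # AUGCCG
print(rna_to_dna("AUGCCG"))          # ATGCCG
```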
Figure 6. Analyses of aligned sequences. XplorSeq provides GUI-based access to several command-line programs used for phylogenetic analysis of multiple sequence alignments, including (A-D) several commonly used programs from the phylip package [37]; (E) RAxML, for maximum-likelihood phylogenetic inference; (F) sortx, for rapid clustering of sequences into OTUs; and (G) biodiv, for estimation of biodiversity indices through resampling statistics.
Figure 7. Tabulation of sequence abundance/prevalence. The Summary Table dialog provides multiple means of tabulating sequence data. In this example, rows are defined by the values of a meta-data key (e.g. 97% OTUs), while columns are defined by the value associated with another meta-data key (e.g. PCR results). The "Data Format" panel specifies that sequence counts in each column are normalized to the total of each column.
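The tabulation described in this caption is a cross-tabulation of two meta-data keys, with each count divided by its column total. The sketch below illustrates that idea under stated assumptions: each sequence is represented as a plain dictionary of meta-data, and the key names `OTU_97` and `PCR` are hypothetical stand-ins for the keys in the example. It is not XplorSeq's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def summary_table(
    records: Iterable[Mapping[str, str]],
    row_key: str,
    col_key: str,
    normalize_columns: bool = True,
) -> Dict[str, Dict[str, float]]:
    """Cross-tabulate records by two meta-data keys.

    With normalize_columns=True, each cell is the fraction of its
    column's total; otherwise cells hold raw sequence counts.
    Records lacking either key are skipped.
    """
    counts = defaultdict(Counter)  # counts[row_value][col_value]
    col_totals = Counter()
    for meta in records:
        if row_key not in meta or col_key not in meta:
            continue
        counts[meta[row_key]][meta[col_key]] += 1
        col_totals[meta[col_key]] += 1

    columns = sorted(col_totals)
    table = {}
    for row in sorted(counts):
        table[row] = {
            col: (counts[row][col] / col_totals[col]
                  if normalize_columns else counts[row][col])
            for col in columns
        }
    return table


# Made-up meta-data for six sequences.
records = [
    {"OTU_97": "OTU1", "PCR": "pos"},
    {"OTU_97": "OTU1", "PCR": "pos"},
    {"OTU_97": "OTU1", "PCR": "pos"},
    {"OTU_97": "OTU2", "PCR": "pos"},
    {"OTU_97": "OTU2", "PCR": "neg"},
    {"OTU_97": "OTU2", "PCR": "neg"},
]

table = summary_table(records, row_key="OTU_97", col_key="PCR")
for row, cells in table.items():
    print(row, cells)
```

Each column of the normalized table sums to 1, so columns with different sequencing depth can be compared directly.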
Figure 8. Sequin script export. Scripted export of sequence data for GenBank submission through Sequin. Data associated with each sequence can be manipulated in order to tailor the level of detail that will go into Sequin.
Figure 9. Input of placeholder tree into ARB. Sequences were exported from XplorSeq using the "Placeholder Tree" option. Sequences were split into two categories on the basis of associated meta-data (in this example, results of a PCR screen). Here, the user has selected all taxa belonging to the "neg" group. Because ARB propagates taxa markings between trees, placeholder trees can be used to graphically organize and manipulate groups of sequences that aren't necessarily related.
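A placeholder tree of this kind carries no phylogenetic signal: it only groups taxa so that they can be selected together. One way to write such a tree in Newick format is to place each meta-data group in its own unresolved clade under a common root. The sketch below assumes that layout; the exact tree shape XplorSeq emits is not specified here, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Mapping

# Labels made only of these characters can be written unquoted.
_SAFE_LABEL = re.compile(r"^[A-Za-z0-9_.\-]+$")


def newick_label(name: str) -> str:
    """Quote a label if it contains characters with meaning in Newick."""
    if _SAFE_LABEL.match(name):
        return name
    return "'" + name.replace("'", "''") + "'"


def placeholder_tree(seq_to_group: Mapping[str, str]) -> str:
    """Build a Newick string with one unresolved clade per group.

    Branch lengths are omitted because the tree encodes grouping only.
    """
    groups = defaultdict(list)
    for seq_name, group in seq_to_group.items():
        groups[group].append(seq_name)

    clades = []
    for group in sorted(groups):
        leaves = ",".join(newick_label(s) for s in sorted(groups[group]))
        clades.append(f"({leaves}){newick_label(group)}")
    return "(" + ",".join(clades) + ");"


# Made-up PCR screen results for four sequences.
pcr_results = {"seq1": "neg", "seq2": "pos", "seq3": "neg", "seq4": "pos"}
print(placeholder_tree(pcr_results))  # ((seq1,seq3)neg,(seq2,seq4)pos);
```

Labeling the internal nodes with the group name is a design choice of this sketch; a tree viewer that ignores internal labels would still show the grouping through the clade structure.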
Execution times of commonly used software: comparison of XplorSeq with command line implementation
| Program | System A²: XplorSeq⁴ time (sec.)¹ | System A²: Command Line time (sec.)¹ | System B³: XplorSeq⁴ time (sec.)¹ | System B³: Command Line time (sec.)¹ | Task | Sequence Data |
| phred | 50.5 (5.3) | 51.0 (2.9) | 23.3 (0.8) | 19.3 (0.5) | Basecall | 768 .esd files |
| phrap | 167.0 (2.4) | 153.3 (0.8) | 30.5 (1.2) | 28.8 (1.5) | Contig | 384 pairs of reads |
| blastall | 368.0 (2.4) | 345.3 (0.8) | 228.5 (1.2) | 221.3 (2.4) | Local blast | 24 1585-mers |
| XplorSeq | 69.0 (8.1) | na | 21.2 (0.4) | na | Import fasta | 250,000 1585-mers |
| XplorSeq | 130.8 (10.8) | na | 12.2 (0.4) | na | Open XplorSeq file | 250,000 1585-mers |
| XplorSeq | 160.8 (14.4) | na | 11.7 (0.5) | na | Save XplorSeq file | 250,000 1585-mers |
| XplorSeq | 60.5 (0.8) | na | 46.3 (1.4) | na | Import fasta | 1,000,000 25-mers |
| XplorSeq | 216.3 (11.2) | na | 28.0 (0.6) | na | Open XplorSeq file | 1,000,000 25-mers |
| XplorSeq | 34.3 (3.8) | na | 16.5 (0.5) | na | Save XplorSeq file | 1,000,000 25-mers |
¹ Elapsed time of execution: mean (standard deviation), in seconds.
² Executed on a 2 GHz Intel Core Duo MacBook Pro with 1 GB 667 MHz DDR2 SDRAM, running Mac OS X version 10.5.4.
³ Executed on a workstation with 2 × 3 GHz Quad-Core Intel Xeon processors and 8 GB 800 MHz DDR2 FB-DIMM, running Mac OS X Server version 10.5.4.
⁴ Elapsed time includes export and import of data.
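To compare the XplorSeq and command-line columns, one simple measure is the relative difference in mean elapsed time. The sketch below computes it from the mean values in the table above; the `overhead_percent` helper is hypothetical, and the calculation ignores the reported standard deviations, so small differences should not be over-interpreted.

```python
from typing import Dict, Tuple

# Mean elapsed times in seconds, copied from the table:
# (program, system) -> (XplorSeq, command line)
TIMINGS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("phred", "A"): (50.5, 51.0),
    ("phred", "B"): (23.3, 19.3),
    ("phrap", "A"): (167.0, 153.3),
    ("phrap", "B"): (30.5, 28.8),
    ("blastall", "A"): (368.0, 345.3),
    ("blastall", "B"): (228.5, 221.3),
}


def overhead_percent(xplorseq_s: float, command_line_s: float) -> float:
    """Relative difference of the XplorSeq time versus the command line, in percent."""
    if command_line_s <= 0:
        raise ValueError("command-line time must be positive")
    return 100.0 * (xplorseq_s - command_line_s) / command_line_s


for (program, system), (gui_s, cli_s) in TIMINGS.items():
    print(f"{program:9s} System {system}: {overhead_percent(gui_s, cli_s):+.1f}%")
```

A positive value means the run through XplorSeq took longer on average, which is expected given that its elapsed time includes export and import of data.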