Chunguang Liang, Alexander Schmid, María José López-Sánchez, Andres Moya, Roy Gross, Jörg Bernhardt, Thomas Dandekar.
Abstract
BACKGROUND: ESTs or variable sequence reads can be available in prokaryotic studies well before a complete genome is known. Use cases include (i) transcriptome studies and (ii) single-cell sequencing of bacteria. Without suitable software, their further analysis and mapping would have to await finalization of the corresponding genome.
Year: 2009 PMID: 19943962 PMCID: PMC2789075 DOI: 10.1186/1471-2105-10-391
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Strategy of JANE. 1) HSP regions (high-scoring segment pairs) are collected by applying the BLAST algorithm. Parameters were optimized to detect alignments with lower sequence similarity. 2) HSP fragments are processed using JANE's algorithm; a quality filter is applied to increase accuracy. 3) The top-scoring HSP for each read is used as an anchor. Next, further HSPs are selectively considered (closest first) if they are not too distant (criterion: remaining length of the EST/sequence read to be mapped). 4) The algorithm assembles the reads and calculates potential coverage. Two reads sharing overlapping regions are consecutively merged, forming a contig. 5) A predicted genome is generated by replacing more and more parts of the template genome with mapped and assembled sequences. 6) Modules in the JANE toolkit predict functional information for individual ESTs or sequence reads.
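Steps 3 and 4 of this strategy can be illustrated with a minimal Python sketch. It is not JANE's actual implementation: the `HSP` class, the `gap` distance measure, the way the "remaining length" budget is decremented, and the half-open template coordinates are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HSP:
    """Hypothetical HSP record: half-open [start, end) on the template."""
    start: int
    end: int
    score: float


def gap(a: HSP, b: HSP) -> int:
    """Template distance between two HSPs; 0 if they touch or overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def select_hsps(hsps: List[HSP], read_length: int) -> List[HSP]:
    """Step 3 (assumed reading): anchor on the top-scoring HSP, then add
    further HSPs, closest first, while they lie within the part of the
    read that is still unmapped."""
    anchor = max(hsps, key=lambda h: h.score)
    chosen = [anchor]
    remaining = read_length - (anchor.end - anchor.start)
    others = sorted((h for h in hsps if h is not anchor),
                    key=lambda h: gap(anchor, h))
    for h in others:
        if remaining <= 0:
            break
        if gap(anchor, h) <= remaining:
            chosen.append(h)
            remaining -= h.end - h.start
    return chosen


def merge_into_contigs(mapped: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Step 4: merge mapped reads whose template intervals overlap."""
    contigs: List[Tuple[int, int]] = []
    for start, end in sorted(mapped):
        if contigs and start < contigs[-1][1]:
            # Overlaps the current contig, so extend it.
            contigs[-1] = (contigs[-1][0], max(contigs[-1][1], end))
        else:
            contigs.append((start, end))
    return contigs


def coverage(contigs: List[Tuple[int, int]], template_length: int) -> float:
    """Fraction of the template covered by the merged contigs."""
    return sum(end - start for start, end in contigs) / template_length


# Toy data (invented for this example, not taken from the paper).
hsps = [HSP(1000, 1250, 180.0), HSP(1260, 1350, 60.0), HSP(9000, 9100, 55.0)]
selected = select_hsps(hsps, read_length=400)

reads = [(100, 400), (350, 700), (1200, 1500), (690, 900)]
contigs = merge_into_contigs(reads)
print(selected)
print(contigs, coverage(contigs, 2000))
```

In the toy data the distant third HSP is rejected because its distance from the anchor exceeds the unmapped remainder of the read, and the four reads collapse into two contigs covering 55% of a 2000-bp template.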
Figure 2. Mapping of . In the example, JANE maps a high fraction of ESTs to moderately related genomes (at least 80% rRNA identity/60% household enzyme similarity) with about 80% accuracy. The numbers on the left scale indicate the location in the genome template (in kilobases). All mapped ESTs are listed. Arrows in different colours (right) mark the consensus regions of mapping: red indicates the forward strand and green the reverse strand. A statistical report analyzes the mapping result (right corner). The inserts at the bottom show that clicking on an individual EST opens a detailed analysis in which the different HSPs used and their positions appear. This also includes a list of all available HSPs as well as the best EST match chosen by JANE. Further analysis of all ESTs in that region, contig prediction, coding sequence and function prediction is also provided (see text). More details on the program options and an actual screenshot are given in Fig. S1.
Figure 3. Mapping of . Using a related template genome, sequence reads or pre-assembled contigs are mapped rapidly. As in Fig. 2, the numbers on the left scale indicate the location in the genome template (in kilobases), and arrows in different colours mark the consensus regions of mapping: red indicates the forward strand and green the reverse strand. All mapped sequences are listed. Together, after mapping, they represent the major part of the JDK6008 genome.
Benchmark tests of different alignment software.
| Application | RMAP | JANE | |||||
|---|---|---|---|---|---|---|---|
| Long reads (contigs)1 | |||||||
| Shortest reads (40 bp)4 | Running time (s) | 0.4 | 7.4 | 1.6 | 2.4 | 7.9 | 3.7 |
| | No. of mapped reads | 310 | 310 | 310 | 310 | 310 | 310 |
We indicate the challenge in mapping and the distance of the template genome used for mapping. Bold: moderately similar template genome; italics: closely related template genome; normal: cognate template genome.
1Long reads: 128 from a library of Staphylococcus aureus contigs (minimum length 627 bp). Times are given in seconds. Accuracy of mapping was determined as given in materials and methods.
2Variable ESTs: 310 from a library of Blattabacteria reads with a minimum length of 19 bp.
3EST fragments trimmed to a fixed length of 40 bp.
4Short reads: 310 artificial fragments of 40 nucleotide length generated from the Blattabacteria genome sequence.
5Maq is not suitable for long reads such as ESTs of variable length (see Additional File 1); it is customized for the Illumina-Solexa genome analyzer, with a sequence limit of 63 bp.
6n.a.: not applicable for mapping long reads because of the length limit of SOAP (specialized for very short reads of 20-40 bp).
7The alignment procedure of Bowtie is remarkably faster than that of other software; however, for a fair comparison, the time for index building must still be included.
8SeqMap did not find one entry when aligning short reads, but this is an atypical case with long repeats. When we repeated the test with this read replaced, 100% of the reads were mapped.
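The artificial short reads of footnote 4 (310 fragments of 40 nucleotides drawn from a genome sequence) could be produced along the following lines. This is only a sketch: the paper does not state how the fragment positions were chosen, so uniform random sampling, the fixed seed, the function name `sample_fragments` and the toy genome string are all assumptions.

```python
import random


def sample_fragments(genome: str, n: int = 310, length: int = 40,
                     seed: int = 0) -> list:
    """Draw n fixed-length fragments from uniformly random start positions.

    Uniform sampling and the seed are assumptions; the source only gives
    the number of fragments (310) and their length (40 nt).
    """
    if len(genome) < length:
        raise ValueError("genome shorter than requested fragment length")
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


# Toy stand-in for a genome sequence (random bases, not real data).
toy_rng = random.Random(42)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(5000))

fragments = sample_fragments(toy_genome)
print(len(fragments), len(fragments[0]))
```

Because every fragment is an exact substring of the template, such a read set isolates raw mapping speed from the difficulty of aligning divergent sequences.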
Benchmark test on mapping Solexa reads
| Application | Mapping reads* to chromosome 12 contig | Mapping reads* to chromosome 21 contig |
|---|---|---|
| RMAP | 27 (2.7%) | 10 (1.0%) |
| JANE | 37 (3.7%) | 18 (1.8%) |
| SeqMap | 26 (2.6%) | 7 (0.7%) |
| SOAP | 37 (3.7%) | 11 (2.0%) |
| Bowtie | 37 (3.7%) | 11 (2.0%) |
* The total number of reads is 1000. We eliminated incomplete or ambiguous reads to ensure that all programs ran smoothly across the benchmark test. Both templates are Homo sapiens chromosome fragments within an acceptable length range for all of the above applications (Solexa reads of 36 bp).
Benchmark test on single-cell genome mapping1.
| Application | Running time (s) | |
|---|---|---|
| | 0.7 | 3 (0.65%) |
| | 3.3 | 0 |
| | 7.1 | 103 (22.3%) |
| | 3980 | n.a.3 |
1Example: pori bacteria sequence reads are mapped to the template genome Rhodopirellula baltica SH1. These are typical long reads (300 bp and more).
2Accuracy of mapping (fingerprint test) was determined as given in materials and methods.
3Exonerate assumes a eukaryotic genome with splicing. It introduces a splicing event if the matching sequence stretch is short, and then assumes an intron (several thousand base pairs) until the next local match. Thus, in the tough case above, all mapped ESTs (462 of 462) were split by Exonerate into several pieces and only short regions were aligned. Longer stretches, where present, were mapped to the same region as by JANE but with a shorter alignment length.