Abhirami Ratnakumar, Sean McWilliam, Wesley Barris, Brian P Dalrymple.
Abstract
BACKGROUND: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de novo, using assembled genomes of closely related organisms as a framework can considerably aid the process. However, the preferred search programs and parameters that will optimise the sensitivity and specificity of the alignments between the sequence reads and the framework genome(s) are not necessarily obvious. Here we demonstrate a process that uses paired-end sequence reads to choose an optimal program and alignment parameters.
Year: 2010 PMID: 20678236 PMCID: PMC3091654 DOI: 10.1186/1471-2164-11-458
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Programs and parameters used
| ID | Program | Word size¹ | Parameters² |
|---|---|---|---|
| MBc16 | MegaBlast | 16 | -r 1 -q -1 -X 40 -W 16 |
| MBc12 | MegaBlast | 12 | -r 1 -q -1 -X 40 -W 12 |
| MBc11 | MegaBlast | 11 | -r 1 -q -1 -X 40 -W 11 |
| MBc9 | MegaBlast | 9 | -r 1 -q -1 -X 40 -W 9 |
| MBc8 | MegaBlast | 8 | -r 1 -q -1 -X 40 -W 8 |
| MBd21 | MegaBlast | 11/21 | -t 21 -W 11 -q -3 -r 2 -G 5 -E 2 -N 2 |
| MBd18 | MegaBlast | 11/18 | -t 18 -W 11 -q -3 -r 2 -G 5 -E 2 -N 2 |
| MBd16 | MegaBlast | 11/16 | -t 16 -W 11 -q -3 -r 2 -G 5 -E 2 -N 2 |
| BZd1 | Blastz | 12/19 | K = 4500, L = 4500, M = 50 |
| BZd2 | Blastz | 12/19 | K = 2200, L = 2200, M = 50 |
| BZc1 | Blastz | 8 | K = 2500, L = 2500, M = 50, T = 0 |
| PH | PatternHunter | 11/18 | -db 0 -mi -mj -b 2 -N 1 |
¹ For discontiguous searches, the number of matches is listed first, followed by the length of the match seed.
² All MegaBlast searches were run with -D 3 -U T -F 'm D'. No score filter was set, to allow for subsequent filtering of matches based on scores.
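As a rough illustration of how the MegaBlast rows combine with the shared options in the second footnote, the sketch below assembles an argument list for each run ID. The option strings are copied from the table; the function name `megablast_args`, the `megablast` executable name, and the `extra` parameter are assumptions made for this example. Database, query and output options are deliberately left out because they are not given here.

```python
import shlex

# Per-run MegaBlast options, copied from the table above.
MEGABLAST_RUNS = {
    "MBc16": "-r 1 -q -1 -X 40 -W 16",
    "MBc12": "-r 1 -q -1 -X 40 -W 12",
    "MBc11": "-r 1 -q -1 -X 40 -W 11",
    "MBc9": "-r 1 -q -1 -X 40 -W 9",
    "MBc8": "-r 1 -q -1 -X 40 -W 8",
    "MBd21": "-t 21 -W 11 -q -3 -r 2 -G 5 -E 2 -N 2",
    "MBd18": "-t 18 -W 11 -q -3 -r 2 -G 5 -E 2 -N 2",
    "MBd16": "-t 16 -W 11 -q -3 -r 2 -G 5 -E 2 -N 2",
}

# Options shared by every MegaBlast run (second table footnote).
COMMON_OPTIONS = "-D 3 -U T -F 'm D'"


def megablast_args(run_id, extra=()):
    """Build the argument list for one run.

    `extra` is a hypothetical hook for database/query/output options,
    which are not specified in the table.
    """
    return [
        "megablast",  # assumed executable name
        *shlex.split(MEGABLAST_RUNS[run_id]),
        *shlex.split(COMMON_OPTIONS),
        *extra,
    ]


for run_id in MEGABLAST_RUNS:
    print(run_id, " ".join(shlex.quote(arg) for arg in megablast_args(run_id)))
```

Keeping the per-run and shared options in separate strings mirrors the table layout, so a change to the shared footnote options only has to be made once.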
Figure 1. Percentage of total ovine BESs positioned vs. percentage of total ovine BESs in tail-to-tail pairs. A) Full-length ovine BESs. B) Ovine BESs trimmed to 240 bases in length. In A), the observed results from running MegaBLAST with a contiguous word size of 8 (MBc8) are shown as blue triangles, and those from running MegaBLAST with a contiguous word size of 11 (MBc11) are shown as blue squares. The transition of the MegaBLAST curves in both A) and B) from right to left shows the effect of increasing the score cut-off from no score cut-off (rightmost symbol) to score cut-offs starting at 40 and increasing in increments of 5 up to 100 (leftmost symbol).
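The score cut-off sweep described in the caption can be sketched as follows. This is a minimal illustration only: it assumes one score per hit and treats the cut-off as inclusive, and the function names and toy scores are hypothetical rather than taken from the study. It reports the fraction of hits retained at each cut-off, not the percentage of total BESs plotted in the figure.

```python
def score_cutoffs(start=40, stop=100, step=5):
    """Cut-offs along each curve: None (no cut-off), then 40, 45, ..., 100."""
    return [None] + list(range(start, stop + 1, step))


def fraction_retained(scores, cutoff):
    """Fraction of hits scoring at or above the cut-off (all hits if None)."""
    if not scores:
        return 0.0
    if cutoff is None:
        return 1.0
    return sum(score >= cutoff for score in scores) / len(scores)


# Toy scores for illustration only (not data from the study).
toy_scores = [12, 38, 44, 57, 63, 90, 120]
for cutoff in score_cutoffs():
    print(cutoff, round(fraction_retained(toy_scores, cutoff), 2))
```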
Effect of match reward and mismatch penalty on specificity and sensitivity of contiguous and discontiguous MegaBLAST searches
| | % total BESs positioned | % total BESs in tail-to-tail BACs | % BESs positioned in tail-to-tail BACs | Match reward, -r | Mismatch penalty, -q |
|---|---|---|---|---|---|
| Contiguous¹ | 76.78% | 32.36% | 42.15% | 1 | -1 |
| | 77.57% | 31.91% | 41.14% | 2 | -3 |
| Discontiguous² | 54.92% | 21.94% | 39.95% | 1 | -1 |
| | 51.17% | 22.64% | 44.24% | 2 | -3 |
¹ Word size 11, with no score filtering.
² Word size of 11 with a match length of 21 using a discontiguous seed, with no score filtering.
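The three percentage columns in this table appear to be related by simple arithmetic: dividing the percentage of total BESs in tail-to-tail BACs by the percentage of total BESs positioned reproduces the third percentage in every row. The sketch below checks this. The relationship is inferred from the tabulated numbers rather than stated in the text, and the function name is hypothetical.

```python
def pct_positioned_in_pairs(pct_total_in_pairs, pct_total_positioned):
    """Share of positioned BESs that fall in tail-to-tail BACs, in percent."""
    return round(100 * pct_total_in_pairs / pct_total_positioned, 2)


# (label, % total positioned, % total in tail-to-tail BACs, reported %)
rows = [
    ("contiguous, -r 1 -q -1", 76.78, 32.36, 42.15),
    ("contiguous, -r 2 -q -3", 77.57, 31.91, 41.14),
    ("discontiguous, -r 1 -q -1", 54.92, 21.94, 39.95),
    ("discontiguous, -r 2 -q -3", 51.17, 22.64, 44.24),
]

for label, positioned, in_pairs, reported in rows:
    print(label, pct_positioned_in_pairs(in_pairs, positioned), reported)
```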
Calculation of true and false positive and false negative rates from search results.
| Score cut-off¹ | BESs with positions | BESs in tail-to-tail BACs | Predicted BESs in tail-to-tail BACs² | FP³ | FN⁴ |
|---|---|---|---|---|---|
| 8 | 292,916 | 130,358 | 230,616 | 0.43 | 0.21 |
| 40 | 283,315 | 130,236 | 215,746 | 0.40 | 0.24 |
| 45 | 263,060 | 129,806 | 186,000 | 0.30 | 0.30 |
| 50 | 245,983 | 128,826 | 162,636 | 0.21 | 0.34 |
| 55 | 236,647 | 126,698 | 150,524 | 0.16 | 0.36 |
| 60 | 231,047 | 125,670 | 143,484 | 0.12 | 0.38 |
| 65 | 225,201 | 123,140 | 136,314 | 0.10 | 0.40 |
| 70 | 221,398 | 121,134 | 131,750 | 0.08 | 0.40 |
| 100 | 200,409 | 106,368 | 107,954 | 0.01 | 0.46 |
¹ Results from the MegaBLAST search on full-length BESs using a contiguous word size of 8. The default score cut-off is 8, so the row with a cut-off of 8 corresponds to the case where no score cut-off is specified.
² For each score cut-off, the number of BESs predicted to be in tail-to-tail BACs, assuming all BESs were correctly positioned.
³ False positive rate.
⁴ False negative rate.
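The false positive rates in this table are numerically consistent with one minus the ratio of the observed tail-to-tail count (third column) to the predicted count described in the second footnote (fourth column). The sketch below reproduces that column. The formula is inferred from the tabulated values and is not stated in the text shown here; the false negative rate depends on quantities not given in the table and is not reconstructed.

```python
def false_positive_rate(observed_in_pairs, predicted_in_pairs):
    """1 - observed/predicted, rounded to two decimals as in the table."""
    return round(1 - observed_in_pairs / predicted_in_pairs, 2)


# (score cut-off, observed in tail-to-tail BACs, predicted, reported FP)
table_rows = [
    (8, 130_358, 230_616, 0.43),
    (40, 130_236, 215_746, 0.40),
    (45, 129_806, 186_000, 0.30),
    (50, 128_826, 162_636, 0.21),
    (55, 126_698, 150_524, 0.16),
    (60, 125_670, 143_484, 0.12),
    (65, 123_140, 136_314, 0.10),
    (70, 121_134, 131_750, 0.08),
    (100, 106_368, 107_954, 0.01),
]

for cutoff, observed, predicted, reported in table_rows:
    print(cutoff, false_positive_rate(observed, predicted), reported)
```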
Figure 2. Distribution of MegaBLAST scores for trimmed ovine BESs vs. the equine and bovine genome assemblies.