Nathan M Romine, Richard J Martin, Jeffrey K Beetham.
Abstract
BACKGROUND: Gene identification and sequence determination are critical requirements for many biological, genomic, and bioinformatic studies. With the advent of next-generation sequencing (NGS) technologies, such determinations are predominantly accomplished in silico for organisms whose genome is known or for which substantial gene sequence information exists. Without detailed genomic or gene information, in silico sequence determination is not straightforward, and full coding sequence determination typically involves time- and labor-intensive PCR-based amplification and cloning methods.
Year: 2013 PMID: 23773280 PMCID: PMC3689052 DOI: 10.1186/1471-2156-14-55
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Figure 1. Determination of gene sequences. (1) Individual data sets were assembled into contigs using Velvet. (2) BLAST searches for genes of the nAChR pathway were carried out with a stringent cutoff (expect value = 1E-10) to identify contigs highly similar to the target genes. (3) Reads were individually mapped (using Bowtie) to the high-similarity contigs. (4) All paired reads for which at least one read mapped to a contig (in Step 3) were identified and binned using a custom Java program. (5) De novo assembly of the Step 4 sequences was performed using Velvet. (6) Steps 3-5 were iterated until an iteration resulted in no additional reads being mapped to the contig of interest.
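The binning and iteration logic of Steps 3-6 can be sketched in a few lines. This is a minimal illustration in Python, not the authors' custom Java program: `map_reads` and `assemble` are hypothetical callables standing in for the Bowtie and Velvet runs, reads are represented only by their IDs, and the choice to accumulate mapped reads across rounds is an assumption.

```python
from typing import Callable, Iterable, List, Set, Tuple

ReadPair = Tuple[str, str]  # (mate-1 read ID, mate-2 read ID)


def bin_pairs(pairs: Iterable[ReadPair], mapped_ids: Set[str]) -> List[ReadPair]:
    """Step 4: keep every pair in which at least one mate mapped."""
    return [p for p in pairs if p[0] in mapped_ids or p[1] in mapped_ids]


def iterate_assembly(
    contigs,
    pairs: List[ReadPair],
    map_reads: Callable[[object], Iterable[str]],   # hypothetical stand-in for Bowtie
    assemble: Callable[[List[ReadPair]], object],   # hypothetical stand-in for Velvet
    max_rounds: int = 50,                           # safety cap, an assumption
):
    """Repeat Steps 3-5 until a round maps no additional reads (Step 6)."""
    seen: Set[str] = set()
    for _ in range(max_rounds):
        mapped = set(map_reads(contigs))      # Step 3: map reads to contigs
        if mapped <= seen:                    # Step 6: nothing new, stop
            break
        seen |= mapped
        binned = bin_pairs(pairs, seen)       # Step 4: bin pairs with a mapped mate
        contigs = assemble(binned)            # Step 5: de novo assembly of the bin
    return contigs, seen
```

Because a pair is retained when only one mate maps, the unmapped mate can extend the contig beyond its current ends in the next assembly, which is what allows each round to recruit further reads.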
Table 1. Drug-target genes and high-similarity initial-assembly contigs at multiple k-mers
| ID | Acc # | CDS len | # | Mean | R | # | Mean | R | # | Mean | R | # | Mean | R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lev-1 | CAB03148 | 1419 | 3 | 148 | 99-219 | 3 | 238 | 174-294 | 5 | 238 | 96-525 | 4 | 237 | 114-381 |
| lev-8 | CAB01685 | 1596 | 2 | 147 | 126-168 | 1 | 255 | 255 | 1 | 108 | 108 | | nd | nd |
| unc-29 | CAB02308 | 1482 | 12 | 144 | 105-168 | 7 | 259 | 120- | 8 | 248 | 132-426 | 11 | 202 | 96-402 |
| unc-38 | CCD69819 | 1524 | 6 | 180 | 111-294 | 9 | 184 | 90-375 | 9 | 283 | 84-1098 | 8 | 339 | 87-1380 |
| unc-63 | CCD66192 | 1524 | 3 | 272 | 105- | 5 | 247 | 96-510 | 5 | 263 | 105-510 | 5 | 235 | 105-510 |
| avr-14 | CCD61323 | 1251 | 1 | 90 | 90 | 2 | 112 | 102-123 | 2 | 114 | 105-123 | 4 | 111 | 105-123 |
| avr-15 | CAB03329 | 1437 | 4 | 163 | 123-210 | 7 | 154 | 123-192 | 6 | 140 | 105-210 | 3 | 185 | 123-240 |
| ben-1 | CAB00853 | 1335 | 2 | 163 | 108-219 | 1 | 111 | 111 | 1 | 114 | 114 | 3 | 102 | 90-117 |
| glc-1 | CAB07361 | 1305 | 1 | 123 | 123 | 1 | 108 | 108 | 1 | 318 | 318 | 1 | 237 | 237 |
| glc-2 | CCD62432 | 1305 | 4 | 198 | 147-318 | 1 | 879 | 879 | 1 | 939 | 939 | 3 | 375 | 207-582 |
| glc-3 | CCD69051 | 1455 | 3 | 146 | 87-207 | 2 | 115 | 111-120 | 2 | 102 | 93-111 | 4 | 117 | 99-144 |
| glc-4 | CCD65896 | 1503 | 6 | 192 | 111-279 | 2 | 702 | 249-1155 | 2 | 702 | 249-1155 | 2 | 702 | 249-1155 |

| ID | # | Mean | R | # | Mean | R | # | Mean | R | # | Mean | R | # | Mean | R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lev-1 | 5 | 219 | 93-420 | 11 | 150 | 99-330 | 9 | 181 | 114-255 | 8 | 207 | 102-558 | 6 | 334 | 138-657 |
| lev-8 | | | nd | | | nd | | | nd | | | nd | | | nd |
| unc-29 | 8 | 223 | 120-465 | 9 | 171 | 96-378 | 8 | 180 | 96-396 | 2 | 172 | 162-183 | 4 | 173 | 144-207 |
| unc-38 | 7 | 493 | 90- | 7 | 416 | 84- | 6 | 472 | 108- | 4 | 708 | 261- | 3 | 890 | 360- |
| unc-63 | 3 | 227 | 105-297 | 4 | 285 | 105-414 | 3 | 275 | 105-414 | 1 | 414 | 414 | 2 | 277 | 141-414 |
| avr-14 | 3 | 123 | 111-135 | 5 | 145 | 111-207 | 6 | 169 | 123-270 | 5 | 202 | 123-321 | 5 | 240 | 123-327 |
| avr-15 | 5 | 156 | 90-243 | 7 | 153 | 96- | 4 | 132 | 93-213 | 4 | 138 | 96-243 | 4 | 108 | 108-111 |
| ben-1 | 2 | 97 | 93-102 | 4 | 99 | 90-120 | 3 | 99 | 96-102 | 3 | 109 | 87-135 | 5 | 109 | 93-129 |
| glc-1 | 1 | 237 | 237 | | | nd | 2 | 141 | 141-141 | | | nd | | | nd |
| glc-2 | 3 | 261 | 147-393 | 2 | 439 | 135-744 | 2 | 475 | 207-744 | 2 | 387 | 171-603 | 2 | 336 | 333-339 |
| glc-3 | 3 | 112 | 111-114 | 7 | 169 | 93-330 | 7 | 120 | 90-168 | 5 | 145 | 117-213 | 6 | 153 | 111-189 |
| glc-4 | 2 | 702 | 249-1155 | 2 | 702 | 249-1155 | 2 | 702 | 249-1155 | 2 | 702 | 249-1155 | 1 | 972 | 972 |

| ID | # | Mean | R | # | Mean | R | # | Mean | R | # | Mean | R | # | Mean | R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lev-1 | 3 | 564 | 339-807 | 1 | 1392 | 1392 | 1 | 1401 | 1401 | 2 | 619 | 345-894 | 3 | 245 | 219-270 |
| lev-8 | | | nd | | | nd | | | nd | | | nd | | | nd |
| unc-29 | | | nd | | | nd | | | nd | | | nd | | | nd |
| unc-38 | 1 | 1473 | 1473 | 1 | 1473 | 1473 | 2 | 868 | 264- | 1 | 1473 | 1473 | | | nd |
| unc-63 | | | nd | | | nd | | | nd | | | nd | | | nd |
| avr-14 | 1 | 390 | 390 | 1 | 219 | 219 | | | nd | | | nd | | | nd |
| avr-15 | 2 | 111 | 111-111 | | | | 1 | 120 | 120 | 1 | 174 | 174 | 1 | 174 | 174 |
| ben-1 | 7 | 100 | 75-120 | 10 | 138 | 90-234 | 16 | 115 | 84-234 | 25 | 114 | 87-348 | 15 | 160 | 90- |
| glc-1 | | | nd | | | nd | | | nd | | | nd | | | nd |
| glc-2 | | | nd | | | nd | | | nd | | | nd | | | nd |
| glc-3 | 3 | 618 | 84- | | | nd | 1 | 378 | 378 | | | nd | | | nd |
| glc-4 | 1 | 1428 | 1428 | 1 | 1428 | 1428 | 1 | 972 | 972 | 1 | 972 | 972 | 1 | 1428 | 1428 |
C. elegans target genes (name (ID), GenBank accession number (Acc #), and coding sequence length (CDS len)) used to BLASTx-query a database comprising the initial de novo library assembly. For each k-mer (e.g. “21 mer HSPs”), the columns list the number of high scoring pairs identified (HSP; “#”), the mean HSP length in DNA bases, and the range of HSP lengths (“R”), with minimum and maximum lengths shown. Bold HSP-length values indicate the longest HSP identified among all k-mers for a given target gene. “nd” indicates that no high-similarity HSPs were identified at that k-mer.
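The per-k-mer summary described in this caption (count, mean length, and range of HSPs, plus the longest HSP across all k-mers) can be illustrated with a short sketch. The function names are hypothetical, the individual HSP lengths in the usage example are invented for illustration, and the rounding of the mean is an assumption, since the source does not state how means were rounded.

```python
from typing import Dict, List, Optional, Tuple


def summarize_hsps(lengths: List[int]) -> Optional[Tuple[int, int, str]]:
    """Return (#, mean, range) for one target gene at one k-mer.

    Returns None when no HSP was found, which corresponds to "nd".
    """
    if not lengths:
        return None
    lo, hi = min(lengths), max(lengths)
    # A single HSP is reported as one value; otherwise as "min-max".
    rng = str(lo) if len(lengths) == 1 else f"{lo}-{hi}"
    mean_len = round(sum(lengths) / len(lengths))  # rounding rule is an assumption
    return len(lengths), mean_len, rng


def longest_hsp(by_kmer: Dict[int, List[int]]) -> Optional[int]:
    """Longest HSP across all k-mers for one gene (the value shown in bold)."""
    all_lengths = [n for lengths in by_kmer.values() for n in lengths]
    return max(all_lengths) if all_lengths else None


# Invented HSP lengths for one gene at three k-mers (illustrative only).
example = {21: [99, 126, 219], 25: [], 31: [1401]}
for k, lengths in example.items():
    print(k, summarize_hsps(lengths) or "nd")
print("longest:", longest_hsp(example))
```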
Table 2. Target gene identification and comparison
| Levamisole target genes (nAChR subunits) | | | | | | | | | | |
| lev-1 | 1401 | 1401 | 1580 | 1612 | 467 | 477 | 472 | 91 | 100 | GACS01000001 |
| lev-8 | 255 | 255 | 269 | 269 | 85 | 89 | 531 | 85 | 17 | GACS01000002 |
| unc-29 | 480 | 576 | 482 | 614 | 160 | 204 | 493 | 88 | 41 | GACS01000003 |
| unc-38 | 1473 | 1455 | 1607 | 1631 | 491 | 507 | 511 | 72 | 100 | GACS01000004 |
| unc-63 | 576 | 1248 | 578 | 1770 | 192 | 417 | 502 | 82 | 82 | GACS01000005 |
| Macrocyclic lactone, benzimidazole target genes | | | | | | | | | | |
| avr-14 | 390 | 1299 | 629 | 1485 | 130 | 464 | 416 | 52 | 100 | GACS01000006 |
| avr-15 | 291 | 1242 | 560 | 1531 | 97 | 447 | 478 | 44 | 100 | GACS01000007 |
| ben-1_1 | 453 | 1308 | 453 | 1436 | 151 | 448 | 444 | 94 | 100 | GACS01000008 |
| ben-1_2 | 453 | 1317 | 453 | 1751 | 151 | 448 | 444 | 95 | 100 | GACS01000009 |
| glc-2 | 939 | 1215 | 994 | 1389 | 313 | 424 | 434 | 72 | 100 | GACS01000010 |
| glc-3 | 1383 | 1383 | 1724 | 2048 | 461 | 531 | 484 | 58 | 100 | GACS01000011 |
| glc-4 | 1428 | 1521 | 1628 | 1628 | 476 | 508 | 500 | 77 | 100 | GACS01000012 |
Best initial high scoring pair (HSP) and contig lengths correspond to the longest HSP from the initial read assemblies (see bold values from Table 1) and the contig from which that HSP derives. “% full length” represents the percent of the comparator C. elegans protein that is represented within the final contig. “% ID” represents the percent identity as determined by pairwise alignment.
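The two percentages defined in this caption can be computed as sketched below. This is an illustration under stated assumptions, not the authors' procedure: the source does not name the alignment tool or the identity convention, so `percent_identity` here counts matches over alignment columns in which neither sequence has a gap, and both function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Excluding gapped columns from the denominator is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def percent_full_length(covered_residues: int, comparator_length: int) -> float:
    """Percent of the comparator protein represented within a contig."""
    if comparator_length <= 0:
        raise ValueError("comparator length must be positive")
    return 100.0 * covered_residues / comparator_length


# Hypothetical toy alignment: 5 gap-free columns, 4 of them identical.
print(percent_identity("MKT-LV", "MRTALV"))
# Hypothetical coverage: 50 of 200 comparator residues represented.
print(percent_full_length(50, 200))
```

Other identity conventions (for example, dividing by the full alignment length or by the shorter sequence) give different values for the same alignment, so the convention should be fixed before comparing numbers across genes.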