Friedhelm Pfeiffer, Alexander Broicher, Thomas Gillich, Kathrin Klee, José Mejía, Markus Rampp, Dieter Oesterhelt.
Abstract
HaloLex is a software system for the central management, integration, curation, and web-based visualization of genomic and other -omics data for any given microorganism. The system has been employed for the manual curation of three haloarchaeal genomes, namely Halobacterium salinarum (strain R1), Natronomonas pharaonis, and Haloquadratum walsbyi. In particular, HaloLex enables the integrated analysis of genome-wide proteomic results with the underlying genomic data. This has proven indispensable for generating reliable gene predictions for GC-rich genomes, which, because of their characteristically low abundance of stop codons, are known to be hard targets for standard gene finders, especially with respect to start codon assignment. The proteomic identification of more than 600 N-terminal peptides has greatly increased the reliability of the start codon assignment for Halobacterium salinarum. Applying homology-based methods to the published genome of Haloarcula marismortui made it possible to detect 47 previously unidentified genes (a problem that is particularly serious for short protein sequences) and to correct more than 300 start codon misassignments.
Year: 2008 PMID: 18592220 PMCID: PMC2516542 DOI: 10.1007/s00203-008-0389-z
Source DB: PubMed Journal: Arch Microbiol ISSN: 0302-8933 Impact factor: 2.552
Fig. 1 Screenshot of the search functionality of HaloLex. Example output of a query for all genes of Halobacterium salinarum (R1) that were “reliably” identified by proteomics (indicated in the rightmost column). The complete list of 1,992 identifications was truncated for brevity
Fig. 2 Screenshot of the region viewer of HaloLex. Genomic region on the Halobacterium chromosome with ORFs color-coded according to different trust levels of proteomic identification. “Spurious” ORFs (which are hidden by default) are rendered as open symbols
Fig. 3 Integrated access to genomic and proteomic data. Montage of different views of the HaloLex web interface on proteomic data. Blue arrows indicate example navigation tracks from a particular spot on a 2D gel image via two different mass spectra to the identified protein and its location on the genome
Fig. 4 Expected and actual frequency of stop codons for 425 microbial genomes. For the chromosomes of 425 microbial strains, the expected and the actual numbers of stop codons were counted and normalized by the total number of codons. Species are sorted along the abscissa by decreasing GC content. For nearly all genomes, the number of stop codons actually present (open circles) is significantly lower than the number expected (filled symbols). The small inset shows that for the group of genomes with a GC content >60% (to the left of the dashed vertical line), only 72% of the expected stop codons are found, whereas more than 85% of the expected stop codons are actually present in the group of genomes with a GC content <60% (to the right of the dashed vertical line). The GenBank data for all microbial strains were downloaded from ftp.ncbi.nih.gov/genomes/Bacteria. Only the chromosome (more precisely: the longest replicon) was chosen for each strain, and only one representative strain was used for each species
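The comparison in Fig. 4 can be illustrated with a minimal sketch. The caption does not say how the expected number was derived, so the code below assumes it follows from the mononucleotide frequencies alone (independent bases) and that trinucleotides are counted in all overlapping positions on one strand. The function names are hypothetical and not part of HaloLex.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def expected_stop_fraction(seq: str) -> float:
    """Expected fraction of trinucleotides that are stop codons.

    Assumption: bases are independent, so the probability of a codon is
    the product of its mononucleotide frequencies.
    """
    seq = seq.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    freq = {base: counts[base] / total for base in "ACGT"}
    return sum(freq[c[0]] * freq[c[1]] * freq[c[2]] for c in STOP_CODONS)


def observed_stop_fraction(seq: str) -> float:
    """Observed fraction of overlapping trinucleotides that are stop codons."""
    seq = seq.upper()
    n_windows = len(seq) - 2
    if n_windows <= 0:
        raise ValueError("sequence is shorter than one codon")
    hits = sum(1 for i in range(n_windows) if seq[i:i + 3] in STOP_CODONS)
    return hits / n_windows
```

Because all three stop codons are AT-rich, the expected fraction falls quickly as GC content rises, which is consistent with the trend the figure describes. The ratio of the observed to the expected value corresponds to the percentages quoted for the inset, under the assumptions stated above.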
Fig. 5 Dinucleotide bias for Halobacterium salinarum. a Counts of dinucleotides in the Halobacterium salinarum chromosome. Dinucleotides are grouped according to the number of G or C residues. Within each group, each dinucleotide is adjacent to its reverse complement (e.g., TC and GA). The four palindromic dinucleotides are indicated by green arrows. For each group, the theoretically expected average (blue line) is compared with the average that is actually observed (yellow line). b Same as (a) but showing the counts of trinucleotides. Red circles highlight stop codons and blue circles highlight trinucleotides that correspond to arginine codons. c The amino acid composition as computed from the protein-coding gene set (black) and from trinucleotide counts (gray). The over-representation of the acidic amino acids aspartate and, to a lesser extent, glutamate (red circles) in protein-coding genes contrasts with the over-representation of the basic amino acid arginine, as well as proline and serine (blue circles), in translations of random stretches of DNA. This is the basis for a strong pI difference between these two sets of ORFs
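The grouping used in panel (a) of Fig. 5 can be reproduced with a short sketch. It counts overlapping dinucleotides on one strand, groups them by the number of G or C residues, and lists the dinucleotides that equal their own reverse complement. The function names are hypothetical, and single-strand counting is an assumption, since the caption does not specify the counting scheme.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
ALL_DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a short DNA word."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides; words with non-ACGT symbols are skipped."""
    seq = seq.upper()
    counts = Counter({word: 0 for word in ALL_DINUCLEOTIDES})
    for i in range(len(seq) - 1):
        word = seq[i:i + 2]
        if word in counts:
            counts[word] += 1
    return counts


def group_by_gc(counts: Counter) -> dict:
    """Group dinucleotide counts by the number of G or C residues (0, 1, or 2)."""
    groups = {0: {}, 1: {}, 2: {}}
    for word, n in counts.items():
        groups[sum(base in "GC" for base in word)][word] = n
    return groups


# Dinucleotides that are identical to their own reverse complement.
PALINDROMIC = sorted(w for w in ALL_DINUCLEOTIDES if reverse_complement(w) == w)
```

With this definition, `PALINDROMIC` contains AT, CG, GC, and TA, and `reverse_complement("TC")` gives GA, matching the pairing example in the caption.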
Fig. 6 pI shift around start codons. The distribution of pI values of the 20 N-terminal residues excluding the initial Met (solid line) and the 20 residues of the spurious ORF extension (broken line) that precedes the start codon is plotted for Halobacterium. Transmembrane proteins and proteins with a signal sequence or twin-arginine export motif have been excluded from the analysis. The small inset shows the correlation of the pI value of the N-terminal region of the protein (pI_post, plotted on the x-axis) and the pI value of the spurious ORF extension (pI_pre, plotted on the y-axis). The majority of the N-terminal regions are acidic, while a large fraction of the spurious extensions are highly alkaline
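The pI comparison in Fig. 6 can be sketched as follows. The source does not state which pI method or pKa values were used, so the values below are an assumed, commonly used approximate set, and the pI is found by bisection on the net charge. All names are hypothetical.

```python
# Approximate pKa values (an assumption; the source does not specify them).
POSITIVE_PKA = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(peptide: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA["N_TERM"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA["C_TERM"] - ph))
    for residue in peptide.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def estimate_pi(peptide: str, tolerance: float = 1e-4) -> float:
    """Estimate the pI as the pH at which the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(peptide, mid) > 0:
            low = mid  # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


def n_terminal_window(protein: str, length: int = 20) -> str:
    """First `length` residues after the initial Met, as in the caption."""
    start = 1 if protein.upper().startswith("M") else 0
    return protein[start:start + length]
```

A window rich in aspartate and glutamate yields a low pI, while an arginine-rich window, as expected for translations of random GC-rich DNA, yields a high one. Applying `estimate_pi` to `n_terminal_window(protein)` and to the 20 residues of the putative extension would give values analogous to pI_post and pI_pre.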
Newly assigned genes in Haloarcula marismortui
| ORF | Length (aa) | Best homolog | Function | Seq id. (%) | Other homologs |
|---|---|---|---|---|---|
| rrnAC0103_A | 75 | NP3662A | rib_prot S28.eR | 86 | OE2664F, HQ2884A |
| rrnAC0208_A | 60 | NP0350A | CHY | 48 | – |
| rrnAC0216_A | 126 | OE2874F | CHY | 42 | HQ1719A |
| rrnAC0301_A | 80 | HQ2541A | Small ZnF | 45 | – |
| rrnAC0669_A | 150 | NP0856A | CHY | 66 | HQ1219A, OE1540R |
| rrnAC0678_A | 53 | NP0788A | Small ZnF | 73 | HQ1109A, OE1789R, HQ2748A, OE7210R |
| rrnAC0696_A | 99 | NP0816A | Small ZnF | 47 | OE1556F |
| rrnAC0797_A | 57 | HQ2892A | rib_prot L37.eR | 92 | OE3141R, NP4310A |
| rrnAC0991_A | 48 | NP2998A | CHY | 79 | OE3047F |
| rrnAC1044_A | 86 | HQ1848A | moaD family protein | 33 | NP2500A, NP5020A, NP3946A, OE3595R |
| rrnAC1515_A | 66 | NP1736A | Small ZnF | 58 | HQ3220A, OE3365R |
| rrnAC1588_A | 146 | OE5063R | IS200-type transposase | 72 | NP4630A, OE1439F, rrnAC0815, OE4728F |
| rrnAC1597_A | 61 | NP4882A | rib_prot S14 | 72 | OE3408F, HQ2828A, NP1768A |
| rrnAC1603_A | 94 | NP4870A | RNAseP comp. 1 | 52 | OE3398F, HQ2834A |
| rrnAC1676_A | 44 | NP4282A | CHY | 77 | NP2940A |
| rrnAC1676_B | 122 | HQ1297A | CHY | 60 | – |
| rrnAC1678_A | 100 | HQ1827A | CHY | 59 | – |
| rrnAC1706_A | 141 | NP1764A | CHY | 52 | – |
| rrnAC1831_A | 63 | NP1510A | CHY | 63 | HQ1704A, OE1775R |
| rrnAC1867_A | 52 | HQ1176A | Small ZnF | 69 | OE1435R, NP5316A |
| rrnAC1929_A | 212 | OE3249F | Cob cluster protein | 54 | NP5310A, HQ1412A, NP1896A |
| rrnAC1936_A | 89 | NP3612A | CHY | 48 | – |
| rrnAC1983_A | 231 | NP1896A | CHY | 43 | – |
| rrnAC2105_A | 134 | NP0772A | CHY | 54 | HQ1375A, OE4661R |
| rrnAC2167_A | 96 | NP4084A | CHY | 35 | HQ2323A |
| rrnAC2212_A | 142 | HQ1071A | CHY | 38 | OE2090R |
| rrnAC2268_A | 116 | NP2558A | Transcription regulator | 74 | OE2591R, NP3596A, rrnAC3399, HQ1949A |
| rrnAC2270_A | 59 | HQ1365A | Small ZnF | 68 | NP0928A, OE4676F |
| rrnAC2286_A | 84 | NP5336A | CHY | 43 | – |
| rrnAC2448_A | 54 | NP5086A | CHY | 51 | – |
| rrnAC2530_A | 130 | HQ2261A | CHY | 52 | – |
| rrnAC2569_A | 49 | HQ3677A | Small ZnF | 74 | NP0778A, OE4167C1R, HQ2748A, OE7210R |
| rrnAC2574_A | 52 | NP0788A | Small ZnF | 92 | OE1789R, HQ1109A, HQ2748A |
| rrnAC2592_A | 98 | HQ1034A | CHY | 79 | NP1820A |
| rrnAC2764_A | 111 | NP5102A | CHY | 59 | HQ3659A, OE4054F |
| rrnAC2791_A | 115 | NP0196A | CHY | 70 | HQ3647A, OE3914R |
| rrnAC2834_A | 129 | OE4148F | CHY | 31 | HQ3411A |
| rrnAC2897_A | 73 | OE4475R | Small ZnF | 58 | HQ3437A, NP0708A |
| rrnAC2982_A | 137 | HQ2813A | CHY | 42 | HQ1789A, HQ2547A, pNG7092, NP1808A |
| rrnAC3115_A | 57 | NP0186A | rib_prot HL32 | 75 | HQ3421A |
| rrnB0024_A | 139 | OE6004F | Small ZnF | 64 | NP6252A, HQ1149A |
| rrnB0146_A | 118 | OE1549F | CHY | 54 | NP1698A, HQ1429A, OE3894R |
| rrnB0177_A | 89 | HQ2065A | CHY | 50 | – |
| pNG3034_A | 47 | NP4282A | CHY | 80 | NP2940A |
| pNG6117_A | 85 | OE6052R | CHY | 59 | – |
| pNG6164_A | 115 | OE6242R | CHY | 79 | – |
| pNG6170_A | 53 | NP0788A | CHY | 75 | OE1789R, HQ1109A, HQ2748A, OE7210R |
Using tblastN, previously unannotated genes were detected and then added to the annotation with the manual curation options within HaloLex. For each newly assigned gene, its code, its length, the best homolog (with a brief indication of function and the percentage of sequence identity), and other homologous genes are given. Codes are systematically assigned using the number of the upstream ORF and a letter attached with an intervening underscore (commonly _A). Function assignment abbreviations: CHY, conserved hypothetical protein; rib_prot, ribosomal protein; Small ZnF, small CPxCG-related zinc finger protein (Tarasov et al. 2008)
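The naming convention described above can be sketched in a few lines. The helper below is hypothetical and not part of HaloLex. It assumes that suffix letters are handed out in alphabetical order, which is consistent with the table entries rrnAC1676_A and rrnAC1676_B but is not stated explicitly in the source.

```python
import string


def new_gene_code(upstream_orf: str, existing_codes=()) -> str:
    """Return the next free code of the form <upstream ORF>_<letter>.

    Assumption: letters are assigned alphabetically, starting with A.
    """
    taken = set(existing_codes)
    for letter in string.ascii_uppercase:
        candidate = f"{upstream_orf}_{letter}"
        if candidate not in taken:
            return candidate
    raise ValueError(f"no free suffix letter left for {upstream_orf}")
```

For example, `new_gene_code("rrnAC1676")` returns `rrnAC1676_A`, and a second call with that code passed in `existing_codes` returns `rrnAC1676_B`.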
Genes with corrected start codon assignments in Haloarcula marismortui
| ORF | Corrected length (aa) | Original length (aa) | Direction of change | Homologous ORFs |
|---|---|---|---|---|
| rrnAC0004 | 1,551 | 1,356 | Extended | NP4364A, OE3175F, HQ3018A |
| rrnAC0005 | 624 | 360 | Extended | OE2052F, NP3952A |
| rrnAC0012 | 957 | 1,083 | Shortened | HQ1987A, NP4816A |
| rrnAC0041 | 279 | 339 | Shortened | NP3812A, HQ1503A, OE2950R |
| rrnAC0053 | 864 | 717 | Extended | NP2042A, HQ3014A |
| rrnAC0080 | 1,347 | 1,383 | Shortened | NP3698A, OE2648F, HQ2890A |
| rrnAC0083 | 936 | 654 | Extended | NP1382A, HQ2430A |
| rrnAC0101 | 2,238 | 2,328 | Shortened | OE2656R, NP3690A |
| rrnAC0115 | 774 | 909 | Shortened | NP4142A, OE2472F |
| rrnAC0137 | 882 | 993 | Shortened | OE2860R, NP3116A, rrnAC1236 |
| rrnAC0145 | 1,125 | 843 | Extended | OE4359F, HQ1341A |
| rrnAC0171 | 984 | 1,080 | Shortened | OE1599F |
| rrnAC0178 | 1,812 | 1,725 | Extended | OE1613R |
| rrnAC0181 | 1,866 | 1,413 | Extended | NP4200A, OE3010F, HQ2528A |
| rrnAC0198 | 1,236 | 1,305 | Shortened | NP1646A |
| rrnAC0199 | 480 | 537 | Shortened | NP1296A |
| rrnAC0213 | 951 | 726 | Extended | HQ1261A |
| rrnAC0215 | 723 | 522 | Extended | NP0356A, HQ2398A, OE1636F |
| rrnAC0239 | 999 | 1,110 | Shortened | NP1902A, OE2918F |
| rrnAC0240 | 333 | 372 | Shortened | OE3652F, HQ2230A |
| rrnAC0249 | 879 | 1,017 | Shortened | OE3606R, NP3184A |
| rrnAC0261 | 267 | 300 | Shortened | HQ2898A, NP3686A, OE3683R |
| rrnAC0280 | 333 | 381 | Shortened | NP2580A, HQ2722A |
| rrnAC0284 | 447 | 225 | Extended | NP2596A, HQ2724A |
| rrnAC0304 | 1,104 | 1,155 | Shortened | OE2360R, NP5226A |
| rrnAC0305 | 957 | 1,011 | Shortened | OE2551F, NP3722A |
| rrnAC0322 | 1,176 | 1,203 | Shortened | NP3076A, OE2763F |
| rrnAC0324 | 411 | 153 | Extended | NP3362A, OE4451F |
| rrnAC0329 | 1,245 | 1,035 | Extended | NP2206A, HQ2700A |
| rrnAC0374 | 414 | 324 | Extended | NP2642A, OE2237F, HQ2615A |
| rrnAC0394 | 1,008 | 1,056 | Shortened | NP1082A, OE4339R, HQ3696A |
| rrnAC0426 | 1,386 | 1,464 | Shortened | pNG7203, NP0964A, HQ3464A |
| rrnAC0430 | 936 | 975 | Shortened | HQ2692A, NP1916A, OE2547R |
| rrnAC0436 | 1,530 | 1,605 | Shortened | OE2288F, NP2702A, HQ2668A |
| rrnAC0481 | 1,002 | 894 | Extended | NP2798A |
| rrnAC0494 | 249 | 306 | Shortened | NP4192A, OE1860F, HQ1556A |
| rrnAC0497 | 708 | 471 | Extended | HQ1562A, OE1794R |
| rrnAC0505 | 1,710 | 1,485 | Extended | OE3490R, NP1742A, HQ3347A |
| rrnAC0506 | 1,527 | 1,575 | Shortened | NP3956A, OE2049R |
| rrnAC0536 | 573 | 606 | Shortened | NP2906A, HQ2751A |
| rrnAC0546 | 1,791 | 1,833 | Shortened | HQ1573A, OE1495R, NP1746A |
| rrnAC0568 | 699 | 435 | Extended | HQ1615A, rrnAC3127, NP1996A |
| rrnAC0572 | 804 | 855 | Shortened | HQ3669A, rrnAC2557, NP0792A, OE3115F |
| rrnAC0589 | 657 | 846 | Shortened | rrnAC2321, OE8048F |
| rrnAC0617 | 1,335 | 1,413 | Shortened | NP0212A, HQ2634A, OE8010R |
| rrnAC0619 | 1,272 | 963 | Extended | HQ3141A, OE4634F, NP0578A |
| rrnAC0620 | 1,566 | 1,431 | Extended | NP4066A, HQ2635A, OE7174R |
| rrnAC0628 | 912 | 957 | Shortened | NP4072A, HQ2637A, OE1748R |
| rrnAC0629 | 738 | 774 | Shortened | NP4074A, HQ2638A, OE1752F |
| rrnAC0631 | 1,122 | 642 | Extended | NP0380A, OE1582R, HQ1531A |
| rrnAC0633 | 969 | 1,008 | Shortened | NP0384A, OE1578F, HQ1670A |
| rrnAC0638 | 474 | 606 | Shortened | HQ3692A, NP2228A |
| rrnAC0651 | 1,191 | 1,059 | Extended | OE4393R, NP0888A |
| rrnAC0655 | 603 | 657 | Shortened | NP1390A, HQ1666A |
| rrnAC0660 | 333 | 417 | Shortened | NP4090A, HQ2743A, OE1651F |
| rrnAC0663 | 1,029 | 240 | Extended | NP0372A, OE1646R, HQ2392A |
| rrnAC0666 | 708 | 582 | Extended | NP0100A, HQ3478A, OE1004F |
| rrnAC0674 | 810 | 945 | Shortened | rrnAC0848, OE7042R, rrnAC2044, HQ2141A, NP6028A |
| rrnAC0687 | 1,269 | 1,347 | Shortened | OE4207F |
| rrnAC0696 | 762 | 900 | Shortened | NP0818A, OE1554R |
| rrnAC0717 | 636 | 699 | Shortened | NP1230A, OE1713F, HQ1537A |
| rrnAC0721 | 1,371 | 1,395 | Shortened | OE3467R, HQ1298A |
| rrnAC0753 | 879 | 936 | Shortened | OE2785R, HQ2762A |
| rrnAC0777 | 903 | 492 | Extended | HQ2440A |
| rrnAC0779 | 861 | 915 | Shortened | OE2138F, NP1596A |
| rrnAC0801 | 951 | 1,071 | Shortened | NP4302A, OE3145F, HQ2933A |
| rrnAC0825 | 1,284 | 1,383 | Shortened | NP4134A, HQ2196A |
| rrnAC0833 | 795 | 858 | Shortened | NP1462A, OE1641R, HQ2394A |
| rrnAC0838 | 1,779 | 1,827 | Shortened | NP2726A, HQ1873A, OE2653R |
| rrnAC0841 | 954 | 1,062 | Shortened | NP2730A, OE2561R, HQ1874A |
| rrnAC0843 | 1,524 | 1,587 | Shortened | NP2738A, OE2555R, HQ2402A |
| rrnAC0852 | 1,017 | 966 | Extended | OE3343R |
| rrnAC0875 | 543 | 405 | Extended | OE3121R, HQ2788A |
| rrnAC0878 | 810 | 783 | Extended | OE3119R, NP4334A |
| rrnAC0883 | 1,116 | 504 | Extended | NP4236A |
| rrnAC0896 | 1,152 | 1,293 | Shortened | HQ1590A, OE2358F, NP3650A |
| rrnAC0917 | 1,065 | 1,140 | Shortened | HQ1663A, OE1669F |
| rrnAC0925 | 990 | 1,035 | Shortened | NP2796A, OE2451R |
| rrnAC0934 | 423 | 471 | Shortened | NP2710A, OE2005F, HQ2301A |
| rrnAC0942 | 1,611 | 1,416 | Extended | OE3436R |
| rrnAC0944 | 1,302 | 1,350 | Shortened | HQ1663A, OE1669F |
| rrnAC0956 | 462 | 537 | Shortened | HQ1497A, OE2934R |
| rrnAC1042 | 1,806 | 1,851 | Shortened | rrnAC1570, HQ3533A |
| rrnAC1083 | 1,965 | 2,010 | Shortened | NP4322A, OE2871F |
| rrnAC1106 | 519 | 420 | Extended | NP4198A, OE2985F, HQ2561A |
| rrnAC1107 | 1,476 | 1,176 | Extended | NP4904A, HQ1686A |
| rrnAC1115 | 270 | 324 | Shortened | NP4036A, OE2903R, HQ2458A |
| rrnAC1138 | 849 | 507 | Extended | OE2020F, NP1592A |
| rrnAC1169 | 1,311 | 867 | Extended | NP3742A, OE2827R, HQ2339A |
| rrnAC1218 | 1,794 | 1,821 | Shortened | HQ1754A |
| rrnAC1220 | 663 | 606 | Extended | HQ1752A |
| rrnAC1261 | 1,155 | 1,182 | Shortened | NP4050A, HQ2389A |
| rrnAC1263 | 798 | 894 | Shortened | OE2913R, NP3970A |
| rrnAC1281 | 2,529 | 2,592 | Shortened | OE2573F, NP1526A |
| rrnAC1299 | 1,881 | 1,704 | Extended | OE1143R, HQ3344A, NP1442A |
| rrnAC1308 | 339 | 399 | Shortened | NP2066A, HQ1665A, OE1673F |
| rrnAC1336 | 399 | 426 | Shortened | NP4972A |
| rrnAC1341 | 1,683 | 1,800 | Shortened | NP0164A |
| rrnAC1350 | 1,050 | 1,191 | Shortened | NP3216A, OE1906R, HQ2500A |
| rrnAC1361 | 687 | 498 | Extended | OE2276F, NP2980A, HQ1692A |
| rrnAC1365 | 1,026 | 1,161 | Shortened | rrnAC0576 |
| rrnAC1377 | 582 | 777 | Shortened | rrnAC0508, NP3954A |
| rrnAC1383 | 210 | 342 | Shortened | NP1548A |
| rrnAC1395 | 849 | 1,029 | Shortened | NP4160A |
| rrnAC1438 | 429 | 351 | Extended | NP3220A, OE2139R, HQ1579A |
| rrnAC1443 | 1,275 | 1,398 | Shortened | NP3228A, HQ1584A, OE2149R |
| rrnAC1444 | 1,623 | 1,746 | Shortened | HQ1934A, HQ2096A, pNG7256 |
| rrnAC1447 | 414 | 534 | Shortened | NP2292A, HQ1637A, OE1953F |
| rrnAC1454 | 303 | 207 | Extended | OE1963F, HQ1645A, NP2308A |
| rrnAC1477 | 1,047 | 1,251 | Shortened | OE2014F, HQ2353A |
| rrnAC1497 | 1,431 | 1,707 | Shortened | NP4594A, OE3274R |
| rrnAC1500 | 1,092 | 1,155 | Shortened | OE3278R, NP4774A |
| rrnAC1504 | 846 | 528 | Extended | NP4780A, HQ2866A, OE3286F |
| rrnAC1516 | 189 | 489 | Shortened | OE3330F |
| rrnAC1530 | 765 | 459 | Extended | NP1786A, HQ3174A |
| rrnAC1532 | 711 | 579 | Extended | NP1788A, OE3352R, HQ3173A |
| rrnAC1536 | 1,332 | 1,395 | Shortened | HQ1685A, NP4902A, OE5298F |
| rrnAC1542 | 1,416 | 1,644 | Shortened | OE1133F |
| rrnAC1567 | 291 | 474 | Shortened | HQ2131A |
| rrnAC1588 | 1,308 | 882 | Extended | OE5062R |
| rrnAC1621 | 570 | 459 | Extended | HQ2801A, OE3367F |
| rrnAC1626 | 2,532 | 2,808 | Shortened | rrnAC2044, rrnAC0848, OE7042R, HQ2141A |
| rrnAC1628 | 366 | 216 | Extended | OE3324R, HQ2783A, NP3352A |
| rrnAC1630 | 591 | 411 | Extended | OE2334R, NP0028A |
| rrnAC1638 | 975 | 1,107 | Shortened | NP1214A |
| rrnAC1647 | 402 | 222 | Extended | NP1834A |
| rrnAC1655 | 750 | 675 | Extended | NP1082A, OE4339R, HQ3696A |
| rrnAC1665 | 642 | 735 | Shortened | NP2884A |
| rrnAC1669 | 279 | 351 | Shortened | OE1371R, HQ1283A, NP5232A |
| rrnAC1680 | 390 | 429 | Shortened | NP0612A, OE1371R, HQ1286A |
| rrnAC1690 | 1,545 | 1,509 | Extended | NP0624A, HQ1292A |
| rrnAC1702 | 1,719 | 1,218 | Extended | NP1742A, OE3490R, HQ3347A |
| rrnAC1708 | 1,347 | 1,089 | Extended | NP4502A, HQ3336A, OE3496R |
| rrnAC1718 | 1,359 | 1,065 | Extended | NP4542A, OE3506F, HQ3330A |
| rrnAC1726 | 1,527 | 1,485 | Extended | HQ3326A, OE3511F, NP4534A |
| rrnAC1743 | 549 | 486 | Extended | HQ1673A, NP5358A, rrnAC3526 |
| rrnAC1764 | 924 | 978 | Shortened | rrnAC1777, NP5168A, OE1385F, HQ1277A |
| rrnAC1774 | 339 | 411 | Shortened | HQ1279A, OE1379R |
| rrnAC1776 | 588 | 633 | Shortened | NP5166A, OE1384F, HQ1278A |
| rrnAC1779 | 651 | 516 | Extended | NP5170A |
| rrnAC1782 | 933 | 963 | Shortened | HQ1276A, NP5174A, OE4651F |
| rrnAC1797 | 858 | 903 | Shortened | NP4932A, OE3445F, rrnAC0317 |
| rrnAC1809 | 735 | 444 | Extended | OE1445R, NP1134A |
| rrnAC1812 | 705 | 525 | Extended | OE1451F, HQ1168A, NP1178A |
| rrnAC1822 | 717 | 666 | Extended | NP1636A, OE1793F, HQ1712A |
| rrnAC1826 | 522 | 237 | Extended | NP1498A, HQ1709A, OE1785F |
| rrnAC1840 | 2,676 | 2,727 | Shortened | NP1516A, OE1770F, HQ1701A |
| rrnAC1849 | 1,746 | 1,083 | Extended | HQ1189A, NP5206A |
| rrnAC1853 | 3,003 | 2,616 | Extended | NP5214A, HQ1185A |
| rrnAC1855 | 891 | 330 | Extended | NP5218A, OE1417F, HQ1183A |
| rrnAC1867 | 522 | 756 | Shortened | HQ1177A, OE1434R, NP5318A |
| rrnAC1870 | 1,455 | 1,209 | Extended | NP4702A, OE3960F |
| rrnAC1880 | 1,239 | 1,284 | Shortened | NP0438A, OE3971R, rrnAC3166 |
| rrnAC1905 | 429 | 279 | Extended | NP4526A, pNG6069 |
| rrnAC1930 | 924 | 366 | Extended | OE3253F, NP5308A, HQ1411A |
| rrnAC1931 | 804 | 504 | Extended | HQ1410A, NP5306A, OE3255F |
| rrnAC1950 | 1,893 | 1,680 | Extended | NP0158A, HQ1329A |
| rrnAC1957 | 1,491 | 1,671 | Shortened | HQ2578A |
| rrnAC1979 | 795 | 591 | Extended | NP1462A, OE1641R |
| rrnAC1983 | 1,218 | 1,755 | Shortened | NP1754A, OE2013R |
| rrnAC1992 | 738 | 591 | Extended | NP1470A |
| rrnAC2014 | 792 | 747 | Extended | NP5122A, OE1306F, HQ1416A |
| rrnAC2085 | 360 | 522 | Shortened | NP0342A, OE4713R, HQ3071A |
| rrnAC2098 | 1,878 | 1,905 | Shortened | NP0198A, OE4671R, HQ1369A |
| rrnAC2105 | 522 | 1,014 | Shortened | NP0774A, OE4663F, HQ1347A |
| rrnAC2127 | 1,005 | 711 | Extended | NP0962A |
| rrnAC2129 | 864 | 894 | Shortened | OE4355R, NP3186A |
| rrnAC2158 | 1,974 | 2,034 | Shortened | NP0404A, OE4613F, HQ3117A |
| rrnAC2159 | 462 | 609 | Shortened | NP0954A, HQ3116A, OE4610R |
| rrnAC2181 | 1,098 | 1,227 | Shortened | NP1140A, OE4571R, HQ1074A |
| rrnAC2221 | 435 | 579 | Shortened | OE4541F, NP1718A, HQ1065A, rrnAC2455 |
| rrnAC2223 | 372 | 387 | Shortened | NP1710A, OE4544R, HQ1063A |
| rrnAC2245 | 1,137 | 870 | Extended | OE4034R, HQ3066A, NP0030A |
| rrnAC2247 | 1,296 | 1,443 | Shortened | NP1050A |
| rrnAC2258 | 624 | 435 | Extended | NP0018A |
| rrnAC2261 | 954 | 1,026 | Shortened | OE1151R, NP0014A, HQ1359A |
| rrnAC2278 | 528 | 600 | Shortened | NP3368A, HQ2565A, rrnAC0868, OE2992R |
| rrnAC2284 | 1,038 | 993 | Extended | NP5368A, OE2438R |
| rrnAC2352 | 1,167 | 1,188 | Shortened | pNG7026, OE5170F, HQ1989A |
| rrnAC2356 | 999 | 1,251 | Shortened | NP5048A, HQ1275A, OE4196R |
| rrnAC2359 | 432 | 180 | Extended | NP4806A, rrnAC0738, OE3162F, HQ2346A |
| rrnAC2377 | 1,281 | 924 | Extended | NP0578A, OE4634F, HQ3141A |
| rrnAC2440 | 684 | 780 | Shortened | NP0956A, OE4360R, HQ3733A |
| rrnAC2460 | 1,977 | 2,127 | Shortened | NP1264A, HQ3402A, OE4140R |
| rrnAC2469 | 1,299 | 1,422 | Shortened | HQ2809A, HQ2192A |
| rrnAC2473 | 1,524 | 1,569 | Shortened | OE2133R, NP3020A |
| rrnAC2474 | 288 | 351 | Shortened | NP1258A, OE4136R, HQ3399A |
| rrnAC2476 | 936 | 582 | Extended | NP1312A, OE4133R |
| rrnAC2518 | 1,266 | 1,221 | Extended | NP1318A, OE3943R, HQ3056A |
| rrnAC2525 | 1,026 | 735 | Extended | HQ1021A, OE4201R |
| rrnAC2526 | 390 | 477 | Shortened | NP1272A, HQ2001A, OE4217R |
| rrnAC2529 | 741 | 906 | Shortened | NP1268A, OE4218F, HQ1025A |
| rrnAC2532 | 2,766 | 3,069 | Shortened | NP0538A, OE1272R, HQ1460A |
| rrnAC2550 | 2,154 | 2,241 | Shortened | OE1267R, NP0536A, HQ1456A |
| rrnAC2558 | 1,863 | 1,443 | Extended | OE3889R, HQ3102A, NP1576A |
| rrnAC2565 | 978 | 1,038 | Shortened | HQ3671A, NP0900A, OE4195F |
| rrnAC2582 | 699 | 870 | Shortened | NP1406A, HQ1040A, OE4235F |
| rrnAC2586 | 1,200 | 1,065 | Extended | HQ3704A, NP1412A, OE4236F |
| rrnAC2592 | 975 | 1,281 | Shortened | HQ1035A, NP1818A |
| rrnAC2627 | 2,061 | 2,106 | Shortened | NP1344A, HQ2213A |
| rrnAC2629 | 804 | 864 | Shortened | NP5160A |
| rrnAC2630 | 858 | 993 | Shortened | OE4085R, NP0606A, HQ3650A |
| rrnAC2633 | 1,356 | 888 | Extended | rrnAC0404, OE3070R |
| rrnAC2636 | 624 | 720 | Shortened | NP5088A, OE3906F |
| rrnAC2642 | 738 | 246 | Extended | NP5114A, HQ2624A, OE2740F |
| rrnAC2656 | 1,125 | 1,182 | Shortened | HQ2450A, OE2317R |
| rrnAC2657 | 1,194 | 1,062 | Extended | OE5132F, rrnB0290, NP1412A |
| rrnAC2714 | 1,569 | 1,179 | Extended | NP0482A, HQ1003A, OE4390F |
| rrnAC2722 | 510 | 558 | Shortened | NP0462A, HQ3640A, OE4429F |
| rrnAC2748 | 435 | 582 | Shortened | NP5152A, HQ3265A, OE4027F |
| rrnAC2749 | 453 | 186 | Extended | NP5150A, OE4028R, HQ3266A |
| rrnAC2753 | 648 | 687 | Shortened | HQ1473A |
| rrnAC2754 | 366 | 417 | Shortened | NP5146A, OE4039F |
| rrnAC2755 | 1,206 | 1,257 | Shortened | OE4034R, HQ3066A, NP0030A |
| rrnAC2756 | 2,046 | 1,770 | Extended | NP5144A, HQ3065A, OE4041F |
| rrnAC2761 | 1,143 | 960 | Extended | OE2170R, HQ2450A |
| rrnAC2772 | 1,521 | 1,617 | Shortened | NP1074A, HQ2660A |
| rrnAC2776 | 1,488 | 1,260 | Extended | pNG7305, OE2076F, HQ2506A |
| rrnAC2780 | 1,356 | 1,443 | Shortened | NP4376A, HQ3643A, OE3922R |
| rrnAC2781 | 894 | 711 | Extended | NP0190A, OE3921F, HQ3644A |
| rrnAC2782 | 1,236 | 1,140 | Extended | NP0192A |
| rrnAC2783 | 1,437 | 1,296 | Extended | NP5292A, OE2063R |
| rrnAC2798 | 552 | 393 | Extended | OE3905F, HQ1379A, NP0086A |
| rrnAC2800 | 612 | 699 | Shortened | NP0092A, OE3902R, HQ1377A |
| rrnAC2804 | 516 | 447 | Extended | NP1700A, OE3895F |
| rrnAC2806 | 381 | 228 | Extended | NP1698A, HQ1429A, OE3894R |
| rrnAC2810 | 909 | 990 | Shortened | OE3892R, NP1688A, HQ3137A |
| rrnAC2811 | 1,905 | 1,128 | Extended | OE3889R, HQ3102A, NP1576A |
| rrnAC2818 | 1,344 | 1,479 | Shortened | NP2252A, HQ3104A, OE3882R |
| rrnAC2822 | 951 | 309 | Extended | NP2248A, OE3879F, HQ3106A |
| rrnAC2831 | 540 | 411 | Extended | NP5076A, HQ1339A, OE3871R |
| rrnAC2834 | 1,032 | 1,677 | Shortened | NP1066A, OE4144R, HQ1407A |
| rrnAC2836 | 1,206 | 1,359 | Shortened | HQ2439A, NP3538A |
| rrnAC2851 | 744 | 786 | Shortened | NP0554A, OE4165R, HQ3686A |
| rrnAC2857 | 2,196 | 2,238 | Shortened | NP1350A, OE4181R, HQ3684A |
| rrnAC2859 | 630 | 678 | Shortened | NP1332A, HQ3517A |
| rrnAC2867 | 1,467 | 1,353 | Extended | OE4370R, NP5292A |
| rrnAC2870 | 1,092 | 951 | Extended | HQ3382A, OE4359F |
| rrnAC2891 | 1,698 | 1,773 | Shortened | NP1008A, OE4122R, HQ3049A |
| rrnAC2893 | 975 | 873 | Extended | NP0698A, HQ3439A, OE2975F |
| rrnAC2901 | 1,728 | 1,278 | Extended | NP0898A, OE4471R |
| rrnAC2933 | 720 | 558 | Extended | NP0238A, HQ1390A |
| rrnAC2937 | 2,844 | 2,895 | Shortened | OE1286R, NP0232A |
| rrnAC3005 | 927 | 999 | Shortened | NP1116A, HQ3034A, OE3214F |
| rrnAC3008 | 1,158 | 723 | Extended | NP1114A, HQ3033A, OE3216F |
| rrnAC3046 | 1,104 | 1,146 | Shortened | NP0884A, HQ2919A |
| rrnAC3050 | 2,199 | 2,514 | Shortened | rrnAC0848, rrnAC2044, HQ2141A, NP6028A |
| rrnAC3062 | 954 | 1,032 | Shortened | rrnAC1698, OE1358R, HQ2259A |
| rrnAC3071 | 915 | 1,149 | Shortened | OE3959R, HQ3234A, NP5036A |
| rrnAC3074 | 699 | 963 | Shortened | NP0072A, HQ3236A, OE3964R |
| rrnAC3079 | 573 | 624 | Shortened | NP3402A |
| rrnAC3083 | 375 | 423 | Shortened | NP0948A, OE4292F, HQ3465A |
| rrnAC3100 | 2,010 | 2,049 | Shortened | NP2262A, HQ1094A, OE3832F |
| rrnAC3121 | 651 | 1,194 | Shortened | HQ1495A |
| rrnAC3130 | 939 | 645 | Extended | HQ1618A, rrnB0227 |
| rrnAC3132 | 1,392 | 1,419 | Shortened | HQ1619A, rrnAC2624 |
| rrnAC3137 | 1,047 | 798 | Extended | NP0860A, HQ3045A, OE4446R |
| rrnAC3167 | 687 | 750 | Shortened | OE1188F, rrnAC1953 |
| rrnAC3182 | 627 | 666 | Shortened | NP3516A, rrnAC1228 |
| rrnAC3198 | 1,215 | 1,044 | Extended | NP1072A |
| rrnAC3210 | 1,311 | 1,356 | Shortened | NP4992A, OE3792F, HQ3101A |
| rrnAC3214 | 864 | 936 | Shortened | HQ3098A, OE3787R, NP2524A |
| rrnAC3226 | 1,671 | 1,599 | Extended | NP5136A |
| rrnAC3236 | 1,383 | 420 | Extended | OE1018F, rrnAC1586 |
| rrnAC3256 | 546 | 663 | Shortened | NP3054A, HQ3084A, OE3752R |
| rrnAC3268 | 678 | 498 | Extended | NP5010A, OE3731R, HQ3131A |
| rrnAC3272 | 981 | 834 | Extended | NP5006A, HQ3129A, OE3735F |
| rrnAC3279 | 1,107 | 1,032 | Extended | NP2398A, OE3722F, HQ3125A |
| rrnAC3302 | 786 | 735 | Extended | OE3439F, HQ2249A |
| rrnAC3328 | 531 | 621 | Shortened | NP1380A, OE1858F, HQ1684A |
| rrnAC3342 | 891 | 945 | Shortened | NP4916A, OE3430F, HQ2764A |
| rrnAC3345 | 453 | 531 | Shortened | HQ1964A, rrnAC2948, OE2717R |
| rrnAC3348 | 876 | 963 | Shortened | NP4524A, HQ3322A, OE3531R |
| rrnAC3352 | 609 | 765 | Shortened | NP4518A, OE3537R, HQ2230A |
| rrnAC3371 | 2,361 | 1,971 | Extended | pNG2034, NP3562A, HQ1851A, OE5286R |
| rrnAC3385 | 573 | 396 | Extended | OE3854R, NP2906A |
| rrnAC3394 | 444 | 525 | Shortened | NP1284A, rrnB0323 |
| rrnAC3420 | 843 | 885 | Shortened | OE3633R, NP2432A, HQ3036A |
| rrnAC3450 | 366 | 399 | Shortened | OE3588C1R, NP2486A, HQ3026A |
| rrnAC3452 | 894 | 951 | Shortened | NP2484A, OE3586R, HQ3027A |
| rrnAC3462 | 1,929 | 2,019 | Shortened | NP2410A, OE3580R, HQ2579A |
| rrnAC3475 | 1,113 | 594 | Extended | NP6258A |
| rrnAC3486 | 636 | 687 | Shortened | NP4286A |
| rrnAC3509 | 624 | 471 | Extended | OE1814R, HQ1569A |
| rrnAC3528 | 459 | 477 | Shortened | rrnAC3526, pNG2015 |
| rrnAC3536 | 276 | 153 | Extended | NP0840A, OE1853R |
| rrnAC3537 | 1,089 | 681 | Extended | HQ1674A, OE1854R, NP0842A |
| rrnAC3551 | 1,026 | 750 | Extended | NP2032A, HQ3010A, rrnAC1926, OE5141R |
| rrnB0092 | 1,533 | 1,599 | Shortened | HQ1972A |
| rrnB0172 | 591 | 507 | Extended | NP1842A |
| rrnB0198 | 1,530 | 1,623 | Shortened | NP0556A, OE4115F |
| rrnB0242 | 1,173 | 1,287 | Shortened | NP2606A, HQ2307A |
| rrnB0257 | 834 | 888 | Shortened | NP4244A, OE1942F |
| rrnB0265 | 1,683 | 1,851 | Shortened | NP4242A |
| rrnB0266 | 1,404 | 1,011 | Extended | NP3416A |
| rrnB0275 | 1,101 | 1,065 | Extended | rrnAC0899, OE4576F |
| rrnB0325 | 1,107 | 1,278 | Shortened | OE5142F, NP2128A, rrnAC3284, HQ3147A |
| pNG2007 | 939 | 1,119 | Shortened | OE4023F, NP1282A, rrnAC2744, HQ3263A |
| pNG2015 | 516 | 615 | Shortened | rrnAC3526, NP1956A |
| pNG4017 | 1,254 | 1,125 | Extended | NP2168A, rrnAC2207, OE2401F |
| pNG4035 | 561 | 516 | Extended | OE3768F, NP5358A |
| pNG5001 | 1,134 | 1,104 | Extended | rrnAC0252, NP0102A, HQ1815A, OE1005F |
| pNG5004 | 1,488 | 1,068 | Extended | rrnAC0250 |
| pNG5010 | 1,632 | 879 | Extended | OE5248F, HQ1543A, NP2464A |
| pNG5131 | 633 | 579 | Extended | rrnAC3384, OE4753R |
| pNG5139 | 1,251 | 312 | Extended | HQ2051A, NP6268A, OE1070R |
| pNG6047 | 591 | 618 | Shortened | HQ1118A, OE2691R, rrnAC0503, NP2664A |
| pNG6054 | 423 | 1,047 | Shortened | NP5022A |
| pNG6069 | 417 | 444 | Shortened | NP4526A, rrnAC1905 |
| pNG6075 | 378 | 477 | Shortened | OE7144R, NP3058A |
| pNG6092 | 294 | 324 | Shortened | pNG6058, OE7057F, NP3002A, HQ2407A |
| pNG6120 | 861 | 921 | Shortened | OE5424R |
| pNG6141 | 615 | 585 | Extended | OE5415R |
| pNG7012 | 1,281 | 1,026 | Extended | OE1077R, rrnAC3239, HQ2680A, NP2322A |
| pNG7037 | 1,488 | 1,725 | Shortened | NP5056A |
| pNG7040 | 1,182 | 999 | Extended | pNG7041, OE4576F |
| pNG7050 | 750 | 705 | Extended | HQ3696A, rrnAC0479, NP1198A, OE3661F |
| pNG7058 | 528 | 501 | Extended | OE1252R, HQ2374A, NP1606A |
| pNG7060 | 1,272 | 1,302 | Shortened | NP6204A |
| pNG7066 | 1,017 | 1,107 | Shortened | OE2128F, HQ2746A, NP1386A |
| pNG7078 | 1,071 | 1,242 | Shortened | NP1388A, HQ1592A |
| pNG7081 | 399 | 363 | Extended | HQ4010A |
| pNG7101 | 1,971 | 2,004 | Shortened | HQ1729A |
| pNG7106 | 897 | 819 | Extended | OE2497F, HQ2422A, NP1346A |
| pNG7178 | 984 | 834 | Extended | HQ2189A |
| pNG7227 | 747 | 612 | Extended | NP0054A, HQ1091A, OE3843F |
| pNG7244 | 2,481 | 2,166 | Extended | pNG7246, HQ1944A |
| pNG7252 | 1,659 | 1,788 | Shortened | OE2316R, rrnAC2655, HQ2451A |
| pNG7278 | 1,041 | 1,155 | Shortened | OE4674F, HQ1124A |
| pNG7280 | 540 | 489 | Extended | pNG6134, NP5298A |
| pNG7297 | 603 | 630 | Shortened | NP0672A |
| pNG7321 | 381 | 408 | Shortened | HQ1769A, pNG7235, OE3930R, NP0566A |
| pNG7327 | 1,512 | 1,572 | Shortened | HQ1784A, NP0802A, OE1568F |
| pNG7342 | 1,098 | 1,128 | Shortened | pNG7026, OE5170F |
| pNG7351 | 1,065 | 1,089 | Shortened | rrnAC0191, NP1260A, OE4674F, HQ3648A |
| pNG7377 | 432 | 324 | Extended | HQ3372A |
| pNG7380 | 1,746 | 1,932 | Shortened | HQ1768A |
Using our semiautomatic checking procedure, candidate genes with probable errors in start codon assignment were identified and subjected to manual curation. When sufficiently strong evidence was found, the start codon was reassigned using the manual curation options of HaloLex. For each gene in the list (first column), we provide the corrected (second column) and original (third column) length of the amino-acid sequence, and the set of homologous genes that support our decision for the new start codon assignment (fifth column). The redundant fourth column gives a quick overview of whether sequences were extended or shortened with respect to their original length
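Since the fourth column is derived from the second and third, it can be recomputed as a consistency check. The sketch below uses hypothetical helper names and handles the thousands separators used in the table.

```python
def parse_length(text: str) -> int:
    """Parse a length such as '1,551' that uses a comma as thousands separator."""
    return int(text.replace(",", "").strip())


def direction_of_change(corrected: str, original: str) -> str:
    """Classify a start codon correction by comparing the two lengths."""
    corrected_len = parse_length(corrected)
    original_len = parse_length(original)
    if corrected_len > original_len:
        return "Extended"
    if corrected_len < original_len:
        return "Shortened"
    return "Unchanged"
```

Applied to the first two differing cases in the table, `direction_of_change("1,551", "1,356")` gives `Extended` (rrnAC0004) and `direction_of_change("957", "1,083")` gives `Shortened` (rrnAC0012).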
Fig. 7 Homology-based start codon checking for the detection of ORFs that are too short. A sequence alignment of four homologous proteins of H. salinarum (strains R1 and NRC-1), N. pharaonis, H. walsbyi and H. marismortui is shown. Codes starting with OE are from H. salinarum strain R1, those with VNG from strain NRC-1, NP from N. pharaonis, HQ from H. walsbyi and those starting with rrnAC from H. marismortui. Uppercase letters indicate the protein sequence as obtained from the current database, the first methionine being bold. Lowercase letters indicate additional residues obtained by our correction of the start codon assignment. Residues conserved in all sequences are indicated by asterisks
Fig. 8 Homology-based start codon checking for the detection of ORFs that are too long. A sequence alignment of four homologous proteins of H. salinarum (strains R1 and NRC-1), N. pharaonis, H. walsbyi and H. marismortui is shown. Codes starting with OE are from H. salinarum strain R1, those with VNG from strain NRC-1, NP from N. pharaonis, HQ from H. walsbyi and those starting with rrnAC from H. marismortui. The protein sequences are highly homologous. Residues conserved in all sequences are indicated by asterisks (lower alignment block). Spurious N-terminal sequence extensions are possible in three of the four species, but are considered to be incorrect as they are not homologous to each other (upper alignment block). Uppercase letters indicate the protein sequence as obtained from the current database, the first methionine being bold. The position of the probable initiator methionine in the current database sequence is indicated. Lowercase letters indicate gene extensions that are possible but are considered spurious
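The idea behind Figs. 7 and 8 can be illustrated with a deliberately simplified sketch. It is not the authors' semiautomatic procedure. Given a gapped multiple alignment (rows of equal length, with `-` for gaps), it reports how many residues of each sequence precede the first column shared by all sequences. The function name and the sequence labels in the example are hypothetical.

```python
def n_terminal_overhang(alignment: dict) -> dict:
    """Residues of each aligned sequence that lie before the first
    column in which every sequence has a residue (no gap)."""
    rows = list(alignment.values())
    if not rows:
        return {}
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    # First alignment column without any gap; `width` if there is none.
    first_shared = next(
        (col for col in range(width) if all(row[col] != "-" for row in rows)),
        width,
    )
    return {
        name: sum(1 for ch in row[:first_shared] if ch != "-")
        for name, row in alignment.items()
    }


# Hypothetical toy alignment: one ORF carries three extra N-terminal residues.
example = {"orf_a": "MKLMDE", "orf_b": "---MDE", "orf_c": "---MDE"}
overhangs = n_terminal_overhang(example)
```

A sequence with a large overhang that its homologs lack is a candidate for being too long, and a sequence without an overhang that its homologs share is a candidate for being too short. In either case the decision would still rest on manual inspection of the alignment, as the captions describe.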