Pawel Szczesny, Andrei Lupas.
Abstract
MOTIVATION: Trimeric autotransporter adhesins (TAAs), such as Yersinia YadA, Neisseria NadA, Moraxella UspAs, Haemophilus Hia and Bartonella BadA, are important pathogenicity factors of proteobacteria. Their high sequence diversity and distinct mosaic-like structure lead to difficulties in the annotation of their sequences. These stem from the large number of short repeats, the presence of compositionally unusual coiled coils, fuzzy domain boundaries and regions of seemingly low sequence complexity.
Year: 2008 PMID: 18397894 PMCID: PMC2373917 DOI: 10.1093/bioinformatics/btn118
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic architecture of TAAs and organization of their domains. The names of three domain types, Ylhead, HIM and ISneck, stand for YadA-like head, head insert motif and insertion sequence neck, respectively. Other abbreviated domain names (GIN, FxG, HANS, DALL and FGG) come from prominent residue patterns in those domains.
Fig. 2. Frequencies of amino acids in TAAs compared to the non-redundant (nr) database. The arrow denotes increasing side-chain size. The values plotted, given as (nr, TAA), are: G (7.0%, 12.2%), A (8.6%, 13.5%), S (7.1%, 11.4%), C (1.4%, 0%), D (5.3%, 5.5%), P (5.0%, 1.5%), N (4.1%, 8.3%), T (5.5%, 10.2%), E (6.2%, 2.9%), V (6.6%, 7.4%), Q (4.1%, 3.3%), H (2.2%, 0.8%), M (2.3%, 0.9%), I (5.7%, 4.8%), L (9.8%, 6.1%), K (5.2%, 5.0%), R (5.7%, 1.9%), F (3.9%, 1.6%), Y (2.9%, 1.9%) and W (1.2%, 0.4%).
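One way to read these numbers is as per-residue enrichment ratios. The sketch below is not part of the original analysis: it computes a log2 ratio of TAA to nr frequency from the caption values. The function names and the pseudocount of 0.1 percentage points (needed because cysteine is listed at 0% in TAAs) are assumptions made for illustration.

```python
import math

# Percent frequencies copied from the Fig. 2 caption: residue -> (nr, TAA).
FREQ = {
    "G": (7.0, 12.2), "A": (8.6, 13.5), "S": (7.1, 11.4), "C": (1.4, 0.0),
    "D": (5.3, 5.5), "P": (5.0, 1.5), "N": (4.1, 8.3), "T": (5.5, 10.2),
    "E": (6.2, 2.9), "V": (6.6, 7.4), "Q": (4.1, 3.3), "H": (2.2, 0.8),
    "M": (2.3, 0.9), "I": (5.7, 4.8), "L": (9.8, 6.1), "K": (5.2, 5.0),
    "R": (5.7, 1.9), "F": (3.9, 1.6), "Y": (2.9, 1.9), "W": (1.2, 0.4),
}


def log2_enrichment(nr, taa, pseudo=0.1):
    """log2 of the TAA/nr frequency ratio.

    The pseudocount (an assumption, not from the paper) avoids log(0)
    for residues that are absent from TAAs.
    """
    return math.log2((taa + pseudo) / (nr + pseudo))


def ranked_enrichment(freq=FREQ):
    """Return (residue, log2 ratio) pairs, most enriched in TAAs first."""
    rows = [(aa, log2_enrichment(nr, taa)) for aa, (nr, taa) in freq.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for aa, score in ranked_enrichment():
        print(f"{aa}\t{score:+.2f}")
```

With this pseudocount, asparagine ranks as the most enriched residue and cysteine as the most depleted. This agrees with the caption values, which show small and polar residues (G, A, S, N, T) over-represented in TAAs.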
Annotated domains of TAAs, divided into functional classes. Where the structure of a domain is known, the Protein Data Bank accession code is given in square brackets.
| Domain name | Structure | Prediction | PFAM |
|---|---|---|---|
| Class: signal peptide | | | |
| Autotransporter signal peptide | – | HMMER | – |
| Class: head | | | |
| Ylhead | Beta-solenoid [1p9h] | HMMER/rules | Hep_Hag |
| HIM1, HIM2 | Insert within last Ylhead repeat before the neck | HMMER | – |
| Trp-ring | Beta-meander [1s7m] | HMMER | – |
| GIN | Beta-prism [1s7m] | HMMER | – |
| FxG | Unknown | HMMER | X_fast-SP-rel |
| Class: stalk | | | |
| Right-handed stalks | Right-handed coiled coil | HMMER | – |
| Left-handed stalks | Left-handed coiled coil | Marcoil/rules | DUF1079 |
| FGG | Coiled-coil variant | HMMER/rules | – |
| Class: connector | | | |
| Neck | Loop [1p9h] | HMMER | HIM |
| HANS | Unknown | HMMER/rules | – |
| DALL1, 2, 3 | Unknown | HMMER | – |
| ISneck1 | Neck with an inserted domain [1s7m] | HMMER | – |
| ISneck2 | Neck with a short insertion | HMMER | – (HIM) |
| Class: anchor | | | |
| Membrane anchor | Beta-barrel occluded by coiled coil | HMMER | YadA |
(a) DUF1079 is a motif in the left-handed stalks of Moraxella UspA1 and UspA2.
(b) These domains were added as a result of the manual benchmark annotation and were not used for benchmarking.
(c) The PFAM HIM definition annotates only the first half of this motif.
Fig. 3. Details of daTAA and PFAM performance in comparison with manual annotation. The two sets of sequences are as described in the Methods. In each group, the first box denotes the PFAM annotation, the second daTAA and the third the manual annotation. Matches are colored according to their functional class: autotransporter signal peptide, blue; heads, red; connectors, green; stalks, yellow; anchor, grey.
Set A: 1. gi|153095004| Mannheimia haemolytica PHL213; 2. gi|149190224| Vibrio shilonii AK1; 3. gi|154149446| Campylobacter hominis ATCC BAA-381; 4. gi|153834639| Vibrio harveyi HY01; 5. gi|153093295| Mannheimia haemolytica PHL213; 6. gi|150380584| Shewanella sediminis HAW-EB3; 7. gi|148827620| Haemophilus influenzae PittGG; 8. gi|154149537| Campylobacter hominis ATCC BAA-381; 9. gi|149909020| Moritella sp. PE36.
Set B: 1. gi|78061293| Burkholderia sp. 383; 2. gi|161017094| Bartonella tribocorum CIP 105476; 3. gi|161505469| Salmonella enterica subsp. arizonae; 4. gi|156124985| Acinetobacter venetianus; 5. gi|157145682| Citrobacter koseri ATCC BAA-895; 6. gi|155199120| Escherichia coli; 7. gi|86750771| Rhodopseudomonas palustris HaA2; 8. gi|85709253| Erythrobacter sp. NAP1; 9. gi|162429157| Methylobacterium nodulans ORS 2060.
Coverage of annotation against total number of residues in the testing sets
| Domain | Set A: daTAA (%) | Set A: PFAM (%) | Set A: manual (%) | Set B: daTAA (%) | Set B: PFAM (%) | Set B: manual (%) |
|---|---|---|---|---|---|---|
| Domains common for PFAM and daTAA | | | | | | |
| Ylhead | 18 | 17 | 20 | 13 | 12 | 14 |
| Neck | 4 | 5 | 5 | 8 | 11 | 10 |
| MemAnch | 10 | 6 | 10 | 6 | 2 | 6 |
| FxG | – | – | – | 1 | 1 | 1 |
| Domains not present in PFAM | 13 | – | 18 | 27 | – | 36 |
| Coiled coils | 5 | – | 3 | 11 | – | 25 |
| Total | 50 | 28 | 56 | 66 | 26 | 92 |
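Coverage here is the share of residues in a test set that fall inside at least one annotated domain. The source does not say how overlapping matches were counted, so the following is only a minimal sketch: it assumes each residue is counted once. The function name and the input layout (1-based inclusive start/end pairs keyed by a sequence identifier) are hypothetical.

```python
def coverage_percent(seq_lengths, annotations):
    """Percent of residues covered by at least one annotated span.

    seq_lengths: dict mapping sequence id -> sequence length in residues
    annotations: dict mapping sequence id -> list of (start, end) spans,
                 1-based and inclusive (hypothetical input layout)
    Overlapping spans are merged so no residue is counted twice
    (an assumption; the source does not describe overlap handling).
    """
    total = sum(seq_lengths.values())
    covered = 0
    for spans in annotations.values():
        last_end = 0  # last residue already counted on this sequence
        for start, end in sorted(spans):
            start = max(start, last_end + 1)  # skip residues already counted
            if end >= start:
                covered += end - start + 1
                last_end = end
    return 100.0 * covered / total


if __name__ == "__main__":
    lengths = {"seqA": 100, "seqB": 100}
    domains = {"seqA": [(1, 20), (11, 30)], "seqB": [(51, 70)]}
    print(coverage_percent(lengths, domains))
```

In the usage example, the two overlapping spans on `seqA` cover residues 1 to 30 and the span on `seqB` covers 20 residues, so 50 of 200 residues are annotated.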
Fig. 4. Comparison of the annotation of consecutive repeats in the YadA head. The first line (green) denotes the PFAM annotation, the second line (red) daTAA. The third line shows the sequence of the YadA head, with the conserved SVAIG motifs in bold and underlined.
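A conserved motif such as SVAIG gives a quick way to check how repeats are spaced in a head sequence. The sketch below only locates exact motif matches and reports the distances between them; it is not how daTAA or PFAM annotate repeats (both rely on profile models). The function names are hypothetical, and the toy sequence is made up for illustration and is not the YadA head.

```python
import re


def motif_starts(sequence, motif="SVAIG"):
    """Return 1-based start positions of every exact occurrence of motif."""
    return [m.start() + 1 for m in re.finditer(re.escape(motif), sequence)]


def repeat_spacings(starts):
    """Distances between consecutive motif starts (rough repeat lengths)."""
    return [b - a for a, b in zip(starts, starts[1:])]


# Made-up toy sequence for illustration only; NOT the YadA head sequence.
toy = "GENSVAIGAGAKATGSNSVAIGYNSEASGTDSVAIGKDA"

if __name__ == "__main__":
    starts = motif_starts(toy)
    print(starts, repeat_spacings(starts))
```

Exact matching will miss degenerate copies of the motif, which is one reason profile-based methods are used for the real annotation.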
Fig. 5. Screenshot of the daTAA results page. (A) Overview of repetitive domains. The graph is obtained by an implementation of a method for the visualization of repeats in strings (Wattenberg, 2002). (B) Picture with domain bubbles, coiled-coil prediction and schematic representation of the anticipated fiber width. (C) Tooltip with a Marcoil coiled-coil prediction and polar core motifs.