Durga B Kuchibhatla, Westley A Sherman, Betty Y W Chung, Shelley Cook, Georg Schneider, Birgit Eisenhaber, David G Karlin.
Abstract
The genome sequences of new viruses often contain many "orphan" or "taxon-specific" proteins that apparently lack homologs. However, because viral proteins evolve very fast, commonly used sequence similarity detection methods such as BLAST may overlook homologs. We analyzed a data set of proteins from RNA viruses characterized as "genus specific" by BLAST. More powerful, recently developed methods such as HHblits or HHpred (available through user-friendly web interfaces) could detect distant homologs of a quarter of these proteins, suggesting that these methods should be used to annotate viral genomes. In-depth manual analyses of a subset of the remaining sequences, guided by contextual information such as taxonomy, gene order, or domain co-occurrence, identified distant homologs of another third. Thus, a combination of powerful automated methods and manual analyses can uncover distant homologs of many proteins thought to be orphans. We expect these methodological results to apply also to cellular organisms, since they generally evolve much more slowly than RNA viruses. As an application, we reanalyzed the genome of a bee pathogen, Chronic bee paralysis virus (CBPV). We identified homologs of most of its proteins thought to be orphans; in each case, identifying homologs provided functional clues. We discovered that CBPV encodes a domain homologous to the Alphavirus methyltransferase-guanylyltransferase; a putative membrane protein, SP24, with homologs in unrelated insect viruses and insect-transmitted plant viruses that have different morphologies (cileviruses, higreviruses, blunerviruses, negeviruses); and a putative virion glycoprotein, ORF2, also found in negeviruses. SP24 and ORF2 are probably major structural components of the virions.
Year: 2013 PMID: 24155369 PMCID: PMC3911697 DOI: 10.1128/JVI.02595-13
Source DB: PubMed Journal: J Virol ISSN: 0022-538X Impact factor: 5.103
Capacities of the methods tested to detect homologs at different taxonomic depths
Values are the % of sequences for which homologs were found with the following taxonomic distribution.
| Algorithm | At most 1 genus (%) | >1 genus (%) | >1 family (%) |
|---|---|---|---|
| BLAST | 100 | 0 | 0 |
| PSI-BLAST | 94 | 6 | 2.6 |
| HHblits | 81.5 | 18.5 | 8.3 |
| HHpred | 80.1 | 19.9 | 14.2 |
| All combined | 74.6 | 25.4 | 14.2 |
The total of each row can exceed 100% because "beyond genus level" (>1 genus) includes "beyond family level" (>1 family). Likewise, the sum of the values of the individual algorithms in a column can exceed the "All combined" value because, for some proteins, distant homologs were detected by several algorithms. Percentages were calculated from a total of 351 sequences.
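The footnote's accounting can be checked with a short sketch. Table 1 reports only percentages, so the script below converts them back into approximate numbers of sequences out of the stated total of 351. These rounded counts are our own reconstruction, not figures reported by the authors, and the names `table1`, `approx_count`, and `counts` are hypothetical. The script checks that the first two columns partition each row, that ">1 family" is nested inside ">1 genus", and shows why adding up the individual algorithms overshoots "All combined".

```python
# Sketch: rebuild approximate sequence counts from the percentages in Table 1.
# Assumption: counts are obtained by rounding percent * 351 / 100; the paper
# reports percentages only, so these counts are estimates, not published data.

TOTAL_SEQUENCES = 351

# Percentages copied from Table 1: (at most 1 genus, >1 genus, >1 family)
table1 = {
    "BLAST": (100.0, 0.0, 0.0),
    "PSI-BLAST": (94.0, 6.0, 2.6),
    "HHblits": (81.5, 18.5, 8.3),
    "HHpred": (80.1, 19.9, 14.2),
    "All combined": (74.6, 25.4, 14.2),
}


def approx_count(percent: float, total: int = TOTAL_SEQUENCES) -> int:
    """Convert a percentage of `total` into the nearest whole number of sequences."""
    return round(percent * total / 100)


counts = {}
for method, (within_genus, beyond_genus, beyond_family) in table1.items():
    # "At most 1 genus" and ">1 genus" are complementary, so they sum to 100% ...
    assert abs(within_genus + beyond_genus - 100.0) < 0.05, method
    # ... while ">1 family" is a subset of ">1 genus" and is not added to the row total.
    assert beyond_family <= beyond_genus, method
    counts[method] = tuple(approx_count(p) for p in (within_genus, beyond_genus, beyond_family))

# Adding up the individual algorithms double-counts proteins found by several methods,
# so this sum exceeds the "All combined" value.
individual = ("PSI-BLAST", "HHblits", "HHpred")
summed_beyond_genus = sum(counts[m][1] for m in individual)

for method, (within, beyond_g, beyond_f) in counts.items():
    print(f"{method:>12}: <=1 genus {within:3d}, >1 genus {beyond_g:3d}, >1 family {beyond_f:3d}")
print(f"Sum of individual '>1 genus' counts: {summed_beyond_genus}")
print(f"'All combined' '>1 genus' count: {counts['All combined'][1]}")
```

With this rounding, "All combined" corresponds to roughly 89 of 351 proteins with homologs beyond the genus level (the "quarter" quoted in the Abstract), whereas adding the PSI-BLAST, HHblits, and HHpred counts gives about 156, because proteins detected by more than one algorithm are counted more than once.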
FIG 1 Capacities of the methods tested to detect homologs at different taxonomic depths. Shown are the proportions of proteins classified as genus restricted by BLAST and found by the different similarity search methods to have homologs beyond the genus level and beyond the family level. Precise values are in Table 1, columns 2 and 3.
In-depth analysis, using contextual information, of a random subset of proteins classified as genus restricted by all automated methods
| Accession no. | Protein name | Taxonomic position (family, genus, species) | Taxonomic distribution found by automated methods | Taxonomic distribution found by in-depth manual analysis | Type of protein (reference) | Evidence |
|---|---|---|---|---|---|---|
| | Large coat protein | | 2 species | >40 families | Contains two domains of capsids with a jellyroll fold (PFAM clan Viral_ssRNA_CP) ( | Marginal HHpred hits (E = 0.28 for region 241–308 and E = 7.5 for region 43–159) to PFAM family RhV, HHalign comparison with RhV alignment (E = 5 × 10⁻²), functional confirmation ( |
| | Coat protein | | 1 species | >40 families | Capsid with a jellyroll fold (PFAM clan Viral_ssRNA_CP) ( | Marginal HHpred hit (E = 0.2) to PFAM family Viral_coat for region 67–181, HHalign comparison with Viral_coat (E = 1.4 × 10⁻³), functional confirmation ( |
| | 6K2 | | 4 species | 4 genera in | 6K2 ( | Subsignificant CSI-BLAST hits to other 6K2 proteins (also located between the CI protein and the VPg protein), significant HHalign comparison between full-length 6K2 of |
| | Putative matrix protein M | | 1 species | At least 1 whole genus ( | Membrane protein ( | Subsignificant CSI-BLAST hits of M proteins of flaviviruses to |
| | VPg | Unclassified, | 1 species | 1 whole genus ( | Genome-linked protein VPg ( | CSI-BLAST finds significant full-length matches to the VPg of many sobemoviruses; further iterative sequence searches identify the VPg of all other sobemoviruses as homologs; all have the same position within the 2a/2b polyprotein (downstream of the serine protease domain) |
| | P9 | | 1 genus | 1 genus | P9 ( | |
| | Hypothetical protein p34 | | 1 species | 2 species | Endonuclease | HHpred hit to RNase Dicer |
| | p22 | | 1 species | 1 species | | |
| | p5 | | 1 species | 1 species | | |
| | Hypothetical peptide | | 1 species | 1 species | | |
p34 has homologs in only two viral species but is homologous to a vast family of RNases from cellular organisms and thus most probably originated by horizontal transfer, which is beyond the scope of this study (see Materials and Methods and Discussion).
PDB, Protein Data Bank.
Accession numbers of ORFs of chronic bee viruses and homologous ORFs
| Genus and species or products and host species | Protein name | Accession no. |
|---|---|---|
| | ORF1 from RNA1 | |
| | ORF2 from RNA2 | |
| | ORF3 from RNA2 (SP24) | |
| | ORF1 | Being submitted |
| | ORF2 | Being submitted |
| | ORF1 | |
| | ORF1 | |
| | p24 | |
| | p24 | |
| | p24 | |
| | p23 | |
| | ORF2 | |
| | ORF3 | |
| | ORF2 | |
| | ORF3 | |
| | ORF2 | |
| | ORF3 | |
| | ORF3 | |
| | ORF2 | |
| | ORF2 | |
| | ORF2 | |
| Cellular proteins | | |
| | IP15837p | |
| | Hypothetical nonconserved protein | |
Proposed genus (P. Blanchard, personal communication).
Proposed genus (this study).
Proposed genus (89).
May be an endogenous viral protein (see text).
FIG 2 The SP24 family of virion membrane proteins of insect viruses. The boundaries of the predicted TM segments are approximate. We assumed that the topology of SP24 was conserved in all of the viruses, but in chroparaviruses (first two sequences), TM segment 2 is less hydrophobic and thus may simply be membrane associated, which would generate a different overall topology. The N- and C-terminal regions have no detectable sequence similarity and are shown for information only; whether they are homologous is unknown. Predicted N-glycosylation sites are indicated for the N- and C-terminal regions only. Actual N-glycosylation can occur only if these regions are on the outside of the virion, which we cannot reliably predict (see text).
FIG 3 The viral ORF2 glycoproteins of chroparaviruses and negeviruses. (A) Alignment of the cysteine-rich region of ORF2 sequences of chroparaviruses and negeviruses. Conventions are the same as in Fig. 2. The conserved cysteines, predicted to form disulfide bridges, are indicated by asterisks. (B) Predicted organization of ORF2. We make no claim to predict disulfide connectivity accurately. Other disulfide bridges are likely to occur elsewhere in ORF2 but are not conserved across taxa (see text). tm1 and tm2, TM segments 1 and 2, respectively.
FIG 4 Comparison of SP24 and the ORF2 glycoprotein in different viruses. The genomic contexts of SP24 in insect viruses are shown. Genomes and proteins are approximately to scale. The names of proteins that have significant sequence similarity (and are thus demonstrably homologs) are in bold.
Virion morphology of viruses encoding an SP24 matrix protein
| Genus | Type species | Morphology | Host(s) | Reference(s) |
|---|---|---|---|---|
| | | Ellipsoidal, different populations (220 by 41, 54, or 64 nm), treatment with acid or base solutions results in more rounded, apparently empty shells (20–30 by 20–50 nm) | Insects (bees, mosquitoes) | |
| | | Short, membrane-bound, enveloped, bacilliform particles (40–50 by 80–120 nm) | Plants (citrus), insects (erythrophyte mites) | |
| | | Short, bacilliform particles (30 by 50 nm) | Plant (hibiscus), insects (erythrophyte mites) | |
| | | Unknown | Plant (blueberry), probably transmitted by erythrophyte mites | |
| | | Spherical, enveloped particles (45–55 nm) | Insects (mosquitoes, phlebotomine sand flies) | |