
Insight from γC1 protein model for implication in cotton leaf curl disease.

Khuram Shahzad, Abdul Hai, Nadeem Kizilbash, Jawaria Ambreen, Jamal Alruwaili.

Abstract

DNA γ is approximately half the size of Begomovirus DNA. It encodes a γC1 gene that is conserved in position and size. This gene has the capacity to encode a 13 to 14 kDa protein comprising 118 amino acid residues. It has been shown earlier that the γC1 protein is necessary for inducing symptoms of cotton leaf curl disease. The structure of γC1 (CLCuDγ01-Pakistan) is still unknown. Therefore, a model of γC1 (CLCuDγ01-Pakistan) was developed using the DoBo and I-TASSER servers, followed by validation with the PROCHECK and VERIFY 3D servers. The developed model provides insight into the role of this multifunctional protein in causing Cotton Leaf Curl Disease (CLCuD). A possible function of this protein might be the suppression of RNA silencing in cotton plants.


Keywords: Cotton Leaf Curl Disease; DNA γ; RNA silencing; protein structure prediction; threading; γC1 gene

Year: 2013 PMID: 23847402 PMCID: PMC3705618 DOI: 10.6026/97320630009471

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

The family Geminiviridae consists of single-stranded DNA (ssDNA) viruses that infect a wide range of plants and cause serious crop damage. One of the four genera of geminiviruses is Begomovirus [1]. Most members of Begomovirus have a bipartite genome consisting of DNA A and DNA B. DNA A encodes the proteins required for viral DNA replication and encapsidation, whereas DNA B encodes two proteins that are essential for systemic movement. Recently, some whitefly-transmitted begomoviruses have been shown to require a single-stranded DNA satellite (known as DNA γ) to induce the characteristic symptoms of Cotton Leaf Curl Disease (CLCuD) in some hosts [2, 3]. DNA γ (1.3 to 1.4 kb) is approximately half the size of Begomovirus DNA [2, 4]. Sequence analyses have shown that the complementary-sense strand of all DNA γ molecules encodes a γC1 gene that is conserved in sequence, position and length. This gene has the capacity to encode a 13 to 14 kDa protein comprising 118 amino acids [5, 6]. The precise function of DNA γ and the γC1 protein in the pathogenesis of CLCuD is still not fully understood. It has been proposed that DNA γ may play a direct or indirect role in viral DNA replication, in facilitating the movement of viruses or in countering the host defense response [4]. The functions of DNA γ have been shown to be mediated by the complementary-sense gene, γC1. The protein product of the γC1 gene has been shown to act as a suppressor of post-transcriptional gene silencing [5-9]. The DNA γ-encoded protein, γC1, is responsible for both pathogenicity and suppression of gene silencing [10]. In this study, the 3D structure of the CLCuDγ01-Pakistani protein was predicted using the online structure prediction server I-TASSER. The DoBo server was used to predict the domains of the protein. The PROCHECK and VERIFY 3D servers were used to evaluate the predicted γC1 structure. The final 3D model was evaluated in terms of the functional capability of the protein.

Methodology

The amino acid sequence of the CLCuDγ01-Pakistani protein was retrieved from the ExPASy Bioinformatics Resource Portal (http://expasy.org) under UniProtKB accession number Q911I3. The nucleotide sequence was accessed through the EMBL nucleotide sequence database (accession number AJ292769). The nucleotide sequence is 357 base pairs long and encodes a protein of 118 amino acid residues. This primary sequence was used to investigate the structural aspects of the protein. Several structure prediction and evaluation tools were tried (data not shown), but the tools described here were relied upon for greater accuracy. The secondary structure elements were determined using the DoBo [11] and PredictProtein [12] servers (PredictProtein results not shown). For tertiary structure modeling, we used the online tool I-TASSER [13]. This server uses the threading technique to predict 3D models. It generated the five best models based on multiple-threading alignments and iterative template fragment assembly simulations, along with their confidence scores (Figure 1). The five models were visualized with the Visual Molecular Dynamics (VMD) software [14]. The PROCHECK [15] and VERIFY 3D [16] servers were then used to validate the predicted protein structures. PROCHECK generates a Ramachandran plot, which describes the stereochemical configuration of the amino acid residues. VERIFY 3D analyzes the compatibility of an atomic model with its amino acid sequence. Each amino acid residue is assigned a structural class. A collection of structures is used as a reference to obtain a score for each of the 20 amino acids in each structural class. The scores are then plotted for individual residues [17, 18]. Finally, the best model was selected based on the aforementioned tools.
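The sequence lengths quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis. It assumes that the 357 bp count includes the stop codon and uses a rule-of-thumb average residue mass of about 110 Da; both are assumptions made here for illustration, and the function names are hypothetical.

```python
# Rough consistency check on the figures quoted in the text.
# Assumption (not from the paper): average amino acid residue mass ~110 Da.
AVERAGE_RESIDUE_MASS_DA = 110.0


def coding_capacity(n_nucleotides: int, has_stop_codon: bool = True) -> int:
    """Number of amino acids encoded by an open reading frame of this length."""
    if n_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = n_nucleotides // 3
    # Assumption: the final codon is a stop codon and encodes no residue.
    return codons - 1 if has_stop_codon else codons


def approximate_mass_kda(n_residues: int) -> float:
    """Very rough protein mass estimate in kDa from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


residues = coding_capacity(357)
mass_kda = approximate_mass_kda(residues)
print(residues, round(mass_kda, 1))
```

Under these assumptions, 357 bp corresponds to 119 codons, that is 118 residues plus a stop codon, and the estimated mass of roughly 13 kDa is in line with the 13 to 14 kDa range stated in the text.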

Figure 1

The five models for the tertiary structure of the CLCuDγ01-Pakistani protein predicted by the I-TASSER server. In each model, α-helices are shown in purple and β-strands in yellow.

Results & Discussion

The CLCuDγ01-Pakistani protein and Chinese Y10γC1 are distantly related to each other with respect to amino acid sequence. However, when the secondary structure elements of CLCuDγ01-Pak were compared with those of Chinese Y10γC1, structural conservation was observed between the two proteins [19] (Figure 2). Since no template was found for homology modeling of the CLCuDγ01-Pakistani protein, the threading technique was chosen for tertiary structure prediction, which led us to the threading-based I-TASSER server. Five models of the tertiary structure were received from the server (Figure 1). These models were verified using the online PROCHECK and VERIFY 3D servers. The overall percentage representation of the results is shown in Table 1 (see supplementary material). The threading-based method succeeded in predicting the 3D structure of the CLCuDγ01-Pakistani protein and provided five complete models. Models 3 and 5 had good scores in terms of Ramachandran plots and 3D-1D amino acid distributions. Both models passed the percentage representation and averaged 3D-1D score criteria. Model 5 was selected as the better model because a larger percentage of its amino acid residues lie in the core and allowed regions of the Ramachandran plot. None of its amino acid residues were present in the forbidden or disallowed region of the Ramachandran plot, as shown in Table 1 and Figure 3c. The accuracy of the model was also verified using the criteria of the VERIFY 3D server, which checks the compatibility of the 3D structural environment of each amino acid residue with the 1D amino acid sequence [16].
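A Ramachandran plot places each residue according to its backbone torsion angles: phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The sketch below shows how such a torsion angle can be computed from four atomic coordinates. It is an illustration only and not the PROCHECK implementation; the function name and the test coordinates are hypothetical, and no attempt is made to reproduce PROCHECK's region boundaries.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of the outer bonds that lie along the central bond.
    v = tuple(b0[i] - _dot(b0, b1) * b1[i] for i in range(3))
    w = tuple(b2[i] - _dot(b2, b1) * b1[i] for i in range(3))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates, chosen only to exercise the function.
cis_angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans_angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(abs(cis_angle), abs(trans_angle))
```

For a real model, the four points would be the backbone atom coordinates read from the predicted structure file; the resulting (phi, psi) pairs are what validation tools such as PROCHECK assign to the core, allowed, generously allowed or disallowed regions.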

Figure 2

Schematic comparison of the predicted structural elements of the CLCuDγ01-Pakistani protein and the Chinese viral protein Y10γC1 [18]. For γC1 (Pakistan), the β-strands and α-helices are shown by yellow arrows and purple bars, respectively. For the Chinese Y10γC1 sequence, the β-strands and α-helices are shown by green arrows and yellow bars, respectively. The part of the amino acid sequence where structural differences are found between the two proteins is outlined by a box.

The selected model 5 showed structural conservation between the CLCuDγ01-Pakistani protein and Chinese Y10γC1 for two α-helices located near the C-terminus (Figure 2). These α-helices have been implicated in the multimerization of the protein [19]. Amino acid sequence analysis showed that CLCuDγ01-Pak lacks cysteine residues. Therefore, a zinc finger DNA-binding domain (Cys-His motif) is missing from the protein structure [20, 21, 22]. However, a Histidine-based DNA-binding domain might still be present that may allow the protein to bind single-stranded or double-stranded DNA without size or sequence specificity and to suppress host RNA silencing activity, as observed in the Chinese strain [8]. The exact location of such a domain in the CLCuDγ01 sequence still needs experimental verification. The secondary structure elements of the CLCuDγ01-Pakistani protein (Figure 3a) comprise three α-helices and five β-strands. Three of the β-strands (β1, β2 and β3) are at the N-terminus, the fourth β-strand (β4) is located almost in the middle of the structure and the fifth β-strand (β5) is near the C-terminus. All three α-helices (α1, α2 and α3) are located in the middle of the protein structure. Five Glycine residues are present in the secondary structure elements. Glycine 5 is in the middle part of β1; Glycine 10 is at the beginning of β2; and Glycine 44 and Glycine 64 are in α1 and α2, respectively. Only one Proline residue is present in the secondary structure elements (Proline 109 in β5). Glycines and Prolines can both produce a "kink" in α-helices and β-strands. The amino acid residues Alanine, Aspartic acid, Glutamic acid, Isoleucine, Leucine and Methionine favor the formation of α-helices. The three α-helices of the CLCuDγ01-Pakistani protein contain a total of 31 amino acid residues, of which 18 belong to the types that favor α-helix formation.
The secondary structure elements were further compared with those of the Chinese viral protein Y10γC1 [19] (Figure 2). The two proteins were found to be comparable. The only region that differs between them lies between amino acids 35 and 55, where an extra α-helix and a β-strand are present in the CLCuDγ01-Pakistani protein.
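The count of helix-favoring residues reported above (18 of 31) can be reproduced for any helix segment with a simple tally. The sketch below is illustrative only: the set of favoring residues is taken from the list in the text (A, D, E, I, L, M), but the helix segments are hypothetical placeholders, since the actual helix sequences are not reproduced here.

```python
# One-letter codes for the residues listed in the text as favoring alpha-helices:
# Alanine, Aspartic acid, Glutamic acid, Isoleucine, Leucine, Methionine.
HELIX_FAVORING = frozenset("ADEILM")


def count_helix_favoring(segment: str) -> int:
    """Count residues in a helix segment that belong to the favoring set."""
    return sum(1 for residue in segment.upper() if residue in HELIX_FAVORING)


# Hypothetical helix segments, NOT the real CLCuD sequence.
hypothetical_helices = ["AELLKMG", "DIEALQ"]
total = sum(len(helix) for helix in hypothetical_helices)
favoring = sum(count_helix_favoring(helix) for helix in hypothetical_helices)
print(favoring, total)

# Fraction reported in the text for the three predicted helices.
reported_fraction = 18 / 31
print(round(reported_fraction * 100, 1))
```

With the reported numbers, about 58% of the residues in the three predicted helices are of a helix-favoring type.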

Figure 3

(a) Amino acid sequence and secondary structure elements of the CLCuDγ01-Pakistani protein. The consensus secondary structure elements were determined using the DoBo, PredictProtein and I-TASSER servers. Highlighted in red are amino acids that are highly conserved among Malvaceous γ satellites; those in yellow are highly conserved among all γ satellites; arrows indicate the positions of Glycine and Proline residues in the secondary structure elements. (b) Model 5 for the tertiary structure of the γC1 protein predicted by the I-TASSER server. The α-helices are shown in purple and the β-strands in yellow. N- and C-terminal residues are colored red. (c) Ramachandran plot of the predicted γC1 protein model showing the values of the Psi and Phi angles.

The predicted tertiary structure of the CLCuDγ01-Pakistani protein shows three α-helices and five β-strands. Four of the β-strands are arranged antiparallel to each other (Figure 3b). According to the results provided by the DoBo server, the CLCuDγ01-Pakistani protein contains two domains: an N-terminal domain, which stretches from amino acids 1 to 51 and contains three β-strands and one α-helix, and a C-terminal domain, which stretches from amino acids 55 to 118 and contains one β-strand and two α-helices. Both domains are critical for the functioning of the protein [10]. Figure 3 shows the distribution of secondary structure elements across the primary sequence (a), the predicted 3D model (b) and the Ramachandran plot of the amino acid distribution (c). PROCHECK was used to analyze the stereochemical quality of the predicted structure. According to the PROCHECK results, the fifth model (Figure 3b) appears the most appropriate because most of its amino acid residues lie in the core and allowed regions (92.2%), while only 4.7% are found in the generously allowed region of the Ramachandran plot (Figure 3c). The quality was further assessed with VERIFY 3D. Both the PROCHECK and VERIFY 3D results are shown in Table 1 (see supplementary material). The molecular basis of the pathogenicity of γC1 can be explained by the suppression of RNA silencing activity. RNA silencing is a surveillance system that exists in many organisms under different names but with the same underlying mechanism, e.g., RNA interference in animals, quelling in fungi and post-transcriptional gene silencing in plants [22, 23]. RNA silencing, whether triggered or inhibited by different viruses, plays an important antiviral role in eukaryotes such as animals and plants.
Some viruses have evolved or acquired functional proteins that suppress RNA silencing by targeting different steps of the silencing pathways [7, 24, 25]. This role has only recently been elucidated for the γC1 protein of tomato yellow leaf curl China betasatellite (TYLCCNB), which forms a multimeric complex with the help of cysteine residues for its proper functioning [18]. However, such a role has not yet been demonstrated experimentally for CLCuDγ01. γC1 is also involved in other functions, such as virus movement in host plants, DNA binding and post-transcriptional gene silencing [26]. It is known that expression of γC1 interferes with local silencing in transient Agrobacterium-based assays. The γC1 protein targets different stages of the silencing process by overlapping the miRNA. It binds both single-stranded and double-stranded DNA in a non-specific manner without the presence of a zinc finger domain [8]. γC1 fusion proteins have been shown to localize primarily to the nucleus in both insect and plant cells. They require a nuclear localization sequence (NLS) to enter the nucleus [8]. Although comparable with other Begomovirus proteins such as AL2/AC2 with respect to size, DNA-binding properties and nuclear localization, γC1 lacks the zinc finger motif and shares little or no sequence homology with these proteins [27]. The predicted structure can help virologists pinpoint the major regions of interaction between plant hosts and viruses and so uncover the interaction mechanism. Future research on predicting the binding sites will also help uncover the possible mechanisms of RNA silencing and several other functions. Knowledge of the structural aspects of this protein should make it easier to target the disease-causing viruses, for example by introducing novel drugs into plants or by using gene therapy techniques in plants, to eliminate the effects of the disease and to increase potential economic growth at the industrial level.

Conclusion

The secondary and tertiary structures of the CLCuDγ01-Pakistani protein associated with Cotton Leaf Curl Disease (CLCuD) have been predicted using in silico methodologies. The novel aspects of the protein structure have been highlighted using the available literature. Comparison of the secondary structure elements with those of the Chinese viral protein Y10γC1 revealed that the two proteins are structurally similar despite their sequence dissimilarities. The only difference is in the region between amino acids 35 and 55, where an extra α-helix and a β-strand are present in the CLCuDγ01-Pakistani protein. These structural aspects can be used further to unravel the potential role of this disease-causing protein inside the host. Future work on protein-protein interactions, DNA binding and RNA interference will require specialized approaches to investigate the structural and functional aspects of this protein and to control its role in the spread of CLCuD.

28 in total

1. Characterization of DNAbeta associated with begomoviruses in China and evidence for co-evolution with their cognate viral DNA-A.

Authors: Xueping Zhou; Yan Xie; Xiaorong Tao; Zhongkai Zhang; Zhenghe Li; Claude M Fauquet
Journal: J Gen Virol Date: 2003-01 Impact factor: 3.891

Review 2. Plant viral suppressors of RNA silencing.

Authors: Braden M Roth; Gail J Pruss; Vicki B Vance
Journal: Virus Res Date: 2004-06-01 Impact factor: 3.303

3. βC1 encoded by tomato yellow leaf curl China betasatellite forms multimeric complexes in vitro and in vivo.

Authors: Xiaofei Cheng; Xiaoqiang Wang; Jianxiang Wu; Rob W Briddon; Xueping Zhou
Journal: Virology Date: 2010-10-28 Impact factor: 3.616

Review 4. Silencing suppression by geminivirus proteins.

Authors: David M Bisaro
Journal: Virology Date: 2006-01-05 Impact factor: 3.616

Review 5. Subviral agents associated with plant single-stranded DNA viruses.

Authors: R W Briddon; J Stanley
Journal: Virology Date: 2006-01-05 Impact factor: 3.616

6. Plants expressing tomato golden mosaic virus AL2 or beet curly top virus L2 transgenes show enhanced susceptibility to infection by DNA and RNA viruses.

Authors: G Sunter; J L Sunter; D M Bisaro
Journal: Virology Date: 2001-06-20 Impact factor: 3.616

7. The tomato golden mosaic virus transactivator (TrAP) is a single-stranded DNA and zinc-binding phosphoprotein with an acidic activation domain.

Authors: M D Hartitz; G Sunter; D M Bisaro
Journal: Virology Date: 1999-10-10 Impact factor: 3.616

8. A Begomovirus DNAbeta-encoded protein binds DNA, functions as a suppressor of RNA silencing, and targets the cell nucleus.

Authors: Xiaofeng Cui; Guixin Li; Daowen Wang; Dongwei Hu; Xueping Zhou
Journal: J Virol Date: 2005-08 Impact factor: 5.103

9. A unique virus complex causes Ageratum yellow vein disease.

Authors: K Saunders; I D Bedford; R W Briddon; P G Markham; S M Wong; J Stanley
Journal: Proc Natl Acad Sci U S A Date: 2000-06-06 Impact factor: 11.205

10. I-TASSER server for protein 3D structure prediction.

Authors: Yang Zhang
Journal: BMC Bioinformatics Date: 2008-01-23 Impact factor: 3.169