Plant-PrAS: a database of physicochemical and structural properties and novel functional regions in plant proteomes.

Atsushi Kurotani¹, Yutaka Yamada², Kazuo Shinozaki², Yutaka Kuroda³, Tetsuya Sakurai⁴.

Abstract

Arabidopsis thaliana is an important model species for studies of plant gene functions. Research on Arabidopsis has resulted in the generation of high-quality genome sequences, annotations and related post-genomic studies. The amount of annotation, such as gene-coding regions and structures, is steadily growing in the field of plant research. In contrast to the genomics resource of animals and microorganisms, there are still some difficulties with characterization of some gene functions in plant genomics studies. The acquisition of information on protein structure can help elucidate the corresponding gene function because proteins encoded in the genome possess highly specific structures and functions. In this study, we calculated multiple physicochemical and secondary structural parameters of protein sequences, including length, hydrophobicity, the amount of secondary structure, the number of intrinsically disordered regions (IDRs) and the predicted presence of transmembrane helices and signal peptides, using a total of 208,333 protein sequences from the genomes of six representative plant species, Arabidopsis thaliana, Glycine max (soybean), Populus trichocarpa (poplar), Oryza sativa (rice), Physcomitrella patens (moss) and Cyanidioschyzon merolae (alga). Using the PASS tool and the Rosetta Stone method, we annotated the presence of novel functional regions in 1,732 protein sequences that included unannotated sequences from the Arabidopsis and rice proteomes. These results were organized into the Plant Protein Annotation Suite database (Plant-PrAS), which can be freely accessed online at http://plant-pras.riken.jp/.

Keywords: Database; Gene function; Physicochemical property; Plant protein; Protein property

Substances：
Plant Proteins
Proteome

Year: 2014 PMID： 25435546 PMCID： PMC4301743 DOI： 10.1093/pcp/pcu176

Source DB: PubMed Journal: Plant Cell Physiol ISSN： 0032-0781 Impact factor: 4.927

Introduction

The flowering plant Arabidopsis has a small genome and a short life cycle. Therefore, it is considered an important model plant. After the whole-genome sequence of Arabidopsis was published in 2000 (Arabidopsis Genome Initiative 2000), the information related to Arabidopsis research was organized into The Arabidopsis Information Resource (TAIR; http://arabidopsis.org/), comprising various types of data such as DNA and seed stocks, literature citations, gene functions and protein structures (Lamesch et al. 2012). Nevertheless, one-third of all the proteins of Arabidopsis still lack functional annotations in terms of biological roles (Kourmpetis et al. 2011, Li et al. 2012) in spite of the extensive experimental and computational studies undertaken by many researchers. Similarly, the whole-genome sequencing of rice, one of the most important model crop plants, was recently completed (International Rice Genome Sequencing Project 2005, Yu et al. 2002). Subsequently, all the functional annotations for proteins and non-coding RNAs (ncRNAs) were manually curated (Rice Annotation Project 2007). The genome and the functional gene annotations of rice have been updated in the Michigan State University Rice Genome Annotation Project database (MSU Rice; http://rice.plantbiology.msu.edu/) (Kawahara et al. 2013) and in the Rice Annotation Project database (RAP-DB; http://rapdb.dna.affrc.go.jp/) (Sakai et al. 2013). Those annotations, however, also include information on genes with insufficient experimental evidence. Thus, Arabidopsis thaliana and rice, two well-studied plant species, still harbor unannotated genes.

In order to improve functional annotation of genes in plants, various initiatives have been undertaken, such as inclusion of an experimental method that uses cross-species expressed sequence tag (EST) information (Chen et al. 2007), integration of plant genomic information (Asamizu et al. 2014), integration of Arabidopsis transcriptomic information (Obayashi et al. 2014), utilization of transcriptomic and metabolic profiles among plant tissues (Sakurai et al. 2013), integrative analysis of plant hormone accumulation and gene expression among rice tissues (Kudo et al. 2013), inclusion of the phenotypic information on mutant Arabidopsis lines (Sakurai et al. 2011, Myouga et al. 2013, Akiyama et al. 2014), inclusion of experimental and computational methods using gene expression data and experimentally derived (or predicted) protein–protein interactions (Kourmpetis et al. 2011), and inclusion of similarity clustering among protein sequences in the SALAD database (http://salad.dna.affrc.go.jp/salad/) (Mihara et al. 2010).

To improve the annotations further, we attempted to utilize the proteome information. In this study, we adopted a new method, which we use to study predicted secondary structures and functions of proteins to make plant gene annotations easier to understand. Because proteins possess specific structures and functions, obtaining this information helps us to elucidate the corresponding gene functions. Here, we report analyses of multiple physicochemical and secondary structural parameters of whole-protein sequences obtained from representative data sets of six plant species, A. thaliana, Glycine max (soybean), Populus trichocarpa (poplar), Oryza sativa (rice), Physcomitrella patens (moss) and Cyanidioschyzon merolae (alga). The genome sequences of these six species have been completely determined previously. We propose new annotations for the predicted functional regions corresponding to the unannotated genes of Arabidopsis and rice. We also developed the Plant-PrAS (Plant Protein Annotation Suite) database, which includes the annotations generated in this study.

Results and Discussion

Protein sequence sets

We prepared non-redundant sequence sets from the whole-protein sequences, using the procedure that was described in our previous study (Kurotani et al. 2014). Protein sequences with length ranging from 50 to 2,000 amino acid residues were extracted from the databases for analysis in this study. Redundant data in these protein sequences were removed using the OrthoMCL software (Chen et al. 2006). The final filtered proteomes contained 26,326, 34,972, 35,791, 40,087, 35,908, 30,654 and 4,595 non-redundant protein sequences corresponding to Arabidopsis, soybean, poplar, MSU Rice, RAP-DB (rice), moss and algae, respectively. In addition, 20,572 and 6,216 non-redundant protein sequences of the mouse and yeast, respectively, were also prepared as a reference (mammals and fungi).
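The length criterion described above can be illustrated with a minimal Python sketch. It assumes FASTA-formatted input and treats the 50 and 2,000 residue bounds as inclusive; the function names are hypothetical, and the redundancy removal performed with OrthoMCL is not reproduced here.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first token of the header as the identifier.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_len: int = 50,
                     max_len: int = 2000) -> Dict[str, str]:
    """Keep sequences whose length lies in [min_len, max_len] (assumed inclusive)."""
    return {name: seq for name, seq in records if min_len <= len(seq) <= max_len}


example = ">short\nMKT\n>ok\n" + "A" * 60 + "\n"
kept = filter_by_length(read_fasta(example))
print(sorted(kept))
```

In this toy input, only the 60-residue record passes the filter; the 3-residue record is discarded.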

Secondary structural properties of proteins

Transmembrane helices, domain linkers and signal peptides

We calculated the number of transmembrane helices, domain linkers and signal peptides in the protein sequences using the TMHMM (Krogh et al. 2001), DROP (Ebina et al. 2011) and SignalP (Petersen et al. 2011) software packages. For example, transmembrane helices in proteins play an important role in the transport of various substances across biological membranes, and signal peptides are present either in secreted proteins or in transmembrane proteins. Predicting the numbers of transmembrane helices, domain linkers and signal peptides in a protein sequence does not lead to the prediction of protein function directly but does elucidate the corresponding intramolecular interactions. The results obtained using the above-mentioned analytical tools suggested that P. patens and C. merolae possess a smaller number of transmembrane helices, domain linkers and signal peptides than do the other plant species examined (Supplementary Table S1). We can speculate that the physiology of higher order plants (vascular plants) involves a variety of functions that require the presence of a greater number of transmembrane helices, domain linkers and signal peptides compared with lower order plants.

Intrinsically disordered regions (IDRs) and post-translational modifications

Recently, it was reported that the number of IDRs in proteins is higher among the monocots compared with other types of plants (Kurotani et al. 2014). In our processed data sets, the IDR content of the monocot rice calculated using the RONN software (Yang et al. 2005) was higher than that of the other five plant species, in agreement with our recent study (Kurotani et al. 2014). Moreover, in angiosperms, proteins with high IDR content generally show higher reactivity in these regions (e.g. post-translational modifications such as phosphorylation and O-glycosylation) (Iakoucheva et al. 2004, Gao and Xu 2012, Yao et al. 2012). IDRs are considered vulnerable to attack by reactive molecules owing to their high flexibility and easy accessibility. The frequencies of N-glycosylation sites in Arabidopsis, soybean and poplar (all dicots) were higher than those in the monocot rice (Supplementary Table S1). On the other hand, the frequency of O-glycosylation in the monocot rice was higher than that in the dicot species (Supplementary Table S1). A likely explanation is that O-glycosylation occurs preferentially in IDRs, where it is a poorly conserved property involved in functional diversity and structural stability (Nishikawa et al. 2010), whereas N-glycosylation does not strongly correlate with IDR content because it is known to occur co-translationally, before a protein is fully folded (Petrescu et al. 2004, Kurotani et al. 2014). Moreover, a higher IDR content is associated with unstable protein structures and problems with crystallization (Oldfield et al. 2013). Accordingly, rice proteins as a whole are expected to be more susceptible to phosphorylation and O-glycosylation, and harder to crystallize for three-dimensional structural analysis, than those of Arabidopsis, soybean and poplar, owing to their greater number of IDRs.

Functional regions

In order to obtain useful information on the functional regions in the protein sequence data sets, Plant-PrAS provides results from the PASS tool, which identifies highly conserved sequence regions using existing protein sequence sets (Kuroda et al. 2000), and from the Rosetta Stone method, which identifies the regions likely to be involved in protein–protein interactions using a comparative genomic approach (Enright et al. 1999, Marcotte 1999). ‘Rosetta Stone composites’ are paired regions in a protein sequence, and ‘Rosetta Stone components’ are the elements of the Rosetta Stone composites (Enright et al. 1999). Plant-PrAS provides the results on both the Rosetta Stone composites and components to help find functional regions. As a result of the calculations on the six plant species, we obtained 32,158 protein sequence hits with the PASS tool, 19,627 with the Rosetta Stone composites and 13,428 with the Rosetta Stone components (Supplementary Table S2). In addition, Plant-PrAS can combine the results of the PASS and Rosetta Stone methods to improve the reliability of the functional region annotations. In total, we identified functional regions in 52,049 non-overlapping protein sequence hits by means of PASS and Rosetta Stone composites/components from the six plant species.

Detection of novel functional regions in the unannotated protein sequences of Arabidopsis and rice

We extracted the unannotated protein sequences of Arabidopsis and rice from the annotation information file, which contained 5,180 sequences for Arabidopsis, 15,322 for MSU Rice and 14,716 for RAP-DB (see the Materials and Methods and Supplementary Table S3). Subsequently, we identified candidate protein sequences, including the novel functional regions in the unannotated sequences in Arabidopsis and rice, by using PASS and Rosetta Stone composite/component methods and the Pfam database (Finn et al. 2014). As a result, we assigned 2,470 proteins to Pfam. For those proteins not assigned to Pfam, we found novel functional regions in 523 proteins (PASS), in 1,008 proteins (Rosetta Stone composites) and in 700 proteins (Rosetta Stone components; Table 1). Finally, we annotated 1,732 non-overlapping proteins from the unannotated sequences in Arabidopsis and rice using the methods for detection of functional regions. For these analyses, the PASS tool and the Rosetta Stone methods were run against UniProt-plant (UniProt Consortium 2014), a collection of plant protein sequences that includes many proteins of known function as well as proteins of unknown function. These results suggest that novel functional regions can be identified in proteins that were previously unannotated.
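The per-method counts (523, 1,008 and 700) sum to 2,231, whereas 1,732 non-overlapping proteins are reported, because a protein detected by more than one method is counted only once. A minimal sketch of that set-union counting follows; the protein identifiers are made up for illustration and the function name is hypothetical.

```python
def non_overlapping_count(*hit_sets):
    """Count distinct protein IDs across several hit sets (a set union)."""
    return len(set().union(*hit_sets))


# Hypothetical identifiers, not taken from Plant-PrAS.
pass_hits = {"P1", "P2", "P3"}
composite_hits = {"P2", "P4"}
component_hits = {"P3", "P4", "P5"}

naive_sum = len(pass_hits) + len(composite_hits) + len(component_hits)
distinct = non_overlapping_count(pass_hits, composite_hits, component_hits)
print(naive_sum, distinct)
```

Here the naive sum is 8 but only 5 distinct proteins are present, mirroring the difference between the summed and the non-overlapping totals.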

Table 1

Detection of novel functional regions in the unannotated protein sequences of Arabidopsis and rice by means of Plant-PrAS (Plant Protein Annotation Suite database)

Plant species	Unannotated sequences	Pfam(+)^a	Pfam(–), PASS(+)^b	Pfam(–), Rosetta Stone composite(+)^c	Pfam(–), Rosetta Stone component(+)^d
Arabidopsis	5,180	312	111	421	63
MSU Rice	15,322	640	111	280	225
RAP-DB (rice)	14,716	1,518	301	307	412
Total	35,218	2,470	523	1,008	700

a The number of protein hits in the Pfam database.

b The number of proteins whose functional regions were detected by PASS but not by Pfam [Pfam(–)].

c The number of proteins whose functional regions were detected as Rosetta Stone composites with Pfam(–).

d The number of proteins whose functional regions were detected as Rosetta Stone components with Pfam(–).

The search interface of Plant-PrAS

We developed a publicly accessible web-based database, Plant-PrAS (http://plant-pras.riken.jp/), which currently stores 208,333 protein sequence records derived from genome-wide analysis of six major plant species (A. thaliana, soybean, poplar, rice, P. patens and C. merolae) and 26,788 protein sequence records derived from the two reference species (the mouse and yeast). Each protein sequence is annotated with information on the calculated protein properties and classified physicochemical properties [length, percentage of charged amino acids, percentage of non-polar amino acids, percentage of acidic amino acids, percentage of basic amino acids, percentage low complexity, the grand average of hydrophobicity (GRAVY) and the pI], and protein secondary structural properties [percentage solvent accessibility, percentage of β-sheets, percentage of IDRs, and the presence of a signal peptide(s), transmembrane helices, S–S bonds and domain linkers]; functional annotations against the eukaryotic orthologous groups (KOG) of Clusters of Orthologous Groups of proteins (COGs) (Tatusov et al. 2000, Tatusov et al. 2003), Protein Data Bank (PDB) (Berman et al. 2000, Berman et al. 2013), UniProt-SwissProt and UniProt-plant; functional regions detected using the PASS tool and Rosetta Stone methods; and other properties such as protein solubility, subcellular localization, the number of N/O-glycosylation sites and the number of ubiquitination sites. Plant-PrAS offers powerful search features and statistical information on various calculations. An entire data set can be downloaded as a file. The database has three types of search functions: ‘Property Search’, ‘Keyword Search’ and ‘ID Search’.

Property Search

Plant-PrAS allows users to combine search results easily using a Property Search, designed for obtaining abundant proteomic information all at once. The Property Search can extract data from multiple species in our data set and from multiple protein sequence properties such as length, percentage of charged amino acids and GRAVY; from protein structural properties such as S–S bonds, transmembrane helices and percentage IDRs; from protein annotation data such as the information on Pfam, UniProt and the Enzyme Commission (EC) number; from protein modification/localization data such as O/N-glycosylation and subcellular localization; and from functionally conserved regions and interaction regions (Fig. 1A). On this page, for instance, a user can select annotated or unannotated sequences from Arabidopsis and rice. The unannotated sequences can be selected in combination with the calculation tools, which helps to find novel annotations for those sequences. For example, when a user performs a search by checking the options ‘unannotated sequences’, ‘Rosetta Stone composite hit UniProt-plant’ and ‘Pfam not-hit/unknown’ for Arabidopsis, MSU Rice and RAP-DB, the results show the presence of 421, 280 and 307 candidate protein sequences, respectively, including those corresponding to a novel functional region (Table 1). On the Results page of the ‘Property Search’, users can browse through the protein features by the averages of the extracted sequence properties in the statistics table (Fig. 2A). The search results can also be downloaded as a text file. Plant-PrAS houses information on charged amino acids, IDRs and solvent accessibility. Thus, the Property Search feature can be utilized for plant proteomic analyses.
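The combined selection in the example above amounts to a conjunction of filters over protein records. The sketch below illustrates that logic on a downloaded data set; the record fields and function name are hypothetical and do not reflect the actual Plant-PrAS schema or file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinRecord:
    # Hypothetical fields, chosen only to mirror the search options in the text.
    protein_id: str
    source: str                  # e.g. "Arabidopsis", "MSU Rice", "RAP-DB"
    unannotated: bool
    pfam_hit: bool
    rosetta_composite_hit: bool


def property_search(records: List[ProteinRecord], source: str) -> List[ProteinRecord]:
    """Unannotated, Rosetta Stone composite hit, and no Pfam hit, for one source."""
    return [
        r for r in records
        if r.source == source
        and r.unannotated
        and r.rosetta_composite_hit
        and not r.pfam_hit
    ]


records = [
    ProteinRecord("id1", "Arabidopsis", True, False, True),
    ProteinRecord("id2", "Arabidopsis", True, True, True),
    ProteinRecord("id3", "Arabidopsis", False, False, True),
    ProteinRecord("id4", "MSU Rice", True, False, True),
]
candidates = property_search(records, "Arabidopsis")
print([r.protein_id for r in candidates])
```

Only the first record satisfies all three conditions for Arabidopsis in this toy data set.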

Fig. 1

Search interfaces of Plant-PrAS. A user can search for multiple protein sequence properties on the ‘Property Search’ page (A). The user can also search for objective records using the ‘Keyword Search’ function (B). ‘ID Search’ makes it possible to search for objective records by IDs from public databases (C).

Fig. 2

Examples of search results in Plant-PrAS. (A) The results of Property Search. (B) The results of Keyword or ID Search.

Keyword Search

This option can be used to find protein sequences in our data sets, by using any keywords containing three characters corresponding to the protein descriptions from Pfam, PDB, KOG and UniProt (Fig. 1B). This feature allows the user to select the AND/OR function during a multiple keyword search. The extracted records are listed on the results page with short descriptions (Fig. 2B). The user can click on an ‘ID’ to obtain detailed information on a protein.

ID Search

Plant-PrAS allows a user to extract general IDs supported by the public databases pertaining to our data sets, by using the ID Search function (Fig. 1C). The extracted records are listed on the Results page with short descriptions (Fig. 2B). The user can click on an ‘ID’ to obtain detailed information on a protein.

Annotation details of proteins in Plant-PrAS

The Annotation Details page of Plant-PrAS displays basic information on each protein, such as protein sequence and similar proteins in the same species and among other species (Fig. 3A). Similarly, the page contains information on physical and sequence properties (Fig. 3B), structural properties (Fig. 3C), detected functional regions (Fig. 3D), functional annotation (Fig. 3E) and modifications and subcellular localization (Fig. 3F). To facilitate evaluation of various protein properties, the page shows the summary with average, median and percentile values in relation to proteins from the same species as a background distribution (Fig. 3G).
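As an illustration of the background summary, the sketch below places one protein's property value within the distribution of the same property across a species. It assumes the percentile is the percentage of background values less than or equal to the query value, which is only one of several common conventions; the function name and the numbers are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, median
from typing import Dict, Sequence


def background_summary(value: float, background: Sequence[float]) -> Dict[str, float]:
    """Summarize a background distribution and locate `value` within it."""
    if not background:
        raise ValueError("background must not be empty")
    ordered = sorted(background)
    # Number of background values <= value.
    rank = bisect_right(ordered, value)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "percentile": 100.0 * rank / len(ordered),
    }


# Hypothetical protein lengths for one species.
summary = background_summary(250, [100, 200, 300, 400])
print(summary)
```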

Fig. 3

Typical examples of the annotation details of proteins in Plant-PrAS. (A) Basic information on a protein in Plant-PrAS. (B) Physical and sequence properties. (C) Structural properties. (D) The detected functional regions. (E) Functional annotation. (F) Modifications and subcellular localization. (G) Summary with average, median and percentile values in relation to proteins from the same species (as a background distribution).

Exploration of the properties of unannotated proteins

We wanted to determine whether a data set obtained using Plant-PrAS provides new insights into the functions of unannotated proteins. Here, we present an example of deduction of such a function. Generally, the propensity for solubility or cell-free synthesis of a protein in Escherichia coli can be predicted by analyzing various properties of the protein sequence (Luan et al. 2004, Tartaglia et al. 2009, Kurotani et al. 2010, Agostini et al. 2012). The results produced by the protein solubility tool showed that the percentage of soluble proteins was higher among the unannotated proteins than among the annotated proteins (P < 0.05 in the t-test of differences between the annotated and unannotated proteins; Table 2). This result shows that unannotated proteins may contribute to the success of protein solubilization experiments. Moreover, the functional regions extracted using the Rosetta Stone method have the potential to interact with each partner region. Therefore, functional region candidates of this property identified by Plant-PrAS may aid in the discovery of novel annotated proteins that contain the novel functional regions.

Table 2

The percentage of soluble proteins (among all proteins) in Arabidopsis and rice

Species	Category	No. of sequences (soluble/total)^a	Percentage of soluble proteins
Arabidopsis	Annotated	7,545/21,146	35.7%
Arabidopsis	Unannotated	2,389/5,180	46.1%
MSU Rice	Annotated	8,432/24,765	34.0%
MSU Rice	Unannotated	8,177/15,322	53.4%
RAP-DB (rice)	Annotated	7,579/21,192	35.8%
RAP-DB (rice)	Unannotated	7,746/14,716	52.7%

In Arabidopsis and rice, there is a greater number of soluble proteins among unannotated proteins than among annotated proteins (P < 0.05 in the t-test of the differences between annotated and unannotated proteins).

a Proteins that have a solubility score >0.5 according to the SOLpro software were regarded as soluble proteins.
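The classification rule in the footnote can be sketched as a simple threshold on per-protein scores. The scores below are invented for illustration and the function name is hypothetical; only the >0.5 cutoff and the Arabidopsis counts come from the text.

```python
from typing import Sequence, Tuple


def percent_soluble(scores: Sequence[float], threshold: float = 0.5) -> Tuple[int, float]:
    """Return (number soluble, percentage soluble) for scores strictly above threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    soluble = sum(1 for s in scores if s > threshold)
    return soluble, 100.0 * soluble / len(scores)


# Hypothetical solubility scores; a score of exactly 0.5 is not counted as soluble.
count, percent = percent_soluble([0.9, 0.5, 0.2, 0.7])
print(count, percent)

# Percentage recomputed from the unannotated Arabidopsis counts in Table 2.
arabidopsis_unannotated = round(100.0 * 2389 / 5180, 1)
print(arabidopsis_unannotated)
```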

Materials and Methods

Protein sequence resources

We analyzed the whole-protein sequences derived from the genome sequences of six major model plant species, namely Brassicaceae (Arabidopsis) (Arabidopsis Genome Initiative 2000), Fabaceae (soybean) (Schmutz et al. 2010), Salicaceae (poplar) (Tuskan et al. 2006), Poaceae (rice) (Yu et al. 2002, International Rice Genome Sequencing Project 2005), Funariaceae (P. patens) (Rensing et al. 2008) and the red alga C. merolae (Matsuzaki et al. 2004). The Arabidopsis proteomic sequence set was retrieved from TAIR (Lamesch et al. 2012). The rice sequences were retrieved from RAP-DB (Sakai et al. 2013) and from the MSU Rice Genome Annotation Project website (Ouyang et al. 2007, Kawahara et al. 2013). Cyanidioschyzon merolae sequences were retrieved from the C. merolae Genome Project website. The other plant sequences were retrieved from Phytozome (Goodstein et al. 2012). In addition, mouse (Mouse Genome Sequencing Consortium 2002) and yeast (Mewes et al. 2002) sequences were retrieved from the National Center for Biotechnology Information (NCBI) (ftp://ftp.ncbi.nih.gov/genomes/M_musculus/protein/) and from Munich Information Center for Protein Sequences (MIPS) (ftp://ftpmips.gsf.de/fungi/yeast/), respectively. They were used as reference proteome sets. Subsequently, we prepared non-redundant proteome sequence sets of the target organisms using the OrthoMCL software (Chen et al. 2006) with the runtime options pi_cutoff = 90, pmatch_cutoff = 90 and pv_cutoff = 1e-30.

Analysis of a protein sequence

Physicochemical properties

The percentage of polar, charged, acidic and basic amino acids as well as the isoelectric point were calculated using the ProteoMix software (Chikayama et al. 2004). The GRAVY index was calculated using the GRAVY algorithm (Kyte and Doolittle 1982).
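For readers who want to reproduce these quantities, the GRAVY index is the mean Kyte-Doolittle hydropathy value over all residues of a sequence. The sketch below computes it together with simple composition percentages. The residue classes used (acidic D/E, basic K/R, charged D/E/K/R) are an assumption, since the exact definitions used by ProteoMix are not stated here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy; raises KeyError on non-standard residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


def percent_of(sequence: str, residues: str) -> float:
    """Percentage of positions occupied by any residue in `residues`."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    return 100.0 * sum(seq.count(r) for r in residues) / len(seq)


# Assumed residue classes; histidine is deliberately left out of "basic" here.
ACIDIC, BASIC = "DE", "KR"
CHARGED = ACIDIC + BASIC

print(gravy("AILV"))
print(percent_of("DEKRAAAA", CHARGED))
```

A positive GRAVY value indicates a predominantly hydrophobic sequence, and a negative value a predominantly hydrophilic one.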

Secondary structural properties

For prediction of these properties, we used the following software tools: SignalP (Petersen et al. 2011) to detect the presence of signal peptides, TMHMM (Krogh et al. 2001) to identify transmembrane helix domains, DROP (Ebina et al. 2011) to find interdomain linkers, DIpro (Cheng et al. 2006) to find S–S bonds, SSpro (Cheng et al. 2005) to identify secondary structures, ACCpro (Cheng et al. 2005) to analyze solvent accessibility and RONN (Yang et al. 2005) to find IDRs.

Functional and structural annotations

We used all the protein sequences for searches in KOG (Tatusov et al. 2000, Tatusov et al. 2003) and in UniProt-SwissProt/UniProt-plant (UniProt Consortium 2014) using BLASTP with the runtime options ‘cutoff E-value’ 1e-10 or 1e-5, respectively. Similarly, we used all protein sequences for searches in the PDB (Berman et al. 2000, Berman et al. 2013) using BLASTP with >50% identity. The Pfam annotations (Finn et al. 2014) and the EC number (Bairoch 2000) were obtained using the InterProScan software (Hunter et al. 2012).
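The thresholds above can be applied to BLASTP results after the search. The sketch below assumes the hits were saved in the standard 12-column tabular format of BLAST+ (output format 6); whether the authors filtered at search time or afterwards is not stated, and the function name is hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text: str,
                max_evalue: Optional[float] = None,
                min_identity: Optional[float] = None) -> List[Dict[str, str]]:
    """Keep hits with evalue <= max_evalue and pident > min_identity (if given)."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_FIELDS, row))
        if max_evalue is not None and float(hit["evalue"]) > max_evalue:
            continue
        if min_identity is not None and float(hit["pident"]) <= min_identity:
            continue
        hits.append(hit)
    return hits


# Two made-up hits for one query.
example = (
    "q1\ts1\t62.5\t120\t45\t0\t1\t120\t5\t124\t1e-40\t150\n"
    "q1\ts2\t30.0\t80\t56\t0\t1\t80\t1\t80\t1e-3\t40\n"
)
kog_like = filter_hits(example, max_evalue=1e-10)      # E-value cutoff used for KOG
pdb_like = filter_hits(example, min_identity=50.0)     # >50% identity used for PDB
print([h["sseqid"] for h in kog_like], [h["sseqid"] for h in pdb_like])
```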

Other properties

To analyze other properties, we used the following software packages: SOLpro (Magnan et al. 2009) for protein solubility, TargetP (Emanuelsson et al. 2000) and WoLF PSORT (Horton et al. 2007) for subcellular localization, NetNGlyc (R. Gupta et al. unpublished) for N-glycosylation, Gomord’s algorithm (Gomord et al. 2010) for O-glycosylation, and UbPred (Radivojac et al. 2010) for ubiquitination.

Detection of the functional regions

This procedure was performed on protein sequences by means of the proteome sequence set of the UniProt-plant database and the PASS tool (Kuroda et al. 2000), with the runtime options ‘cutoff E-value’≤1e-7 and ‘cutoff homolog’≥100, and the Rosetta Stone method (Enright et al. 1999, Marcotte 1999), with the cutoff E-value ≤1e-5, identities ≥35%, component length ≥50 amino acids and a component range from 10 to 30 amino acids, with runtime options similar to those described previously (Uversky 2002, Chia and Kolatkar 2004, Enault et al. 2005, Wallner and Elofsson 2005, Nayeem et al. 2006).
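The general idea of the Rosetta Stone method can be sketched as follows: two separate proteins (components) are linked when both align to different, non-overlapping regions of the same third protein (the composite). This is a simplified illustration, not the authors' implementation. It applies the E-value, identity and length cutoffs quoted above, does not model the 10 to 30 amino acid "component range" option because its meaning is not spelled out here, and uses hypothetical names throughout.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str       # candidate component protein
    subject: str     # candidate composite protein
    identity: float  # percent identity of the alignment
    evalue: float
    s_start: int     # alignment start on the subject (1-based)
    s_end: int       # alignment end on the subject (inclusive)


def passes(hit: Hit, max_evalue: float = 1e-5,
           min_identity: float = 35.0, min_length: int = 50) -> bool:
    """Apply the cutoffs quoted in the text to a single alignment."""
    length = hit.s_end - hit.s_start + 1
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and length >= min_length)


def rosetta_stone_links(hits: Iterable[Hit]) -> List[Tuple[str, str, str]]:
    """Return (composite, component_a, component_b) triples."""
    by_subject = defaultdict(list)
    for hit in hits:
        if passes(hit):
            by_subject[hit.subject].append(hit)
    links = set()
    for subject, group in by_subject.items():
        for a, b in combinations(group, 2):
            disjoint = a.s_end < b.s_start or b.s_end < a.s_start
            if a.query != b.query and disjoint:
                links.add((subject, *sorted((a.query, b.query))))
    return sorted(links)


# Made-up alignments: A and B cover separate regions of C; D fails the E-value cutoff.
hits = [
    Hit("A", "C", 40.0, 1e-20, 1, 120),
    Hit("B", "C", 55.0, 1e-30, 150, 300),
    Hit("D", "C", 60.0, 1e-2, 310, 400),
]
links = rosetta_stone_links(hits)
print(links)
```

In this toy example, C is the composite and A and B are its components, suggesting a possible functional link between A and B.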

Extraction of the unannotated sequences in Arabidopsis and rice

The unannotated sequences of Arabidopsis and rice (MSU Rice and RAP-DB) were extracted from whole-protein sequences using the description terms shown in Supplementary Table S3.

Availability and implementation of the system

Plant-PrAS was implemented in a web application framework, MENTA, with MySQL as a database engine, and was tested in the following web browsers: Internet Explorer 11, Chrome 36 and Firefox 31.

Supplementary data

Supplementary data are available at PCP online.

Funding

This work was supported by the Japan Society for the Promotion of Science [a Grant-in-Aid for Young Scientists (B) (18700106 to T.S.)].

67 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. The ENZYME database in 2000.

Authors: A Bairoch
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

3. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19 Impact factor: 5.469

4. Automated search of natively folded protein fragments for high-throughput structure determination in structural genomics.

Authors: Y Kuroda; K Tani; Y Matsuo; S Yokoyama
Journal: Protein Sci Date: 2000-12 Impact factor: 6.725

5. Genome-wide computational function prediction of Arabidopsis proteins by integration of multiple data sources.

Authors: Yiannis A I Kourmpetis; Aalt D J van Dijk; Roeland C H J van Ham; Cajo J F ter Braak
Journal: Plant Physiol Date: 2010-11-22 Impact factor: 8.340

6. A simple method for displaying the hydropathic character of a protein.

Authors: J Kyte; R F Doolittle
Journal: J Mol Biol Date: 1982-05-05 Impact factor: 5.469

7. RiceFOX: a database of Arabidopsis mutant lines overexpressing rice full-length cDNA that contains a wide range of trait information to facilitate analysis of gene function.

Authors: Tetsuya Sakurai; Youichi Kondou; Kenji Akiyama; Atsushi Kurotani; Mieko Higuchi; Takanari Ichikawa; Hirofumi Kuroda; Miyako Kusano; Masaki Mori; Tsutomu Saitou; Hitoshi Sakakibara; Shoji Sugano; Makoto Suzuki; Hideki Takahashi; Shinya Takahashi; Hiroshi Takatsuji; Naoki Yokotani; Takeshi Yoshizumi; Kazuki Saito; Kazuo Shinozaki; Kenji Oda; Hirohiko Hirochika; Minami Matsui
Journal: Plant Cell Physiol Date: 2010-12-23 Impact factor: 4.927

8. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools.

Authors: Philippe Lamesch; Tanya Z Berardini; Donghui Li; David Swarbreck; Christopher Wilks; Rajkumar Sasidharan; Robert Muller; Kate Dreher; Debbie L Alexander; Margarita Garcia-Hernandez; Athikkattuvalasu S Karthikeyan; Cynthia H Lee; William D Nelson; Larry Ploetz; Shanker Singh; April Wensel; Eva Huala
Journal: Nucleic Acids Res Date: 2011-12-02 Impact factor: 16.971