Jérôme Grimplet, Anne-Françoise Adam-Blondon, Pierre-François Bert, Oliver Bitz, Dario Cantu, Christopher Davies, Serge Delrot, Mario Pezzotti, Stéphane Rombauts, Grant R Cramer.
Abstract
BACKGROUND: Grapevine (Vitis vinifera L.) is one of the most important fruit crops in the world and serves as a valuable model for fruit development in woody species. A major breakthrough in grapevine genomics was achieved in 2007 with the sequencing of the Vitis vinifera cv. PN40024 genome. Subsequently, data on structural and functional characterization of grape genes accumulated exponentially. To better exploit the results obtained by the international community, we think that a coordinated nomenclature for gene naming in species with sequenced genomes is essential. It will pave the way for the accumulation of functional data that will enable effective scientific discussion and discovery. The exploitation of data that were generated independently of the genome release is hampered by their heterogeneous nature and by often incompatible and decentralized storage. Classically, large amounts of data describing gene functions are only available in printed articles and therefore remain hardly accessible for automatic text mining. On the other hand, high throughput "Omics" data are typically stored in public repositories, but should be arranged in compendia to better contribute to the annotation and functional characterization of the genes.
Year: 2014 PMID: 25481684 PMCID: PMC4299395 DOI: 10.1186/1471-2164-15-1077
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Summary of the points raised in each of the sections.
Brief definition and example of the main elements of the gene nomenclature
| Elements | Locus ID | Full name | Symbol | Synonyms |
|---|---|---|---|---|
| Example | Vitvi18g12230 | ( | (Vvi)ADH1 | GV-ADH1 aldehyde reductase, ethanol dehydrogenase |
| Description | Genome localization | Relatively descriptive function, including the level of curation (see Figure …) | Concise (3–10 characters), should be descriptive of function when possible | Any known synonyms |
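The elements in the table above lend themselves to simple automated checks. The following minimal Python sketch assumes, from the single example `Vitvi18g12230` alone, that a locus ID is a species prefix followed by a chromosome number, the letter `g`, and a gene number. It also assumes that the 3–10 character limit applies to the symbol without its optional `(Vvi)` prefix. The regular expression and both function names are illustrative and are not part of the nomenclature itself.

```python
import re

# Assumed pattern, inferred only from the example "Vitvi18g12230":
# species prefix, chromosome number, the letter "g", gene number.
LOCUS_RE = re.compile(r"^(?P<prefix>[A-Za-z]+)(?P<chrom>\d+)g(?P<gene>\d+)$")


def parse_locus_id(locus_id):
    """Split a locus ID into its assumed parts, or return None if it does not match."""
    match = LOCUS_RE.match(locus_id)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        "chromosome": int(match.group("chrom")),
        "gene": match.group("gene"),
    }


def symbol_is_concise(symbol):
    """Check the 3-10 character guideline, ignoring an optional "(Vvi)"-style prefix."""
    core = re.sub(r"^\([A-Za-z]+\)", "", symbol)  # assumption: prefix is not counted
    return 3 <= len(core) <= 10


if __name__ == "__main__":
    print(parse_locus_id("Vitvi18g12230"))
    print(symbol_is_concise("(Vvi)ADH1"))
```

Whether the species prefix counts toward the symbol length is not stated in the table, so the check would need adjusting if the guideline is meant to include it.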
Figure 2. Decision tree of rules for classifying sequences according to the level of evidence for their function.
Definition of the level of curation terms
| Value | Definition |
|---|---|
| Hypothetical protein | Allocated to each locus at the beginning of the process, meaning that the gene codes for a protein, for which no information regarding its function or actual existence is known. It should be removed only when existence of transcript is proven. |
| Expressed | Replaces “hypothetical” if existence of transcripts has been proven through expression data (proof of existence of RNA(s): RT-PCR, EST, RNA-seq, Northern blots, microarrays, etc.). The next step is to determine if similarity with sequences in other species can be observed. |
| ZZZ domain containing | Allocated if by comparison with other sequences or by performing a domain analysis, the highest level of information on the coding protein is the presence of a given domain ZZZ. |
| Similar to | Indicates that the existence of a protein is probable because a minimal level of similarity with a protein from a plant species was met. A reasonable cut-off is an e-value of e-20, or at least 30% identity over at least 80 contiguous amino acids, which places the match in the “safe zone” as defined by [ |
| YYY | If the gene has been experimentally characterized and named YYY or if there is > |
| Putative | Derived from |
| Probable | Indicates stronger evidence than the qualifier “putative” on function. This qualifier implies that there must be at least |
| Uncertain | Indicates that the existence of the protein is unsure and that there is evidence that the sequence corresponds to a |
| Translated | Is acquired when experimental |
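The similarity cut-off given for the “Similar to” level can be written as a small predicate. The sketch below is only an illustration: it reads "e-20" as 1e-20, treats the two criteria as alternatives (either one suffices), and uses hypothetical parameter names for values that would come from a sequence similarity search.

```python
def meets_similarity_cutoff(evalue, pct_identity, contiguous_length):
    """Return True if a hit satisfies the "Similar to" cut-off.

    Assumptions: "e-20" means 1e-20, and either criterion is sufficient:
      - e-value at or below 1e-20, or
      - at least 30% identity over at least 80 contiguous amino acids.
    """
    passes_evalue = evalue <= 1e-20
    passes_identity = pct_identity >= 30.0 and contiguous_length >= 80
    return passes_evalue or passes_identity


if __name__ == "__main__":
    # A weak e-value can still pass on identity and length.
    print(meets_similarity_cutoff(1e-5, 35.0, 120))
    # Neither criterion is met here.
    print(meets_similarity_cutoff(1e-5, 35.0, 60))
```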
Figure 3. Decision tree on the naming or possible renaming procedure of a gene.
Figure 4. Molecular phylogenetic analysis of … and EIL gene models by the Maximum Likelihood method. Multiple sequence alignment for full-length transcription factors was inferred using MUSCLE [36]. The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model [37]. The bootstrap consensus tree inferred from 100 replicates [38] is taken to represent the evolutionary history of the taxa analyzed [38]. Branches corresponding to partitions reproduced in less than 70% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches [38]. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The analysis involved 10 amino acid sequences. The coding data was translated assuming a standard genetic code table. All positions containing gaps and missing data were eliminated. There were a total of 273 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 [39]. Arrows point toward recommended Vitis symbols.
Figure 5. Molecular phylogenetic analysis of sugar transporter gene models by the Maximum Likelihood method. The trees are adapted from [3] and were produced using MUSCLE [36] and PhyML with the JTT amino acid substitution model. Bootstrapping was performed with 100 replicates. Unlike in the original picture, branches corresponding to partitions reproduced in less than 70% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches [38]. A) Sucrose transporters; B) hexose transporters; C) ERD6-like proteins. Arrows point toward recommended Vitis symbols; the green symbols are the putative symbols that would have been used had the Vitis gene not been previously annotated in the literature. Recommended synonyms are in brackets.
Figure 6. Molecular phylogenetic analysis of carotenoid cleavage dioxygenase gene models by the Maximum Likelihood method. Multiple sequence alignment for full-length carotenoid cleavage dioxygenases was inferred using MUSCLE [36]. The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model [37]. The bootstrap consensus tree inferred from 100 replicates [38] is taken to represent the evolutionary history of the taxa analyzed [38]. Branches corresponding to partitions reproduced in less than 70% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches [38]. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The analysis involved 20 amino acid sequences. The coding data was translated assuming a standard genetic code table. All positions containing gaps and missing data were eliminated. There were a total of 225 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 [39]. Arrows point toward recommended Vitis symbols. Asterisks indicate redundant synonyms.
Figure 7. Molecular phylogenetic analysis of trihydroxystilbene synthase gene models by the Maximum Likelihood method. Multiple sequence alignment for full-length trihydroxystilbene synthases was inferred using MUSCLE [36] from the nucleotide sequence. The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model [37]. The bootstrap consensus tree inferred from 100 replicates [38] is taken to represent the evolutionary history of the taxa analyzed [38]. Branches corresponding to partitions reproduced in less than 70% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches [38]. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The analysis involved 40 amino acid sequences. The coding data was translated assuming a standard genetic code table. All positions with less than 95% site coverage were eliminated. That is, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. There were a total of 292 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 [39]. Arrows point toward recommended Vitis symbols. A, B, and C refer to the groups in [4].
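The captions describe two ways of discarding alignment positions before tree inference: eliminating every position that contains a gap or missing data, and eliminating positions with less than 95% site coverage. The following Python sketch illustrates both with one function. It is not the MEGA5 implementation; the function name, the toy sequences, and the choice of `-` and `?` as gap and missing-data symbols are assumptions made for illustration, and ambiguous residues are not handled.

```python
GAP_CHARS = set("-?")  # assumed symbols for gaps and missing data


def filter_columns(seqs, min_coverage=1.0):
    """Keep alignment columns whose site coverage is at least min_coverage.

    seqs: equal-length aligned sequences (strings).
    min_coverage=1.0 removes every column containing a gap or missing data;
    min_coverage=0.95 removes columns with less than 95% site coverage.
    """
    if not seqs:
        return []
    n_seqs = len(seqs)
    kept = []
    for i in range(len(seqs[0])):
        covered = sum(1 for s in seqs if s[i] not in GAP_CHARS)
        if covered / n_seqs >= min_coverage:
            kept.append(i)
    return ["".join(s[i] for i in kept) for s in seqs]


if __name__ == "__main__":
    toy = ["MKT-A", "MK-LA", "MKTLA"]  # hypothetical aligned sequences
    print(filter_columns(toy))        # complete deletion
    print(filter_columns(toy, 0.6))   # tolerant threshold, for illustration only
```

With only three toy sequences, a 95% threshold behaves like complete deletion; the two settings differ only when the alignment has enough sequences for a single gap to fall under the 5% allowance.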