PIECE 2.0: an update for the plant gene structure comparison and evolution database.

Yi Wang1,2,3, Ling Xu2,3, Roger Thilmony1, Frank M You4, Yong Q Gu5, Devin Coleman-Derr6,3.   

Abstract

PIECE (Plant Intron Exon Comparison and Evolution) is a web-accessible database that houses intron and exon information of plant genes. PIECE serves as a resource for biologists interested in comparing intron-exon organization and provides valuable insights into the evolution of gene structure in plant genomes. Recently, we updated PIECE to a new version, PIECE 2.0 (http://probes.pw.usda.gov/piece or http://aegilops.wheat.ucdavis.edu/piece). PIECE 2.0 contains annotated genes from 49 sequenced plant species, compared with 25 species in the previous version. In the current version, we also added several new features: (i) a new viewer was developed to show phylogenetic trees alongside the structure of individual genes; (ii) genes in the phylogenetic tree can now also be grouped according to KOG (The annotation of Eukaryotic Orthologous Groups) and KO (KEGG Orthology) in addition to Pfam domains; (iii) information on intronless genes is now included in the database; (iv) a statistical summary of global gene structure information for each species, and its comparison with other species, was added; and (v) an improved GSDraw tool was implemented in the web server to enhance the analysis and display of gene structure. The updated PIECE 2.0 database will be a valuable resource for the plant research community for the study of gene structure and evolution. © Crown copyright 2016.

Year:  2016        PMID: 27742820      PMCID: PMC5210635          DOI: 10.1093/nar/gkw935

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Eukaryotes possess ‘genes in pieces’ in which the protein-coding exon sequences are interrupted by non-coding intron sequences (1). Most eukaryotic genes contain both introns and exons. Understanding intron–exon organization is therefore important: it reveals conserved or diverged gene structures among genes from different species (orthologs) and/or among members of a gene family (paralogs), providing insights into the process of gene evolution. Recent advances in sequencing technologies have led to unprecedented progress in generating genome sequence data and opened a new era for comparative genomics studies (2,3). These data sets allow researchers to address many fundamental evolutionary questions at a genome-wide comparative scale. In comparative analyses of gene structure, gene sequences from different plant genomes are often grouped by gene family and/or ortholog cluster. Using phylogenetic analysis tools along with gene structure prediction, one can identify intron–exon patterns and intron gain or loss events in the grouped gene sequences. Reconstructing intron gain/loss events during the evolutionary history of a gene provides valuable information for clarifying the evolutionary relationships within large gene families and facilitates a deeper understanding of the possible functional implications, including the generation or disruption of lineage-specific alternative splicing events (4). Studying gene structure evolution in species with sequenced genomes requires user-friendly and publicly available resources. PIECE (Plant Intron Exon Comparison and Evolution) is an intron–exon database that provides a powerful platform to compare gene structure among plant species (5). It was released in 2012 and published in the 2013 Nucleic Acids Research database issue. During the past 4 years, the genomic sequence data for plant species have undergone substantial expansion. 
The increasing number of genes from more sequenced plant genomes has greatly enriched the gene intron–exon database, but requires phylogenetic analysis at a much larger scale for accurate dissection of the evolution of plant intron–exon organization. Comparative analysis of intron–exon architecture is important for understanding the rules governing gene structure organization, protein functionality and evolutionary changes among plant species. Here, we present a new version of PIECE (PIECE 2.0, http://probes.pw.usda.gov/piece/ or http://aegilops.wheat.ucdavis.edu/piece/). In this updated version, we made significant modifications and improvements to the original database by adding more genome data, enhancing the web display and adding useful new features. The updated version contains 2 089 560 protein-coding genes from 49 plant species, more than twice the number in the previous version (25 species). To view gene structure data for large gene families across multiple species, we developed a new interactive viewer that provides an easy method for displaying and analyzing intron–exon gene structures arranged via a phylogenetic analysis. Several new features have been integrated into the current version, including the display of gene structures according to KOG (The annotation of Eukaryotic Orthologous Groups) (6) and KO (KEGG Orthology) (7) information, and an intronless gene database. A global gene structure summary for each species is also available, allowing species-level comparisons to be made. Finally, we updated the GSDraw tool, which can now more conveniently produce customizable, high-quality gene structure images, including a phylogenetic tree, from user-supplied input files.

NEW FEATURES

Data update

In PIECE 2.0, we have expanded the gene structure data to cover more sequenced plant species. The raw genome data sets were downloaded from Phytozome 11.0 (8) and gene structure data sets were refined by our in-house pipeline (Supplementary Figure S1). To build the PIECE database, we parsed and analyzed five types of files from the raw data to extract intron-exon information: genomic sequence files, transcript sequence files, protein sequence files, gene annotation files and general feature format (GFF) files containing coordinate data of the gene structure in genomic sequences. PIECE now contains 2 089 560 protein-coding genes from 49 plant species, covering major lineages of the plant kingdom (Supplementary Table S1).
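The GFF-based extraction step described above can be illustrated with a minimal sketch. It is not the in-house pipeline of Supplementary Figure S1; it assumes GFF3-style input with `exon` features carrying a `Parent` attribute, and the function names and the example records (`geneA`, `geneA.1`) are hypothetical.

```python
from collections import defaultdict

def exons_by_transcript(gff_lines):
    """Collect sorted (start, end) exon intervals per transcript from GFF3 lines."""
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # only exon features are needed for this sketch
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        for transcript_id in parent.split(","):
            exons[transcript_id].append((int(cols[3]), int(cols[4])))
    return {t: sorted(iv) for t, iv in exons.items()}

def introns(exon_intervals):
    """Introns are the gaps between consecutive exons (1-based, inclusive)."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exon_intervals, exon_intervals[1:])
            if next_start - prev_end > 1]

# Hypothetical records for illustration only (not taken from Phytozome).
example_gff = [
    "##gff-version 3",
    "Chr1\tsketch\tgene\t100\t900\t.\t+\t.\tID=geneA",
    "Chr1\tsketch\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "Chr1\tsketch\texon\t100\t250\t.\t+\t.\tID=e1;Parent=geneA.1",
    "Chr1\tsketch\texon\t400\t520\t.\t+\t.\tID=e2;Parent=geneA.1",
    "Chr1\tsketch\texon\t700\t900\t.\t+\t.\tID=e3;Parent=geneA.1",
]
structures = exons_by_transcript(example_gff)
gene_a_introns = introns(structures["geneA.1"])  # [(251, 399), (521, 699)]
```

Deriving introns from the gaps between exons, rather than reading them from the file, reflects the fact that many GFF files list exons and CDS segments but no explicit intron features.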

Search system improvement

PIECE 2.0 has a user-friendly system for finding target genes. Currently, the search page allows users to make queries with six types of keywords: (i) gene symbol; (ii) locus name; (iii) gene description; (iv) Pfam (9) ID; (v) KOG ID; and (vi) KO ID. For each type, there is an example link in the text box (#1 to #6) on the search page, which helps the user learn how to perform a search in PIECE. Moreover, users can choose plant species from an interactive taxonomy tree, and the selected species are shown on the right of the search page. The taxonomy tree was generated by phyloT (http://phylot.biobyte.de) based on NCBI taxonomy (10) and manually compared with the tree on the Phytozome BLAST page (https://phytozome.jgi.doe.gov/pz/portal.html#!search?show=BLAST). Users can also set whether alternative splice isoforms and intronless genes are displayed on the search result page. The main page of the search results lists all the genes meeting the search criteria and provides brief information, such as index number, species name, Pfam IDs, KOG ID, KO ID, exon number, symbol, annotation and ortholog analysis options (Figure 1). Compared with PIECE 1.0, the search results in PIECE 2.0 have two added features. One is the exon number, which helps users quickly spot variations in intron and exon number. The other is the KOG and KO ID information for the genes. PIECE 2.0 can not only show phylogenetic trees alongside gene structure information when the user clicks on a Pfam domain, but can also reconstruct phylogenetic trees from the genes retrieved by KOG or KO category.
Figure 1.

An example of PIECE (Plant Intron Exon Comparison and Evolution) 2.0 search results. (A) Gene ID column with link to sequence detail. (B) Pfam, KOG and KO information with link to gene structure view. (C) Ortholog analysis options include showing gene structure, GLOOM analysis and Exalign analysis.

New gene structure viewer

In PIECE 1.0, we developed a graphical viewer that produces high-quality images to display gene structure, Pfam domains and the phylogenetic tree together. The gene structure elements, including exons, introns and Pfam domains, are interactive, so the user can select, click and flexibly modify their colors. In PIECE 2.0, the new viewer gives users a friendly way to display and analyze intron–exon data within the phylogenetic tree (Figure 2) through HTML5 canvas visualization using the open source JavaScript library Phylo.io (11). Phylo.io is a web framework for visualizing and comparing phylogenetic trees side by side. The original Phylo.io displays only the trees. Therefore, we modified the source code so that it also shows the gene structure pattern diagram linked to the resulting bootstrapped dendrogram for each gene category. Phylo.io can improve the legibility of large trees by estimating an optimal collapsing depth from the number of leaves and the size of the viewing area (11). Users can swap, collapse and expand the tree by clicking the nodes, or reroot the tree by clicking the tree branches. The coordinates of the gene structure elements in the diagram are updated accordingly for each gene when the tree is adjusted. Users can also easily search for individual genes in the tree; the gene position in the tree is highlighted in red along with the path to that leaf from the root of the tree. By default, the highlighted gene is the one selected by the user in the search result page. The highlight can be removed by clicking the search button without any input. Unlike in the previous version, users can now change the gene structure display in the viewer by directly selecting a display type in the left panel. Five types are available: (i) genomic sequences; (ii) aligned by the CDS start points; (iii) genomic sequences without UTR; (iv) protein sequences; and (v) aligned protein sequences. 
The new viewer also has several options that let users change the attributes of the diagram, including setting the color of gene structure elements, hiding elements, and zooming in and out. Moreover, the user can directly download the diagram as a high-quality SVG file. As a demonstration, Figure 2 shows the gene structure of Lipoxygenase proteins (Pfam domain PF00305) in Arabidopsis thaliana and Populus trichocarpa using the new viewer. The phylogenetic distribution suggests that LOX genes could be grouped into two major subclasses, Class I and Class II (Figure 2C). This result is consistent with a recent genome-wide analysis of LOX genes in Poplar (12). No sister gene pairs were found in the tree between Arabidopsis thaliana and Populus trichocarpa, suggesting that multiple gene duplications occurred in the course of LOX gene evolution in Poplar. Moreover, most LOX genes in the viewer contain two protein domains (Pfam domain PF01477 and Pfam domain PF00305), except Potir.001G227100.1. The number of exons in these duplicated LOX genes ranged from 1 to 9, indicating a complex distribution pattern of intron-exon structure in the LOX genes.
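One of the display types listed above aligns gene structures by their CDS start points. The following minimal sketch shows one way such an alignment could be computed. It is an assumption about what the display type involves, not the viewer's actual code; the function name and the coordinates are hypothetical.

```python
def align_to_cds_start(exons, cds_start, strand="+"):
    """Shift exon intervals so that the first coding base sits at position 0.

    exons: list of (start, end) genomic intervals, 1-based inclusive.
    cds_start: genomic coordinate of the first coding base.
    On the minus strand the coordinates are mirrored, so every gene
    reads left to right in the direction of transcription.
    """
    if strand == "+":
        shifted = [(s - cds_start, e - cds_start) for s, e in exons]
    else:
        shifted = [(cds_start - e, cds_start - s) for s, e in exons]
    return sorted(shifted)

# Hypothetical coordinates for illustration only.
plus_gene = align_to_cds_start([(100, 250), (400, 520)], cds_start=150)
minus_gene = align_to_cds_start([(1000, 1200), (1500, 1700)], cds_start=1650, strand="-")
# plus_gene  -> [(-50, 100), (250, 370)]
# minus_gene -> [(-50, 150), (450, 650)]
```

After the shift, the 5′ UTR of every gene falls at negative positions and the coding region starts at 0, so structures from different genes and strands can be drawn on a common axis.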
Figure 2.

The new PIECE viewer showing the gene structures of LOX proteins (Pfam domain PF00305) in Arabidopsis thaliana and Populus trichocarpa. (A) Control panel for user selection of parameters in phylogenetic tree reconstruction and gene structure display. (B) Operation panel for searching IDs, manipulating the tree and downloading the picture. (C) Gene structure pattern diagram linked to the resulting bootstrapped dendrogram. (D) An example showing the expansion of a subgroup from the phylogenetic tree.

Intronless gene

Although many eukaryotic genes carry introns, a significant portion lack them. Intronless genes have gained increasing attention because of their implications for understanding evolutionary patterns of related genes and genomes (13). Currently, there are only two specialized databases for plant intronless genes, PIGD (13) and IGDD (14). PIGD contains only five species from the Poaceae family, and IGDD provides intronless gene data sets from only five eudicot species. In PIECE 2.0, we created a searchable database containing 353 515 predicted intronless genes from the 49 plant species (Supplementary Table S1). We annotated these intronless genes using the annotation information in the Phytozome database. For each intronless gene, PIECE 2.0 provides its sequence, genomic location, Pfam domains, isoelectric point (pI), molecular weight (MW), KOG and GO terms, and subcellular localization predicted by WoLF PSORT (15). This information is displayed on the result page when an intronless gene is searched. PIECE 2.0 also provides a function to compare intronless genes with intron-containing genes from the same gene family or with orthologous genes. For example, AT1G15700.1, which encodes the gamma subunit of chloroplast ATP synthase, is intronless and contains a protein domain (Pfam domain PF00231). Supplementary Figure S2A shows that, based on a search using the PF00231 domain, two related Arabidopsis genes are also intronless, while one related gene has 7 introns. Interestingly, most orthologs of AT1G15700.1 are intronless (Supplementary Figure S2B), but the genes from green algae all have introns. This observation could suggest that most of the introns were lost during the evolution from aquatic plants (green algae) to land plants.
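The comparison of intronless and intron-containing members of a gene family can be sketched as follows. This is a simplified illustration, not the PIECE implementation: it assumes that a transcript with a single annotated exon counts as intronless, and the function names and family members (`famX.1` to `famX.3`) are hypothetical.

```python
def intron_count(exon_intervals):
    """Number of introns implied by a list of exon intervals."""
    return max(len(exon_intervals) - 1, 0)

def split_family(family):
    """Separate intronless from intron-containing transcripts.

    family: dict mapping transcript ID -> list of (start, end) exons.
    Returns (sorted list of intronless IDs, dict of ID -> intron count).
    """
    intronless = sorted(t for t, ex in family.items() if intron_count(ex) == 0)
    with_introns = {t: intron_count(ex)
                    for t, ex in family.items() if intron_count(ex) > 0}
    return intronless, with_introns

# Hypothetical family: two single-exon members and one member with 8 exons.
family = {
    "famX.1": [(1, 900)],
    "famX.2": [(1, 1200)],
    "famX.3": [(i * 200 + 1, i * 200 + 100) for i in range(8)],
}
intronless, with_introns = split_family(family)
# intronless   -> ['famX.1', 'famX.2']
# with_introns -> {'famX.3': 7}
```

Under this definition an intron located in a UTR still counts as an intron; a stricter or looser rule (for example, counting only introns within the CDS) would change which genes are classified as intronless.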

Statistics of gene structure characterization

In PIECE 2.0, clicking on ‘Browse’ from the home page provides a statistical summary of the global gene structure information for each species in the current database, including the total number, total length, median length and average length of genes, shown in table format. Box plots of gene, exon, intron, 5′ UTR and 3′ UTR lengths are also displayed on the page, allowing the user to easily compare the median, maximum and minimum length of each element (Supplementary Figure S3). PIECE 2.0 also provides a utility to compare gene structure attributes among plant species through the ‘Compare’ page. The result is shown in a dot plot, which can be zoomed in and out by clicking and dragging. Users can choose a gene structure element for the x-axis or y-axis to redraw the plot. Figure 3A is a dot plot comparing average intron length (x-axis) with average exon length (y-axis) among plant species. The result shows that the ratio of average intron length to average exon length is in the range of 1.5–3.5 for most plant species, meaning that they have a similar average intron to exon length ratio (Figure 3B). Exceptions include Ostreococcus lucimarinus, Micromonas sp. RCC299 and Micromonas pusilla, where the average intron length is shorter than the average exon length (ratio < 1). In addition, the average intron length of Vitis vinifera and Amborella trichopoda is much longer than the average exon length (ratio > 5) (Figure 3A).
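The summary statistics and the intron to exon length ratio discussed above can be computed along the following lines. This is a minimal sketch using the Python standard library; the function names and the example intervals are hypothetical, and it is not the code behind the ‘Browse’ or ‘Compare’ pages.

```python
from statistics import mean, median

def length_summary(intervals):
    """Count, total, median and mean length of 1-based inclusive intervals."""
    lengths = [end - start + 1 for start, end in intervals]
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
    }

def intron_exon_ratio(exon_intervals, intron_intervals):
    """Ratio of average intron length to average exon length."""
    return (length_summary(intron_intervals)["mean"]
            / length_summary(exon_intervals)["mean"])

# Hypothetical gene: two 100 bp exons separated by one 300 bp intron.
exons = [(1, 100), (401, 500)]
introns = [(101, 400)]
exon_stats = length_summary(exons)
ratio = intron_exon_ratio(exons, introns)  # 3.0
```

In a species-level comparison the same functions would be applied to all exons and introns pooled per species, giving one (average intron length, average exon length) point per species for a dot plot like Figure 3.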
Figure 3.

Dot plot for intron and exon average length comparison in the 49 species in PIECE 2.0. (A) Display of all 49 species. (B) A zoom-in picture of the analysis result of selected species.

GSDraw update

The comparison and visualization of gene structure for gene families or homologous genes offers biologists an important method for analyzing gene evolution. A number of web server applications for gene structure analysis have been developed, such as FancyGene (16), GSDraw (5), GenePainter (17) and GSDS (18). Among them, GSDraw is the one that can use sequence data (genomic, CDS and transcript sequences) to extract gene structure, protein motif and phylogenetic information, and then combine this information to draw the display diagram. In PIECE 2.0, we integrated GSDraw with the new gene structure viewer discussed above to provide an efficient and interactive graphical interface for analyzing gene structure. Users can set their own parameters in GSDraw for protein motif identification and phylogenetic tree reconstruction. GSDraw can now also draw a gene structure diagram from an uploaded GFF file, a feature that was added in response to feedback from PIECE 1.0 users.
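To illustrate what drawing a gene structure diagram from exon coordinates involves, the following minimal sketch renders exons as boxes on an intron line in SVG. It is not GSDraw's implementation; the function name, the default scale and the colors are assumptions chosen for illustration.

```python
def gene_structure_svg(exons, scale=0.5, height=20):
    """Render exons as boxes on a horizontal intron line; returns SVG markup.

    exons: list of (start, end) intervals, 1-based inclusive.
    scale: pixels per base pair (an arbitrary choice for this sketch).
    """
    start = min(s for s, _ in exons)
    end = max(e for _, e in exons)
    width = (end - start + 1) * scale
    mid = height / 2
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width:.0f}" height="{height}">',
        # The line spans the whole gene; exon boxes drawn on top leave introns visible.
        f'<line x1="0" y1="{mid}" x2="{width:.0f}" y2="{mid}" stroke="black"/>',
    ]
    for s, e in sorted(exons):
        x = (s - start) * scale
        w = (e - s + 1) * scale
        parts.append(
            f'<rect x="{x:.1f}" y="0" width="{w:.1f}" height="{height}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)

# Hypothetical gene with two exons.
svg = gene_structure_svg([(1, 100), (401, 500)])
```

The returned string can be written to a `.svg` file and opened in a browser. A fuller renderer would also distinguish UTRs from coding exons and place several genes next to a tree.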

DISCUSSION

A feature of eukaryotic gene structure is that the genomic sequences of protein-coding genes are frequently interrupted by introns, which have been proposed to be relics of primordial genes (19–21). The evolutionary dynamics of intron–exon organization can be used to infer the evolutionary history of a gene family. One of the principal methods for this type of analysis is to reconstruct a phylogenetic tree that best represents the evolutionary histories of gene families. In such an analysis, sequence alignments are used to create phylogenetic trees, and this information is then combined with the intron–exon organization of each member of the gene family. In plants, very few databases provide comparative information on gene structure evolution. Phytozome (8) and PLAZA (22) are well-known plant genome databases from which intron–exon sequences of plant genes can be obtained. However, these resources lack the comparative analytical capabilities needed to investigate intron–exon structural evolution by comparing gene families and orthologs from different species (5). A recently released database, JuncDB (23), compares exonic architectures. However, it uses only transcripts of orthologous genes in its data processing and does not focus on plant species. Unlike other genomic databases, PIECE is a plant intron–exon database that allows comparative analysis at a genome-wide level. Users can easily integrate, visualize and analyze phylogenetic trees, intron–exon structure, protein domains and intron phase. PIECE has been widely used by plant biologists for functional and evolutionary studies of gene families and orthologs (24–29). Here, we report the new PIECE 2.0 version, with gene structure data sets updated to include 24 genomes sequenced since the release of PIECE 1.0 and with a significantly improved view function that displays phylogenetic trees along with gene structure and protein domains. 
In addition, PIECE 2.0 can now compare gene structures across four categories: Pfam, KOG, KO and ortholog. This can help users better understand the evolution of gene structure. PIECE 2.0 also provides a web resource for the collection and analysis of plant intronless genes. In conclusion, we anticipate that PIECE 2.0 will prove a useful resource for addressing important biological questions regarding the organization and evolution of gene structure in the evolutionary history of plant species.
  29 in total

1.  In search of lost introns.

Authors:  Miklós Csurös; J Andrew Holey; Igor B Rogozin
Journal:  Bioinformatics       Date:  2007-07-01       Impact factor: 6.937

2.  Characterization of intron loss events in mammals.

Authors:  Jasmin Coulombe-Huntington; Jacek Majewski
Journal:  Genome Res       Date:  2006-11-15       Impact factor: 9.043

3.  Genome-wide investigation and transcriptome analysis of the WRKY gene family in Gossypium.

Authors:  Mingquan Ding; Jiadong Chen; Yurong Jiang; Lifeng Lin; YueFen Cao; Minhua Wang; Yuting Zhang; Junkang Rong; Wuwei Ye
Journal:  Mol Genet Genomics       Date:  2014-09-05       Impact factor: 3.291

4.  Type material in the NCBI Taxonomy Database.

Authors:  Scott Federhen
Journal:  Nucleic Acids Res       Date:  2014-11-14       Impact factor: 19.160

5.  The Lipoxygenase Gene Family in Poplar: Identification, Classification, and Expression in Response to MeJA Treatment.

Authors:  Zhu Chen; Xue Chen; Hanwei Yan; Weiwei Li; Yuan Li; Ronghao Cai; Yan Xiang
Journal:  PLoS One       Date:  2015-04-30       Impact factor: 3.240

6.  Characterization of the Maize Chitinase Genes and Their Effect on Aspergillus flavus and Aflatoxin Accumulation Resistance.

Authors:  Leigh K Hawkins; J Erik Mylroie; Dafne A Oliveira; J Spencer Smith; Seval Ozkan; Gary L Windham; W Paul Williams; Marilyn L Warburton
Journal:  PLoS One       Date:  2015-06-19       Impact factor: 3.240

7.  GSDS 2.0: an upgraded gene feature visualization server.

Authors:  Bo Hu; Jinpu Jin; An-Yuan Guo; He Zhang; Jingchu Luo; Ge Gao
Journal:  Bioinformatics       Date:  2014-12-10       Impact factor: 6.937

8.  JuncDB: an exon-exon junction database.

Authors:  Michal Chorev; Lotem Guy; Liran Carmel
Journal:  Nucleic Acids Res       Date:  2015-10-30       Impact factor: 16.971

9.  A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes.

Authors:  Eugene V Koonin; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Dmitri M Krylov; Kira S Makarova; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Igor B Rogozin; Sergei Smirnov; Alexander V Sorokin; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal:  Genome Biol       Date:  2004-01-15       Impact factor: 13.583

10.  The Pfam protein families database: towards a more sustainable future.

Authors:  Robert D Finn; Penelope Coggill; Ruth Y Eberhardt; Sean R Eddy; Jaina Mistry; Alex L Mitchell; Simon C Potter; Marco Punta; Matloob Qureshi; Amaia Sangrador-Vegas; Gustavo A Salazar; John Tate; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2015-12-15       Impact factor: 16.971
