
GrTEdb: the first web-based database of transposable elements in cotton (Gossypium raimondii).

Zhenzhen Xu, Jing Liu, Wanchao Ni, Zhen Peng, Yue Guo, Wuwei Ye, Fang Huang, Xianggui Zhang, Peng Xu, Qi Guo, Xinlian Shen, Jianchang Du.

Abstract

Although the genomes of several diploid and tetraploid Gossypium species have been sequenced, a well-annotated web-based database of transposable elements (TEs) has been lacking. To better understand the roles of TEs in the structural, functional and evolutionary dynamics of the cotton genome, we constructed a comprehensive, specific and user-friendly web-based database, the Gossypium raimondii transposable elements database (GrTEdb). A total of 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome. These elements were classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity: 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1 elements, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs and 14 Helitrons. Web-based sequence browsing, searching, downloading and BLAST tools were implemented to help users easily and effectively annotate TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb provides resources and information related to TEs in G. raimondii, and will facilitate gene and genome analyses within or across Gossypium species, evaluation of the impact of TEs on their host genomes, and investigation of potential interactions between TEs and protein-coding genes in Gossypium species. Database URL: http://www.grtedb.org/.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 28365739      PMCID: PMC5467567          DOI: 10.1093/database/bax013

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


Introduction

Transposable elements (TEs) are the most abundant DNA components in most characterized genomes of higher eukaryotes (1). Based on their structural features and transposition mechanisms, TEs are generally classified into two classes: retrotransposons and DNA transposons (2). In plants, retrotransposons are further classified into two distinct orders, long terminal repeat (LTR) retrotransposons (Ty1/Copia and Ty3/Gypsy) and non-LTR retrotransposons (LINE and SINE), whereas DNA transposons are traditionally separated into two main orders, terminal inverted repeat (TIR) elements (Tc1-Mariner, hAT, Mutator, PIF/Harbinger and CACTA) and Helitrons (2, 3). Although TEs are often regarded as ‘junk DNA’ because of their continuous proliferation and their potential to disrupt host genes (4–6), a growing body of evidence shows that they play important roles in altering gene structure, regulating gene expression, shaping genome evolution and creating new genes (7–9). Complete identification and characterization of TEs has therefore become a priority in genome sequencing projects: it contributes to accurate annotation of protein-coding genes and other genomic components, and it is essential for investigating potential interactions between TEs and functional genes (10).

Recently, the genomes of several diploid and tetraploid Gossypium species have been sequenced (11–15), and the availability of these draft genome sequences provides an unprecedented opportunity for the identification, structural and functional characterization, and evolutionary analysis of TEs in this economically important crop. Gossypium raimondii (DD; 2n = 26), a putative D-genome parent of tetraploid cotton species [such as G. hirsutum (L.) and G. barbadense (L.)], has a relatively small genome (∼737.8 Mb) (12). In this study, we therefore characterized almost all families of TEs in the G. raimondii genome using comprehensive methods and constructed the G. raimondii transposable elements database (GrTEdb). We implemented web-based sequence browsing, searching, downloading and BLAST tools to help users easily and effectively annotate TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb thus provides the first user-friendly web-based database of TEs in Gossypium species, and will facilitate genome evolution analyses within or across Gossypium species, evaluation of the impact of TEs on their host genomes, and investigation of potential interactions between TEs and protein-coding genes.

Construction and content of the database

The assembled sequence of the G. raimondii genome was downloaded from http://www.phytozome.com (11). A combination of structure-based and homology-based approaches was employed to identify the different TEs in the G. raimondii genome. LTR-retrotransposons were characterized according to the methods described by Ma et al. (2006) (16): first, LTR-retrotransposons were identified using the LTR_STRUC software; then CROSS_MATCH was used to detect elements missed by the program, by aligning the flanking LTRs of the elements generated by LTR_STRUC against the G. raimondii genome. Custom Perl scripts were written to facilitate data mining and analysis. Non-LTR retrotransposons and DNA transposons (L1, Mutator, PIF-Harbinger, CACTA and Helitron) were detected following the protocol of Holligan et al. (2006) (17): conserved transposase sequences from Arabidopsis thaliana were aligned against the G. raimondii genome using tblastn, and target site duplications (TSDs) and TIRs were detected using Perl scripts. Each predicted element was then inspected manually in detail to confirm it and to define its structure and boundaries. In addition, TEs were classified into superfamilies and families as previously described (2, 17). Only elements with clearly defined boundaries and insertion sites were deposited in GrTEdb. Based on the above approaches, 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome. These elements were classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity: 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1 elements, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs and 14 Helitrons (Table 1). Because of their large numbers in G. raimondii, the elements assigned to the Copia-like and Gypsy-like superfamilies were further categorized, based on the 80-80-80 rule (2), into 199 and 218 distinct families, respectively.
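The family assignment by the 80-80-80 rule can be illustrated with a minimal sketch. It assumes the common reading of the rule from the classification system cited as (2): two elements are placed in the same family if they share at least 80% sequence identity over at least 80% of the compared region, with an aligned segment of at least 80 bp. The `Alignment` record and the `same_family` function are hypothetical names introduced here for illustration; they are not taken from the authors' scripts.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Summary of a pairwise alignment between two TE regions (hypothetical record)."""
    identity_pct: float  # percent identity over the aligned segment
    aligned_len: int     # length of the aligned segment in bp
    shorter_len: int     # length of the shorter compared region in bp


def same_family(aln: Alignment, min_identity: float = 80.0,
                min_coverage: float = 80.0, min_len: int = 80) -> bool:
    """Return True if the alignment satisfies all three 80-80-80 criteria."""
    if aln.shorter_len <= 0:
        return False
    # Coverage is measured against the shorter region (an assumption of this sketch).
    coverage_pct = 100.0 * aln.aligned_len / aln.shorter_len
    return (aln.identity_pct >= min_identity
            and coverage_pct >= min_coverage
            and aln.aligned_len >= min_len)


print(same_family(Alignment(92.5, 450, 500)))  # 92.5% identity, 90% coverage, 450 bp
print(same_family(Alignment(85.0, 60, 70)))    # aligned segment shorter than 80 bp
```

In practice the identity and aligned length would come from an alignment of LTRs or internal domains; the three thresholds are exposed as parameters so that a stricter or looser variant can be tried.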
Table 1. Summary of the identified TEs in G. raimondii

Class              Order      Superfamily      Copy number
Retrotransposons   LTR        Copia            2929
Retrotransposons   LTR        Gypsy            10 368
Retrotransposons   LINE       L1               299
DNA transposons    TIR        Mutator          12
DNA transposons    TIR        PIF-Harbinger    435
DNA transposons    TIR        CACTA            275
DNA transposons    Helitron   Helitron         14
Total                                          14 332
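As a quick consistency check on Table 1, the short sketch below sums the reported copy numbers and prints each superfamily's share of the total. The counts are copied from the table; the variable names are illustrative only.

```python
# Copy numbers per superfamily, as reported in Table 1.
counts = {
    "Copia": 2929,
    "Gypsy": 10368,
    "L1": 299,
    "Mutator": 12,
    "PIF-Harbinger": 435,
    "CACTA": 275,
    "Helitron": 14,
}

total = sum(counts.values())
assert total == 14332  # agrees with the total given in Table 1

# Share of each superfamily among all annotated elements.
for name, n in counts.items():
    print(f"{name:<14}{n:>7}{100 * n / total:>8.2f}%")

# LTR retrotransposons are the Copia and Gypsy superfamilies combined.
ltr = counts["Copia"] + counts["Gypsy"]
print(f"LTR retrotransposons: {ltr} ({100 * ltr / total:.2f}%)")
```

The seven counts add up to the reported 14 332, and LTR retrotransposons (Copia plus Gypsy) make up 13 297 of them, roughly 93% of the annotated elements.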

User interface

GrTEdb was established to enable users to browse, search, view, analyze and download TE data and information. The database is organized into six sections: Home, Browse, Search and Download, Blast, Links and Contact (Figure 1A).
Figure 1. (A) The top menu of GrTEdb. (B) The user interface of browsing in GrTEdb. Users can browse the detailed information of each superfamily by clicking the hyperlinks provided in this page.

Browse

The browsing interface shows the classification structure of the TEs deposited in GrTEdb. Users can download the complete set of TE sequences, or browse any superfamily of interest through the hyperlinks provided. The detailed information for each superfamily can be retrieved and downloaded by clicking the corresponding entry (Figure 1B).

Search and download

In the searching and downloading interface, users can search GrTEdb by keyword (e.g. TE ID, Class, Order, Superfamily or Family) to locate specific TEs quickly. The search results can be viewed and downloaded by clicking the hyperlinks provided on the page (Figure 2).
Figure 2. The searching interface of GrTEdb. Users can use a keyword to locate specific TEs quickly in GrTEdb (e.g. TE ID, Class, Order, Superfamily and Family). The search results can be viewed and downloaded by clicking the hyperlinks provided on the page.

On the chromosomal region search page, users can retrieve the TEs for an entire chromosome or within a defined window around either a chromosomal position or a gene model, and the detailed information for each retrieved TE can be viewed and downloaded by clicking the hyperlinks provided on the page (Figure 3). This function helps users to easily locate TEs surrounding genes of interest and to study the interactions between TEs and their adjacent genes.
Figure 3. The chromosomal region search page. Users can retrieve the TE sequences for any one entire chromosome or in a defined window around either a chromosomal position or a gene model, and the detailed information of each retrieved TEs can be viewed and downloaded by clicking the hyperlinks provided on the page.
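The window-based retrieval described for the chromosomal region search amounts to an interval-overlap query. The sketch below shows one simple way to express it; it is not the GrTEdb implementation. The `TE` record, the `tes_in_window` function and the example coordinates are all hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class TE(NamedTuple):
    te_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    superfamily: str


def tes_in_window(tes, chrom, position, flank):
    """Return TEs on `chrom` that overlap [position - flank, position + flank]."""
    lo = max(1, position - flank)
    hi = position + flank
    # Two closed intervals overlap when each starts before the other ends.
    hits = [t for t in tes if t.chrom == chrom and t.start <= hi and t.end >= lo]
    return sorted(hits, key=lambda t: t.start)


# Made-up records for illustration only; not taken from GrTEdb.
annotations = [
    TE("te_a", "Chr01", 1_000, 6_200, "Gypsy"),
    TE("te_b", "Chr01", 48_500, 53_100, "Copia"),
    TE("te_c", "Chr01", 120_000, 121_400, "CACTA"),
    TE("te_d", "Chr02", 50_000, 55_000, "Gypsy"),
]

for te in tes_in_window(annotations, "Chr01", position=50_000, flank=5_000):
    print(te.te_id, te.start, te.end, te.superfamily)
```

A query around a gene model works the same way, with the gene's start and end replacing the single position. For large annotation sets a sorted index or an interval tree would avoid the linear scan used here.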

Blast

Because of the complex structural variation and distribution patterns of TEs among classes and families, we did not integrate currently available tools for sequence comparison, editing and/or assembly into the database, with the exception of BLAST (Figure 4). On the BLAST search page, users can conveniently and quickly compare their sequences with the cotton TEs deposited in GrTEdb.
Figure 4. The BLAST interface (left) and a sample of BLASTn results (right) provided in GrTEdb.
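Users who download the TE sequences and run BLAST locally can post-process the hits to annotate TE fragments in their own sequences. The sketch below assumes the standard 12-column BLAST+ tabular output (`-outfmt 6`) and keeps, for each query, the highest-scoring hit that passes an identity and a length threshold. The thresholds, the function name `best_hits` and the example rows are illustrative assumptions; the subject IDs are not real GrTEdb identifiers.

```python
import csv
import io

# Standard 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=80.0, min_length=80):
    """Return the highest-bitscore hit per query that passes both thresholds."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["pident"]) < min_identity or int(hit["length"]) < min_length:
            continue
        prev = best.get(hit["qseqid"])
        if prev is None or float(hit["bitscore"]) > float(prev["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Made-up example rows; the subject IDs are hypothetical.
example = io.StringIO(
    "contig1\tTE_gypsy_x\t93.10\t870\t60\t0\t101\t970\t1\t870\t0.0\t1275\n"
    "contig1\tTE_copia_y\t81.50\t400\t74\t0\t2001\t2400\t50\t449\t1e-90\t330\n"
    "contig2\tTE_cacta_z\t75.00\t300\t75\t0\t10\t309\t1\t300\t1e-40\t160\n"
)

for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```

Keeping a single best hit per query is a deliberate simplification. A sequence carrying several TE fragments would need all non-overlapping hits retained instead.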

Links

A variety of links to other TE databases are included in GrTEdb.

Contact

This section provides contact information and links to our laboratories. Please feel free to contact us with any suggestions or problems.

Discussion

Because of the structural complexity of TEs and the time-consuming annotation process, annotating all TEs in a sequenced genome remains challenging. Currently only a few TE databases have been established (10, 18–24). Because these databases help users annotate their sequences easily and quickly, they have been widely used (10). However, plant TE databases such as P-MITE (a plant MITE database) and the TIGR Plant Repeat Databases contain little information about cotton TEs. In parallel, although there have been some reports on TEs in Gossypium (11–15, 25, 26), a web-based TE database has been lacking. Here we have generated a web-based TE database (GrTEdb) using multiple methods, and only TEs with clearly defined boundaries were deposited in the database. Many studies have shown that a large proportion of TEs are structurally incomplete because they have undergone intra- or inter-element unequal recombination or have accumulated small deletions through illegitimate recombination (27, 28). For example, a large number of LTR-retrotransposon families with highly degraded protein-coding sequences or without any coding sequences (often defined as non-autonomous elements) have been found in several plants (29–35), and these elements remain challenging to identify and characterize. GrTEdb therefore provides reference TE sequences for cotton, which users can use to identify more complex elements and to investigate their specific functions.

Recently, the genomes of G. arboreum (A2), a putative contributor of the A subgenome of allotetraploid cotton species, and of the allotetraploid upland cotton (AD)1 [G. hirsutum (L.)], which accounts for >90% of cultivated cotton worldwide, have been sequenced and assembled (13–15). Because of the close evolutionary relationships among the DD, AA and AADD genomes, GrTEdb is useful not only for studies of G. raimondii but also for structural and evolutionary analyses of AA, DD, AADD and other, as yet unfinished, Gossypium genomes. The web-based interface also helps users who are new to bioinformatics to access and use the database easily. Further, the TEs in our database will help cotton breeders develop markers for mapping agronomically important genes and accelerate the breeding process.

Conclusions

We have generated the web-based database GrTEdb, which provides researchers not only with resources and information related to the different TEs in the cotton genome but also with tools for data analysis. GrTEdb will thus facilitate analyses of cotton genome evolution among AA, DD and AADD genome species, evaluation of the impact of TEs on their host genomes, and investigation of potential interactions between TEs and protein-coding genes. In parallel, the TEs in our database will support marker development for mapping agronomically important genes, and both intra- and inter-specific comparison of TEs at the whole-genome level.

Availability and requirements

All TEs or subsets of TEs can be viewed and downloaded from the website http://www.grtedb.org/, and all data deposited in the database are freely available to all users without any restrictions.

Funding

The Key Scientific and Technological Project of Jiangsu Province (BK20150540); Jiangsu Agricultural Science and Technology Innovation Fund (CX(14)5008); the State Key Laboratory of Cotton Biology Open Fund (CB2016B03); the National Natural Science Foundation of China (NSFC) (31370266, 31471545).

Conflict of interest. None declared.
References (10 of 35 listed)

1.  The history and disposition of transposable elements in polyploid Gossypium.

Authors:  Guanjing Hu; Jennifer S Hawkins; Corrinne E Grover; Jonathan F Wendel
Journal:  Genome       Date:  2010-08       Impact factor: 2.166

Review 2.  Repbase Update, a database of eukaryotic repetitive elements.

Authors:  J Jurka; V V Kapitonov; A Pavlicek; P Klonowski; O Kohany; J Walichiewicz
Journal:  Cytogenet Genome Res       Date:  2005       Impact factor: 1.636

Review 3.  A unified classification system for eukaryotic transposable elements.

Authors:  Thomas Wicker; François Sabot; Aurélie Hua-Van; Jeffrey L Bennetzen; Pierre Capy; Boulos Chalhoub; Andrew Flavell; Philippe Leroy; Michele Morgante; Olivier Panaud; Etienne Paux; Phillip SanMiguel; Alan H Schulman
Journal:  Nat Rev Genet       Date:  2007-12       Impact factor: 53.242

Review 4.  Transposable elements and the evolution of regulatory networks.

Authors:  Cédric Feschotte
Journal:  Nat Rev Genet       Date:  2008-05       Impact factor: 53.242

5.  Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution.

Authors:  Fuguang Li; Guangyi Fan; Cairui Lu; Guanghui Xiao; Changsong Zou; Russell J Kohel; Zhiying Ma; Haihong Shang; Xiongfeng Ma; Jianyong Wu; Xinming Liang; Gai Huang; Richard G Percy; Kun Liu; Weihua Yang; Wenbin Chen; Xiongming Du; Chengcheng Shi; Youlu Yuan; Wuwei Ye; Xin Liu; Xueyan Zhang; Weiqing Liu; Hengling Wei; Shoujun Wei; Guodong Huang; Xianlong Zhang; Shuijin Zhu; He Zhang; Fengming Sun; Xingfen Wang; Jie Liang; Jiahao Wang; Qiang He; Leihuan Huang; Jun Wang; Jinjie Cui; Guoli Song; Kunbo Wang; Xun Xu; John Z Yu; Yuxian Zhu; Shuxun Yu
Journal:  Nat Biotechnol       Date:  2015-04-20       Impact factor: 54.908

6.  The transposable element landscape of the model legume Lotus japonicus.

Authors:  Dawn Holligan; Xiaoyu Zhang; Ning Jiang; Ellen J Pritham; Susan R Wessler
Journal:  Genetics       Date:  2006-10-08       Impact factor: 4.562

7.  Selfish genes, the phenotype paradigm and genome evolution.

Authors:  W F Doolittle; C Sapienza
Journal:  Nature       Date:  1980-04-17       Impact factor: 49.962

8.  Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis.

Authors:  Katrien M Devos; James K M Brown; Jeffrey L Bennetzen
Journal:  Genome Res       Date:  2002-07       Impact factor: 9.043

9.  Genome sequence of the cultivated cotton Gossypium arboreum.

Authors:  Fuguang Li; Guangyi Fan; Kunbo Wang; Fengming Sun; Youlu Yuan; Guoli Song; Qin Li; Zhiying Ma; Cairui Lu; Changsong Zou; Wenbin Chen; Xinming Liang; Haihong Shang; Weiqing Liu; Chengcheng Shi; Guanghui Xiao; Caiyun Gou; Wuwei Ye; Xun Xu; Xueyan Zhang; Hengling Wei; Zhifang Li; Guiyin Zhang; Junyi Wang; Kun Liu; Russell J Kohel; Richard G Percy; John Z Yu; Yu-Xian Zhu; Jun Wang; Shuxun Yu
Journal:  Nat Genet       Date:  2014-05-18       Impact factor: 38.330

10.  MnTEdb, a collective resource for mulberry transposable elements.

Authors:  Bi Ma; Tian Li; Zhonghuai Xiang; Ningjia He
Journal:  Database (Oxford)       Date:  2015-02-27       Impact factor: 3.451
