Alexandros Bousios, Evangelia Minga, Nikoleta Kalitsou, Maria Pantermali, Aphrodite Tsaballa, Nikos Darzentas.
Abstract
BACKGROUND: Sireviruses are an ancient genus of the Copia superfamily of LTR retrotransposons, and the only one that has proliferated exclusively within plant genomes. Experimental data and phylogenetic analyses indicate that Sireviruses have successfully infiltrated many branches of the plant kingdom and have extensively colonized the genomes of grass species. Notably, they were recently shown to have been a major force in the make-up and evolution of the maize genome, where they currently occupy ~21% of the nuclear content and ~90% of the Copia population. It is therefore highly likely that their life dynamics have been fundamental to the composition and organization of the genomes of a plethora of plant hosts. To assist studies of their impact on plant genome evolution, and to facilitate accurate identification and annotation of transposable elements in sequencing projects, we developed MASiVEdb (Mapping and Analysis of SireVirus Elements Database), a collective and systematic resource of Sireviruses in plants. DESCRIPTION: Taking advantage of the increasing availability of plant genomic sequences, and using an updated version of MASiVE, an algorithm specifically designed to identify Sireviruses based on their highly conserved genome structure, we populated MASiVEdb (http://bat.infspire.org/databases/masivedb/) with data on 16,243 intact Sireviruses (total length >158 Mb) discovered in 11 fully sequenced plant genomes. MASiVEdb is unlike any other transposable element database. It provides a wealth of highly curated, detailed information on a single genus across its hosts, such as the complete set of coordinates, the insertion age, and an analytical breakdown of the structure and gene complement of each element. All data are readily available through basic and advanced query interfaces, batch retrieval, and downloadable files.
A purpose-built system is also offered for detecting and visualizing similarity between user sequences and Sireviruses, as well as for coding-domain discovery and phylogenetic analysis.
Year: 2012 PMID: 22545773 PMCID: PMC3414828 DOI: 10.1186/1471-2164-13-158
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Properties of the host species and their Sirevirus populations included in MASiVEdbᵃ

| Host (common name) | Group | Genome size (Mb)ᵇ | Chr | SVs | Mean insertion age (my) | SVs with ENV |
|---|---|---|---|---|---|---|
| thale cress | eudicot | 116 | 5 | 4 | 0.95 | 2 |
| brome | monocot | 262 | 5 | 22 | 1.97 | 14 |
| n/a | alga | 99 | 17 | 0 | n/a | 0 |
| strawberry | eudicot | 203 | 7 | 1 | 0.45 | 0 |
| soybean | eudicot | 915 | 20 | 1337 | 0.45 | 1294 |
| lotus | eudicot | 291 | 7 | 282 | 0.35 | 270 |
| rice | monocot | 362 | 12 | 25 | 2.18 | 14 |
| rice | monocot | 370 | 12 | 91 | 1.29 | 42 |
| n/a | alga | 13 | 21 | 0 | n/a | 0 |
| poplar | eudicot | 304 | 19 | 0 | n/a | 0 |
| sorghum | monocot | 633 | 10 | 522 | 0.80 | 227 |
| cacao | eudicot | 214 | 10 | 77 | 3.17 | 52 |
| grapevine | eudicot | 414 | 19 | 49 | 1.73 | 45 |
| maize | monocot | 1969 | 10 | 13833 | 1.29 | 516 |
| Total | | | | 16243 | 1.20 | 2476 |
ᵃ The chromosome sequence data were downloaded from the following sources: Arabidopsis from http://www.arabidopsis.org/; Brachypodium, Chlamydomonas, soybean, poplar, sorghum and grapevine from http://www.phytozome.net/; strawberry from http://www.strawberrygenome.org/; lotus from http://www.kazusa.or.jp/lotus/; rice (indica) from http://rice.genomics.org.cn/rice/; rice (japonica) from http://rgp.dna.affrc.go.jp/E/IRGSP/Build5/build5; Ostreococcus from http://genome.jgi-psf.org/; cacao from http://cocoagendb.cirad.fr/; and maize from http://www.maizesequence.org/. ᵇ Genome sizes are not the actual size estimates for each species. They are the cumulative length (Mb) of the chromosome sequence files after removal of unanchored contigs and scaffolds. chr, chromosome; SVs, Sireviruses; my, million years; ENV, envelope-like gene.
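The totals row of the table can be checked against the per-host rows. This is a minimal sketch that makes one assumption, which is an inference and not something the table states: the overall age of 1.20 my is taken to be the mean of the per-host ages, weighted by each host's number of Sireviruses. Hosts with no Sireviruses (age n/a) are skipped. Under that assumption the sketch reproduces all three totals. The names `ROWS` and `totals` are illustrative only.

```python
# Per-host rows transcribed from the table:
# (common name, number of Sireviruses, mean age in my or None for n/a, SVs with ENV)
ROWS = [
    ("thale cress", 4, 0.95, 2),
    ("brome", 22, 1.97, 14),
    ("n/a", 0, None, 0),
    ("strawberry", 1, 0.45, 0),
    ("soybean", 1337, 0.45, 1294),
    ("lotus", 282, 0.35, 270),
    ("rice", 25, 2.18, 14),
    ("rice", 91, 1.29, 42),
    ("n/a", 0, None, 0),
    ("poplar", 0, None, 0),
    ("sorghum", 522, 0.80, 227),
    ("cacao", 77, 3.17, 52),
    ("grapevine", 49, 1.73, 45),
    ("maize", 13833, 1.29, 516),
]


def totals(rows):
    """Return (total SVs, SV-weighted mean age, total SVs with ENV).

    Assumption: the overall age is weighted by the number of Sireviruses
    per host; hosts whose age is n/a (None) are excluded from the average.
    """
    total_svs = sum(n for _, n, _, _ in rows)
    total_env = sum(env for _, _, _, env in rows)
    aged = [(n, age) for _, n, age, _ in rows if age is not None]
    weighted_age = sum(n * age for n, age in aged) / sum(n for n, _ in aged)
    return total_svs, weighted_age, total_env


svs, age, env = totals(ROWS)
print(svs, f"{age:.2f}", env)  # 16243 1.20 2476
```

Maize contributes 13,833 of the 16,243 elements, so the weighted mean sits close to the maize value of 1.29 my. An unweighted mean of the eleven per-host ages would be noticeably higher.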
Figure 1. The web interface of MASiVEdb. The large buttons on the home page lead to the four main sections; the batch retrieval and downloads section is shown here. Clicking the ‘radar’ icon of each species produces a Circos-based [55] image of the abundance, chromosomal localization and age distribution of its Sireviruses.
Figure 2. The query interface and output of MASiVEdb. (A) Simple form: the user selects a single host species or all species from the drop-down menu, which then opens an adapted list of choices and filters. (B) Advanced form: users can retrieve multiple types of information from multiple species simultaneously. (C) The output matrix is common to both forms and permits further interactive processing (i.e., sorting and filtering).
Figure 3. The sequence search interface and output of MASiVEdb. Besides reporting the results of the analysis, the output page also links to the RT- and INT-based Copia phylogenetic trees and to a Circoletto visualization of the sequence similarity.