Luca Ambrosino, Hamed Bostan, Pasquale di Salle, Mara Sangiovanni, Alessandra Vigilante, Maria L Chiusano.
Abstract
Arabidopsis thaliana is widely accepted as a model species in plant biology. Its genome, owing to its small size and diploidy, was the first plant genome to be sequenced, which also made this species a reference for plant comparative genomics. Nevertheless, the evolutionary mechanisms that shaped the Arabidopsis genome are still controversial. Indeed, the duplications, translocations, inversions, and gene loss events that contributed to its current organization are difficult to trace. Reliable identification of paralogs and single-copy genes is essential to understanding these mechanisms. We therefore implemented a dedicated pipeline to identify paralogous genes and to classify single-copy genes into appropriate categories. pATsi, a web-accessible database, was built to give straightforward access to the paralogs, organized into networks, and to the classification of single-copy genes. This makes it possible to explore the Arabidopsis gene collection efficiently for evolutionary investigations and comparative genomics.
Keywords: Arabidopsis thaliana; paralogs; singletons
Year: 2016 PMID: 26792975 PMCID: PMC4710182 DOI: 10.4137/EBO.S32536
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Details of the Arabidopsis gene collection and classification.
| CLASSIFICATION | GENE NUMBER | ANALYSIS |
|---|---|---|
| Nonprotein-coding genes | 6070 | miscRNAs, tRNAs, rRNAs, ncRNAs, pseudogenes, transposons and unknown genes |
| Paralogs classified into networks | 22522 | All-against-all BLASTp |
| Unassigned genes due to Rost’s formula | 405 | Filtering with Rost’s formula |
| Unassigned genes due to the masking filter | 213 | All-against-all BLASTp |
| Unassigned genes due to loose protein similarity | 440 | All-against-all BLASTp |
| Unassigned genes due to ORF annotation errors | 2 | Transcripts BLASTx |
| Unassigned genes due to similarities with nonprotein-coding genes | 178 | Full genes BLASTn |
| Unassigned genes due to similarities with intergenic regions | 0 | Full genes BLASTn |
| Singletons not confirmed by ESTs (no EST trace) | 24 | Transcripts BLASTn (free |
| Singletons not confirmed by ESTs (discarded by E-value cutoff) | 688 | Filtering of BLASTn results by |
| Singletons not confirmed by ESTs (discarded by coverage and identity requirements) | 201 | Filtering of BLASTn versus EST results by coverage and identity |
| Singletons not confirmed by ESTs | 100 | Filtering by Delta ≥ 20 (EST longer than the transcript by at least 20 nt) |
| Singletons confirmed by ESTs | 9 | 0 < Delta < 20 (EST longer than the transcript by fewer than 20 nt) |
| Singletons confirmed by ESTs | 2387 | Delta ≤ 0 (EST no longer than the transcript) |
Notes: A summary of the classes of genes defined in pATsi. The analyses performed to obtain genes in each class are also reported.
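The last three rows of the table sort singletons by a Delta value. The sketch below illustrates that decision rule only. It assumes Delta is the EST length minus the transcript length in nucleotides, which is inferred from the parenthetical descriptions in the table and is not stated explicitly in this record. The function name, the category labels, and the `threshold` parameter are hypothetical and do not come from the pATsi pipeline.

```python
def classify_singleton_by_delta(est_length: int, transcript_length: int,
                                threshold: int = 20) -> str:
    """Classify a singleton by how far its best EST overhangs the transcript.

    Assumption: Delta = EST length - transcript length (in nt).
    """
    delta = est_length - transcript_length
    if delta >= threshold:
        # EST exceeds the transcript by 20 nt or more: not confirmed.
        return "not confirmed (Delta >= 20)"
    if delta > 0:
        # EST is longer than the transcript, but by fewer than 20 nt.
        return "confirmed (0 < Delta < 20)"
    # EST is no longer than the transcript.
    return "confirmed (Delta <= 0)"


if __name__ == "__main__":
    # Hypothetical lengths, for illustration only.
    for est_len, tx_len in [(1250, 1200), (1210, 1200), (900, 1200)]:
        print(est_len, tx_len, classify_singleton_by_delta(est_len, tx_len))
```

Keeping the threshold as a parameter makes the 20 nt cutoff from the table explicit, so its boundary cases (Delta of exactly 0 or 20) are easy to check.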
Figure 1. Example view of the largest network of paralogs. Network of paralogs consisting of 6,834 genes.
Notes: Each dot in dark gray represents a single gene, and each line in light gray represents a paralogy relationship between two genes.
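Figure 1 treats genes as nodes and paralogy relationships as edges. One plausible way to group such pairs into networks is to take the connected components of that graph. The sketch below does this with a plain depth-first traversal. It is an illustration under that assumption, not the pipeline described by the authors, and the gene identifiers and the function name `paralog_networks` are hypothetical.

```python
from collections import defaultdict


def paralog_networks(pairs):
    """Group genes into networks, given an iterable of (gene_a, gene_b) paralog pairs.

    Assumption: a network is a connected component of the paralogy graph.
    """
    adjacency = defaultdict(set)
    for gene_a, gene_b in pairs:
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)

    seen = set()
    networks = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first traversal collects one component.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            gene = stack.pop()
            component.add(gene)
            for neighbor in adjacency[gene]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        networks.append(component)
    return networks


if __name__ == "__main__":
    # Hypothetical paralog pairs, e.g. retained hits from an all-against-all search.
    example_pairs = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]
    for network in paralog_networks(example_pairs):
        print(sorted(network))
```

Genes that never appear in any pair are absent from the output. In this framing they would be the candidates for the single-copy classes listed in the table.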
Figure 2. Possible query workflow in the pATsi web interface. (A) Main page of the pATsi database browser; for each query, the user can switch from the gene view (bordered in green) to the class view (bordered in red). (B) List of genes associated with a query. (C) Gene information page. In (C2), a network graph is shown; each circle is a GeneID, with the light blue-circled one representing the selected GeneID and the yellow-circled one(s) representing the paralog(s) of the selected GeneID; gray lines represent paralogies between genes. (D) List of networks associated with a query. (E) Network information page.
Figure 3. Network organization. (A) List of subnetworks associated with a network query. (B) Network information page. (C) Graphic representation of a network of 24 genes (NET88G24_4) split into three subnetworks (NET88G24_4 NET1-NET2-NET3) and one singleton (NET88G24_4 SIN1).
Figure 4. Example of pATsi usage. (A) List of genes associated with the NET253G11_2 network. (B) Graph view of the NET253G11_2 network. (C) Graph view of the NET253G11_2_NET1 and NET253G11_2_NET12 subnetworks.
Notes: Orange circles represent genes annotated as gamma carbonic anhydrases; yellow circles represent genes annotated as serine acetyltransferases; the gray circle represents a gene with an unknown function.