Luca Ambrosino1,2, Valentino Ruggieri1,3, Hamed Bostan1,4, Marco Miralto1,2, Nicola Vitulo5, Mohamed Zouine6, Amalia Barone1, Mondher Bouzayen6, Luigi Frusciante1, Mario Pezzotti5, Giorgio Valle7, Maria Luisa Chiusano8,9.
Abstract
BACKGROUND: "Omics" approaches may provide useful information for a deeper understanding of speciation events, diversification and function innovation. This can be achieved by investigating the molecular similarities at sequence level between species, allowing the definition of ortholog and paralog genes. However, the spread of sequenced genomes, often endowed with still-preliminary annotations, requires suitable bioinformatics to be appropriately exploited in this framework.
Keywords: Comparative genomics; Grapevine; Orthologs; Paralogs; Species specific gene loci; Tomato
Year: 2018 PMID: 30497367 PMCID: PMC6266932 DOI: 10.1186/s12859-018-2420-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Comparison between the results of gene, mRNA and protein similarity searches. a Venn diagram showing differences and similarities in the number of BBHs detected using gene, mRNA and protein sequences. b Venn diagram showing the number of S. lycopersicum genes that have an ortholog counterpart in V. vinifera, and vice versa
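The BBHs counted in Fig. 1 are bidirectional best hits: pairs in which each sequence is the other's top-scoring match when the search is run in both directions. The following is a minimal sketch of that idea, not the authors' pipeline. The function names, the score dictionaries and the identifiers `tomA`, `tomB`, `grapeX` and `grapeY` are all hypothetical, and the tie-breaking rule is an assumption.

```python
from typing import Dict, Set, Tuple

# Hypothetical input shape: query ID -> {subject ID: alignment score}
Hits = Dict[str, Dict[str, float]]


def best_hit(hits: Hits) -> Dict[str, str]:
    """Map each query to its highest-scoring subject.

    Ties are broken by taking the alphabetically first subject ID
    (an assumption made only to keep the sketch deterministic).
    """
    return {
        query: max(sorted(subjects), key=lambda s: subjects[s])
        for query, subjects in hits.items()
        if subjects
    }


def bidirectional_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Set[Tuple[str, str]]:
    """Return the (a, b) pairs in which each member is the other's best hit."""
    best_ab = best_hit(a_vs_b)
    best_ba = best_hit(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Toy scores with made-up identifiers, for illustration only.
tomato_vs_grape: Hits = {
    "tomA": {"grapeX": 900.0, "grapeY": 300.0},
    "tomB": {"grapeX": 500.0},
}
grape_vs_tomato: Hits = {
    "grapeX": {"tomA": 880.0, "tomB": 510.0},
    "grapeY": {"tomA": 290.0},
}

bbhs = bidirectional_best_hits(tomato_vs_grape, grape_vs_tomato)
print(sorted(bbhs))
```

Running the same function on gene, mRNA and protein score tables would give three BBH sets whose overlaps could then be compared, as in panel a.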
Fig. 2 General overview of the cross comparison between S. lycopersicum and V. vinifera. Tomato and grapevine genes are represented in red and in green, respectively. BBHs are shown on an orange background; paralogs detected with the stringent e-value threshold (e−50) are shown on a green background; low similarities detected with the loose e-value threshold (e−3) are shown on a blue background; species-specific genes, including paralogs and single-copy genes (singletons), are shown on a light gray background
Fig. 3 Ortholog/paralog networks detected with a stringent e-value threshold (e−50). a Bar chart showing the number of networks classified according to their size. b Scatter plots showing the distribution of the networks based on the respective number of genes from S. lycopersicum and V. vinifera. The diameter of the circles is proportional to the number of BBHs inside each network
Summary statistics of networks detected using different e-value thresholds
| | e−3 | e−50 |
|---|---|---|
| Total nodes | 61,269 | 54,655 |
| Total edges | 3,699,964 | 1,354,314 |
| Tomato nodes | 32,333 | 27,547 |
| Grapevine nodes | 28,936 | 27,108 |
| Orthology edges | 17,823 | 17,823 |
| Paralogy edges | 3,682,141 | 1,336,491 |
| Total networks | 641 | 3601 |
| Total 2-genes networks | 385 | 2143 |
| Total 3–9 genes networks | 243 | 1356 |
| Total 10+ genes networks | 12 | 102 |
| “Big network” nodes | 59,306 | 43,236 |
| “Big network” edges | 3,695,231 | 1,328,306 |
| “Big network” tomato nodes | 31,312 | 21,456 |
| “Big network” grapevine nodes | 27,994 | 21,780 |
Statistics for the networks detected by using different e-value thresholds (e−50, for defining paralogs, and e−3, to define even looser similarities, respectively)
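The network counts above can be read as groups of genes linked by orthology or paralogy edges, binned by size. The sketch below assumes that a "network" is a connected component of the gene graph, which the record does not state explicitly. The edge list and gene names are hypothetical, and the size classes mirror the table rows.

```python
from collections import Counter, defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def connected_networks(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group genes into networks, taken here to be connected components."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen: Set[str] = set()
    networks: List[Set[str]] = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component: Set[str] = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        networks.append(component)
    return networks


def size_class(n_genes: int) -> str:
    """Bin a network by size, using the classes reported in the table."""
    if n_genes < 2:
        return "singleton"
    if n_genes == 2:
        return "2 genes"
    if n_genes <= 9:
        return "3-9 genes"
    return "10+ genes"


# Hypothetical edges: t* stand for tomato genes, g* for grapevine genes.
toy_edges = [("t1", "g1"), ("t2", "g2"), ("g2", "g3"), ("t2", "t3")]
toy_networks = connected_networks(toy_edges)
size_counts = Counter(size_class(len(net)) for net in toy_networks)
print(size_counts)
```

Under this reading, loosening the e-value threshold adds edges, which can merge small components into larger ones; that would be consistent with the lower network count in the e−3 column.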
Functional description similarity between tomato and grapevine orthologs
| Functional annotation identity percentage | Number of orthology relationships | Description examples |
|---|---|---|
| 100 (%) | 8652 | Photosystem II D2 protein |
| 80–99 (%) | 1087 | E3 ubiquitin-protein ligase RING1 |
| 60–79 (%) | 322 | Probable pectate lyase P59 |
| 40–59 (%) | 1442 | Germin-like protein subfamily 1 member 14 |
| 20–39 (%) | 4782 | 9-divinyl ether synthase |
| 0–19 (%) | 1629 | Metal transporter Nramp6 |
Summary of the functional description similarity between tomato and grapevine orthologs, based on software inspection of matching fonts between tomato and grapevine
Fig. 4 Co-expression analysis of the complete set of BBHs showing expression in at least one tissue/stage. For each of the 24 clusters identified, the profiles (in grey) and the centroid (in violet) are shown. T1, T2 and T3 represent tomato fruit stages (T1 = 2 cm fruit, T2 = breaker and T3 = mature fruit), while G1, G2 and G3 represent grapevine fruit stages (G1 = post-setting, G2 = veraison and G3 = mature berry), all in physiological conditions. Numbers are used to indicate each cluster. The number of clustered genes is indicated in red
Tomato and grapevine genome annotation versions
| Database | Tomato | Grapevine |
|---|---|---|
| PHYTOZOME | iTAG 2.4 | Genoscope v2 = CRIBI v0 |
| PLAZA | iTAG 2.4 | Genoscope v1 |
| ENSEMBL PLANTS | iTAG 2.4 | CRIBI v1 |
| GRAMENE | iTAG 2.4 | CRIBI v1 |
| PLANTGDB | GenBank Release 160.0 | Genoscope v2 = CRIBI v0 |
| GREENPHYL | iTAG 2.3 | Genoscope v2 = CRIBI v0 |
| EGGNOG | iTAG 2.4 | CRIBI v1 |
| INPARANOID | iTAG 2.4 | CRIBI v1 |
Tomato and grapevine genome annotation versions available in most used comparative databases
Fig. 5 Groups of BBHs detected by each level of analysis. The confirmation level is shown: BBH detected at gene (blue), transcript (orange) and protein (grey) sequence level. The diameter of each circle is proportional to the BBH score average. The consensus groups pull together the BBHs that are common to all three methods
Fig. 6 Ortholog/paralog network involving the EIL transcription factor. Red circles represent tomato genes; purple circles represent grapevine genes; gray lines represent paralogy relationships; black double lines represent orthology relationships. In the table, the expression values in RPKM (Reads Per Kilobase per Million), shown for each of the considered fruit developmental stages, are associated with the genes of the network. Stages 1, 2 and 3 correspond to 2 cm fruit, breaker and mature fruit in tomato, and to post-setting, veraison and mature berry in grapevine, respectively
Statistics of alignments based on sequence similarity of gene, transcript and protein comparisons
| | Query length | Subject length | Query coverage | Subject coverage | Identity/align. length | Positives/align. length | Score | e-value |
|---|---|---|---|---|---|---|---|---|
| (A) EXAMPLE 1 | | | | | | | | |
| Solyc07g008880.2 versus VIT_09s0002g07070 | | | | | | | | |
| Gene vs Gene | 12,772 | 12,589 | 7556/12772 | 7575/12589 | 6338/7556 | – | 6808 | 0 |
| mRNA vs mRNA | 2559 | 2476 | 2289/2559 | 2289/2476 | 2152/2289 | 2200/2289 | 5385 | 0 |
| Protein vs Protein | 2384 | 2348 | 2336/2384 | 2340/2348 | 2267/2343 | 2306/2343 | 4737 | 0 |
| Solyc06g043160.1 versus VIT_09s0002g07070 | | | | | | | | |
| Gene vs Gene | 159 | 12,589 | 154/159 | 154/12589 | 118/154 | – | 82 | 3e−15 |
| mRNA vs mRNA | 53 | 2476 | 53/53 | 53/2476 | 43/53 | 47/53 | 111 | 7e−26 |
| Protein vs Protein | 52 | 2348 | 52/52 | 52/2348 | 42/52 | 46/52 | 95 | 5e−24 |
| (B) EXAMPLE 2 | | | | | | | | |
| Solyc01g111530.2 versus VIT_03s0038g02340 | | | | | | | | |
| Gene vs Gene | 11,273 | 26,647 | 5531/11273 | 5540/26647 | 4318/5531 | – | 3116 | 0 |
| mRNA vs mRNA | 2044 | 1986 | 1681/2044 | 1681/1986 | 1287/1681 | 1447/1681 | 2927 | 0 |
| Protein vs Protein | 1860 | 1897 | 1860/1860 | 1896/1897 | 1500/1914 | 1669/1914 | 2689 | 0 |
| Solyc01g111530.2 versus VIT_04s0023g03830 | | | | | | | | |
| Gene vs Gene | 11,273 | 10,303 | 3225/11273 | 3228/10303 | 2548/3225 | – | 2024 | 2e−98 |
| mRNA vs mRNA | 2044 | 1932 | 1499/2044 | 1499/1932 | 1093/1499 | 1258/1499 | 2558 | 0 |
| Protein vs Protein | 1860 | 1811 | 1764/1860 | 1799/1811 | 1190/1830 | 1407/1830 | 2210 | 0 |
| (C) EXAMPLE 3 | | | | | | | | |
| Solyc01g007530.2 versus VIT_10s0092g00760 | | | | | | | | |
| Gene vs Gene | 1813 | 3870 | 1024/1813 | 1034/3870 | 910/1024 | – | 1127 | 0 |
| mRNA vs mRNA | 353 | 318 | 160/353 | 160/318 | 130/160 | 141/160 | 336 | 4e−54 |
| Protein vs Protein | SEQUENCE SIMILARITY NOT DETECTED | | | | | | | |
The first three alignments of each example lead to the prediction of an orthology relationship by the multilevel approach proposed in this work: (A) Solyc07g008880.2 versus VIT_09s0002g07070, (B) Solyc01g111530.2 versus VIT_03s0038g02340 and (C) Solyc01g007530.2 versus VIT_10s0092g00760. The second triplet of alignments in (A) and in (B) leads to the prediction of an orthology relationship by the Ensembl Plants / Gramene pipelines, involving the same tomato or grapevine gene implicated in the relationship inferred by our approach (Solyc06g043160.1 versus VIT_09s0002g07070 and Solyc01g111530.2 versus VIT_04s0023g03830). Query length and query coverage refer to the tomato gene loci; subject length and subject coverage refer to the grapevine gene loci
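The coverage and identity columns are given as raw fractions, so converting them to proportions makes the two candidate pairs of Example 1 easier to compare. The sketch below does this for the protein-level rows only. The helper name `ratio` and the dictionary layout are assumptions; the fraction strings are copied from the table.

```python
def ratio(cell: str) -> float:
    """Convert a 'numerator/denominator' table cell into a proportion."""
    numerator, denominator = cell.replace(",", "").split("/")
    return int(numerator) / int(denominator)


# Protein vs Protein rows of Example 1, copied from the table above.
example_1_protein = {
    "Solyc07g008880.2 vs VIT_09s0002g07070": {
        "query_coverage": "2336/2384",
        "subject_coverage": "2340/2348",
        "identity": "2267/2343",
    },
    "Solyc06g043160.1 vs VIT_09s0002g07070": {
        "query_coverage": "52/52",
        "subject_coverage": "52/2348",
        "identity": "42/52",
    },
}

proportions = {
    pair: {name: ratio(cell) for name, cell in cells.items()}
    for pair, cells in example_1_protein.items()
}

for pair, values in proportions.items():
    summary = ", ".join(f"{name}={value:.1%}" for name, value in values.items())
    print(f"{pair}: {summary}")
```

Read this way, the second pair aligns the whole 52-residue tomato protein but only about 2% of the 2348-residue grapevine protein, whereas the first pair covers nearly the full length of both sequences.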
Fig. 7 Example of a network showing different types of relationships. Network extracted from our database (http://biosrv.cab.unina.it/comparalogs/gene/search) showing a group of 5 GAMYB transcription factors sharing different types of relationships. Gene relationships are shown in blue; transcript relationships are shown in dark yellow; protein relationships are shown in grey