Matthieu G Conte, Sylvain Gaillard, Gaetan Droc, Christophe Perin.
Abstract
BACKGROUND: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to improve gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics, which takes into account phylogenetic information from high-throughput genome annotation, is the most straightforward way to infer orthologs. However, procedures for the automatic detection of orthologs are still scarce and suffer from several limitations.
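The abstract argues that reconstructing gene history separates orthologs from paralogs better than sequence similarity alone. As a minimal sketch of that idea (not GreenPhyl's own procedure, which this record does not detail), the species-overlap rule below labels each node of a rooted, binary gene tree as a speciation node when its two subtrees share no species and as a duplication node otherwise; genes separated by a speciation node are orthologs. The two-letter species prefix and the toy tree topology are assumptions made for illustration.

```python
from typing import List, Set, Tuple, Union

# A gene tree is either a leaf (gene id) or a pair of subtrees (binary node).
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(gene: str) -> str:
    # Assumption: the first two letters encode the species ("At" or "Os"),
    # as in identifiers such as At5g20240.1 or Os05g34940.1.
    return gene[:2]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def ortholog_pairs(tree: Tree) -> Set[Tuple[str, str]]:
    """Species-overlap rule: a node whose two subtrees share no species is a
    speciation node, and every gene pair it separates is an ortholog pair.
    Nodes whose subtrees share a species are treated as duplications."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    pairs = ortholog_pairs(left) | ortholog_pairs(right)
    left_genes, right_genes = leaves(left), leaves(right)
    left_species = {species_of(g) for g in left_genes}
    right_species = {species_of(g) for g in right_genes}
    if not left_species & right_species:  # speciation node
        for a in left_genes:
            for b in right_genes:
                pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical topology: one Arabidopsis gene and two rice genes produced by a
# rice-specific duplication, so both rice genes are orthologs of the At gene.
toy_tree: Tree = ("At5g20240.1", ("Os05g34940.1", "Os01g66030.1"))
print(sorted(ortholog_pairs(toy_tree)))
```

Because the duplication node groups only rice genes, the rule reports no Os/Os ortholog pair and returns a one-to-many relation, which pairwise best-hit methods cannot express.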
Year: 2008 PMID: 18426584 PMCID: PMC2377279 DOI: 10.1186/1471-2164-9-183
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Clustering of 46 transcription factor (TF) families in PlantTribes (PT) and the Genome Clustering database (GCD) compared with the DATF and DRTF databases.
| IPR006957 | 6 | 6 | 6 | 6 | 9 | 10 | 10 | 12 | |
| IPR010409 | 7 | 8 | 10 | 12 | 4 | 4 | 5 | 5 | |
| IPR006779 | 3 | 3b(3) | 3 | 3 | 2 | 2 | 5 | 5 | |
| IPR008540 | 8 | 6 | 9 | 10 | 6 | 5 | 4 | 5 | |
| IPR007726 | 3 | 3 | 2 | 3 | 3 | 3 | 2 | 3 | |
| IPR005333 | 23 | 23b(2) | 27b(15) | 27 | 21 | 21b(2) | 25b(14) | 26 | |
| IPR006734 | 10 | 9 | 9 | 10 | 16 | 16 | 17 | 19 | |
| IPR007592 | 21 | 18 | 21b(8) | 21 | 15 | 21b(4) | 15b(10) | 16 | |
| IPR007818 | 10 | 10 | 11 | 12 | 5 | 5 | 5 | 5 | |
| IPR013742 | 2 | 3 | 3 | 1 | 1 | 2 | 3 | 0 | |
| IPR003311 | 29 | 28 | 33 | 34 | 31 | 33 | 41 | 41 | |
| IPR003957 | 11 | 12 | 17 | 37a | 12 | 12 | 12 | 29a | |
| IPR001289 | 10 | 10 | 18 | 19 | 11 | 9 | 17 | 19a | |
| IPR003851 | 36 | 36 | 40b(16) | 40 | 30 | 29 | 34b(18) | 33 | |
| IPR006780 | 5 | 5 | 6 | 6 | 8 | 7 | 6 | 8 | |
| IPR000197 | 9 | 11 | 5 | 5 | 6 | 8 | 3 | 4 | |
| IPR000007 | 11 | 11 | 11 | 10 | 14 | 15 | 16 | 14 | |
| IPR004333 | 16 | 17 | 26b(10) | 24 | 20 | 18 | 20b(13) | 21 | |
| IPR000232 | 23 | 23 | 22 | 25 | 29 | 29 | 37 | 39 | |
| IPR001471 | 146 | 110 | 113 | 150 | 165 | 108 | 108 | 170 | |
| IPR003441 | 107 | 90 | 92 | 119 | 131 | 92 | 85 | 152 | |
| IPR004827 | 72 | 185b(12) | 82b(22) | 84 | 84 | 205b(19) | 64b(22) | 90 | |
| IPR002100 | 104 | 71 | 69 | 65 | 64 | 65 | 62 | 44 | |
| IPR003657 | 72 | 71 | 105b(30) | 81 | 98 | 99 | 96b(37) | 104 | |
| IPR005202 | 33 | 32 | 25 | 34 | 55 | 63 | 39 | 57 | |
| IPR004883 | 42 | 36 | 35 | 45 | 36 | 29 | 21 | 37 | |
| IPR001387 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | |
| IPR003035 | 14 | 61a | 9 | 10 | 13 | 315a | 13b(9) | 13b(2) | |
| IPR006456 | 16 | 16 | 13 | 17 | 15 | 15 | 8 | 15 | |
| IPR005172 | 8 | 10 | 3 | 9 | 11 | 18 | 3 | 11 | |
| IPR011525 | 22 | 25 | 24 | 19 | 26 | 29 | 26 | 22 | |
| IPR010399 | 18 | 10 | 26b(9) | 20 | 18 | 13 | 10 | 19 | |
| IPR009105 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 0 | |
| IPR000770 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 0 | |
| IPR001005 IPR009057 | 43 | 73 | 22 | 260a | 46 | 80 | 56b(23) | 244a | |
| IPR001789 IPR009057 | 10 | 73a | 8 | 14 | 8 | 80a | 4 | 9 | |
| IPR006594 IPR011046 | 2 | 76a | 5 | 227a | 6 | 74a | 3 | 207 | |
| IPR000048 IPR002110 IPR005559 | 6 | 6 | 4 | 3 | 6 | 8 | 5 | 5 | |
| IPR001965 IPR011011 | 7 | 7 | 7 | 46a | 10 | 10 | 10 | 56 | |
| IPR003958 IPR009072 | 2 | 12a | 17a | 37a | 1 | 12a | 12a | 29a | |
| IPR003958 IPR009072 | 13 | 10 | 15 | 37a | 16 | 12 | 17b(8) | 29a | |
| | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | |
| | 9 | 9 | 9b(7) | 9b(8) | 12 | 16 | 12b(8) | 12b(11) | |
| | 2 | 3 | 3b(2) | 2b(2) | 1 | 1 | 1 | 1 | |
| IPR001005 IPR009057 | 150 | 222 | 118 | 260 | 129 | 221 | 86 | 244 | |
a: cluster containing over 50% more sequences than the total number of family members present in DATF/DRTF; b: total number of sequences identified across several subgroups (number of subgroups in brackets) for the corresponding DATF/DRTF gene list; PF: clustering by Pfam domains; BC: clustering by BLASTCLUST. For all groups, differences in sequence numbers relative to DATF/DRTF are mainly due to the different versions of the data sources used.
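The footnote flags can be derived mechanically once each reference family is mapped onto a clustering. The sketch below uses a hypothetical reading of the footnotes: flag 'a' when the largest cluster holding family members is more than 50% bigger than the reference family, flag 'b' when the family is spread over several clusters. The function name, the input format (a gene id to cluster id mapping) and the toy gene ids are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from typing import Dict, Set


def compare_family(family: Set[str], cluster_of: Dict[str, int]) -> Dict[str, object]:
    """Compare one reference TF family (a DATF/DRTF gene list) with an
    automatic clustering given as a gene id -> cluster id mapping."""
    cluster_sizes = Counter(cluster_of.values())
    # Clusters that contain at least one member of the reference family.
    hit = {cluster_of[g] for g in family if g in cluster_of}
    largest = max((cluster_sizes[c] for c in hit), default=0)
    return {
        "subgroups": len(hit),                            # bracketed number for 'b'
        "sequences": sum(cluster_sizes[c] for c in hit),  # all sequences in those clusters
        "flag_a": largest > 1.5 * len(family),            # cluster >50% larger than family
        "flag_b": len(hit) > 1,                           # family split into subgroups
    }


family = {"g1", "g2", "g3", "g4"}  # hypothetical reference family
clustering = {"g1": 1, "g2": 1, "g3": 2, "g4": 2, "x1": 2}
print(compare_family(family, clustering))
```

Under this reading, a value such as 23b(2) would correspond to `sequences == 23` and `subgroups == 2` for one family.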
Figure 1 Examples of curation of clusters generated automatically by TribeMCL. (A) A gene family identified from cluster 308 at inflation I = 1.2, corresponding to the TUB/TLP transcription factor gene family. (B) A case in which two consistent gene families were identified from clusters 91 and 6073 at a higher inflation (I = 2). Cluster 137 (I = 1.2) groups members of both families and therefore cannot be considered a single gene family. Each box shows the cluster ID, with the number of At/Os sequences in the cluster in brackets.
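TribeMCL builds a similarity graph from all-against-all BLAST comparisons and partitions it with the Markov Cluster (MCL) algorithm, whose inflation parameter I controls cluster granularity. The pure-Python sketch below shows only the core MCL loop (expansion by matrix squaring, inflation by element-wise powering followed by column re-normalization). It is not TribeMCL: it omits the BLAST-derived edge weighting and the matrix pruning used by production implementations, and its function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    """Core MCL loop: expansion (squaring) alternating with inflation."""
    n = len(similarity)
    # Add self-loops, then turn similarities into transition probabilities.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                       # expansion
        inflated = normalize_columns(                 # inflation
            [[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that keep probability mass are attractors; the nodes (columns)
    # flowing into the same attractor row form one cluster.
    clusters = {tuple(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(list(c) for c in clusters if c)


# Toy similarity graph: two triangles with no edge between them.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles, inflation=2.0))
```

Higher inflation values strengthen strong flows and weaken weak ones, so they tend to produce more, smaller clusters. This matches panel B, where a cluster that merges two families at I = 1.2 separates into two families at I = 2; the exact outcome on real data depends on the edge weights.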
Figure 2 The GreenPhyl phylogenomic pipeline. Left: the major steps of the pipeline. Right: the software integrated into GreenPhyl. *, custom software; SS, splice selection; GI, gene ID indexing; SB, set bootstrap values.
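The caption lists a splice selection (SS) step without stating its rule. Assuming the aim is to keep one protein per locus, a sketch could parse the locus.isoform identifiers that appear in the tables (e.g. Os05g11414.3) and keep the longest isoform. The longest-isoform rule, the tie-break on the lowest isoform number, the function name and the toy sequences are all assumptions, not GreenPhyl's documented behavior.

```python
import re
from typing import Dict, Tuple

# Identifier = locus, then "." and an isoform number (e.g. At4g16280.3).
ISOFORM_ID = re.compile(r"^(?P<locus>.+)\.(?P<isoform>\d+)$")


def select_one_isoform(proteins: Dict[str, str]) -> Dict[str, str]:
    """Map each locus to a single protein id. Hypothetical rule: the longest
    sequence wins; ties go to the lowest isoform number."""
    best: Dict[str, Tuple[Tuple[int, int], str]] = {}
    for pid, seq in proteins.items():
        match = ISOFORM_ID.match(pid)
        locus = match.group("locus") if match else pid
        isoform = int(match.group("isoform")) if match else 0
        key = (-len(seq), isoform)  # smaller key = preferred isoform
        if locus not in best or key < best[locus][0]:
            best[locus] = (key, pid)
    return {locus: pid for locus, (_, pid) in best.items()}


# Toy sequences; only the identifiers are taken from the tables.
toy = {"At4g16280.1": "MAAA", "At4g16280.3": "MAAAAAA", "Os09g03610.2": "MK"}
print(select_one_isoform(toy))
```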
True positive ortholog test set and ortholog predictions of GreenPhyl (GP), Inparanoid (INP) and BBMH (BH).
| At5g20240.1 | PI | Os05g34940.1 | OsMADS4 | 1/2 | 14704206 | 1/2 | 1/1 | 1/2 | 0 | 1 | 0 |
| | | Os01g66030.1 | OsMADS2 | | | | | | | | |
| At3g54340.1 | AP3/DEF | Os06g49840.1 | SPW1 | 1/1 | 12506001 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
| At1g24260.1 | SEP3 | Os09g32948.1 | OsMADS8 | 3/2 | 16968881; 17205197; 10821278 | 3/2 | 1/1 | 1/1 | 0 | 5 | 5 |
| At3g02310.1 | SEP2 | Os08g41950.2 | OsMADS7 | ||||||||
| At5g15800.1 | SEP1 | ||||||||||
| At4g18960.1 | AG | Os01g10504.1 | OsMADS3 | 1/2 | 16326928 | 0/3 | 0 | 1/1 | 2 | 2 | 1 |
| | | Os05g11414.3 | OsMADS58 | | | | | | | | |
| At2g45660.1 | SOC1 | Os03g03070.1 | OsMADS50 | 1/1 | 15144377 | 0/3 | 0 | 0 | 1 | 1 | 1 |
| At1g14920.1 | GAI | Os03g49990.1 | SLR1 | 2/1 | 11340177; 11826293 | 3/1 | 1/1 | 1/1 | 0 | 1 | 1 |
| At2g01570.1 | RGA1 | ||||||||||
| At1g55580.1 | LAS | Os06g40780.1 | MOC1 | 1/1 | 12687001; 12730136 | 1/2 | 1/1 | 1/1 | 0 | 0 | 0 |
| | | Os02g10360.1 | | | | | | | | | |
| At3g54220.1 | SCR | Os12g02870.1 | OsSCR | 1/2 | 12974810 | 1/2 | 1/1 | 1/2 | 0 | 1 | 0 |
| | | Os11g03110.1 | | | | | | | | | |
| At4g37650.1 | SHR | Os03g31880.1 | OsSHR | 1/2 | 12974810 | 1/2 | 1/1 | 1/1 | 0 | 1 | 1 |
| | | Os07g39820.1 | | | | | | | | | |
| At3g11260.1 | WOX5 | Os01g63510.1 | QHB | 2/1 | 12904206; 14711878 | 2/1 | 1/1 | 1/1 | 0 | 1 | 1 |
| At5g05770.1 | WOX7 | ||||||||||
| At4g16280.3 | FCA | Os09g03610.2 | | 1/1 | 16240176 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
| At4g00650.1 | FRI | Os03g63440.1 | | 1/1 | 12667866 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
| At2g44990.1 | MAX3 | Os04g46470.1 | HTD1 | 1/1 | 17092317 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
| At2g42620.1 | MAX2 | Os06g06050.1 | OsMAX2 | 1/1 | 15659436 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
| At5g03280.1 | EIN2 | Os07g06130.1 | OsEIN2 | 1/1 | 15047876 | 1/2 | 1/1 | 1/3 | 0 | 0 | 0 |
| | | Os03g49400.1 | | | | | | | | | |
| At5g47120.1 | Bi1 | Os02g03280.1 | OsBi1 | 1/1 | 10618494 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
| At2g27550.1 | CEN | Os11g05470.1 | RCN1 | 2/3 | 8974397; 12148532 | 2/4 | 1/1 | 1/2 | 0 | 5 | 4 |
| At5g03840.1 | TFL1 | Os04g33570.1 | |||||||||
| | | Os02g32950.1 | RCN2 | | | | | | | | |
| | | Os12g05590.1 | RCN3 | | | | | | | | |
| At1g22770.1 | GI | Os01g08700.1 | OsGI | 1/1 | 12700762 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
| At5g61380.1 | TOC1 | Os02g40510.1 | PRR1 | 1/1 | 14634161 | 1/1 | 1/1 | 1/1 | 0 | 0 | 0 |
TAIR id, TIGR id = accession numbers from TAIR and TIGR; orth = ortholog relations identified in the literature (number of At/Os genes); GP = GreenPhyl; BH = BBMH; INP = Inparanoid; true positives = literature ortholog relations recovered by each method; missing = orthologs missed by GP, BH or INP; PMID = PubMed ID.
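BBMH is read here as best BLAST mutual hits, i.e. reciprocal best hits between two proteomes. A minimal sketch, assuming BLAST results have already been parsed into per-query score dictionaries (the input layout and the scores below are invented; the gene ids come from the table):

```python
from typing import Dict, Optional, Set, Tuple

# hits[query][subject] = alignment score (e.g. a bit score); higher is better.
Hits = Dict[str, Dict[str, float]]


def best_hit(hits: Hits, query: str) -> Optional[str]:
    """Highest-scoring subject for a query (first one wins on ties)."""
    targets = hits.get(query, {})
    if not targets:
        return None
    return max(targets, key=targets.get)


def best_mutual_hits(a_to_b: Hits, b_to_a: Hits) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit
    in genome A."""
    pairs = set()
    for a in a_to_b:
        b = best_hit(a_to_b, a)
        if b is not None and best_hit(b_to_a, b) == a:
            pairs.add((a, b))
    return pairs


# Invented scores for illustration only.
at_vs_os = {"At5g20240.1": {"Os05g34940.1": 300.0, "Os01g66030.1": 280.0},
            "At3g54340.1": {"Os06g49840.1": 250.0}}
os_vs_at = {"Os05g34940.1": {"At5g20240.1": 295.0},
            "Os01g66030.1": {"At5g20240.1": 275.0},
            "Os06g49840.1": {"At3g54340.1": 248.0}}
print(best_mutual_hits(at_vs_os, os_vs_at))
```

By construction this reports at most one partner per gene, so a one-to-many relation such as a 1/2 entry in the orth column can be recovered only partially by this kind of method.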
True negative ortholog test set and ortholog predictions of GreenPhyl (GP), Inparanoid (INP) and BBMH (BH).
| - | - | Os03g11614.1 | LHS1 | 0/3 | 16099195; 10852934 | 0/3 | 0 | 0 |
| - | - | Os06g06750.1 | MADS5 | |||||
| - | - | Os03g54170.1 | MADS34 | |||||
| At5g10140.1 | FLC | - | - | 6/0 | 12667866; 12724541; 15695584 | 6/0 | 0 | 0 |
| At5g65060.2 | MAF3 | - | - | |||||
| At5g65070.1 | MAF4 | - | - | |||||
| At5g65050.1 | MAF2 | - | - | |||||
| At1g77080.3 | MAF1 | - | - | |||||
| At5g65080.1 | MAF5 | - | - | |||||
| At3g18990.1 | VRN1 | - | - | 2/0 | 12667866; 16549797 | 2/0 | 1/1 | 1/2 |
| At1g49480.1 | RTV1 | - | - | |||||
| At4g16845.1 | VRN2 | - | - | 1/0 | 12667866 | 1/0 | 0 | 0 |
UP = ultraparalogs identified in the literature (number of At/Os genes); PMID = PubMed ID; GP = GreenPhyl; INP = Inparanoid; BH = BBMH.
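Scoring a method against these two test sets amounts to set comparisons: literature ortholog pairs give found and missing counts, and any predicted pair that involves an ultraparalog (a gene with no ortholog in the other species) is a false positive. A minimal sketch follows; the reference pairs and ultraparalogs reuse gene ids from the tables, while the predicted pairs and the placeholder id `OsX` are invented. Counting pairs, rather than the per-species gene counts (e.g. 1/2) that the tables report, is an assumption of this sketch.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def evaluate(predicted: Iterable[Pair], reference: Set[Pair],
             ultraparalogs: Set[str]) -> Dict[str, int]:
    """Score predicted ortholog pairs against literature test sets."""
    # Normalize orientation so (Os, At) and (At, Os) compare equal.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    return {
        "found": len(pred & ref),
        "missing": len(ref - pred),
        "false_positives": sum(1 for a, b in pred
                               if a in ultraparalogs or b in ultraparalogs),
    }


reference = {("At5g20240.1", "Os05g34940.1"), ("At5g20240.1", "Os01g66030.1")}
ultraparalogs = {"At3g18990.1", "At1g49480.1"}  # VRN1 and RTV1
predicted = [("Os05g34940.1", "At5g20240.1"), ("At3g18990.1", "OsX")]
print(evaluate(predicted, reference, ultraparalogs))
```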
Figure 3 Venn diagram of ortholog predictions by GreenPhyl, BBMH and Inparanoid at a threshold of 50%.
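The Venn diagram compares three sets of predicted ortholog pairs. Assuming the predictions have already been filtered at the 50% threshold and written as (At, Os) pairs in a fixed orientation, the seven regions follow from set differences and intersections. The gene ids below are placeholders.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (Arabidopsis gene, rice gene)


def venn_regions(gp: Set[Pair], bh: Set[Pair], inp: Set[Pair]) -> Dict[str, int]:
    """Size of each of the seven regions of a three-set Venn diagram."""
    return {
        "GP only": len(gp - bh - inp),
        "BH only": len(bh - gp - inp),
        "INP only": len(inp - gp - bh),
        "GP & BH only": len((gp & bh) - inp),
        "GP & INP only": len((gp & inp) - bh),
        "BH & INP only": len((bh & inp) - gp),
        "all three": len(gp & bh & inp),
    }


# Toy predictions with placeholder gene ids.
gp = {("a", "x"), ("b", "y"), ("c", "z")}
bh = {("a", "x"), ("b", "y")}
inp = {("a", "x"), ("d", "w")}
print(venn_regions(gp, bh, inp))
```

The seven counts always sum to the size of the union of the three sets, which is a quick sanity check on the pair normalization.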
Figure 4 Extended ortholog relationships.