Gon Carmi, Alessandro Gorohovski, Milana Frenkel-Morgenstern.
Abstract
Here, we introduce a novel 'evolution of protein domains' (EvoProDom) model for describing the evolution of proteins based on the 'mix and merge' of protein domains. We assembled and integrated genomic and proteomic data comprising protein domain content and orthologous proteins from 109 organisms. In EvoProDom, we characterized evolutionary events, particularly translocations, as reciprocal exchanges of protein domains between orthologous proteins in different organisms. We showed that protein domains that translocate with high frequency are generated by transcripts enriched in trans-splicing events, that is, the generation of novel transcripts from the fusion of two distinct genes. In EvoProDom, we describe a general method to collate orthologous protein annotation from KEGG, and protein domain content from protein sequences, using tools such as KofamKOALA and Pfam. To summarize, EvoProDom presents a novel model for protein evolution based on the 'mix and merge' of protein domains rather than DNA-based evolution models. This confers the advantage of considering chromosomal alterations as drivers of protein evolutionary events.
Keywords: protein domains; protein evolution; translocations
Year: 2021 PMID: 34196123 PMCID: PMC8409312 DOI: 10.1002/2211-5463.13245
Source DB: PubMed Journal: FEBS Open Bio ISSN: 2211-5463 Impact factor: 2.693
The EvoProDom model was applied to an assembly of organisms from diverse taxa belonging to the superdomains Eukaryota, Viruses, and Bacteria. In total, 109 organisms were included in the ensemble and grouped as follows: (a) 15 fish; (b) four subterranean (S), eight fossorial (F), and 21 aboveground (A) animals (SFA) [15, 16]; (c) 65 organisms with known PPIs (BioGrid version 3.5.173, [17, 18]); (d) 17 organisms with HiC datasets (GEO_hic); (e) four cats; and (f) 15 pathogenic organisms [19]. Organisms with HiC datasets were obtained by searching for ‘HiC’ in the NCBI GEO database. For each organism, the taxonomy ID, organism ID, organism name and common name are provided, along with the assembly and group classification. Protein and isoform statistics are also included: proteins are the longest isoforms, isoforms are alternative splicing variants, and Total comprises both. Only proteins and isoforms with KO annotation are counted. The organism ID is a 3–4 letter code, where lowercase codes correspond to KEGG organisms and uppercase codes correspond to organisms not included in the KEGG database.
| Organism ID | Organism name | Super kingdom | Ecology | Common name | Source | Assembly | Total | Proteins | Isoforms |
|---|---|---|---|---|---|---|---|---|---|
| aga | | Eukaryota | na | African malaria mosquito | biogrid_3.5.173 | GCF_000005575.2_AgamP3 | 6802 | 5928 | 874 |
| aju | | Eukaryota | na | Cheetah | Cats | GCF_001443585.1_aciJub1 | 19 242 | 13 018 | 6224 |
| ame | | Eukaryota | na | Honey bee | biogrid_3.5.173 | GCF_000002195.4_Amel_4.5 | 12 559 | 6016 | 6543 |
| ani | | Eukaryota | na | Aspergillus nidulans | biogrid_3.5.173 | GCF_000149205.2_ASM14920v2 | 3925 | 3925 | 0 |
| ASM | | Eukaryota | na | Mexican tetra | Fish | GCF_000372685.2_Astyanax_mexicanus‐2.0 | 29 294 | 16 593 | 12 701 |
| ath | | Eukaryota | na | Thale cress | GEO_hic, biogrid_3.5.173 | GCF_000001735.4_TAIR10.1 | 21 347 | 11 664 | 9683 |
| bsp | | Bacteria | na | na | GEO_hic | GCF_000497485.1_ASM49748v1 | 2399 | 2399 | 0 |
| bsu | | Bacteria | na | na | biogrid_3.5.173 | GCF_002009135.1_ASM200913v1 | 2425 | 2425 | 0 |
| bta | | Eukaryota | A | Cattle | biogrid_3.5.173, SFA | GCF_002263795.1_ARS‐UCD1.2 | 46 970 | 15 444 | 31 526 |
| CAA | | Eukaryota | na | Goldfish | Fish | GCF_003368295.1 ASM336829v1 | 66 282 | 34 815 | 31 467 |
| cal | | Eukaryota | na | na | biogrid_3.5.173, Jones | GCF_000182965.3_ASM18296v3 | 3419 | 3419 | 0 |
| CAP | | Eukaryota | A | Domestic guinea pig | biogrid_3.5.173, SFA | GCF_000151735.1 Cavpor3.0 | 27 511 | 14 502 | 13 009 |
| ccar | | Eukaryota | na | Common carp | Fish | GCF_000951615.1_common_carp_genome | 32 539 | 24 182 | 8357 |
| ccr | | Bacteria | na | na | GEO_hic | GCF_000006905.1_ASM690v1 | 1994 | 1994 | 0 |
| cel | | Eukaryota | na | Nematode | GEO_hic, biogrid_3.5.173 | GCF_000002985.6_WBcel235 | 7918 | 5462 | 2456 |
| cfa | | Eukaryota | na | Dog | biogrid_3.5.173 | GCF_000002285.3_CanFam3.1 | 41 761 | 14 307 | 27 454 |
| cge | | Eukaryota | A | Chinese hamster | biogrid_3.5.173, SFA | GCF_000419365.1_C_griseus_v1.0 | 23 914 | 14 931 | 8983 |
| CHA | | Eukaryota | S | Cape golden mole | SFA | GCF_000296735.1_ChrAsi1.0 | 19 180 | 14 764 | 4416 |
| CHL | | Eukaryota | A | Long‐tailed chinchilla | SFA | GCF_000276665.1_ChiLan1.0 | 32 225 | 14 466 | 17 759 |
| COC | | Eukaryota | F | Star‐nosed mole | SFA | GCF_000260355.1_ConCri1.0 | 21 431 | 12 911 | 8520 |
| COG | | Eukaryota | na | Channel bull blenny | Fish | GCF_900634415.1 fCotGob3.1 | 27 249 | 15 024 | 12 225 |
| cre | | Eukaryota | na | Green algae | biogrid_3.5.173 | GCF_000002595.1_v3.0 | 3874 | 3835 | 39 |
| csab | | Eukaryota | na | Green monkey | biogrid_3.5.173 | GCF_000409795.2_Chlorocebus_sabeus_1.1 | 44 091 | 14 550 | 29 541 |
| DAN | | Eukaryota | F | Nine‐banded armadillo | SFA | GCF_000208655.1_Dasnov3.0 | 26 476 | 15 213 | 11 263 |
| ddi | | Eukaryota | na | na | biogrid_3.5.173 | GCF_000004695.1_dicty_2.7 | 4517 | 4508 | 9 |
| DIO | | Eukaryota | F | Ord's kangaroo rat | SFA | GCF_000151885.1_Dord_2.0 | 21 281 | 14 129 | 7152 |
| dme | | Eukaryota | A | Fruit fly | SFA, GEO_hic, biogrid_3.5.173 | GCF_000001215.4_Release_6_plus_ISO1_MT | 15 749 | 6630 | 9119 |
| dre | | Eukaryota | na | Zebrafish | Fish, biogrid_3.5.173, GEO_hic | GCF_000002035.6_GRCz11 | 37 274 | 17 375 | 19 899 |
| ecb | | Eukaryota | na | Horse | biogrid_3.5.173 | GCF_002863925.1_EquCab3.0 | 44 295 | 15 529 | 28 766 |
| eco | | Bacteria | na | na | biogrid_3.5.173 | GCF_001566335.1_ASM156633v1 | 3194 | 3194 | 0 |
| ECTE | | Eukaryota | A | Small Madagascar hedgehog | SFA | GCF_000313985.1 EchTel2.0 | 16 955 | 13 827 | 3128 |
| ELE | | Eukaryota | A | Cape elephant shrew | SFA | GCF_000299155.1 EleEdw1.0 | 18 981 | 15 255 | 3726 |
| ERE | | Eukaryota | A | Western European hedgehog | SFA | GCF_000296755.1_EriEur2.0 | 21 873 | 14 153 | 7720 |
| fca | | Eukaryota | A | Domestic cat | SFA, Cats | GCF_000181335.3_Felis_catus_9.0 | 39 855 | 14 572 | 25 283 |
| FUD | | Eukaryota | S | Damara mole‐rat | SFA | GCF_000743615.1_DMR_v1.0 | 31 386 | 14 138 | 17 248 |
| gga | | Eukaryota | A | Chicken | biogrid_3.5.173, SFA, GEO_hic | GCF_000002315.5_GRCg6a | 35 502 | 11 947 | 23 555 |
| gmx | | Eukaryota | na | Soybean | biogrid_3.5.173 | GCF_000004515.4_Glycine_max_v2.0 | 32 653 | 21 054 | 11 599 |
| HCV | Hepatitis C virus | Viruses | na | HCV | biogrid_3.5.173, Jones | GCF_000861845.1_ViralProj15432 | 1 | 1 | 0 |
| hgl | | Eukaryota | S | Naked mole‐rat | SFA | GCF_000247695.1_HetGla_female_1.0 | 31 478 | 14 565 | 16 913 |
| HHV1 | Human Herpesvirus 1 | Viruses | na | Herpes simplex virus type 1 | biogrid_3.5.173, Jones | GCF_000859985.2_ViralProj15217 | 27 | 27 | 0 |
| HHV2 | Human Herpesvirus 2 | Viruses | na | HHV2 | biogrid_3.5.173 | GCF_000858385.2_ViralProj15218 | 27 | 27 | 0 |
| HHV3 | Human Herpesvirus 3 | Viruses | na | Varicella‐zoster virus | biogrid_3.5.173, Jones | GCF_000858285.1_ViralProj15198 | 6 | 6 | 0 |
| HHV4 | Human gammaherpesvirus 4 | Viruses | na | EBV | GEO_hic, biogrid_3.5.173 | GCF_002402265.1_Decoy | 21 | 19 | 2 |
| HHV5 | Human Herpesvirus 5 | Viruses | na | Human cytomegalovirus | biogrid_3.5.173, Jones | GCF_000845245.1_ViralProj14559 | 16 | 16 | 0 |
| HHV6A | Human Herpesvirus 6A | Viruses | na | HHV6A | biogrid_3.5.173 | GCF_000845685.1_ViralProj14462 | 5 | 5 | 0 |
| HHV6B | Human Herpesvirus 6B | Viruses | na | HHV6B | biogrid_3.5.173 | GCF_000846365.1_ViralProj14422 | 5 | 5 | 0 |
| HHV7 | Human Herpesvirus 7 | Viruses | na | HHV7 | biogrid_3.5.173, Jones | GCF_000848125.1_ViralProj14625 | 4 | 4 | 0 |
| HHV8 | Human gammaherpesvirus 8 | Viruses | na | KSHV | GEO_hic, biogrid_3.5.173, Jones | GCF_000838265.1_ViralProj14158 | 8 | 8 | 0 |
| HIV1 | Human Immunodeficiency Virus 1 | Viruses | na | HIV1 | biogrid_3.5.173, Jones | GCF_000864765.1_ViralProj15476 | 5 | 5 | 0 |
| HIV2 | Human Immunodeficiency Virus 2 | Viruses | na | HIV2 | biogrid_3.5.173, Jones | GCF_000856385.1_ViralProj14991 | 5 | 5 | 0 |
| HPV10 | Human papillomavirus type 10 | Viruses | na | HPV10 | biogrid_3.5.173, Jones | GCF_000864905.1_ViralProj15504 | 7 | 6 | 1 |
| HPV16 | Human papillomavirus 16 | Viruses | na | HPV16 | biogrid_3.5.173, GEO_hic, Jones | GCF_000863945.3_ViralProj15505 | 7 | 7 | 0 |
| HPV6b | Human papillomavirus type 6b | Viruses | na | HPV6b | biogrid_3.5.173, Jones | GCF_000861945.1_ViralProj15454 | 6 | 6 | 0 |
| hsa | | Eukaryota | A | Human | biogrid_3.5.173, SFA, GEO_hic | GCF_000001405.37_GRCh38.p11 | 76 306 | 14 484 | 61 822 |
| ICT | | Eukaryota | F | Thirteen‐lined ground squirrel | SFA | GCF_000236235.1 SpeTri2.0 | 28 828 | 14 776 | 14 052 |
| lav | | Eukaryota | A | African savanna elephant | SFA | GCF_000001905.1_Loxafr3.0 | 30 929 | 15 683 | 15 246 |
| lcf | | Eukaryota | na | Barramundi perch | Fish | GCF_001640805.1_ASM164080v1 | 31 308 | 17 416 | 13 892 |
| lcm | | Eukaryota | na | Coelacanth | Fish | GCF_000225785.1_LatCha1 | 22 318 | 13 088 | 9230 |
| LEO | | Eukaryota | na | Spotted gar | Fish | GCF_000242695.1 LepOcu1 | 28 773 | 12 422 | 16 351 |
| MAM | | Eukaryota | na | Rhesus monkey | biogrid_3.5.173, GEO_hic | GCF_003339765.3 Mmul_10 | 49 563 | 15 063 | 34 500 |
| MARM | | Eukaryota | F | European marmot | SFA | GCF_001458135.1 marMar2.1 | 23 284 | 15 082 | 8202 |
| mge | | Bacteria | na | na | Jones | GCF_000027325.1_ASM2732v1 | 265 | 265 | 0 |
| mgp | | Eukaryota | na | Turkey | biogrid_3.5.173 | GCF_000146605.2_Turkey_5.0 | 20 631 | 11 045 | 9586 |
| MIO | | Eukaryota | F | Prairie vole | SFA | GCF_000317375.1_MicOch1.0 | 23 045 | 14 950 | 8095 |
| mmu | | Eukaryota | A | House mouse | biogrid_3.5.173, SFA, GEO_hic | GCF_000001635.26_GRCm38.p6 | 54 095 | 15 939 | 38 156 |
| mtv | | Bacteria | na | na | biogrid_3.5.173, Jones | GCF_000195955.2_ASM19595v2 | 1874 | 1874 | 0 |
| ncc | | Eukaryota | na | Black rockcod | Fish | GCF_000735185.1_NC01 | 17 089 | 12 300 | 4789 |
| ncr | | Eukaryota | na | na | biogrid_3.5.173 | GCF_000182925.2_NC12 | 4303 | 3798 | 505 |
| NEL | | Eukaryota | A | Desert woodrat | SFA | GCF_001675575.1 ASM167557v1 | 11 060 | 11 060 | 0 |
| nfu | | Eukaryota | na | Turquoise killifish | Fish | GCF_001465895.1_Nfu_20140520 | 25 760 | 15 051 | 10 709 |
| ngi | | Eukaryota | S | Upper Galilee mountains blind mole rat | SFA | GCF_000622305.1_S.galili_v1.0 | 28 587 | 15 163 | 13 424 |
| nle | | Eukaryota | na | Northern white‐cheeked gibbon | GEO_hic | GCF_000146795.2_Nleu_3.0 | 27 130 | 14 001 | 13 129 |
| nto | | Eukaryota | na | Tobacco | biogrid_3.5.173 | GCF_000390325.2_Ntom_v01 | 21 031 | 12 501 | 8530 |
| oaa | | Eukaryota | A | Platypus | SFA | GCF_000002275.2_Ornithorhynchus_anatinus_5.0.1 | 13 803 | 10 377 | 3426 |
| oas | | Eukaryota | na | Sheep | biogrid_3.5.173 | GCF_000298735.2_Oar_v4.0 | 31 319 | 14 663 | 16 656 |
| OCD | | Eukaryota | F | Degu | SFA | GCF_000260255.1_OctDeg1.0 | 20 663 | 15 343 | 5320 |
| ocu | | Eukaryota | na | Rabbit | biogrid_3.5.173, GEO_hic | GCF_000003625.3_OryCun2.0 | 27 567 | 14 450 | 13 117 |
| ola | | Eukaryota | na | Japanese medaka | Fish | GCF_002234675.1_ASM223467v1 | 31 537 | 15 135 | 16 402 |
| ORA | | Eukaryota | | Aardvark | SFA | GCF_000298275.1_OryAfe1.0 | 19 243 | 14 511 | 4732 |
| ORM | | Eukaryota | na | Indian medaka | Fish | GCF_002922805.1 Om_v0.7.RACA | 29 506 | 15 615 | 13 891 |
| osa | | Eukaryota | na | Rice | biogrid_3.5.173 | GCF_001433935.1_IRGSP‐1.0 | 18 258 | 12 404 | 5854 |
| PAP | | Eukaryota | na | Leopard | Cats | GCF_001857705.1_PanPar1.0 | 42 102 | 14 693 | 27 409 |
| PEF | | Eukaryota | na | Yellow perch | Fish | GCF_004354835.1 PFLA_1.0 | 30 056 | 16 335 | 13 721 |
| PEM | | Eukaryota | A | Prairie deer mouse | SFA | GCF_000500345.1_Pman_1.0 | 33 249 | 15 592 | 17 657 |
| pfa | | Eukaryota | na | Malaria parasite P. falciparum | biogrid_3.5.173, Jones | GCF_000002765.4_ASM276v2 | 2001 | 1973 | 28 |
| phu | | Eukaryota | na | Human body louse | biogrid_3.5.173 | GCF_000006295.1_JCVI_LOUSE_1.0 | 5292 | 5290 | 2 |
| pret | | Eukaryota | na | Guppy | Fish | GCF_000633615.1_Guppy_female_1.0_MT | 30 412 | 15 280 | 15 132 |
| ptg | | Eukaryota | na | Tiger | Cats | GCF_000464555.1_PanTig1.0 | 21 205 | 13 229 | 7976 |
| ptr | | Eukaryota | A | Chimpanzee | biogrid_3.5.173, SFA | GCF_002880755.1_Clint_PTRv2 | 57 743 | 14 939 | 42 804 |
| rcu | | Eukaryota | na | Castor bean | biogrid_3.5.173 | GCF_000151685.1_JCVI_RCG_1.1 | 14 121 | 10 018 | 4103 |
| rno | | Eukaryota | A | Norway rat | biogrid_3.5.173, SFA | GCF_000001895.5_Rnor_6.0 | 40 251 | 16 426 | 23 825 |
| sasa | | Eukaryota | na | Atlantic salmon | Fish | GCF_000233375.1_ICSASG_v2 | 63 095 | 28 784 | 34 311 |
| sce | | Eukaryota | na | Baker's yeast | biogrid_3.5.173, GEO_hic | GCF_000146045.2_R64 | 3588 | 3588 | 0 |
| SIV | Simian Immunodeficiency Virus | Viruses | na | SIV | biogrid_3.5.173 | GCF_000863925.1_ViralProj15501 | 4 | 4 | 0 |
| sly | | Eukaryota | na | Tomato | biogrid_3.5.173 | GCF_000188115.3_SL2.50 | 17 131 | 11 914 | 5217 |
| smo | | Eukaryota | na | na | biogrid_3.5.173 | GCF_000143415.4_v1.0 | 19 207 | 14 135 | 5072 |
| SOA | | Eukaryota | A | European shrew | SFA | GCF_000181275.2 SorAra2.0 | 17 318 | 14 075 | 3243 |
| sot | | Eukaryota | na | Potato | biogrid_3.5.173 | GCF_000226075.1_SolTub_3.0 | 17 431 | 12 698 | 4733 |
| spo | | Eukaryota | na | Fission yeast | GEO_hic, biogrid_3.5.173 | GCF_000002945.1_ASM294v2 | 3053 | 3053 | 0 |
| spu | | Eukaryota | na | Purple sea urchin | biogrid_3.5.173 | GCF_000002235.4_Spur_4.2 | 12 330 | 8773 | 3557 |
| ssc | | Eukaryota | A | Pig | biogrid_3.5.173, SFA | GCF_000003025.6_Sscrofa11.1 | 47 355 | 15 297 | 32 058 |
| SV40 | Simian Virus 40 | Viruses | na | Macaca mulatta polyomavirus 1 | biogrid_3.5.173 | GCF_000837645.1_ViralProj14024 | 1 | 1 | 0 |
| TMV | Tobacco Mosaic Virus | Viruses | na | TMV | biogrid_3.5.173 | GCF_000854365.1_ViralProj15071 | 0 | 0 | 0 |
| URP | | Eukaryota | F | Arctic ground squirrel | SFA | GCF_003426925.1 ASM342692v1 | 27 132 | 14 370 | 12 762 |
| USM | | Eukaryota | na | na | biogrid_3.5.173 | GCF_000328475.2 Umaydis521_2.0 | 3265 | 3257 | 8 |
| VAV | | Viruses | na | na | biogrid_3.5.173 | GCF_000860085.1_ViralProj15241 | 24 | 24 | 0 |
| vvi | | Eukaryota | na | Wine grape | biogrid_3.5.173 | GCF_000003745.3_12X | 20 127 | 12 120 | 8007 |
| xla | | Eukaryota | na | African clawed frog | biogrid_3.5.173 | GCF_001663975.1_Xenopus_laevis_v2 | 40 278 | 21 671 | 18 607 |
| zma | | Eukaryota | na | Maize | biogrid_3.5.173 | GCF_000005005.2_B73_RefGen_v4 | 24 391 | 14 736 | 9655 |
Jones et al. 2008 [19].
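The table caption states that Total comprises both proteins and isoforms, so each row should satisfy Total = Proteins + Isoforms. The following is a minimal Python sketch of that check on three rows copied from the table; the helper names `parse_count` and `row_is_consistent` are illustrative and do not come from the paper.

```python
def parse_count(cell: str) -> int:
    """Parse a table count such as '19 242' (whitespace as thousands separator)."""
    return int("".join(cell.split()))


def row_is_consistent(total: str, proteins: str, isoforms: str) -> bool:
    """Check the caption's rule: Total = Proteins + Isoforms."""
    return parse_count(total) == parse_count(proteins) + parse_count(isoforms)


# (Total, Proteins, Isoforms) cells copied from the rows for aga, aju and hsa.
rows = {
    "aga": ("6802", "5928", "874"),
    "aju": ("19 242", "13 018", "6224"),
    "hsa": ("76 306", "14 484", "61 822"),
}

results = {org: row_is_consistent(*cells) for org, cells in rows.items()}
```

The same check could be run over every row to catch transcription errors in the counts.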
Fig. 1. The MySQL schema for EvoProDomDB. Six relational tables were included. Four of these hold descriptive data: taxonomy (taxonomy; taxonomy ranks, for example, genus and species), KO (ko_annotation; KO descriptions), super‐families (clan_domain; super‐family descriptions), and Pfam domains (pfam_domain; domain descriptions). The two main relational tables contain protein, genomic and proteomic data (org_protein_annotation) and protein domain content (pfam data; see the main text for details).
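To make the six-table layout concrete, here is a rough sketch in Python. It uses an in-memory SQLite database in place of MySQL so that it runs without a server. The table names `taxonomy`, `ko_annotation`, `clan_domain`, `pfam_domain` and `org_protein_annotation` follow the caption; the name `pfam_data`, every column name, and the sample rows are assumptions made for illustration and are not taken from EvoProDomDB.

```python
import sqlite3

# Hypothetical columns; only the table names follow the figure caption.
SCHEMA = """
CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE ko_annotation (ko_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE clan_domain (clan_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE pfam_domain (
    pfam_id TEXT PRIMARY KEY,
    clan_id TEXT REFERENCES clan_domain(clan_id),
    description TEXT
);
CREATE TABLE org_protein_annotation (
    protein_id TEXT PRIMARY KEY,
    org_id TEXT,
    tax_id INTEGER REFERENCES taxonomy(tax_id),
    ko_id TEXT REFERENCES ko_annotation(ko_id)
);
CREATE TABLE pfam_data (
    protein_id TEXT REFERENCES org_protein_annotation(protein_id),
    pfam_id TEXT REFERENCES pfam_domain(pfam_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# Placeholder rows, then a join that lists the domain content of each protein.
conn.execute("INSERT INTO pfam_domain VALUES ('dom1', NULL, 'placeholder domain')")
conn.execute("INSERT INTO org_protein_annotation VALUES ('prot1', 'org1', NULL, NULL)")
conn.execute("INSERT INTO pfam_data VALUES ('prot1', 'dom1')")
rows = conn.execute(
    "SELECT a.org_id, d.pfam_id "
    "FROM org_protein_annotation AS a "
    "JOIN pfam_data AS d ON d.protein_id = a.protein_id"
).fetchall()
```

Keeping descriptions in the four lookup tables and linking them by ID from the two main tables is one common way to avoid repeating descriptive text per protein; whether the authors' schema is normalized in exactly this way is not stated in the caption.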
Fig. 2. Study workflow. A collection of 109 organisms was used to implement and test the EvoProDom model. The collection included six categories: (a) 15 fish; (b) four subterranean, eight fossorial and 21 aboveground animals [15, 16]; (c) 65 organisms with known PPIs (BioGrid version 3.5.173, [17, 18]); (d) 17 organisms with HiC datasets; (e) four cats; and (f) 15 pathogenic organisms [19]. Protein domains were predicted using the Pfam database (release 32.0), along with the search tool [7, 8]. Orthologous proteins were defined as belonging to a KEGG [12, 13] ortholog (KO) group. Assignment to a KO group was obtained using KofamKOALA [6].
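The workflow combines two annotations per protein: a KO group (orthology) and a list of Pfam domains (domain content). The sketch below shows one possible way to collate them in Python. The record layout, the function name `domain_content_by_ko`, and all IDs and domain names are placeholders, and pooling paralogues of one organism into a single domain set is a simplifying assumption, not a step described in the caption.

```python
from collections import defaultdict

# Hypothetical records: (organism ID, protein ID, KO group, predicted Pfam domains).
# Every ID and domain name here is a placeholder, not data from the study.
records = [
    ("org1", "p1", "KO_A", ["dom1", "dom2"]),
    ("org1", "p2", "KO_A", ["dom1"]),  # paralogue in the same KO group
    ("org2", "p3", "KO_A", ["dom1"]),
    ("org2", "p4", "KO_B", ["dom2", "dom3"]),
]


def domain_content_by_ko(records):
    """Map KO group -> organism -> set of domains seen in its member proteins."""
    content = defaultdict(lambda: defaultdict(set))
    for org, _protein, ko, domains in records:
        content[ko][org].update(domains)
    # Return plain dicts so that a missing key raises instead of being created.
    return {ko: dict(orgs) for ko, orgs in content.items()}


content = domain_content_by_ko(records)
```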
Translocation events per superfamily (counts). Translocations are characterized by mobile domains in groups of organisms, and each organism group is classified by superdomain taxonomy*: a group is assigned a representative superdomain if all of its organisms share the same superdomain, and is otherwise assigned as ‘Mixed’. Translocations are then classified by the superdomains of the organism groups involved (Translocation class), for example, Eukaryota‐Eukaryota, which represents the majority of translocations (over 99%). The most frequent clan for Eukaryota‐Eukaryota is Ig. Super‐family annotation is provided (Super family description). Related to Tables S1 and S2. *Superdomain taxa are Eukaryota, Viruses, and Bacteria.
| Translocation class | Super family Id | Super family name | Counts | Super family description |
|---|---|---|---|---|
| Eukaryota‐Eukaryota | 0011.26 | Ig | 1144 | Immunoglobulin superfamily |
| Eukaryota‐Eukaryota | 0010.21 | SH3 | 630 | Src homology‐3 domain |
| Eukaryota‐Eukaryota | 0465.3 | Ank | 529 | Ankyrin repeat superfamily |
| Eukaryota‐Eukaryota | 0001.27 | EGF | 414 | EGF superfamily |
| Eukaryota‐Eukaryota | 0361.4 | C2H2‐zf | 390 | Classical C2H2 and C2HC zinc fingers |
| Eukaryota‐Eukaryota | 0022.32 | LRR | 282 | Leucine Rich Repeat |
| Eukaryota‐Eukaryota | 0020.25 | TPR | 246 | Tetratrico peptide repeat superfamily |
| Eukaryota‐Eukaryota | 0229.11 | RING | 242 | Ring‐finger/U‐box superfamily |
| Eukaryota‐Eukaryota | 0186.14 | Beta_propeller | 222 | Beta propeller clan |
| Eukaryota‐Eukaryota | 0221.11 | RRM | 210 | RRM‐like clan |
| Eukaryota‐Eukaryota | 9999.0 | Unknown | 208 | null |
| Eukaryota‐Eukaryota | 0159.16 | E‐set | 187 | Ig‐like fold superfamily (E‐set) |
| Eukaryota‐Eukaryota | 0466.3 | PDZ‐like | 165 | PDZ domain‐like peptide‐binding superfamily |
| Eukaryota‐Eukaryota | 0016.22 | PKinase | 164 | Protein kinase superfamily |
| Eukaryota‐Eukaryota | 0266.9 | PH | 141 | PH domain‐like superfamily |
| Eukaryota‐Eukaryota | 0023.34 | P‐loop_NTPase | 121 | P‐loop containing nucleoside triphosphate hydrolase superfamily |
| Eukaryota‐Eukaryota | 0220.12 | EF_hand | 115 | EF‐hand like superfamily |
| Eukaryota‐Eukaryota | 0511.3 | Retroviral_zf | 95 | Retrovirus zinc finger‐like domains |
| Eukaryota‐Eukaryota | 0271.7 | F‐box | 79 | F‐box‐like domain |
| Eukaryota‐Eukaryota | 0003.21 | SAM | 74 | Sterile Alpha Motif (SAM) domain |
| Eukaryota‐Eukaryota | 0390.4 | zf‐FYVE‐PHD | 47 | FYVE/PHD zinc finger superfamily |
| Eukaryota‐Eukaryota | 0357.4 | SMAD‐FHA | 37 | SMAD/FHA domain superfamily |
| Eukaryota‐Eukaryota | 0063.25 | NADP_Rossmann | 37 | FAD/NAD(P)‐binding Rossmann fold Superfamily |
| Eukaryota‐Eukaryota | 0123.18 | HTH | 34 | Helix‐turn‐helix clan |
| Eukaryota‐Eukaryota | 0680.1 | WW | 34 | WW domain |
| Eukaryota‐Eukaryota | 0167.15 | Zn_Beta_Ribbon | 33 | Zinc beta‐ribbon |
| Eukaryota‐Eukaryota | 0006.20 | C1 | 25 | Protein kinase C, C1 domain |
| Eukaryota‐Eukaryota | 0306.4 | HeH | 24 | LEM/SAP HeH motif |
| Eukaryota‐Eukaryota | 0214.13 | UBA | 24 | UBA superfamily |
| Eukaryota‐Eukaryota | 0459.3 | BRCT‐like | 23 | BRCT like |
| Eukaryota‐Eukaryota | 0188.10 | CH | 23 | Calponin homology domain |
| Eukaryota‐Eukaryota | 0537.2 | CCCH_zf | 22 | CCCH‐zinc finger |
| Eukaryota‐Eukaryota | 0004.20 | Concanavalin | 20 | Concanavalin‐like lectin/glucanase superfamily |
| Eukaryota‐Eukaryota | 0072.20 | Ubiquitin | 19 | Ubiquitin superfamily |
| Eukaryota‐Eukaryota | 0033.14 | POZ | 17 | POZ domain superfamily |
| Eukaryota‐Eukaryota | 0154.11 | C2 | 11 | C2 superfamily |
| Eukaryota‐Eukaryota | 0007.18 | KH | 9 | K‐Homology (KH) domain Superfamily |
| Eukaryota‐Eukaryota | 0392.4 | Chaperone‐J | 8 | Chaperone J‐domain superfamily |
| Eukaryota‐Eukaryota | 0164.13 | CUB | 8 | CUB clan |
| Eukaryota‐Eukaryota | 0029.20 | Cupin | 8 | Cupin fold |
| Eukaryota‐Eukaryota | 0049.15 | Tudor | 8 | Tudor domain 'Royal family' |
| Eukaryota‐Eukaryota | 0172.17 | Thioredoxin | 8 | Thioredoxin‐like |
| Eukaryota‐Eukaryota | 0212.9 | SNARE | 8 | SNARE‐like superfamily |
| Eukaryota‐Eukaryota | 0124.15 | Peptidase_PA | 7 | Peptidase clan PA |
| Eukaryota‐Eukaryota | 0575.2 | EFTPs | 7 | Translation proteins of Elongation Factors superfamily |
| Eukaryota‐Eukaryota | 0137.15 | HAD | 7 | HAD superfamily |
| Eukaryota‐Eukaryota | 0021.18 | OB | 7 | OB fold |
| Eukaryota‐Eukaryota | 0364.4 | Leu‐IlvD | 7 | LeuD/IlvD‐like |
| Eukaryota‐Eukaryota | 0541.2 | SH2‐like | 6 | SH2, phosphotyrosine‐recognition domain superfamily |
| Eukaryota‐Eukaryota | 0671.1 | AAA_lid | 5 | AAA+ ATPase lid domain superfamily |
| Eukaryota‐Eukaryota | 0244.9 | PGBD | 5 | PGBD superfamily |
| Eukaryota‐Eukaryota | 0192.13 | GPCR_A | 5 | Family A G protein‐coupled receptor‐like superfamily |
| Eukaryota‐Eukaryota | 0173.11 | STIR | 5 | STIR superfamily |
| Eukaryota‐Eukaryota | 0602.2 | Kringle | 5 | Kringle/FnII superfamily |
| Eukaryota‐Eukaryota | 0642.1 | SOCS_box | 4 | SOCS‐box like superfamily |
| Eukaryota‐Eukaryota | 0178.16 | PUA | 4 | PUA/ASCH superfamily |
| Eukaryota‐Eukaryota | 0041.13 | Death | 4 | Death Domain Superfamily |
| Eukaryota‐Eukaryota | 0183.14 | PAS_Fold | 4 | PAS domain clan |
| Eukaryota‐Eukaryota | 0084.13 | ADP‐ribosyl | 3 | ADP‐ribosylation Superfamily |
| Eukaryota‐Eukaryota | 0015.20 | MFS | 3 | Major Facilitator Superfamily |
| Eukaryota‐Eukaryota | 0198.16 | HHH | 3 | Helix‐hairpin‐helix superfamily |
| Eukaryota‐Eukaryota | 0661.1 | Gain | 3 | GPCR autoproteolysis inducing |
| Eukaryota‐Eukaryota | 0497.3 | GST_C | 3 | Glutathione S‐transferase, C‐terminal domain |
| Eukaryota‐Eukaryota | 0030.16 | Ion_channel | 3 | Ion channel (VIC) superfamily |
| Eukaryota‐Eukaryota | 0107.12 | KOW | 2 | KOW domain |
| Eukaryota‐Eukaryota | 0492.3 | S4 | 2 | S4 domain superfamily |
| Eukaryota‐Eukaryota | 0055.13 | AMP‐binding_C | 2 | AMP‐binding enzyme C‐terminal domain superfamily |
| Eukaryota‐Eukaryota | 0055.13 | Nucleoplasmin | 2 | Nucleoplasmin‐like/VP (viral coat and capsid proteins) superfamily |
| Eukaryota‐Eukaryota | 0027.15 | RdRP | 2 | RNA‐dependent RNA polymerase |
| Eukaryota‐Eukaryota | 0202.11 | GBD | 2 | Galactose‐binding domain‐like superfamily |
| Eukaryota‐Eukaryota | 0028.22 | AB_hydrolase | 2 | Alpha/Beta hydrolase fold |
| Eukaryota‐Eukaryota | 0677.1 | GHMP_C | 1 | GHMP C‐terminal domain superfamily |
| Eukaryota‐Eukaryota | 0025.14 | His_Kinase_A | 1 | His Kinase A (phospho‐acceptor) domain |
| Eukaryota‐Eukaryota | 0088.16 | Alk_phosphatase | 1 | Alkaline phosphatase‐like |
| Eukaryota‐Eukaryota | 0607.2 | TNF_receptor | 1 | TNF receptor‐like superfamily |
| Mixed‐Mixed | 0070.13 | ACT | 1 | ACT‐like domain |
| Eukaryota‐Eukaryota | 0113.13 | GT‐B | 1 | Glycosyl transferase clan GT‐B |
| Eukaryota‐Eukaryota | 0449.3 | G‐PATCH | 1 | DExH‐box splicing factor binding site |
| Eukaryota‐Eukaryota | 0144.13 | Periplas_BP | 1 | Periplasmic binding protein like |
| Eukaryota‐Eukaryota | 0505.3 | Pentapeptide | 1 | Pentapeptide repeat |
| Eukaryota‐Eukaryota | 0547.2 | GF_recep_C‐rich | 1 | Growth factor receptor Cys‐rich |
| Eukaryota‐Mixed | 0021.18 | OB | 1 | OB fold |
| Eukaryota‐Eukaryota | 0026.20 | CU_oxidase | 1 | Multicopper oxidase‐like domain |
| Eukaryota‐Eukaryota | 0110.12 | GT‐A | 1 | Glycosyl transferase clan GT‐A |
| Eukaryota‐Eukaryota | 0236.17 | PDDEXK | 1 | PD‐(D/E)XK nuclease superfamily |
| Eukaryota‐Eukaryota | 0672.1 | p35 | 1 | Baculovirus p35 protein superfamily |
| Eukaryota‐Eukaryota | 0125.15 | Peptidase_CA | 1 | Peptidase clan CA |
| Eukaryota‐Eukaryota | 0117.11 | uPAR_Ly6_toxin | 1 | uPAR/Ly6/CD59/snake toxin‐receptor superfamily |
| Eukaryota‐Eukaryota | 0005.27 | Kazal | 1 | Kazal like domain |
| Eukaryota‐Bacteria | 9999.0 | Unknown | 1 | null |
| Eukaryota‐Mixed | 9999.0 | Unknown | 1 | null |
| Eukaryota‐Eukaryota | 0196.12 | DSRM | 1 | DSRM‐like clan |
| Eukaryota‐Eukaryota | 0381.4 | Metallo‐HOrase | 1 | Metallo‐hydrolase/oxidoreductase superfamily |
| Eukaryota‐Eukaryota | 0114.12 | HMG‐box | 1 | HMG‐box like superfamily |
| Eukaryota‐Eukaryota | 0109.12 | CDA | 1 | Cytidine deaminase‐like (CDA) superfamily |
| Eukaryota‐Eukaryota | 0552.2 | Hect | 1 | Hect, E3 ligase catalytic domain |
| Eukaryota‐Eukaryota | 0426.4 | HRDC‐like | 1 | HRDC‐like superfamily |
| Eukaryota‐Eukaryota | 0630.1 | PSI | 1 | Plexin fold superfamily |
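The rule used for the Translocation class column can be sketched in a few lines of Python: a group of organisms receives its shared superdomain, or 'Mixed' if its members differ, and the class joins the labels of the two groups. The function names are illustrative, and the ASCII hyphen in the returned label stands in for the typographic hyphen used in the table.

```python
def group_superdomain(superdomains):
    """Return the shared superdomain of an organism group, or 'Mixed'."""
    unique = set(superdomains)
    if not unique:
        raise ValueError("an organism group must contain at least one organism")
    return unique.pop() if len(unique) == 1 else "Mixed"


def translocation_class(group_a, group_b):
    """Label a translocation by the superdomains of its two organism groups."""
    return f"{group_superdomain(group_a)}-{group_superdomain(group_b)}"


same = translocation_class(["Eukaryota", "Eukaryota"], ["Eukaryota"])
mixed = translocation_class(["Eukaryota"], ["Eukaryota", "Bacteria"])
```

Here `same` is `"Eukaryota-Eukaryota"` and `mixed` is `"Eukaryota-Mixed"`, matching the kinds of labels that appear in the table.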
Fig. 3. Illustration of a translocation event for FERM_C. FERM_C (red domain) underwent a reciprocal translocation event between two orthologous protein groups, 16822 (FRMD6) and 10637 (MYLIP, MIR). Accordingly, FERM_C is present in FRMD6 and absent from MYLIP in organisms CAA, etc., whereas in organisms CHA, etc., FERM_C is present in MYLIP and absent from FRMD6. FERM_C (FERM C‐terminal PH‐like domain); FERM. Orthologous proteins are indicated by RefSeq accessions for each organism, and multiple proteins per organism represent paralogous proteins. Organism codes are indicated in Table 1.
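The presence/absence pattern described in this caption can be expressed as a simple test. The Python sketch below is a simplified reading of the figure, not the full EvoProDom definition: it reports the organisms in which a domain occurs in one ortholog group but not the other, in each direction, and calls the pattern reciprocal when both directions are observed. The function names are hypothetical, and only the two organisms named in the caption (CAA and CHA) are encoded.

```python
def reciprocal_pattern(content, domain, group_a, group_b):
    """Return (organisms with domain only in group_a, organisms with it only in group_b)."""
    shared = content[group_a].keys() & content[group_b].keys()
    only_a = sorted(
        org for org in shared
        if domain in content[group_a][org] and domain not in content[group_b][org]
    )
    only_b = sorted(
        org for org in shared
        if domain in content[group_b][org] and domain not in content[group_a][org]
    )
    return only_a, only_b


def is_reciprocal(content, domain, group_a, group_b):
    """True when the domain sits in group_a for some organisms and in group_b for others."""
    only_a, only_b = reciprocal_pattern(content, domain, group_a, group_b)
    return bool(only_a) and bool(only_b)


# Presence of FERM_C as stated in the caption; other domains are omitted.
content = {
    "FRMD6": {"CAA": {"FERM_C"}, "CHA": set()},
    "MYLIP": {"CAA": set(), "CHA": {"FERM_C"}},
}

pattern = reciprocal_pattern(content, "FERM_C", "FRMD6", "MYLIP")
found = is_reciprocal(content, "FERM_C", "FRMD6", "MYLIP")
```

With this input, `pattern` is `(["CAA"], ["CHA"])` and `found` is `True`.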