Gon Carmi, Somnath Tagore, Alessandro Gorohovski, Aviad Sivan, Dorith Raviv-Shay, Milana Frenkel-Morgenstern.
Abstract
In contrast to fossorial and aboveground organisms, subterranean species have adapted to the extreme stresses of living underground. We analyzed the predicted protein-protein interactions (PPIs) of all gene products, including those of stress-response genes, among nine subterranean, ten fossorial, and 13 aboveground species. We considered 10,314 unique orthologous protein families and constructed 5,879,879 PPIs in all organisms using ChiPPI. We found a strong association between PPI network modulation and adaptation to specific habitats, noting that mutations in genes and changes in protein sequences were not linked directly with niche adaptation in the organisms sampled. Thus, orthologous hypoxia, heat-shock, and circadian clock proteins were found to cluster according to habitat, based on PPIs rather than on sequence similarities. Curiously, "ordered" domains were preserved in aboveground species, while "disordered" domains were conserved in subterranean organisms, a pattern confirmed for proteins in the DisProt database. Furthermore, proteins with disordered regions were found to adopt significantly less optimal codon usage in subterranean species than in fossorial and aboveground species. These findings reveal design principles of protein networks by means of alterations in protein domains, thus providing insight into deep mechanisms of evolutionary adaptation in general, and of adaptation to underground living and other confined habitats in particular.
Year: 2020 PMID: 32973219 PMCID: PMC7519090 DOI: 10.1038/s41598-020-71976-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
All organisms included in the PASTORAL database, with a complete number of proteins in the corresponding proteome.
| Ecology | Organism ID | Heat-shock proteins | Heat-shock PPIs | Hypoxia-related proteins | Hypoxia-related PPIs | Circadian proteins | Circadian PPIs | |
|---|---|---|---|---|---|---|---|---|
| F | COC | 17 | 2,325 | 9 | 436 | 1 | 37 | |
| F | cfo | 9 | 1,421 | 2 | 205 | 2 | 55 | |
| F | mjv | 62 | 4,577 | 10 | 321 | 8 | 379 | |
| F | soc | 9 | 1,325 | 1 | 47 | 2 | 55 | |
| F | cge | 9 | 1,994 | 6 | 273 | 3 | 53 | |
| F | DAN | 19 | 2,433 | 9 | 439 | 2 | 37 | |
| F | DIO | 18 | 2,431 | 9 | 443 | 3 | 50 | |
| F | MIO | 20 | 2,586 | 9 | 463 | 3 | 57 | |
| F | OCD | 20 | 2,552 | 9 | 466 | 3 | 56 | |
| F | PEM | 19 | 2,504 | 9 | 452 | 3 | 58 | |
| S | ASM | 40 | 3,975 | 6 | 295 | 8 | 364 | |
| S | CRS | 8 | 1,248 | 2 | 65 | 1 | 36 | |
| S | fcd | 9 | 1,323 | 5 | 313 | 1 | 34 | |
| S | MYL | 63 | 4,846 | 9 | 340 | 8 | 380 | |
| S | zne | 10 | 1,691 | 2 | 70 | 2 | 48 | |
| S | CHA | 20 | 2,402 | 9 | 472 | 2 | 44 | |
| S | FUD | 20 | 2,408 | 9 | 442 | 2 | 42 | |
| S | hgl | 19 | 2,572 | 8 | 311 | 4 | 62 | |
| S | ngi | 21 | 2,603 | 9 | 462 | 4 | 62 | |
| A | bta | 18 | 1,512 | 9 | 301 | 4 | 55 | |
| A | dme | 15 | 486 | 3 | 30 | 2 | 10 | |
| A | dre | 47 | 5,137 | 9 | 375 | 8 | 404 | |
| A | ERE | 30 | 2,293 | 16 | 424 | 4 | 61 | |
| A | fca | 16 | 1,416 | 8 | 235 | 3 | 49 | |
| A | gga | 17 | 1,447 | 8 | 260 | 4 | 47 | |
| A | hsa | 35 | 2,850 | 10 | 484 | 5 | 98 | |
| A | mmu | 32 | 1,672 | 9 | 306 | 6 | 61 | |
| A | oaa | 13 | 1,123 | 7 | 234 | 2 | 23 | |
| A | ORA | 35 | 2,115 | 10 | 427 | 4 | 78 | |
| A | ptr | 19 | 1,486 | 10 | 309 | 5 | 65 | |
| A | rno | 22 | 1,668 | 7 | 276 | 5 | 62 | |
| A | ssc | 19 | 1,519 | 16 | 280 | 5 | 55 | |
The numbers of heat-shock, hypoxia-related, and circadian proteins, along with their PPIs, were assessed and collected in PASTORAL. Organisms in KEGG are coded by lowercase letters; organisms not in KEGG are coded by uppercase letters. A, aboveground; F, fossorial; S, subterranean.
Figure 1. The study overview. Fully sequenced genomes with coding sequences and annotated proteomes were collected from 32 species living in three broad ecological niches: subterranean, fossorial, and aboveground. For collected proteins (1,350,898), protein domains, protein disordered regions, and KEGG orthologous annotation (624,787) were predicted using the Pfam search tool[53] along with HMMER[60], IUPred2A[44], and the KEGG database[17], respectively. Next, 5,879,879 PPIs were evaluated using our previously developed ChiPPI tool[15]. Briefly, ChiPPI uses a domain-domain co-occurrence table. When a certain domain is missing, ChiPPI evaluates the corresponding missing interactors in the PPI network[15], based on real PPI data (363,816) as obtained from BioGrid (release 3.4.163)[16]. Finally, PPI data are organized in PASTORAL, a dedicated database.
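The domain-domain co-occurrence idea described in the overview can be illustrated with a minimal sketch. This is not ChiPPI's actual implementation: the function names, the `min_support` threshold, and the toy proteins and domain labels are all hypothetical, and the scoring rule (predict an interaction when any domain pair has been seen in a known PPI) is a simplifying assumption.

```python
from collections import Counter
from itertools import product


def domain_pair_counts(known_ppis, domains):
    """Count unordered domain pairs seen across known interacting protein pairs."""
    counts = Counter()
    for prot_a, prot_b in known_ppis:
        pairs = {
            tuple(sorted(pair))
            for pair in product(domains.get(prot_a, ()), domains.get(prot_b, ()))
        }
        counts.update(pairs)  # each domain pair is counted once per known PPI
    return counts


def predict_ppi(domains_a, domains_b, counts, min_support=1):
    """Predict an interaction if any domain pair reaches the support threshold."""
    return any(
        counts[tuple(sorted(pair))] >= min_support
        for pair in product(domains_a, domains_b)
    )


# Hypothetical toy data: one known interaction between two annotated proteins.
domains = {"P1": {"PAS", "bHLH"}, "P2": {"PAS"}}
known_ppis = [("P1", "P2")]
counts = domain_pair_counts(known_ppis, domains)

# A query protein carrying the PAS domain is predicted to interact ...
with_domain = predict_ppi({"PAS"}, {"bHLH"}, counts)
# ... while one lacking it (domain missing) loses the predicted interactor.
without_domain = predict_ppi({"HSP20"}, {"bHLH"}, counts)
print(with_domain, without_domain)
```

In this toy setting, removing a domain from a protein removes every interaction that was supported only by that domain, which is the kind of network change the overview attributes to missing domains.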
KEGG Orthologs: Heat-shock (upper), hypoxia-related (middle) and circadian (bottom) proteins.
| KO | Info |
|---|---|
| 19369 | HSPB11; heat shock protein beta-11 |
| 19765 | HSBP1; heat shock factor-binding protein 1 |
| 3283 | HSPA1_8; heat shock 70 kDa protein 1/8 |
| 4455 | HSPB1; heat shock protein beta-1 |
| 8879 | HSPB8; heat shock protein beta-8 |
| 9414 | HSF1; heat shock transcription factor 1 |
| 9415 | HSF2; heat shock transcription factor 2 |
| 9416 | HSF3; heat shock transcription factor 3 |
| 9417 | HSF4; heat shock transcription factor 4 |
| 9485 | HSP110; heat shock protein 110 kDa |
| 9487 | HSP90B, TRA1; heat shock protein 90 kDa beta |
| 9489 | HSPA4; heat shock 70 kDa protein 4 |
| 9490 | HSPA5, BIP; heat shock 70 kDa protein 5 |
| 9543 | HSPB2; heat shock protein beta-2 |
| 9544 | HSPB3; heat shock protein beta-3 |
| 9545 | HSPB6; heat shock protein beta-6 |
| 9546 | HSPB7; heat shock protein beta-7 |
| 9547 | HSPB9; heat shock protein beta-9 |
| 18055 | HIF1AN; hypoxia-inducible factor 1-alpha inhibitor (HIF hydroxylase) |
| 6711 | PH-4; hypoxia-inducible factor prolyl 4-hydroxylase |
| 8268 | HIF1A; hypoxia-inducible factor 1 alpha |
| 9095 | HIF2A, EPAS1; hypoxia-inducible factor 2 alpha |
| 9096 | HIF3A; hypoxia-inducible factor 3 alpha |
| 9486 | HYOU1; hypoxia up-regulated 1 |
| 9592 | EGLN, HPH; hypoxia-inducible factor prolyl hydroxylase |
| 2223 | CLOCK, KAT13D; circadian locomotor output cycles kaput protein [EC:2.3.1.48] |
| 2633 | PER2; period circadian protein 2 |
| 21599 | CIART, CHRONO, GM129; circadian-associated transcriptional repressor |
| 21753 | PASD1; circadian clock protein PASD1 |
| 21944 | PER1; period circadian protein 1 |
| 21945 | PER3; period circadian protein 3 |
Figure 2. Hierarchical clustering. (A) All studied organisms were assessed for hierarchical clustering using an identity matrix from multiple alignment of PAS domain sequences from hypoxia-related orthologous proteins. (B) Clustering of the hypoxia-related proteins in the 32 organisms studied (Table 2). (C) Clustering of the heat-shock proteins (Table 2). (D) Clustering of circadian proteins (Table 2). AU (approximately unbiased) p values and BP (bootstrap probability) values are shown[59].
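As a rough illustration of clustering from an identity matrix, the sketch below converts pairwise identities to distances (1 minus identity) and merges clusters by average linkage. The linkage method, the function name, and the three organism labels with their identity values are assumptions made for illustration only; the sketch does not compute AU or BP support values, which require bootstrap resampling.

```python
def average_linkage(names, identity):
    """Agglomerative clustering with average linkage.

    identity maps frozenset({a, b}) to a fractional identity in [0, 1];
    the distance between two sequences is taken as 1 - identity.
    Returns the list of merges as (cluster_1, cluster_2, distance).
    """
    def dist(a, b):
        return 1.0 - identity[frozenset((a, b))]

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise distance between members of the two clusters.
                total = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical identities between three made-up organisms.
names = ["sub1", "sub2", "abv1"]
identity = {
    frozenset(("sub1", "sub2")): 0.9,
    frozenset(("sub1", "abv1")): 0.6,
    frozenset(("sub2", "abv1")): 0.5,
}
merges = average_linkage(names, identity)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

With these toy values, the two most similar sequences join first and the remaining one attaches at the mean of its distances to them.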
Figure 3. Codon usage preference score (CUPS) density plots for 516 KOs common to all studied organisms. (A) Density plot for all 516 KOs in each habitat (aboveground, labeled ab.ground; fossorial; and subterranean). Proportions (%) of KOs, relative to those seen in aboveground dwellers, are indicated in terms of optimal (positive) and non-optimal (negative) codon usage preference scores (CUPS). Subterranean organisms adopt 75.0% (p value = 0.0008) more non-optimal codon usage than do aboveground species.
Figure 4. Total PPIs and codon usage preference score (CUPS) with density plots for all proteins with disordered (A), mixed (B), and ordered (C) regions, stratified by ecology (aboveground, lime green (#66c2a5)*; fossorial, soft orange (#fc8d62)*; and subterranean, light blue (#8da0cb)*). Proteins with disordered, mixed, and ordered regions formed different clusters, with varying degrees of extreme values of PPIs and CUPS. Additionally, proteins with disordered regions that adopted non-optimal CUPS [−50, −20] were more numerous for subterranean animals than for their counterpart proteins in fossorial and aboveground animals. *Hexadecimal color number.
Figure 5. (A) The PASTORAL interface, showing querying and analysis features. (B) The pop-up window of protein orthologs. (C) An example of a partial codon usage table (CUT) for Nannospalax galili. The codon usage table provides an overall %GC content and the number of CDS from which the table is computed. AA, amino acid; Fraction, the proportion of usage of the codon among its degenerate set, i.e. the set of codons that code for an AA; Frequency, the expected number of codons, given the input sequence(s), per 1,000 bases; and Number, the raw number of occurrences of the codon in the input sequences.
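The columns of a codon usage table can be reproduced with a short sketch. It assumes the standard genetic code, and it normalizes Frequency by the total number of counted codons (scaled to 1,000), which is an assumption about how that column is computed. The function names and the toy coding sequence are hypothetical and are not taken from PASTORAL.

```python
from collections import Counter

# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage_table(cds_list):
    """Return {codon: {"AA", "Fraction", "Frequency", "Number"}} for the input CDS."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TO_AA:  # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TO_AA[codon]] += n
    return {
        codon: {
            "AA": CODON_TO_AA[codon],
            "Fraction": n / aa_totals[CODON_TO_AA[codon]],  # share within synonymous set
            "Frequency": 1000 * n / total,  # assumed normalization: per 1,000 codons
            "Number": n,  # raw count
        }
        for codon, n in counts.items()
    }


def gc_content(cds_list):
    """Overall %GC across all input sequences."""
    joined = "".join(cds_list).upper()
    return 100 * sum(joined.count(base) for base in "GC") / len(joined)


# Hypothetical toy CDS: Met, Ala, Ala, stop.
cds = ["ATGGCTGCCTAA"]
table = codon_usage_table(cds)
gc = gc_content(cds)
print(table["GCT"], gc)
```

Here the two alanine codons each take half of the alanine usage, so their Fraction values sum to 1 across the synonymous set, as the Fraction definition in the caption implies.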