Livio Ruzzante, Romain Feron, Maarten J M F Reijnders, Antonin Thiébaut, Robert M Waterhouse.
Abstract
Roles of constraints in shaping evolutionary outcomes are often considered in the contexts of developmental biology and population genetics, in terms of capacities to generate new variants and how selection limits or promotes consequent phenotypic changes. Comparative genomics also recognizes the role of constraints, in terms of shaping evolution of gene and genome architectures, sequence evolutionary rates, and gene gains or losses, as well as on molecular phenotypes. Characterizing patterns of genomic change where putative functions and interactions of system components are relatively well described offers opportunities to explore whether genes with similar roles exhibit similar evolutionary trajectories. Using insect immunity as our test case system, we hypothesize that characterizing gene evolutionary histories can define distinct dynamics associated with different functional roles. We develop metrics that quantify gene evolutionary histories, employ these to characterize evolutionary features of immune gene repertoires, and explore relationships between gene family evolutionary profiles and their roles in immunity to understand how different constraints may relate to distinct dynamics. We identified three main axes of evolutionary trajectories characterized by gene duplication and synteny, maintenance/stability and sequence conservation, and loss and sequence divergence, highlighting similar and contrasting patterns across these axes amongst subsets of immune genes. Our results suggest that where and how genes participate in immune responses limit the range of possible evolutionary scenarios they exhibit. The test case study system of insect immunity highlights the potential of applying comparative genomics approaches to characterize how functional constraints on different components of biological systems govern their evolutionary trajectories.
Keywords: Anopheles mosquito; evolutionary profiling; gene expression; gene families; innate immunity
Year: 2022 PMID: 34893861 PMCID: PMC8788225 DOI: 10.1093/molbev/msab352
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 8.800
Table 1. Evolutionary Feature Metric Descriptions.
| Evolutionary feature | Acronym | Description | Data source |
|---|---|---|---|
| Taxonomic age | AGE | Age of the last common ancestor of species in an OG, in terms of millions of years since divergence, computed from the ultrametric species phylogeny | 43-insect orthology |
| Universality | UNI | The proportion of the total species present in an OG (all species, UNI = 1) | 43-insect orthology |
| Duplicability | DUP | The proportion of species present in an OG that have multicopy orthologs | 43-insect orthology |
| Average copy number | ACN | The average (mean) ortholog copy number across all species present in an OG | 43-insect orthology |
| Copy number variation | CNV | The standard deviation of ortholog counts per species present in an OG divided by the ACN | 43-insect orthology |
| Expansions | EXP | CAFE quantified proportions of gene gain nodes for an OG | 43-insect orthology |
| Contractions | CON | CAFE quantified proportions of gene loss nodes for an OG | 43-insect orthology |
| Stability | STA | CAFE quantified proportions of no copy-number change nodes for an OG | 43-insect orthology |
| Synteny | SYN | The proportion of orthologs in an OG that maintains their orthologous neighbors in the genomes of the other species | 43-insect orthology |
| Evolutionary rate | EVR | The average rate of protein sequence divergence normalized by the distance (% identity) between each pair of species as computed by OrthoDB | 43-insect orthology |
| PAML’s dS | PDS | The number of synonymous substitutions per synonymous site as computed by PAML | 19- |
| PAML’s dN | PDN | The number of nonsynonymous substitutions per nonsynonymous site as computed by PAML | 19- |
| PAML’s dN/dS | SEL | The nonsynonymous to synonymous substitution ratio (dN/dS) as computed by PAML | 19- |
| Nonsynonymous SNP proportion | NSP | The proportion of all coding-sequence SNPs that were nonsynonymous (averaged over genes per OG) | |
| Nonsynonymous SNP density | NSD | The density of nonsynonymous SNPs over a gene’s coding-sequence length (averaged over genes per OG) | |
| Synonymous SNP density | SSD | The density of synonymous SNPs over a gene’s coding-sequence length (averaged over genes per OG) | |
| Whole genome alignability | WGA | The number of species aligned, per nucleotide from the whole-genome alignment, averaged over coding-sequence length (averaged over genes per OG) | 22 mosquitoes |
| PhastCons constraint | PHC | PhastCons quantified constraint scores, per nucleotide from the whole-genome alignment, averaged over coding-sequence length (averaged over genes per OG) | 22 mosquitoes |
Note.—For each evolutionary feature, the metric name, acronym, description, and source data are presented (see Materials and Methods for details).
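The four copy-number metrics in the table (UNI, DUP, ACN, CNV) follow directly from per-species ortholog counts, so a small worked example may help. The sketch below is illustrative only and is not the authors' code: the function name, the input layout (a mapping from species to copy number for a single OG), and the use of the population standard deviation for CNV are all assumptions, since the table does not say which standard deviation estimator was used.

```python
from statistics import mean, pstdev


def copy_number_metrics(counts, total_species):
    """Compute UNI, DUP, ACN, and CNV for one orthologous group (OG).

    counts: hypothetical dict mapping species name -> ortholog copy number
            (0 or missing means the species is absent from the OG).
    total_species: number of species in the whole data set.
    """
    present = [n for n in counts.values() if n > 0]
    if not present:
        raise ValueError("OG has no species with orthologs")
    # UNI: proportion of all species that are present in the OG.
    uni = len(present) / total_species
    # DUP: proportion of present species with more than one copy.
    dup = sum(1 for n in present if n > 1) / len(present)
    # ACN: mean copy number across present species.
    acn = mean(present)
    # CNV: standard deviation of copy numbers divided by ACN.
    # Assumption: population standard deviation (pstdev).
    cnv = pstdev(present) / acn
    return {"UNI": uni, "DUP": dup, "ACN": acn, "CNV": cnv}


# Toy OG: three of four species present, two of them multicopy.
example = copy_number_metrics({"sp1": 1, "sp2": 2, "sp3": 3, "sp4": 0}, 4)
print(example)
```

In this toy case UNI is 0.75, DUP is 2/3, and ACN is 2; an OG present as a single copy in every species would give UNI = 1, DUP = 0, and CNV = 0.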
Table 2. The Anopheles gambiae and Drosophila melanogaster Immunity Gene Catalogs.
| Acronym | Summary description | A. gambiae genes | A. gambiae OGs | D. melanogaster genes | D. melanogaster OGs |
|---|---|---|---|---|---|
| GALE | Galectins bind specifically to β-galactoside sugars and can function as pattern recognition receptors in innate immunity | 9 | 6 | 6 | 5 |
| GNBP | Gram-negative binding proteins (or β-1,3-glucan-binding proteins) are a family of carbohydrate-binding pattern recognition receptors | 7 | 3 | 3 | 3 |
| PGRP | Peptidoglycan recognition proteins are pattern recognition receptors capable of recognizing the peptidoglycan from bacterial cell walls | 7 | 5 | 12 | 6 |
| SCRA | Scavenger receptors are made up of different classes that function as pattern recognition receptors for a broad range of ligands including from pathogens | 5 | 5 | 5 | 4 |
| SCRB | See SCRA | 13 | 10 | 14 | 9 |
| CTL | C-type lectins are carbohydrate-binding proteins with roles in pathogen opsonization, encapsulation, and melanization, as well as immune signaling cascades | 25 | 20 | 37 | 29 |
| FREP | Fibrinogen-related proteins (also known as FBNs) are a family of pattern recognition receptors with homology to the C terminus of the fibrinogen β and γ chains | 38 | 15 | 13 | 6 |
| LRIM | Leucine-rich repeat immune proteins are mosquito immune factors that activate complement-like defense responses against pathogens | 24 | 20 | 0 | 0 |
| ML | MD-2-like proteins, also known as Niemann-Pick Type C-2 proteins, possess myeloid-differentiation-2-related lipid-recognition domains involved in recognizing lipopolysaccharide | 16 | 7 | 8 | 5 |
| NIMROD | Nimrods have been shown to bind bacteria leading to their phagocytosis by hemocytes, they contain epidermal growth factor-like domains | 3 | 3 | 12 | 8 |
| TEP | Thioester-containing proteins are related to vertebrate complement factors and α2-macroglobulin protease inhibitors, their activation through proteolytic cleavage leads to phagocytosis or killing of pathogens | 10 | 5 | 5 | 5 |
| IMDSIG | The immune deficiency pathway is characterized by peptidoglycan recognition protein receptors, intracellular signal transducers (IMDSIG) and modulators (IMDMOD), and the NF-κB transcription factor Relish | 9 | 9 | 10 | 10 |
| IMDMOD | See IMDSIG | 6 | 6 | 6 | 6 |
| JASTSIG | The JAK and the STAT are two core components of the JAK/STAT pathway, with signal transducers (JASTSIG) and modulators (JASTMOD) involved in cellular responses to stress or injury | 3 | 3 | 6 | 6 |
| JASTMOD | See JASTSIG | 3 | 3 | 4 | 4 |
| TOLLSIG | The intracellular components of the Toll pathway are homologous to the toll-like receptor innate immune pathway in mammals, with signal transducers (TOLLSIG) and modulators (TOLLMOD) culminating in activation of the NF-κB transcription factors Dorsal | 5 | 5 | 6 | 6 |
| TOLLMOD | See TOLLSIG | 8 | 8 | 8 | 8 |
| CASP | Caspases are cysteine-aspartic proteases involved in immune signaling cascades and apoptosis | 15 | 6 | 7 | 5 |
| CLIPA | Subfamilies of CLIP-domain serine proteases are defined by patterns of cysteine residues, several CLIPs have roles as activators or modulators of immune signaling cascades | 20 | 13 | 12 | 10 |
| CLIPB | See CLIPA | 27 | 20 | 15 | 13 |
| CLIPC | See CLIPA | 8 | 6 | 7 | 7 |
| CLIPD | See CLIPA | 9 | 8 | 10 | 10 |
| CLIPE | See CLIPA | 9 | 7 | 3 | 3 |
| IAP | Inhibitors of apoptosis are important in antiviral responses and are involved in regulating immune signaling and suppressing apoptotic cell death | 8 | 5 | 4 | 4 |
| SRPN | Serine protease inhibitors, or serpins, modulate many signaling cascades; they act as suicide substrates to inhibit their target proteases | 18 | 16 | 30 | 20 |
| AMP | Antimicrobial peptides are the classical effector molecules of innate immunity; they include defensins, cecropins, and attacins that are involved in bacterial killing by disrupting their membranes | 9 | 8 | 10 | 5 |
| LYS | Lysozymes are key effector enzymes that hydrolyze peptidoglycans present in the cell walls of many bacteria, causing cell lysis | 7 | 1 | 17 | 3 |
| PPO | Prophenoloxidases are key enzymes in the melanization cascade that helps to kill invading pathogens and is important for wound healing | 9 | 1 | 3 | 1 |
| GPX | Glutathione, heme, and thioredoxin peroxidases are enzymes involved in the metabolism of reactive oxygen species that are toxic to pathogens | 2 | 2 | 2 | 2 |
| HPX | See GPX | 15 | 10 | 10 | 9 |
| TPX | See GPX | 5 | 5 | 6 | 6 |
| SOD | Superoxide dismutases are antioxidant enzymes involved in the metabolism of toxic superoxide into oxygen or hydrogen peroxide | 4 | 4 | 4 | 4 |
| APHAG | Autophagy-related genes participate in a form of cell death characterized by the formation of an internal autophagosome where pathogens are degraded | 19 | 19 | 22 | 22 |
| SRRP | Small regulatory RNA pathway members are involved in RNA interference and include argonautes, dicers, piwis, and helicases | 28 | 23 | 22 | 20 |
| SPZ | Spaetzle-like proteins contain a cysteine knot domain, the cleavage of Spaetzle results in binding of the product to the Toll receptor and subsequent activation of the Toll pathway | 5 | 5 | 6 | 6 |
| TOLL | Toll receptors connect extracellular pathogen recognition to intracellular Toll pathway signaling and activation of immune defense responses | 12 | 6 | 9 | 6 |
Note.—Brief descriptions of immune gene families or pathway components are presented along with counts of the numbers of genes and OGs for the mosquito and fly catalogs.
Fig. 1. Evolutionary feature profiles of mosquito immune gene families. Evolutionary profiling highlights similar and contrasting patterns across all 36 immune gene families or subfamilies (rows). Deviations from the typical metric values for the suite of 18 evolutionary feature metrics (columns) are computed as the difference between the family mean and the average over all OGs from other immune-related gene families (Δ). For visualization, values of Δ are scaled by the absolute maximum Δ per metric, that is, for each metric the distribution is transformed by dividing all values by the absolute maximum Δ. Values therefore range from a minimum of –1 for metrics where the largest deviation is below the mean, that is, lower than other families, to a maximum of 1 for metrics where the largest deviation is above the mean, that is, higher than other families. The significance of the difference of the distribution of metric values (no scaling) for each family compared with all other families was assessed using the Wilcoxon rank-sum (Mann–Whitney U) test and a permutation test (asterisks correspond to the lower P value from these two tests; ***P ≤ 0.01, **P ≤ 0.05, *P ≤ 0.1). Feature acronyms are defined in table 1. Family acronyms are defined in table 2 and are colored according to categories defined based on their putative roles in the principal immune phases: classical recognition (red), other recognition (blue), pathway signaling (bright green), pathway modulation (purple), cascade modulation (orange), antimicrobial effectors (pink), effector enzymes (olive green), autophagy (dark cyan), RNAi (black), cytokines (brown), and toll receptors (dark green). See text for definitions of evolutionary feature acronyms: taxonomic spread and copy-number features in blue; sequence-based features in red. Evolutionary feature profiles of mosquito immune gene families with median differences (Δ) are presented in supplementary figure S1, Supplementary Material online.
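The Δ computation and scaling described in the caption can be sketched for a single metric as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and input layout (family name mapped to a list of per-OG metric values) are hypothetical, and the significance testing (Wilcoxon rank-sum and permutation tests) is not covered here.

```python
from statistics import mean


def scaled_deltas(values_by_family):
    """Return scaled deviations (Δ) per family for ONE metric.

    values_by_family: hypothetical dict mapping family acronym -> list of
    per-OG metric values for that family.
    """
    deltas = {}
    for family, values in values_by_family.items():
        # Pool the OG values of every other family.
        others = [
            v
            for other, vals in values_by_family.items()
            if other != family
            for v in vals
        ]
        # Δ: family mean minus the mean over all OGs of the other families.
        deltas[family] = mean(values) - mean(others)
    # Scale by the absolute maximum Δ so values fall within [-1, 1].
    max_abs = max(abs(d) for d in deltas.values())
    if max_abs == 0:
        return {family: 0.0 for family in deltas}
    return {family: d / max_abs for family, d in deltas.items()}


# Toy values for three hypothetical families and one metric.
toy = {"A": [1, 3], "B": [5], "C": [2, 2]}
print(scaled_deltas(toy))
```

With these toy values, family B has the largest absolute deviation and is scaled to 1, while A and C are each scaled to -1/3. Because scaling is done per metric, a value of 1 or -1 marks only the most extreme family for that metric and says nothing about statistical significance.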
Table 3. Characteristic Evolutionary Features of Immune Gene Families and Subfamilies.
| Family | Significantly higher | Significantly lower | Interpretation summary |
|---|---|---|---|
| GALE | – | – | No extreme features |
| GNBP | DUP, ACN, SYN, WGA | – | Duplications, maintained neighborhood, widely alignable |
| PGRP | EXP, NSD, SSD | – | Duplications, population variation |
| SCRA | AGE | CON | Ancient, stable copy-number |
| SCRB | AGE, UNI, WGA, PHC | PDN, SEL, CON, NSD | Ancient, widespread, widely alignable, constrained sequence, constrained substitutions, stable copy-number, population conservation |
| CTL | SEL, CON, SYN | UNI, SSD, WGA, PHC | Relaxed substitutions, losses, maintained neighborhood, sparse, population conservation, sparsely alignable, relaxed sequence |
| FREP | ACN, SEL, CON, NSP | AGE, UNI, STA, SYN, WGA, PHC | Duplications, relaxed substitutions, losses, amino acid divergence, young, sparse, unstable copy-number, shuffled neighborhood, sparsely alignable, relaxed sequence |
| LRIM | PDN, PDS, SEL, CON, EVR, NSP, NSD | AGE, UNI, EXP, WGA, PHC | Relaxed substitutions, losses, amino acid divergence, population variation, young, sparse, stable copy-number, sparsely alignable, relaxed sequence |
| ML | DUP, ACN, EXP, SYN | PDS, STA | Duplications, maintained neighborhood, constrained substitutions, unstable copy-number |
| NIMROD | – | SYN | Shuffled neighborhood |
| TEP | EVR, NSD, SSD | WGA, PHC | Amino acid divergence, population variation, sparsely alignable, relaxed sequence |
| IMDSIG | AGE, UNI, STA, SYN, EVR | CON | Ancient, widespread, stable copy-number, maintained neighborhood, amino acid divergence |
| JASTSIG | EVR, SSD | SYN | Amino acid divergence, population variation, shuffled neighborhood |
| TOLLSIG | STA | CON, WGA | Stable copy-number, sparsely alignable |
| IMDMOD | AGE, UNI, PDS, SSD, PHC | CON, EVR, NSP | Ancient, widespread, relaxed synonymous substitutions, population variation, constrained sequence, stable copy-number, amino acid conservation |
| JASTMOD | – | – | No extreme features |
| TOLLMOD | AGE, UNI, STA, PHC | PDN, SEL, CON, EVR, NSP, NSD | Ancient, widespread, stable copy-number, constrained sequence, constrained substitutions, amino acid conservation, population conservation |
| CASP | DUP, ACN, CNV, CON | SSD | Duplications, losses, population conservation |
| CLIPA | – | AGE, UNI, SYN | Young, sparse, shuffled neighborhood |
| CLIPB | PDN, SEL, CON, EVR, NSP | AGE, UNI, PDS, STA, SSD, WGA, PHC | Relaxed substitutions, losses, amino acid divergence, young, sparse, constrained synonymous substitutions, unstable copy-number, population conservation, sparsely alignable, relaxed sequence |
| CLIPC | DUP, ACN, CNV, CON, EVR | UNI, STA, PHC | Duplications, losses, amino acid divergence, sparse, unstable copy-number, relaxed sequence |
| CLIPD | UNI | CON, EVR | Widespread, stable copy-number, amino acid conservation |
| CLIPE | EVR, NSP | AGE, UNI, PHC | Amino acid divergence, young, sparse, relaxed sequence |
| IAP | SEL | WGA | Relaxed substitutions, sparsely alignable |
| SRPN | CNV, EVR, NSD | SYN | Duplications, amino acid divergence, shuffled neighborhood |
| AMP | CON, NSD | AGE, UNI, STA, WGA | Losses, amino acid divergence, young, sparse, unstable copy-number, sparsely alignable |
| LYS | DUP, ACN, CNV, EXP, SYN | STA | Duplications, maintained neighborhood, unstable copy-number |
| PPO | DUP, ACN, CNV, EXP | STA | Duplications, unstable copy-number |
| GPX | EXP, PHC | STA | Duplications, constrained sequence, unstable copy-number |
| HPX | UNI, STA, WGA | SEL, CON, EVR | Widespread, stable copy-number, widely alignable, constrained substitutions, amino acid conservation |
| TPX | AGE, UNI, STA, WGA, PHC | PDN, SEL, CON, EVR, NSP, SSD | Ancient, widespread, stable copy-number, widely alignable, constrained sequence, constrained substitutions, amino acid conservation, population conservation |
| SOD | WGA, PHC | PDN, SEL, EVR | Widely alignable, constrained sequence, constrained substitutions, amino acid conservation |
| APHAG | AGE, UNI, STA, WGA, PHC | ACN, CNV, PDN, PDS, SEL, CON, EVR, NSP, NSD | Ancient, widespread, stable copy-number, widely alignable, constrained sequence, constrained substitutions, amino acid conservation, population conservation |
| SRRP | AGE, UNI, PDN, PDS, STA, SYN, WGA, PHC | SEL, CON, EVR, NSP, NSD | Ancient, widespread, relaxed substitutions, stable copy-number, maintained neighborhood, widely alignable, constrained sequence, amino acid conservation, population conservation |
| SPZ | STA | – | Stable copy-number |
| TOLL | AGE, UNI, DUP, ACN, WGA | PDN, SEL, CON | Ancient, widespread, duplications, widely alignable, constrained substitutions |
Note.—For each immune-related gene family, evolutionary features with significantly higher or significantly lower metrics compared with other immune families are listed with summarized interpretations.
Fig. 2. Clustering heatmap and dendrograms of immune families and their evolutionary features. Groupings of families and subsets of features delineated by hierarchical clustering using the matrix of evolutionary feature profiles of all immune gene families. Hierarchical clustering results are visualized for the immune families (n = 36) and evolutionary features (n = 18) using scaled median metrics with a Pearson’s correlation-based distance matrix and average linkage agglomerative clustering. The heatmap displays the relative values of the scaled metrics from low in blue to high in red. The dendrograms show the quantified distances (similarities) between each of the families, and between each of the features, and their groupings, determined by the clustering algorithm and distance method. Support for each node of the two dendrograms is shown with green-filled circles, using multiscale bootstrap resampling to estimate AU support values. PCA supports three major groupings of the four subsets of evolutionary features with PC1 and PC2 capturing 78.2% of the variance. Feature acronyms are defined in table 1. Family acronyms are defined in table 2 and are colored according to categories defined based on their putative roles in the principal immune phases: classical recognition (red), other recognition (blue), pathway signaling (bright green), pathway modulation (purple), cascade modulation (orange), antimicrobial effectors (pink), effector enzymes (olive green), autophagy (dark cyan), RNAi (black), cytokines (brown), and toll receptors (dark green). See text for definitions of evolutionary feature acronyms, colored according to groupings in the dendrogram and PCA. Clustering heatmap and dendrograms of immune families and their evolutionary features using mean metrics are presented in supplementary figure S2, Supplementary Material online.
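To make the clustering step concrete, the following sketch applies average linkage agglomerative clustering to a correlation-based distance. It is a simplified illustration only: the caption does not give the exact distance formula, so the distance is assumed here to be 1 minus Pearson's r; the function names and toy profiles are hypothetical; and the multiscale bootstrap (AU) support values are not reproduced.

```python
from math import sqrt


def pearson_distance(x, y):
    """Assumed distance: 1 - Pearson's r (profiles must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles):
    """Cluster profiles bottom-up; return a list of (members, distance) merges.

    profiles: hypothetical dict mapping family acronym -> list of scaled
    metric values (one value per evolutionary feature).
    """
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = list(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                d = sum(
                    pearson_distance(profiles[p], profiles[q]) for p, q in pairs
                ) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge the closest pair of clusters and record the merge.
        members = clusters.pop(a) + clusters.pop(b)
        clusters[a + "+" + b] = members
        merges.append((sorted(members), d))
    return merges


# Toy profiles: F1 and F2 rise together, F3 runs in the opposite direction.
toy_profiles = {"F1": [1, 2, 3], "F2": [2, 4, 6.5], "F3": [3, 2, 1]}
for members, distance in average_linkage(toy_profiles):
    print(members, round(distance, 3))
```

In the toy data, F1 and F2 merge first because their profiles are strongly correlated, and F3 joins last at a distance close to 2, the maximum under this distance. For real analyses, an established library implementation of hierarchical clustering would be preferable to this quadratic-per-step loop.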
Fig. 3. Network of immune family expression similarities based on the VectorBase Expression Map. The network layout optimized with a spring model provides a 2D visualization of expression similarities for pairwise comparisons of all 36 immune-related gene families computed as gene co-occurrence scores across the VectorBase Expression Map (AgamP4.11 VB-2019-02). Families with more similar gene expression profiles are placed closer together in the graph. Significant co-occurrences are indicated with connecting lines: light blue <0.05, royal blue <0.01, and dark blue <0.005 with line thickness scaled to the P value and co-occurrence scores indicated for all pairs with P<0.01. Family acronyms are defined in table 2 and are colored according to categories defined based on their putative roles in the principal immune phases: classical recognition (red), other recognition (blue), pathway signaling (bright green), pathway modulation (purple), cascade modulation (orange), antimicrobial effectors (pink), effector enzymes (olive green), autophagy (dark cyan), RNAi (black), cytokines (brown), and toll receptors (dark green).
Fig. 4. Pairwise comparisons of immune family expression similarity and evolutionary similarity. Evolutionary similarities (based on feature metric medians) of pairs of gene families are compared with their expression similarities at (A) fine-scale resolution and (B) broad-scale resolution. Pairs of families with significant (P<0.05) gene expression co-occurrence scores are shown with purple circles, with nonsignificant pairs shown in gray, and with circle sizes scaled by the P value. Family acronyms are defined in table 2 and are colored according to categories defined based on their putative roles in the principal immune phases: classical recognition (red), other recognition (blue), pathway signaling (bright green), pathway modulation (purple), cascade modulation (orange), antimicrobial effectors (pink), effector enzymes (olive green), autophagy (dark cyan), RNAi (black), cytokines (brown), and toll receptors (dark green).