Vladimir M Jovanovic, Melanie Sarfert, Carlos S Reyna-Blanco, Henrike Indrischek, Dulce I Valdivia, Ekaterina Shelest, Katja Nowick.
Abstract
Gene regulatory factors (GRFs), such as transcription factors (TFs), co-factors and histone-modifying enzymes, play many important roles in modifying gene expression in biological processes. They have also been proposed to underlie speciation and adaptation. To investigate potential contributions of GRFs to primate evolution, we analyzed GRF genes in 27 publicly available primate genomes. Genes coding for zinc finger (ZNF) proteins, especially ZNFs with a Krüppel-associated box (KRAB) domain, were the most abundant TFs in all genomes. Gene numbers per TF family differed between all species. To detect signs of positive selection in GRF genes, we investigated more than 3,000 human GRFs with their more than 70,000 orthologs in 26 non-human primates. We implemented two independent tests for positive selection, the branch-site model of the PAML suite and aBSREL of the HyPhy suite, focusing on the human and great ape branches. Our workflow included rigorous procedures to reduce the number of false positives: excluding distantly similar orthologs, manually correcting alignments, and considering only genes and sites detected by both tests for positive selection. Furthermore, we verified the candidate sites for selection by investigating their variation within human and non-human great ape population data. To assign an approximate date to positively selected sites in the human lineage, we analyzed archaic human genomes. Our work revealed with high confidence five GRFs that have been positively selected on the human lineage and one GRF that has been positively selected on the great ape lineage. These GRFs are scattered on different chromosomes and have been previously linked to diverse functions. For some of them a role in speciation and/or adaptation can be proposed based on the expression pattern or association with human diseases, but it seems that they all contributed independently to human evolution.
Four of the positively selected GRFs are KRAB-ZNF proteins, which can drive change through altered co-expression of their target genes and/or through an arms race with transposable elements. Since each positively selected GRF contains several sites with evidence for positive selection, we suggest that these GRFs contributed pleiotropically to phenotypic adaptations in humans.
Keywords: KRAB-ZNF; archaic humans; gene regulatory evolution; great apes; phenotypic evolution; primate; speciation; transcription factor
Year: 2021 PMID: 34079582 PMCID: PMC8166252 DOI: 10.3389/fgene.2021.662239
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
List of mined databases with the description of stored information and covered topic.
| Database | Stored information and covered topic |
| Ensembl | Provides access to genomes, their annotation information, domains, structures, external links and some analysis tools. In addition, it contains information on variation for human and chimpanzee genomes, and population-based distribution of the variation |
| EMBL-EBI Expression Atlas | Provides freely available information about gene and protein expression from microarray, bulk and single-cell RNA-Seq studies |
| NCBI (National Center for Biotechnology Information) | Provides access to gene, genome and protein sequences, structure and annotation information, publications, as well as information on genome variation (for instance, SNPs) |
| UniProt | Contains various general information on proteins, their sequence and structure, function, domains and ontology |
| OMIM (Online Mendelian Inheritance in Man) | Contains information on known Mendelian disorders and focuses on the relationship between phenotype and genotype |
| GO (Gene Ontology) | Contains information on the functions of genes, together with their hierarchical classification into functional categories |
| Kyoto Encyclopedia of Genes and Genomes (KEGG) | Provides information on a large array of high-level functions of genes and proteins, collecting their orthologs, metabolic pathways, disease-related network variation etc. |
| ProteomicsDB | Provides information on the human proteome, protein isoforms, expression per tissue, and other analytics |
| Bgee | Allows retrieval and comparison of gene expression patterns between animal species |
| STRING | Contains information on protein-protein interactions |
| ProteomeHD | Contains information on co-regulation between the proteins, with additional analytics and GO terms |
| EdgeExpressDB (FANTOM4-EEDB) | Provides information on co-expression networks between expressed components of mammalian genomes |
| FANTOM CAT (FANTOM5) | Provides atlases of functional parts of mammalian genomes such as promoters, enhancers, lncRNAs and miRNAs, together with metadata |
FIGURE 1 (A) Schematic representation of the primate species tree used for the analyses, based on 10kTrees (Arnold et al., 2010), with the analyzed branches (human and great apes) shown in red. Branch lengths do not represent evolutionary distances. The number of GRFs per genome included in the analyses, after filtering out read-throughs and recent duplications, is given in brackets after the species abbreviation. Species abbreviations: Ogar, Otolemur garnetti; Psim, Prolemur simus; Pcoq, Propithecus coquereli; Mmur, Microcebus murinus; Tsyr, Tarsius syrichta; Sbbo, Saimiri boliviensis; Ccap, Cebus capucinus; Cjac, Callithrix jacchus; Anan, Aotus nancymaae; Ppyg, Pongo abelii; Ggor, Gorilla gorilla; Ptro, Pan troglodytes; Ppan, Pan paniscus; Hsap, Homo sapiens; Nleu, Nomascus leucogenys; Rbie, Rhinopithecus bieti; Rrox, Rhinopithecus roxellana; Ptep, Piliocolobus tephrosceles; Capa, Colobus angolensis palliatus; Mmul, Macaca mulatta; Mfas, Macaca fascicularis; Mnem, Macaca nemestrina; Panu, Papio anubis; Tgel, Theropithecus gelada; Mleu, Mandrillus leucophaeus; Caty, Cercocebus atys; Csab, Chlorocebus sabaeus. (B) Distribution of the number of GRFs having a particular number of orthologs.
FIGURE 2 The 10 most abundant families of great ape GRFs. Within the C2H2-ZNF family, the number of KRAB-ZNF genes is marked with a black bar. Species abbreviations are the same as in Figure 1A.
FIGURE 3 Venn diagrams of PAML (orange) and HyPhy (green) results and their overlap (yellow). The names of the candidate genes and the number of positively selected sites are indicated. Upper row: branch-site selected GRFs (left) and total PSS (right) for the human branch; lower row: the same for the great ape branch.
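The overlap shown in Figure 3 corresponds to keeping only genes and sites reported by both tests. A minimal Python sketch of that filtering step follows. The function name `consensus_candidates`, the gene labels and the site positions are hypothetical placeholders for illustration, not values or code from the study.

```python
def consensus_candidates(paml_sites, hyphy_sites):
    """Keep genes reported by both tests and, within them, only shared sites.

    Both arguments map a gene name to the site positions flagged by one test.
    A gene found by both tests but with no shared site maps to an empty list.
    """
    consensus = {}
    for gene in sorted(paml_sites.keys() & hyphy_sites.keys()):
        shared = set(paml_sites[gene]) & set(hyphy_sites[gene])
        consensus[gene] = sorted(shared)
    return consensus


# Hypothetical inputs: placeholder gene labels and positions, not study data.
paml_sites = {"GENE_A": [10, 25, 40], "GENE_B": [7], "GENE_C": [3, 9]}
hyphy_sites = {"GENE_A": [25, 40, 51], "GENE_C": [12], "GENE_D": [2]}

result = consensus_candidates(paml_sites, hyphy_sites)
print(result)
```

In this sketch the gene-level and site-level overlaps are kept separate, mirroring the two Venn diagrams per branch: a gene can be supported by both tests even when the individual sites they flag do not coincide.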
The 11 positively selected sites (PSS; codons) within the three genes showing positive selection on the human branch, as detected by BEB and MEME, along with the respective nucleotide and amino acid (in brackets) changes.
| Position | Nucleotide (amino acid) change | SNP ID | Assessment |
| 726 | AGT (S) > AGA (R) | / | True PSS |
| 728 | GGC (G) > GAC (D) | / | True PSS |
| 155 | CCT (P) > TCT (S) | / | True PSS |
| 573 | ACA (T) > ATA (I) | rs199686868 | True PSS |
| 591 | CGG/CAG/GTT (R/Q/V) > TGG (W) | rs200381384 | True PSS |
| 629 | ACA (T) > AGA (R) | rs112192848 | True PSS |
| 657 | ACA (T) > AGA (R) | rs112679149 | True PSS |
| 681 | AG[A/T] (R/S) > ACT (T) | rs6875787 | Minor allele |
| 737 | TGT/ATT (C/I) > AGA (R) | / | True PSS |
| 219 | CAA (Q) > CTA (L) | / | True PSS |
| 348 | GAC (D) > GAA (E) | rs13064905 | False positive |
| 464 | [A/C]GT (S/R) > CAT (H) | rs1808125 | False positive |
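The amino acid annotations in the table follow from the standard genetic code, so they can be checked mechanically. The Python sketch below builds the standard codon table and formats a codon change in the same notation as the table. The helper name `describe_change` is an assumption made for illustration, and only unambiguous single-codon rows from the table are used as examples.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (third base varying fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def describe_change(ancestral, derived):
    """Format a codon change as 'XXX (a) > YYY (b)' (hypothetical helper)."""
    return (
        f"{ancestral} ({CODON_TABLE[ancestral]}) > "
        f"{derived} ({CODON_TABLE[derived]})"
    )


# Codon pairs copied from unambiguous rows of the table above.
changes = [("AGT", "AGA"), ("GGC", "GAC"), ("CCT", "TCT"),
           ("ACA", "ATA"), ("CAA", "CTA"), ("GAC", "GAA")]
for ancestral, derived in changes:
    print(describe_change(ancestral, derived))
```

Rows with several ancestral codons (for example position 591) would need each alternative translated separately; the sketch leaves that out to stay short.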
FIGURE 4 Domain structures of (A) PRDM9 and (B) ZNF860. Green boxes indicate ZNF domains. The pie charts represent frequencies of the positively selected (yellow) and ancestral (purple) variants for the respective positions within the proteins.