José Aguilar-Rodríguez, Leto Peel, Massimo Stella, Andreas Wagner, Joshua L Payne.
Abstract
Recent advances in high-throughput technologies are bringing the study of empirical genotype-phenotype (GP) maps to the fore. Here, we use data from protein-binding microarrays to study an empirical GP map of transcription factor (TF)-binding preferences. In this map, each genotype is a DNA sequence. The phenotype of this DNA sequence is its ability to bind one or more TFs. We study this GP map using genotype networks, in which nodes represent genotypes with the same phenotype, and edges connect nodes if their genotypes differ by a single small mutation. We describe the structure and arrangement of genotype networks within the space of all possible binding sites for 525 TFs from three eukaryotic species encompassing three kingdoms of life (animal, plant, and fungi). We thus provide a high-resolution depiction of the architecture of an empirical GP map. Among a number of findings, we show that these genotype networks are "small-world" and assortative, and that they ubiquitously overlap and interface with one another. We also use polymorphism data from Arabidopsis thaliana to show how genotype network structure influences the evolution of TF-binding sites in vivo. We discuss our findings in the context of regulatory evolution.
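The genotype-network construction described in the abstract can be illustrated with a minimal sketch. It assumes that a "single small mutation" means one point substitution between equal-length sites, which is a simplification made here for illustration; the toy sequences and the function names (`genotype_network`, `dominant_component`) are hypothetical and not taken from the study.

```python
from itertools import combinations


def hamming_one(a: str, b: str) -> bool:
    """True if two equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def genotype_network(bound_sites):
    """Adjacency map: each bound site is a node; edges link sites that are
    one point substitution apart (an assumed mutation model)."""
    sites = sorted(set(bound_sites))
    adj = {s: set() for s in sites}
    for a, b in combinations(sites, 2):
        if hamming_one(a, b):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def dominant_component(adj):
    """Largest connected component, found by depth-first search."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    stack.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


# Hypothetical toy set of sites bound by one TF (not real data).
sites = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "TTTTTTTT"]
network = genotype_network(sites)
dominant = dominant_component(network)
```

In this toy case the first three sites form one connected network and the fourth is isolated, so the dominant network has three nodes.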
Keywords: Transcription factors; evolvability; molecular evolution; mutations
Year: 2018 PMID: 29676774 PMCID: PMC6055911 DOI: 10.1111/evo.13487
Source DB: PubMed Journal: Evolution ISSN: 0014-3820 Impact factor: 3.694
Data analyzed in this study
| Species | Number of TFs | Number of DNA‐binding domains |
|---|---|---|
| | 217 | 25 |
| | 118 | 16 |
| M. musculus | 190 | 25 |
Figure 1. Intranetwork statistics for 190 TFs from M. musculus. The distributions of genotype network (A) diameter, (B) characteristic path length, (C) clustering coefficient, and (D) assortativity. (E) Assortativity (vertical axis) and its relationship to the number of genotypes in the dominant genotype network (horizontal axis). The horizontal dashed line indicates an uncorrelated (nonassortative) mixing pattern. (F) The distribution of the genotype network route factor.
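As a rough illustration of four of the statistics named in this caption, the sketch below computes diameter, characteristic path length, clustering coefficient, and degree assortativity using their standard graph-theoretic definitions. It assumes a connected, undirected network stored as an adjacency map; the toy graph and function names are hypothetical, and the route factor is left out because its definition is not given here.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def path_statistics(adj):
    """Return (diameter, characteristic path length) of a connected graph."""
    lengths = []
    for source in adj:
        dist = bfs_distances(adj, source)
        lengths.extend(d for node, d in dist.items() if node != source)
    return max(lengths), sum(lengths) / len(lengths)


def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for node, nbs in adj.items():
        k = len(nbs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def degree_assortativity(adj):
    """Pearson correlation between the degrees at the two ends of each edge."""
    xs, ys = [], []
    for node, nbs in adj.items():
        for nb in nbs:  # each edge is visited once in each direction
            xs.append(len(adj[node]))
            ys.append(len(adj[nb]))
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else float("nan")


# Hypothetical toy graph: a triangle (a, b, c) with a tail node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
diameter, cpl = path_statistics(toy)
cc = clustering_coefficient(toy)
r = degree_assortativity(toy)
```

A positive `r` indicates assortative mixing (nodes link to nodes of similar degree), zero corresponds to the uncorrelated pattern marked by the dashed line in panel E, and the toy graph here happens to be disassortative.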
Figure 2. The structural properties of genotype networks are indicative of binding site diversity in extant populations of A. thaliana. Shannon's diversity of a TF's polymorphic binding sites is shown in relation to (A) the number of nodes, (B) characteristic path length, and (C) route factor of its genotype network. The label of the y‐axis applies to all panels.
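Shannon's diversity, as used in this caption, can be sketched as the entropy of the frequencies of distinct binding-site variants. The natural logarithm and the toy sample below are assumptions made for illustration; the base used in the study is not stated here.

```python
from collections import Counter
from math import log


def shannon_diversity(observed_sites):
    """Shannon entropy H = -sum(p_i * ln p_i) over distinct site variants."""
    counts = Counter(observed_sites)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())


# Hypothetical sample: two variants, each observed twice (not real data).
sample = ["ACGTACGT"] * 2 + ["ACGTACGA"] * 2
h = shannon_diversity(sample)
```

A population carrying a single variant has zero diversity, and diversity grows as variants become more numerous and more evenly represented.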
Figure 3. Matrices of internetwork relationships for the genotype networks of TF binding sites from M. musculus. Heatmaps of log10‐transformed (A) overlap and (B) the probability of mutating from the genotype network of phenotype p to the genotype network of phenotype q. The rows and columns are grouped according to binding domain, which are ordered alphabetically on the horizontal axis: A, AP‐2; B, ARID/BRIGHT; C, AT hook; D, bHLH; E, bZIP; F, C2H2 ZF; G, CxxC; H, E2F; I, Ets; J, Forkhead; K, GATA; L, GCM; M, Homeodomain; N, Homeodomain + POU; O, IRF; P, MADS box; Q, Myb/SANT; R, Ndt80/PhoG; S, Nuclear receptor; T, RFX; U, SAND; V, SMAD; W, Sox; X, T‐Box; Y, TBP. Within the DNA‐binding domain groups, the rows and columns are ordered by the size of each TF's dominant genotype network, such that network size increases from top to bottom and from left to right. Labels on the vertical axis indicate the names of the TFs, which can be read on the computer by zooming in. Cells colored in gray indicate either N/A values (on the diagonal) or values equal to zero (off‐diagonal).
Figure 4. Phenotype space covering. (A) The proportion of phenotypes covered as a function of the mutational radius n from a given binding site, averaged across all binding sites of the murine TF Sp110. The maximum proportion of phenotypes covered plateaus at a much lower level when considering just neutral mutations than when considering non‐neutral mutations. Error bars are the standard deviations of the mean. (B) The maximum proportion of phenotypes covered by neutral mutations as a function of the number of binding sites in the dominant genotype network, for all 190 murine TFs. The black line shows the fitted linear regression to the data and the shaded gray area denotes 95% confidence intervals. The figure also shows Spearman's correlation and its associated P‐value.
Figure 5. Matrices of internetwork relationships for the genotype networks of binding domains from M. musculus. Heatmaps of log10‐transformed (A) overlap and (B) the probability of mutating from the genotype network of phenotype p to the genotype network of phenotype q. Each row and column represents a different genotype network. Domains are ordered alphabetically. Cells colored in gray indicate either N/A values (on the diagonal) or values equal to zero (off‐diagonal).
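The two internetwork quantities shown in these heatmaps can be illustrated with a minimal sketch. The exact definitions are not given in this record, so the code assumes (for illustration only) that overlap is the fraction of genotypes in network p that also belong to network q, and that the mutation probability is the fraction of all point-substitution neighbours of p's genotypes that belong to q. All function names and sequences are hypothetical.

```python
def point_mutants(site, alphabet="ACGT"):
    """All sequences one substitution away from `site`."""
    return {site[:i] + base + site[i + 1:]
            for i in range(len(site))
            for base in alphabet if base != site[i]}


def overlap(network_p, network_q):
    """Assumed definition: fraction of p's genotypes that also occur in q."""
    return len(network_p & network_q) / len(network_p)


def mutation_probability(network_p, network_q):
    """Assumed definition: fraction of one-mutant neighbours of p's
    genotypes that fall inside network q."""
    hits = total = 0
    for site in network_p:
        for mutant in point_mutants(site):
            total += 1
            hits += mutant in network_q
    return hits / total


# Hypothetical toy networks over 2-mers (not real data).
p = {"AA", "AC"}
q = {"AC", "CC", "CG"}
overlap_pq, overlap_qp = overlap(p, q), overlap(q, p)
phi_pq, phi_qp = mutation_probability(p, q), mutation_probability(q, p)
```

Under these assumed definitions both quantities are asymmetric in p and q, which is consistent with the heatmaps being full matrices rather than triangular ones.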