Xavier Prudent, Genis Parra, Peter Schwede, Juliana G Roscito, Michael Hiller.
Abstract
The growing number of sequenced genomes allows us now to address a key question in genetics and evolutionary biology: which genomic changes underlie particular phenotypic changes between species? Previously, we developed a computational framework called Forward Genomics that associates phenotypic with genomic differences by focusing on phenotypes that are independently lost in different lineages. However, our previous implementation had three main limitations. Here, we present two new Forward Genomics methods that overcome these limitations by (1) directly controlling for phylogenetic relatedness, (2) controlling for differences in evolutionary rates, and (3) computing a statistical significance. We demonstrate on large-scale simulated data and on real data that both new methods substantially improve the sensitivity to detect associations between phenotypic and genomic differences. We applied these new methods to detect genomic differences involved in the loss of vision in the blind mole rat and the cape golden mole, two independent subterranean mammals. Forward Genomics identified several genes that are enriched in functions related to eye development and the perception of light, as well as genes involved in the circadian rhythm. These new Forward Genomics methods represent a significant advance in our ability to discover the genomic basis underlying phenotypic differences between species. Source code: https://github.com/hillerlab/ForwardGenomics/
Keywords: evolutionary and comparative genomics; gene loss; phenotype–genotype associations
Year: 2016 PMID: 27222536 PMCID: PMC4948712 DOI: 10.1093/molbev/msw098
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Overview of three Forward Genomics methods. (A) Global %id values are computed by comparing the reconstructed sequence of the common ancestor of the species of interest (blue circle) to the sequence of an extant species. Local %id values are computed between the sequences at the start and end of each branch, which is either a reconstructed ancestral sequence (blue or green circle) or the sequence of an extant species. The branches in the phylogenetic tree are proportional to the number of substitutions per neutral site. Outgroup species are used to reconstruct the common ancestor. (B) The perfect-match method (Hiller et al. 2012b) assumes that the given phenotypic presence/absence (checkmark/cross) vector includes trait-losses in independent lineages and conducts a genome-wide search for genomic regions where all trait-loss species have a lower global %id value (higher sequence divergence) compared with all trait-preserving species. This is illustrated by a positive grey margin that separates the global %id values of both groups of species. (C) The GLS Forward Genomics method derives a covariance matrix that captures the phylogenetic relatedness between species. As illustrated for the first two species, the covariance between two species is the summed length of the branches that are shared between both species (highlighted in orange). The variance of a species is the summed length of all branches from the common ancestor to this species. Lowercase letters indicate the length of the branches in the phylogenetic tree. A phylogenetic GLS approach (Grafen 1989) is used to compute a linear regression between the transformed normalized global %id values and the phenotypic pattern. The significance of a positive slope of the regression line is used as the significance of the association between phenotype and genotype.
(D) The Branch method uses Dollo parsimony to estimate ancestral phenotypic states given the presence/absence pattern of the trait in the extant species. Each branch is then classified as trait-loss (red) or trait-preserving (blue). Local %id values are normalized by the expected value of a branch of the same length. If a genomic region is involved in the trait-loss, we expect that trait-loss branches are associated with lower normalized local %id values. The significance of a positive Pearson correlation coefficient is used as the significance for the association between phenotype and genotype.
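The covariance rule described for panel (C) can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the tree encoding (a dictionary mapping each node to its parent and branch length), the function names `root_path` and `phylo_covariance`, and the toy topology and branch lengths are all assumptions made for the example.

```python
# Hypothetical toy tree: each node maps to (parent, branch length).
# Topology and lengths are made up for illustration only.
TREE = {
    "anc": ("root", 0.25),
    "A": ("anc", 0.5),
    "B": ("anc", 0.75),
    "C": ("root", 1.0),
}


def root_path(tree, node):
    """Return the branches (named by their child node) between the root and `node`."""
    branches = set()
    while node in tree:  # the root has no entry, so the walk stops there
        branches.add(node)
        node = tree[node][0]
    return branches


def phylo_covariance(tree, species):
    """Covariance = summed length of shared branches; the diagonal holds the variances."""
    paths = {s: root_path(tree, s) for s in species}
    return [
        [sum(tree[br][1] for br in paths[s1] & paths[s2]) for s2 in species]
        for s1 in species
    ]


COV = phylo_covariance(TREE, ["A", "B", "C"])
for row in COV:
    print(row)
```

In this toy tree, species A and B share only the branch leading to their common ancestor, so their covariance equals that branch length, while C shares no branch with either and has zero covariance with both.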
Fig. 2. Performance of the three Forward Genomics methods on simulated data shown by precision-sensitivity plots (right) for three of the 32 trait-loss scenarios (left). Trait losses occurred at the red crosses in the phylogeny, and following trait loss the 210 trait-involved genomic regions evolved neutrally along the parts of the branches shown in red. The red cross in the precision-sensitivity plot for the perfect-match method marks its performance when we consider only genomic regions where all trait-loss species have a lower global %id value compared with all trait-preserving species. The other 29 trait-loss scenarios are shown in supplementary figures 5–33, Supplementary Material online.
Fig. 3. Performance of the three Forward Genomics methods on 32 trait-loss scenarios. (A) The sensitivity at 90% precision is plotted for 32 different trait-loss scenarios. The GLS and branch methods consistently improve the sensitivity compared with the perfect-match method. (B–F) Properties of the trait-loss scenarios and properties of the trait-involved genomic regions influence the performance: (B) age of the trait loss, measured by how long the trait-involved elements evolved neutrally; (C) number of independent trait-losses; (D) evolutionary rate in the trait-loss lineages; (E) length of trait-involved elements; and (F) strength of selection on trait-involved elements in the branches where they evolve under selection. Weak, medium, or strong refers to genomic regions that accept mutations with an average probability of >0.66, 0.33–0.66, or <0.33, respectively.
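One way to read the "sensitivity at 90% precision" metric in panel (A) is sketched below. The source does not spell out the exact procedure, so the function name `sensitivity_at_precision`, the ranking by P-value, the handling of ties (ignored here), and the example values are assumptions for illustration.

```python
def sensitivity_at_precision(scored, min_precision=0.9):
    """Best sensitivity over all rank cutoffs whose precision is >= min_precision.

    `scored` is a list of (p_value, is_trait_involved) pairs; a lower P-value
    ranks higher. Ties are not treated specially in this sketch.
    """
    total_positives = sum(1 for _, involved in scored if involved)
    if total_positives == 0:
        return 0.0
    best = 0.0
    true_positives = 0
    ranked = sorted(scored, key=lambda item: item[0])
    for cutoff, (_, involved) in enumerate(ranked, start=1):
        true_positives += int(involved)
        precision = true_positives / cutoff
        if precision >= min_precision:
            best = max(best, true_positives / total_positives)
    return best


# Made-up example: three trait-involved regions among five candidates.
EXAMPLE = [(0.001, True), (0.002, True), (0.01, False), (0.02, True), (0.5, False)]
RESULT = sensitivity_at_precision(EXAMPLE)
print(RESULT)
```

In the made-up example, only the top two cutoffs keep precision at or above 90%, so two of the three trait-involved regions are recovered.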
Fig. 4. Robustness of the new Forward Genomics methods to uncertainties in the phylogenetic tree. The sensitivity at a precision of 90% of all 32 trait-loss scenarios is shown for (A) the Epitheria and the Exafroplacentalia tree topology (supplementary fig. 34A, Supplementary Material online) and (B) three trees where random noise was added to each branch length (supplementary fig. 34B, Supplementary Material online). Solid lines show the results using the phylogeny that was used to produce the simulated data (reproduced from fig. 3 for comparison). Please note that the perfect-match method considers neither topology nor branch lengths and thus always gives the same results. The number of scenarios where the achieved sensitivity is higher than the sensitivity of the perfect-match method is shown in the legend.
Fig. 5. The GLS and branch methods outperform the perfect-match method on the trait “loss of vitamin C synthesis”. (A) Gulo exons are ranked higher with the GLS and the branch method than with perfect-match. For GLS and the branch method, each conserved coding region was ranked by its P-value. For perfect-match, we used the size of the margin for ranking, which is the difference between the lowest %id value of a trait-preserving species and the highest %id value of a trait-loss species. Gulo exon 2 is ranked first for all three methods. (B) The significance of most Gulo exons increases if the megabat P. vampyrus is excluded from the list of trait-loss species. The trait loss in P. vampyrus happened more recently than in Haplorrhini primates, guinea pig, and the microbat M. lucifugus. We computed the difference in the margin (perfect-match) and in the log P-value (GLS and branch methods) between the screen that used all species that do not synthesize vitamin C and the screen where P. vampyrus was excluded. Positive differences indicate a better match to the trait loss. The significance of Gulo exons 9 and 10 decreases because both exons are deleted in P. vampyrus. Gulo exon 1, which only encodes the start codon, is excluded.
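The margin used to rank regions for the perfect-match method follows directly from its definition in panel (A). The sketch below assumes a hypothetical function name, `perfect_match_margin`, and uses invented species labels and %id values purely as an example.

```python
def perfect_match_margin(percent_id, trait_loss_species):
    """Lowest %id among trait-preserving species minus highest %id among trait-loss species.

    A positive margin means every trait-loss species is more diverged than
    every trait-preserving species for this genomic region.
    """
    loss = [v for s, v in percent_id.items() if s in trait_loss_species]
    preserved = [v for s, v in percent_id.items() if s not in trait_loss_species]
    if not loss or not preserved:
        raise ValueError("need at least one species in each group")
    return min(preserved) - max(loss)


# Invented global %id values for one region (not taken from the paper).
PERCENT_ID = {"loss_1": 62.0, "loss_2": 70.0, "kept_1": 95.0, "kept_2": 93.0}
MARGIN = perfect_match_margin(PERCENT_ID, {"loss_1", "loss_2"})
print(MARGIN)
```

Here the least conserved trait-preserving species (93.0) still exceeds the most conserved trait-loss species (70.0), so the margin is positive and the region would count as a perfect match.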
Fig. 6. The GLS and branch methods detect several conserved coding regions that are diverged in two blind mammals, the blind mole rat and the cape golden mole. Manhattan plots show the genomic location of 184,412 conserved coding regions and their associated P-values computed by the GLS (A) and branch method (B). All conserved coding regions that correspond to exons of the genes with a function in eye development and perception of light (supplementary table 2, Supplementary Material online) are shown in red.
Functional Enrichments of the 208 Genes for Which the GLS and Branch Method Detected Increased Divergence in Blind Mammals.
| Ontology | Adjusted P-value | Genes |
|---|---|---|
| Sensory perception of light stimulus (GO:0050953) | 2.3E−09 | CRYBB1;ABCA4;CRYBB3;CRYBA1;KRT12;CRYBB2;CRYBA4;CACNA1F;ARR3;GUCY2F;USH2A;GABRR2;BFSP2;OPN1MW;RDH5;GJA8;RGR;IMPG1 |
| Visual perception (GO:0007601) | 2.3E−09 | |
| Sensory perception (GO:0007600) | 8.8E−05 | CRYBB1;ABCA4;CRYBB3;CRYBA1;CRYBB2;KRT12;CRYBA4;TAAR3;ARR3;CACNA1F;GUCY2F;USH2A;GABRR2;BFSP2;OPN1MW;RDH5;GJA8;RGR;IMPG1 |
| Lens development in camera-type eye (GO:0002088) | 0.008 | LIM2;CRYBA2;CRYBA1;GJA8;GJE1 |
| Detection of light stimulus (GO:0009583) | 0.011 | OPN1MW;GJA10;ABCA4;RDH5;CACNA2D4;CACNA1F;RGR;GUCY2F |
| Detection of visible light (GO:0009584) | 0.026 | OPN1MW;GJA10;ABCA4;RDH5;CACNA2D4;CACNA1F;GUCY2F |
| Structural constituent of eye lens (GO:0005212) | 1.5E−09 | LIM2;BFSP2;BFSP1;CRYBB1;CRYBB3;CRYBA2;CRYBB2;CRYBA1;CRYBA4 |
| MP0005551_abnormal_eye_electrophysiology | 7.8E−08 | ABCA4;CACNA2D4;CACNA1F;ARR3;USH2A;GUCY2F;GABRR1;GJA10;RDH5;SLC16A8;GJA8;RGS11;RGR |
| MP0002697_abnormal_eye_size | 0.020 | LIM2;HECTD1;HSF4;CRYBA1;CRYBB2;GJA8;GJE1 |
| MP0005193_abnormal_anterior_eye | 0.018 | LIM2;BFSP2;BFSP1;HSF4;KRT12;CRYBB2;CRYBA1;GJA8;LYST;GJE1 |
| MP0003787_abnormal_imprinting | 0.032 | SNRPN;ARID4A;ARID4B |
| MP0008877_abnormal_DNA_methylation | 0.040 | |
| MP0005253_abnormal_eye_physiology | 0.040 | BFSP2;ABCA4;RDH5;GJA8;RGR |
| Zonular cataract (HP:0010920) | 2.6E−07 | BFSP1;CRYBB1;HSF4;CRYBB3;CRYBA1;CRYBB2;CRYBA4;GJA8 |
| Corneal dystrophy (HP:0001131) | 0.007 | OPN1MW;CRYBB1;CRYBB2;KRT12;CRYBA4;GJA8 |
| Nuclear cataract (HP:0100018) | 0.004 | CRYBB1;HSF4;CRYBB3;GJA8 |
| Cataract | 4.6E−09 | LIM2;BFSP2;BFSP1;CRYBB1;CRYBB3;HSF4;CRYBA1;CRYBB2;CRYBA4;GJA8 |
| Retina | 0.009 | CH25H;ABCA4;RDH5;CRYBB2;SLC16A8;RGR;ARR3;IMPG1;TNS1 |
| Lens | 0.044 | TKTL1;LIM2;UHRF2;CRYBB1;CYP3A44;CRYBB3;CRYBA2;CRYBA1;CRYBB2;CRYBA4;GJE1;BFSP2;BFSP1;EWSR1;GJA10;CAPRIN2;HSF4;PCNX;GJA8;BIRC7;AGFG1 |
| Retina | 0.044 | PPM1N;ABCA4;BRAF;CACNA2D4;ARR3;USH2A;GABRR2;GABRR1;PIK3CA;OPN1MW;BC030499;UBN2;FBXL5;IMPG1;DRD4;SERINC4 |
| Crystallin | 5.8E−05 | CRYBB1;CRYBA2;CRYBA1;CRYBB2;CRYBA4 |
Enrichments were computed by Enrichr (Chen et al. 2013).