Audrey Defosset, Arnaud Kress, Yannis Nevers, Raymond Ripp, Julie D Thompson, Olivier Poch, Odile Lecompte.
Abstract
In the multiomics era, comparative genomics studies based on gene repertoire comparison are increasingly used to investigate evolutionary histories of species, to study genotype-phenotype relations, species adaptation to various environments, or to predict gene function using phylogenetic profiling. However, comparisons of orthologs have highlighted the prevalence of sequence plasticity among species, showing the benefits of combining protein and subprotein levels of analysis to allow for a more comprehensive study of genotype/phenotype correlations. In this article, we introduce a new approach called BLUR (BLAST Unexpected Ranking), capable of detecting genotype divergence or specialization between two related clades at different levels: gain/loss of proteins but also of subprotein regions. These regions can correspond to known domains, uncharacterized regions, or even small motifs. Our method was created to allow two types of research strategies: 1) the comparison of two groups of species with no previous knowledge, with the aim of predicting phenotype differences or specializations between close species or 2) the study of specific phenotypes by comparing species that present the phenotype of interest with species that do not. We designed a website to facilitate the use of BLUR with a possibility of in-depth analysis of the results with various tools, such as functional enrichments, protein-protein interaction networks, and multiple sequence alignments. We applied our method to the study of two different biological pathways and to the comparison of several groups of close species, all with very promising results. BLUR is freely available at http://lbgi.fr/blur/.
Keywords: comparative genomics; evolution; genotype/phenotype relations; sequence analysis
Year: 2021 PMID: 33211099 PMCID: PMC7851591 DOI: 10.1093/gbe/evaa248
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Schematic representation of the proposed approach. (A) The relative conservations for two proteins (1 and 2) in 13 different species. Colored blocks represent conserved sequence regions (blocks). A variation of hue between two blocks of the same color indicates a small divergence in sequence. Protein 1 shows expected taxonomic variations. For protein 2, the orange block is missing in species F and G. (B) The BLAST results for proteins 1 and 2 using Species A as the query. In the Protein 1 BLAST, the species F, G and H, I, J are ranked together, since their respective sequences are similar. In the Protein 2 BLAST, Species H, I, and J are ranked similarly to Protein 1, whereas species F and G are ranked further down due to the missing orange block.
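The ranking effect described in panel B can be illustrated with a minimal sketch. It assumes that BLAST hits are ordered by a single score per species (higher is better); the function name `mean_rank`, the species labels, and the score values are invented for illustration and do not come from BLUR itself.

```python
def mean_rank(hits, group):
    """Mean 1-based rank of the species in `group`, with hits sorted by descending score."""
    ordered = sorted(hits, key=hits.get, reverse=True)
    ranks = [i for i, species in enumerate(ordered, start=1) if species in group]
    return sum(ranks) / len(ranks)

# Invented scores: protein 1 follows the expected taxonomic order,
# protein 2 lacks a conserved block in species F and G.
protein1 = {"B": 950, "C": 900, "F": 700, "G": 690, "H": 600, "I": 590, "J": 580}
protein2 = {"B": 950, "C": 900, "F": 300, "G": 290, "H": 600, "I": 590, "J": 580}

group_fg = {"F", "G"}
# A positive shift means F and G rank further down for protein 2 than for protein 1.
rank_shift = mean_rank(protein2, group_fg) - mean_rank(protein1, group_fg)
```

With these toy values, species F and G move from ranks 3 and 4 to ranks 6 and 7, while H, I, and J keep their relative order.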
Fig. 2. Schematic representation of the BLUR protocol. A reference proteome is compared with a proteome database with BlastP, and the results are stored in a database (not shown here). For the user-selected groups 1 and 2, BLUR establishes the relative conservation of both groups for each protein using three criteria: ratio of mean E-value in log space, difference of mean distance to the query, and ranking of one group compared with the other. The relative conservation is then analyzed at the whole-proteome level, and outliers are detected using Tukey’s fences method and classified into priority lists.
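The outlier step named in this caption can be sketched as follows. This is a simplified illustration, not BLUR's implementation: it assumes a single per-protein score (a difference of mean log10 E-values between the two groups, standing in for the three criteria), the conventional Tukey multiplier k = 1.5, and a clamp for E-values reported as zero. All function names and parameters here are hypothetical.

```python
import math
from statistics import quantiles

def mean_log_evalue(evalues, floor=1e-200):
    """Mean log10 E-value; values of 0 are clamped to `floor` (an assumption)."""
    return sum(math.log10(max(e, floor)) for e in evalues) / len(evalues)

def conservation_score(group1_evalues, group2_evalues):
    """Hypothetical score: positive when group 2 hits are weaker than group 1 hits."""
    return mean_log_evalue(group2_evalues) - mean_log_evalue(group1_evalues)

def tukey_outliers(scores, k=1.5):
    """Return the proteins whose score lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(list(scores.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {name for name, score in scores.items() if score < low or score > high}
```

Because the fences are computed from the distribution of scores across the whole proteome, a protein is flagged only when its relative conservation departs from what is typical for the two groups being compared.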
Query Species Available in BLUR for Each of the Three Life Domains, with the Number of Proteins in the Proteome Used
| Domain | Query Species (Taxonomy ID) | Number of Proteins | Life Group (number of species) |
|---|---|---|---|
| Eukaryota | | 21,044 | Opisthokonta (557)/Metazoa (169) |
| | | 22,298 | |
| | | 24,125 | Metazoa (169) |
| | | 13,780 | |
| | | 19,990 | |
| | | 6,049 | Fungi (384) |
| | | 5,142 | |
| | | 6,601 | |
| | | 27,619 | Viridiplantae (73) |
| | | 14,266 | |
| | | 4,995 | Eukaryota (734) |
| | | 5,340 | |
| | | 12,731 | |
| | | 8,031 | |
| | | 15,903 | |
| Bacteria | | 1,852 | Bacteria (3,846) |
| | | 4,260 | |
| | | 8,038 | |
| | | 1,027 | |
| | | 895 | |
| | | 4,347 | |
| | | 4,782 | |
| | | 1,553 | |
| Archaea | | 536 | Archaea (179) |
| | | 1,788 | |
| | | 2,938 | |
| | | 3,208 | |
Note.—The last column indicates in which life group the query species can be used, as well as the number of species in the group.
Fig. 3. Home page of the BLUR website with the different steps necessary to run BLUR. Step 1 allows the user to select one of the three life domains (Eukaryota, Bacteria, Archaea), then the query species used for the BLAST search, as well as the life group to study. Step 2 allows the user to select the first group of interest, which can be a clade, several species, or several clades, but must be in the life group selected in Step 1. Finally, Step 3 consists in the selection of the second group to be compared, which can either be chosen by the user, or automatically using taxonomy. The last step is the selection of the type of relations to use for the BLAST computation: orthology (default) or homology. The user can also restore a previous session using a session ID provided on the result page.
Fig. 4. Main interaction network of proteins absent in Dikarya (blue nodes), and proteins predicted to have differential conservation with High priority (orange nodes) or Mid priority (green nodes). The network contains highly linked clusters of proteins that are both absent and divergent in Dikarya, and that are enriched in GO terms corresponding to ciliary components, thus validating the proposed method.
Fig. 5. Multiple sequence alignment of ARMC4. (A) Overview of the multiple sequence alignment of ARMC4. Vertebrates (ciliated species) and ciliated Fungi sequences are similar with a long N-terminal domain that is absent in nonciliated Fungi. (B) Zoom on a portion of the alignment where differential conservation can be observed. Ciliated Fungi are very similar to Vertebrates, whereas other, nonciliated Fungi are more divergent.
Fig. 6. Interaction networks of proteins absent in Enterobacterales (blue nodes), of High priority (orange nodes), and of Mid priority (green nodes). Several clusters contained over ten proteins with high confidence links between them, including a cluster containing the main Sox proteins, and a cluster corresponding to the iron–sulfur proteins found in the hdr cluster.
Examples of Application of BLUR Using Various Query Species and Groups of Interest
| Query species | Comparison | Protein lists | GO enrichment | Network | Network enrichment |
|---|---|---|---|---|---|
| | Basidiomycota over Ascomycota | 469 absent in Ascomycota, 32 High priority, 112 Mid priority | RNA processing ( | Main network of 208 proteins: 140 absent, 14 High priority, 54 Mid priority | Several clusters: mRNA splicing; ribosome biogenesis; regulation of signal transduction |
| | Lophotrochozoa over Ecdysozoa | 775 absent in Ecdysozoa, 23 High priority, 105 Mid priority | Nervous system process ( | 224 proteins with at least one interaction: 177 absent, 10 High priority, 37 Mid priority | Several small networks: steroid biosynthetic process; regulation of apoptotic process; cilium assembly; cell cycle |
| | Liliopsida over Eudicotyledons | 107 absent in Eudicotyledons, 18 High priority, 81 Mid priority | Photosynthesis ( | 44 proteins with at least one interaction: 15 absent, 7 High priority, 22 Mid priority | Photosynthesis |
| | Betaproteobacteria over Alphaproteobacteria | 252 absent in Alphaproteobacteria, 5 High priority, 28 Mid priority | Pilus organization ( | Main network of 91 proteins: 77 absent, 2 High priority, 12 Mid priority | Several clusters: cell motility; pilus organization; asexual reproduction |
| | Selenomonadales over Veillonellales | 635 absent in Veillonellales, 23 High priority, 34 Mid priority | Locomotion ( | Main network of 401 proteins: 364 absent, 18 High priority, 19 Mid priority | Several clusters: spore germination; locomotion; antibiotic metabolic process |
Fig. 7. Examples of differential conservation detected by BLUR. The comparison was done between two groups of Actinopterygii: Otomorpha (above the red line) and Euteleosteomorpha (below the red line). Homo sapiens was used as the query species, and the multiple sequence alignments contain sequences of mammals. (A) Multiple sequence alignment of CNP. Differential conservation of a large region can be seen in protein sequences of Otomorpha. (B) Multiple sequence alignment of CCDC92. Differential conservation of a small motif can be seen in protein sequences of Otomorpha. (C) Multiple sequence alignment of PDCL. Differential conservation of single amino acids can be observed in protein sequences of Otomorpha.