Richard J Nuckels, Chris C Nice, Dana M García.
Abstract
We analyzed evolutionary rates of conserved, duplicated myosin V (myo5) genes in nine teleost species to examine the outcomes of duplication events. Syntenic analysis and ancestral chromosome mapping suggest that one tandem gene duplication event leading to the appearance of myo5a and myo5c, two rounds of whole genome duplication for vertebrates, and an additional round of whole genome duplication for teleosts account for the presence and location of the myo5 genes and their duplicates in teleosts and other vertebrates, as well as for the timing of the duplication events. Phylogenetic analyses reveal a previously unidentified myo5 clade that we now refer to as myo5bb. Analysis using dN/dS rate comparisons revealed large regions within duplicated myo5 genes that are highly conserved. Codons identified in other studies as encoding functionally important portions of the Myo5a and Myo5b proteins are shown to be highly conserved within the newly identified myo5bb clade and in other myo5 duplicates. As many as 30% of the 319 codons encoding the cargo-binding domain in the myo5aa genes are conserved in all three codon positions in nine teleost species. For the myo5bb cargo-binding domain, 6.6% of 336 codons have zero substitutions in all nine teleost species. Using molecular evolution assays, we identify the myo5bb branch as being subject to evolutionary rate variation, with the cargo-binding domain having 20% of its sites under positive selection and the motor domain having 8% of its sites under positive selection. The high number of invariant codons coupled with relatively high dN/dS values in the region of the myo5 genes encoding the ATP-binding domain suggests the encoded proteins retain function and may have acquired novel functions associated with changes to the cargo-binding domain.
Year: 2019 PMID: 30496538 PMCID: PMC6372264 DOI: 10.1093/gbe/evy258
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Cladogram and syntenic diagram supporting key hypothesized evolutionary events in the history of myo5 genes in teleosts and other chordates. Putative vertebrate genome duplication events (R1 and R2) led to the creation of three myo5 copies, myo5a (green rectangle), myo5ba (brown oval), and myo5bb (blue triangle) in jawed vertebrates. The other copy that should have been created from two rounds of whole genome duplication likely became a pseudogene. The timing of the second genome duplication (R2*) of lamprey has been debated (see text for details). Each shape represents a gene, and the numbers under the teleost and mammal genes are listed below with the corresponding gene name. Shapes that are bordered and unshaded are identified as being orthologous and paralogous. Shaded shapes show orthologous relationships. Gene 1 is cyp19a1 and is shown as a pink rectangle. It is found near myo5aa and myo5ab/myo5c in teleosts and near myo5a/myo5c in spotted gar, mammals, and chicken. Gene 2 is mapk6 and is shown as a black-bordered, unshaded rectangle. It is found near myo5aa in teleosts and near myo5a/myo5c in spotted gar, mammals, chicken, duck, turtle, Xenopus, and shark. We found other mapk genes near the myo5ba (unshaded, black-bordered oval) and myo5bb (unshaded, black triangle) gene families. We propose a tandem gene duplication event (TGD) that occurred before the divergence of jawed vertebrates and led to the formation of myo5a and myo5c as neighboring genes (green and gray rectangles in box). The TGD could have taken place before or after R2. The third whole genome duplication specific to teleosts (R3) led to the formation of myo5aa and myo5ab with a subsequent loss of a duplicated myo5c next to teleost myo5aa. The chromosomal locations for these genes on zebrafish are as follows: myo5aa chromosome 18, myo5ab chromosome 25, and myo5c chromosome 25 directly downstream of myo5ab. 
The location of myo5c directly downstream from myo5ab was observed in all teleosts examined. Likewise, the location for myo5c is directly downstream of myo5a in nonteleost vertebrates. Similar syntenic observations were made for other teleosts and nonteleosts, supporting the inference that myo5a and myo5c are tandem duplicates. Gene names corresponding with numbers listed under teleost and mammal genes are as follows: 1a-cyp19a1, 1b-cyp19b, 2a-mapk6, 2b-map2ka, 2c-mapk4, 3-gnb5, 4a-arpp19a, 4b-arpp19b, 5aa-myo5aa, 5ab-myo5ab, 5ba-myo5ba, 5bb-myo5bb, 5c-myo5c, 6-fam214a, 7a-onecut1, 7b-onecut3, 8-ap4e1, 9-rsl24d1, 10-prtgb, 11-pvrl1a, 12-chek1, 13-cfap53, 14-il7r, 15-capslb, 16-lmbrd2, 17-btbd2, 18-hmg20b, 19-unk13a, 20-mbd3b, 21-tcf3, 22-zbtb7, 23-atp8b2, 24-skai, 25-acaa2, 26-mex3d, 27-ensab, and 28-pigo. See supplementary table 1, Supplementary Material online for myo5 gene identifiers and chromosomal locations.
Fig. 2.—Ten ancestral vertebrate proto-chromosomes have been previously described along with thirteen ancestral teleost chromosomes (Nakatani et al. 2007; Bian et al. 2016). All myo5 genes were traced back to ancestral vertebrate chromosome A (Panel A). After two rounds of whole genome duplication and a fission event, six chromosomal fragments (A0–A5) existed. myo5 genes and select coduplicated genes are shown in the boxed region along with the ancestral chromosome fragment from which these genes are derived. The chromosomal location of the genes in various organisms is listed at the bottom of the boxed regions. “Hs” for H. sapiens, “Gg” for G. gallus, “Sp. Gar” for spotted gar, “Anc. Teleost” for ancestral teleost, “Ch” for chromosome, “LG” for linkage group. In panel B, ancestral teleost chromosomes are shown along with the three chromosomes that gave rise to the myo5 genes in teleosts (“Dr” for D. rerio, “Ol” for O. latipes).
Fig. 4.—Alignment of myo5 sequences used in this study. Panel A shows the full-length alignment size using 87 species along with the three smaller subsets (motor domain, neck, and CBD) that were used for further characterization of the myo5 gene family. Panel B shows the smaller sequences found among lamprey and Ciona intestinalis. Panel C shows the sequences that are full length for the provided species or group of species with the first number in parentheses showing the number of full-length sequences available for that species or group of species and the second number showing the total number of myo5 sequences that have been found for that species or group of species. Panel D shows which sequences out of our total number of 87 sequences are truncated or missing some part of the full-length sequence.
Teleost dN/dS Values
| Clade | Whole Gene (1,915 Codons) | 5′ End (217 Codons) | ATP Binding (21 Codons) | Neck (742 Codons) | Actin-Binding (23 Codons) | 3′ End (319 Codons) | Cargo-Binding Domain (10 Codons) |
|---|---|---|---|---|---|---|---|
| myo5aa | 0.27 | 0.05 | 0.05 | 0.23 | 0.23 | 0.10 | 0.13 |
| myo5ab | 0.36 | 0.12 | 0.02 | 0.41 | 0.14 | 0.35 | 0.23 |
| myo5ba | 0.26 | 0.06 | 0.01 | 0.32 | 0.07 | 0.19 | 0.00 |
| myo5bb | 0.41 | 0.08 | 0.02 | 0.39 | 0.05 | 0.32 | 0.25 |
| myo5c | 0.26 | 0.07 | 0.00 | 0.27 | 0.04 | 0.26 | **** |
| Average | 0.31 | 0.08 | 0.02 | 0.32 | 0.11 | 0.24 | 0.14 |
Note.—Regions that play a role in functionality (motor domain and CBD) have very low dN/dS values. dN/dS values are reflective of codon changes that lead to synonymous (S) or nonsynonymous (N) codons. dN/dS values are presented for each clade for whole duplicated genes composed of 1,915 codons. dN/dS values are also presented for smaller regions for each clade which contain the ATP-binding domain, the actin-binding domain and neck region, and the CBD. For the smaller subsets of codons encoding the ATP-binding domain (21 codons), four of the five myo5 genes in teleosts show higher levels of conservation than the larger 5′ region; whereas the myo5aa clade has the same dN/dS value for the smaller subset of codons. For the smaller subset of codons related to actin-binding (23 codons) there is strong conservation for both myo5b duplicates and for myo5c. For the smaller subset of codons encoding the CBD (10 codons), we see strong conservation for the myo5ba duplicate, suggesting the protein encoded likely binds to Rab11a, and the Myo5bb duplicate may bind to other cargo. There is not a value listed for myo5c and the smaller subset of 10 codons (****) in the CBD as it is unknown what amino acids are involved in this process for the orthologous myo5c in human.
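As a reading aid for the values above, the conventional interpretation of a dN/dS ratio can be sketched in Python. This is an illustration only and not the authors' analysis: the function name `interpret_dnds` and the tolerance band around 1 are assumptions, and the inputs are the values from the Average row of the table, with region labels paraphrased from its column headers.

```python
def interpret_dnds(omega: float, tol: float = 0.05) -> str:
    """Conventional reading of a dN/dS ratio (omega).

    omega < 1 suggests purifying selection, omega near 1 suggests neutral
    evolution, and omega > 1 suggests positive (diversifying) selection.
    `tol` is an arbitrary band around 1 chosen only for this illustration.
    """
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "positive"
    return "neutral"


# Average dN/dS per region, copied from the Average row of the table.
average_dnds = {
    "whole gene (1,915 codons)": 0.31,
    "5' end (217 codons)": 0.08,
    "ATP-binding subset (21 codons)": 0.02,
    "neck (742 codons)": 0.32,
    "actin-binding subset (23 codons)": 0.11,
    "3' end (319 codons)": 0.24,
    "CBD subset (10 codons)": 0.14,
}

labels = {region: interpret_dnds(w) for region, w in average_dnds.items()}
for region, label in labels.items():
    print(f"{region}: dN/dS = {average_dnds[region]:.2f} -> {label}")
```

Every average falls well below 1, which is consistent with the note's description of these regions as conserved.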
Percentage of Invariant Codons in Teleost myo5 Genes
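| Region | Clade | Total Codons | Invariants/Extreme Purifying Selection: % of Codons Where dN = dS = 0 |
|---|---|---|---|
| 5′ end (motor domain) | myo5aa | 217 | 13.4 |
| 5′ end (motor domain) | myo5ab | 217 | 12.0 |
| 5′ end (motor domain) | myo5ba | 217 | 13.4 |
| 5′ end (motor domain) | myo5bb | 217 | 12.9 |
| 5′ end (motor domain) | myo5c | 217 | 11.5 |
| Neck | myo5aa | 728 | 16.8 |
| Neck | myo5ab | 742 | 6.2 |
| Neck | myo5ba | 748 | 11.0 |
| Neck | myo5bb | 734 | 11.2 |
| Neck | myo5c | 703 | 7.5 |
| 3′ end (CBD) | myo5aa | 319 | 30.1 |
| 3′ end (CBD) | myo5ab | 322 | 7.5 |
| 3′ end (CBD) | myo5ba | 323 | 23.8 |
| 3′ end (CBD) | myo5bb | 336 | 6.6 |
| 3′ end (CBD) | myo5c | 327 | 10.4 |
| Whole gene | myo5aa | 1,908 | 16.7 |
| Whole gene | myo5ab | 1,938 | 8.0 |
| Whole gene | myo5ba | 1,904 | 13.6 |
| Whole gene | myo5bb | 1,668 | 8.1 |
| Whole gene | myo5c | 1,761 | 11.8 |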
Note.—The percentage of invariant codons for each myo5 clade in teleosts is surprisingly high. The number of codons used in each alignment is shown along with the percentage of invariant sites (codons) for each alignment. For each clade there are 8–9 teleost sequences. For some of the regions there are large differences in the number of invariant sites found in the myo5ab clades compared with the myo5aa clades and when comparing the myo5ba clades to the myo5bb clades. The largest differences occur in the 3′ end of the myo5 genes where the CBD is located. Extreme purifying selection is defined here as dN = dS = 0. No substitutions were identified in any of the 3 codon positions for these sites. For the CBD (dilute domain) 30% of the codons for the teleost myo5aa clade showed extreme purifying selection, but only 7.5% of codons in the myo5ab clade showed extreme purifying selection. The data were generated using MEGA6 and HyPhy.
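The definition of an invariant codon used here (no substitution at any of the three codon positions across all sequences in a clade) can be illustrated with a short Python sketch. The authors report using MEGA6 and HyPhy; the function below is not their pipeline. The name `percent_invariant_codons` and the toy alignment are hypothetical, and the sketch assumes in-frame, equal-length, gap-free coding sequences.

```python
from typing import Sequence


def percent_invariant_codons(aligned_seqs: Sequence[str]) -> float:
    """Percentage of codon columns that are identical in every sequence.

    Assumes in-frame, equal-length, gap-free coding sequences (an
    assumption of this sketch, not a statement about the study's data).
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if length == 0 or length % 3 != 0:
        raise ValueError("alignment length must be a positive multiple of 3")
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must have the same length")

    invariant = 0
    for start in range(0, length, 3):
        # Collect the distinct codons observed at this alignment column.
        codons = {seq[start:start + 3].upper() for seq in aligned_seqs}
        if len(codons) == 1:
            invariant += 1
    return 100.0 * invariant / (length // 3)


# Hypothetical toy alignment: four codons, three sequences.
toy_alignment = [
    "ATGGCCAAAGTT",
    "ATGGCTAAAGTT",  # synonymous change in codon 2 (GCC -> GCT)
    "ATGGCCAAGGTT",  # synonymous change in codon 3 (AAA -> AAG)
]
print(percent_invariant_codons(toy_alignment))  # 50.0
```

Note that even synonymous changes disqualify a codon under this definition, which is why the criterion is stricter than a low dN/dS value alone.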
Branch Site-Random Effects Likelihood (BS-REL) Test Results
| Region | Branch | | Positive | Neutral | Purifying |
|---|---|---|---|---|---|
| Motor domain | myo5bb clade | 0.022 | 0.08 | 0.34 | 0.58 |
| CBD | myo5b clade | 0.014 | 0.13 | 0.03 | 0.84 |
| CBD | myo5ba clade | 0.014 | 0.04 | 0.03 | 0.93 |
| CBD | myo5bb clade | 0.040 | 0.20 | 0.26 | 0.54 |
Note.—Using the BS-REL test through the Datamonkey server, the CBD and motor domain showed evidence of episodic diversifying selection. Twenty percent of the sites along the myo5bb CBD branch are subject to positive selection, 26% of the sites along the same branch are evolving neutrally, and 54% of the sites along this branch are subject to purifying selection. The results of the BS-REL test for the myo5bb clade are highlighted.
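The three proportions reported for each branch partition its sites into selection classes, so they should sum to 1. The Python sketch below checks this and converts the proportions to the percentages quoted in the note. The values are copied from the table above; the helper name `as_percentages` is hypothetical, and treating the rows without a region label as belonging to the CBD is an assumption based on the table layout.

```python
# Proportion of sites per class (positive, neutral, purifying) per branch,
# copied from the BS-REL table.
bsrel = {
    ("Motor domain", "myo5bb clade"): (0.08, 0.34, 0.58),
    ("CBD", "myo5b clade"): (0.13, 0.03, 0.84),
    ("CBD", "myo5ba clade"): (0.04, 0.03, 0.93),
    ("CBD", "myo5bb clade"): (0.20, 0.26, 0.54),
}


def as_percentages(props, tol=1e-6):
    """Check that class proportions sum to 1, then return whole percentages."""
    total = sum(props)
    if abs(total - 1.0) > tol:
        raise ValueError(f"proportions sum to {total}, expected 1")
    return tuple(round(100 * p) for p in props)


for (region, branch), props in bsrel.items():
    positive, neutral, purifying = as_percentages(props)
    print(f"{region}, {branch}: {positive}% positive, "
          f"{neutral}% neutral, {purifying}% purifying")
```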
Fig. 3.—(A) Phylogenetic tree for the full-length coding sequence (6,468 bp) of myo5 using MrBayes 3.1 and a GTR+I+G model of evolution. The teleost myo5bb clade is shown in blue with an extended branch leading up to the clade. Posterior probability values are provided for some nodes. If not shown, the posterior probability value ranges from 0.94 to 1. “X” labels in the myo5ba and myo5bb clades denote posterior probability values between 0.62 and 0.78. The scale bar represents 0.1 substitutions per site. (B) Phylogenetic tree for the CBD (1,035 bp fragment) of myo5 using MrBayes 3.1 and a GTR+I+G model of evolution. This 1,035 bp fragment is found at the 3′ end of the myo5 gene and includes the coding sequence for the dilute domain for myo5a. The teleost myo5bb clade is shown in blue with an extended branch leading up to the clade. (C) Phylogenetic tree for the motor domain (a 651 bp fragment) of myo5 using MrBayes 3.1 and a GTR+I+G model of evolution. An alignment of 80 sequences from 18 different species was created, and the 5′ end of the myo5 gene including the ATP-binding domain was used to generate this tree. The teleost myo5bb clade is shown in blue with an extended branch leading up to the clade. A few more branches appear unresolved in this tree as a result of the high level of sequence conservation for the motor domain across taxa for the myo5 gene clades. (D) Phylogenetic tree for the neck and coiled coil domain (a 2,505 bp fragment) of myo5 using MrBayes 3.1 and a GTR+I+G model of evolution. The region of the myo5 gene used for this tree also includes a portion of the motor domain that includes the actin-binding domain but excludes the ATP-binding domain. The teleost myo5bb clade is shown in blue with an extended branch leading up to the clade. More of the branches are resolved compared with the previously presented trees as a result of the diversity of the gene sequence in the neck region of the myo5 gene family. 
Nodes shown without a posterior probability value have values greater than 0.75, with most values being 1.
Fig. 5.—dN/dS rates and percentage of codons that are invariant or under extreme purifying selection for all five myo5 genes in teleosts. We see smaller dN/dS rates for myo5aa compared with myo5ab and for myo5ba compared with myo5bb for all cases using smaller regions of the myo5 genes. The myo5aa gene has more invariant codons than its duplicate myo5ab. However, very similar percentages of invariant codons are observed for the motor and neck domains of the myo5ba and myo5bb duplicates. The myo5ba CBD has a much higher percentage of invariant codons than the paralogous myo5bb CBD, suggesting high conservation in the encoded protein, as would be necessary to ensure binding to the Rab11a cargo. The diversity seen in the myo5bb clade suggests that this duplicate has acquired a new function or the ability to bind to other cargo.
Results from MEME (Mixed Effects Model of Evolution) and REL (Random Effects Likelihood)
| Region-Clade | No. of Sequences | Total Codons | MEME (No. of Sites) | REL + (No. of Sites) | REL − (No. of Sites) |
|---|---|---|---|---|---|
| Motor-myo5a | 25 | 217 | 3 | 0 | 180 |
| Motor-myo5aa | 9 | 217 | 2 | 0 | 217 |
| Motor-myo5ab | 9 | 217 | 0 | 2 | 183 |
| Motor-myo5b | 22 | 217 | 3 | 0 | 217 |
| Motor-myo5ba | 8 | 217 | 1 | 0 | 217 |
| Motor-myo5bb | 8 | 217 | 1 | 0 | 217 |
| Motor-myo5c | 8 | 217 | 1 | 0 | 217 |
| Neck-myo5a | 25 | 830 | 16 | 0 | 830 |
| Neck-myo5aa | 9 | 830 | 7 | 4 | 226 |
| Neck-myo5ab | 9 | 830 | 6 | 0 | 830 |
| Neck-myo5b | 25 | 830 | 26 | 1 | 408 |
| Neck-myo5ba | 9 | 831 | 20 | 0 | 242 |
| Neck-myo5bb | 8 | 830 | 17 | 0 | 377 |
| Neck-myo5c | 8 | 830 | 5 | 0 | 96 |
| CBD-myo5a | 24 | 343 | 3 | 0 | 343 |
| CBD-myo5aa | 8 | 343 | 1 | 2 | 132 |
| CBD-myo5ab | 8 | 343 | 1 | 1 | 103 |
| CBD-myo5b | 23 | 343 | 5 | 2 | 180 |
| CBD-myo5ba | 9 | 343 | 0 | 0 | 78 |
| CBD-myo5bb | 8 | 343 | 5 | 5 | 247 |
| CBD-myo5c | 14 | 343 | 1 | 0 | 69 |
Note.—Summary of results from MEME (Mixed Effects Model of Evolution) and REL (Random Effects Likelihood) hypothesis testing using the HyPhy package via datamonkey.org. A large number of sites showing episodic diversifying selection are identified in the neck region of the myo5 gene. The functional domains are the motor domain and the cargo-binding domain (CBD). In the CBD we see more episodic diversifying selection (MEME) in the myo5bb clade of teleosts than in the myo5ba clade of teleosts. We also see large variations between these two clades when comparing the REL results. The REL results show the number of sites (codons) experiencing positive (REL +) or negative/purifying (REL −) selection. Rows reporting results from 8 to 9 sequences are based solely on teleost sequences. The myo5c CBD clade consists of 8 teleost sequences and 6 nonteleost sequences. The clades with 22–25 sequences contain all the teleost sequences in that group (16–18 sequences) plus 6–8 nonteleost sequences.