Fumiichiro Yamamoto, Emili Cid, Miyako Yamamoto, Naruya Saitou, Jaume Bertranpetit, Antoine Blancher.
Abstract
The ABO system is one of the most important blood group systems in transfusion/transplantation medicine. However, the evolutionary significance of the ABO gene and its polymorphism has remained unknown. We took an integrative approach to gain insights into the significance of the evolutionary process of ABO genes, including those related not only phylogenetically but also functionally. We experimentally created a code table correlating amino acid sequence motifs of the ABO gene-encoded glycosyltransferases with GalNAc (A)/galactose (B) specificity, and assigned A/B specificity to individual ABO genes from various species, thus going beyond simple sequence comparison. Together with genome information and phylogenetic analyses, this assignment revealed the early appearance of A and B gene sequences in evolution and the potentially non-allelic presence of both gene sequences in some animal species. We argue that evolution may have suppressed the establishment of two independent, functional A and B genes in most vertebrates and promoted A/B conversion through amino acid substitutions and/or recombination; that A/B allelism should have existed in common ancestors of primates; and that bacterial ABO genes evolved through horizontal and vertical gene transmission into two separate groups encoding glycosyltransferases with distinct sugar specificities.
Year: 2014 PMID: 25307962 PMCID: PMC5377540 DOI: 10.1038/srep06601
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Species-dependent distribution of FUT1/FUT2/SEC1 α1,2-fucosyltransferase genes and ABO/GBGT1/A3GALT2/GGTA1/GLT6D1 α1,3-Gal(NAc) transferase genes.
This table shows the distribution of α1,2-FT genes and α1,3-Gal(NAc)T genes in a variety of organisms. Ensembl gene identifiers are listed with only their meaningful digits, omitting the leading zeros. Genes were categorized into groups based on Ensembl gene trees, chromosomal locations, and our own analyses; the groups are aligned in separate columns and highlighted in different colors. Amino acid sequences corresponding to codons 266–268 of human A/B transferases are also shown. The symbol “---” indicates the absence of the sequence motif, and “N/A” means not annotated in databases. A single column, “Pseudo/Ancient”, lists two types of annotated gene sequences: the ABO retropseudogene sequences that were originally derived from an intronless cDNA are highlighted in tan (Pseudo), and the sequences that formed a cluster next to the ABO gene in the phylogenetic analysis are highlighted in yellow (Ancient). The gene sequences that formed a cluster outside of the ABO/GBGT1 genes are highlighted in orange and are shown separately in the “ABO/GBGT1 Ancient” column. The annotated genes may or may not be functional; non-functional genes may also be called O genes or pseudogenes. Note that the genome sequences of many species were incomplete, and therefore errors may exist. In addition, numerous homologous sequences have yet to be annotated and mapped to chromosomes. Furthermore, polymorphism may also exist.
Figure 2. Evolution of α1,3-Gal(NAc) transferase family of genes.
The MEGA5 software was used to analyze 104 amino acid sequences potentially encoding intact ABO proteins. The amino acid sequences corresponding to codons 69–354 of the human A transferase were examined, and 1,000 bootstrap replications were computed. Branches leading to ABO, GBGT1, A3GALT2, GGTA1, and GLT6D1 genes are colored in yellow, grey, green, purple, and blue, respectively. The bootstrap frequencies are shown at the branching points. Fishes, amphibians, reptiles, and birds are marked with closed circles in red, purple, green, and dark blue, respectively, whereas mammals are unmarked. The species code names correspond to the names shown in the “Ensembl Database” column in Fig. 1. For instance, PTR for chimpanzee (Pan troglodytes) is obtained by removing ENS and G from the database name (ENSPTRG).
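The naming convention used in Figs. 1 and 2 (species code taken from the Ensembl prefix, gene number shown without leading zeros) can be sketched in Python as follows. This is only an illustration: the function name `split_gene_id` and the identifiers in the usage lines are made up for this example and are not taken from the figures. It assumes the usual Ensembl layout of `ENS`, an optional species code, the letter `G` for a gene, and a zero-padded number; human gene identifiers carry no species code, so the code comes back empty for them.

```python
import re

# Assumed layout: "ENS" + optional species code + "G" + zero-padded digits.
_GENE_ID = re.compile(r"^ENS([A-Z]*)G(\d+)$")


def split_gene_id(gene_id):
    """Return (species_code, number_without_leading_zeros) for a gene ID."""
    match = _GENE_ID.match(gene_id)
    if match is None:
        raise ValueError(f"not an Ensembl gene identifier: {gene_id!r}")
    species, digits = match.groups()
    # Keep a single "0" if the number consists of zeros only.
    return species, digits.lstrip("0") or "0"


# Made-up identifiers, used only to show the behaviour.
print(split_gene_id("ENSPTRG00000012345"))  # chimpanzee-style prefix
print(split_gene_id("ENSG00000000007"))     # human-style prefix, no species code
```

The regular expression relies on backtracking so that species codes that themselves contain a G (for example, a `GAL` prefix) are still separated correctly from the gene-type letter.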
Consensus organization of genes surrounding α1,3-Gal(NAc) transferase and α1,2-fucosyltransferase genes
| α1,3-Gal(NAc) transferase genes |
|---|
| REXO4->, <-C9ORF96, SURF4->, <-SURF2, SURF1->, <-RPL7A, MED22->, SURF6->, |
| <-ZSCAN20, PHC2->, |
| <-FAM83E, EMP3->, <- |
| GUCA1B->, MAPK8IP1->, |
| TTLL11->, <-DAB2IP, |
| OBP2A->, PAEP->, <- |
Chromosomal regions containing α1,3-Gal(NAc)T and α1,2-FT genes have remained stable in many species with the consensus organization shown. The arrows indicate the direction of transcription.
Specificity and activity of human A transferase expression constructs containing various amino acids at codons 263–268
| (I) G at codon 268: codons (266–268) | A activity | B activity | A/B specificity | (II) A at codon 268: codons (266–268) | A activity | B activity | A/B specificity | (III) Additional: codons (266–268) | A activity | B activity | A/B specificity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | +++++ | − | A | | +++++ | ++ | AB | | +++++ | − | A |
| | +++++ | − | A | CGA | ++++ | +++ | AB | | − | − | − |
| DGG | ++ | ++ | AB | DGA | − | +++ | B | | ++++ | +++ | AB |
| EGG | ++++ | − | A | EGA | − | ++++ | B | | − | +++++ | B |
| FGG | − | ++++ | B | FGA | − | ++++ | B | | − | +++ | B |
| GGG | ++++ | − | A | | ++++ | +++ | AB | | − | +++++ | B |
| HGG | − | ++++ | B | HGA | − | ++++ | B | | − | +++++ | B |
| IGG | ++++ | ++++ | AB | | − | ++++ | B | SSE | − | − | − |
| KGG | − | − | − | KGA | − | +++ | B | TAS | − | − | − |
| | +++++ | − | A | | +++++ | + | AB | | − | − | − |
| | ++++ | ++++ | AB | | − | ++++ | B | | ++++ | − | A |
| NGG | +++++ | + | AB | NGA | ++++ | ++ | AB | | − | − | − |
| PGG | +++++ | − | A | PGA | ++++ | − | A | TSE | − | − | − |
| | ++++ | +++ | AB | QGA | − | +++++ | B | | | | |
| RGG | − | − | − | RGA | − | − | − | (263–268) | | | |
| | ++++ | − | A | | ++++ | +++ | AB | | − | − | − |
| TGG | +++++ | − | A | | ++++ | +++ | AB | FYFTSE | − | − | − |
| VGG | ++++ | − | A | VGA | ++++ | +++ | AB | HYYMGG | ++++ | ++++ | AB |
| WGG | ++ | + | AB | WGA | − | ++++ | B | YYYAGG | +++++ | − | A |
| YGG | − | ++++ | B | YGA | − | ++ | B | YYYMGG | +++++ | +++ | AB |
| | | | | | | | | YYYTGS | +++++ | − | A |
| | | | | | | | | YYYTSE | − | − | − |
| | | | | | | | | YYYTSG | +++++ | − | A |
The two sets on the left show the results for a library of human A transferase expression constructs containing each of the 20 possible amino acid residues at codon 266, with either the glycine of A transferase or the alanine of B transferase at codon 268. The set on the right shows the results for additional constructs that were not included in the library. The results of immunostaining with anti-A or anti-B antibodies were adjusted for transfection efficiency using the percentages of co-transfected GFP-positive cells. Activity is shown semi-quantitatively on a 4-fold exponential scale, with +++++ the highest and − none. The letter size in A/B Specificity reflects the activity strength, whereas “−” indicates no activity. The constructs shown in bold type are mentioned in the text.
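How such a code table can be read programmatically is sketched below in Python. The sketch is illustrative only and is not the authors' procedure. The dictionary holds a handful of rows copied from the table above, the names `MOTIF_TABLE`, `specificity`, and `relative_level` are invented for this example, and the conversion of "+" counts into relative levels is an assumed reading of the "4-fold exponential scale" (each "+" fewer than five is taken as a 4-fold lower level).

```python
# A few rows copied from the table: motif -> (A activity, B activity).
MOTIF_TABLE = {
    "DGG": ("++", "++"),
    "EGG": ("++++", "−"),
    "FGG": ("−", "++++"),
    "NGG": ("+++++", "+"),
    "DGA": ("−", "+++"),
    "PGA": ("++++", "−"),
    "VGA": ("++++", "+++"),
    "SSE": ("−", "−"),
}


def specificity(a_activity, b_activity):
    """Derive the A/B specificity label from the two activity scores."""
    has_a = "+" in a_activity
    has_b = "+" in b_activity
    if has_a and has_b:
        return "AB"
    if has_a:
        return "A"
    if has_b:
        return "B"
    return "−"


def relative_level(score, fold=4, top=5):
    """Assumed reading of the scale: +++++ is 1, each missing '+' divides by `fold`."""
    pluses = score.count("+")
    if pluses == 0:
        return 0.0
    return float(fold) ** (pluses - top)


for motif, (a_act, b_act) in MOTIF_TABLE.items():
    print(motif, specificity(a_act, b_act), relative_level(a_act), relative_level(b_act))
```

Under this reading, a score of +++ corresponds to one sixteenth of the +++++ level. The derived labels agree with the A/B Specificity column for every row included in the dictionary.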
Genes adjacent to ABO genes
| Species | Gene order* |
|---|---|
| Human ( | 1->, <-2, <-3, 4->, <-6, 7->, 8->, |
| Chimpanzee ( | 1->, <-2, <-3, <-16, 4->, 7->, 8->, |
| Gorilla ( | 1->, <-2, <-3, 4->, <-6, 7->, 8->, |
| Orangutan ( | 1->, <-2, <-3, 5->, 4->, <-6, 7->, 8->, <-18, 9->, 10->, |
| Rhesus macaque ( | 1->, <-2, 5->, <-3, 4->, <-6, 7->, 8->, 21->, |
| Marmoset ( | 1->, <-2, 5->, <-3, 4->, <- |
| Bushbaby ( | 1->, <-2, <-3, 4->, <-6, 7->, 8->, |
| Mouse ( | 1->, <-27, 5->, <-3, 4->, <-6, 7->, 8->, |
| Rat ( | 1->, <-32, 5->, <-3, 4->, <-33, 7->, 8->, |
| Rabbit ( | |
| Dog ( | 5->, <-3, <-6, 7->, 8->, |
| Ferret ( | 1->, <-2, 5->, 4->, 7->, 8->, <- |
| Horse ( | 1->, <-2, <-3, 4->, <-6, 7->, 8->, |
| Cow ( | 1->, <-2, <-3, 4->, <-6, 7->, 8->, |
| Microbat ( | |
| Elephant ( | 1->, <-2, 4->, <-6, 7->, 8->, |
| Opossum ( | <-3, <-6, 4->, 7->, 8->, <- |
| Platypus ( | |
| Flycatcher ( | 1->, <-2, 5->, <-3, 4->, <-6, 7->, 8->, |
| Zebra finch ( | 1->, <-2, <-2, 5->, <-3, 4->, <-6, 7->, 8->, 8->, |
| Turkey ( | 1->, <-2, 5->, <-3, 4->, <-6, 7->, 8->, 86->, |
| Duck ( | 1->, <-2, 5->, <-3, 4->, <-6, 7->, 8->, 86->, |
| Chicken ( | 1->, <-2, 5->, <-3, 4->, <-6, 7->, 8->, 86->, |
| Softshell turtle ( | 99->, <-100, <-101, 1->, <-2, <- |
| Xenopus frog ( | <-6, 7->, <-105, 8->, |
When there is a long gap, a double slash (//) is shown. Three key amino acid sequences are shown in parentheses for ABO, GBGT1, and GLT6D1 genes. The genes in the inserted chromosomal region that is present in the rat genome but absent from the mouse genome are shown in bold type. Other genes are abbreviated as follows.
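The arrow notation used in the gene-order tables (a trailing `->` or a leading `<-` to indicate the direction of transcription) can be parsed with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `parse_gene_order` is invented here, the `+`/`-` strand labels are a convention chosen for the example, and entries with no gene label (such as a bare `<-` at the end of a row) are simply skipped. The double-slash gap marker is not handled.

```python
def parse_gene_order(text):
    """Parse 'X->, <-Y, ...' into a list of (gene_label, strand) tuples.

    'X->' is read as forward ('+') and '<-Y' as reverse ('-') transcription;
    both strand labels are conventions assumed for this sketch.
    """
    genes = []
    for token in text.split(","):
        token = token.strip()
        if token.startswith("<-"):
            label, strand = token[2:].strip(), "-"
        elif token.endswith("->"):
            label, strand = token[:-2].strip(), "+"
        else:
            continue  # empty piece or unrecognised token
        if label:  # skip arrows that carry no gene label
            genes.append((label, strand))
    return genes


human = parse_gene_order("1->, <-2, <-3, 4->, <-6, 7->, 8->,")
horse = parse_gene_order("1->, <-2, <-3, 4->, <-6, 7->, 8->,")
print(human)
print(human == horse)  # identical order and orientation in the listed portion
```

Comparing the parsed lists makes conserved and rearranged neighbourhoods easy to spot, for instance the identical listed order for human and horse versus the extra gene 5 in the mouse row.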
Figure 3. (a) A phylogenetic tree of ABO proteins/peptides from species possessing multiple copies of the ABO gene. Phylogenetic analyses were performed with protein/peptide sequences from species that contain more than one ABO gene in their genomes. Processed intronless retropseudogenes were excluded from the analysis. The amino acid sequences were analyzed in their entirety. Potentially functional proteins from full genes with the initiation and termination codons, and peptides from partial genes without them, are marked with circles and triangles, respectively. The symbol's color indicates potential sugar specificity (red, green, yellow, blue, and black for GalNAc, galactose, GalNAc/galactose, none, and unknown, respectively). Amino acid sequences corresponding to codons 266–268 of human A/B transferases are also shown in parentheses. Genes in the same species are bracketed. When potential A and B gene sequences are both present in a single species, the bracket is colored in red. Horse genes and ferret genes in two separate clusters are bracketed in blue and purple, respectively. Other species are bracketed in dark blue.
(b) A phylogenetic tree of originally intronless ABO retropseudogene products. The entire protein sequences of processed retropseudogenes were analyzed. Branches leading to different amino acid sequences at the important positions are coded in different colors.
(c) ABO gene evolution in bacteria. The EMBL-EBI InterPro database listed 57 bacterial proteins within the GT6 family. Fifty-six proteins/peptides, excluding one short sequence, were aligned to construct a phylogenetic tree. The A gene from Helicobacter mustelae and the B gene from the Escherichia coli O86 strain were included in the study, and their results are shown in bold type. The B gene-encoded protein (E1I6K1) consists of 234 amino acids, and the bacterial protein sequences corresponding to codons 2–219 of this protein were analyzed. The amino acid sequence motifs corresponding to codons 266–268 of human A/B transferases are also shown in parentheses; in E1I6K1 these correspond to codons 145–147. The symbol's color indicates the sugar specificity of the transferases, assuming that they are functional: red, green, and yellow for GalNAc, galactose, and both, respectively.
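The codon coordinates quoted in this legend are 1-based, so reading a motif out of a protein sequence requires shifting to 0-based slicing. The Python sketch below shows that bookkeeping. The function name `motif_at` is invented for this example, and the 234-residue sequence is a synthetic placeholder with an arbitrary motif inserted at positions 145–147; it is not the E1I6K1 sequence.

```python
def motif_at(sequence, first_codon, last_codon):
    """Return residues first_codon..last_codon (1-based, inclusive)."""
    if not 1 <= first_codon <= last_codon <= len(sequence):
        raise ValueError("codon range lies outside the sequence")
    return sequence[first_codon - 1:last_codon]


# Synthetic 234-residue placeholder; "WGA" is an arbitrary motif, not real data.
toy_protein = "A" * 144 + "WGA" + "A" * 87

print(len(toy_protein))
print(motif_at(toy_protein, 145, 147))

# Offset between the human numbering (266-268) and the E1I6K1 numbering (145-147).
print(266 - 145)
```

The offset of 121 between the two numbering schemes applies to this motif only; elsewhere in an alignment, gaps can change the correspondence between human and bacterial positions.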