Nupur Oswal, Narinder Singh Sahni, Alok Bhattacharya, Sneha Sudha Komath, Rohini Muthuswami.
Abstract
BACKGROUND: The first step of GPI anchor biosynthesis is catalyzed by PIG-A, an enzyme that transfers N-acetylglucosamine from UDP-N-acetylglucosamine to phosphatidylinositol. This protein is present in all eukaryotic organisms, from protozoa to higher mammals, as part of a larger complex of five to six 'accessory' proteins whose individual roles in the glycosyltransferase reaction are still unclear. The PIG-A gene has been shown to be essential in various eukaryotes. In humans, mutations in the protein have been associated with paroxysmal nocturnal hemoglobinuria. A corresponding PIG-A gene has also recently been identified in the genomes of many archaebacteria, although genes for the accessory proteins have not been found in them. The present study explores the evolution of PIG-A and the phylogenetic relationship between this protein and other glycosyltransferases.
Year: 2008 PMID: 18522757 PMCID: PMC2446393 DOI: 10.1186/1471-2148-8-168
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
List of organisms and proteins surveyed.
| Organism | Protein | Sequence label | Domain |
|---|---|---|---|
| HS | PIG-A | HSPA | Eukarya |
| RN | PIG-A | RNPA | Eukarya |
| EH | PIG-A | EHPA | Eukarya |
| CA | PIG-A | CAPA | Eukarya |
| SC | PIG-A | SCPA | Eukarya |
| SP | PIG-A | SPPA | Eukarya |
| LM | PIG-A | LMPA | Eukarya |
| GL | PIG-A | GLPA | Eukarya |
| DM | PIG-A | DMPA | Eukarya |
| PF | PIG-A | PFPA | Eukarya |
| DD | PIG-A | DDPA | Eukarya |
| TB | PIG-A | TBPA | Eukarya |
| OS | PIG-A | OSPA | Eukarya |
| AT | PIG-A | ATPA | Eukarya |
| CE | PIG-A | CEPA | Eukarya |
| PT | PIG-A | PTPA | Eukarya |
| AP | PIG-A | APPA | Archaea |
| TA | PIG-A | TAPA | Archaea |
| MB | PIG-A | MBPA | Archaea |
| MT | PIG-A | MTPA | Archaea |
| BT | Glycosyltransferase | BTGT | Bacteria |
| CB | Glycosyltransferase | CBGT | Bacteria |
| AM | Glycosyltransferase | AMGT | Bacteria |
| MS | Glycosyltransferase | MSGT | Bacteria |
| AS | Glycosyltransferase | ASGT | Bacteria |
| PA | Glycosyltransferase | PAGT | Bacteria |
| PF | Glycosyltransferase | PFGT | Archaea |
| DH | Glycosyltransferase | DHGT | Bacteria |
| CT | Glycosyltransferase | CTGT | Bacteria |
| MA | Glycosyltransferase | MAGT | Archaea |
| MY | Glycosyltransferase | MYGT | Bacteria |
| CP | Glycosyltransferase | CPGT | Eukarya |
| MT | LPS glycosyltransferase | MTLT | Archaea |
| BT | LPS glycosyltransferase | BTLT | Bacteria |
| BH | Glycosyltransferase | BHGT | Bacteria |
Proteins present in GPI-GnT complex.
| Organism | PIG-A | PIG-P | PIG-C | PIG-Q | PIG-H | DPM2 | PIG-Y |
|---|---|---|---|---|---|---|---|
| HS | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| RN | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| EH | Yes | Yes | Yes | Yes | No | No | No |
| CA | Yes | Yes | Yes | Yes | Yes | No | Yes (Eri1) |
| SC | Yes | Yes | Yes | Yes | Yes | No | Yes (Eri1) |
| SP | Yes | Yes | Yes | Yes | Yes | Yes | Yes (Eri1) |
| LM | Yes | Yes | Yes | Yes | Yes | Yes | No |
| GL | Yes | No | No | No | No | No | No |
| DM | Yes | Yes | Yes | Yes | Yes | No | No |
| PF | Yes | Yes | Yes | Yes | Yes | No | No |
| DD | Yes | Yes | Yes | Yes | Yes | Yes | No |
| TB | Yes | Yes | Yes | Yes | Yes | Yes | No |
| OS | Yes | Yes | Yes | Yes | Yes | Yes | No |
| AT | Yes | Yes | Yes | Yes | Yes | Yes | No |
| CE | Yes | Yes | Yes | Yes | No | No | No |
| PT | Yes | ND | ND | ND | ND | ND | ND |
ND: not determined. BLAST analysis identified only PIG-A in P. tetraurelia, because this gene has been cloned and studied. The genome of P. tetraurelia has not been sequenced and annotated; therefore, we could not determine whether PIG-H, PIG-C, PIG-P, PIG-Q, PIG-Y, and DPM2 are present in this organism.
Figure 1: Identification of conserved motifs in the PIG-A protein from eukaryotes. Clustal W analysis using MAFFT identified twelve conserved motifs in the PIG-A protein. Three of these motifs are absent in G. lamblia.
Figure 2: Phylogenetic analysis of the PIG-A protein from eukaryotes. The phylogenetic tree was constructed using the Phylodendron program. G. lamblia PIG-A appears to have diverged from the other eukaryotic PIG-A proteins.
Figure 3: Phylogenetic analysis of the PIG-A protein from archaebacteria and eukaryotes. The phylogenetic tree was constructed using the Phylodendron program. The giardial protein appears to be closer to the archaeal proteins than to the other eukaryotic PIG-A proteins.
Conserved motifs in PIG-A proteins from eukaryotes.
| MOTIF # | MOTIF SEQUENCE |
|---|---|
| CM1 | [STC]-D-F-F-[YFC]-P-X-X-G-G-[VI]-E-X-H-X-[YF] |
| CM2 | G-[HFNL]-[KRS]-[VI]-[VI]-[ITV]-X-T-[HRN]-[AQNFSGK]-[YN]-X-X-[RTC]-X-G-[VI] |
| CM3 | [GY]-[LIM]-[KT]-V-Y-[YH]-X-P |
| CM4 | [PLA]-X-X-[RS]-X-[ILV]-[FLVH]-[VIRLY]-[RE]-[EH]-X-[IVF]-X-[IV]-[ILV]-H-[SGAC]-H-[GQSA]-[STAN]-[FLATY]-S |
| CM5 | G-X-[KPQRS]-[TAV]-[VFCI]-[FLY]-T-[DE]-H-S-[LM] |
| CM6 | I-[CAS]-V-S-X-[TCEIV]-[STCGN]-[KRE]-[ED]-N-[TML]-[VCIRS]-[LVIM]-[RL] |
| CM7 | [PFK]-X-X-X-X-[VIMT]-[VI]-[PG]-N-[AI] |
| CM8 | [IV]-[VAI]-[VIF]-[VILMA]-X-R-[LM]-[VYFT]-[YPQF]-[RN]-K-G-X-D-L |
| CM9 | [FWVY]-[ILVY]-[VI]-[GAV]-G-[EDNS]-G-P-[KMR] |
| CM10 | [GC]-[HDQ]-I-[FYG]-[LIV]-[NHI]-X-S-[LY]-[TL]-E-[AG]-[FY]-[CGS]-X-[AVIS]-[IL]-[VIL]-E-[AS]-[AL]-[SQ]-[CE]-[GNA]-[LC] |
| CM11 | [STA]-[TS]-X-V-G-G-[IVT]-[PDSK]-[ES]-V-[LY]-[PK] |
| CM12 | Y-[STDN]-[WP]-X-X-[VI]-[AS]-X-[RK]-[TV]-[EVYQ]-X-[VIS]-[YH] |
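The motifs in these tables use a hyphen-separated, PROSITE-like notation. The sketch below assumes that X means any residue, a bracketed set means any one of the listed residues, and a bare letter means exactly that residue; it translates a motif into a Python regular expression and scans a protein sequence for it. The function names and the toy sequence are hypothetical illustrations, not part of the study's pipeline.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a hyphen-separated motif (e.g. 'G-G-[VI]-E-X-H') into regex syntax.

    Assumed notation, read from the tables: 'X' = any residue, '[...]' = any one of the
    listed residues, a bare letter = exactly that residue.
    """
    # Normalise en-dash separators (as used in one table row) and stray spaces.
    tokens = [t.strip() for t in motif.replace("–", "-").split("-")]
    # '[...]' and single letters are already valid regex; only 'X' needs translating.
    return "".join("." if t == "X" else t for t in tokens)


def find_motif(motif: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, overlapping ones included."""
    # A capturing group inside a lookahead makes finditer report overlapping matches.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real PIG-A sequence.
print(find_motif("[GY]-[LIM]-[KT]-V-Y-[YH]-X-P", "MAGLKVYYAPWW"))  # [(3, 'GLKVYYAP')]
```

Using a lookahead rather than a plain `finditer` is a deliberate choice: short, degenerate motifs can overlap themselves, and a plain scan would silently skip the overlapping hits.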
Figure 4: Identification of conserved motifs in PIG-A proteins from archaebacteria and eukaryotes. Clustal W analysis of the PIG-A protein from archaebacteria and eukaryotes using MAFFT identified conserved motifs in these proteins.
Conserved motifs in PIG-A proteins after aligning the eukarya and archaea sequences.
| MOTIF # | MOTIF SEQUENCE |
|---|---|
| CM1ar | D-[FTW]-[FHY]-[YFCP]-[PS]-X-X-[GD]-G-[VI] |
| CM2ar | G-[HNLFY]-X-[VI]-[VISMH]-[ITV]-[VIMF]-[TS]-[HRVN]-X-[YNLG] |
| CM3ar | [VIK]-[YVI]-X-X-[PK] |
| CM4ar | [RESFD]-[EHLNG]-[VIPYF]-X-[IV]-[IV]-[HN]-X-H |
| CM5ar | [AGS]-[KRNGS]-X-[MVLI]-G-X-[KPQRS]-X-X-X-T-[DENF]-H-[ST]-[LMID]-[FAYV] |
| CM6ar | [IL]-[CASF]-[VL]-[SY]-X-X-[KREA]-[EDMK]-[NKVD]-[TMLS]-X-X-[RGAM] |
| CM7ar | [NDE] |
| CM8ar | [VAIL]-X-X-X-R-[LMI]-[VYFT]-X-[RNDK]-K-G-X-[DHYQ]-[LVNR] |
| CM9ar | [IVM]-[GAIV]-G-X-G-[PE] |
| CM10ar | [IVL]-[FYGT]-X-X-X-S-[LYIS]-X-[ED]-[ASGT]-[FY]-[CGS]-X-X-[ILAV]-[VILF]-E-[AS]-[ALMI]-[SQA]-[CESK]-[GNAE] |
| CM11ar | [VIM]-[STAV]-[TSM]-X-[VQDNH]-[GFS]-[GP]-[IVTL]-X-[EDS]-[VNI] |
| CM12ar | Y-X-[WPL]-X-X-[VIH]-X-X-X-X-X-X-[VIS]-[YH] |
Figure 5: Phylogenetic analysis of PIG-A and glycosyltransferase proteins from prokaryotes, archaebacteria, and eukaryotes. Phylogenetic analysis using the Phylodendron program confirms that eukaryotic PIG-A proteins form an evolutionarily separate branch.
Figure 6: Identification of conserved motifs in PIG-A and glycosyltransferase proteins from prokaryotes, archaebacteria, and eukaryotes. Clustal W analysis using MAFFT was performed to identify conserved motifs in these proteins.
Conserved motifs in PIG-A proteins after aligning the eukarya and archaea PIG-A sequences with bacterial and archaeal glycosyltransferases.
| MOTIF # | MOTIF SEQUENCE |
|---|---|
| CM4gt | [HN]-X-[HQ] |
| CM5gt | [TH]-X-H |
| CM8gt | K-[GS] |
| CM9gt | G-X-[GE] |
| CM10gt | [FYGTAL]-X-X-X-S-X-X-[ED]-X-[FLY]-[CSGP]-X-X-X-X-E-[AS] |
| CM11gt | [GFSES]-[GP] |
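As more divergent sequences enter the alignment, the motifs in these tables tend to become shorter and more permissive (compare CM5, CM5ar and CM5gt). One rough way to quantify this is the probability that a random window matches the pattern. The sketch below assumes equal frequencies for all 20 residues and independent positions, which real proteins do not satisfy, so the numbers are only a relative specificity score; the helper names are hypothetical.

```python
from fractions import Fraction


def tokens(motif: str) -> list[str]:
    """Split a hyphen-separated motif into its position tokens."""
    return [t.strip() for t in motif.replace("–", "-").split("-")]


def match_probability(motif: str, alphabet_size: int = 20) -> Fraction:
    """Chance that a random window matches the motif.

    Assumes all residues are equally frequent and positions are independent,
    which real proteins violate; this is only a rough specificity score.
    """
    p = Fraction(1)
    for t in tokens(motif):
        if t == "X":
            continue  # any residue matches, factor of 1
        allowed = set(t.strip("[]"))  # '[GFSES]' -> {'G', 'F', 'S', 'E'}; 'T' -> {'T'}
        p *= Fraction(len(allowed), alphabet_size)
    return p


def expected_hits(motif: str, protein_length: int) -> Fraction:
    """Expected number of chance matches in a random protein of the given length."""
    windows = max(protein_length - len(tokens(motif)) + 1, 0)
    return windows * match_probability(motif)


CM5 = "G-X-[KPQRS]-[TAV]-[VFCI]-[FLY]-T-[DE]-H-S-[LM]"
CM5ar = "[AGS]-[KRNGS]-X-[MVLI]-G-X-[KPQRS]-X-X-X-T-[DENF]-H-[ST]-[LMID]-[FAYV]"
CM5gt = "[TH]-X-H"

for name, motif in [("CM5", CM5), ("CM5ar", CM5ar), ("CM5gt", CM5gt)]:
    print(name, float(match_probability(motif)), float(expected_hits(motif, 400)))
```

Under these assumptions the three-residue CM5gt pattern would be expected about twice by chance (398/200 = 1.99) in a random 400-residue protein, whereas the full eukaryotic CM5 pattern essentially never matches by chance. Exact fractions are used so that the very small probabilities are not distorted by floating-point rounding.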
Sequences of motifs used for PLSR analysis.
| Motif | Sequence |
|---|---|
| CM1 | [STC]-D-F-F-[YFC]-P-X-X-G-G-[VI]-E-X-H-X-[YF] |
| CM1a | D-[FTW]-[FHY]-[YFCP]-[PS]-X-X-[GD]-G-[VI] |
| CM1b | [STC]-D-F-F-[YFC]-P-X-X-G-G-[VI] |
| CM1c | G-G-[VI]-E-X-H-X-[YF] |
| CM1d | D-[FTW]-[FHY]-[YFCP]-[PS]-X-X-[GD]-G-[VI]-[EQS]-X-[HYS] |
| CM2 | G-[HFNL]-[KRS]-[VI]-[VI]-[ITV]-X-T-[HRN]-[AQNFSGK]-Y-X-X-[RTC]-X-G-[VI] |
| CM2a | G-[HNLFY]-X-[VI]-[VISMH]-[ITV]-[VIMF]-[TS]-[HRVN]-X-Y |
| CM2b | G-[HNLFY]-X-[VI]-[VISMH]-[ITV]-[VIMF]-[TS]-[HRVN]-X-[YNLG] |
| CM2c | G-[HFNLY]-[KRS]-[VI]-[VISMH]-[ITV]-[VIMF]-[TS]-[HRVN]-[AQNFSGK]-[YNLG] |
| CM2d | G-[HFNL]-[KRS]-[VI]-[VI]-[ITV]-X-T-[HRN]-[AQNFSGK]-Y |
| CM3 | [GY]-[LIM]-[KT]-V-Y-[YH]-X-P |
| CM4 | [PLA]-X-X-[RS]-X-[ILV]-[FLVH]-[VIRLY]-[RE]-[EH]-X-[IVF]-X-[IV]-[ILV]-[GAC] |
| CM4b | [RE]-[EH]-[RQNSKE]-[VIF]-X-[IV]-[ILV]-H-[SAGC]-H |
| CM4c | [RE]-[EH]-[RQNSKE]-[VIF]-X-[IV]-[ILV]-H-[SAGC]-H-X-X-X-S |
| CM4d | [FLVH]-X-[RE]-[EH]-[RQNSKE]-[VIF]-X-[IV]-[ILV]-H-[SAGC]-H-X-X-X-S |
| CM4e | [ILV]-[FLVH]-X-[RE]-[EH]-[RQNSKE]-[VIF]-X-[IV]-[ILV]-H-[SAGC]-H-X-X-X-S |
| CM5 | G-X-[KPQRS]-[TAV]-[VFCI]-[FLY]-T-[DE]-H-S-[LM] |
| CM5a | G-X-[KPQRS]-[TAV]-[VFCI]-[FLY]-T-[DE]-H-S-[LM]-[FYA]-[GRS]-[FLG] |
| CM5b | G-[LIYFV]-[QRKPS]-X-X-[FLYA]-T-[DENF]-H-[ST]-[LMID] |
| CM5c | G-[LIYFV]-[QRPSK]-[TVARPS]-[VFCI]-[FLYAV]-T-[DENF]-H-[ST]-[LIMD] |
| CM5d | G-X-[QRPSK]-[TVARPS]-[VFCI]-[FLYAV]-T-[DENF]-H-[ST]-[LIMD] |
| CM5e | G-X-[QRPSK]-X-[VFCI]-[FLYAV]-T-[DENF]-H-[ST]-[LIMD] |
| CM6 | I-[CAS]-V-S-X-[TCEIV]-[STCGN]-[KRE]-[ED]-N-[TML]-[VCIRS]-[LVIM]-[RL] |
| CM6e | [TCEIV]-[STCGN]-[KRE]-[ED]-N-[TML]-[VCIRS]-[LVIM]-[RL] |
| CM6f | V-S-X-[TCEIV]-[STCGN]-[KRE]-[ED]-N-[TML]-[VCIRS]-[LVIM]-[RL] |
| CM6g | [CAFS]-V-S-X-[TCEIV]-[STCGN]-[KRE]-[ED]-N-[TML]-[VCIRS] |
| CM8 | [IV]-[VAI]-[VIF]-[VILMA]-X-R-[LM]-[VYFT]-[YPQF]-[RN]-K-G-X-D-L |
| CM8a | [IV]-[VAI]-[VIF]-X-X-R-[LM]-X-[YPQF]-[RN]-K-G-X-D-L |
| CM8b | [IV]-[VAI]-[VIF]-X-X-R-[LM]-X-X-[RN]-K-G-X-D-L |
| CM8e | [VILMA]-X-R-[LM]-[VYFT]-[YPQF]-[RN]-K-G-X-D-L |
| CM8f | X-X-R-[LM]-X-[YPQF]-[RN]-K-G-X-D-L |
| CM8g | R-[LM]-X-[YPQF]-[RN]-K-G-X-D-L |
| CM9 | [FWVY]-[ILVY]-[VI]-[GAV]-G-[EDNS]-G-P-[KMR] |
| CM10 | [GC]-[HDQ]-I-[FYG]-[LV]-[NHI]-X-S-[LY]-T-E-[AG]-[FY]-[CGS]-X-[AVIS]-[IL]-[VI]-E-[AS]-[AL]-[SQ]-[CE]-[GNA]-[LC] |
| CM10a | S-[LY]-T-E-[AG]-[FY]-[CGS]-X-[AVIS]-[IL]-[VI]-E-[AS]-[AL]-[SQ]-[CE]-[GNA]-[LC] |
| CM10b | [GC]-[HDQ]-I-X-[LV]-[NHI]-X-S-[LY]-T-E-[AG]-[FY]-[CGS]-X-X-[IL]-[VI]-E-[AS]-[AL]-[SQ]-[CE]-[GNA]-[LC] |
| CM10c | [GCA]-X-[IVL]-[FYGT]-[LVIA]-X-X-S-[LYS]-[TLAND]-E-[AGST]-[FY]-[CGS]-X-X-[ILVA]-[VIFL]-E-[AS] |
| CM10d | [GCA]-X-[IVL]-[FYGT]-[LVIA]-X-X-S-[LYS]-X-E-[AGST]-[FY]-[CGS]-X-X-[ILVA]-[VIFL]-E-[AS]-[ALMI]-[SQA] |
| CM11 | [STA]-[TS]-X-V-G-G-[IVT]-[PDSK]-[ES]-V-[LY]-[PK] |
| CM11a | [STA]-[TS]-X-V-G-G-[IVT]-X-[ES]-V-[LY]-[PK] |
| CM11b | V-G-G-[IVT]-X-[ES]-V-[LY]-[PK] |
| CM12 | Y-[STDN]-[WP]-X-X-[VI]-[AS]-X-[RK]-[TV]-[EVYQ]-X-[VIS]-[YH] |
| CM12c | Y-X-[WP]-X-X-[VI]-[AS]-X-[RK]-[TV]-X-X-[VIS]-[YH] |
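The source does not state how these motif variables were encoded as PLSR inputs. A minimal sketch, assuming each variable is simply the presence (1) or absence (0) of the motif anywhere in a sequence, builds the input matrix with one row per sequence and one column per motif. The function names and toy fragments below are hypothetical.

```python
import re


def motif_regex(motif: str) -> re.Pattern:
    """Compile a hyphen-separated motif; 'X' = any residue, '[...]' = alternatives."""
    tokens = [t.strip() for t in motif.replace("–", "-").split("-")]
    return re.compile("".join("." if t == "X" else t for t in tokens))


def presence_matrix(sequences: dict[str, str], motifs: dict[str, str]):
    """Build a 0/1 matrix: one row per sequence, one column per motif."""
    compiled = {name: motif_regex(m) for name, m in motifs.items()}
    columns = list(motifs)
    rows = list(sequences)
    matrix = [
        [1 if compiled[col].search(sequences[row].upper()) else 0 for col in columns]
        for row in rows
    ]
    return rows, columns, matrix


motifs = {
    "CM3": "[GY]-[LIM]-[KT]-V-Y-[YH]-X-P",
    "CM8g": "R-[LM]-X-[YPQF]-[RN]-K-G-X-D-L",
}
# Hypothetical toy fragments, not sequences from the study.
sequences = {"toy1": "MAGLKVYYAPWW", "toy2": "RLAYRKGADL", "toy3": "AAAA"}
rows, columns, X = presence_matrix(sequences, motifs)
print(columns, X)  # ['CM3', 'CM8g'] [[1, 0], [0, 1], [0, 0]]
```

A PLS model (for example scikit-learn's `PLSRegression`) could then be fitted to this matrix and a 0/1 class vector (PIG-A versus GT4); binary presence is only one possible encoding, and match counts or alignment scores would be equally plausible alternatives.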
The table shows the confusion matrix for the results obtained using PLSR.
| Predicted class (rows) / Known class (columns) | PIG-A | GT4 |
|---|---|---|
| PIG-A | 31/43 | 12/43 |
| GT4 | - | 10/10 |
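Summary statistics can be read directly off the confusion matrix. The sketch below assumes that rows are predicted classes and columns are known classes, as the table's labels indicate, and that "-" means zero. Overall accuracy (correct predictions over all classified sequences) does not depend on that orientation, because it only uses the diagonal and the grand total. The names `CONFUSION`, `row_fractions` and `overall_accuracy` are hypothetical.

```python
from fractions import Fraction

CLASSES = ("PIG-A", "GT4")
# (predicted, known) -> count, read from the table; "-" is taken as zero (assumption).
CONFUSION = {
    ("PIG-A", "PIG-A"): 31,
    ("PIG-A", "GT4"): 12,
    ("GT4", "PIG-A"): 0,
    ("GT4", "GT4"): 10,
}


def row_fractions(matrix, classes):
    """Normalise each predicted-class row by its total, reproducing the 'n/N' entries."""
    fractions = {}
    for predicted in classes:
        total = sum(matrix[(predicted, known)] for known in classes)
        for known in classes:
            fractions[(predicted, known)] = Fraction(matrix[(predicted, known)], total)
    return fractions


def overall_accuracy(matrix, classes):
    """Diagonal (correct) count over all classified sequences."""
    correct = sum(matrix[(c, c)] for c in classes)
    return Fraction(correct, sum(matrix.values()))


acc = overall_accuracy(CONFUSION, CLASSES)
print(acc, round(float(acc), 3))  # 41/53 0.774
```

Under these assumptions, 41 of the 53 sequences are placed in their known class, and all 12 errors fall in the same off-diagonal cell.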
The thirteen variables selected as significant in the first six DCV segments.
| Segment | Selected variables |
|---|---|
| 1. | CM1, CM1a, CM1b |
| 2. | CM2c |
| 3. | CM4e, CM5e |
| 4. | CM10a, CM3 |
| 5. | CM10, CM10b, CM1 |
| 6. | CM10b, CM1b |
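The entries in this table add up to thirteen, but some variables are selected in more than one segment. A short tally makes this explicit; the dictionary below simply transcribes the table, and the function name is hypothetical.

```python
from collections import Counter

# Variables listed for each DCV segment, transcribed from the table.
SELECTED = {
    1: ["CM1", "CM1a", "CM1b"],
    2: ["CM2c"],
    3: ["CM4e", "CM5e"],
    4: ["CM10a", "CM3"],
    5: ["CM10", "CM10b", "CM1"],
    6: ["CM10b", "CM1b"],
}


def selection_counts(selected: dict[int, list[str]]) -> Counter:
    """Count how many segments selected each variable."""
    return Counter(v for variables in selected.values() for v in variables)


counts = selection_counts(SELECTED)
print(sum(counts.values()), len(counts))  # 13 10
print(sorted(v for v, n in counts.items() if n > 1))  # ['CM1', 'CM10b', 'CM1b']
```

So the thirteen selections correspond to ten distinct motif variables, with CM1, CM1b and CM10b each chosen in two segments.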