Abstract
Glycosylphosphatidylinositol-anchored high-density lipoprotein-binding protein 1 (GPIHBP1) functions as a platform and transport agent for lipoprotein lipase (LPL), which hydrolyzes chylomicrons, principally in the capillary endothelial cells of heart, skeletal muscle and adipose tissue. Previous reports of genetic deficiency for this protein have described severe chylomicronemia. Comparative GPIHBP1 amino acid sequences and structures and GPIHBP1 gene locations were examined using data from several mammalian genome projects. Mammalian GPIHBP1 genes usually contain four coding exons on the positive strand. Mammalian GPIHBP1 sequences shared 41-96% identity with one another, compared with 9-32% identity with other human LY6-domain-containing (LY6-like) proteins. The human N-glycosylation site was predominantly conserved among other mammalian GPIHBP1 proteins, except in cow, dog and pig. Sequence alignments, key amino acid residues and conserved predicted secondary structures were also examined, including the N-terminal signal peptide, the acidic amino acid sequence region which binds LPL, the glycosylphosphatidylinositol linkage group, the Ly6 domain and the C-terminal α-helix. Comparative and phylogenetic studies of mammalian GPIHBP1 suggested that it originated in eutherian mammals from a gene duplication event of an ancestral LY6-like gene and subsequent integration of exon 2, which encodes an extended acidic amino acid sequence and may have been derived from BCL11A (B-cell CLL/lymphoma 11A gene). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s13205-011-0026-4) contains supplementary material, which is available to authorized users.
Year: 2011 PMID: 22582156 PMCID: PMC3339605 DOI: 10.1007/s13205-011-0026-4
Source DB: PubMed Journal: 3 Biotech ISSN: 2190-5738 Impact factor: 2.406
Mammalian GPIHBP1 and human LY6-like genes and proteins
| Species | RefSeq ID / Ensembl ID^a | GenBank ID | UNIPROT ID | Amino acids | Chromosome location | Coding exons (strand) | Gene size (bps) | Subunit MW | Signal peptide (cleavage site) | Gene expression level^f |
|---|---|---|---|---|---|---|---|---|---|---|
| Human | NM_178172 | BC035810 | Q8IV16 | 184 | 8:144,295,143-144,297,390 | 4 (+ve) | 3,976 | 19,806 | 1-20 [RG-QT] | 0.4 |
| Chimpanzee | XP_001151889^a | –^b | –^b | 166 | 8:143,181,557-143,183,786 | 4 (+ve) | 2,230^c | 17,540 | 1-20 [RG-QT] | na |
| Orangutan | XP_002819549^a | –^b | –^b | 184 | 8:151,582,751-151,585,275 | 5 (+ve) | 2,525^c | 19,778 | 1-20 [RG-QT] | na |
| Rhesus | XP_001085384^a | –^b | –^b | 184 | 8:145,833,092-145,835,237 | 4 (+ve) | 2,146^c | 19,768 | 1-22 [QA-QQ] | na |
| Marmoset | XP_002759233^a | –^b | –^b | 182 | 16:51,444,208-51,447,357 | 4 (−ve) | 3,150^c | 19,993 | 1-22 [QA-EP] | na |
| Mouse | NM_026730 | BC061225 | Q9D1N2 | 225 | 15:75,427,109-75,428,551 | 4 (+ve) | 1,556 | 24,566 | 1-22 [WA-QE] | 0.7 |
| Rat | NM_001130547 | –^b | –^b | 236 | 7:113,538,462-113,540,137 | 4 (+ve) | 1,676^c | 25,562 | 1-22 [WA-QE] | 0.1 |
| Guinea pig | ENSCPOT2066^d | –^b | –^b | 167 | sc95:2379881-2381261^e | 4 (+ve) | 1,381^c | 18,240 | 1-22 [QA-QE] | na |
| Horse | XP_001496557^a | –^b | –^b | 176 | 9:81,888,709-81,890,489 | 4 (+ve) | 1,781^c | 19,003 | 1-20 [SG-QV] | na |
| Cow | XP_590408^a | –^b | –^b | 171 | 14:1,462,446-1,464,219 | 4 (+ve) | 1,774^c | 17,990 | 1-22 [RA-QE] | na |
| Dog | XP_851590 | –^b | –^b | 180 | 13:136,185,964-136,187,786 | 4 (+ve) | 1,482^c | 18,383 | 1-20 [RA-QD] | na |
| Pig | –^b | CF361073^d | –^b | 180 | 4:136,185,964-136,187,786 | 4 (−ve) | 1,823^c | 19,274 | 1-22 [RA-QE] | na |
| | NM_005672 | BC048808 | O46653 | 123 | 8:143,748,728-143,761,153 | 3 (+ve) | 2,268 | 12,912 | 1-20 [TA-LL] | 1.2 |
| | NM_017527 | BC117142 | Q17RY6 | 165 | 8:143,781,946-143,784,786 | 3 (+ve) | 4,054 | 18,673 | 1-17 [WT-DA] | 0.9 |
| | NM_020427 | BC105135 | P55000 | 103 | 8:143,822,564-143,823,803 | 3 (−ve) | 1,467 | 11,186 | 1-22 [EA-LK] | 0.1 |
| | NM_205545 | BC119019 | Q6UXB3 | 125 | 8:143,831,704-143,833,869 | 3 (−ve) | 2,234 | 13,115 | 1-22 [PA-LR] | 0.1 |
| | NM_177476 | BC032036 | Q9BZG9 | 116 | 8:143,856,588-143,857,375 | 3 (−ve) | 5,823 | 12,641 | 1-20 [QA-LD] | 1.8 |
| | NM_003695 | BC031330 | B2R5F1 | 128 | 8:143,865,011-143,863,294 | 3 (−ve) | 1,711 | 13,286 | 1-20 [LT-LR] | 0.6 |
| | NM_002066 | BC126336 | Q99445 | 158 | 8:143,916,217-143,928,261 | 3 (+ve) | 6,250 | 17,730 | 1-17 [AA-SA] | <0.1 |
| | NM_001127213 | BC119708 | Q16553 | 131 | 8:144,102,357-144,103,203 | 3 (+ve) | 3,926 | 13,507 | 1-20 [SS-LM] | 4.3 |
| | NM_002347 | BC030192 | B2RAD2 | 140 | 8:144,239,670-144,241,065 | 3 (−ve) | 2,126 | 14,669 | 1-25 [HG-LW] | 0.7 |
| | NM_178172 | BC035810 | Q8IV16 | 184 | 8:144,295,143-144,297,390 | 4 (+ve) | 3,976 | 19,806 | 1-20 [RG-QT] | 0.4 |
GenBank IDs were derived from NCBI (http://www.ncbi.nlm.nih.gov/genbank/); Ensembl IDs were derived from the Ensembl genome database (http://www.ensembl.org); UNIPROT refers to UniProtKB/Swiss-Prot IDs for individual proteins (see http://kr.expasy.org); bps refers to base pairs of nucleotide sequences; the number of coding exons and the predicted N-terminal signal peptide cleavage site are listed
RefSeq: the reference amino acid sequence
a, c: predicted Ensembl amino acid sequence
b: not available
d: refers to an expressed sequence tag (EST) sequence encoding pig GPIHBP1
e: guinea pig scaffold
f: from AceView http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/
Mouse, cow, opossum and zebrafish LY6-like genes and proteins
| Species | RefSeq ID / Ensembl ID^a | GenBank ID | UNIPROT ID | Amino acids | Chromosome location | Coding exons (strand) | Gene size (bps) |
|---|---|---|---|---|---|---|---|
| Mouse | NP_082492^a | BC110462 | Q9D7U0 | 123 | 15:74,545,285-74,547,024 | 3 (+ve) | 2,231 |
| Mouse | NM_020519 | BC125244 | Q9Z0K7 | 110 | 15:74,558,464-74,554,039 | 3 (−ve) | 1,383 |
| Mouse | NM_026671 | BC132407 | Q9DD23 | 127 | 15:74,564,759-74,562,671 | 3 (−ve) | 1,951 |
| Mouse | NP_035968^a | BC037541 | Q9WVC2 | 116 | 15:74,583,477-74,578,272 | 3 (−ve) | 673 |
| Mouse | NP_034872^a | BC022806 | Q14210 | 127 | 15:74,592,789-74,593,990 | 3 (−ve) | 1,202 |
| Mouse | EDL29447^a | BC055822 | na | 119 | 15:74,602,554-74,609,268 | 3 (−ve) | 6,715 |
| Mouse | NM_029627 | BC049723 | Q9CWP4 | 154 | 15:74,630,417-74,627,298 | 3 (−ve) | 2,510 |
| Mouse | NM_025929 | BC116397 | Q9CQ11 | 111 | 15:74,710,281-74,712,095 | 3 (−ve) | 1,815 |
| Mouse | NM_008529 | BC002116 | Q99JA5 | 136 | 15:74,785,480-74,790,336 | 3 (+ve) | 903 |
| Mouse | NM_020498 | BC125390 | Q9WU67 | 134 | 15:74,810,347-74,813,489 | 3 (−ve) | 3,143 |
| Mouse | NM_010738 | BC002070 | P05533 | 134 | 15:74,825,695-74,828,034 | 3 (−ve) | 2,340 |
| Mouse | NM_010741 | BC010760 | Q91XG0 | 131 | 15:74,875,445-74,879,260 | 3 (−ve) | 3,107 |
| Mouse | NM_001099217 | BC092082 | P09568 | 131 | 15:74,938,976-74,942,097 | 3 (−ve) | 3,122 |
| Mouse | NM_008530 | BC152856 | P35460 | 134 | 15:75,099,160-75,102,277 | 3 (+ve) | 3,118 |
| Mouse | NM_011837 | BC028758 | Q8K356 | 139 | 15:75,397,918-75,381,698 | 3 (−ve) | 1,090 |
| Cow | XP_002692640^a | na | na | 126 | 14:1,122,649-1,123,776 | 4 (−ve) | 1,128 |
| Cow | XP_001256661^a | na | na | 128 | 14:1,127,319-1,129,201 | 3 (−ve) | 1,883 |
| Cow | NP_001039686^a | na | na | 116 | 14:1,141,889-1,142,634 | 3 (−ve) | 746 |
| Cow | NP_001069985^a | na | na | 116 | 14:1,155,662-1,156,820 | 3 (−ve) | 1,159 |
| Cow | DAA72886^a | na | na | 154 | 14:1,219,764-1,228,363 | 4 (−ve) | 8,600 |
| Cow | NP_001070493^a | na | na | 154 | 14:1,258,190-1,265,326 | 3 (−ve) | 7,137 |
| Cow | NP_001039535^a | na | na | 130 | 14:1,387,941-1,388,689 | 3 (+ve) | 749 |
| Cow | NP_001073104^a | na | na | 140 | 14:1,449,309-1,450,750 | 3 (−ve) | 1,442 |
| Opossum | XP_001381780^a | na | na | 132 | 3:428,776,701-428,782,624 | 3 (−ve) | 5,924 |
| Opossum | XP_001381786^a | na | na | 237 | 3:428,806,270-428,813,247 | 4 (−ve) | 6,978 |
| Opossum | XP_001381791^a | na | na | 162 | 3:428,858,073-428,880,937 | 3 (−ve) | 22,865 |
| Opossum | XP_001381798^a | na | na | 120 | 3:428,958,198-428,965,110 | 3 (−ve) | 6,913 |
| Opossum | XP_001381801^a | na | na | 117 | 3:428,986,162-428,995,118 | 3 (−ve) | 8,957 |
| Opossum | XP_001373482^a | na | na | 126 | 3:439,197,554-439,200,063 | 3 (+ve) | 2,510 |
| Opossum | XP_001373600^a | na | na | 141 | 3:439,414,602-439,420,686 | 3 (+ve) | 6,085 |
| Zebrafish | NM_001004670 | BC081426 | Q66IA6 | 174 | 9:24,104,899-24,151,695 | 4 (−ve) | 46,797 |
Ensembl IDs were derived from the Ensembl genome database (http://www.ensembl.org); UNIPROT refers to UniProtKB/Swiss-Prot IDs for individual proteins (see http://kr.expasy.org); bps refers to base pairs of nucleotide sequences; the number of coding exons is listed
RefSeq: the reference amino acid sequence
a: predicted Ensembl amino acid sequence
Vertebrate BCL11A genes and proteins
| BCL11A gene (species) | RefSeq ID / Ensembl ID^a | GenBank ID | UNIPROT ID | Amino acids | Chromosome location | Coding exons (strand) | Gene size (bps) |
|---|---|---|---|---|---|---|---|
| Human | NM_018014 | BC021098 | Q9H165 | 773 | 2:60,678,303-60,780,633 | 5 (−ve) | 102,331 |
| Orangutan | XP_002812058^a | na | na | 808 | 2:50,366,387-50,465,154 | 6 (+ve) | 98,768 |
| Marmoset | XP_002757779^a | na | na | 808 | 14:46,690,157-46,792,384 | 6 (+ve) | 102,228 |
| Mouse | NM_016707 | BC010585 | Q9QYE3 | 773 | 11:23,978,391-24,072,787 | 5 (+ve) | 94,397 |
| Pig | XP_003125157^a | AK231444 | na | 773 | 3:74,933,998-75,031,771 | 5 (+ve) | 97,774 |
| Rabbit | XP_002709742^a | na | na | 821 | 2:125,646,621-125,730,521 | 4 (+ve) | 83,901 |
| Dog | XP_865536^a | na | na | 773 | 10:63,737,516-63,836,852 | 5 (−ve) | 99,337 |
| Chicken | NM_001031031 | AJ551441 | Q5F459 | 796 | 3:1,829,458-1,877,784 | 3 (−ve) | 48,237 |
| Lizard | XP_003216184^a | na | na | 796 | 276:252,030-507,710^b | 3 (+ve) | 255,681 |
| Zebrafish | NP_001035481^a | na | A2BE84 | 829 | 13:26,077,202-26,148,770 | 3 (+ve) | 71,569 |
BCL11A refers to the gene encoding vertebrate B-cell CLL/lymphoma 11A sequences
Ensembl IDs were derived from the Ensembl genome database (http://www.ensembl.org); UNIPROT refers to UniProtKB/Swiss-Prot IDs for individual proteins (see http://kr.expasy.org); bps refers to base pairs of nucleotide sequences; the number of coding exons is listed
RefSeq: the reference amino acid sequence
a: predicted Ensembl amino acid sequence
b: refers to scaffold ID
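The gene sizes listed for BCL11A can be related to the chromosome coordinates in the same table. The following is a minimal sketch, assuming that the listed coordinates are inclusive (so size = end − start + 1) and that the location string carries no footnote marker; the function name `gene_size_bp` is hypothetical and not taken from the source. The human and mouse rows are consistent with this convention.

```python
def gene_size_bp(location: str) -> int:
    """Return the span, in base pairs, of a 'chrom:start-end' location string.

    Assumes inclusive coordinates with comma thousands separators, as the
    values appear in the table. The order of start and end is ignored.
    """
    _chrom, span = location.split(":", 1)
    start_text, end_text = span.split("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return abs(end - start) + 1


# Human and mouse BCL11A coordinates as listed in the table
print(gene_size_bp("2:60,678,303-60,780,633"))
print(gene_size_bp("11:23,978,391-24,072,787"))
```

Taking the absolute difference lets the same helper handle entries whose coordinates are written in descending order.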
Fig. 1 Amino acid sequence alignments for mammalian GPIHBP1 and human LY6-like sequences. See Table 1 for sources of glycosylphosphatidylinositol-anchored high-density lipoprotein-binding protein 1 (GPIHBP1) and human LY6-like sequences: GPIHBP1: Hu human, Or orangutan, Rh rhesus, Ma marmoset, Ho horse, Co cow, Mo mouse, Ra rat; human LY6-like: 6D-LY6D; 6E-LY6E; 6D2-LY6D2; 6H-LY6H; 6K-LY6K; 6NX-LY6NX. Asterisks show identical residues for proteins, colons similar alternate residues and dots dissimilar alternate residues. Residues predicted for involvement in N-signal peptide formation are shown in red; N-glycosylated and potential N-glycosylated Asn sites are in green bold; key GPIHBP1 functional residues 56Gly and 114Gln are in shaded pink; predicted disulfide bond Cys residues are shown; α-helices predicted for GPIHBP1 are in shaded yellow; β-sheets (β1–β5) predicted for mammalian GPIHBP1 or for human LY6-like sequences are in shaded grey; bold underlined font shows residues corresponding to known or predicted exon start sites. Exon numbers refer to GPIHBP1 human gene exons; the sequences for the UPAR/Ly6 domain are shown; the C-terminal hydrophobic amino acid segment is shown in shaded green; known (human and mouse) or predicted mammalian GPIHBP1 and human LY6-like GPI-binding sites are shown in shaded blue
Percentage identities for mammalian GPIHBP1 amino acid sequences and the human LY6-like amino acid sequences
| GPIHBP1 | Human | Orangutan | Rhesus | Marmoset | Mouse | Rat | Guinea pig | Dog | Pig | Cow | Horse | Human LY6D | Human LY6E | Human LY6H | Human LY6K | Human LYPD2 | Human LYNX1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 100 | 96 | 84 | 80 | 54 | 52 | 53 | 60 | 53 | 54 | 59 | 14 | 27 | 26 | 10 | 24 | 13 |
| Orangutan | 96 | 100 | 85 | 84 | 54 | 53 | 53 | 61 | 54 | 55 | 60 | 17 | 24 | 23 | 10 | 24 | 13 |
| Rhesus | 84 | 85 | 100 | 73 | 53 | 53 | 51 | 57 | 50 | 49 | 59 | 10 | 30 | 25 | 13 | 24 | 12 |
| Marmoset | 80 | 84 | 73 | 100 | 46 | 50 | 51 | 57 | 51 | 46 | 57 | 18 | 27 | 22 | 9 | 23 | 12 |
| Mouse | 54 | 54 | 53 | 46 | 100 | 82 | 63 | 53 | 51 | 50 | 55 | 14 | 20 | 22 | 9 | 25 | 14 |
| Rat | 52 | 53 | 53 | 50 | 82 | 100 | 61 | 51 | 49 | 52 | 51 | 15 | 20 | 21 | 13 | 15 | 25 |
| Guinea pig | 53 | 53 | 51 | 51 | 63 | 61 | 100 | 41 | 41 | 43 | 48 | 9 | 16 | 17 | 6 | 18 | 16 |
| Dog | 60 | 61 | 57 | 57 | 53 | 51 | 41 | 100 | 56 | 56 | 64 | 16 | 31 | 26 | 18 | 28 | 18 |
| Pig | 53 | 54 | 50 | 51 | 51 | 49 | 41 | 56 | 100 | 65 | 56 | 20 | 26 | 25 | 17 | 22 | 15 |
| Cow | 54 | 55 | 49 | 46 | 50 | 52 | 43 | 56 | 65 | 100 | 60 | 16 | 25 | 25 | 20 | 21 | 14 |
| Horse | 59 | 60 | 59 | 57 | 55 | 51 | 48 | 64 | 56 | 60 | 100 | 20 | 29 | 25 | 8 | 24 | 16 |
| Human LY6D | 14 | 17 | 10 | 18 | 14 | 15 | 9 | 16 | 20 | 16 | 20 | 100 | 25 | 30 | 14 | 32 | 28 |
| Human LY6E | 27 | 24 | 30 | 27 | 20 | 20 | 16 | 31 | 26 | 25 | 29 | 25 | 100 | 32 | 19 | 17 | 32 |
| Human LY6H | 26 | 23 | 25 | 22 | 22 | 21 | 17 | 26 | 25 | 25 | 25 | 30 | 32 | 100 | 16 | 28 | 25 |
| Human LY6K | 10 | 10 | 13 | 9 | 9 | 13 | 6 | 18 | 17 | 20 | 8 | 14 | 19 | 16 | 100 | 22 | 18 |
| Human LYPD2 | 24 | 24 | 24 | 23 | 25 | 15 | 18 | 28 | 22 | 21 | 24 | 32 | 17 | 28 | 22 | 100 | 31 |
| Human LYNX1 | 13 | 13 | 12 | 12 | 14 | 25 | 16 | 18 | 15 | 14 | 16 | 28 | 32 | 25 | 18 | 31 | 100 |
Numbers show the percentage of amino acid sequence identities
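As an illustration of how a pairwise percentage identity of this kind can be computed from two aligned sequences, here is a minimal sketch. The source does not state how gapped positions were treated, so excluding any column with a gap in either sequence is an assumption, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned columns without gaps.

    Both inputs must already be aligned to the same length. Columns with a
    gap character in either sequence are skipped (an assumption, see above).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned fragments, not real GPIHBP1 sequences
print(percent_identity("ACDE-G", "ACDQFG"))
```

A different denominator (for example the length of the shorter sequence) would give different values, so the choice should be matched to the alignment tool being reproduced.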
Predicted N-glycosylation sites for mammalian GPIHBP1 sequences
| Mammalian GPIHBP1 | Species | Site 1 | Site 1 potential | Site 2 | Site 2 potential | Site 3 | Site 3 potential | Site 4 | Site 4 potential | No. of potential sites |
|---|---|---|---|---|---|---|---|---|---|---|
| Human |
|
|
|
|
| 2 | ||||
| Orangutan |
|
|
|
|
| 2 | ||||
| Rhesus |
|
|
| 1 | ||||||
| Marmoset |
|
|
| 1 | ||||||
| Mouse |
|
|
| 1 | ||||||
| Rat |
|
|
| 1 | ||||||
| Guinea Pig |
| 76NQTE | NP | 150NGTT | NP | 0 | ||||
| Horse |
|
|
|
|
| 2 | ||||
| Cow |
| 0 | ||||||||
| Dog |
| 0 | ||||||||
| Pig |
| 0 |
Predicted N-glycosylation sites were identified using NetNGlyc 1.0 web tools (http://www.cbs.dtu.dk/services/NetNGlyc/)³², potential for N-glycosylation sites was determined by the web tools (maximum level of 1)
Bold values designate high probability of forming an N-glycosylation site
N Asparagine, L leucine, Q glutamine, T threonine, C cysteine, R arginine, E glutamate, H histidine, V valine, NP no prediction for an N-glycosylation site
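NetNGlyc scores candidate sites with a trained predictor, which is not reproduced here. As a simpler, related illustration, the sketch below only lists candidate N-glycosylation sequons, that is, Asn-X-Ser/Thr motifs where X is any residue except proline, and labels them in the position-plus-four-residues style used in the table (for example 76NQTE). It assigns no potential score. The function name `find_sequons` and the test sequence are hypothetical.

```python
import re
from typing import List

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead allows overlapping candidate sites to be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> List[str]:
    """Return labels such as '76NQTE': 1-based Asn position plus 4 residues."""
    labels = []
    for match in SEQUON.finditer(protein.upper()):
        start = match.start()
        labels.append(f"{start + 1}{protein[start:start + 4].upper()}")
    return labels


# Hypothetical peptide, not a real GPIHBP1 sequence
print(find_sequons("MANQTEKNPSALNGTT"))
```

In this toy peptide the Asn followed by proline is skipped, which mirrors the usual sequon rule; a motif match is only a necessary condition and says nothing about whether the site is actually glycosylated.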
Fig. 2 Predicted tertiary structures for the UPAR/Ly6 domain for human, rat, guinea pig and pig GPIHBP1. Predicted GPIHBP1 tertiary structures were obtained using SWISS MODEL methods; the rainbow color code describes the tertiary structures from the N- (blue) to C-termini (red) for human, rat, guinea pig and pig GPIHBP1 UPAR/Ly6 domains; arrows indicate the directions for β-sheets
Fig. 3 Comparative gene clusters for mammalian LY6-like genes. LY6-like gene clusters are identified with the size of the cluster (in kilobases) in each case. Individual LY6-like genes were identified and positioned using data summarized in Tables 1 and 2. The arrow shows the direction for transcription: a right arrow indicates the positive strand; a left arrow indicates the negative strand. Note the absence of an identified GPIHBP1 gene on the opossum genome
Fig. 4 Gene and mRNA structures for the human, mouse and rat GPIHBP1 genes. Derived from the AceView website http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/ (Thierry-Mieg and Thierry-Mieg 2006); mature isoform variants (a) are shown with capped 5′- and 3′-ends for the predicted mRNA sequences. NM refers to the NCBI reference sequence. Exons are in shaded pink; untranslated 5′- and 3′-sequences are in open pink; introns are represented as pink lines joining exons; the directions for transcription are shown as 5′→3′; sizes of mRNA sequences are shown in kilobases (kb)
Fig. 5 Comparative sequences for mammalian 5′-flanking, 5′-untranslated and coding regions for the GPIHBP1 genes. Derived from the UCSC Genome Browser using the Comparative Genomics track to examine alignments and evolutionary conservation of GPIHBP1 gene sequences; genomic sequences aligned for this study included primate (human, orangutan, rhesus and marmoset), non-primate eutherian mammal (mouse, rat, guinea pig, dog, horse and cow), a marsupial (opossum), a monotreme (platypus) and bird species (chicken); conservation measures were based on conserved sequences across all of these species in the alignments, which included the 5′-flanking, 5′-untranslated, exons, introns and 3′-untranslated regions for the GPIHBP1 gene; regions of sequence identity are shaded in different colors for different species
Fig. 6 Phylogenetic tree of mammalian GPIHBP1 and other LY6-like sequences. The tree is labeled with the gene name and the name of the animal and is ‘rooted’ with the zebrafish (Danio rerio) LY6PD sequence. Note the major cluster for the mammalian GPIHBP1 sequences and several major groups of the other LY6-like sequences: LYNX1, LY6D, LY6H, SLURP1, LYPD2, PSCA, LY6E, LY6K, and GML. A genetic distance scale is shown (% amino acid substitutions). The number of times a clade (sequences common to a node or branch) occurred in the bootstrap replicates is shown. With 100 bootstrap replicates performed in each case, only replicate values of 90 or more, which are highly significant, are shown
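The bootstrap values in the legend count how often a clade recurs across replicate trees. The tree-building program is not described here, so the sketch below covers only the two generic steps around it: resampling alignment columns with replacement to make one replicate, and converting per-replicate clade sets into a support percentage. The function names and toy data are hypothetical, and the tree inference step between the two is deliberately left out.

```python
import random
from typing import Dict, FrozenSet, List, Set


def resample_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one bootstrap replicate by sampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column picks are applied to every sequence
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def support_percent(replicate_clades: List[Set[FrozenSet[str]]], clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (given as sets of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Toy alignment, not real sequence data
toy = {"seqA": "ABCD", "seqB": "abcd"}
replicate = resample_columns(toy, random.Random(0))
print(len(replicate["seqA"]), replicate["seqB"] == replicate["seqA"].lower())
```

With 100 replicates, a clade reported at 90 or more under this scheme would have appeared in at least 90 of the replicate trees.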
Fig. 7 Proposal for generating the GPIHBP1 gene during eutherian mammalian evolution. This hypothesis is for a two-step process for generating the GPIHBP1 gene: (1) a LY6-like gene duplication event in a common ancestor for eutherian mammals; and (2) retroviral transfer of a region of the BCL11A gene in the ancestral genome encoding acidic amino acids, generating a GPIHBP1-like gene containing a new exon