Abstract
Iduronate 2-sulfatase (IDS) is responsible for the lysosomal degradation of heparan sulfate and dermatan sulfate, and IDS deficiency causes an X-linked lysosomal storage disease, mucopolysaccharidosis 2 (MPS2), which results in neurological damage and early death. IDS amino acid sequences, protein structures and gene locations were compared using data from several vertebrate genome projects. Vertebrate IDS sequences shared 60-99% identity with each other, and human IDS showed 47% sequence identity with fruit fly (Drosophila melanogaster) IDS. Sequence alignments, key amino acid residues, N-glycosylation sites and conserved predicted secondary and tertiary structures were also studied, including the signal peptide, propeptide and active site residues. Mammalian IDS genes usually contained 9 coding exons. The human IDS gene promoter contained a large CpG island (CpG46) and 5 transcription factor binding sites, and the 3'-UTR region contained 5 miRNA target sites. These features may contribute to the regulation of IDS gene expression in the brain and other neural tissues. An IDS pseudogene (IDSP1) was located proximally to the IDS gene on the X chromosome in primate genomes. Phylogenetic analyses examined the relationships and potential evolutionary origins of the vertebrate IDS gene. These suggest that IDS originated in an invertebrate ancestral genome, was retained throughout vertebrate evolution and has been conserved on marsupial and eutherian X chromosomes, with the exception of rat Ids, which is located on chromosome 8.
Keywords: Amino acid sequence; Evolution; IDS; IDS gene regulation; Iduronate 2-sulfatase; Vertebrates; X-chromosome
Year: 2017 PMID: 28401457 PMCID: PMC5388652 DOI: 10.1007/s13205-016-0595-3
Source DB: PubMed Journal: 3 Biotech ISSN: 2190-5738 Impact factor: 2.893
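The abstract reports pairwise identities (60-99% among vertebrates, 47% between human and fruit fly IDS). The record does not say how identity was computed, so the sketch below assumes one common definition: identical residues divided by gap-free aligned columns, for two sequences that are already aligned. The function name and the toy fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (assumed definition).

    Columns where either sequence carries a gap ('-') are skipped; identity is
    identical columns divided by compared columns, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted under this definition
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy fragments, not real IDS sequence: 7 of 9 columns match.
    print(round(percent_identity("ACDEFGH-IK", "ACDQFGHLIR"), 1))  # 77.8
```

Other conventions (for example, dividing by the length of the shorter sequence) give different numbers, so values computed this way need not match the tables exactly.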
Vertebrate IDS Proteins
| Species | UNIPROT ID | Amino acids | Subunit MW (Da) | pI | N-glycosylation sites | Signal peptide | % Identity with human IDS |
|---|---|---|---|---|---|---|---|
| Human | P22304 | 550 | 61,873 | 5.2 | 115, 144, 246, 280, 325, 513, 537 | 1..25 | 100 |
| Chimpanzee | na | 550 | 61,861 | 5.2 | 115, 144, 246, 280, 325, 513, 537 | 1..25 | 99 |
| Orangutan | H2PX10 | 550 | 62,083 | 5.4 | 115, 144, 246, 280, 325, 513, 537 | 1..25 | 96 |
| Baboon | na | 550 | 61,885 | 5.1 | 115, 144, 246, 280, 325, 513, 537 | 1..25 | 96 |
| Marmoset | F7EJG2 | 550 | 61,812 | 5.4 | 115, 144, 246, 280, 325, 513, 537 | 1..25 | 94 |
| Mouse | Q08890 | 552 | 62,186 | 5.5 | 117, 146, 248, 282, 515, 539 | 1..29 | 86 |
| Rat | Q32KJ4 | 543 | 62,370 | 5.5 | 117, 146, 248, 181, 515, 539 | 1..20 | 85 |
| Cow | F1N2D5 | 547 | 61,389 | 5.8 | 112, 141, 243, 277, 509, 533 | 1..20 | 82 |
| Sheep | W5PI67 | 547 | 61,019 | 5.6 | 112, 141, 243, 277, 510, 534 | 1..20 | 82 |
| Opossum | F7DJA1 | 558 | 63,374 | 5.3 | 129, 260, 294, 339, 457, 524, 552 | 1..23 | 75 |
| Tasmanian devil | na | 539 | 61,392 | 5.3 | 111, 140, 242, 276, 321, 505, 509, 533 | 1..22 | 74 |
| Chicken | F1NFI0 | 601 | 68,047 | 6.6 | 156, 185, 287, 321, 366, 584 | na | 67 |
| Lizard | H9GGQ8 | 524 | 59,239 | 5.8 | 92, 121, 223, 257, 478, 507 | na | 63 |
| Frog | A8WGX6 | 542 | 61,858 | 6.1 | 112, 141, 243, 277, 322 | 1..18 | 66 |
| Zebra fish | A1A5V0 | 561 | 63,771 | 7.7 | 109, 138, 181, 240, 274, 499 | 1..25 | 60 |
| Fruit Fly | na | 502 | 57,760 | 7.3 | 93, 12, 22, 22, 22, 82, 60, 400 | na | 47 |
UNIPROT refers to UniProtKB/Swiss-Prot IDs for individual IDS proteins (see http://kr.expasy.org); MW refers to molecular weight; pI refers to theoretical isoelectric points; na, not available
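The MW and theoretical pI columns are sequence-derived properties. As an illustration only, a minimal sketch: MW as the sum of average residue masses plus one water, and pI found by bisection on the Henderson-Hasselbalch net charge. The pKa set below is an assumption (an EMBOSS-style set); ExPASy ProtParam uses a different scale, so this sketch will not reproduce the table values exactly, and glycosylation is ignored. All names are hypothetical.

```python
# Average residue masses in daltons (residue = amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini

# Assumed pKa values (EMBOSS-style); not the scale used for the table.
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def average_mass(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = seq.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - pka)) for aa, pka in PKA_BASIC.items())
    negative = 1 / (1 + 10 ** (PKA_C_TERM - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (pka - ph)) for aa, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH where net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still net positive, so the pI lies above mid
        else:
            high = mid
    return (low + high) / 2
```

Bisection is used because net charge decreases monotonically with pH, so a single sign change brackets the pI.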
Vertebrate IDS Genes
| Species | RefSeq ID | GenBank ID | Chromosome location | Coding exons (strand) | Gene size (bps) |
|---|---|---|---|---|---|
| Human | NM_000202 | BC006170 | X:149,482,749–149,505,137 | 9 (−ve) | 22,389 |
| Human IDSP1 (pseudogene) | na | na | X:149,525,002–149,525,923 | na | 922 |
| Chimpanzee | XP_016799854 | na | X:150,217,197–150,239,595 | 9 (−ve) | 22,399 |
| Orangutan | XP_002832265 | na | X:149,468,629–149,491,886 | 9 (−ve) | 23,258 |
| Baboon | XP_003918436 | na | X:137,241,259–137,263,520 | 9 (−ve) | 22,262 |
| Marmoset | XP_002763402 | na | X:136,661,096–136,690,421 | 9 (−ve) | 29,326 |
| Mouse | NM_010498 | BN000750 | X:70,346,204–70,364,903 | 9 (−ve) | 18,700 |
| Rat | XP_017451660 | BN000743 |  | 9 (−ve) | 16,055 |
| Cow | NM_001192851 | na | X:32,309,006–32,324,359 | 9 (−ve) | 15,354 |
| Sheep | XP_012016345 | na | X:81,295,118–81,310,976 | 9 (+ve) | 15,859 |
| Opossum | XP_007507328 | na | X:38,769,936–38,797,831 | 9 (−ve) | 27,896 |
| Tasmanian devil | XP_012408735 | na | X_GL867598:1,290,327–1,307,074 | 9 (−ve) | 16,748 |
| Chicken | XP_015133789 | na | 4:18,031,638–18,046,283 | 9 (+ve) | 14,646 |
| Lizard | XP_016851828 | na | GL343310:1,066,926–1,092,995 | 8 (+ve) | 26,070 |
| Frog | NM_001197132 | BC154891 | KB021658:33,136,298–33,145,211 | 9 (+ve) | 8914 |
| Zebra fish | NM_001080068 | BC128823 | 14:20,572,602–20,594,434 | 8 (−ve) | 21,833 |
| Fruit Fly | NM_139557 | AAY55004 | 3L:3,378,315–3,380,000 | 4 (+ve) | 1686 |
GenBank IDs are derived from NCBI (http://www.ncbi.nlm.nih.gov/genbank/); GL and KB refer to genome scaffolds; bps refers to base pairs of nucleotide sequence; the number of coding exons and the gene strand are listed
RefSeq, NCBI reference sequence; XP, predicted sequence; na, not available
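The gene sizes in this table are consistent with inclusive, 1-based coordinates, so size = end - start + 1; for human, 149,505,137 - 149,482,749 + 1 = 22,389. A small sketch that parses the location strings as written (commas, hyphen or en dash, scaffold names) and recomputes the size; the function names are hypothetical.

```python
from __future__ import annotations

import re


def parse_location(loc: str) -> tuple[str, int, int]:
    """Split 'seqname:start–end' (commas allowed, hyphen or en dash) into parts."""
    seqname, span = loc.rsplit(":", 1)  # coordinates follow the last colon
    start_text, end_text = re.split(r"[-\u2013]", span)
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    return seqname, start, end


def span_size(loc: str) -> int:
    """Length of an inclusive, 1-based interval: end - start + 1."""
    _, start, end = parse_location(loc)
    return end - start + 1


if __name__ == "__main__":
    rows = {
        "Human": ("X:149,482,749–149,505,137", 22_389),
        "Tasmanian devil": ("X_GL867598:1,290,327–1,307,074", 16_748),
        "Fruit Fly": ("3L:3,378,315–3,380,000", 1_686),
    }
    for species, (loc, reported) in rows.items():
        # Prints the recomputed size and whether it matches the table.
        print(species, span_size(loc), span_size(loc) == reported)
```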
Fig. 1 Amino acid sequence alignments for vertebrate IDS sequences. See Table 1 for sources of IDS sequences; asterisk shows identical residues for IDS subunits; colon, similar alternate residues; dot, dissimilar alternate residues; predicted phosphoresidues are in pink; predicted N-glycosylated Asn sites are in green; the active site residues (for human IDS) are shown in blue; the active site residue subject to modification is shown as A; predicted α-helices for human IDS are shaded yellow and numbered in sequence; predicted β-sheets are shaded gray and also numbered in sequence from the N-terminus; bold underlined font shows residues corresponding to known or predicted exon start sites; exon numbers refer to human IDS gene exons; the leader peptide is in brown; the propeptide is in red
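The asterisk, colon and dot symbols match the Clustal convention for alignment conservation lines. A minimal sketch of how such a line can be derived column by column, assuming the strong and weak residue groups listed in the Clustal documentation and treating any column that contains a gap as unmarked; the function names are hypothetical and this is not the program used to build the figure.

```python
from __future__ import annotations

# Residue groups as listed in the Clustal documentation (assumed here):
# ':' = all residues fall in one "strong" group, '.' = in one "weak" group.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Conservation symbol for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: gapped columns are left unmarked
    if len(residues) == 1:
        return "*"  # identical residues
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # strongly similar residues
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # weakly similar residues
    return " "


def conservation_line(aligned: list[str]) -> str:
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_symbol("".join(seq[i] for seq in aligned)) for i in range(length))


if __name__ == "__main__":
    print(repr(conservation_line(["ASCW", "ATSG"])))  # '*:. '
```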
Fig. 2 Amino acid sequence alignments for mammalian IDS N-terminal sequences. See Table 1 for sources of IDS sequences; asterisk shows identical residues for IDS subunits; colon, similar alternate residues; dot, dissimilar alternate residues; the active site residues (for human IDS) are shown in blue; the leader peptide is in brown; the propeptide is in red; bold underlined font shows residues corresponding to known or predicted exon start sites; exon numbers refer to human IDS gene exons
Fig. 3 Predicted tertiary structure for human IDS. The predicted structure for human IDS is based on the reported structure for human arylsulfatase A (ARSA) (Chruszcz et al. 2003) and was obtained using the SWISS-MODEL web site based on PDB 1N2KA (http://swissmodel.expasy.org/workspace/). The rainbow color code traces the 3-D structure from the N-terminus (blue) to the C-terminus (red) for residues 35–549 of human IDS; predicted α-helices, β-sheets, the proposed active site cleft, and the N- and C-termini are shown
Fig. 4 Tissue expression for human IDS. RNA-seq gene expression profiles for human IDS across 53 selected tissues (or tissue segments) were obtained from a public database, based on expression levels for 175 individuals (data source: GTEx Analysis Release V6p, dbGaP accession phs000424.v6.p1; http://www.gtex.org). Tissues: 1. Adipose-Subcutaneous; 2. Adipose-Visceral (Omentum); 3. Adrenal gland; 4. Artery-Aorta; 5. Artery-Coronary; 6. Artery-Tibial; 7. Bladder; 8. Brain-Amygdala; 9. Brain-Anterior cingulate Cortex (BA24); 10. Brain-Caudate (basal ganglia); 11. Brain-Cerebellar Hemisphere; 12. Brain-Cerebellum; 13. Brain-Cortex; 14. Brain-Frontal Cortex; 15. Brain-Hippocampus; 16. Brain-Hypothalamus; 17. Brain-Nucleus accumbens (basal ganglia); 18. Brain-Putamen (basal ganglia); 19. Brain-Spinal Cord (cervical c-1); 20. Brain-Substantia nigra; 21. Breast-Mammary Tissue; 22. Cells-EBV-transformed lymphocytes; 23. Cells-Transformed fibroblasts; 24. Cervix-Ectocervix; 25. Cervix-Endocervix; 26. Colon-Sigmoid; 27. Colon-Transverse; 28. Esophagus-Gastroesophageal Junction; 29. Esophagus-Mucosa; 30. Esophagus-Muscularis; 31. Fallopian Tube; 32. Heart-Atrial Appendage; 33. Heart-Left Ventricle; 34. Kidney-Cortex; 35. Liver; 36. Lung; 37. Minor Salivary Gland; 38. Muscle-Skeletal; 39. Nerve-Tibial; 40. Ovary; 41. Pancreas; 42. Pituitary; 43. Prostate; 44. Skin-Not Sun Exposed (Suprapubic); 45. Skin-Sun Exposed (Lower leg); 46. Small Intestine-Terminal Ileum; 47. Spleen; 48. Stomach; 49. Testis; 50. Thyroid; 51. Uterus; 52. Vagina; 53. Whole Blood
Fig. 5 Gene structure and major gene transcripts for the human IDS gene. Derived from the AceView website http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/ (Thierry-Mieg and Thierry-Mieg 2006); shown with capped 5′- and 3′-ends for the predicted mRNA sequences; NM refers to the NCBI reference sequence; coding exons are in pink; the direction of transcription is shown as 5′ → 3′; a large CpG46 island at the gene promoter is shown (see Table 4 for details of CpG islands for human and other vertebrate IDS gene promoters); 5 predicted transcription factor binding sites (TFBS) for human IDS are shown (see Table 1s for details); 5 predicted miRNA target sites were identified within the extended 3′-UTR region of human IDSa and IDSb transcripts
Identification of transcription factor binding sites (TFBS) within the human IDS gene promoter
| TFBS | Strand | Chromosome position | Function/role | Sequence | UNIPROT ID |
|---|---|---|---|---|---|
| BACH2 | (+ve) | X:148,585,129–139 | Binds to Maf recognition elements |  | Q9BYV9 |
| AP1 | (−ve) | X:148,585,128–140 | Regulating cells forming the skeleton |  | P01101 |
| NFE2 | (+ve) | X:148,585,128–138 | Regulating erythroid maturation | AGCTGAGTCAT | Q16621 |
| BACH1 | (+ve) | X:148,585,127–141 | Coordinates transcription by MAFK | TAGCTGAGTCATGCA | O14867 |
| XBP1 | (+ve) | X:148,584,868–884 | Regulation during ER stress | ATGGTCACATAGCCATT | P17861 |
TFBS within the IDS promoter region were identified using the human genome browser (http://genome.ucsc.edu); UNIPROT refers to UniProtKB/Swiss-Prot IDs for the individual transcription factors (see http://kr.expasy.org); ER refers to endoplasmic reticulum
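The NFE2 and BACH1 site sequences above both contain TGAGTCA, which fits the commonly cited AP-1 consensus TGA(C/G)TCA, written TGASTCA in IUPAC codes. As an illustration only (the sites in the table came from the UCSC browser, not from this code), a sketch that scans a DNA string on both strands for an IUPAC motif and reports 0-based plus-strand start offsets; the names are hypothetical.

```python
from __future__ import annotations

import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[str, int]]:
    """Return (strand, 0-based plus-strand start) for every motif match.

    A lookahead makes overlapping matches visible. Minus-strand hits are
    found on the reverse complement and mapped back to plus-strand offsets.
    """
    pattern = re.compile("(?=" + "".join(IUPAC[base] for base in motif.upper()) + ")")
    seq = seq.upper()
    n, k = len(seq), len(motif)
    hits = [("+", m.start()) for m in pattern.finditer(seq)]
    hits += [("-", n - m.start() - k) for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    # BACH1 site sequence from the table above.
    print(find_motif("TAGCTGAGTCATGCA", "TGASTCA"))  # [('+', 4), ('-', 4)]
```

Because TGASTCA is its own reverse complement, a single site is reported once per strand at the same offset.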
Vertebrate IDS CpG Islands
| Vertebrate | CpG island ID | Chromosomal position | CpG island size (bp) | C count + G count | % C + G | Ratio of observed to expected CpG |
|---|---|---|---|---|---|---|
| Human | CpG 46 | ChrX:148,586,553–148,586,984 | 432 | 279 | 65 | 1.02 |
| Baboon | CpG 50 | ChrX:137,263,406–137,263,837 | 432 | 306 | 71 | 0.92 |
| Rhesus | CpG 53 | ChrX:143,222,778–143,223,221 | 444 | 318 | 72 | 0.93 |
| Mouse | CpG 26 | ChrX:70,364,872–70,365,161 | 290 | 159 | 55 | 1.2 |
| Rat | CpG 26 | Chr8:69,175,527–69,175,735 | 209 | 138 | 66 | 1.14 |
| Cow | CpG 53 | ChrX:32,324,232–32,324,656 | 425 | 317 | 75 | 0.9 |
| Dog | CpG 51 | ChrX:117,515,293–117,515,743 | 451 | 293 | 65 | 1.1 |
| Opossum | CpG2 29 | ChrX:38,797,675–38,797,993 | 319 | 189 | 59 | 1.05 |
| Chicken | CpG 54 | Chr4:18,031,448–18,032,009 | 562 | 333 | 59 | 1.09 |
The identification of IDS CpG islands, sequences and properties was undertaken using various vertebrate genome browsers (http://genome.ucsc.edu)
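The CpG island columns can be derived from a raw sequence. Assuming the definitions used by the UCSC CpG island track (percent C + G = (C count + G count) / length; observed/expected CpG = CpG count / (C count × G count / length)), a minimal sketch with hypothetical names follows. The human row is internally consistent with these definitions: 148,586,984 - 148,586,553 + 1 = 432 bp, and 279 / 432 ≈ 64.6%, reported as 65.

```python
from __future__ import annotations


def cpg_island_stats(seq: str) -> dict[str, float]:
    """Composition statistics for a candidate CpG island sequence."""
    seq = seq.upper()
    length = len(seq)
    c_count, g_count = seq.count("C"), seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_percent = 100.0 * (c_count + g_count) / length
    expected_cpg = c_count * g_count / length
    obs_exp = cpg_count / expected_cpg if expected_cpg else 0.0
    return {
        "length": length,
        "c_plus_g": c_count + g_count,
        "gc_percent": gc_percent,
        "obs_exp_cpg": obs_exp,
    }


if __name__ == "__main__":
    # 4 CpGs observed; expected = 4 * 4 / 8 = 2, so obs_exp_cpg is 2.0.
    print(cpg_island_stats("CGCGCGCG"))
    # Consistency check against the human row: 279 C + G in a 432-bp island.
    print(round(100 * 279 / 432))  # 65
```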
Fig. 6 Phylogenetic tree of vertebrate IDS amino acid sequences. The tree is labeled with the vertebrate and fruit fly IDS sequences. A genetic distance scale is shown. The number of times a clade (sequences common to a node or branch) occurred in the bootstrap replicates is shown. Replicate values of 0.9 or more, which are highly significant, are shown, with 100 bootstrap replicates performed in each case
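Bootstrap support, as described in the caption, is the fraction of replicate trees in which a clade appears. The record does not state which program built the tree, so the sketch below only illustrates the generic steps around tree building: resampling alignment columns with replacement to make each replicate, a simple p-distance, and counting how often a clade recurs among replicate trees. Tree construction is left out, and replicate trees are assumed to be represented as sets of clades (frozensets of taxon names); all names and toy data are hypothetical.

```python
from __future__ import annotations

import random


def bootstrap_alignment(aligned: list[str], rng: random.Random) -> list[str]:
    """One nonparametric bootstrap replicate: resample columns with replacement."""
    length = len(aligned[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in aligned]


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else float("nan")


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: set[str]) -> float:
    """Fraction of replicate trees that contain the given clade."""
    target = frozenset(clade)
    return sum(target in clades for clades in replicate_clades) / len(replicate_clades)


if __name__ == "__main__":
    alignment = ["ACDEFGHIKL", "ACDEFGHIKV", "ACQEFGWIKV"]  # hypothetical toy data
    rng = random.Random(1)
    replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
    distances = [p_distance(rep[0], rep[1]) for rep in replicates]
    print(p_distance(alignment[0], alignment[1]))  # 0.1
    print(min(distances), max(distances))
    trees = [{frozenset({"human", "chimpanzee"})}, {frozenset({"human", "chimpanzee"})}, set()]
    print(clade_support(trees, {"human", "chimpanzee"}))  # 2 of 3 replicates
```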