Michael E Bradley, Steven A Benner.
Abstract
BACKGROUND: Blocks of duplicated genomic DNA sequence longer than 1000 base pairs are known as low copy repeats (LCRs). Identified by their sequence similarity, LCRs are abundant in the human genome and are interesting because they may represent recent adaptive events, or potential future adaptive opportunities, within the human lineage. Sequence analysis tools are needed, however, to decide whether these interpretations are likely, whether a particular set of LCRs represents nearly neutral drift creating junk DNA, or whether the appearance of LCRs reflects assembly error. Here we investigate an LCR family containing the sulfotransferase (SULT) 1A genes, which are involved in drug metabolism, cancer, hormone regulation, and neurotransmitter biology, as a first step toward defining the problems that those tools must manage.
Year: 2005 PMID: 15752422 PMCID: PMC555591 DOI: 10.1186/1471-2148-5-22
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Genomic organization of the SULT1A LCR family. (A) 30 LCRs (red) aligned to the SULT1A3 LCR (blue). Core sequences of the SULT1A and LCR16a families are shown between dashed lines. (B) Chromosome 16 positions of 29 SULT1A3-related LCRs. (C) Known genes, bacterial sequencing contigs, and LCRs (outlined in boxes) in three 350 kbp regions of chromosome 16.
SULT1A3-related LCRs
| LCR Name* | Chromosome | Strand | Start (bp) | End (bp) | Length (bp) | % Identity† |
|---|---|---|---|---|---|---|
| A | chr18p | + | 11605429 | 11633851 | 28422 | 97.8 |
| B | chr16p | + | 11985022 | 12003971 | 18949 | 97.6 |
| C | chr16p | - | 14747420 | 14753000 | 5580 | 94.8 |
| D | chr16p | - | 14766628 | 14792117 | 25489 | 96.5 |
| E | chr16p | - | 14805750 | 14832437 | 26687 | 96.6 |
| F | chr16p | - | 14996007 | 15072649 | 76642 | 96.9 |
| G | chr16p | + | 15161625 | 15185467 | 23842 | 95.6 |
| H | chr16p | - | 15417052 | 15453865 | 36813 | 95.9 |
| I | chr16p | - | 16394409 | 16416404 | 21995 | 96.4 |
| J | chr16p | + | 16437719 | 16461029 | 23310 | 96.5 |
| K | chr16p | + | 18371484 | 18394809 | 23325 | 96.4 |
| L | chr16p | + | 18414255 | 18434928 | 20673 | 96.4 |
| M | chr16p | - | 18834216 | 18904410 | 70194 | 96.5 |
| N | chr16p | + | 18962854 | 18969729 | 6875 | 95.2 |
| O | chr16p | + | 21376182 | 21480283 | 104101 | 97.8 |
| P | chr16p | + | 21808293 | 21910109 | 101816 | 98.3 |
| Q | chr16p | - | 22414809 | 22523008 | 108199 | 97.1 |
| R | chr16p | + | 28316465 | 28341127 | 24662 | 97.8 |
| S | chr16p | + | 28427424 | 28467064 | 39640 | 97.3 |
| 1A1 | chr16p | + | 28481970 | 28490644 | 8674 | 86.0 |
| 1A2 | chr16p | + | 28494950 | 28502357 | 7407 | 86.6 |
| T | chr16p | - | 28621035 | 28630803 | 9768 | 97.7 |
| U | chr16p | - | 28692200 | 28714506 | 22306 | 98.1 |
| V | chr16p | - | 28800873 | 28828646 | 27773 | 97.7 |
| W | chr16p | - | 29084138 | 29108487 | 24349 | 97.8 |
| X | chr16p | + | 29426409 | 29498137 | 71728 | 97.6 |
| 1A4 | chr16p | + | 29498152 | 29644489 | 146337 | 99.1 |
| 1A3 | chr16p | + | 30236110 | 30388351 | 152241 | 100 |
| Y | chr16q | + | 69784235 | 69818803 | 34568 | 96.2 |
| Z | chr16q | + | 70016088 | 70061019 | 44931 | 97.4 |
| AA | chr16q | - | 74188141 | 74209430 | 21289 | 97.4 |
*LCR names are as in Figure 1. †Percent identity is relative to the 1A3 LCR.
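Each Length value in the table equals End - Start (rather than End - Start + 1), and, excluding the 1A3 reference itself, 29 of the rows lie on chromosome 16, matching panel B of Figure 1. The Python sketch below re-checks both properties from the transcribed rows. The tuple layout and the function names `mismatched_lengths` and `count_chr16_related` are illustrative assumptions, not part of the paper.

```python
from typing import List, Tuple

# (name, chromosome arm, strand, start, end, length, % identity to the 1A3 LCR),
# transcribed from the SULT1A3-related LCR table above.
Row = Tuple[str, str, str, int, int, int, float]

LCRS: List[Row] = [
    ("A", "chr18p", "+", 11605429, 11633851, 28422, 97.8),
    ("B", "chr16p", "+", 11985022, 12003971, 18949, 97.6),
    ("C", "chr16p", "-", 14747420, 14753000, 5580, 94.8),
    ("D", "chr16p", "-", 14766628, 14792117, 25489, 96.5),
    ("E", "chr16p", "-", 14805750, 14832437, 26687, 96.6),
    ("F", "chr16p", "-", 14996007, 15072649, 76642, 96.9),
    ("G", "chr16p", "+", 15161625, 15185467, 23842, 95.6),
    ("H", "chr16p", "-", 15417052, 15453865, 36813, 95.9),
    ("I", "chr16p", "-", 16394409, 16416404, 21995, 96.4),
    ("J", "chr16p", "+", 16437719, 16461029, 23310, 96.5),
    ("K", "chr16p", "+", 18371484, 18394809, 23325, 96.4),
    ("L", "chr16p", "+", 18414255, 18434928, 20673, 96.4),
    ("M", "chr16p", "-", 18834216, 18904410, 70194, 96.5),
    ("N", "chr16p", "+", 18962854, 18969729, 6875, 95.2),
    ("O", "chr16p", "+", 21376182, 21480283, 104101, 97.8),
    ("P", "chr16p", "+", 21808293, 21910109, 101816, 98.3),
    ("Q", "chr16p", "-", 22414809, 22523008, 108199, 97.1),
    ("R", "chr16p", "+", 28316465, 28341127, 24662, 97.8),
    ("S", "chr16p", "+", 28427424, 28467064, 39640, 97.3),
    ("1A1", "chr16p", "+", 28481970, 28490644, 8674, 86.0),
    ("1A2", "chr16p", "+", 28494950, 28502357, 7407, 86.6),
    ("T", "chr16p", "-", 28621035, 28630803, 9768, 97.7),
    ("U", "chr16p", "-", 28692200, 28714506, 22306, 98.1),
    ("V", "chr16p", "-", 28800873, 28828646, 27773, 97.7),
    ("W", "chr16p", "-", 29084138, 29108487, 24349, 97.8),
    ("X", "chr16p", "+", 29426409, 29498137, 71728, 97.6),
    ("1A4", "chr16p", "+", 29498152, 29644489, 146337, 99.1),
    ("1A3", "chr16p", "+", 30236110, 30388351, 152241, 100.0),
    ("Y", "chr16q", "+", 69784235, 69818803, 34568, 96.2),
    ("Z", "chr16q", "+", 70016088, 70061019, 44931, 97.4),
    ("AA", "chr16q", "-", 74188141, 74209430, 21289, 97.4),
]


def mismatched_lengths(rows: List[Row]) -> List[str]:
    """Names of rows whose Length column differs from End - Start."""
    return [name for name, _, _, start, end, length, _ in rows if end - start != length]


def count_chr16_related(rows: List[Row]) -> int:
    """Number of chromosome 16 LCRs, not counting the 1A3 reference LCR itself."""
    return sum(1 for name, arm, *_ in rows if arm.rstrip("pq") == "chr16" and name != "1A3")


print(mismatched_lengths(LCRS))   # [] : every Length equals End - Start
print(count_chr16_related(LCRS))  # 29
```

The same rows can be reused for other summaries (for example, grouping by strand or by chromosome arm) without retyping the table.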
Duplication Status of SULT Genes
| Accession | Gene | Chromosome | ORF Length (bp) | ORF Duplicated (bp) |
|---|---|---|---|---|
| NM_001055 | | chr16 | 895 | 895 |
| NM_001054 | | chr16 | 895 | 895 |
| NM_003166 | | chr16 | 895 | 895 |
| NM_014465 | | chr4 | 804 | 0 |
| NM_001056 | | chr2 | 898 | 0 |
| NM_006588 | | chr2 | 916 | 0 |
| NM_005420 | | chr4 | 892 | 0 |
| NM_003167 | | chr19 | 864 | 210 |
| NM_004605 | | chr19 | 1059 | 0 |
| NM_014351 | | chr22 | 862 | 0 |
SULT1A4 and SULT1A3 Genomic Region Differences
| Location* | Position | SULT1A4 Region | SULT1A3 Region |
|---|---|---|---|
| 5' UTR | -6,246 | G | C |
| 5' UTR | -6,118 | C | T |
| 5' UTR | -6,007 | G | C |
| 5' UTR | -5,246 | - | T |
| 5' UTR | -4,433 | - | T |
| Intron 1B | -2,775 | C | T |
| Intron 1B | -2,671 | - | T |
| Intron 1B | -2,670 | - | T |
| Intron 1B | -2,594 | T | G |
| Intron 1A | -91 | - | A |
| Exon 2 | +105 | A | G |
| Intron 4 | +853 | - | A |
| Intron 4 | +1,487 | A | G |
| Exon 8 | +3,569 | - | A |
| Exon 8 | +3,570 | - | A |
| Exon 8 | +3,571 | - | T |
| Exon 8 | +3,572 | - | T |
| 3' UTR | +5,379 | G | C |
| 3' UTR | +6,438 | C | - |
| 3' UTR | +6,335 | C | - |
| 3' UTR | +6,210 | C | - |
*The 21 alignment positions shown are those where the nucleotide or gap (-) in the SULT1A4 region differed from that in the SULT1A3 region. Exon and intron names of the SULT1A3 gene are according to [33]. All nucleotides are numbered relative to the first nucleotide of the start codon, which is +1; there is no position zero. The last nucleotide of the coding sequence is at position +3,188. Approximately 3 kb of upstream (5' UTR) and downstream (3' UTR) sequence was included in the comparison.
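Because this numbering skips zero, subtracting two position labels that lie on opposite sides of the start codon overcounts the distance by one. The minimal Python sketch below converts between the table's labels and 0-based offsets counted from the first base of the start codon. The function names are hypothetical helpers; only the numbering rule itself comes from the footnote.

```python
def offset_to_label(offset: int) -> int:
    """0-based offset from the first base of the start codon -> table numbering.

    Offset 0 (first base of the start codon) becomes +1; offset -1 (the base
    immediately upstream) stays -1, because the numbering has no position 0.
    """
    return offset + 1 if offset >= 0 else offset


def label_to_offset(label: int) -> int:
    """Table numbering -> 0-based offset; rejects the non-existent position 0."""
    if label == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return label - 1 if label > 0 else label


def bases_between(a: int, b: int) -> int:
    """Signed distance in nucleotides from label a to label b."""
    return label_to_offset(b) - label_to_offset(a)


print(bases_between(-1, 1))        # 1: the two bases are adjacent
print(bases_between(-91, 105))     # 195, not the 196 that naive subtraction gives
print(bases_between(1, 3188) + 1)  # 3188 nucleotides from +1 through +3,188 inclusive
```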
Evidence of SULT1A4 Expression
| Accession | Gene* | Tissue† | Pos. 105 |
|---|---|---|---|
| [Genbank:CB147451] | SULT1A4 | Liver | A |
| [Genbank:BF087636] | SULT1A4 | head-neck | A |
| [Genbank:W76361] | SULT1A4 | fetal heart | A |
| [Genbank:W81033] | SULT1A4 | fetal heart | A |
| [Genbank:BC014471] | SULT1A4 | pancreas, epithelioid carcinoma | A |
| [Genbank:F08276] | SULT1A3 | infant brain | G |
| [Genbank:BF814073] | SULT1A3 | Colon | G |
| [Genbank:BG819342] | SULT1A3 | Brain | G |
| [Genbank:BM702343] | SULT1A3 | optic nerve | G |
| [Genbank:BQ436693] | SULT1A3 | large cell carcinoma | G |
| [Genbank:AA323148] | SULT1A3 | cerebellum | G |
| [Genbank:AA325280] | SULT1A3 | cerebellum | G |
| [Genbank:AA349131] | SULT1A3 | fetal adrenal gland | G |
| [Genbank:L25275] | SULT1A3 | placenta | G |
*Gene classifications made according to the nucleotide at position 105 as described in the text. †Tissue descriptions were taken from GenBank accessions.
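The classification rule in the footnote reduces to a single lookup on the Exon 2 difference at +105 in the SULT1A4/SULT1A3 region comparison (A in the SULT1A4 region, G in the SULT1A3 region). The sketch below assumes each transcript has already been aligned so that its base at +105 is known. The function name is hypothetical, and a single-site call like this cannot distinguish a genuine SULT1A4 transcript from a sequencing error at that base.

```python
# Diagnostic base at position +105 (Exon 2), from the SULT1A4/SULT1A3 region comparison.
DIAGNOSTIC_BASE = {"A": "SULT1A4", "G": "SULT1A3"}


def classify_transcript(base_at_105: str) -> str:
    """Assign an aligned transcript to SULT1A4 or SULT1A3 from its base at +105."""
    return DIAGNOSTIC_BASE.get(base_at_105.upper(), "unassigned")


# A few accessions from the table above with their observed base at +105.
ests = {"CB147451": "A", "W76361": "A", "F08276": "G", "L25275": "G"}
calls = {accession: classify_transcript(base) for accession, base in ests.items()}
print(calls)
```

Any base other than A or G (including an ambiguity code such as N) is left unassigned rather than forced into one of the two genes.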
Figure 2. SULT1A gene tree. TREx upper-limit date estimates of hominoid SULT1A duplications are shown in red, in millions of years (Ma). KA/KS values estimated by PAML are shown above branches. Infinity (∞) indicates an unreliable KA/KS value greater than 100. The 1A3/1A4 branch is dashed. NCBI accession numbers of sequences used: chimpanzee 1A1 [Genbank:BK004887], chimpanzee 1A2 [Genbank:BK004888], chimpanzee 1A3 [Genbank:BK004889], ox [Genbank:U34753], dog [Genbank:AY069922], gorilla 1A1 [Genbank:BK004890], gorilla 1A2 [Genbank:BK004891], gorilla 1A3 [Genbank:BK004892], human 1A1 [Genbank:L19999], human 1A2 [Genbank:U34804], human 1A3 [Genbank:L25275], human 1A4 [Genbank:BK004132], macaque [Genbank:D85514], mouse [Genbank:L02331], pig [Genbank:AY193893], platypus [Genbank:AY044182], rabbit [Genbank:AF360872], rat [Genbank:X52883].
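The Figure 2 legend replaces unreliable KA/KS ratios above 100 with an infinity symbol. A small Python sketch of that display rule follows. The function name and the `n/a` label for the case where both KA and KS are zero are assumptions, since the legend does not say how that case was handled.

```python
def ka_ks_label(ka: float, ks: float, cap: float = 100.0) -> str:
    """Format a KA/KS ratio for display, printing '∞' for unreliable values above `cap`."""
    if ka < 0 or ks < 0:
        raise ValueError("rates must be non-negative")
    if ks == 0:
        # No synonymous change: the ratio is unbounded (KA > 0) or undefined (0/0).
        return "∞" if ka > 0 else "n/a"
    ratio = ka / ks
    return "∞" if ratio > cap else f"{ratio:.2f}"


print(ka_ks_label(0.03, 0.20))   # ratio 0.15
print(ka_ks_label(0.50, 0.001))  # ratio 500 exceeds the cap of 100
```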
Figure 3. Synteny plots demonstrating that SULT1A3 is the progenitor locus of the hominoid SULT1A family. Each box shows a VISTA percent identity plot between a section of the human genome and a section of a rodent genome. Rodent genomes and alignment methods are indicated as follows: 1 = mouse (October 2003 build), multiple alignment (MLAGAN); 2 = rat (June 2003 build), multiple alignment (MLAGAN); 3 = mouse (October 2003 build), pairwise alignment (LAGAN). Human gene locations are shown above, and human chromosome 16 coordinates below.
Likelihood Values and Parameter Estimates for SULT1A Genes
| Model | f.p.* | Log L | Parameter Estimates† |
|---|---|---|---|
| One-ratio | 39 | -5,047.81 | KA/KS = 0.15 |
| Free-ratios | 69 | -5,005.18 | KA/KS ratios for each branch shown in Figure 2 |
| Neutral | 36 | -5,021.14 | KA/KS0 = 0, KA/KS1 = 1 |
| Selection | 38 | -4,884.89 | KA/KS0 = 0, KA/KS1 = 1, KA/KS2 = 0.19 |
| Discrete (k = 2) | 37 | -4,931.05 | KA/KS0 = 0.06, KA/KS1 = 0.77 |
| Discrete (k = 3) | 40 | -4,880.78 | KA/KS0 = 0.02, KA/KS1 = 0.31 |
| Beta | 37 | -4,884.27 | |
| Beta+selection | 39 | -4,879.97 | |
| Model A | 38 | -5,013.29 | KA/KS0 = 0, KA/KS1 = 1 |
| Model B | 40 | -4,886.52 | KA/KS0 = 0.04, KA/KS1 = 0.56 |
*f.p. is the number of free parameters in each model. †Evidence for positive selection is shown in boldface. Proportions of sites in each KA/KS class (p0, p1, and p2) were not free parameters when shown in parentheses. The neutral site-specific model assumes two site classes with fixed KA/KS ratios of 0 and 1, with the proportion of sites in each class estimated as free parameters. The selection site-specific model adds a third class of sites whose KA/KS is estimated from the data. The discrete model assumes 2 or 3 site classes (k), with the proportion of sites in each class and the KA/KS ratio of each class estimated as free parameters. The beta model assumes that KA/KS ratios among sites follow a beta distribution with shape parameters p and q. The beta+selection model adds a class of sites whose KA/KS ratio is estimated from the data. Model A, an extension of the neutral model, assumes a third site class on the 1A3/1A4 branch with KA/KS estimated from the data. Model B, an extension of the discrete model with two site classes (k = 2), likewise assumes a third site class on the 1A3/1A4 branch with KA/KS estimated from the data.
Likelihood Ratio Tests for the SULT1A Genes
| Test | Selection vs. Neutral | Discrete (k = 3) vs. One-ratio | Beta+selection vs. Beta | Model A vs. Neutral | Model B vs. Discrete (k = 2) |
|---|---|---|---|---|---|
| Log L1 | -4,884.89 | -4,880.78 | -4,879.97 | -5,013.29 | -4,886.52 |
| Log L0 | -5,021.14 | -5,047.81 | -4,884.27 | -5,021.14 | -4,931.05 |
| 2ΔLog L | 272.50 | 334.06 | 8.60 | 15.70 | 89.06 |
| d.f. | 2 | 4 | 2 | 2 | 2 |
| P-value | P < 0.001 | P < 0.001 | 0.01 < P < 0.05 | P < 0.001 | P < 0.001 |
Positively selected sites (posterior probability)*: 3 (0.86), 7 (0.63), 30 (0.71), 35 (0.73), 143 (0.51), 236 (0.53), 245 (listed under two tests), 261, 275 (0.70), 288 (0.89), 290, 293 (0.72).
*In parentheses for each positively selected site is the posterior probability that the site belongs to the class with KA/KS > 1. Posterior probabilities > 90% are in boldface. Positively selected sites that also experienced non-synonymous change on the 1A3/1A4 branch are underlined.
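Each likelihood ratio test compares 2ΔLog L = 2(Log L1 - Log L0) against a chi-square distribution with the listed d.f. Because every test here has an even d.f., the upper-tail probability has the closed form e^(-x/2) Σ_{k=0}^{d.f./2 - 1} (x/2)^k / k!, so the Python sketch below needs only the standard library. The function names are hypothetical, and a general chi-square routine would be needed for odd d.f.

```python
import math


def lrt_statistic(log_l1: float, log_l0: float) -> float:
    """2ΔLog L for nested models: alternative (Log L1) versus null (Log L0)."""
    return 2.0 * (log_l1 - log_l0)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X >= x), valid only for even df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, accumulated term by term.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# (Log L1, Log L0, d.f.) transcribed from the likelihood ratio table above.
TESTS = {
    "Selection vs. Neutral": (-4884.89, -5021.14, 2),
    "Discrete (k = 3) vs. One-ratio": (-4880.78, -5047.81, 4),
    "Beta+selection vs. Beta": (-4879.97, -4884.27, 2),
    "Model A vs. Neutral": (-5013.29, -5021.14, 2),
    "Model B vs. Discrete (k = 2)": (-4886.52, -4931.05, 2),
}

for name, (log_l1, log_l0, df) in TESTS.items():
    stat = lrt_statistic(log_l1, log_l0)
    print(f"{name}: 2ΔLog L = {stat:.2f}, d.f. = {df}, P = {chi2_sf_even_df(stat, df):.2g}")
```

Run on the table's log-likelihoods, this reproduces the 2ΔLog L row, and each P-value falls in the range the table reports (for example, about 0.014 for Beta+selection vs. Beta).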
Figure 4. Non-synonymous changes along the 1A3/1A4 branch cluster on the SULT1A1 enzyme structure [PDB: 1LS6] [26]. Red sites experienced non-synonymous changes; green sites experienced synonymous changes. The PAPS donor substrate and the p-nitrophenol acceptor substrate are shown in blue. The image was generated using Chimera [86].
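Figure 4 colors each site by whether its codon change on the 1A3/1A4 branch was synonymous or non-synonymous. The Python sketch below makes that classification for a pair of aligned codons using the standard genetic code (NCBI translation table 1). It compares end states only and does not split a multi-nucleotide change into intermediate steps. The codons in the usage lines are hypothetical, chosen only to be consistent with a one-nucleotide Ser → Asn change like the one listed for site 44.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1): amino acids for codons
# enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ancestral: str, derived: str) -> Tuple[int, str]:
    """Return (number of differing nucleotides, change type) for two aligned codons."""
    a, d = ancestral.upper(), derived.upper()
    if a not in CODON_TABLE or d not in CODON_TABLE:
        raise ValueError("expected two unambiguous DNA codons")
    n_diff = sum(x != y for x, y in zip(a, d))
    if n_diff == 0:
        return 0, "no change"
    kind = "synonymous" if CODON_TABLE[a] == CODON_TABLE[d] else "non-synonymous"
    return n_diff, kind


print(classify_codon_change("AGC", "AAC"))  # (1, 'non-synonymous'): Ser -> Asn
print(classify_codon_change("CTG", "CTA"))  # (1, 'synonymous'): Leu -> Leu
```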
Non-synonymous Changes on the 1A3/1A4 Branch
| Site* | Nucleotide Changes/Site | Hominoid SULT1A Ancestor: Residue | PP† | Physicochemical Properties | | Hominoid SULT1A3 Ancestor: Residue | PP | Physicochemical Properties |
|---|---|---|---|---|---|---|---|---|
| 44 | 1 | Ser | (1.00) | tiny polar | → | Asn | (1.00) | small polar |
| 71 | 1 | His | (0.99) | non-polar aromatic positive | → | Asn | (1.00) | small polar |
| 76 | 1 | Phe | (1.00) | non-polar aromatic | → | Tyr | (1.00) | aromatic |
| 77 | 2 | Met | (0.99) | non-polar | → | Val | (1.00) | small non-polar aliphatic |
| | 1 | Phe | (1.00) | non-polar aromatic | → | Val | (1.00) | small non-polar aliphatic |
| 85 | 1 | Lys | (1.00) | positive | → | Asn | (1.00) | small polar |
| 86 | 2 | Val | (0.98) | small non-polar aliphatic | → | Asp | (1.00) | small polar negative |
| | 3 | Ile | (0.98) | non-polar aliphatic | → | Glu | (1.00) | polar negative |
| 93 | 1 | Met | (0.00) | non-polar | → | Leu | (1.00) | non-polar aliphatic |
| 101 | 1 | Ala | (1.00) | tiny non-polar | → | Pro | (1.00) | small |
| | 1 | Leu | (1.00) | non-polar aliphatic | → | Ile | (1.00) | non-polar aliphatic |
| | 1 | Thr | (1.00) | tiny polar | → | Ser | (1.00) | tiny polar |
| | 1 | Ala | (1.00) | tiny non-polar | → | Pro | (1.00) | small |
| 143 | 1 | Tyr | (1.00) | aromatic | → | His | (1.00) | non-polar aromatic positive |
| 144 | 2 | His | (0.99) | non-polar aromatic positive | → | Arg | (1.00) | polar positive |
| | 2 | Ala | (1.00) | tiny non-polar | → | Glu | (1.00) | polar negative |
| 148 | 1 | Val | (1.00) | small non-polar aliphatic | → | Ala | (1.00) | tiny non-polar |
| 222 | 1 | Leu | (0.99) | non-polar aliphatic | → | Phe | (1.00) | non-polar aromatic |
* Sites underlined were identified as being positively selected using the branch-site specific models. †Posterior probabilities that the ancestral residues are correct, conditional on the model of sequence evolution used.