Loris Mularoni, Roderic Guigó, M Mar Albà.
Abstract
BACKGROUND: Amino acid tandem repeats are found in nearly one-fifth of human proteins. Abnormal expansion of these regions is associated with several human disorders. To gain further insight into the mutational mechanisms that operate in this type of sequence, we have analyzed a large number of mutation variants derived from human expressed sequence tags (ESTs).
Year: 2006 PMID: 16640792 PMCID: PMC1557989 DOI: 10.1186/gb-2006-7-4-r33
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Human amino acid repeat variants
| Repeat type | Number of repeats with EST coverage* | Average codon homogeneity | Average number of ESTs | Polymorphic repeats (%) | Polymorphic up-down (%)† | Gap/total repeat variants (%)‡ |
| All | 2,227 | 0.49 | 27.4 | 115 (5.2) | 110-106 (4.8) | 60/137 (44%) |
| A | 249 | 0.37 | 35.5 | 14 (5.6) | 16-9 (5) | 8/17 (47%) |
| E | 487 | 0.55 | 28.8 | 31 (6.4) | 20-20 (4.1) | 15/35 (43%) |
| G | 193 | 0.48 | 27.7 | 12 (6.2) | 13-12 (6.5) | 4/15 (26%) |
| L | 210 | 0.55 | 26.8 | 5 (2.4) | 11-13 (5.7) | 4/5 (80%) |
| P | 312 | 0.41 | 26.8 | 17 (5.4) | 17-17 (5.4) | 3/22 (14%) |
| S | 315 | 0.41 | 22.7 | 9 (2.9) | 10-8 (2.8) | 2/11 (18%) |
| K | 134 | 0.50 | 36.9 | 7 (5.2) | 6-10 (5.9) | 3/7 (43%) |
| Q | 137 | 0.66 | 19.7 | 14 (10.2) | 8-9 (6.2) | 15/17 (88%) |
*Number of repeats covered by at least four ESTs. †Number of polymorphic sequences immediately upstream (up) and downstream (down) of repeats; the percentages in parentheses were calculated by taking them together. ‡Number of repeat polymorphic variants involving gaps with respect to the total number of variants.
Figure 1. Number of polymorphic variants for regions containing different kinds of amino acid repeats. For the upstream and downstream sequences adjacent to the repeat, the average value was taken. Bars indicate the actual values for the two sides flanking the repeat.
Amino acid substitutions in polymorphic variants
| From/to | A | E | G | L | P | S | K | Q | V | I | M | F | W | T | C | Y | N | D | R | H |
| Repeat | | | | | | | | | | | | | | | | | | | | |
| A | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| E | 0 | 0 | 4 | 0 | 0 | 0 | 8 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 |
| G | 2 | 2 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 3 | 0 |
| L | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| P | 7 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 2 | 1 |
| S | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| K | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Q | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Rel. frequency | 0.11 | 0.03 | 0.10 | 0.01 | 0.04 | 0.10 | 0.10 | 0.08 | 0.08 | 0.00 | 0.00 | 0.04 | 0.00 | 0.11 | 0.03 | 0.00 | 0.01 | 0.06 | 0.08 | 0.03 |
| Upstream/downstream* | | | | | | | | | | | | | | | | | | | | |
| A | 0 | 0 | 6 | 0 | 4 | 3 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 |
| E | 0 | 0 | 1 | 0 | 0 | 0 | 5 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 |
| G | 6 | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 1 | 5 | 0 |
| L | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 4 | 0 | 4 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| P | 8 | 0 | 0 | 2 | 0 | 6 | 0 | 3 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 1 |
| S | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| K | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 1 | 0 |
| Q | 0 | 1 | 0 | 3 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3 |
| Rel. frequency | 0.10 | 0.04 | 0.07 | 0.05 | 0.05 | 0.10 | 0.05 | 0.05 | 0.13 | 0.01 | 0.03 | 0.04 | 0.01 | 0.04 | 0.03 | 0.00 | 0.02 | 0.05 | 0.07 | 0.03 |
*Upstream and downstream sequences taken together. Rel. frequency, relative frequency.
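The "Rel. frequency" rows appear to be column sums divided by the total number of substitutions in the block. The sketch below checks that reading against the first block of counts. Two assumptions are made here and are not stated in the table: the first block is taken to be the within-repeat substitutions, and its unlabeled first row is read as substitutions from A. The names `REPEAT_COUNTS` and `relative_frequencies` are illustrative only.

```python
from collections import Counter

# Non-zero substitution counts from the first block of the table,
# stored as {from_residue: {to_residue: count}}.
# Assumption: the unlabeled first row of that block is "A".
REPEAT_COUNTS = {
    "A": {"G": 1, "P": 1, "S": 1, "V": 5, "T": 1},
    "E": {"G": 4, "K": 8, "Q": 3, "V": 1, "D": 5},
    "G": {"A": 2, "E": 2, "S": 4, "C": 2, "R": 3},
    "L": {"S": 1},
    "P": {"A": 7, "L": 1, "S": 2, "T": 7, "R": 2, "H": 1},
    "S": {"G": 3, "P": 1, "F": 3, "T": 1, "N": 1},
    "K": {"Q": 3, "R": 1},
    "Q": {"P": 1, "H": 1},
}


def relative_frequencies(counts):
    """Return (total, {to_residue: column_sum / total})."""
    column_sums = Counter()
    for targets in counts.values():
        column_sums.update(targets)
    total = sum(column_sums.values())
    return total, {aa: n / total for aa, n in column_sums.items()}


total, freqs = relative_frequencies(REPEAT_COUNTS)
for aa in "AEGLPSKQVFTCNDRH":
    print(aa, round(freqs[aa], 2))
```

Under these assumptions the rounded values reproduce the first "Rel. frequency" row, which supports reading each column as the residue that results from the substitution.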
Figure 2. Frequency of synonymous and non-synonymous nucleotide substitutions for regions containing different kinds of amino acid repeats. For the upstream and downstream sequences adjacent to the repeat, the average value was taken. Bars indicate the actual values for the two sides flanking the repeat.
Figure 3. Codon homogeneity distribution of the sequence regions encoding different types of repeats: polymorphic with substitutions, polymorphic with expansions or contractions (gaps), and all repeats. Codon homogeneity value intervals labeled as X-Y stand for values >X and <=Y (for example, 0-0.2 covers values >0 and <=0.2).
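The interval convention in the Figure 3 caption (open on the left, closed on the right) can be sketched as follows. The caption only gives 0-0.2 as an example, so the remaining bins of width 0.2 up to 1 are an assumption, and `UPPER_EDGES` and `interval_label` are hypothetical names.

```python
from bisect import bisect_left

# Assumed upper edges of the bins; only the 0-0.2 bin is named in the caption.
UPPER_EDGES = [0.2, 0.4, 0.6, 0.8, 1.0]


def interval_label(value):
    """Return the 'X-Y' label of the bin with X < value <= Y."""
    if not 0 < value <= 1:
        raise ValueError("codon homogeneity must be in (0, 1]")
    # bisect_left finds the first edge >= value, so a value equal to an
    # edge falls in the bin that ends at that edge.
    i = bisect_left(UPPER_EDGES, value)
    lower = 0 if i == 0 else UPPER_EDGES[i - 1]
    return f"{lower:g}-{UPPER_EDGES[i]:g}"


print(interval_label(0.2))   # 0-0.2
print(interval_label(0.35))  # 0.2-0.4
print(interval_label(1))     # 0.8-1
```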
Repeat gap polymorphic variants
| Ensembl ID | Locus link ID | AA | Position* | Size* | Size variant | Len. protein* | Number of ESTs† | Codon max run‡ | Codon hom.§ | Max run size‡ | Description |
| ENSP00000282388 | ZFP36L2 | Q | 394 | 7 | 9 | 494 | 195 | CAG | 1 | 7 | Butyrate response factor 2 (TIS11D protein) |
| ENSP00000324790 | TDE2L | Q | 363 | 5 | 6 | 455 | 56 | CAG | 1 | 5 | Tumor differentially expressed 2-like |
| ENSP00000317661 | CACNA1A | Q | 2,311 | 13 | 11 | 2,505 | 10 | CAG | 1 | 13 | Voltage-dependent P/Q-type calcium channel alpha-1A subunit (CACNA1A) |
| ENSP00000280665 | DCP1B | Q | 251 | 10 | 11 | 617 | 7 | CAG | 0.90 | 9 | mRNA decapping enzyme 1B |
| ENSP00000348018 | ZNF384 | Q | 439 | 16 | 15 | 516 | 23 | CAG | 0.88 | 14 | Zinc finger protein 384 (nuclear matrix transcription factor 4) |
| ENSP00000264883 | | Q | 92 | 5 | 6 | 507 | 33 | CAG | 0.80 | 4 | Nucleoporin p54 (54 kDa nucleoporin) |
| ENSP00000229279 | ATN1 | Q | 482 | 19 | 16 | 1,189 | 7 | CAG | 0.79 | 15 | Atrophin-1 (dentatorubral-pallidoluysian atrophy protein; DRPLA) |
| ENSP00000265773 | SMARCA2 | Q | 215 | 23 | 22 | 1,590 | 8 | CAG | 0.57 | 13 | Possible global transcription activator SNF2L2 (SNF2-alpha) |
| ENSP00000354597 | KIAA0476 | Q | 815 | 16 | 13 | 1,417 | 8 | CAG | 0.56 | 9 | Unknown function |
| ENSP00000272804 | KIAA1946 | Q | 42 | 14 | 15,16 | 428 | 4 | CAG | 0.43 | 6 | KIAA1946 |
| ENSP00000313603 | ABCF1 | Q | 63 | 10 | 9,11 | 845 | 20 | CAG | 0.40 | 4 | ATP-binding cassette, sub-family F, member 1 |
| ENSP00000252891 | NUMBL | Q | 426 | 20 | 18 | 609 | 9 | CAG | 0.35 | 7 | Numb-like protein (Numb-R) |
| ENSP00000304689 | THAP11 | Q | 103 | 29 | 28 | 314 | 12 | CAG | 0.34 | 10 | THAP domain protein 11 (HRIHFB2206) |
| ENSP00000345671 | NCOA3 | Q | 1,243 | 29 | 28 | 1,420 | 8 | CAG | 0.31 | 9 | Nuclear receptor coactivator 3 isoform b |
| ENSP00000301187 | TMC4 | E | 56 | 5 | 4 | 706 | 12 | GAG | 1 | 5 | Transmembrane channel-like 4 |
| ENSP00000315064 | MAGEF1 | E | 152 | 6 | 4,7 | 307 | 49 | GAG | 1 | 6 | Melanoma-associated antigen F1 (MAGE-F1 antigen) |
| ENSP00000340702 | | E | 630 | 10 | 9,11 | 686 | 6 | GAG | 1 | 10 | 106 kDa O-GlcNAc transferase-interacting protein |
| ENSP00000262680 | NRD1 | E | 149 | 5 | 4 | 1,219 | 33 | GAA | 0.80 | 4 | Nardilysin precursor (EC 3.4.24.61) (N-arginine dibasic convertase) |
| ENSP00000252455 | PRKCSH | E | 312 | 13 | 12 | 528 | 15 | GAG | 0.77 | 10 | Glucosidase II beta subunit precursor (PKCSH) |
| ENSP00000253237 | GRWD1 | E | 123 | 6 | 5 | 446 | 79 | GAA | 0.50 | 3 | Glutamate-rich WD-repeat protein 1 |
| ENSP00000262710 | ACIN1 | E | 269 | 12 | 11 | 1,341 | 5 | GAG | 0.50 | 6 | Apoptotic chromatin condensation inducer in the nucleus (Acinus) |
| ENSP00000346324 | | E | 60 | 7 | 8 | 109 | 249 | GAG | 0.43 | 3 | Predicted: similar to prothymosin alpha |
| ENSP00000263274 | LIG1 | E | 152 | 6 | 5 | 919 | 19 | GAG/GAA | 0.33 | 2 | DNA ligase I (polydeoxyribonucleotide synthase [ATP]) |
| ENSP00000304498 | PODXL2 | E | 161 | 11 | 9 | 529 | 39 | GAG | 0.27 | 3 | Endoglycan |
| ENSP00000345444 | APLP2 | E | 220 | 7 | 5 | 707 | 84 | GAG/GAA | 0.14 | 1 | Amyloid-like protein 2 precursor (CDEI-box binding protein) |
| ENSP00000350479 | RPL14 | A | 149 | 10 | 11,12 | 215 | 213 | GCT | 1 | 10 | 60S ribosomal protein L14 (CAG-ISL 7) |
| ENSP00000255608 | BTBD2 | A | 40 | 14 | 15,16 | 525 | 9 | GCC | 0.93 | 13 | BTB/POZ domain containing protein 2 |
| ENSP00000305783 | RBM23 | A | 368 | 9 | 10 | 423 | 53 | GCT | 0.56 | 5 | RNA-binding region containing protein 4 (splicing factor SF2) |
| ENSP00000346678 | | A | 130 | 6 | 5 | 232 | 50 | GCA | 0.33 | 2 | Similar to splicing factor, arginine/serine-rich 4 isoform c |
| ENSP00000330188 | | A | 266 | 5 | 6 | 434 | 50 | GCA/GCT | 0.20 | 1 | Similar to splicing factor, arginine/serine-rich 4 isoform c |
| ENSP00000324573 | FLII | A | 410 | 6 | 5 | 1,269 | 25 | GCA/GCT | 0.17 | 1 | Flightless-I protein homolog |
| ENSP00000255631 | | G | 24 | 6 | 9 | 359 | 96 | GGC | 0.83 | 5 | hsp70-interacting protein |
| ENSP00000246533 | CAPNS1 | G | 36 | 20 | 21 | 268 | 100 | GGC | 0.50 | 10 | Calpain small subunit 1 (CSS1) |
| ENSP00000218072 | SRPX | L | 16 | 7 | 6 | 464 | 21 | CTG | 1 | 7 | Sushi repeat-containing protein SRPX precursor |
| ENSP00000315602 | CHRNA3 | L | 16 | 7 | 6 | 505 | 5 | CTG | 1 | 7 | Neuronal acetylcholine receptor protein, alpha-3 chain precursor |
| ENSP00000344134 | MOG | L | 16 | 6 | 5 | 206 | 13 | CTC | 1 | 6 | Myelin-oligodendrocyte glycoprotein precursor |
| ENSP00000240617 | | L | 17 | 8 | 7 | 553 | 22 | CTG | 0.88 | 7 | Unknown function |
| ENSP00000304072 | DDX54 | K | 89 | 5 | 6 | 882 | 97 | AAG | 1 | 5 | DEAD-box protein 54 |
| ENSP00000285814 | MKI67IP | K | 211 | 5 | 6 | 293 | 79 | AAG | 0.60 | 3 | MKI67 (FHA domain) interacting nucleolar phosphoprotein |
| ENSP00000276212 | GPC3 | P | 25 | 6 | 5 | 580 | 54 | CCG | 0.83 | 5 | Glypican-3 precursor (Intestinal protein OCI-5) |
| ENSP00000312296 | CKAP4 | P | 42 | 5 | 4 | 602 | 11 | CCG | 0.80 | 4 | Cytoskeleton-associated protein 4 |
| ENSP00000286910 | PCGF6 | P | 23 | 5 | 7 | 350 | 7 | CCT | 0.40 | 2 | Polycomb group ring finger 6 isoform a |
| ENSP00000301653 | KRT16 | S | 72 | 5 | 6 | 473 | 248 | AGC | 1 | 5 | Keratin, type I cytoskeletal 16 (cytokeratin 16) |
| ENSP00000307804 | MLLT3 | S | 382 | 9 | 7 | 568 | 5 | AGC/TCC | 0.11 | 1 | AF-9 protein |
*Refers to the Ensembl protein. Len., length. Size, size of repeat. †Number of ESTs covering the repeat. ‡Max run, longest pure codon run within the repeat-encoding sequence. §Codon hom. (homogeneity), size of Max run divided by size of the repeat. AA, amino acid. The Size variant column can list several variants (for example, 15,16).
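The codon homogeneity defined in the footnote (size of the longest pure codon run divided by the size of the repeat) can be computed with a short routine such as the one below. This is a minimal sketch rather than the authors' implementation: `split_codons` and `codon_homogeneity` are hypothetical names, and the test sequence is a toy construction, not the actual sequence of any gene in the table.

```python
def split_codons(dna):
    """Split an in-frame coding sequence into a list of codons."""
    if len(dna) == 0 or len(dna) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def codon_homogeneity(dna):
    """Return (codon of longest run, run size, run size / repeat size)."""
    codons = split_codons(dna.upper())
    best_codon, best_run, run = codons[0], 1, 1
    # Walk over adjacent codon pairs and track the current pure run.
    for prev, cur in zip(codons, codons[1:]):
        run = run + 1 if cur == prev else 1
        if run > best_run:
            best_codon, best_run = cur, run
    return best_codon, best_run, best_run / len(codons)


# Toy glutamine repeat: nine CAG codons followed by one CAA codon.
toy_repeat = "CAG" * 9 + "CAA"
print(codon_homogeneity(toy_repeat))  # ('CAG', 9, 0.9)
```

The toy repeat has the same size, maximum run, and homogeneity as the DCP1B row (10, 9, 0.90). When two codons tie for the longest run, this sketch reports only the first one, whereas the table lists both (for example, GAG/GAA).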