Yumi Yamaguchi-Kabata, Makoto K Shimada, Yosuke Hayakawa, Shinsei Minoshima, Ranajit Chakraborty, Takashi Gojobori, Tadashi Imanishi.
Abstract
BACKGROUND: A large amount of data has accumulated on genetic variation in the human genome, but how these variations affect gene function remains poorly understood. In particular, little is known about the distribution of nonsense polymorphisms in human genes, despite their drastic effects on gene products. METHODOLOGY/PRINCIPAL FINDINGS:
Year: 2008 PMID: 18852891 PMCID: PMC2561068 DOI: 10.1371/journal.pone.0003393
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Analysis of polymorphisms in relation to gene structure.
Top: scheme of the pipeline for analyzing polymorphisms in relation to gene structure. Bottom: screenshots from the ‘Transcript View’ in H-InvDB showing classified SNPs and their positions (blue bars) in the CASP12 gene.
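The SNP classification step of the pipeline can be sketched as follows. This is a minimal Python illustration, not the H-InvDB implementation: it assumes a spliced transcript sequence with known 0-based ORF boundaries (`cds_start`, `cds_end`, stop codon included) and the standard genetic code, and `classify_snp` is a hypothetical name. A SNP inside the ORF is labeled by comparing the reference and alternative codons.

```python
# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(mrna, cds_start, cds_end, pos, alt):
    """Classify a SNP at 0-based mRNA position `pos` (hypothetical helper).

    `cds_start`/`cds_end` delimit the ORF (half-open, stop codon included).
    Returns (region, effect); effect is None for UTR SNPs.
    """
    if pos < cds_start:
        return ("5'UTR", None)
    if pos >= cds_end:
        return ("3'UTR", None)
    region = "stop codon" if pos >= cds_end - 3 else "ORF"
    offset = pos - cds_start
    codon_start = pos - offset % 3
    ref_codon = mrna[codon_start:codon_start + 3]
    i = offset % 3
    alt_codon = ref_codon[:i] + alt + ref_codon[i + 1:]
    ref_aa = CODON_TABLE.get(ref_codon)
    alt_aa = CODON_TABLE.get(alt_codon)
    if ref_aa is None or alt_aa is None:
        return (region, "unclassified")  # ambiguous or non-ACGT bases
    if ref_aa == alt_aa:
        return (region, "synonymous")
    if "*" in (ref_aa, alt_aa):
        # One allele encodes a stop codon.
        return (region, "AA<->Ter" if region == "ORF" else "Ter<->AA")
    return (region, "nonsynonymous")


# Toy transcript: 5'UTR "GG", ORF ATG TGG CAG TAA, 3'UTR "GC".
mrna = "GG" + "ATGTGGCAGTAA" + "GC"
print(classify_snp(mrna, 2, 14, 7, "A"))   # ('ORF', 'AA<->Ter')
print(classify_snp(mrna, 2, 14, 12, "C"))  # ('stop codon', 'Ter<->AA')
```

The labels mirror the categories of the classified-SNP table (synonymous, nonsynonymous, AA↔Ter, Ter↔AA, unclassified). Whether an AA↔Ter change is a nonsense or a read-through SNP depends on which allele is ancestral, which this sketch does not determine.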
Table 1. SNPs and indels in exon, intron and other genomic regions.

| | Exon | Intron | Other genomic regions |
|---|---|---|---|
| SNPs | 249,182 | 3,332,537 | 5,209,127 |
| Indels | 9,742 | 185,761 | 249,648 |
Only polymorphisms mapped to a single genomic position were analyzed, using 36,712 protein-coding genes.
Table 2. Classified SNPs in exon regions.

| Region | Effect on translation | Genes in category I–IV | All protein-coding genes |
|---|---|---|---|
| 5′UTR | | 23454 [3.3×10−3/site] | 51881 |
| ORF | Total | 85233 [2.7×10−3/site] | 96164 |
| | Synonymous | 37484 [4.1×10−3/site] | 40484 |
| | Nonsynonymous | 46261 [2.1×10−3/site] | 53754 |
| | AA↔Ter | 938 | 1258 |
| | Unclassified | 398 | 421 |
| Stop codon | Total | 152 | 247 |
| | Synonymous | 63 | 88 |
| | Ter↔AA | 89 | 159 |
| 3′UTR | | 69691 [3.3×10−3/site] | 104510 |
| Total | | 178378 | 252555 |
Representative transcripts in 23,717 genes whose functions were defined or suggested (similarity categories I–III) and genes annotated as conserved hypothetical proteins (similarity category IV).
Representative transcripts in all 36,712 protein-coding genes, comprising genes in similarity categories I–IV plus categories V–VII (hypothetical protein, hypothetical short protein, and pseudogene candidate, respectively).
Densities of polymorphisms are shown in brackets as the average number of polymorphisms per site. The average lengths of the 5′UTR, ORF and 3′UTR regions in the 23,717 genes were 303.9 bp, 1343.5 bp and 877.6 bp, respectively. Densities of synonymous, nonsynonymous and nonsense SNPs in ORFs were calculated from the numbers of potential nucleotide sites for synonymous, nonsynonymous and nonsense mutations in coding regions. The density of nonsense SNPs is shown in Table 3.
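The bracketed 5′UTR, ORF and 3′UTR densities appear consistent with dividing each SNP count by the number of genes times the average region length given above. The Python sketch below checks this under that assumption; `density_per_site` is a hypothetical helper, not part of the authors' pipeline.

```python
def density_per_site(n_snps, n_genes, mean_length_bp):
    """Average polymorphisms per site: count / (genes x mean region length)."""
    return n_snps / (n_genes * mean_length_bp)


N_GENES = 23_717  # genes in similarity categories I-IV
MEAN_LENGTH_BP = {"5'UTR": 303.9, "ORF": 1343.5, "3'UTR": 877.6}
SNP_COUNTS = {"5'UTR": 23_454, "ORF": 85_233, "3'UTR": 69_691}

for region, n in SNP_COUNTS.items():
    d = density_per_site(n, N_GENES, MEAN_LENGTH_BP[region])
    print(f"{region}: {d * 1e3:.1f}x10^-3 per site")
```

Running it prints 3.3, 2.7 and 3.3 (×10−3), matching the table. The same formula reproduces the bracketed indel densities for the category I–IV genes (for example, 785 / (23,717 × 303.9) ≈ 0.11×10−3 for 5′UTRs). The synonymous and nonsynonymous densities use potential sites rather than raw length, so this shortcut does not apply to them.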
Table 3. SNPs causing changes between amino acids and stop codons.

| Region | Effect on translation | Genes in category I–IV | All protein-coding genes |
|---|---|---|---|
| ORF | Nonsense | 910 [0.85×10−3/site] | 1183 |
| | Read-through | 28 | 75 |
| Stop codon | Read-through | 67 | 110 |
| | Nonsense | 22 | 49 |
These two gene sets are the same as in Table 2.
ORF read-through: possible read-through SNPs, in which the allele encoding the stop codon is the ancestral allele. These may reflect shorter ORFs in the ancestral population.
Stop-codon nonsense: possible nonsense SNPs, in which the allele encoding the stop codon is the derived allele. These may reflect longer ORFs in the ancestral population.
The density of nonsense SNPs in ORFs was calculated from the number of potential nucleotide sites for nonsense mutations in coding regions.
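Potential sites for synonymous, nonsynonymous and nonsense mutations can be counted in the spirit of the Nei–Gojobori method: each of the three alternative bases at each codon position contributes one third of a site to the category of the change it produces. The Python sketch below assumes the standard genetic code and skips stop codons and codons with ambiguous bases; `potential_sites` is a hypothetical helper, and the authors' exact counting rules are not stated here.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def potential_sites(cds):
    """Return (synonymous, nonsynonymous, nonsense) potential sites in `cds`.

    Each alternative base at each codon position counts as 1/3 of a site
    in the category of the change it causes (Nei-Gojobori-style counting
    with nonsense changes split out).
    """
    syn = nonsyn = nonsense = 0.0
    for start in range(0, len(cds) - 2, 3):
        codon = cds[start:start + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue  # assumption: skip stop and ambiguous codons
        for i in range(3):
            for b in BASES:
                if b == codon[i]:
                    continue
                mutant_aa = CODON_TABLE[codon[:i] + b + codon[i + 1:]]
                if mutant_aa == aa:
                    syn += 1 / 3
                elif mutant_aa == "*":
                    nonsense += 1 / 3
                else:
                    nonsyn += 1 / 3
    return syn, nonsyn, nonsense


# Met (ATG) has no synonymous or nonsense neighbours; Trp (TGG) has two
# stop-codon neighbours (TAG, TGA), i.e. 2/3 of a nonsense site.
print(potential_sites("ATGTGG"))
```

Summed over all ORFs, the nonsense component would serve as the denominator of the bracketed nonsense density; 910 SNPs at 0.85×10−3 per site corresponds to roughly 1.1 million potential nonsense sites in the category I–IV genes.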
Table 4. Insertions and deletions in exon regions.

| Region | Genes in category I–IV | All protein-coding genes |
|---|---|---|
| 5′UTR | 785 [0.11×10−3] | 2005 |
| ORF | 1120 [0.035×10−3] | 1532 |
| 3′UTR | 3323 [0.16×10−3] | 4942 |
| Total | 5225 | 8479 |
These two gene sets are the same as in Table 2.
Densities of polymorphisms are shown in brackets as the average number of polymorphisms per site.
Three indels overlapped both an ORF and a UTR, so the category I–IV total (5225) is three less than the sum of its rows (5228).
Table 5. Frequency of each type of codon change for nonsense SNPs.

| Codon position | Change to TAA | No. | Change to TAG | No. | Change to TGA | No. | Total |
|---|---|---|---|---|---|---|---|
| 1st | Aaa→Taa | 33 | Aag→Tag | 31 | Aga→Tga | 20 | 748 |
| | Gaa→Taa | 80 | Gag→Tag | 125 | Gga→Tga | 32 | |
| 2nd | tCa→tAa | 27 | tCg→tAg | 19 | tCa→tGa | 25 | 200 |
| | tTa→tAa | 18 | tTg→tAg | 18 | tTa→tGa | 13 | |
| 3rd | taC→taA | 25 | taC→taG | 25 | tgC→tgA | 22 | 235 |
| | taT→taA | 19 | taT→taG | 27 | tgT→tgA | 32 | |
| Total | | 264 | | 487 | | 432 | 1183 |
Bold letters show nucleotide changes by transition.
P<0.005 by chi-square test.
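The rows listed above cover 18 codon changes, and their counts fall short of the position totals (the first-position rows sum to 321 rather than 748, the second to 120 rather than 200, the third to 150 rather than 235). Enumerating every single-nucleotide substitution that turns a sense codon into TAA, TAG or TGA under the standard genetic code gives 23 changes (9, 7 and 7 at the first, second and third positions). All 18 listed changes are transversions; the five others are the only transitions (C→T at the first position of CAA, CAG and CGA, and G→A at the second or third position of TGG), so they are presumably the bold-faced rows that the footnote refers to. The Python sketch below performs the enumeration; `stop_creating_changes` is a hypothetical helper.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def stop_creating_changes():
    """List (position, sense codon, stop codon, is_transition) tuples."""
    changes = []
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only sense codons can mutate into a nonsense codon
        for i in range(3):
            for b in BASES:
                if b == codon[i]:
                    continue
                mutant = codon[:i] + b + codon[i + 1:]
                if CODON_TABLE[mutant] == "*":
                    is_ts = (codon[i], b) in TRANSITIONS
                    changes.append((i + 1, codon, mutant, is_ts))
    return changes


changes = stop_creating_changes()
print(len(changes), Counter(pos for pos, *_ in changes))
print([(src, dst) for _, src, dst, is_ts in changes if is_ts])
```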
Table 6. Nonsense SNPs and prediction of NMD.

| | Predicted to cause NMD | Not predicted to cause NMD | Total |
|---|---|---|---|
| Known pathological variants | 8 | 0 | 8 |
| Other nonsense SNPs | 573 | 602 | 1175 |
| Total | 581 | 602 | 1183 |
This prediction is based on the rule that an mRNA is degraded if a premature stop codon lies on the 5′ side of a boundary located 50–55 nucleotides upstream of the 3′ end of the second-to-last exon. Here the boundary was set at 50 nucleotides upstream of the 3′ end of the second-to-last exon, and nonsense SNPs on its 5′ side were predicted to cause NMD.
This number includes SNPs in genes consisting of only one exon.
P = 0.0033 by Fisher's exact test.
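The NMD rule in the first footnote and the Fisher's exact test in the last can be sketched in Python as below. `predicts_nmd` and `fisher_exact_2x2` are hypothetical helpers; the sketch assumes exon lengths are given in transcript order and treats a SNP strictly 5′ of the boundary as NMD-inducing, which may differ in off-by-one details from the authors' implementation. The sidedness of the authors' test is not stated, so both one-sided and two-sided P values are computed.

```python
from math import comb


def predicts_nmd(exon_lengths, snp_pos, boundary_nt=50):
    """Apply the 50-nt rule to a nonsense SNP.

    exon_lengths: exon lengths of the spliced mRNA, 5' to 3'.
    snp_pos: 0-based position of the nonsense SNP in that mRNA.
    """
    if len(exon_lengths) < 2:
        return False  # single-exon gene: no downstream exon junction
    junction = sum(exon_lengths[:-1])  # 3' end of the second-to-last exon
    return snp_pos < junction - boundary_nt


def fisher_exact_2x2(a, b, c, d):
    """One-sided (a larger than expected) and two-sided P for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = pmf(a)
    greater = sum(pmf(x) for x in range(a, hi + 1))
    two_sided = sum(
        pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7)
    )
    return greater, two_sided


# Rows: known pathological vs other nonsense SNPs; columns: NMD vs not NMD.
print(fisher_exact_2x2(8, 0, 573, 602))
```

For the counts in the table both P values come out at about 0.0033, because no other table with the same margins is as extreme. `scipy.stats.fisher_exact` offers the same test if SciPy is available.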
Table 7. Nonsense SNPs with known pathological effects.

| Acc# | Chr | Gene symbol | SNP | Variation | OMIM | Biological effects |
|---|---|---|---|---|---|---|
| M60092 | 1 | | rs17602729 | Gln12Ter | 102770 | AMPD deficiency |
| M12272 | 4 | | rs283413 | Gly78Ter | 103730 | Parkinson disease |
| BC073741 | 7 | | rs10250779 | Trp78Ter | 261670 | Myopathy |
| AF000571 | 11 | | rs17215500 | Arg518Ter | 607542 | Long QT syndrome 1 |
| AY358222 | 11 | | rs497116 | Arg125Ter | 608633 | Sepsis susceptibility |
| M86407 | 11 | | rs2228325 | Arg577Ter | 102574 | Athletic performance |
| L41870 | 13 | | rs3092891 | Arg445Ter | 180200 | Bilateral retinoblastoma |
| AF068760 | 15 | | rs28989186 | Arg194Ter | 602860 | Premature chromatid separation trait and mosaic variegated aneuploidy syndrome |
Table 8. Functional bias of genes having nonsense SNPs causing NMD.

| Top level | Gene Ontology no. | Gene Ontology | Observed gene no. | Expected gene no. | Ratio of enrichment | P value |
|---|---|---|---|---|---|---|
| Biological process | 0006118 | electron transport | 15 | 4.23 | 3.55 | 5.03×10−5 |
| | 0006468 | protein amino acid phosphorylation | 16 | 7.28 | 2.20 | 4.98×10−3 |
| Cellular component | 0016020 | membrane | 41 | 22.55 | 1.82 | 5.57×10−4 |
| | 0005578 | proteinaceous extracellular matrix | 8 | 1.21 | 6.62 | 2.17×10−6 |
| Molecular function | 0005524 | ATP binding | 35 | 17.15 | 2.04 | 1.79×10−4 |
| | 0004713 | protein tyrosine kinase activity | 16 | 6.46 | 2.48 | 1.56×10−3 |
| | 0004674 | protein serine/threonine kinase activity | 16 | 6.78 | 2.36 | 2.51×10−3 |
| | 0000166 | nucleotide binding | 14 | 5.61 | 2.50 | 2.79×10−3 |
| | 0004672 | protein kinase activity | 16 | 7.15 | 2.24 | 4.21×10−3 |
| | 0003723 | RNA binding | 10 | 3.11 | 3.22 | 1.82×10−3 |
| | 0005506 | iron ion binding | 8 | 2.00 | 4.00 | 1.32×10−3 |
| | 0005509 | calcium ion binding | 16 | 7.65 | 2.09 | 7.89×10−3 |
| | 0005215 | transporter activity | 10 | 3.44 | 2.91 | 3.76×10−3 |
| | 0016491 | oxidoreductase activity | 11 | 4.24 | 2.59 | 5.76×10−3 |
| | 0003779 | actin binding | 6 | 1.27 | 4.74 | 2.24×10−3 |
| | 0004759 | carboxylesterase activity | 5 | 0.24 | 20.44 | 4.19×10−6 |
Number of genes annotated with the GO term among the 581 genes in which nonsense SNPs causing NMD were found.
Expected number of genes annotated with the GO term in a sample of 581 genes, given the proportion of all human genes annotated with that term.
Enrichment of each GO term among the genes with nonsense SNPs was evaluated as the upper-tail probability of a hypergeometric distribution.
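The expected count, enrichment ratio and upper-tail hypergeometric P value for one GO term can be sketched in Python as follows. `go_enrichment` is a hypothetical helper, and the background sizes `K` (genes carrying the term) and `N` (all genes considered) are inputs the table does not report, so the toy example uses made-up numbers rather than the authors' data.

```python
from math import comb


def go_enrichment(k, n, K, N):
    """Enrichment of one GO term in a gene sample.

    k: sampled genes carrying the term (observed)
    n: sample size (e.g. 581 genes with NMD-causing nonsense SNPs)
    K: genes carrying the term in the background set
    N: size of the background set
    Returns (expected, enrichment ratio, upper-tail hypergeometric P).
    """
    expected = n * K / N
    ratio = k / expected
    # P(X >= k) for X ~ Hypergeometric(N, K, n)
    p_upper = sum(
        comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1)
    ) / comb(N, n)
    return expected, ratio, p_upper


# Toy example: 4 genes drawn from 10, 3 of which carry the term.
print(go_enrichment(2, 4, 3, 10))
```

In the table, the ratio column is simply observed/expected (for example, 15 / 4.23 ≈ 3.55 for electron transport). With SciPy, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` gives the same upper tail.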
Table 9. Functional bias of genes having nonsense SNPs not causing NMD.

| Top level | Gene Ontology no. | Gene Ontology | Observed gene no. | Expected gene no. | Ratio of enrichment | P value |
|---|---|---|---|---|---|---|
| Biological process | 0007156 | homophilic cell adhesion | 6 | 1.42 | 4.23 | 3.05×10−3 |
| | 0006310 | DNA recombination | 3 | 0.19 | 15.50 | 8.25×10−4 |
| | 0006414 | translational elongation | 3 | 0.34 | 8.85 | 4.48×10−3 |
| | 0042254 | ribosome biogenesis and assembly | 2 | 0.15 | 13.77 | 8.68×10−3 |
| Cellular component | 0005853 | eukaryotic translation elongation factor 1 complex | 2 | 0.13 | 15.50 | 6.82×10−3 |
| Molecular function | 0004194 | pepsin A activity | 2 | 0.18 | 11.27 | 1.30×10−2 |
| | 0003746 | translation elongation factor activity | 2 | 0.29 | 6.89 | 3.35×10−2 |
Number of genes annotated with the GO term among the 602 genes in which nonsense SNPs not causing NMD were found.
Expected number of genes annotated with the GO term in a sample of 602 genes, given the proportion of all human genes annotated with that term.
Enrichment of each GO term among the genes with nonsense SNPs was evaluated as the upper-tail probability of a hypergeometric distribution.