Yong Zhang, Shujuan Lu, Shuqi Zhao, Xiaofeng Zheng, Manyuan Long, Liping Wei.
Abstract
BACKGROUND: New genes generated by retroposition are widespread in humans and other mammalian species. Usually, this process copies a single parental gene and inserts it into a distant genomic location. However, retroposition of two adjacent parental genes, i.e. co-retroposition, had not been reported until the hominoid chimeric gene, PIPSL, was identified recently. It was shown how two genes linked in tandem (phosphatidylinositol-4-phosphate 5-kinase, type I, alpha, PIP5K1A and proteasome 26S subunit, non-ATPase, 4, PSMD4) could be co-retroposed from a single RNA molecule to form this novel chimeric gene. However, understanding of the origination and biological function of PIPSL requires determination of the coding potential of this gene as well as the evolutionary forces acting on its hominoid copies.
Year: 2009 PMID: 19832993 PMCID: PMC2773790 DOI: 10.1186/1471-2148-9-252
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Polymorphism statistics, generated with DnaSP [39].
| complete | 4,200 | 21 | 0.00101 | 0.00064 | 8 | 0.0002 |
| 5'flanking | 736 | 1 | 0.00028 | 0.00041 | 1 | 0.0001 |
| 3'flanking | 875 | 11 | 0.00255 | 0.00148 | 6 | 0.0010 |
| CDS | 2,589 | 9 | 0.00071 | 0.00042 | 1 (C) | 0.00001 |
A. "Complete" indicates the full PIPSL locus, which consists of a 736 bp 5' flanking region, a 1,482 bp PIP5K1A-derived CDS, a 1,107 bp PSMD4-derived CDS and an 875 bp 3' flanking region. The 5' flanking region includes both the 592 bp promoter and the 144 bp 5' UTR, while the 3' flanking region corresponds to the 3' UTR.
B. θ is the population mutation parameter, 4Nμ (N and μ indicate the effective population size and the mutation rate, respectively).
C. This unique indel is heterozygous.
D. Indel diversity is calculated in the most conservative way: we count only diallelic differences and treat the copy number variants in the 3' UTR as only two alleles. Thus, the indel diversity of the 3' UTR might be underestimated.
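The parameter θ = 4Nμ in footnote B is commonly estimated from the number of segregating sites with Watterson's estimator. The sketch below assumes that estimator and a per-site scaling; this section does not state which estimator DnaSP reported, and the sample size `n_chromosomes = 80` is a hypothetical value chosen only for illustration.

```python
def watterson_theta(segregating_sites: int, n_chromosomes: int, length_bp: int) -> float:
    """Per-site Watterson estimate: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    if n_chromosomes < 2:
        raise ValueError("at least two sampled chromosomes are required")
    if length_bp <= 0:
        raise ValueError("sequence length must be positive")
    # Harmonic correction for the number of sampled chromosomes.
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return segregating_sites / (a_n * length_bp)


# Segregating sites and length are taken from the "complete" row;
# the sample size of 80 chromosomes is an assumption, not a reported value.
theta_complete = watterson_theta(segregating_sites=21, n_chromosomes=80, length_bp=4200)
print(f"{theta_complete:.5f}")
```

Because the estimator divides by the harmonic term a_n, the result depends on the assumed sample size, so the printed value should be read as illustrative only.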
Figure 1. Gene structure of PIPSL. The aqua arrows between the top two bars mark the correspondence between the two parental genes and PIPSL. "ATG" and "TAG" in black indicate the borders of the ORF. PIPSL is almost the complete fusion product of PIP5K1A and PSMD4, although the original start codon (the boxed "ATG" in rose) was destroyed by a human-specific deletion [13]. The distance between this original start codon and the currently assumed start codon is only 60 bp. The left-hand and right-hand pale blocks mark the sequenced promoter region and 3' UTR region, respectively. The gold arrows indicate the region in which the polymorphisms lie: 5' UTR, coding region or 3' UTR. "Segregating sites" shows the IDs of the polymorphisms. "Reference position" indicates the position relative to the starting point of the sequenced reads; the first base lies 592 bp upstream of the transcription start site of PIPSL. "Reference sequence" gives the consensus sequence at those positions, with "D" indicating a deletion relative to the consensus and the deleted nucleotides shown for the individuals. Uppercase letters indicate homozygous mutations, while lowercase letters indicate heterozygous mutations. "031", "032" and so on are sample IDs: 031~040, 041~049, 820~914 and 014~089 are samples from African Americans, sub-Saharan Africans, Russians and Chinese, respectively.
The probability of the CDS generating no more than nine SNPs if the whole PIPSL locus is homogeneously neutral.
| CDS | 2,589 | 9 | 0.00071 | 0.0005 |
| 3'flanking | 875 | 11 | 0.00255 | ~ |
HKA test using chimp as the outgroup.
| CDS | 2,589 | 9 | 13.33 | 31 | 26.67 | |
| 3'flanking | 875 | 11 | 6.67 | 9 | 13.33 | |
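The paired values in the two rows above are consistent with expected counts from a 2×2 contingency table (row total × column total / grand total). The sketch below reproduces that arithmetic, assuming the numeric columns are observed polymorphic sites, expected polymorphic sites, observed divergent sites and expected divergent sites. The column headers are not preserved in this section, so that reading is an assumption, and the Pearson chi-square used here is a simple stand-in rather than the full HKA procedure.

```python
import math


def expected_counts(observed):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Rows: CDS, 3' flanking. Columns (assumed): polymorphic sites, divergent sites.
observed = [[9, 31], [11, 9]]
expected = expected_counts(observed)
chi2 = pearson_chi_square(observed)
p_value = chi_square_p_value_1df(chi2)
print([[round(e, 2) for e in row] for row in expected], round(chi2, 4), round(p_value, 4))
```

The one-degree-of-freedom p-value applies only to this simplified contingency view; the published test may use a different statistic.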
100,000 simulations were run to track frame-disrupting features in the CDS region derived from PSMD4, the CDS region derived from PIP5K1A and the complete CDS of PIPSL, respectively.
| PSMD4-derived | PAML | 2.53 | 0.417 | 1 | 1 | 0.00071 (D) |
| | Dnapars | | 0.402 | | | 0.00093 |
| PIP5K1A-derived | PAML | 1.50 | 0.00045 | 2 | 0 | <10⁻⁵ |
| | Dnapars | | 0.00044 | | | <10⁻⁵ |
| Complete-Gene | PAML | 1.91 | 0.01016 | 3 | 1 | <10⁻⁵ |
| | Dnapars | | 0.00879 | | | <10⁻⁵ |
A. We used two distinct methods to reconstruct the ancestral sequence: PAML and Dnapars from the PHYLIP package (maximum-parsimony-based inference). In all cases, the results of the two methods are similar.
B. PNaNs is defined as the proportion of simulated datasets that show a Na/Ns ratio smaller than or equal to the observed ratio. Here, Na and Ns indicate the number of nonsynonymous mutations and the number of synonymous mutations, respectively.
C. Pdis corresponds to the percentage of simulated datasets that show a number of frame-disrupting mutations (stop codons and frameshifts) smaller than or equal to the observed number. Across all lineages of interest (human, chimp, orangutan and gibbon), only three stop codons and one indel are observed, and those only in the gibbon genome. Specifically, two nonsense substitutions occur in the PIP5K1A-derived region, while the other nonsense substitution and the indel are located in the PSMD4-derived region.
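Footnotes B and C both define an empirical probability: the fraction of simulated datasets whose statistic is no larger than the observed one. A minimal sketch of that calculation follows; the function name and the toy simulated values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def empirical_lower_tail(simulated: Sequence[float], observed: float) -> float:
    """Fraction of simulated values that are smaller than or equal to the observed value."""
    if not simulated:
        raise ValueError("at least one simulated value is required")
    hits = sum(1 for value in simulated if value <= observed)
    return hits / len(simulated)


# Toy stand-in for simulated counts of frame-disrupting mutations (made-up numbers).
toy_simulated = [0, 1, 2, 2, 3, 5, 8, 13]
toy_p = empirical_lower_tail(toy_simulated, observed=2)
print(toy_p)
```

With 100,000 simulations, a result of zero hits can only be reported as an upper bound of 1/100,000, which would correspond to the "<10⁻⁵" entries in the table.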
Figure 2. The evolutionary process of PIPSL. Blue and yellow bars mark the ancestral branches leading to human and chimp in the PSMD4-derived region and the PIP5K1A-derived region, respectively. "P" indicates the parental gene. A number such as "5.1/3.1" gives the numbers of nonsynonymous and synonymous substitutions on that branch, while a number in thicker font such as "0.66" gives Ka/Ks. In addition, all branches with Ka/Ks significantly different from one are marked with "a", which means a p of 0~0.05. Considering the small number of substitutions, branches with marginal significance (p of 0.05~0.1) are also marked, with "b".
Selection of different lineages based on CODEML.
| Human | N | N |
| Chimp | N | N |
| Human/Chimp ancestor | A | N |
| Gorilla | N/A | N |
| Orangutan | C | C |
| Gibbon | N | N |
N, A and C are short for "failure to reject Neutral null model", "Adaptive" and "Constrained", respectively. The selective force acting on the PSMD4-derived region in gorilla is still unknown because trace data are lacking.
Exon-array based retropseudogene expression profile across 11 tissues.
| Breast | 0.00 | 42 | 39 |
| Cerebellum | 0.00 | 58 | 63 |
| Heart | 0.02 | 74 | 54 |
| Kidney | 0.00 | 58 | 64 |
| Liver | 0.01 | 53 | 43 |
| Muscle | 0.01 | 58 | 42 |
| Pancreas | 0.00 | 57 | 56 |
| Prostate | -0.02 | 24 | 43 |
| Spleen | 0.00 | 37 | 31 |
| Testis | 0.00 | 51 | 38 |
| Thyroid | 0.00 | 39 | 39 |
A. The calculated expression value is a log value, which might be smaller than 0 (see Methods). This column shows the median expression of all pseudogenes in the tissue of interest.
B. This column counts the pseudogenes whose highest expression occurs in the tissue of interest, regardless of the level of that highest expression.
C. This column is similar to the previous column, except that the highest expression of a pseudogene must be above the threshold of 0.2. This threshold was judged to indicate the presence of expression by manually checking the correlation between EST data and exon-array data in the UCSC Genome Browser.
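Footnotes B and C describe counting, for each tissue, the pseudogenes whose highest expression falls in that tissue, without and with the 0.2 threshold. The sketch below illustrates that counting rule. The pseudogene names and expression values are made-up placeholders, and resolving ties by taking the first maximal tissue is an assumption.

```python
from collections import Counter
from typing import Dict, Optional


def peak_tissue_counts(
    expression: Dict[str, Dict[str, float]],
    threshold: Optional[float] = None,
) -> Counter:
    """Count pseudogenes per tissue of highest expression.

    If a threshold is given, a pseudogene is counted only when its
    highest expression value is above that threshold.
    """
    counts: Counter = Counter()
    for by_tissue in expression.values():
        # Tissue with the highest log expression for this pseudogene.
        tissue, value = max(by_tissue.items(), key=lambda item: item[1])
        if threshold is None or value > threshold:
            counts[tissue] += 1
    return counts


# Hypothetical log expression values for three placeholder pseudogenes.
toy_expression = {
    "pg1": {"Heart": 0.35, "Liver": 0.10, "Testis": 0.05},
    "pg2": {"Heart": 0.15, "Liver": 0.02, "Testis": -0.01},
    "pg3": {"Heart": 0.00, "Liver": 0.01, "Testis": 0.40},
}
all_peaks = peak_tissue_counts(toy_expression)
expressed_peaks = peak_tissue_counts(toy_expression, threshold=0.2)
print(dict(all_peaks), dict(expressed_peaks))
```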