Wen-Jiu Guo, Ping Li, Jun Ling, Shao-Ping Ye.
Abstract
Microsatellites are short tandem repeats of one to six bases in genomic DNA. As microsatellites are highly polymorphic and play a vital role in gene function and recombination, they are an attractive subject for research in evolution and in the genetics and breeding of animals and plants. Orphan genes have no known homologs in existing databases. Using bioinformatic computation and statistical analysis, we identified 19,26 orphan genes in the rice (Oryza sativa ssp. japonica cv. Nipponbare) proteome. We found that a larger proportion of orphan genes than of nonorphan genes are expressed after sexual maturation and under environmental stress. Orphan genes generally encode shorter proteins, have smaller introns, and evolve faster. Additionally, orphan genes have fewer PROSITE patterns, with larger pattern sizes, than nonorphan genes. The average microsatellite content and the percentage of trinucleotide repeats in orphan genes are also significantly higher than in nonorphan genes. Microsatellites are found less often in PROSITE patterns in orphan genes. Taken together, these orphan gene characteristics suggest that microsatellites play an important role in orphan gene evolution and expression.
Year: 2007 PMID: 18273382 PMCID: PMC2216055 DOI: 10.1155/2007/21676
Source DB: PubMed Journal: Comp Funct Genomics ISSN: 1531-6912
Figure 1. The cumulative percentages of orphan genes against different BLAST E-cutoffs. The x-axis shows the BLAST E-cutoffs after negative logarithmic transformation, and the y-axis shows the number of orphan genes obtained at each E value. The curve rises sharply as the E-cutoff decreases and then levels off, with the turning point around E-cutoff = 10⁻⁴. At E-cutoff = 10⁻⁴, we obtained 18,398 orphans out of 59,712 protein sequences, or 30.8% of the annotated proteome.
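The cutoff sweep behind Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical input that maps each protein ID to the smallest E-value among its BLAST hits (self-hits excluded), or to `None` when no hit was reported. A protein counts as an orphan at a given cutoff when it has no hit with an E-value at or below that cutoff, so lowering the cutoff can only add orphans, which is why the curve is monotone.

```python
import math


def count_orphans(best_evalue, cutoff):
    """Count proteins with no hit at or below `cutoff`.

    `best_evalue` is a hypothetical mapping: protein ID -> smallest E-value
    among its BLAST hits (self-hits excluded), or None if there was no hit.
    """
    return sum(1 for e in best_evalue.values() if e is None or e > cutoff)


def orphan_curve(best_evalue, cutoffs):
    """Return (-log10 cutoff, orphan count, orphan %) for each cutoff."""
    total = len(best_evalue)
    rows = []
    for cutoff in cutoffs:
        n = count_orphans(best_evalue, cutoff)
        rows.append((-math.log10(cutoff), n, 100.0 * n / total))
    return rows


# Toy data (hypothetical protein IDs and E-values).
toy = {"p1": 1e-50, "p2": 1e-3, "p3": None, "p4": 1e-6}
for x, n, pct in orphan_curve(toy, [1e-2, 1e-4, 1e-10, 1e-60]):
    print(f"-log10(E) = {x:5.1f}: {n} orphans ({pct:.0f}%)")

# The proportion reported in Figure 1: 18,398 of 59,712 proteins.
print(f"{100 * 18398 / 59712:.1f}%")  # 30.8%
```

The last line simply checks the caption's arithmetic; the turning point itself would be read off the resulting curve.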
Figure 2. Comparison of the number of orphan and nonorphan genes expressed in different tissues or after injury or hormone treatment. Germinating: shoots and roots of germinating seeds; callus: callus library; shoot: green shoot, shoot, shoot and callus, shoot and root of germinating seeds, and mixed shoot (normalized library); flower: flower library; panicles: mixture of libraries 21 and 22 (panicles at the less-than-5-cm stage and panicles two weeks after flowering), mixture of libraries 29 and 33 (mixed panicles one, two, and three weeks after flowering, and supermix), mixture of libraries 29 and 35 (mixed panicles one, two, and three weeks after flowering), mixture of libraries 30 and 34 (mixed panicles one, two, and three weeks after flowering, and supermix), mixture of libraries 30 and 36 (mixed panicles one, two, and three weeks after flowering), and mixture of libraries 19 and 20 (panicles at the more-than-5-cm stage and panicles one day after flowering); injury: Cd-treated callus, cold-treated callus, etiolated shoot, heat-treated callus, and UVC-irradiated shoot; hormone treated: ABA (abscisic acid)-treated callus and NAA (naphthaleneacetic acid)-treated callus. All count values (shown as the corresponding percentages) were tested with Pearson's chi-square test, both on the whole table and on separate data pairs (for example, germinating and callus form one data pair). The whole-table test is significant, P = 1.4 × 10⁻⁴⁶.
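Each pairwise comparison in Figure 2 amounts to a 2 × 2 table (two tissues by orphan/nonorphan status). A minimal sketch of Pearson's chi-square test for such a table follows. It omits Yates' continuity correction, which is an assumption since the caption does not say whether a correction was applied. With one degree of freedom the upper-tail probability has the closed form erfc(√(x/2)), so no statistics library is needed. The counts in the example are hypothetical, because the caption reports only percentages.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Rows: two tissues; columns: orphan and nonorphan gene counts.
    No continuity correction. Returns (statistic, P value) with df = 1.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Chi-square with 1 df is the square of a standard normal, so
    # P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts: germinating (orphan, nonorphan) vs callus (orphan, nonorphan).
stat, p = chi_square_2x2(120, 880, 60, 940)
print(f"chi2 = {stat:.2f}, P = {p:.3g}")
```

The whole-table test in the caption has more than one degree of freedom, so it would need a general chi-square tail function rather than this closed form.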
Comparison of average protein length and intron size between orphan and nonorphan genes using ESTs.
| Measure | Nonorphan gene | Orphan gene | Probability |
|---|---|---|---|
| Protein length (aa) | 583 | 245 | 0 |
| Intron size (bp) | 2277.959428 | 1474.6711 | 2.6202 × 10⁻⁵² |
Differences were tested with the Mann-Whitney U test. Intron size is the total length of all introns within a gene, averaged across genes. Both differences, in average protein length and in intron size, are highly significant.
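The Mann-Whitney U test used throughout these tables compares two samples by ranks rather than by means, which suits skewed quantities such as protein length and intron size. The sketch below is a minimal, self-contained version with average ranks for ties and a normal approximation for the two-sided P value (no continuity correction). It is not necessarily the exact variant the authors ran, and the example lengths are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, z score, P value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, z, p


# Hypothetical protein lengths (amino acids), for illustration only.
orphan = [110, 245, 180, 95, 300, 150]
nonorphan = [583, 420, 610, 350, 700, 480]
u, z, p = mann_whitney_u(orphan, nonorphan)
print(f"U = {u:.1f}, z = {z:.2f}, P = {p:.3g}")
```

With tens of thousands of genes per group, the normal approximation is accurate, and P values can underflow to the "0" shown in the table.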
Comparison of average mismatch rates in high-scoring segment pairs (HSPs) of indica-japonica EST alignments at E-cutoff = 10⁻²⁰ in different tissues.
| Tissue | Nonorphan gene (%) | Orphan gene (%) | Probability |
|---|---|---|---|
| Panicles | 5.8875806 | 6.066929 | 0.015195 |
| Callus | 5.9784517 | 6.200713 | 0.003167 |
The mismatch rate is 100 minus the identity rate (%); it counts indels (insertions and deletions) as well as substitutions within each HSP of the BLAST alignment. The differences in mismatch rate are statistically significant in both tissues (Mann-Whitney U test).
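To make the definition concrete, the sketch below computes the mismatch rate of one HSP from its two gapped alignment rows. It assumes identity is counted over every alignment column, gap columns included, so that indels and substitutions both lower identity as the note describes; the function name and the example alignments are hypothetical.

```python
def mismatch_rate(aligned_query, aligned_subject):
    """Mismatch rate (%) of one HSP: 100 minus the percent identity.

    The inputs are the two gapped rows of the alignment ('-' marks a gap), so
    each column is an identity, a substitution, or part of an indel. Identity
    is taken over all columns, gaps included (an assumed definition).
    """
    if len(aligned_query) != len(aligned_subject) or not aligned_query:
        raise ValueError("alignment rows must be non-empty and of equal length")
    identities = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 - 100.0 * identities / len(aligned_query)


# Hypothetical HSPs: one substitution and one single-base indel in 9 columns,
# then a perfect 4-column match.
hsps = [("ACGT-ACGT", "ACCTAACGT"), ("ACGT", "ACGT")]
rates = [mismatch_rate(q, s) for q, s in hsps]
print([round(r, 2) for r in rates], round(sum(rates) / len(rates), 2))
```

The per-tissue values in the table are averages of such per-HSP rates, which is what the last line mimics on toy data.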
Figure 3. Microsatellite content of gene components. IntronT and intron1 denote the microsatellite content of all introns and of the first intron of a gene, respectively; CDS denotes the microsatellite content of the coding sequence. In intronT and CDS, the microsatellite content of orphan genes is significantly higher than that of nonorphan genes. All data pairs are highly significant (P ≪ .01) in the Mann-Whitney U test. **P < .05; *P < .01.
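The "microsatellite content" in Figure 3 can be illustrated with a simple perfect-repeat finder. The sketch below is an assumption-laden stand-in for whatever tool the authors used. It searches for uninterrupted tandem repeats of 1- to 6-bp motifs (the definition in the abstract) and uses hypothetical minimum copy numbers (`MIN_COPIES`). It skips motifs that are themselves repeats of a shorter unit, so a poly(A) run is reported once, as a mononucleotide repeat. Content is defined here as the percentage of bases covered by any repeat.

```python
import re

# Hypothetical minimum copy numbers per motif length (1-6 bp); the thresholds
# used in the study are not stated in this section.
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_microsatellites(seq):
    """Return (start, end, motif) for perfect tandem repeats of 1-6 bp motifs."""
    seq = seq.upper()
    hits = []
    for k, copies in MIN_COPIES.items():
        # A k-mer followed by at least (copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, copies - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1)))
    return hits


def microsatellite_content(seq):
    """Percentage of bases in seq covered by at least one microsatellite."""
    covered = set()
    for start, end, _ in find_microsatellites(seq):
        covered.update(range(start, end))
    return 100.0 * len(covered) / len(seq) if seq else 0.0


# Hypothetical 36 bp sequence with a (CAG)6 repeat and a 12 bp poly(A) run.
seq = "GT" + "CAG" * 6 + "GT" + "A" * 12 + "GC"
print(find_microsatellites(seq), round(microsatellite_content(seq), 2))
# [(22, 34, 'A'), (2, 20, 'CAG')] 83.33
```

Applying `microsatellite_content` separately to intron and CDS sequences would give per-component values of the kind plotted in the figure; imperfect (interrupted) repeats are not detected by this simplified finder.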
Figure 4. Trinucleotide (triplet) microsatellite content as a percentage of the total mononucleotide to pentanucleotide microsatellite content of orphan and nonorphan genes. IntronT and intron1 denote the microsatellite content of all introns and of the first intron of a gene, respectively. All data pairs were compared with the Mann-Whitney U test. **P < .05; *P < .01.
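The percentage plotted in Figure 4 is a simple ratio once repeats have been located. The sketch below assumes a hypothetical list of `(start, end, motif)` hits with half-open coordinates and computes the share of mono- to pentanucleotide microsatellite bases that fall in trinucleotide repeats. Hexanucleotide repeats are excluded from the denominator, following the caption.

```python
def triplet_share(hits):
    """Percent of mono- to pentanucleotide microsatellite bases in trinucleotide repeats.

    `hits` holds (start, end, motif) tuples with half-open coordinates, e.g. the
    output of a microsatellite finder (hypothetical format). Hexanucleotide
    repeats are left out, matching the denominator described for Figure 4.
    """
    covered = {k: set() for k in range(1, 6)}  # motif length -> covered positions
    for start, end, motif in hits:
        if len(motif) in covered:
            covered[len(motif)].update(range(start, end))
    total = sum(len(positions) for positions in covered.values())
    return 100.0 * len(covered[3]) / total if total else 0.0


# Hypothetical gene: a 12 bp poly(A) run and an 18 bp (CAG)6 repeat.
print(triplet_share([(22, 34, "A"), (2, 20, "CAG")]))  # 60.0
```

Using position sets rather than summed lengths keeps overlapping hits of the same motif length from being counted twice.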
Comparison of number of PROSITE patterns and average pattern size in CDSs of orphan and nonorphan genes.
| Measure | Nonorphan gene | Orphan gene | Mann-Whitney probability |
|---|---|---|---|
| Number of PROSITE patterns | 33.76025 | 15.93652 | 0 |
| Average PROSITE pattern size | 6.825061 | 6.965661 | 3.36221 × 10⁻²⁶ |
The table compares the PROSITE pattern complexity of orphan and nonorphan genes. Counts include repeated occurrences of the same PROSITE pattern in a sequence. Differences were tested with the Mann-Whitney U test.
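Counting PROSITE patterns requires translating PROSITE's pattern syntax into a matcher. The sketch below converts the common elements (`x`, `[..]`, `{..}`, repeat counts such as `x(2,4)`, and the `<`/`>` terminal anchors) into a Python regular expression. It then reports every match, overlapping ones included, which is our assumption about how repeated patterns were counted. We also assume "pattern size" means the length in residues of each matched segment. The example uses the classic N-glycosylation motif `N-{P}-[ST]-{P}` on a hypothetical toy sequence.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a regex.

    Handles x, [..], {..}, repeat counts like x(2) or x(2,4), and the
    < / > terminal anchors. Terminal anchors inside brackets (e.g. [G>])
    are not supported in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a single residue or a [..] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_pattern(pattern, protein):
    """(start, end) of every match; overlapping matches are kept (an assumption)."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1), m.end(1)) for m in regex.finditer(protein)]


# N-glycosylation motif on a toy protein sequence (hypothetical).
hits = find_pattern("N-{P}-[ST]-{P}.", "ANGSANKTA")
sizes = [end - start for start, end in hits]
print(len(hits), sum(sizes) / len(sizes))  # 2 4.0
```

The zero-width lookahead lets the scan restart at every residue, so motifs that share residues are each counted.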
Interaction between microsatellite loci and PROSITE patterns in orphan and nonorphan genes.
| Interaction | Nonorphan gene | Orphan gene |
|---|---|---|
| Microsatellite loci outside PROSITE patterns | 482 (2.9%) | 118 (6.6%) |
| Microsatellite loci overlapping PROSITE patterns | 0 | 2 (0.1%) |
| Microsatellite loci within PROSITE patterns | 16345 (97.1%) | 1666 (93.3%) |
The crosstab was tested with Pearson's chi-square test; P = 1.36744 × 10⁻²⁰. Counts include repeated occurrences of PROSITE patterns.
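The reported probability can be checked from the counts in this table. The sketch below computes the plain Pearson chi-square statistic (no continuity correction) for the 3 × 2 crosstab. With (3 − 1)(2 − 1) = 2 degrees of freedom the upper-tail probability is exactly exp(−x/2), so no statistics library is needed. By our arithmetic it gives χ² ≈ 91.48 and P ≈ 1.37 × 10⁻²⁰, in line with the value above.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Counts from the table (columns: nonorphan gene, orphan gene).
table = [
    [482, 118],     # microsatellite loci outside PROSITE patterns
    [0, 2],         # loci overlapping PROSITE patterns
    [16345, 1666],  # loci within PROSITE patterns
]
stat, df = pearson_chi_square(table)
# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p = math.exp(-stat / 2)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.3g}")  # chi2 = 91.48, df = 2, P = 1.37e-20
```

Both cells of the "overlapping" row have expected counts below 5 (about 1.8 and 0.2), a regime in which the chi-square approximation is known to be rough.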