Michael C Riley, Amanda Clare, Ross D King.
Abstract
BACKGROUND: We are interested in understanding the locational distribution of genes and their functions in genomes, as this distribution has both functional and evolutionary significance. Gene locational distribution is known to be affected by various evolutionary processes, with tandem duplication thought to be the main process producing clustering of homologous sequences. Recent research has found clustering of protein structural families in the human genome, even when genes identified as tandem duplicates have been removed from the data. However, that research was limited because it could not analyse small sample sizes. This is a challenge for bioinformatics, as more specific functional classes have fewer examples and conventional statistical analyses of these small data sets often produce unsatisfactory results.
Year: 2007 PMID: 17397552 PMCID: PMC1855069 DOI: 10.1186/1471-2105-8-112
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Ranking of all genes on each of the chromosomes.
| Chr | Rank | Original SD | Mean MC SD | Std Err |
| 1 | 1000 | 2.71 | 2.43 | 0.054 |
| 2 | 1000 | 2.67 | 2.31 | 0.052 |
| 3 | 957 | 2.42 | 2.29 | 0.051 |
| 4 | 1000 | 2.74 | 2.44 | 0.054 |
| 5 | 1000 | 2.51 | 2.19 | 0.049 |
Ranking, standard deviation of the distribution of the original genes (Original SD), mean of the 1000 standard deviations from the Monte Carlo simulations (Mean MC SD) and standard error (Std Err) for each of the five chromosomes (Chr). The standard deviation serves as a measure of clustering, and the difference between Original SD and Mean MC SD divided by Std Err gives the significance (see main text).
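The caption's significance measure is simple arithmetic on the table columns. A minimal sketch, using only the values transcribed from the table; the helper name `significance` is our own and not from the paper:

```python
# Values transcribed from the table above: chromosome -> (Original SD, Mean MC SD, Std Err).
TABLE = {
    1: (2.71, 2.43, 0.054),
    2: (2.67, 2.31, 0.052),
    3: (2.42, 2.29, 0.051),
    4: (2.74, 2.44, 0.054),
    5: (2.51, 2.19, 0.049),
}


def significance(original_sd: float, mean_mc_sd: float, std_err: float) -> float:
    """Number of standard errors by which the observed SD exceeds the Monte Carlo mean."""
    return (original_sd - mean_mc_sd) / std_err


for chrom, (orig, mc, se) in TABLE.items():
    print(f"Chr {chrom}: {significance(orig, mc, se):.2f}")
```

This gives roughly 5.2, 6.9, 2.5, 5.6 and 6.5 standard errors for chromosomes 1 to 5. Chromosome 3 has the smallest excess, which is consistent with it being the only chromosome ranked below 1000.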
Figure 1. Probability density plots of gap lengths. Probability density function plots of intergene gap lengths for all five chromosomes of A. thaliana. The curves are not asymptotic to the y axis; the peak occurs between 300 and 700 bp.
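Figure 1 summarises intergene gap lengths. The sketch below shows one way to derive the gaps and locate their peak. It assumes that a gap is the distance from the end of one gene to the start of the next, and it uses a plain histogram as a simpler stand-in for the density estimate behind the plots. The function names and toy coordinates are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def intergene_gaps(genes: List[Tuple[int, int]]) -> List[int]:
    """Gap (bp) between the end of each gene and the start of the next.

    Assumes `genes` holds (start, end) coordinates on one chromosome;
    overlapping genes yield a gap of 0.
    """
    ordered = sorted(genes)
    return [max(0, nxt[0] - cur[1]) for cur, nxt in zip(ordered, ordered[1:])]


def modal_bin(gaps: List[int], bin_width: int = 100) -> Tuple[int, int]:
    """Return the (low, high) bp bounds of the most populated histogram bin.

    Ties are broken in favour of the shortest gaps.
    """
    counts = Counter(g // bin_width for g in gaps)
    best = max(counts, key=lambda b: (counts[b], -b))
    return best * bin_width, (best + 1) * bin_width


# Hypothetical toy coordinates, not data from the paper.
genes = [(1000, 2500), (3000, 4200), (4650, 6000), (6400, 7000), (12000, 13000)]
gaps = intergene_gaps(genes)
print(gaps, modal_bin(gaps))
```

With real annotation data, the histogram peak would be compared with the 300 to 700 bp range reported in the caption.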
Average ranking of each GO level across all five chromosomes.
| Level | Average ranking (TD removed) | Average ranking (all genes) |
| 1 | 713 | 796 |
| 2 | 705 | 779 |
| 3 | 652 | 725 |
| 4 | 675 | 745 |
Average ranking of all functional classes analysed, with and without tandem duplicates (TD), on all five chromosomes of A. thaliana for four levels of the Gene Ontology hierarchy. The degree of clustering of broadly classified genes is similar to that of the more specific classifications.
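The effect of tandem duplicates at each GO level can be read off the table as a difference of the two columns. A minimal sketch using only the transcribed values; the name `AVERAGE_RANKING` is our own:

```python
# Average rankings per GO level, transcribed from the table above:
# level -> (TD removed, all genes)
AVERAGE_RANKING = {1: (713, 796), 2: (705, 779), 3: (652, 725), 4: (675, 745)}

for level, (td_removed, all_genes) in AVERAGE_RANKING.items():
    shift = all_genes - td_removed
    print(f"Level {level}: including tandem duplicates raises the average ranking by {shift}")
```

The shift is 83, 74, 73 and 70 for levels 1 to 4. The rankings stay well above the random expectation of 500 even with tandem duplicates removed.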
Figure 2. Distribution of rankings without tandem duplicates. Distribution of rankings of the functional classes, with tandem duplicates removed, at level 1 of the GO hierarchy (the ten most general functional classes) for genes on both the W and C strands across all five chromosomes of Arabidopsis thaliana. The labels on the x axis refer to the Gene Ontology classifications described in Table 4. The y axis shows the relative degree of clustering of genes: a ranking of 500 is what we would expect if the genes were located at random, rankings above 500 indicate increasing clustering, and rankings below 500 indicate increasingly even spacing. This plot demonstrates that different functional classes have markedly different degrees of clustering.
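The ranking scale in the caption (about 500 at random, up to 1000 when clustered) can be illustrated with a Monte Carlo sketch. It rests on two assumptions that are not confirmed here. First, the clustering statistic is taken to be the standard deviation of gaps between adjacent genes. Second, the null model scatters the same number of genes uniformly along the chromosome. The main text may use a different null, for example resampling existing gene positions. Function names are hypothetical.

```python
import random
import statistics
from typing import List, Optional


def gap_sd(positions: List[float]) -> float:
    """Population standard deviation of gaps between adjacent sorted positions."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.pstdev(gaps)


def monte_carlo_rank(
    positions: List[float],
    chrom_length: float,
    n_sims: int = 1000,
    seed: Optional[int] = None,
) -> int:
    """Count how many of `n_sims` random placements have a smaller gap SD than observed.

    Assumed null model: the same number of genes placed uniformly at random on
    [0, chrom_length]. Random genes score about n_sims / 2, clustered genes
    approach n_sims, and evenly spaced genes approach 0.
    """
    if len(positions) < 2:
        raise ValueError("need at least two gene positions")
    rng = random.Random(seed)
    observed = gap_sd(positions)
    rank = 0
    for _ in range(n_sims):
        simulated = [rng.uniform(0, chrom_length) for _ in positions]
        if gap_sd(simulated) < observed:
            rank += 1
    return rank


# Hypothetical examples on a 10 Mbp chromosome.
clustered = [c + 100 * i for c in (1_000_000, 5_000_000, 9_000_000) for i in range(10)]
even = [150_000 + 300_000 * i for i in range(30)]
print(monte_carlo_rank(clustered, 10_000_000, seed=1), monte_carlo_rank(even, 10_000_000, seed=1))
```

Three tight clusters score close to 1000. Perfectly even spacing scores 0, because every random placement has a larger gap SD.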
Figure 3. Distribution of rankings with tandem duplicates. Distribution of rankings of the functional classes, including tandem duplicates, at level 1 of the GO hierarchy for genes on both the W and C strands across all five chromosomes of Arabidopsis thaliana. The labels on the x axis refer to the Gene Ontology classifications described in Table 4. This plot is the same as Figure 2 but with the tandem duplicates included. It shows that tandem duplicates slightly increase clustering in all of the most general functional classes. Note that we found some more specific classes at level 4 that were much less susceptible to tandem duplication (see main text).
Details of the centromeric genes that were excluded from the analysis.
| Chromosome | Start (Mbp) | End (Mbp) | Genes excluded |
| 1 | 11.5 | 18.5 | At1g32000 |
| 2 | 0.0 | 7.2 | At2g01050 – At2g16160 |
| 3 | 9.1 | 17.1 | At3g25100 – At3g47090 |
| 4 | 0.0 | 6.0 | At4g00010 – At4g11240 |
| 5 | 5.4 | 16.9 | At5g16500 – At5g42320 |
Details of the centromeric regions excluded from the analysis, showing the start and end locations of the centromeres as determined by the method given in the main text.
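Excluding centromeric genes amounts to an interval test per chromosome. The sketch below uses only the Mbp boundaries from the table above. It assumes that each gene is represented by a single coordinate, such as its start, and is dropped if that coordinate lies inside the region. The exact rule is given in the main text. The record layout and function names are hypothetical.

```python
from typing import List, Tuple

# Centromeric regions (Mbp) transcribed from the table above: chromosome -> (start, end).
CENTROMERES_MBP = {1: (11.5, 18.5), 2: (0.0, 7.2), 3: (9.1, 17.1), 4: (0.0, 6.0), 5: (5.4, 16.9)}

# Hypothetical record layout: (gene_id, chromosome, position_bp).
Gene = Tuple[str, int, int]


def in_excluded_region(chromosome: int, position_bp: int) -> bool:
    """True if a gene at `position_bp` falls inside the excluded centromeric region."""
    start, end = CENTROMERES_MBP[chromosome]
    return start * 1_000_000 <= position_bp <= end * 1_000_000


def filter_genes(genes: List[Gene]) -> List[Gene]:
    """Keep only genes lying outside the excluded centromeric regions."""
    return [g for g in genes if not in_excluded_region(g[1], g[2])]


# Hypothetical genes: one inside the chromosome 1 region, one outside it.
print(filter_genes([("geneA", 1, 12_000_000), ("geneB", 1, 5_000_000)]))
```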
Description of GO annotations.
| Class No. | Description |
| GO:0003824 | Catalytic activity |
| GO:0004871 | Signal transducer activity |
| GO:0005198 | Structural molecule activity |
| GO:0005215 | Transporter activity |
| GO:0005488 | Binding |
| GO:0016209 | Antioxidant activity |
| GO:0030234 | Enzyme regulator activity |
| GO:0030528 | Transcription regulator activity |
| GO:0045182 | Translation regulator activity |
| GO:0045735 | Nutrient reservoir activity |
Descriptions of the Gene Ontology annotations used in the boxplots in Figures 2 and 3.