Hie Lim Kim, Mineyo Iwase, Takeshi Igawa, Tasuku Nishioka, Satoko Kaneko, Yukako Katsura, Naoyuki Takahata, Yoko Satta.
Abstract
We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures "Flowers" because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions: the maximum identity was often concentrated near 100%, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is to extend the life span of gene families in a Flower by resurrecting pseudogenes.
Year: 2012 PMID: 22779033 PMCID: PMC3388347 DOI: 10.1155/2012/917678
Source DB: PubMed Journal: Int J Evol Biol ISSN: 2090-052X
Figure 1. Example of a Flower. A dot plot shows an example of a Flower located on chromosome 10. Diagonal lines in the plot indicate the positions of BLAST hits, and the colors of the lines represent the identity of the alignments, as shown on the right-hand side of the plot. A blue bar indicates a detected copy unit, and the sum of the BLAST alignment lengths equals the sum of the copy lengths. A pink bar indicates a Flower region, and the length of the bar represents the length of the Flower. A purple bar indicates the duplicated region, that is, the region that contains the copy.
Statistical data for 291 Flowers. Definitions of each term are described in Section 2.
| | Sum | Mean | S.D. | Median | Min. | Max. |
|---|---|---|---|---|---|---|
| Flower length (Kb) | 179,197 | 615 | 1,349 | 210 | 51 | 11,695 |
| | | 0.33 | 0.22 | 0.27 | 0.01 | 0.96 |
| Effective number of copy units | | 3 | 3 | 3 | 2 | 22 |
| Copy unit length (Kb) | | 50 | 95 | 23 | 1 | 1,122 |
| Length of inverted copy/ | | 0.4 | 0.4 | 0.5 | 0.0 | 1.0 |
| Minimum identity (%) | | 88.5 | 5.0 | 88.0 | 77.7 | 100.0 |
| Maximum identity (%) | | 98.0 | 2.6 | 99.2 | 89.8 | 100.0 |
| Gene region length/ | | 0.4 | 0.3 | 0.4 | 0.0 | 1.0 |
| Gene numberᵃ | 2,844 | 10 | 19 | 4 | 0 | 133 |
| Pseudogene number/gene numberᵃ | | 0.3 | 0.3 | 0.3 | 0.0 | 1.0 |

ᵃ The number of Flower genes.
Figure 4. Fixation time of pseudogenes. We tested three types of gene conversion (double-headed arrows): (a) allelic-trans, (b) cis, and (c) nonallelic-trans gene conversion. The three graphs show how the pseudogene fixation time (in generations) in a population (2N = 100) depends on the gene conversion rate (Nc = 0–20). The mutation rate is assumed to be constant (2Nμ = 1) in all three cases.
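The parameters in this caption translate directly into per-generation rates: with 2N = 100, 2Nμ = 1 gives μ = 0.01, and Nc = 0–20 gives c = 0–0.4. The sketch below is a toy illustration only and is not the authors' simulation. It assumes a neutral Wright-Fisher population, two gene copies per chromosome, irreversible inactivating mutation, and cis conversion with a randomly chosen donor. The function name and every modeling choice are hypothetical.

```python
import random

def pseudogene_fixation_time(two_n=100, two_n_mu=1.0, n_c=0.0,
                             copies=2, seed=0, max_gen=100_000):
    """Generations until every gene copy in the population is a pseudogene.

    Toy model (all assumptions): neutral Wright-Fisher resampling of
    chromosomes, irreversible inactivating mutation, and cis gene conversion
    in which a random donor copy overwrites another copy on the same
    chromosome. Returns None if fixation is not reached within max_gen.
    """
    rng = random.Random(seed)
    mu = two_n_mu / two_n      # 2N*mu = 1 with 2N = 100 gives mu = 0.01
    conv = n_c / (two_n / 2)   # N*c = 20 with N = 50 gives c = 0.4
    # True marks a pseudogene; every copy starts out functional.
    pop = [[False] * copies for _ in range(two_n)]
    for gen in range(1, max_gen + 1):
        # Drift: the next generation resamples chromosomes with replacement.
        pop = [list(rng.choice(pop)) for _ in range(two_n)]
        for chrom in pop:
            # Mutation: each functional copy may become a pseudogene.
            for i in range(copies):
                if not chrom[i] and rng.random() < mu:
                    chrom[i] = True
            # Cis conversion: the donor state replaces the acceptor state,
            # so it can either spread or resurrect a pseudogene.
            if rng.random() < conv:
                donor, acceptor = rng.sample(range(copies), 2)
                chrom[acceptor] = chrom[donor]
        if all(all(chrom) for chrom in pop):
            return gen
    return None

if __name__ == "__main__":
    for n_c in (0, 5, 10, 20):
        times = [pseudogene_fixation_time(n_c=n_c, seed=s) for s in range(20)]
        print(n_c, sum(times) / len(times))
```

Averaging over seeds for several `n_c` values gives a curve of the same kind as one panel of the figure, but this toy model makes no attempt to reproduce the published values.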
Figure 2. Distribution of Flowers and gene density on chromosomes. A bar on a chromosome denotes the position of a Flower, and the width of the bar represents the Flower length. The color of the bar represents the type of Flower: small (red) or large (purple). Gray bars mark regions with no genomic sequence data. The bars were mapped using ColoredChromosomes [11]. Gene density is shown alongside each chromosome. The scale for the gene density is placed at the top of the figure.
Statistics of Flowers in each chromosome.
| Chr. | Flower number | Flower number/1 Mb | Flower lengthᵃ (Kb) | Copy unit lengthᵃ (Kb) | Effective number of copy unitsᵃ | Min. identityᵃ (%) | Max. identityᵃ (%) | Gene numberᵃ˒ᵇ | Pseudogene number/Gene numberᵃ˒ᵇ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 27 | 0.12 | 472 | 80 | 3.3 | 89.8 | 98.7 | 12 | 0.3 |
| 2 | 10 | 0.04 | 1,110 | 102 | 2.8 | 86.5 | 98.8 | 19 | 0.3 |
| 3 | 6 | 0.03 | 271 | 17 | 2.3 | 88.0 | 96.4 | 4 | 0.7 |
| 4 | 10 | 0.05 | 346 | 47 | 3.5 | 88.2 | 97.6 | 6 | 0.4 |
| 5 | 9 | 0.05 | 556 | 39 | 5.7 | 88.0 | 96.5 | 9 | 0.2 |
| 6 | 15 | 0.09 | 238 | 39 | 2.5 | 88.1 | 97.2 | 6 | 0.3 |
| 7 | 17 | 0.11 | 789 | 54 | 3.0 | 88.8 | 97.6 | 12 | 0.4 |
| 8 | 11 | 0.08 | 373 | 40 | 3.4 | 87.6 | 97.2 | 14 | 0.3 |
| 9 | 12 | 0.10 | 1,308 | 129 | 3.2 | 88.3 | 97.2 | 17 | 0.3 |
| 10 | 14 | 0.11 | 707 | 74 | 2.7 | 90.3 | 97.5 | 8 | 0.2 |
| 11 | 18 | 0.14 | 405 | 26 | 4.0 | 87.5 | 96.8 | 8 | 0.5 |
| 12 | 12 | 0.09 | 223 | 18 | 2.4 | 88.7 | 96.4 | 2 | 0.1 |
| 13 | 6 | 0.06 | 264 | 24 | 2.1 | 90.2 | 97.3 | 2 | 0.4 |
| 14 | 6 | 0.07 | 449 | 75 | 2.6 | 86.8 | 99.1 | 20 | 0.5 |
| 15 | 7 | 0.09 | 2,911 | 141 | 3.8 | 87.1 | 99.9 | 27 | 0.4 |
| 16 | 10 | 0.13 | 1,082 | 85 | 3.0 | 89.1 | 99.3 | 13 | 0.4 |
| 17 | 11 | 0.14 | 1,031 | 73 | 2.7 | 91.4 | 99.2 | 12 | 0.3 |
| 18 | 3 | 0.04 | 268 | 16 | 4.1 | 87.5 | 99.0 | 1 | 0.0 |
| 19 | 26 | 0.47* | 422 | 23 | 3.9 | 86.5 | 97.0 | 7 | 0.2 |
| 20 | 5 | 0.08 | 374 | 35 | 2.7 | 86.1 | 98.5 | 2 | 0.0 |
| 21 | 1 | 0.03 | 94 | 5 | 2.0 | 95.6 | 98.2 | 1 | 1.0 |
| 22 | 10 | 0.29* | 351 | 31 | 2.4 | 90.3 | 98.3 | 11 | 0.3 |
| X | 39 | 0.26 | 270 | 28 | 3.8 | 89.1 | 99.2 | 5 | 0.2 |
| Y | 6 | 0.23 | 2,319 | 110 | 6.1 | 88.6 | 99.9 | 30 | 0.6 |
| Mean | 12 | 0.12 | 693 | 52 | 3.5 | 88.7 | 98.0 | 10 | 0.3 |
| S.D. | 9 | 0.10 | 680 | 37 | 1.2 | 2.0 | 1.1 | 8 | 0.2 |
| Mean w/o XY | 11 | 0.11 | 638 | 51 | 3.3 | 88.7 | 97.9 | 10 | 0.3 |
| S.D. w/o XY | 7 | 0.10 | 607 | 37 | 0.9 | 2.1 | 1.0 | 7 | 0.2 |
ᵃ Mean values for each chromosome. ᵇ Flower genes.
\* P < 0.05 by Z-test.
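The asterisks in the density column can be checked with a short calculation. The sketch below assumes a one-tailed Z-test of each chromosome's Flower density against the autosomal mean and S.D. (the "w/o XY" rows). The reference distribution and the tail the authors actually used are not stated here, so both are assumptions. The same block also confirms that the per-chromosome counts add up to the 291 Flowers reported.

```python
from statistics import NormalDist

# (Flower number, Flower number per Mb) for each chromosome, from the table.
flowers = {
    "1": (27, 0.12), "2": (10, 0.04), "3": (6, 0.03), "4": (10, 0.05),
    "5": (9, 0.05), "6": (15, 0.09), "7": (17, 0.11), "8": (11, 0.08),
    "9": (12, 0.10), "10": (14, 0.11), "11": (18, 0.14), "12": (12, 0.09),
    "13": (6, 0.06), "14": (6, 0.07), "15": (7, 0.09), "16": (10, 0.13),
    "17": (11, 0.14), "18": (3, 0.04), "19": (26, 0.47), "20": (5, 0.08),
    "21": (1, 0.03), "22": (10, 0.29), "X": (39, 0.26), "Y": (6, 0.23),
}

# Assumed reference: the "Mean w/o XY" and "S.D. w/o XY" rows.
REF_MEAN, REF_SD = 0.11, 0.10

def upper_tail(density, mean=REF_MEAN, sd=REF_SD):
    """Return (z, one-tailed p) for a density above the reference mean."""
    z = (density - mean) / sd
    return z, 1.0 - NormalDist().cdf(z)

total_flowers = sum(count for count, _ in flowers.values())
significant = [chrom for chrom, (_, density) in flowers.items()
               if upper_tail(density)[1] < 0.05]

print(total_flowers)   # total number of Flowers across chromosomes
print(significant)     # chromosomes with p < 0.05 under these assumptions
```

Under these assumptions the check flags chromosomes 19 and 22 only, which matches the asterisks in the table. The X chromosome (0.26 per Mb) falls just short of the 0.05 threshold.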
Comparison of gene density of Flowers with randomly selected regions. Excluding Flowers with a D length ≥ 1 Mb, the number of genes in the remaining 277 Flowers was compared with that of 1,000 randomly selected regions of the human genome.
| | | Gene number in 277 Flowers | Gene number/10 Kb | Protein-coding gene number | Protein-coding gene number/10 Kb |
|---|---|---|---|---|---|
| Flower | | 1,841 | 0.61* | 1,030 | 0.34* |
| Randomly selected region | Mean | 371 | 0.12 | 282 | 0.09 |
| | S.D. | 39 | 0.01 | 30 | 0.01 |
| | Min. | 272 | 0.09 | 204 | 0.07 |
| | Max. | 561 | 0.19 | 439 | 0.15 |
*P = 0, Z-test.
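The size of this effect is easier to see as a Z-score. The sketch below assumes the Z-test compares the Flower gene count with the mean and S.D. of the 1,000 random samples. The helper name is hypothetical and the exact procedure used by the authors is not given here.

```python
from statistics import NormalDist

def z_and_p(observed, null_mean, null_sd):
    """Z-score and one-tailed p-value of an observation against a null sample."""
    z = (observed - null_mean) / null_sd
    return z, 1.0 - NormalDist().cdf(z)

# All genes: 1,841 in Flowers versus 371 +/- 39 in random regions.
z_all, p_all = z_and_p(1841, 371, 39)
# Protein-coding genes: 1,030 in Flowers versus 282 +/- 30 in random regions.
z_pc, p_pc = z_and_p(1030, 282, 30)
# Fold difference in gene density per 10 Kb.
fold_all = 0.61 / 0.12

print(round(z_all, 1), p_all)
print(round(z_pc, 1), p_pc)
print(round(fold_all, 1))
```

Both Z-scores exceed 20, so the tail probability underflows to zero in double precision, which is consistent with the reported "P = 0". The Flower counts also lie well above the maxima observed in the random samples (561 and 439).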
Functions of genes in Flowers. For each GO category, we tested the significance of the frequency of Flower genes relative to the total number of human genes. This table lists the GO categories with P < 10⁻⁴, an observed number of Flower genes ≥ 3, and a ratio of observation to expectation ≥ 3.
| GO ID | Detail | Catalog | Observationᵃ | Obs/Expᵇ | Multigene families on Flowers |
|---|---|---|---|---|---|
| GO:0004556 | Alpha-amylase activity | Function | 3 | 21.1 | Amylase alpha |
| GO:0015020 | Glucuronosyltransferase activity | Function | 15 | 15.0 | UDP glucuronosyltransferase |
| GO:0019864 | IgG binding | Function | 7 | 14.7 | Fc fragment of IgG |
| GO:0016339 | Calcium-dependent cell-cell adhesion | Process | 12 | 10.5 | Protocadherin beta |
| GO:0003823 | Antigen binding | Function | 19 | 10.0 | Immunoglobulin; leukocyte immunoglobulin-like receptor; killer cell immunoglobulin-like receptor |
| GO:0004364 | Glutathione transferase activity | Function | 9 | 9.0 | Glutathione S-transferase |
| GO:0019882 | Antigen processing and presentation | Process | 13 | 8.6 | Major histocompatibility complex, class I; MHC class I polypeptide-related sequence; retinoic acid early transcript; UL16-binding protein; C-type lectin domain family |
| GO:0006805 | Xenobiotic metabolic process | Process | 10 | 8.4 | UDP glucuronosyltransferase; defensin, alpha; aldo-keto reductase family |
| GO:0006952 | Defense response | Process | 17 | 4.9 | Interferon, alpha; leukocyte immunoglobulin-like receptor; pregnancy-specific beta-1-glycoprotein; major histocompatibility complex, class I; SP140 nuclear body protein |
| GO:0032312 | Regulation of ARF GTPase activity | Process | 7 | 4.8 | ArfGAP with GTPase domain; centaurin, gamma-like family |
| GO:0020037 | Heme binding | Function | 24 | 4.4 | Cytochrome P450; nitric oxide synthase; HECT domain and RLD |
| GO:0007565 | Female pregnancy | Process | 10 | 4.3 | Pregnancy-specific beta-1-glycoprotein |
| GO:0005792 | Microsome | Component | 33 | 3.9 | UDP glucuronosyltransferase; cytochrome P450; flavin-containing monooxygenase; hydroxy-delta-5-steroid dehydrogenase |
| GO:0042742 | Defense response to bacterium | Process | 12 | 3.2 | Defensin, alpha; defensin, beta; MHC class I polypeptide-related sequence |
| GO:0009615 | Response to virus | Process | 12 | 3.1 | Defensin, alpha; interferon, alpha; chemokine (C-C motif) ligand; leukocyte immunoglobulin-like receptor |
ᵃ The observed number of Flower genes for a GO category.
ᵇ The ratio of the observed number of Flower genes to the expected number of genes.
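Because the Obs/Exp column is the observed count divided by the expected count, the expected number of Flower genes per category can be recovered as observed / ratio. The sketch below does this for each row and re-applies the two count-based inclusion criteria from the caption. The expected values are not listed in the table and are derived here from rounded ratios, so they are approximate. The variable names are hypothetical.

```python
# (GO ID, observed Flower genes, Obs/Exp ratio), copied from the table.
rows = [
    ("GO:0004556", 3, 21.1), ("GO:0015020", 15, 15.0), ("GO:0019864", 7, 14.7),
    ("GO:0016339", 12, 10.5), ("GO:0003823", 19, 10.0), ("GO:0004364", 9, 9.0),
    ("GO:0019882", 13, 8.6), ("GO:0006805", 10, 8.4), ("GO:0006952", 17, 4.9),
    ("GO:0032312", 7, 4.8), ("GO:0020037", 24, 4.4), ("GO:0007565", 10, 4.3),
    ("GO:0005792", 33, 3.9), ("GO:0042742", 12, 3.2), ("GO:0009615", 12, 3.1),
]

# Expected count implied by each row: observed / (observed / expected).
implied_expected = {go_id: obs / ratio for go_id, obs, ratio in rows}

# Inclusion criteria from the caption that can be checked from the table alone.
passes_filter = [go_id for go_id, obs, ratio in rows if obs >= 3 and ratio >= 3]

for go_id, obs, ratio in rows:
    print(f"{go_id}  observed={obs:2d}  expected~{implied_expected[go_id]:.2f}")
```

For example, the 15 glucuronosyltransferase genes with a ratio of 15.0 imply roughly one such gene expected in Flowers by chance, and the 3 alpha-amylase genes imply well under one.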
Figure 3. Scatter plot of maximum and minimum identity. The X and Y axes show the maximum and minimum percent identity of BLAST hits in a Flower, respectively. Each dot represents one Flower. The 291 Flowers were classified into three groups based on the number of Flower genes: 0–1 (purple), 2–20 (blue), and more than 20 (yellow). The two yellow dots outlined in red are exceptional Flowers with low maximum identity.
Statistics of detected gene conversion events within a Flower. Averages and standard deviations of several statistics for the 157 Flowers and 533 genes that experienced gene conversion events in the human lineage.
| 157 Flowers | No. of events/Flower | Prop. of converted / | Prop. of gene regions in the converted region | No. of genes/Flower | No. of converted genes/Flower |
|---|---|---|---|---|---|
| Average | 21 | 0.25 | 0.49 | 5 | 3 |
| S.D. | 29 | 0.31 | 0.35 | 5 | 3 |

| 533 Genes | No. of events/gene | Prop. of converted/gene |
|---|---|---|
| Average | 6 | 0.21 |
| S.D. | 20 | 0.30 |