Cinthya J Zepeda-Mendoza, Tzitziki Lemus, Omar Yáñez, Delfino García, David Valle-García, Karla F Meza-Sosa, María Gutiérrez-Arcelus, Yamile Márquez-Ortiz, Rocío Domínguez-Vidaña, Claudia Gonzaga-Jauregui, Margarita Flores, Rafael Palacios.
Abstract
BACKGROUND: Identical sequences with a minimal length of about 300 base pairs (bp) have been implicated in the generation of various meiotic/mitotic genomic rearrangements through non-allelic homologous recombination (NAHR) events. Genomic disorders and structural variation, together with gene remodelling processes, have been associated with many of these rearrangements. Based on these observations, we identified all the 100% identical repeats of at least 300 bp in the NCBI version 36.2 human genome reference assembly and integrated them into non-overlapping regions, thus defining the Identical Repeated Backbone (IRB) of the reference human genome.
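The integration of identical repeats into non-overlapping regions can be illustrated with a generic interval merge. This is only a minimal sketch, not the authors' pipeline: the function name `merge_intervals`, the half-open coordinate convention and the example coordinates are all assumptions made for illustration.

```python
MIN_LEN = 300  # minimal repeat length in bp, as stated in the abstract


def merge_intervals(intervals):
    """Merge half-open (start, end) intervals on one chromosome.

    Overlapping or directly adjacent intervals are fused, so the
    result is a sorted list of non-overlapping regions.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coordinates of identical repeat copies on one chromosome.
repeat_copies = [(1000, 1400), (1200, 1700), (5000, 5300), (9000, 9100)]

# Keep only copies of at least MIN_LEN bp, then merge them.
long_enough = [(s, e) for s, e in repeat_copies if e - s >= MIN_LEN]
regions = merge_intervals(long_enough)
print(regions)  # [(1000, 1700), (5000, 5300)]
```

The 100 bp copy is discarded by the length filter, and the two overlapping copies collapse into a single 700 bp region.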
Year: 2010 PMID: 20096123 PMCID: PMC2845111 DOI: 10.1186/1471-2164-11-60
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. General structure of complex ISTs. An example of the ISTs that integrate the IRB is shown. Each line represents an IC and it is drawn according to its position on the IST. The black line at the bottom represents the IST sequence; the remaining colours represent the distinct chromosomes where the ICs that compose this IST are located.
Figure 2. IST densities of the human chromosomes. The total genome was divided into 1 Mb windows and the total number of bp that belonged to ISTs within each window was counted. All the chromosomes are represented in the figure. The numbers in parentheses represent the percentage of the chromosome that pertains to the IRB. Yellow represents an IST density from 1 bp to <1 Kb per Mb; green, from 1 Kb to <10 Kb; purple, from 10 Kb to <100 Kb; and red, from 100 Kb to 1 Mb. Most white spaces represent gaps in the reference human genome.
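The windowed count described in the Figure 2 caption can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: it assumes the IST regions are non-overlapping, half-open (start, end) intervals, takes 1 Kb as 1,000 bp, and uses hypothetical names (`window_density`, `density_class`) and hypothetical coordinates.

```python
WINDOW = 1_000_000  # 1 Mb windows, as in the caption


def window_density(regions, chrom_length, window=WINDOW):
    """Count the bp covered by regions inside each fixed-size window.

    `regions` must be non-overlapping half-open (start, end) intervals.
    A region that crosses a window boundary is split between windows.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for start, end in regions:
        pos = start
        while pos < end:
            w = pos // window
            chunk_end = min(end, (w + 1) * window)
            counts[w] += chunk_end - pos
            pos = chunk_end
    return counts


def density_class(bp):
    """Map a per-window bp count to the colour classes of the caption."""
    if bp == 0:
        return "none"
    if bp < 1_000:
        return "yellow"
    if bp < 10_000:
        return "green"
    if bp < 100_000:
        return "purple"
    return "red"


# Hypothetical IST regions on a 3 Mb chromosome.
ist_regions = [(999_500, 1_000_700), (2_100_000, 2_150_000)]
counts = window_density(ist_regions, chrom_length=3_000_000)
classes = [density_class(c) for c in counts]
print(counts)   # [500, 700, 50000]
print(classes)  # ['yellow', 'yellow', 'purple']
```

The first region straddles the boundary between the first two windows, so its 1,200 bp are split into 500 bp and 700 bp.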
Masked bp in total genome and IRB
| Type of element | Bp in total genome (a) | Bp in IRB (a) |
|---|---|---|
| LINE | 602817717 (43.1) | 15825138 (47.7) |
| SINE | 391097725 (27.9) | 7358371 (22.2) |
| LTR | 253911472 (18.1) | 5707753 (17.2) |
| RNA | 1082672 (0.1) | 39790 (0.1) |
| Satellite | 11217284 (0.8) | 1614833 (4.9) |
| DNA transposons | 94488985 (6.8) | 1237710 (3.7) |
| Low Complexity | 40301995 (2.9) | 962052 (2.9) |
| Unknown | 4683496 (0.3) | 454254 (1.4) |
a = % of masked sequence. The share of masked sequence contributed by each major repeat class, in the reference human genome and in the IRB, is shown in parentheses.
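The percentages in parentheses can be recomputed from the bp counts in the table by dividing each class by the column total. The sketch below assumes that the eight listed classes together make up all of the masked sequence, an assumption that reproduces the tabulated values for the rows checked here; the variable names are illustrative only.

```python
# bp counts copied from the table: (total genome, IRB)
masked_bp = {
    "LINE": (602817717, 15825138),
    "SINE": (391097725, 7358371),
    "LTR": (253911472, 5707753),
    "RNA": (1082672, 39790),
    "Satellite": (11217284, 1614833),
    "DNA transposons": (94488985, 1237710),
    "Low Complexity": (40301995, 962052),
    "Unknown": (4683496, 454254),
}

# Column totals: all masked bp in the genome and in the IRB.
genome_total = sum(genome for genome, _ in masked_bp.values())
irb_total = sum(irb for _, irb in masked_bp.values())


def pct(part, total):
    """Percentage of `total` represented by `part`, to one decimal."""
    return round(100 * part / total, 1)


shares = {
    name: (pct(genome, genome_total), pct(irb, irb_total))
    for name, (genome, irb) in masked_bp.items()
}

print(shares["LINE"])       # (43.1, 47.7)
print(shares["Satellite"])  # (0.8, 4.9)
```

Read this way, satellite sequence accounts for 0.8% of the masked sequence genome-wide but 4.9% within the IRB.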
Figure 3. Genes in the IRB. A) Scheme of the distinct cases found when analyzing the nature of the genes. Green means that all gene data are congruent between the set and Ensembl; blue represents function incongruence; red means that at least one of the gene copies is not annotated in the Ensembl database; yellow means incongruence in size. B) Genes that were totally comprised by the IRB are shown in their respective chromosomes. Each bar represents a gene. The numbers in parentheses represent the sets of genes that are present in each chromosome; numbers that follow the commas are the names of the sets that have copies of their elements in other chromosomes.
In silico hybridization of IRB genes against Venter and Watson genomes
| Note | # | Gene name | Type | Ensembl ID | Size (bp) | Hits: Ref | Hits: Vent | Hits: Wat | Ratio: Vent/Ref | Ratio: Wat/Ref |
|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | AC006328.5 | Novel miRNA | ENSG00000221640 | 82 | 2 | 8 | 2 | 4.0 | 1.0 |
| | 2 | AC116165.7 | Novel miRNA | ENSG00000221000 | 82 | 4 | 17 | 10 | 4.3 | 2.5 |
| | 3 | AC123768.8 | Novel miRNA | ENSG00000221405 | 82 | 2 | 10 | 7 | 5.0 | 3.5 |
| | 4 | hsa-mir-1233 | Known miRNA | ENSG00000221065 | 82 | 2 | 5 | 2 | 2.5 | 1.0 |
| | 5 | AC119751.3 | Novel miRNA | ENSG00000221212 | 83 | 2 | 9 | 2 | 4.5 | 1.0 |
| | 6 | AC135995.7 | Novel miRNA | ENSG00000221095 | 83 | 3 | 3 | 3 | 1.0 | 1.0 |
| | 7 | AC136698.6 | Novel miRNA | ENSG00000221008 | 83 | 3 | 20 | 4 | 6.7 | 1.3 |
| | 8 | SCARNA18 | Novel misc_RNA | ENSG00000212253 | 83 | 3 | 13 | 14 | 4.3 | 4.7 |
| | 9 | hsa-mir-511-2 | Known miRNA | ENSG00000207937 | 87 | 2 | 6 | 2 | 3.0 | 1.0 |
| | 10 | hsa-mir-514-3 | Known miRNA | ENSG00000207866 | 88 | 2 | 5 | 6 | 2.5 | 3.0 |
| | 11 | SNORD103 | Known snoRNA | ENSG00000200154 | 91 | 2 | 17 | 9 | 8.5 | 4.5 |
| | 12 | AC147055.2 | Novel miRNA | ENSG00000212033 | 93 | 2 | 6 | 1 | 3.0 | 0.5 |
| | 13 | AC068704.4 | Novel miRNA | ENSG00000221682 | 95 | 2 | 10 | 1 | 5.0 | 0.5 |
| | 14 | Y_RNA | Novel misc_RNA | ENSG00000206706 | 98 | 4 | 19 | 10 | 4.8 | 2.5 |
| | 15 | U6 | Novel snRNA | ENSG00000201789 | 99 | 2 | 15 | 3 | 7.5 | 1.5 |
| | 16 | hsa-mir-1184 | Known miRNA | ENSG00000221190 | 99 | 3 | 6 | 3 | 2.0 | 1.0 |
| | 17 | Y_RNA | Known misc_RNA | ENSG00000201138 | 100 | 2 | 6 | 7 | 3.0 | 3.5 |
| | 18 | Y_RNA | Novel misc_RNA | ENSG00000199641 | 100 | 9 | 55 | 26 | 6.1 | 2.9 |
| | 19 | AC137056.3 | Novel miRNA | ENSG00000221119 | 102 | 2 | 10 | 6 | 5.0 | 3.0 |
| 1 | 20 | AC019322.8 | Novel misc_RNA | ENSG00000200514 | 103 | 2 | 13 | 18 | 6.5 | 9.0 |
| | 21 | U6 | Novel snRNA | ENSG00000206655 | 103 | 2 | 3 | 1 | 1.5 | 0.5 |
| | 22 | AC068020.7 | Novel miRNA | ENSG00000221027 | 104 | 2 | 13 | 3 | 6.5 | 1.5 |
| | 23 | AL031963.40 | Novel miRNA | ENSG00000221162 | 105 | 2 | 10 | 3 | 5.0 | 1.5 |
| | 24 | U6 | Novel snRNA | ENSG00000212612 | 107 | 3 | 7 | 4 | 2.3 | 1.3 |
| | 25 | U6 | Novel snRNA | ENSG00000212419 | 107 | 4 | 22 | 15 | 5.5 | 3.8 |
| | 26 | U6 | Novel snRNA | ENSG00000206804 | 107 | 2 | 17 | 6 | 8.5 | 3.0 |
| | 27 | U6 | Novel snRNA | ENSG00000206972 | 107 | 2 | 8 | 8 | 4.0 | 4.0 |
| | 28 | U6 | Novel snRNA | ENSG00000200493 | 107 | 3 | 14 | 6 | 4.7 | 2.0 |
| | 29 | BX842679.19 | Novel rRNA | ENSG00000191555 | 108 | 2 | 4 | 8 | 2.0 | 4.0 |
| | 30 | AL627230.15 | Novel misc_RNA | ENSG00000199432 | 110 | 4 | 22 | 6 | 5.5 | 1.5 |
| | 31 | 5S_rRNA | Novel rRNA | ENSG00000212154 | 112 | 2 | 2 | 4 | 1.0 | 2.0 |
| | 32 | 5S_rRNA | Novel rRNA | ENSG00000212173 | 113 | 2 | 9 | 2 | 4.5 | 1.0 |
| 2 | 33 | 5S_rRNA | Novel rRNA | ENSG00000206584 | 116 | 2 | 4 | 0 | 2.0 | 0.0 |
| | 34 | 5S_rRNA | Novel rRNA | ENSG00000200336 | 118 | 2 | 11 | 5 | 5.5 | 2.5 |
| 1 | 35 | 5S_rRNA | Novel rRNA | ENSG00000199270 | 119 | 16 | 216 | 162 | 13.5 | 10.1 |
| 1 | 36 | 5S_rRNA | Novel rRNA | ENSG00000201925 | 119 | 16 | 216 | 162 | 13.5 | 10.1 |
| | 37 | SNORA11D | Known snoRNA | ENSG00000221475 | 128 | 2 | 5 | 3 | 2.5 | 1.5 |
| | 38 | AC019322.8 | Novel snoRNA | ENSG00000206793 | 133 | 2 | 4 | 2 | 2.0 | 1.0 |
| | 39 | hsa-mir-1302-2 | Known miRNA | ENSG00000221661 | 138 | 4 | 17 | 10 | 4.3 | 2.5 |
| 2 | 40 | AC006983.4 | Novel snoRNA | ENSG00000207143 | 139 | 2 | 6 | 0 | 3.0 | 0.0 |
| | 41 | SCARNA17 | Novel misc_RNA | ENSG00000212286 | 143 | 2 | 8 | 2 | 4.0 | 1.0 |
| 2 | 42 | U1 | Novel snRNA | ENSG00000207519 | 154 | 2 | 1 | 0 | 0.5 | 0.0 |
| | 43 | U1 | Novel snRNA | ENSG00000206945 | 160 | 2 | 6 | 1 | 3.0 | 0.5 |
| | 44 | U1 | Novel snRNA | ENSG00000202064 | 161 | 2 | 9 | 4 | 4.5 | 2.0 |
| | 45 | U1 | Novel snRNA | ENSG00000207273 | 162 | 2 | 9 | 3 | 4.5 | 1.5 |
| 2 | 46 | U1 | Novel snRNA | ENSG00000201183 | 162 | 2 | 3 | 0 | 1.5 | 0.0 |
| | 47 | U1 | Known snRNA | ENSG00000207389 | 164 | 7 | 62 | 15 | 8.9 | 2.1 |
| | 48 | U1 | Novel snRNA | ENSG00000207226 | 164 | 2 | 5 | 1 | 2.5 | 0.5 |
| 2 | 49 | U1 | Novel snRNA | ENSG00000206585 | 164 | 2 | 3 | 0 | 1.5 | 0.0 |
| | 50 | U1 | Known snRNA | ENSG00000206588 | 164 | 7 | 62 | 15 | 8.9 | 2.1 |
| 2 | 51 | U1 | Novel snRNA | ENSG00000206828 | 164 | 2 | 0 | 3 | 0.0 | 1.5 |
| | 52 | U1 | Novel snRNA | ENSG00000201105 | 167 | 2 | 6 | 4 | 3.0 | 2.0 |
52 X non-coding RNA genes within the IRB were compared against the raw sequence reads of the Venter and Watson genomes.
1 - Genes whose copy number is higher when compared to the NCBI assembly.
2 - Genes that had no hits in the Venter or Watson genomes.
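The Vent/Ref and Wat/Ref columns are the Venter and Watson hit numbers divided by the reference hit number. A minimal sketch of that arithmetic follows. The half-up rounding mode is an assumption inferred from the tabulated values (for example, 17/4 = 4.25 is shown as 4.3), and the function name `hit_ratio` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def hit_ratio(hits, ref_hits):
    """Ratio of hits to reference hits, rounded half-up to one decimal.

    Decimal is used because Python's built-in round() rounds exact
    ties to even (round(4.25, 1) gives 4.2, not the tabulated 4.3).
    """
    ratio = Decimal(hits) / Decimal(ref_hits)
    return ratio.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (gene name, Ref hits, Venter hits, Watson hits), copied from the table.
rows = [
    ("AC006328.5", 2, 8, 2),
    ("AC116165.7", 4, 17, 10),
    ("5S_rRNA (ENSG00000199270)", 16, 216, 162),
]

for name, ref, vent, wat in rows:
    print(name, hit_ratio(vent, ref), hit_ratio(wat, ref))
# AC006328.5 4.0 1.0
# AC116165.7 4.3 2.5
# 5S_rRNA (ENSG00000199270) 13.5 10.1
```

A ratio of 0.0 corresponds to the genes of footnote 2, which had no hits in one of the two individual genomes.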