Natalia Volfovsky, Taras K Oleksyk, Kristine C Cruz, Ann L Truelove, Robert M Stephens, Michael W Smith.
Abstract
BACKGROUND: Understanding the structure and function of the human genome requires knowledge of the genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in the differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated the distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes.
Year: 2009 PMID: 19171065 PMCID: PMC2654908 DOI: 10.1186/1471-2164-10-51
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Indels Observed in the Human and Chimpanzee Chromosome 22 Comparison
| Indel class | Total length (bp) | (% of bases compared) | Number of indels |
| --- | --- | --- | --- |
| SINE/Alu | 355,124 | (1.07) | 1593 |
| LINE/L1 | 138,880 | (0.4) | 247 |
| SINE/Mir | 18,767 | (0.05) | 69 |
| LTR/ERV1 | 60,003 | (0.18) | 232 |
| Other | 67,359 | (0.2) | 136 |
| Insertion | 34,868 | (0.11) | 746 |
| Deletion | 47,793 | (0.14) | 683 |
A total of 6,278 indels of size ≥ 10 bp were observed in the comparison of human and chimpanzee chromosomes 22. Most of these were either in Short Tandem Repeats (STRs) or in Known Repetitive DNAs, of which the major types are shown. The remaining indels (0.25% of the bases compared) were classified as insertions or deletions relative to the human sequence. Another 0.25% came from indels of 2–9 bp (19,932 indels, totalling 76,486 bp). In addition, 346,771 Single Nucleotide Differences were seen in the UCSC alignment, totalling 1.0% of the bases compared.
Figure 1. Classification of Indels into Core Types, Based on the Flanking Sequence. The indels are classified into 3 core types based on their similarity to the sequences in the 5 Kb flanking regions. The unique type is formed by the indels with no similarity to the flanking regions. The indels with at least one exact copy of the indel sequence in the flanking regions define the exact type. The approximate type includes indels with only partial copies (sub-repeats of the indel sequence) or complex copies (combinations of indel sub-repeats) in the flanking regions. Sub-repeats (length ≥ 10 bp) are shown in frames; colors distinguish the core types: unique (blue), exact (red) and approximate (green).
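The three core types described in the Figure 1 legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_indel`, the plain substring matching, and the choice to ignore reverse-complement and mismatch-tolerant copies are all simplifying assumptions made here for illustration. Only the 10 bp minimum sub-repeat length comes from the legend.

```python
MIN_SUBREPEAT = 10  # minimum sub-repeat length in bp, taken from the Figure 1 legend


def classify_indel(indel: str, left_flank: str, right_flank: str,
                   min_len: int = MIN_SUBREPEAT) -> str:
    """Return 'exact', 'approximate' or 'unique' for one indel.

    Hypothetical helper: uses exact substring search only (an assumption),
    so divergent or reverse-complement copies are not detected.
    """
    indel = indel.upper()
    flanks = (left_flank.upper(), right_flank.upper())

    # Exact: the whole indel sequence occurs at least once in a flank.
    if any(indel in flank for flank in flanks):
        return "exact"

    # Approximate: some window of the indel (min_len bp) occurs in a flank.
    for start in range(len(indel) - min_len + 1):
        window = indel[start:start + min_len]
        if any(window in flank for flank in flanks):
            return "approximate"

    # Unique: no similarity to either flanking region.
    return "unique"
```

Each flank is searched separately so that a match cannot span the junction between the two flanking regions. In the study the flanks are 5 Kb long; the sketch works with flanks of any length.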
Figure 2. Distribution of Indels Among Different Genome and Core Indel Classes. Frequencies of the three core indel classes (approximate, exact, and unique) are contrasted in the observed and resampled datasets (orange bars). There is an excess of observed approximate and exact indels, and a shortage of unique indels, compared to the expected values for chromosome 22 (LR χ², d.f. = 2, χ² = 916, p < 0.0001). Colours within the bars representing observed data indicate the relative frequency of the three genome classes (chromosome-multiple, chromosome-unique, and genome-unique). The distribution of core indel types among the genome classes is not random, with the majority represented by genome-unique indels (LR χ², d.f. = 4, χ² = 23.28, p = 0.0001; Table S2.1A, see Additional file 1).
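The likelihood-ratio (LR) χ² statistic quoted in the Figure 2 legend can be sketched as a goodness-of-fit computation, G = 2 Σ O ln(O/E). The legend does not say whether the test was set up as goodness-of-fit against resampled expectations or as a contingency table, so this layout is an assumption. The function names are hypothetical and the counts in the usage lines are invented toy numbers, not data from the study.

```python
import math


def g_statistic(observed, expected):
    """Likelihood-ratio chi-square: G = 2 * sum(O * ln(O / E)).

    Classes with an observed count of zero contribute nothing to the sum.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def p_value_df2(g):
    """Upper-tail p-value for 2 degrees of freedom only.

    With d.f. = 2 the chi-square survival function is exactly exp(-x / 2).
    """
    return math.exp(-g / 2.0)


# Toy counts for three classes (approximate, exact, unique); NOT study data.
toy_observed = [60, 30, 10]
toy_expected = [50, 30, 20]
g = g_statistic(toy_observed, toy_expected)
p = p_value_df2(g)
```

With three classes the test has 2 degrees of freedom, which matches the d.f. reported for the first test in the legend. Other degrees of freedom, such as the d.f. = 4 test, would need a general χ² survival function from a statistics library.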
Figure 3. Distribution of Indels Classified by Their Location Relative to Gene Elements. Indels were distributed unequally across the genome, with most of them present within the introns and the intergenic regions (lower panel). There were significantly fewer observed indels within the functional elements group than expected (LR χ², d.f. = 4, χ² = 27.63, p < 0.0001). The chart in the upper panel shows the distribution of indels within the functional element category, classified further by specific functional region. Promoters, coding regions and splice sites contain far fewer observed indels than expected (LR χ², d.f. = 4, χ² = 46.82, p < 0.0001).
Figure 4. Distribution of Indel Length Among the Three Core Classes. (A) Approximate indels are the longest, followed by exact and then unique indels (Table S2.1A, p < 0.0001). (B) Approximate and unique indels are shorter than expected (Table S1.1A, p < 0.0001; see Additional file 1). The distribution of exact indels, both here and in Fig. S3, appears jagged because of the lower sample size (n = 168) in this class compared to the other two (approximate and unique).
Laboratory results of the insertions/deletions that are predicted to have an effect on coding regions of genes
| Indel size (bp) | Predicted effect on protein | Assembly confirmed | Inferred ancestral state |
| --- | --- | --- | --- |
| 24 | truncation from 1020 to 753 aas | yes | unknown |
| 21 | 7 aa insert | unknown | - |
| 68 | truncation from 125 to 113 aas | yes | Del |
| 65 | truncated from 717 to 68 aas | yes | Del |
| 515 | truncation from 1960 to 1893 aas | unknown | - |
| 74 | truncates from 1235 to 1220 aas | yes | Ins |
| 36 | undetermined | yes | unknown |
| 250 | truncates from 849 to 734 aas | different ins | unknown |
| 18 | truncated from 1020 to 252 aas | unknown | - |
| 13 | substitution in protein | unknown | - |
| 10–148 (various genes*) | splice site mismatch with no impact | | |
Using GENSCAN, the 23 indels found in coding exons or in splice site regions of the human RefSeq genes were analyzed for their impact on these genes. Of the 23, 10 affect genes through truncation or insertion of amino acids, or through substitution within a protein. The other 13 indels had no effect on the proteins generated from genes DGCR8, ZDHHC8, BRD1, SELO, SBF1, CHKB, PCQAP, BCR, PRODH, RUTBC3, CYP2D7P1, and ACSIN2. To confirm the presence of an insertion or a deletion, each locus and its surrounding regions were sequenced in each of the species. Assembly confirmation depends on whether the indel is completely present (all of the bases) in the species predicted to carry it. For gene FLJ44385, the indel was present in its entirety. The assembly was also confirmed for genes SMC1L2 and FLJ41993, with all of the bases of the indel present; however, additional sequence was also detected. The inferred ancestral state of each indel is based on whether a species has the insertion or deletion at that locus and on the species' position on the phylogenetic tree.
* DGCR8, ACSIN2, BCR, BRD1, CHKB, CYP2D7P1, DGCR8, PCQAP, PRODH, RUTBC3, SBF1, SELO, and ZDHHC8
**Indel verified and additional sequence is present
Figure 5. Laboratory results of the insertions/deletions that are predicted to have an effect on coding regions of genes. A total of 23 indels were found in coding exons or in splice site regions of the human RefSeq genes. The inferred ancestral state of each indel was based on whether a species has the insertion or deletion at that locus and on the species' position on the phylogenetic tree.
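The ancestral-state inference described above can be sketched as a simple outgroup comparison. This is an illustration under stated assumptions, not the authors' procedure: the function name `infer_event`, the use of unanimous outgroup agreement as the decision rule, and the outgroup species names in the usage lines are all hypothetical. The labels "Ins", "Del" and "unknown" follow the table.

```python
from typing import Mapping, Optional


def infer_event(human_has_seq: bool, chimp_has_seq: bool,
                outgroups: Mapping[str, Optional[bool]]) -> str:
    """Return 'Ins', 'Del' or 'unknown' for one human/chimpanzee indel.

    Each outgroup value is True (sequence present), False (absent) or
    None (not sequenced or not informative). Hypothetical rule: the
    ancestral state is called only when all informative outgroups agree.
    """
    # If human and chimpanzee agree, there is no indel to polarize.
    if human_has_seq == chimp_has_seq:
        return "unknown"

    calls = {state for state in outgroups.values() if state is not None}
    if len(calls) != 1:
        return "unknown"  # no informative outgroup, or outgroups conflict

    ancestral_present = calls.pop()
    # Sequence present in the ancestor: one lineage lost it (deletion).
    # Sequence absent in the ancestor: one lineage gained it (insertion).
    return "Del" if ancestral_present else "Ins"


# Hypothetical outgroup names, for illustration only.
example = infer_event(True, False, {"gorilla": True, "orangutan": True})
```

In this sketch the event is attributed to whichever of the two species differs from the outgroups. A fuller treatment would place each species on the phylogenetic tree and apply parsimony across all branches.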