Kamal Rawal, Sangey Dorji, Amit Kumar, Anwesha Ganguly, Ankit Singh Grewal.
Abstract
The recently published gorilla genome offers an opportunity to study human evolution through a variety of approaches. Mobile genetic elements (MGEs) insert non-randomly in the genome through mechanisms such as retrotransposition and may cause gene inactivation, transduction, regulation of gene expression and genome expansion. Here we report that more than 36% of the gorilla genome is occupied by MGEs, including LTR elements and non-LTR elements such as Alus and L1s. Other types of MGEs, such as MIRs, retrovirus-like elements (ERVs) and DNA transposons, were also found using RepeatMasker and the ELAN pipeline. The distribution is similar to that of the human and Macaca genomes. Using DNA Scanner, we also scanned preinsertion loci for a number of properties such as DNA denaturation, energy measures, potential for protein interactions and sequence-based features. We also predicted preinsertion loci with > 70% accuracy using a machine learning tool called insertion site finder (ISF), which is based on support vector machines.
Keywords: Alu; L1; LINEs; SINEs; mobile genetic elements; physiochemical properties; primates; truncation points
Year: 2013 PMID: 24195013 PMCID: PMC3812790 DOI: 10.4161/mge.25675
Source DB: PubMed Journal: Mob Genet Elements ISSN: 2159-2543
Table 1. Summary of transposable elements in gorilla genome
| Chromosome no. | Alu | MIRs | LINE1 | LINE2 | L3/CR1 | ERVL | ERVL-MaLRs | ERV_classI | ERV_classII | hAT-Charlie | TcMar-Tigger | Unclassified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 91871 | 42291 | 41364 | 26891 | 3063 | 6697 | 15574 | 7983 | 634 | 12946 | 6139 | 520 |
| 2a | 38096 | 14943 | 22798 | 9258 | 1254 | 3280 | 8357 | 3955 | 226 | 7085 | 3040 | 186 |
| 2b | 39589 | 15791 | 26554 | 10686 | 1365 | 4257 | 10258 | 4611 | 275 | 8033 | 3654 | 232 |
| 3 | 62483 | 29994 | 41408 | 18504 | 2248 | 6694 | 15493 | 7207 | 514 | 14484 | 5416 | 373 |
| 4 | 51541 | 22609 | 38952 | 16426 | 1736 | 7357 | 16305 | 7961 | 599 | 9508 | 5568 | 293 |
| 5 | 70191 | 27423 | 31568 | 15842 | 1781 | 4479 | 11247 | 5062 | 356 | 10542 | 4175 | 338 |
| 6 | 54639 | 20082 | 34897 | 14374 | 1860 | 5504 | 12690 | 6361 | 452 | 10123 | 4806 | 323 |
| 7 | 61046 | 17277 | 31824 | 11823 | 1594 | 4680 | 11114 | 6226 | 414 | 9072 | 4198 | 589 |
| 8 | 45944 | 20456 | 29423 | 13493 | 1541 | 4846 | 11889 | 5737 | 435 | 7747 | 3919 | 242 |
| 9 | 42510 | 20001 | 24463 | 11053 | 1359 | 3539 | 8655 | 4025 | 303 | 7403 | 2904 | 225 |
| 10 | 51595 | 19025 | 27478 | 10958 | 1365 | 3819 | 10141 | 4913 | 340 | 8233 | 3591 | 258 |
| 11 | 43907 | 25484 | 25990 | 14785 | 1755 | 3915 | 9339 | 4390 | 334 | 7388 | 3282 | 265 |
| 12 | 53875 | 21217 | 25459 | 13997 | 1518 | 4089 | 10637 | 4804 | 331 | 8414 | 3459 | 266 |
| 13 | 27254 | 9549 | 20480 | 7705 | 926 | 3359 | 8027 | 4338 | 225 | 4959 | 2863 | 147 |
| 14 | 33196 | 12630 | 17438 | 7980 | 953 | 2928 | 7151 | 3302 | 231 | 5199 | 2318 | 185 |
| 15 | 33979 | 12510 | 16597 | 7128 | 980 | 2237 | 5182 | 2505 | 172 | 5494 | 2201 | 137 |
| 16 | 44337 | 15098 | 13939 | 7880 | 672 | 2332 | 6541 | 2677 | 197 | 6227 | 1531 | 140 |
| 17 | 31725 | 9704 | 18919 | 6994 | 870 | 3340 | 8203 | 3540 | 250 | 5521 | 2563 | 183 |
| 18 | 23161 | 8703 | 15599 | 5954 | 942 | 2465 | 5825 | 2909 | 142 | 4306 | 2097 | 129 |
| 19 | 47025 | 6755 | 8552 | 4802 | 153 | 1540 | 2963 | 2744 | 466 | 2526 | 881 | 168 |
| 20 | 25569 | 11830 | 11180 | 6897 | 511 | 2253 | 5114 | 1802 | 82 | 5287 | 1369 | 154 |
| 21 | 11510 | 2974 | 6822 | 2113 | 228 | 1396 | 3789 | 1586 | 72 | 1875 | 859 | 36 |
| 22 | 20426 | 7827 | 5319 | 4023 | 343 | 837 | 2214 | 1044 | 101 | 1797 | 687 | 76 |
| X | 42894 | 17909 | 45679 | 12042 | 1651 | 5092 | 12193 | 6750 | 459 | 9041 | 3650 | 233 |
Figure 1 gives a pictorial representation of these counts and shows that the various MGEs are distributed unevenly across the chromosomes.

Figure 1. Chromosome-wise distribution of MGEs in the gorilla genome.
Table 2. Chromosome-wise summary of the SINE elements in the gorilla genome
| Chromosome no. | Total length (bp) | SINEs: length occupied (bp) | SINEs: no. of elements | SINEs: average length of element (bp) |
|---|---|---|---|---|
| 1 | 229507203 | 28090071 | 134557 | 208.75 |
| 2a | 1113551968 | 11251200 | 53188 | 211.53 |
| 2b | 131632457 | 11899664 | 55606 | 213.99 |
| 3 | 199944510 | 19414804 | 92794 | 209.22 |
| 4 | 201139530 | 15782085 | 74411 | 212.09 |
| 5 | 165930986 | 20810185 | 97878 | 212.61 |
| 6 | 171703152 | 16075800 | 74977 | 214.40 |
| 7 | 158137892 | 16975146 | 78557 | 216.08 |
| 8 | 145327772 | 14118592 | 66630 | 211.89 |
| 9 | 121947112 | 13107115 | 62655 | 209.19 |
| 10 | 147764049 | 15126370 | 70834 | 213.54 |
| 11 | 133470886 | 14310615 | 69572 | 205.69 |
| 12 | 133360231 | 15996265 | 75292 | 212.45 |
| 13 | 97499607 | 7992543 | 36939 | 216.37 |
| 14 | 88974843 | 9721051 | 45987 | 211.38 |
| 15 | 82026568 | 9885170 | 46623 | 212.02 |
| 16 | 80971650 | 12725256 | 59528 | 213.76 |
| 17 | 94257108 | 9016411 | 41540 | 217.05 |
| 18 | 78787515 | 6820885 | 31990 | 213.21 |
| 19 | 56181278 | 11866327 | 53843 | 220.38 |
| 20 | 62603092 | 7807243 | 37473 | 208.34 |
| 21 | 35451371 | 3201567 | 14543 | 220.14 |
| 22 | 35671106 | 5969251 | 28278 | 211.09 |
| X | 154045127 | 13000221 | 60999 | 213.12 |
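The three SINE columns of Table 2 can be cross-checked with simple arithmetic. The sketch below assumes they follow the layout of Table 3 (length occupied, number of elements, average element length); the function name `sine_stats` and the choice of four chromosomes are illustrative only and do not come from the paper.

```python
# Rows copied from Table 2:
# chromosome -> (total length bp, SINE bp, SINE count, reported average length)
TABLE2_ROWS = {
    "1": (229507203, 28090071, 134557, 208.75),
    "19": (56181278, 11866327, 53843, 220.38),
    "21": (35451371, 3201567, 14543, 220.14),
    "X": (154045127, 13000221, 60999, 213.12),
}


def sine_stats(total_bp, sine_bp, n_elements):
    """Return (mean element length in bp, elements per megabase)."""
    mean_len = sine_bp / n_elements
    per_mb = n_elements / (total_bp / 1e6)
    return mean_len, per_mb


for chrom, (total_bp, sine_bp, n, reported) in TABLE2_ROWS.items():
    mean_len, per_mb = sine_stats(total_bp, sine_bp, n)
    print(f"chr{chrom}: mean {mean_len:.2f} bp (table: {reported}), "
          f"{per_mb:.0f} SINEs per Mb")
```

The recomputed means agree with the tabulated averages to within rounding, which supports the assumed column order. Normalising by chromosome length also shows that chromosome 19 carries considerably more SINEs per megabase than chromosome 1.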
Table 3. Chromosome-wise summary of the LINE elements in the gorilla genome
| Chromosome no. | Total length (bp) | Length occupied (bp) | No. of elements | Average length of element (bp) |
|---|---|---|---|---|
| 1 | 229507203 | 35719713 | 71763 | 497.74 |
| 2a | 1113551968 | 17825260 | 33525 | 531.70 |
| 2b | 131632457 | 21599024 | 38868 | 555.70 |
| 3 | 199944510 | 33940485 | 62566 | 542.47 |
| 4 | 201139530 | 33070071 | 57467 | 575.46 |
| 5 | 165930986 | 24894380 | 49469 | 503.23 |
| 6 | 171703152 | 28474055 | 51529 | 552.58 |
| 7 | 158137892 | 24387235 | 45617 | 534.60 |
| 8 | 145327772 | 23886020 | 44708 | 534.26 |
| 9 | 121947112 | 18867313 | 37128 | 508.16 |
| 10 | 147764049 | 20895555 | 40065 | 521.54 |
| 11 | 133470886 | 21911049 | 42747 | 512.57 |
| 12 | 133360231 | 20811356 | 41225 | 504.82 |
| 13 | 97499607 | 15886349 | 29288 | 542.41 |
| 14 | 88974843 | 13905388 | 26549 | 523.76 |
| 15 | 82026568 | 12722443 | 24879 | 511.37 |
| 16 | 80971650 | 9370538 | 22615 | 414.35 |
| 17 | 94257108 | 14593897 | 26941 | 541.69 |
| 18 | 78787515 | 12068676 | 22639 | 533.09 |
| 19 | 56181278 | 5520135 | 13525 | 408.14 |
| 20 | 62603092 | 8267354 | 18683 | 442.50 |
| 21 | 35451371 | 4834686 | 9215 | 524.65 |
| 22 | 35671106 | 4145031 | 9728 | 426.09 |
| X | 154045127 | 39142498 | 59596 | 656.79 |
Table 4. Chromosome-wise summary of the LTR elements in the gorilla genome
| Chromosome no. | Total length (bp) | LTR elements: length occupied (bp) | LTR elements: no. of elements | LTR elements: average length of element (bp) |
|---|---|---|---|---|
| 1 | 229507203 | 16331504 | 31734 | 514.63 |
| 2a | 1113551968 | 7938316 | 16285 | 487.46 |
| 2b | 131632457 | 9873134 | 19928 | 495.44 |
| 3 | 199944510 | 15861956 | 30683 | 516.96 |
| 4 | 201139530 | 17614113 | 32882 | 535.67 |
| 5 | 165930986 | 10618919 | 21725 | 488.78 |
| 6 | 171703152 | 13456407 | 25647 | 524.67 |
| 7 | 158137892 | 11400774 | 22995 | 495.79 |
| 8 | 145327772 | 11703956 | 23430 | 499.52 |
| 9 | 121947112 | 8137837 | 16944 | 480.27 |
| 10 | 147764049 | 9403542 | 19649 | 478.57 |
| 11 | 133470886 | 9697311 | 18429 | 526.19 |
| 12 | 133360231 | 10222587 | 20364 | 501.99 |
| 13 | 97499607 | 8318528 | 16279 | 510.99 |
| 14 | 88974843 | 7087817 | 13954 | 507.94 |
| 15 | 82026568 | 4878339 | 10401 | 469.02 |
| 16 | 80971650 | 4955225 | 11971 | 413.93 |
| 17 | 94257108 | 7798515 | 15689 | 497.06 |
| 18 | 78787515 | 5736835 | 11595 | 494.76 |
| 19 | 56181278 | 3859077 | 7768 | 496.79 |
| 20 | 62603092 | 3862835 | 9450 | 408.76 |
| 21 | 35451371 | 3322579 | 6960 | 477.38 |
| 22 | 35671106 | 1814702 | 4256 | 426.38 |
| X | 154045127 | 14856258 | 24978 | 594.77 |
Table 5. Summary of MGEs in gorilla genome
| TE superfamily | Counts (copy no.) | Length (bp) | % of sequence covered |
|---|---|---|---|
| Non-LTR | | | |
| Alu | 1048363 | 248964770 | 8.53 |
| MIRs | 412082 | 61542196 | 2.11 |
| LINE1 | 582702 | 387476383 | 13.28 |
| LINE2 | 261608 | 71320928 | 2.44 |
| L3/CR1 | 42953 | 6666746 | 0.22 |
| LTR elements | | | |
| ERVL | 90935 | 44219203 | 1.51 |
| ERVL-MaLRs | 218901 | 93556837 | 3.20 |
| ERV_classI | 106432 | 71144203 | 2.43 |
| ERV_classII | 7610 | 7508372 | 0.26 |
| DNA elements | | | |
| hAT-Charlie | 173210 | 34481719 | 1.18 |
| TcMar-Tigger | 75170 | 30557703 | 1.04 |
| Unclassified | 5698 | 2800558 | 0.09 |
| Total | 3025664 | 1060239618 | |
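The Total row and the "> 36%" figure quoted in the abstract can both be recovered from the per-family rows. The following is a minimal sketch that takes the twelve data rows of Table 5 in table order; the variable names are illustrative and not part of the original analysis.

```python
# (copy number, length in bp, % of sequence covered), in the row order of Table 5
TABLE5_ROWS = [
    (1048363, 248964770, 8.53),
    (412082, 61542196, 2.11),
    (582702, 387476383, 13.28),
    (261608, 71320928, 2.44),
    (42953, 6666746, 0.22),
    (90935, 44219203, 1.51),
    (218901, 93556837, 3.20),
    (106432, 71144203, 2.43),
    (7610, 7508372, 0.26),
    (173210, 34481719, 1.18),
    (75170, 30557703, 1.04),
    (5698, 2800558, 0.09),
]

total_copies = sum(copies for copies, _, _ in TABLE5_ROWS)
total_bp = sum(bp for _, bp, _ in TABLE5_ROWS)
total_percent = round(sum(pct for _, _, pct in TABLE5_ROWS), 2)

print(f"copies: {total_copies}, bp: {total_bp}, coverage: {total_percent}%")
```

The summed copy number and length reproduce the Total row of the table, and the percentages add up to a little over 36%, consistent with the abstract.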

Figure 2. Comparison of MGE content between chromosomes 1 and X of the gorilla genome.
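The comparison summarised in Figure 2 can be approximated from the occupied lengths in Tables 2 to 4. The sketch below assumes that the first numeric column after "Total length (bp)" in Tables 2 and 4 is the length occupied, as it is in Table 3; the helper `coverage_percent` is a hypothetical name.

```python
# Chromosome lengths and occupied lengths (bp) copied from Tables 2, 3 and 4
CHROM_BP = {"1": 229507203, "X": 154045127}
OCCUPIED_BP = {
    "SINE": {"1": 28090071, "X": 13000221},
    "LINE": {"1": 35719713, "X": 39142498},
    "LTR": {"1": 16331504, "X": 14856258},
}


def coverage_percent(occupied_bp, chrom_bp):
    """Percentage of a chromosome occupied by one element class."""
    return 100.0 * occupied_bp / chrom_bp


coverage = {
    family: {c: coverage_percent(bp[c], CHROM_BP[c]) for c in CHROM_BP}
    for family, bp in OCCUPIED_BP.items()
}

for family, by_chrom in coverage.items():
    print(f"{family}: chr1 {by_chrom['1']:.2f}%  chrX {by_chrom['X']:.2f}%")
```

Under these assumptions, LINEs cover roughly a quarter of chromosome X against about 16% of chromosome 1, whereas SINE coverage is lower on X than on chromosome 1.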
Table 6. DNA SCANNER output of gorilla chromosome 22 showing position and parameter values of A-rule
| Position | Parameter value |
|---|---|
| 0 | 0.275602587 |
| 1 | 0.275720165 |
| 2 | 0.277954145 |
| 3 | 0.279188713 |
| 4 | 0.278306878 |
| 5 | 0.279835391 |
| 6 | 0.282304527 |
| 7 | 0.282716049 |
| 8 | 0.28265726 |
| 9 | 0.282892416 |
| 10 | 0.282951205 |
| 11 | 0.28265726 |
| 12 | 0.283656673 |
| 13 | 0.283891828 |
| 14 | 0.28212816 |
| 15 | 0.282480894 |
| 16 | 0.28377425 |
| 17 | 0.284538507 |
| 18 | 0.284009406 |
| 19 | 0.283127572 |
| 20 | 0.282951205 |
| 21 | 0.284068195 |
| 22 | 0.287360376 |
| 23 | 0.289065256 |
| 24 | 0.288359788 |
| 25 | 0.289535567 |
| 26 | 0.291534392 |
| 27 | 0.292004703 |
| 28 | 0.291828336 |
| 29 | 0.292651382 |
| 30 | 0.29159318 |
| 31 | 0.289300412 |
| 32 | 0.288594944 |
| 33 | 0.28712522 |
| 34 | 0.285008818 |
| 35 | 0.283715461 |
| 36 | 0.281951793 |
| 37 | 0.281128748 |
| 38 | 0.281422693 |
| 39 | 0.281599059 |
| 40 | 0.281834215 |
| 41 | 0.282774838 |
| 42 | 0.28547913 |
| 43 | 0.288536155 |
| 44 | 0.290299824 |
| 45 | 0.292357437 |
| 46 | 0.294532628 |
| 47 | 0.293592005 |
| 48 | 0.292239859 |
| 49 | 0.293004115 |
| 50 | 0.295238095 |
| 51 | 0.295884774 |
| 52 | 0.293180482 |
| 53 | 0.291651969 |
| 54 | 0.293415638 |
| 55 | 0.296061141 |
| 56 | 0.298353909 |
| 57 | 0.300470312 |
| 58 | 0.301528513 |
| 59 | 0.300058789 |
| 60 | 0.29888301 |
| 61 | 0.300587889 |
| 62 | 0.303821282 |
| 63 | 0.304644327 |
| 64 | 0.302880658 |
| 65 | 0.303527337 |
| 66 | 0.306819518 |
| 67 | 0.312698413 |
| 68 | 0.321105232 |
| 69 | 0.328747795 |
| 70 | 0.335390947 |
| 71 | 0.340270429 |
| 72 | 0.345679012 |
| 73 | 0.353497942 |
| 74 | 0.362081129 |
| 75 | 0.371369782 |
| 76 | 0.378542034 |
| 77 | 0.385067607 |
| 78 | 0.392945326 |
| 79 | 0.398059965 |
| 80 | 0.400352734 |
| 81 | 0.400764256 |
| 82 | 0.396413874 |
| 83 | 0.38489124 |
| 84 | 0.367430923 |
| 85 | 0.351440329 |
| 86 | 0.333803645 |
| 87 | 0.310229277 |
| 88 | 0.287536743 |
| 89 | 0.268077601 |
| 90 | 0.254323021 |
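A profile such as Table 6 is easiest to interpret by locating its extreme value. The sketch below copies positions 76 to 86 from the table and reports where the A-rule parameter peaks; the function name `find_peak` is an assumption for illustration and is not part of DNA Scanner.

```python
# Excerpt of Table 6: position -> A-rule parameter value
A_RULE_EXCERPT = {
    76: 0.378542034,
    77: 0.385067607,
    78: 0.392945326,
    79: 0.398059965,
    80: 0.400352734,
    81: 0.400764256,
    82: 0.396413874,
    83: 0.38489124,
    84: 0.367430923,
    85: 0.351440329,
    86: 0.333803645,
}


def find_peak(profile):
    """Return (position, value) of the largest value in a position -> value map."""
    position = max(profile, key=profile.get)
    return position, profile[position]


peak_position, peak_value = find_peak(A_RULE_EXCERPT)
print(f"peak at position {peak_position}: {peak_value:.6f}")
```

The same function applies unchanged to the full 91-row profile, where the values rise steadily from about 0.28 to this peak and then fall away sharply.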

Figure 3. Various signals upstream of the Alu insertion sites in chromosome 22. The y-axis gives the value of the property and the x-axis gives the position relative to the insertion site.
Table 7. Performance of ISF in gorilla chromosome 21 and 22 for Alu element
| Chromosome | Linear kernel |
|---|---|
| 21 | 0.7779 |
| 22 | 0.7663 |

Figure 4. Accuracy (AC) is the proportion of the total number of predictions that were correct: AC = (a + d) / (a + b + c + d). Recall (R) is the proportion of positive cases that were correctly identified: R = d / (c + d). Precision (P) is the proportion of predicted positive cases that were correct: P = d / (b + d). Sensitivity (Sn) is the ability of the system to identify actual positives: Sn = TP / (TP + FN). Specificity (Sp) is the ability of the system to reject negative examples: Sp = TN / (FP + TN).
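The definitions in the Figure 4 caption can be written out as a short sketch. It reads the caption's letters as a = true negatives, b = false positives, c = false negatives and d = true positives, which is the only assignment consistent with the recall, precision and sensitivity formulas given. The class name and the counts in the usage example are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    a: int  # true negatives (TN)
    b: int  # false positives (FP)
    c: int  # false negatives (FN)
    d: int  # true positives (TP)

    def accuracy(self):
        return (self.a + self.d) / (self.a + self.b + self.c + self.d)

    def recall(self):
        # Identical to sensitivity: TP / (TP + FN)
        return self.d / (self.c + self.d)

    def precision(self):
        return self.d / (self.b + self.d)

    def specificity(self):
        # TN / (FP + TN)
        return self.a / (self.b + self.a)


# Hypothetical counts, chosen only to exercise the formulas
m = ConfusionMatrix(a=70, b=30, c=20, d=80)
print(m.accuracy(), m.recall(), m.precision(), m.specificity())
```

Recall and sensitivity are the same quantity under this reading, so only one method is needed for both.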

Figure 5. The boxes are plotted against horizontal representations of the input sequences, with the reference sequence (human) on top. The size of each box is determined by the start and stop positions in the sub-alignment. The shading of the boxes and connector lines is scaled according to the sub-alignment score, where solid black represents the highest score obtained and light gray the lowest. The color of the connecting line indicates the sub-alignment orientation: black for +/+, red for +/−. Where windows overlap, those with the highest score are displayed on top. The dark pink lines represent Alu elements, whereas the parrot green lines represent L1 elements. The white portions at the beginning and in the last section of human (top) represent undetermined sequences (NNNNN etc). This section represents positions 1…250000 bp of human and gorilla chromosome 1. The GFF files were generated for Alus and L1s. The additional figures in the document show subsequent sections of chromosome 1.