Arun S Seetharam, Gary W Stuart.
Abstract
Type IIB restriction endonucleases are site-specific endonucleases that cut both strands of double-stranded DNA upstream and downstream of their recognition sequences. Their recognition sequences are generally interrupted and range from 5 to 7 bases long. They produce uniformly small DNA fragments, 21 to 33 base pairs in length (without cohesive ends). Because the fragments are generated from throughout the entire length of the genomic DNA, they provide an excellent fractional representation of the genome. In this study we simulated restriction enzyme digestions on 21 sequenced genomes of various Drosophila species using the predicted targets of 16 Type IIB restriction enzymes to effectively produce a large and arbitrary selection of loci from these genomes. The fragments were then used to compare organisms and to calculate the distance between genomes in pairwise combination by counting the number of fragments shared between the two genomes. Phylogenetic trees were then generated for each enzyme using this distance measure, and the consensus was calculated. The consensus tree obtained agrees well with the currently accepted tree for the Drosophila species. We conclude that multi-locus sub-genomic representation combined with next generation sequencing, especially for individuals and species without previous genome characterization, can accelerate studies of comparative genomics and the building of accurate phylogenetic trees.
Keywords: Phylogenomics; Reduced genomic representation; Restriction-site associated DNA (RAD) tags; Type IIB restriction enzymes
Year: 2013 PMID: 24432193 PMCID: PMC3883493 DOI: 10.7717/peerj.226
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Workflow of the entire process of generating phylogeny from the Type IIB fragments.
Various Drosophila species and source databases used for the analysis. The GC% for each genome was calculated using infoseq from the EMBOSS package.
| Genome | GC% | Size | Source |
|---|---|---|---|
| | 42.56 | 230.99 Mb | FlyBase |
| | 41.82 | 168.58 Mb | NCBI |
| | 41.62 | 166.39 Mb | NCBI |
| | 40.31 | 170.51 Mb | NCBI |
| | 42.65 | 152.71 Mb | FlyBase |
| | 40.90 | 156.31 Mb | NCBI |
| | 41.93 | 151.04 Mb | NCBI |
| | 38.84 | 200.46 Mb | FlyBase |
| | 41.38 | 163.57 Mb | NCBI |
| | 42.05 | 168.73 Mb | FlyBase |
| | 40.22 | 193.82 Mb | FlyBase |
| | 45.29 | 188.37 Mb | FlyBase |
| | 45.43 | 152.73 Mb | FlyBase |
| | 40.07 | 193.90 Mb | NCBI |
| | 38.52 | 165.75 Mb | Princeton University |
| | 42.53 | 166.57 Mb | FlyBase |
| | 43.06 | 137.82 Mb | FlyBase |
| | 40.01 | 181.00 Mb | NCBI |
| | 40.80 | 206.02 Mb | FlyBase |
| | 37.89 | 235.51 Mb | FlyBase |
| | 42.43 | 165.69 Mb | FlyBase |
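The GC% column was produced with infoseq from EMBOSS. As a rough cross-check, the same kind of statistic can be sketched in plain Python. The function name `gc_percent` is hypothetical, and the choice to ignore `N` and other ambiguity codes is an assumption; infoseq may treat such bases differently.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T).

    Assumption: N and other ambiguity codes are ignored entirely,
    which may not match how EMBOSS infoseq counts them.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Toy sequence: 4 G/C and 4 A/T bases, plus two N that are skipped.
print(gc_percent("ATGCGCNNAT"))
```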
List of enzymes used for the fragment generation from the 21 Drosophila species.
Frequency is the estimated distance (in bp) between cut sites in a random sequence in which all four bases are equally probable; length is the blunt-ended tag length (in bp).
| Enzyme | Recognition sequence | Frequency | Length |
|---|---|---|---|
| | GCANNNNNNTGC | 4096 | 32 |
| | GAACNNNNNNTCC | 8192 | 27 |
| | ACNNNNGTAYC | 4096 | 28 |
| | CGANNNNNNTGC | 2048 | 32 |
| | GAGNNNNNCTC | 4096 | 27 |
| | ACNNNNNCTCC | 2048 | 27 |
| | GGGAC | 512 | 21 |
| | GACNNNNNNTGG | 2048 | 27 |
| | CAANNNNNGTGG | 8192 | 33 |
| | AAGNNNNNCTT | 4096 | 27 |
| | GAYNNNNNRTC | 1024 | 27 |
| | GAACNNNNNCTC | 8192 | 27 |
| | GAACNNNNNNTAC | 8192 | 27 |
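The Frequency column can be reproduced from the recognition sequences alone. The sketch below assumes (this is inferred from the tabulated values, not stated in the text) that the per-strand spacing is the product of 4/degeneracy over all positions, and that it is halved for non-palindromic sites because such a site can occur on either strand. The function names are hypothetical.

```python
# Number of bases matched by each IUPAC nucleotide code.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D",
    "N": "N",
}


def reverse_complement(site: str) -> str:
    """Reverse complement of an IUPAC recognition sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(site))


def expected_spacing(site: str) -> float:
    """Expected distance between sites in a random, equiprobable sequence.

    Assumption: non-palindromic sites are counted on both strands,
    so their expected spacing is half the single-strand value.
    """
    per_strand = 1.0
    for base in site:
        per_strand *= 4 / DEGENERACY[base]
    if site == reverse_complement(site):
        return per_strand
    return per_strand / 2


for site in ("GCANNNNNNTGC", "GGGAC", "GAYNNNNNRTC"):
    print(site, expected_spacing(site))
```

Under these assumptions the function returns the tabulated value for each of the thirteen recognition sequences, for example 4096 for the palindromic GCANNNNNNTGC and 512 for the non-palindromic GGGAC.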
Total number of fragments generated using 13 different Type IIB restriction enzymes for each of the 21 Drosophila genomes.
| | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 34804 | 11421 | 6151 | 51646 | 21457 | 52433 | 101183 | 46042 | 16405 | 38109 | 74174 | 11193 | 8344 |
| | 41242 | 12667 | 6875 | 63518 | 22752 | 51248 | 109404 | 44554 | 18178 | 41284 | 75291 | 12177 | 10210 |
| | 35642 | 10893 | 6616 | 51208 | 20363 | 50001 | 98937 | 45563 | 17131 | 39286 | 73197 | 10545 | 8622 |
| | 43207 | 11314 | 6068 | 59905 | 18764 | 45496 | 93763 | 43259 | 18466 | 41866 | 75238 | 11027 | 9753 |
| | 42781 | 10517 | 5914 | 60434 | 18119 | 43684 | 85735 | 40020 | 17793 | 31931 | 66412 | 9979 | 8677 |
| | 36455 | 10170 | 5699 | 51988 | 18236 | 43177 | 86365 | 42020 | 17568 | 40795 | 72398 | 9682 | 8335 |
| | 38374 | 11698 | 5338 | 60448 | 20161 | 47056 | 89928 | 39223 | 17489 | 37380 | 69222 | 11070 | 8868 |
| | 49667 | 5891 | 5212 | 61420 | 17341 | 30379 | 58175 | 35658 | 16642 | 34409 | 64560 | 8062 | 6977 |
| | 39192 | 10361 | 5516 | 54698 | 21908 | 50258 | 99784 | 44066 | 16846 | 40965 | 68593 | 10765 | 8126 |
| | 39711 | 9908 | 6037 | 59203 | 16840 | 41168 | 81877 | 39221 | 17651 | 31350 | 68204 | 9243 | 8303 |
| | 54782 | 6294 | 5234 | 64186 | 21048 | 33289 | 60708 | 36674 | 14774 | 33071 | 65210 | 9090 | 8012 |
| | 43327 | 10706 | 7567 | 59923 | 25287 | 53206 | 113002 | 48862 | 16329 | 31779 | 76473 | 12267 | 8940 |
| | 43650 | 10461 | 7466 | 60237 | 25174 | 53269 | 111423 | 48990 | 16358 | 31417 | 74808 | 12175 | 8774 |
| | 36920 | 10920 | 6177 | 56203 | 18139 | 44894 | 93524 | 41357 | 17133 | 40153 | 76711 | 10442 | 9247 |
| | 40344 | 9877 | 5957 | 56771 | 17044 | 41850 | 80010 | 38107 | 17037 | 32142 | 67070 | 9414 | 8378 |
| | 39876 | 10371 | 5808 | 59204 | 17430 | 42659 | 83936 | 39380 | 17276 | 31541 | 68359 | 9792 | 8289 |
| | 38549 | 9815 | 5547 | 56820 | 16777 | 40735 | 79826 | 37436 | 16666 | 30304 | 64321 | 9148 | 7773 |
| | 37489 | 11463 | 5431 | 58887 | 19189 | 45240 | 91825 | 39992 | 26269 | 37277 | 74002 | 10801 | 8987 |
| | 58785 | 6943 | 5774 | 64912 | 18097 | 31951 | 66710 | 38679 | 15733 | 37692 | 65275 | 9290 | 8551 |
| | 34033 | 7083 | 6177 | 43299 | 15103 | 35578 | 70085 | 39996 | 17240 | 42202 | 77102 | 7941 | 9626 |
| | 42202 | 10300 | 6165 | 59442 | 17885 | 43748 | 83095 | 39920 | 18007 | 33024 | 69632 | 9887 | 8765 |
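Fragment counts of this kind come from an in silico digestion: every occurrence of a recognition sequence is located on both strands and a fixed-length tag around it is extracted. The following is a minimal sketch of that idea, not the authors' pipeline. The function names and the `left`/`right` flank parameters are hypothetical, and the true cut offsets of each enzyme must be supplied by the caller.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.translate(_COMP)[::-1]


def site_regex(site: str) -> "re.Pattern[str]":
    """Compile a site into a lookahead regex so overlapping hits are all found."""
    body = "".join(IUPAC_REGEX[b] for b in site)
    return re.compile("(?=(" + body + "))")


def extract_tags(genome: str, site: str, left: int, right: int) -> set:
    """Collect distinct tags (site plus flanks) from both strands.

    Assumption: `left` and `right` are the numbers of flanking bases kept
    on each side of the site; real enzymes have specific cut offsets.
    Each tag is stored in a strand-independent canonical form.
    """
    pattern = site_regex(site)
    forward = genome.upper()
    tags = set()
    for strand in (forward, revcomp(forward)):
        for match in pattern.finditer(strand):
            start = match.start() - left
            end = match.start() + len(site) + right
            if start >= 0 and end <= len(strand):  # skip truncated tags
                tag = strand[start:end]
                tags.add(min(tag, revcomp(tag)))
    return tags


# Toy example: the same locus read from either strand yields the same tag.
print(extract_tags("AAGGGACTT", "GGGAC", 2, 2))
print(extract_tags("AAGTCCCTT", "GGGAC", 2, 2))
```

Storing the lexicographically smaller of a tag and its reverse complement keeps a locus from being counted twice when a palindromic site is found on both strands.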
Figure 2. The consensus phylogenetic tree obtained by combining the trees obtained for each of the 13 enzymes.
The phylogenetic tree for each enzyme was calculated by extracting the corresponding fragments and then counting the number of shared fragments between every pair of species. The upper branch support values give the percentage agreement across the 13 enzymes, and the lower values give the number of enzymes, out of 13, that support the branch.
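Counting shared fragments between every pair of species amounts to building a pairwise distance matrix from sets of tags. The sketch below uses the Jaccard distance (one minus shared tags over all distinct tags) as a stand-in; the exact normalization used in the study is not given here, so treat this formula as an assumption. The species labels and function names are placeholders.

```python
from itertools import combinations


def shared_fragment_distance(a: set, b: set) -> float:
    """Jaccard distance between two tag sets (an assumed normalization)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(tags_by_species: dict) -> dict:
    """Symmetric nested-dict distance matrix over all species pairs."""
    names = sorted(tags_by_species)
    dist = {x: {y: 0.0 for y in names} for x in names}
    for x, y in combinations(names, 2):
        d = shared_fragment_distance(tags_by_species[x], tags_by_species[y])
        dist[x][y] = dist[y][x] = d
    return dist


# Toy tag sets with placeholder labels; real input would be one set per genome.
toy = {
    "sp_A": {"t1", "t2", "t3"},
    "sp_B": {"t2", "t3", "t4"},
    "sp_C": {"t9"},
}
matrix = distance_matrix(toy)
print(matrix["sp_A"]["sp_B"])
```

A matrix of this shape, computed once per enzyme, is the usual input to distance-based tree-building methods, after which the per-enzyme trees can be summarized as a consensus.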
Figure 3. Single enzyme tree (AloI enzyme) showing the branch length.