Lev I Rubanov, Alexandr V Seliverstov, Oleg A Zverkov, Vassily A Lyubetsky.
Abstract
BACKGROUND: Perfectly or highly conserved DNA elements have been found in vertebrates, invertebrates, and plants by various methods. However, little is known about such elements in protists. The evolutionary distance between apicomplexans can be very large, in particular because of the positive selection pressure acting on them. This complicates the identification of highly conserved elements in alveolates, a difficulty that the proposed algorithm overcomes.
Keywords: Alveolates; Apicomplexan parasites; Dense subgraph; Highly conserved element; Phylogeny; Ultraconserved element
Year: 2016 PMID: 27645252 PMCID: PMC5028923 DOI: 10.1186/s12859-016-1257-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
All used species and their genome accession numbers
| Organism | Source | Accession |
|---|---|---|
| Coccidia (apicomplexans) | | |
| | GenBank | GCA_000769155.1 |
| | EuPathDB | ToxoDB 26 |
| | GenBank | GCA_000258005.2 |
| | GenBank | GCA_000208865.2 |
| | GenBank | GCA_000727475.1 |
| | GenBank | GCA_000006565.2 |
| Plasmodium (apicomplexans) | | |
| | EuPathDB | PlasmoDB 25 |
| | GenBank | GCA_000003075.2 |
| | GenBank | GCA_000002765.1 |
| | EuPathDB | PlasmoDB 25 |
| Piroplasmida (apicomplexans) | | |
| | GenBank | GCA_000165395.1 |
| | GenBank | GCA_000691945.1 |
| | GenBank | GCA_000003225.1 |
| | GenBank | GCA_000342415.1 |
| | GenBank | GCA_000740895.1 |
| | GenBank | GCA_000165365.1 |
| Cryptosporidium (apicomplexans) | | |
| | EuPathDB | CryptoDB 26 |
| | GenBank | GCA_000006425.2 |
| | EuPathDB | CryptoDB 26 |
| | GenBank | GCA_000006515.1 |
| | GenBank | GCA_000165345.1 |
| Other apicomplexans | | |
| | GenBank | GCA_000223845.4 |
| | GenBank | GCA_000172235.1 |
| Chromerida (alveolata) | | |
| | EuPathDB | CryptoDB 26 |
| | GenBank | GCA_001179505.1 |
| Perkinsida (alveolata) | | |
| | GenBank | GCA_000006405.1 |
| Ciliophora (ciliates) | | |
| | GenBank | GCA_000189635.1 |
| | GenBank | GCA_000165425.1 |
| | GenBank | GCA_000220395.1 |
| | GenBank | GCA_000325865.2 |
Fig. 1 Stages and algorithms of the method
Fig. 2 The flowchart of Algorithm 1
The number of repeating keys of length k in the long sequence of the complete genome of Sarcocystis neurona
| The number of occurrences of the key | | | | |
|---|---|---|---|---|
| 1 | 94919216 | 119034327 | 121193592 | 121823700 |
| 2 | 5284353 | 764077 | 401812 | 242260 |
| 3, 4 | 1438861 | 190900 | 65945 | 24494 |
| 5–8 | 486137 | 60275 | 15503 | 4875 |
| 9–16 | 183799 | 20869 | 4129 | 1116 |
| 17–32 | 72581 | 7429 | 1202 | 194 |
| 33–64 | 29861 | 2996 | 437 | 155 |
| 65–128 | 11196 | 1217 | 208 | 85 |
| 129–256 | 4986 | 498 | 86 | 41 |
| 257–512 | 1776 | 148 | 9 | 3 |
| 513–1024 | 739 | 67 | 8 | 6 |
| >1024 | 447 | 90 | 3 | 1 |
| The number of different keys | 102433952 | 120082826 | 121682934 | 122096930 |
| The mean number of occurrences | 1.21408 | 1.03559 | 1.02191 | 1.01833 |
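A table of this shape can be produced by counting every substring of length k (a "key") and grouping the keys by how often they occur. The following is only a minimal sketch of that bookkeeping, not the authors' implementation. The function name `key_occurrence_table` and the choice to label each bin by its upper bound (1, 2, 4, 8, ...) are assumptions made for illustration. It also assumes that each bin counts distinct keys, which is consistent with the first column of the table summing to the reported number of different keys.

```python
from collections import Counter


def key_occurrence_table(sequence, k):
    """Count keys (substrings of length k) and bin them by occurrence count.

    Hypothetical helper for illustration only. Returns (bins, distinct, mean),
    where bins maps the upper bound of each bin (1, 2, 4, 8, ...) to the
    number of distinct keys whose occurrence count falls in that bin.
    """
    # Occurrences of every key, taken at all overlapping positions.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

    bins = Counter()
    for n in counts.values():
        # Smallest power of two >= n: 1 -> 1, 2 -> 2, 3..4 -> 4, 5..8 -> 8, ...
        upper = 1 << (n - 1).bit_length()
        bins[upper] += 1

    distinct = len(counts)
    # Mean occurrences = total key positions / number of different keys.
    mean = sum(counts.values()) / distinct if distinct else 0.0
    return dict(bins), distinct, mean


bins, distinct, mean = key_occurrence_table("ACGTACGTAC", 4)
print(bins, distinct, mean)
```

For a whole genome, an in-memory `Counter` of this kind would be very large; the sketch is meant to show what the rows of the table measure, not how to compute them at scale.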
Fig. 3 The compaction of the source graph by Algorithm 2: (a) three word pairs identified in three sequences and the corresponding edges of the source graph; (b) union of words at new vertices, the intersections are marked by a darker color; edges x and y merged into edge z
Fig. 4 The flowchart of Algorithm 3
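The compaction idea in this caption can be illustrated with a small sketch: overlapping words in the same sequence are united into one vertex, and edges that end up connecting the same pair of vertices collapse into one. This is not the paper's Algorithm 2. The merging rule (any overlap unites two words), the half-open interval representation, and the name `compact_graph` are all assumptions chosen for illustration.

```python
def compact_graph(words, edges):
    """Unite overlapping words into vertices and merge parallel edges.

    Hypothetical sketch. `words` is a list of (seq_id, start, end) half-open
    intervals; `edges` is a list of pairs of indices into `words`.
    Returns (vertices, merged_edges).
    """
    # Visit words by sequence and start so overlapping words are adjacent.
    order = sorted(range(len(words)), key=lambda i: words[i])

    vertices = []   # united intervals as [seq_id, start, end]
    vertex_of = {}  # word index -> index of the vertex that absorbed it
    for i in order:
        seq, start, end = words[i]
        last = vertices[-1] if vertices else None
        if last is not None and last[0] == seq and start < last[2]:
            last[2] = max(last[2], end)  # extend the union of words
        else:
            vertices.append([seq, start, end])
        vertex_of[i] = len(vertices) - 1

    # Re-map each edge to the new vertices; duplicates collapse in the set.
    merged_edges = set()
    for a, b in edges:
        u, v = sorted((vertex_of[a], vertex_of[b]))
        if u != v:  # drop edges whose endpoints were united
            merged_edges.add((u, v))
    return [tuple(v) for v in vertices], merged_edges


words = [("A", 0, 10), ("A", 5, 15), ("B", 0, 10), ("B", 3, 12), ("C", 0, 8)]
edges = [(0, 2), (1, 3), (3, 4)]
print(compact_graph(words, edges))
```

In the usage example, the two edges between sequences A and B play the role of edges x and y in the figure and become a single edge after the words are united.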
Predicted HCEs
| HCE type (label) | Count | Description |
|---|---|---|
| protein | 8988 | A protein according to the GenBank annotation |
| tRNA | 26 | A transfer RNA |
| tRNA-Sec | 1 | Selenocysteine transfer RNA |
| LSU_rRNA | 15 | Large subunit ribosomal RNA |
| SSU_rRNA | 5 | Small subunit ribosomal RNA |
| 5_8S_rRNA | 1 | 5.8S ribosomal RNA |
| U1 | 1 | U1 spliceosomal RNA |
| U2 | 1 | U2 spliceosomal RNA |
| ACEA_U3 | 1 | ACEA small nucleolar RNA U3 |
| U4 | 1 | U4 spliceosomal RNA |
| U5 | 1 | U5 spliceosomal RNA |
| U6 | 3 | U6 spliceosomal RNA |
| Protozoa_SRP | 1 | Protozoan signal recognition particle RNA (aka 7SL, 6S, 4.5S) |
| RNaseP_nuc | 2 | Nuclear ribonuclease P (RNase P) |
| snoR07 | 1 | Small nucleolar RNA snoR07 |
| snoR10 | 1 | Small nucleolar RNA snoR10 |
| SNORD36 | 1 | Small nucleolar RNA SNORD36 |
| intron | 163 | Non-coding region of a gene |
| unknown UCE | 706 | Neither a gene nor an RNA was predicted |
| Total | 9919 | |
Fig. 5 The tree predicted for 30 Alveolata species using their HCEs identified by our algorithm