Mattia Di Gangi, Giosuè Lo Bosco, Riccardo Rizzo.
Abstract
BACKGROUND: Nucleosomes are DNA-histone complexes, each wrapping about 150 base pairs of double-stranded DNA. They are fundamental to one of the primary functions of chromatin, i.e. packing the DNA into the nucleus of eukaryotic cells. Several biological studies have shown that nucleosome positioning influences the regulation of cell type-specific gene activities. Moreover, computational studies have shown evidence of sequence specificity in the DNA fragments wrapped into nucleosomes, clearly underlined by the organization of particular DNA substrings. As a main consequence, the identification of nucleosomes on a genomic scale has been successfully performed by computational methods using a sequence-feature representation.
Keywords: Deep learning networks; Epigenetic; Nucleosome classification; Recurrent neural networks
Year: 2018 PMID: 30453896 PMCID: PMC6245688 DOI: 10.1186/s12859-018-2386-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The deep neural network architecture of the classifier
The distribution of samples in the first group of datasets
| First group | HM | DM | CE |
|---|---|---|---|
| N | 2273 | 2900 | 2567 |
| L | 2300 | 2850 | 2608 |
| T | 4573 | 5750 | 5175 |
HM indicates the group of Human sequences; DM indicates the group of Drosophila sequences; CE indicates the group of C. elegans sequences. The row label N indicates the nucleosome sequences, the row label L indicates the linker sequences, and T indicates the total number of sequences
The distribution of samples in the second group of datasets
| Second group | HM-LC | HM-PM | HM-5U | DM-LC | DM-PM | DM-5U | YS-WG | YS-PM |
|---|---|---|---|---|---|---|---|---|
| N | 97209 | 56404 | 11769 | 46054 | 48251 | 4669 | 39661 | 27373 |
| L | 65563 | 44639 | 4880 | 30458 | 28763 | 2704 | 4824 | 4463 |
| T | 162772 | 101043 | 16649 | 76512 | 77014 | 7373 | 44485 | 31836 |
The meaning of the row labels is the same as in Table 1. The label YS indicates the Yeast group of sequences; WG indicates the whole genome; LC indicates the largest chromosome; PM indicates the promoter sequences; 5U indicates the sequences from the 5'UTR exon region
Training/Validation/Test split for the second group of datasets
| Dataset | Total samples | #Training | #Validation | #Test |
|---|---|---|---|---|
| HM-LC | 162772 | 81298 | 1000 | 80474 |
| HM-PM | 101043 | 51321 | 1000 | 48722 |
| HM-5U | 16649 | 10654 | 1000 | 4995 |
| DM-LC | 76512 | 55512 | 1000 | 20000 |
| DM-PM | 77014 | 68312 | 1000 | 7702 |
| DM-5U | 7373 | 3687 | 737 | 2949 |
| YS-WG | 44485 | 39485 | 1000 | 4000 |
| YS-PM | 31836 | 27836 | 1000 | 3000 |
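
The split sizes above can be checked and expressed as fractions with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the table, while the names `SPLITS` and `split_fractions` are hypothetical helpers introduced here and do not come from the paper.

```python
# Split sizes copied from the table above: (total, training, validation, test).
# SPLITS and split_fractions are illustrative names, not part of the paper.
SPLITS = {
    "HM-LC": (162772, 81298, 1000, 80474),
    "HM-PM": (101043, 51321, 1000, 48722),
    "HM-5U": (16649, 10654, 1000, 4995),
    "DM-LC": (76512, 55512, 1000, 20000),
    "DM-PM": (77014, 68312, 1000, 7702),
    "DM-5U": (7373, 3687, 737, 2949),
    "YS-WG": (44485, 39485, 1000, 4000),
    "YS-PM": (31836, 27836, 1000, 3000),
}


def split_fractions(total, train, val, test):
    """Return the (train, validation, test) shares of a dataset."""
    if train + val + test != total:
        raise ValueError("split sizes do not sum to the total")
    return train / total, val / total, test / total


for name, sizes in SPLITS.items():
    tr, va, te = split_fractions(*sizes)
    print(f"{name}: train={tr:.3f} val={va:.3f} test={te:.3f}")
```

Every row sums to its total, and the shares differ noticeably between datasets: DM-5U is split roughly in half between training and the rest, whereas DM-PM keeps close to nine tenths of its samples for training.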
10-fold cross-validation performances on the first group of datasets
| Method (Species) | Accuracy (mean) | Accuracy (std) | Sensitivity (mean) | Sensitivity (std) | Specificity (mean) | Specificity (std) |
|---|---|---|---|---|---|---|
| iNuc-PseKNC(CE) | 86.90 | x | 90.30 | x | 83.55 | x |
| iNuc-PseKNC(DM) | 79.97 | x | 78.31 | x | 81.65 | x |
| iNuc-PseKNC(HM) | | x | 87.86 | x | | x |
| DLNN-3(CE) | | 0.8 | | 1.27 | | 2.13 |
| DLNN-3(DM) | | 1.13 | | 2.55 | | 2.65 |
| DLNN-3(HM) | 84.65 | 2.16 | | 2.83 | 79.64 | 4.29 |
| DLNN-5(CE) | | 2.45 | | 3.68 | | 5.54 |
| DLNN-5(DM) | | 0.75 | | 2.79 | | 2.74 |
| DLNN-5(HM) | 85.37 | 1.91 | | 1.82 | 82.29 | 4.86 |
iNuc-PseKNC refers to the method introduced in [18]; CE, DM and HM refer to the datasets described in Table 1; DLNN refers to the network proposed in this paper, and -3 or -5 refers to the kernel dimension in the first convolutional layer of the net. Best values are in bold
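
For readers who want to reproduce this kind of summary, the sketch below computes accuracy, sensitivity and specificity (in percent) for each fold and then aggregates them across folds. It rests on assumptions not stated in this section: nucleosome sequences are treated as the positive class (label 1) and linkers as the negative class (label 0), the spread is the sample standard deviation, and the function names and toy labels are hypothetical.

```python
from statistics import mean, stdev


def fold_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) in percent for one fold.

    Assumption: 1 = nucleosome (positive), 0 = linker (negative).
    Each fold must contain at least one sample of each class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = 100 * (tp + tn) / len(y_true)
    sensitivity = 100 * tp / (tp + fn)  # share of nucleosomes recovered
    specificity = 100 * tn / (tn + fp)  # share of linkers recovered
    return accuracy, sensitivity, specificity


def summarize(folds):
    """Aggregate per-fold metrics into (mean, sample std) pairs.

    `folds` is a list of (y_true, y_pred) pairs, one per fold.
    """
    per_fold = [fold_metrics(t, p) for t, p in folds]
    return [(mean(col), stdev(col)) for col in zip(*per_fold)]


# Toy data with two folds (hypothetical labels, not taken from the paper).
toy_folds = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 0, 0], [1, 1, 0, 1]),
]
for label, (m, s) in zip(("accuracy", "sensitivity", "specificity"),
                         summarize(toy_folds)):
    print(f"{label}: {m:.2f} +/- {s:.2f}")
```

With 10-fold cross-validation, `folds` would hold ten such pairs, one for each held-out partition.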
Area under the ROC curve performances on the second group of datasets
| Dataset | N_score | NuPop | NucEnergEN | Segal | Field | Kaplan | Heijden | Finestr | DLNN-3 | DLNN-5 |
|---|---|---|---|---|---|---|---|---|---|---|
| HM-LC | ∼0.65 | (0.6,0.65) | (0.6,0.65) | (0.35,0.4) | ∼0.65 | ∼0.65 | ∼0.6 | (0.6,0.65) | | |
| DM-LC | 0.59 | (0.65,0.70) | ∼0.7 | 0.33 | (0.70,0.75) | | (0.65,0.70) | 0.57 | | |
| YS-WG | 0.77 | 0.74 | (0.65,0.70) | 0.49 | 0.77 | ∼0.7 | ∼0.65 | ∼0.7 | | |
| HM-PM | (0.6,0.65) | 0.67 | (0.6,0.65) | (0.4,0.45) | (0.6,0.65) | ∼0.6 | ∼0.55 | ∼0.55 | | |
| DM-PM | 0.62 | ∼0.7 | (0.70,0.75) | 0.32 | | | (0.55,0.6) | (0.5,0.55) | | |
| YS-PM | 0.70 | 0.74 | (0.7,0.75) | 0.52 | 0.79 | (0.7,0.75) | ∼0.65 | ∼0.7 | 0.73 | |
| HM-5U | (0.55,0.6) | ∼0.65 | | 0.37 | ∼0.65 | ∼0.65 | ∼0.6 | (0.55,0.6) | 0.67 | 0.68 |
| DM-5U | 0.54 | (0.6,0.65) | (0.65,0.70) | 0.38 | | | (0.55,0.6) | ∼0.5 | | |
Each column refers to a computational method for nucleosome positioning. The first eight columns show the values reported in the paper by Liu et al. [47], sometimes as approximate values (an interval range or the "close to" symbol ∼). The last two columns refer to our proposed method. Best values are in bold
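
As a reminder of what the area under the ROC curve measures, the sketch below computes it from the pairwise-ranking definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the labelling (1 = nucleosome, 0 = linker) and the toy scores are assumptions made for illustration. The quadratic loop is meant for small inputs, not for datasets of the size listed above.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumption: label 1 marks positives (nucleosomes), 0 marks negatives.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): three of the four positive/negative
# pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Under this definition a value of 0.5 corresponds to random ranking, and a value below 0.5 means that negatives tend to be scored above positives.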