Danilo Pellin, Clelia Di Serio.
Abstract
BACKGROUND: In biomedical research, a relevant issue is identifying time intervals or portions of an n-dimensional support where a particular event of interest is more likely to occur than expected. Algorithms that require the number, dimension, or length of clusters to be specified a priori suffer from a high degree of arbitrariness whenever no precise information is available, and this may strongly affect the final parameter estimates. Within this framework, spatial scan statistics have been proposed in the literature as a valid non-parametric alternative.
Keywords: Binary genomic data; Cluster identification; Scan statistics; Viral integration sites
Year: 2016 PMID: 28185547 PMCID: PMC5046198 DOI: 10.1186/s12859-016-1173-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schema of the relative scan statistics. Two data sets of Bernoulli trials are represented on a hypothetical small portion of a chromosome. Dark blue and red circles: genomic coordinates at which an event (IS) was observed in DataSet 1 and DataSet 2, respectively. Light blue and orange circles: genomic coordinates that are technically investigable but show no event (no integrations retrieved). Grey circles: blind regions of the genome. Transparent areas: examples of moving windows of variable size for the first three IS on the left
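The moving-window idea in the caption above can be illustrated with a minimal sketch. It assumes the standard Bernoulli log-likelihood ratio commonly used in scan statistics (Kulldorff-style), which may differ from the exact statistic used by the authors, and it scans every window up to a maximum length rather than only windows anchored at observed IS. The function names `bernoulli_llr` and `best_window` and the parameter `max_len` are hypothetical.

```python
import math


def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for a window holding c events in n trials,
    given C events in N trials overall (assumed Kulldorff-style form).
    Returns 0 unless the event rate inside exceeds the rate outside."""
    if n == 0 or n == N:
        return 0.0
    p_in, p_out = c / n, (C - c) / (N - n)
    if p_in <= p_out:
        return 0.0

    def xlogy(x, y):
        # Convention 0 * log(0) = 0; y is positive whenever x is.
        return x * math.log(y) if x > 0 else 0.0

    ll_alt = (xlogy(c, p_in) + xlogy(n - c, 1 - p_in)
              + xlogy(C - c, p_out) + xlogy(N - n - (C - c), 1 - p_out))
    p0 = C / N
    ll_null = xlogy(C, p0) + xlogy(N - C, 1 - p0)
    return ll_alt - ll_null


def best_window(events, max_len):
    """events: 0/1 outcomes at consecutive investigable positions.
    Scans all half-open windows [start, end) with 1..max_len positions
    and returns (llr, start, end) for the highest-scoring one."""
    N, C = len(events), sum(events)
    prefix = [0]
    for e in events:
        prefix.append(prefix[-1] + e)  # prefix sums give window counts in O(1)
    best = (0.0, None, None)
    for start in range(N):
        for end in range(start + 1, min(start + max_len, N) + 1):
            c = prefix[end] - prefix[start]
            score = bernoulli_llr(c, end - start, C, N)
            if score > best[0]:
                best = (score, start, end)
    return best


# Toy example: a dense run of events inside an otherwise empty region.
toy = [0] * 20 + [1, 1, 0, 1, 1] + [0] * 20
llr, start, end = best_window(toy, max_len=10)
```

The sketch only ranks windows. Assessing significance, for instance by Monte Carlo permutation of the events, is not shown.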
List of first 10 clusters identified in HIV data by scan statistics
| S | Chr | Start | End | IS count | | Raw p-value | Adj p-value |
|---|---|---|---|---|---|---|---|
| 2463.2 | chr11 | 63175583 | 68111375 | 651 | 17.2 | <2e-16 | <2e-16 |
| 1795.1 | chr16 | 95090 | 3640598 | 444 | 19.6 | <2e-16 | <2e-16 |
| 1390.0 | chr17 | 70634094 | 73732441 | 386 | 15.5 | <2e-16 | <2e-16 |
| 1189.8 | chr17 | 75720251 | 78604915 | 323 | 16.2 | <2e-16 | <2e-16 |
| 1063.8 | chr3 | 46999507 | 52978572 | 424 | 8.5 | <2e-16 | <2e-16 |
| 1046.8 | chr6 | 30563526 | 33532447 | 325 | 12.6 | <2e-16 | <2e-16 |
| 1041.8 | chr9 | 138245676 | 139772487 | 224 | 26.9 | <2e-16 | <2e-16 |
| 732.0 | chr8 | 144469820 | 146194757 | 188 | 18.1 | <2e-16 | <2e-16 |
| 721.1 | chr19 | 572963 | 3118599 | 209 | 14.3 | <2e-16 | <2e-16 |
| 629.1 | chr17 | 1483915 | 4578114 | 238 | 9.2 | <2e-16 | <2e-16 |
List of first 10 clusters identified in MLV data by scan statistics
| S | Chr | Start | End | IS count | | Raw p-value | Adj p-value |
|---|---|---|---|---|---|---|---|
| 386.5 | chr20 | 51646845 | 51991770 | 89 | 22.8 | <2E-16 | <2E-16 |
| 326.4 | chr20 | 10362242 | 10450134 | 55 | 51.8 | <2E-16 | <2E-16 |
| 318.4 | chr17 | 26646082 | 26672265 | 41 | 131.1 | <2E-16 | <2E-16 |
| 302.6 | chr17 | 76325116 | 76460372 | 56 | 39.5 | <2E-16 | <2E-16 |
| 285.6 | chr19 | 59566413 | 59591310 | 37 | 127.9 | <2E-16 | <2E-16 |
| 284.6 | chr21 | 38671040 | 39311896 | 90 | 12.2 | <2E-16 | <2E-16 |
| 279.2 | chr17 | 51718847 | 53782415 | 142 | 6.2 | <2E-16 | <2E-16 |
| 278.7 | chr1 | 25046795 | 28847012 | 183 | 4.7 | <2E-16 | <2E-16 |
| 267.7 | chr18 | 72291047 | 72971441 | 87 | 11.6 | <2E-16 | <2E-16 |
| 264.4 | chr12 | 6084417 | 10441567 | 197 | 4.2 | <2E-16 | <2E-16 |
Fig. 2 (a) Length distributions of clusters identified by DBSCAN and the scan statistics algorithm in the MLV and HIV data sets. (b) Size distributions of clusters identified by DBSCAN and the scan statistics algorithm in the MLV and HIV data sets
Fig. 3 HIV and MLV IS distributions on chr 17. HIV and MLV IS distributions on chromosome 17, estimated with a Gaussian kernel and unbiased cross-validation bandwidth selection (blue and red curves, respectively). Comparative hotspots reported in [17] correspond to the segments shown on the third line in red (MLV comparative hotspots) and on the fourth line in blue (HIV comparative hotspots), taking strand annotation into account. The fifth and sixth lines show the relative scan statistics results. The first two significant clusters identified by relative scan statistics that have no corresponding comparative hotspot are highlighted (black box)
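For readers unfamiliar with the density curves described in this caption, the following is a minimal sketch of a Gaussian kernel density estimate whose bandwidth is chosen by unbiased (least-squares) cross-validation. It uses the textbook criterion and a plain grid search over candidate bandwidths; the software and settings actually used for the figure are not stated in the source, and the names `ucv_score`, `ucv_bandwidth`, and `gaussian_kde` are hypothetical.

```python
import math


def ucv_score(x, h):
    """Unbiased cross-validation criterion for a Gaussian kernel:
    integral of the squared estimate minus twice the mean leave-one-out density."""
    n = len(x)
    conv_sum = 0.0  # sum over all pairs of exp(-u^2 / 4), from the N(0, 2) kernel
    kern_sum = 0.0  # sum over pairs with i != j of exp(-u^2 / 2)
    for i in range(n):
        for j in range(n):
            u = (x[i] - x[j]) / h
            conv_sum += math.exp(-u * u / 4)
            if i != j:
                kern_sum += math.exp(-u * u / 2)
    int_f2 = conv_sum / (2 * math.sqrt(math.pi) * n * n * h)
    loo = kern_sum / (math.sqrt(2 * math.pi) * n * (n - 1) * h)
    return int_f2 - 2 * loo


def ucv_bandwidth(x, grid):
    """Return the candidate bandwidth in grid that minimises the criterion."""
    return min(grid, key=lambda h: ucv_score(x, h))


def gaussian_kde(x, h, t):
    """Gaussian kernel density estimate at point t with bandwidth h."""
    total = sum(math.exp(-((t - xi) / h) ** 2 / 2) for xi in x)
    return total / (len(x) * h * math.sqrt(2 * math.pi))
```

In the setting of the figure, `x` would hold IS coordinates on one chromosome for one data set, and the estimate would be evaluated on a grid of positions along that chromosome. The pairwise loop is quadratic in the number of IS, so this form is only practical for small samples.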
Fig. 4 (a) Length distributions of clusters identified by the methods of Ambrosi et al. and the relative scan statistics algorithm in the MLV and HIV data sets. (b) Size distributions of clusters identified by the methods of Ambrosi et al. and the relative scan statistics algorithm in the MLV and HIV data sets
List of relative clusters identified by relative scan statistics
| S | Chr | Start | End | HIV IS | MLV IS | | Type | Adj p-value |
|---|---|---|---|---|---|---|---|---|
| 474.1 | chr11 | 63153734 | 68347426 | 659 | 129 | 1.91 | hiv | <2E-16 |
| 450.9 | chr6 | 30095760 | 33488528 | 332 | 7 | 4.49 | hiv | <2E-16 |
| 434.2 | chr16 | 95090 | 3561021 | 430 | 41 | 2.74 | hiv | <2E-16 |
| 260.9 | chr17 | 70835415 | 73732441 | 372 | 75 | 1.86 | hiv | <2E-16 |
| 227.0 | chr3 | 47041751 | 52978572 | 422 | 119 | 1.47 | hiv | <2E-16 |
| 219.4 | chr9 | 134493480 | 139818935 | 307 | 60 | 1.89 | hiv | <2E-16 |
| 213.5 | chr17 | 77047796 | 77746204 | 172 | 7 | 3.70 | hiv | <2E-16 |
| 191.9 | chr8 | 144548769 | 146194757 | 182 | 15 | 2.89 | hiv | <2E-16 |
| 122.0 | chr19 | 1027304 | 6006371 | 292 | 104 | 1.20 | hiv | <2E-16 |
| 115.4 | chr22 | 48983597 | 49573459 | 115 | 11 | 2.71 | hiv | <2E-16 |
| 105.6 | chr21 | 37559632 | 39311896 | 9 | 126 | -3.02 | mlv | <2E-16 |
| 102.1 | chr19 | 54074745 | 55048471 | 122 | 18 | 2.21 | hiv | <2E-16 |
| 99.3 | chr17 | 1069411 | 4213267 | 229 | 79 | 1.23 | hiv | <2E-16 |
| 96.4 | chr1 | 153550587 | 154168170 | 90 | 7 | 2.94 | hiv | <2E-16 |
| 91.8 | chr18 | 70832211 | 73059134 | 6 | 103 | -3.26 | mlv | <2E-16 |
| 91.5 | chr17 | 4573721 | 7723628 | 194 | 62 | 1.32 | hiv | <2E-16 |
| 86.5 | chr20 | 49745347 | 52129713 | 7 | 102 | -3.07 | mlv | <2E-16 |
| 86.0 | chr12 | 11729500 | 14430150 | 8 | 105 | -2.95 | mlv | <2E-16 |
| 83.3 | chr20 | 60901158 | 62379063 | 109 | 19 | 2.02 | hiv | <2E-16 |
| 81.3 | chr6 | 6536008 | 13289623 | 22 | 141 | -2.13 | mlv | <2E-16 |
Fig. 5 HIV and MLV IS distributions on chr 19. HIV and MLV IS distributions on chromosome 19, estimated with a Gaussian kernel and unbiased cross-validation bandwidth selection (blue and red curves, respectively). Comparative hotspots reported in [17] correspond to the segments shown on the third line in red (MLV comparative hotspots) and on the fourth line in blue (HIV comparative hotspots), taking strand annotation into account. The fifth and sixth lines show the relative scan statistics results. The first two significant clusters identified by relative scan statistics that have no corresponding comparative hotspot are highlighted (black box)