Ulrich Abel, Annette Deichmann, Cynthia Bartholomae, Kerstin Schwarzwaelder, Hanno Glimm, Steven Howe, Adrian Thrasher, Alexandrine Garrigue, Salima Hacein-Bey-Abina, Marina Cavazzana-Calvo, Alain Fischer, Dirk Jaeger, Christof von Kalle, Manfred Schmidt.
Abstract
Features such as mutations or structural characteristics can be non-randomly or non-uniformly distributed within a genome. Until now, statistical inference on the distribution of sequence motifs has required computer simulations. Here, we show that these analyses are possible using an analytical, mathematical approach. To assess non-randomness, our calculations require only information such as genome size, the number of (sampled) sequence motifs, and distance parameters. We have developed computer programs that evaluate our analytical formulas for the real-time determination of expected values and p-values. This approach permits a flexible cluster definition that can be tuned to identify non-random or non-uniform sequence motif distributions most effectively. As an example, we show the effectiveness and reliability of our mathematical approach in the analysis of clinical retroviral vector integration site distribution.
Year: 2007 PMID: 17593969 PMCID: PMC1892803 DOI: 10.1371/journal.pone.0000570
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Mean values for random CIS formation (1000 IS) determined either with computer simulations or mathematically.
| Order of CIS | Mean Value Mathematical Formula | Mean Value Computer Simulations |
| 2nd | | 9.75 |
| 3rd | | 0.13 |
| 4th | | 0.01 |
Simulations were performed with 50000 runs each. g, haploid size of the human genome: 3.12 × 10^6 kb; d_n, genomic window size [kb] for CIS of nth order: d_2 = 30, d_3 = 50, and d_4 = 100; n_is, number of (assumed) sampled integration sites: 1000.
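The comparison between a closed-form mean and a simulated mean can be illustrated with a minimal sketch. It is not the authors' formula, which this record does not reproduce. It assumes a single linear genome of length g, integration sites drawn uniformly and independently, and it counts every pair of sites lying within the window d as one 2nd-order CIS. The function names are hypothetical. Under these simplifications the result should land near, but not necessarily exactly on, the tabulated value.

```python
import random


def expected_close_pairs(n_is, g_kb, d_kb):
    """Expected number of site pairs within d_kb of each other.

    Assumption: positions are i.i.d. uniform on [0, g_kb], so for two sites
    P(|X - Y| <= d) = 2d/g - (d/g)^2.
    """
    p_pair = 2 * d_kb / g_kb - (d_kb / g_kb) ** 2
    return n_is * (n_is - 1) / 2 * p_pair


def count_close_pairs(positions, d_kb):
    """Count pairs within d_kb with a sliding window over sorted positions."""
    pos = sorted(positions)
    count = 0
    left = 0
    for right in range(len(pos)):
        # Advance the left edge until it is within d_kb of the right edge.
        while pos[right] - pos[left] > d_kb:
            left += 1
        count += right - left  # every site in [left, right) pairs with `right`
    return count


def simulate_mean(n_is, g_kb, d_kb, runs, seed=0):
    """Monte Carlo mean of the pair count over `runs` random genomes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        sites = [rng.uniform(0, g_kb) for _ in range(n_is)]
        total += count_close_pairs(sites, d_kb)
    return total / runs


if __name__ == "__main__":
    g, d2, n = 3.12e6, 30, 1000  # parameters from the table footnote
    print("closed form:", expected_close_pairs(n, g, d2))
    print("simulated  :", simulate_mean(n, g, d2, runs=200))
```

The closed form is evaluated instantly, whereas the simulated mean needs many runs to stabilise, which is the practical advantage the abstract describes.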
Comparative analysis of mean values and p-values obtained computationally (‘Simulation’) or mathematically (‘Formula’).
| CIS | IS | MV Simulation | MV Formula | p-value Simulation | p-value Formula |
| 3 | 140 | 0.188 | 0.190 | 0.0009 | 0.001 |
| 1 | 134 | 0.175 | 0.174 | 0.16 | 0.16 |
| 4 | 102 | 0.100 | 0.101 | 0 | 3.9 × 10^-6 |
| 15 | 304 | 0.899 | 0.900 | 0 | 6.8 × 10^-14 |
| 102 | 572 | 3.200 | 3.193 | 0 | <10^-16 |
The results refer to the presence of CIS detected in 2 clinical X-SCID gene therapy studies [unpublished data]. Simulations were performed with 50000 runs on the haploid size of the human genome (3.12 × 10^6 kb). P-values estimated from simulations equal the proportion of the 50000 runs in which the number of CIS was at least as high as the number observed in the trials. The genomic window size chosen for CIS of 2nd order was 30 kb. CIS, number of identified CIS of 2nd order in patient and control samples pre- and post-transplant; IS, number of all unique identified integration sites in patient and control samples pre- and post-transplant; MV, mean value.
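The simulation-based p-value described in the footnote, and the reason it bottoms out at 0 for rare events, can be sketched as follows. The empirical estimator follows the footnote's definition. The Poisson tail is an assumption introduced here for illustration, since this record does not state which distribution the authors' formulas use. Function names are hypothetical.

```python
import math


def empirical_p_value(simulated_counts, observed):
    """Proportion of simulation runs with at least `observed` CIS."""
    hits = sum(1 for c in simulated_counts if c >= observed)
    return hits / len(simulated_counts)


def poisson_tail(observed, mean, max_terms=200):
    """P(X >= observed) for X ~ Poisson(mean), with mean > 0.

    Assumption: the CIS count is treated as Poisson distributed. The upper
    tail is summed directly (rather than as 1 - CDF) so that very small
    p-values are not lost to floating-point cancellation.
    """
    if observed <= 0:
        return 1.0
    # First term: exp(-mean) * mean^observed / observed!, computed in log space.
    term = math.exp(-mean + observed * math.log(mean) - math.lgamma(observed + 1))
    total = 0.0
    k = observed
    for _ in range(max_terms):
        total += term
        k += 1
        term *= mean / k  # ratio of consecutive Poisson probabilities
    return total


if __name__ == "__main__":
    # (observed CIS, formula mean value) pairs taken from the table rows
    for cis, mv in [(3, 0.190), (1, 0.174), (4, 0.101), (15, 0.900), (102, 3.193)]:
        print(cis, mv, poisson_tail(cis, mv))
```

With 50000 runs, the empirical estimator cannot resolve p-values much below 1/50000 = 2 × 10^-5, which is consistent with the simulation column reporting 0 where the formula column reports very small but non-zero values. With the rounded means from the table, the Poisson tail lands close to the tabulated formula p-values; this is a plausibility check under the stated assumption, not confirmation of the authors' method.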
Formula-based statistical analysis of the results on CIS formation in clinical samples derived from 2 clinical X-SCID gene therapy studies [unpublished data].
| CIS | IS | MV Uniform* | MV Triangular§ | p-value Uniform* | p-value Triangular§ |
| 3 | 140 | 0.191 | 0.212 | 0.001 | 0.0014 |
| 1 | 134 | 0.175 | 0.195 | 0.161 | 0.177 |
| 4 | 102 | 0.101 | 0.124 | 4.0 × 10^-6 | 6.1 × 10^-6 |
| 15 | 304 | 0.905 | 1.006 | 7.4 × 10^-14 | 3.3 × 10^-13 |
| 102 | 572 | 3.212 | 3.568 | <10^-16 | <10^-16 |
Calculations were performed on the haploid size of the human genome (3.12 × 10^6 kb), assuming that 25% of all IS are skewed to the ±5 kb region around transcription start sites (TSS), within which either a uniform (*) or a triangular (§) IS distribution was assumed. The remaining 75% of IS were assumed to be uniformly distributed over the rest of the human genome. The genomic window size chosen for CIS of 2nd order was 30 kb. CIS, number of identified CIS of 2nd order in patient and control samples pre- and post-transplant; IS, number of all unique identified integration sites in patient and control samples pre- and post-transplant; MV, mean value.
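The two skewing models in the footnote can be made concrete with a small sampling sketch. It is an illustration only: the authors evaluate formulas rather than sample, the TSS coordinates passed in are hypothetical placeholders, and for simplicity the background 75% is drawn from the whole genome instead of excluding the TSS windows.

```python
import random


def sample_is(n_is, g_kb, tss_kb, shape="uniform",
              frac_tss=0.25, half_width_kb=5.0, seed=0):
    """Draw integration-site positions (kb) from a two-component mixture.

    With probability `frac_tss` a site falls within +/- half_width_kb of a
    randomly chosen TSS, with a uniform or triangular (peak at the TSS)
    offset; otherwise it is uniform over [0, g_kb].
    `tss_kb` is a hypothetical list of TSS coordinates supplied by the caller.
    """
    rng = random.Random(seed)
    positions = []
    for _ in range(n_is):
        if rng.random() < frac_tss:
            center = rng.choice(tss_kb)
            if shape == "triangular":
                offset = rng.triangular(-half_width_kb, half_width_kb, 0.0)
            else:
                offset = rng.uniform(-half_width_kb, half_width_kb)
            positions.append(center + offset)
        else:
            positions.append(rng.uniform(0, g_kb))
    return positions


if __name__ == "__main__":
    # Placeholder TSS grid: one TSS every 1000 kb (not real annotation).
    tss = [i * 1000.0 for i in range(1, 3000)]
    sites = sample_is(572, 3.12e6, tss, shape="triangular")
    print(len(sites), min(sites), max(sites))
```

The triangular option concentrates sites closer to each TSS than the uniform option does, which is the direction in which the tabulated mean values shift between the two columns.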