
Real-time definition of non-randomness in the distribution of genomic events.

Ulrich Abel, Annette Deichmann, Cynthia Bartholomae, Kerstin Schwarzwaelder, Hanno Glimm, Steven Howe, Adrian Thrasher, Alexandrine Garrigue, Salima Hacein-Bey-Abina, Marina Cavazzana-Calvo, Alain Fischer, Dirk Jaeger, Christof von Kalle, Manfred Schmidt.

Abstract

Features such as mutations or structural characteristics can be non-randomly or non-uniformly distributed within a genome. So far, computer simulations have been required for statistical inferences on the distribution of sequence motifs. Here, we show that these analyses are possible using an analytical, mathematical approach. For the assessment of non-randomness, our calculations only require genome size, number of (sampled) sequence motifs and distance parameters. We have developed computer programs that evaluate our analytical formulas for the real-time determination of expected values and p-values. This approach permits a flexible cluster definition that can be applied to identify non-random or non-uniform sequence motif distribution most effectively. As an example, we show the effectiveness and reliability of our mathematical approach in the analysis of clinical retroviral vector integration site distribution.


Year: 2007 PMID: 17593969 PMCID: PMC1892803 DOI: 10.1371/journal.pone.0000570

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

With the sequences of complete genomes available [1]–[4] and accelerating technologies for high-throughput sequencing [5], genome-wide sequence analyses of individual samples will soon become reality. Comparative analyses of sequence composition and sequence motif distribution have become central parts of genome and transcriptome research, providing new insights into evolution, physiology and medical diagnosis [6]–[15]. Our understanding of integrating viruses and related vectors in gene therapy trials is an interesting example of such approaches. Since the completion of the human and murine genome sequencing projects, the location of the vector in the cellular genome can be defined precisely, allowing the determination of possible vector integration-induced effects on the surrounding genomic DNA regions at the molecular level. Integration site analyses have gained increasing interest with the dramatic development of a retroviral vector-induced lymphoproliferative disease in 3 patients cured of X-linked severe combined immunodeficiency (X-SCID) that was triggered by insertional activation of the proto-oncogene LMO2 [16], [17]. Meanwhile, insertion-induced side effects have been identified ranging from immortalization [18] to clonal dominance [19]–[22] and even oncogenesis [23]–[25] in a variety of gene therapy studies. These studies have in common that a clustering of integration sites (IS) in certain genomic loci was detectable, and likely provided a selective advantage for the affected cell clone. The clustering of integrations, termed common integration sites (CIS), as an indicator for clone selection has already been used in concerted retrovirus insertional mutagenesis studies that aimed to identify new cancer genes by determining the gene configuration near frequently affected integration site loci [26]–[28]. For CIS determination, computer simulations were performed to assess non-randomness of IS distribution in tumors [28].
To validate the correctness of our mathematical approach defining non-randomness and non-uniform sequence motif distribution, we analyzed the IS distribution and presence of CIS in 2 successful clinical SCID-X1 studies ([29], [30], unpublished data). We considered 2, 3 or 4 insertions as CIS of 2nd, 3rd or 4th order if they fell within a 30 kb, 50 kb or 100 kb window of genomic sequence from each other, respectively. Simultaneously, we performed computer simulations written in the open-source ‘R’ language (http://cran.r-project.org), in which a window of size d (d = the maximum distance defining a CIS of order n) was shifted through the ordered sequence of the IS. For each window W(j) = [IS(j), IS(j)+d] it was then counted how many CIS of order n including IS(j) as first element were contained in W(j). We show that the results of our mathematical approach for defining biased IS distribution are comparable to the output of computational simulations. The approach may have performance advantages when large quantities of individual analyses have to be processed. Even if the null hypothesis of random uniform allocation is not adequate, as is known for retroviral vector integration [31], our calculations can address segments of the genome located between sites of predilection for virus integration and can be extended to address non-uniform sequence motif distributions.
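The window-shifting count described above can be sketched in a few lines. The authors' simulations were written in R and are not reproduced in the text, so the following Python version is only an illustration under assumptions: the function names `count_cis` and `simulated_mean`, the toy positions, and the use of integer kb positions drawn with a seeded generator are all hypothetical choices.

```python
import random
from bisect import bisect_right
from math import comb


def count_cis(positions, d, n):
    """Count CIS of order n: n-tuples of IS whose span is at most d (in kb)."""
    ordered = sorted(positions)
    total = 0
    for j, start in enumerate(ordered):
        # Index just past the last IS inside the window W(j) = [IS(j), IS(j) + d].
        end = bisect_right(ordered, start + d)
        followers = end - j - 1  # IS in W(j) other than IS(j) itself
        # Every choice of n - 1 followers forms one CIS with IS(j) as first element.
        total += comb(followers, n - 1)
    return total


def simulated_mean(n_is, g, d, n, runs, seed=0):
    """Mean number of CIS over `runs` random uniform allocations of n_is IS."""
    rng = random.Random(seed)  # hypothetical choice: seeded for repeatability
    total = 0
    for _ in range(runs):
        sites = [rng.randrange(g) for _ in range(n_is)]
        total += count_cis(sites, d, n)
    return total / runs


if __name__ == "__main__":
    toy_sites = [0, 10, 20, 100]  # hypothetical positions in kb
    print(count_cis(toy_sites, d=30, n=2))
    print(count_cis(toy_sites, d=30, n=3))
    # Far fewer runs than the 50000 used in the paper, to keep the sketch fast.
    print(simulated_mean(n_is=1000, g=3_120_000, d=30, n=2, runs=200))
```

Counting each CIS only in the window of its first element avoids double counting, which is why the window is anchored at IS(j) and not centered on it.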

Results and Discussion

Part 1: Random uniform allocation of IS

For the purpose of this discussion, the unit of observation (location and distance) is the kilobase pair (kb). We assume that a number $n_{is}$ of IS is randomly allocated (with a uniform distribution) to the locations of a genome consisting of $g$ kb. A CIS of order $n$ is an $n$-tuple of IS such that the maximum distance between the lowest and highest position is no greater than a fixed bound.

Further terminology

$d_n$: “size” or distance of a CIS of order $n$, i.e. maximum permissible distance between any two members of a CIS of order $n$
$P_n$: probability that a given (sub)set of $n$ IS that are randomly allocated form a CIS of order $n$
$P(m,d)$: probability that a given subset of $m$ randomly allocated IS has a span (= maximum distance between any two elements) of exactly $d$
$E_n$: expected value of the number of CIS of order $n$

We start with the elementary observation that $E_n$ equals $P_n$ times the number of subsets of IS consisting of $n$ elements:

$$E_n = \binom{n_{is}}{n} P_n \qquad (1)$$

Clearly,

$$P_n = \sum_{d=0}^{d_n} P(n,d) \qquad (2)$$

It remains to determine $P(n,d)$. First note that $P(1,d) = 0$ for $d>0$. Furthermore, for all $m \ge 1$:

$$P(m,0) = \frac{1}{g^{m-1}} \qquad (3)$$

A recursive formula for $P(m,d)$, $d>0$, can be derived by breaking down the potential CIS of order $m$ into subsets of $m-1$ elements having a span of $d' \le d$, to which an $m$-th IS is added such that the maximum span is exactly $d$:

$$P(m,d) = \frac{d+1}{g}\, P(m-1,d) + \frac{2}{g} \sum_{d'=0}^{d-1} P(m-1,d') + r \qquad (4)$$

where $r$ is a negligible correction term that arises because the uncorrected recursion formula is strictly valid only for subsets of IS that have a distance $\ge d$ from the telomeres. By mounting the recursive ladder ($m = 1,\dots,n$), these formulas successively yield $P(n,d)$, $P_n$, and $E_n$. In particular, one easily obtains ($d>0$):

$$P(2,d) = \frac{2}{g}, \qquad P(3,d) = \frac{6d}{g^{2}}, \qquad P(4,d) = \frac{12d^{2}+2}{g^{3}}$$

Plugging this into equations (2) and (1) yields for the expected value $E_n$:

$$E_2 = \binom{n_{is}}{2} \frac{2d_2+1}{g}, \qquad E_3 = \binom{n_{is}}{3} \frac{3d_3^{2}+3d_3+1}{g^{2}}, \qquad E_4 = \binom{n_{is}}{4} \frac{4d_4^{3}+6d_4^{2}+4d_4+1}{g^{3}}$$

As shown in Table 1, our mathematical approximation corresponds extremely well to the mean values found in 50000 simulation runs.
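As a numerical check on the recursive ladder, the sketch below evaluates the recursion with the correction term r dropped and compares it with a closed form. Two things here are assumptions and not statements from the paper: the recursion is written as "the new IS either falls into an existing span of exactly d (d + 1 locations) or extends a shorter span at one of its two ends", and the closed form ((d + 1)^n - d^n) / g^(n - 1) is a shortcut obtained by counting n-tuples with a given lowest position. The function names are hypothetical.

```python
from math import comb


def span_probabilities(n, d_max, g):
    """Return [P(n, 0), ..., P(n, d_max)] from the recursion, ignoring the term r."""
    probs = [1.0] + [0.0] * d_max  # m = 1: a single IS always has span 0
    for _ in range(2, n + 1):
        updated = []
        below = 0.0  # running sum of P(m-1, d') over d' < d
        for d in range(d_max + 1):
            # Existing span of exactly d: d + 1 admissible locations.
            # Shorter span: exactly one location at each of the two ends.
            updated.append((d + 1) / g * probs[d] + 2 / g * below)
            below += probs[d]
        probs = updated
    return probs


def expected_cis_recursive(n_is, n, d_n, g):
    """E_n as (number of n-subsets) times P_n, with P_n summed from the recursion."""
    return comb(n_is, n) * sum(span_probabilities(n, d_n, g))


def expected_cis_closed_form(n_is, n, d_n, g):
    """E_n from the counting shortcut ((d + 1)^n - d^n) / g^(n - 1)."""
    return comb(n_is, n) * ((d_n + 1) ** n - d_n ** n) / g ** (n - 1)


if __name__ == "__main__":
    G_KB = 3_120_000  # haploid human genome size in kb, as used in the paper
    N_IS = 1000       # number of sampled IS, as in Table 1
    for order, window in ((2, 30), (3, 50), (4, 100)):
        print(order,
              expected_cis_recursive(N_IS, order, window, G_KB),
              expected_cis_closed_form(N_IS, order, window, G_KB))
```

For the Table 1 parameters both routes give about 9.77, 0.13 and 0.006 for orders 2, 3 and 4, which is in line with the simulated means reported there.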

Table 1

Mean values for random CIS formation (1000 IS) determined either with computer simulations or mathematically.

Order of CIS	Mean value (mathematical formula)	Mean value (computer simulations)
2nd		9.75
3rd		0.13
4th		0.01

Simulations were performed with 50000 runs each. $g$, haploid size of the human genome: 3.12 × 10⁶ kb; $d_n$, genomic window size [kb] for CIS of nth order: $d_2 = 30$, $d_3 = 50$, and $d_4 = 100$; $n_{is}$, number of (assumed) sampled integration sites: 1000.

Statistical inferences, such as the calculation of p-values, can be based on the observation that, under the null hypothesis ($H_0$) of random uniform allocation of the IS, the number of CIS of order $n$ is (approximately) Poisson distributed with parameter $\lambda = E_n$. Thus, if the random variable $X$ denotes the number of CIS of order $n$, and $X = k$ is observed in a trial, then the p-value $P(X \ge k)$ of this observation calculated under $H_0$, i.e. from the Poisson distribution $Po(E)$, is given by

$$P(X \ge k) = P(\chi \le 2E)$$

where the random variable $\chi$ has a chi-square distribution with $2k$ degrees of freedom [32], [33]. The Poisson approximation to the true random distribution of CIS is exceedingly close. In fact, if the number of simulation runs is sufficiently high, the simulated distribution is virtually indistinguishable from $Po(E)$. In particular, both the expected values and the p-values derived from $Po(E)$ are nearly identical to those obtained in computer simulations. The latter point is apparent from Table 2, where, as a final proof of principle of our mathematical calculations, results of the analysis of our integration data set retrieved from two clinical SCID-X1 therapy trials [unpublished data] are given.

Table 2

Comparative analysis of mean values and p-values obtained computationally (‘Simulation’) or mathematically (‘Formula’).

CIS	IS	MV Simulation	MV Formula	p-Value Simulation	p-Value Formula
3	140	0.188	0.190	0.0009	0.001
1	134	0.175	0.174	0.16	0.16
4	102	0.100	0.101	0	3.9×10⁻⁶
15	304	0.899	0.900	0	6.8×10⁻¹⁴
102	572	3.200	3.193	0	<10⁻¹⁶

The results refer to the presence of CIS detected in 2 clinical X-SCID gene therapy studies [unpublished data]. Simulations were performed with 50000 runs on the haploid size of the human genome (3.12 × 10⁶ kb). P-values estimated from simulations equal the proportion per 50000 runs in which the number of CIS was at least as high as the number observed in the trials. The genomic window size chosen for CIS of 2nd order was 30 kb. CIS, number of identified CIS of 2nd order in patient and control samples pre- and post-transplant; IS, number of all unique identified integration sites in patient and control samples pre- and post-transplant; MV, mean value.

The p-value can be calculated by means of either of the following commands (‘R’ code): 1 - ppois(lambda = E, q = k - 1) or pchisq(df = 2*k, q = 2*E). Using the data of Table 2 (first line): 1 - ppois(lambda = 0.19, q = 2) or pchisq(df = 6, q = 0.38). In both instances, the result is 0.00099. Alternatively, the table of the chi-square distribution with 6 degrees of freedom can be used to look up the probability P(χ ≤ 0.38). One should note that, for low E, the p-value of a single observed CIS is virtually identical to E. This implies that, for n>5, no p-values need to be calculated (and hence no formulas are required for E, n>5), because even with an extremely liberal definition of the CIS ($d_5 = 500$) and a fairly high number of IS ($n_{is} = 1000$) a single CIS of order 5 will be statistically significant (p = 0.027).
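For readers without R at hand, the same Poisson tail probability can be sketched in Python using only the standard library. This is a stand-in for the R one-liners above, not the authors' program; the function name `poisson_upper_tail` and the stopping tolerance are hypothetical. The tail is summed directly from i = k upward instead of being computed as 1 minus the lower tail, an implementation choice made here so that very small p-values are not lost to rounding.

```python
from math import exp, lgamma, log


def poisson_upper_tail(k, lam, rel_tol=1e-15):
    """P(X >= k) for X ~ Poisson(lam) with lam > 0, summed from i = k upward."""
    if k <= 0:
        return 1.0
    # Log of the first term exp(-lam) * lam^k / k!, to avoid overflow for large k.
    term = exp(-lam + k * log(lam) - lgamma(k + 1))
    total = 0.0
    i = k
    while term > rel_tol * total:
        total += term
        i += 1
        term *= lam / i  # ratio of consecutive Poisson probabilities
    return total


if __name__ == "__main__":
    # (observed CIS, expected value) pairs taken from the 'MV Formula' column of Table 2.
    for observed, expected in ((3, 0.190), (1, 0.174), (4, 0.101), (15, 0.900), (102, 3.193)):
        print(observed, expected, poisson_upper_tail(observed, expected))
```

For a single observed CIS the function returns 1 - exp(-E), which for small E is close to E itself, matching the remark above.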

Part 2: Non-uniform allocation of IS

Defining non-randomness in the clustering of genomic events often requires additional precautions, as sequence structures of interest may already have known specific distribution biases. In the case of our clinical example (unpublished data), it is known that retroviral vectors based on the murine leukaemia virus (MLV) tend to integrate into gene coding regions, preferentially near the transcriptional start site (TSS) [34]–[36]. It is also proposed that additional, mostly unknown factors may influence the accessibility of certain genomic DNA regions to vectors [37]. Thus, the null hypothesis of random uniform allocation of MLV IS may not be adequate according to the current ‘state of the art’, as has recently been argued [31]. In line with this study, we partitioned the genome into 2 areas that differ in the likelihood of being targeted by vectors.

Further terminology

$n_{TSS}$: number of TSS
T5: an interval of +/-5 kb around a TSS
GT5: union of all T5
$n_{is,GT5}$, $n_{is,Comp}$: number of IS occurring in GT5 and in the complement of GT5, respectively
$n_{cis,GT5}$, $n_{cis,Mix}$, $n_{cis,Comp}$: number of CIS occurring in GT5, both in GT5 and in the complement of GT5, and in the complement of GT5 only, respectively

Clearly, the expected value $E_n$ of the number of CIS of order $n$ is given by the following sum:

$$E_n = E(n_{cis,GT5}) + E(n_{cis,Mix}) + E(n_{cis,Comp}) \qquad (5)$$

In the following it will be shown how to calculate the terms on the right side of (5). We start with the expected value of $n_{cis,GT5}$, for which we assume that vector integration into any T5 occurs with the same probability. Then

$$E(n_{cis,GT5}) = n_{TSS}\, E(X) \qquad (6)$$

where $X$ is the number of CIS (among those occurring in GT5) that occur in a fixed T5.
Observing that $i$ IS in a fixed T5 yield $\binom{i}{n}$ CIS of order $n$ in this T5, one easily obtains the expected value of $X$:

$$E(X) = \sum_{i=n}^{n_{is,GT5}} \binom{i}{n}\, P(i) \qquad (7)$$

where $P(i)$ denotes the probability that exactly $i$ IS fall into the fixed T5. Since the number of IS in a fixed T5 is binomially distributed as $\sim B(n_{is,GT5}, 1/n_{TSS})$,

$$P(i) = \binom{n_{is,GT5}}{i} \left(\frac{1}{n_{TSS}}\right)^{i} \left(\frac{n_{TSS}-1}{n_{TSS}}\right)^{n_{is,GT5}-i} \qquad (8)$$

Merging equations (6)–(8) yields the desired formula for $E(n_{cis,GT5})$:

$$E(n_{cis,GT5}) = n_{TSS} \sum_{i=n}^{n_{is,GT5}} \binom{i}{n} \binom{n_{is,GT5}}{i} \left(\frac{1}{n_{TSS}}\right)^{i} \left(\frac{n_{TSS}-1}{n_{TSS}}\right)^{n_{is,GT5}-i} \qquad (9)$$

If $n_{is,GT5}$ is small compared to $n_{TSS}$ (undoubtedly, this is mostly the case), terms of higher order can be neglected so that, because $(n_{TSS}-1)/n_{TSS} \approx 1$, formula (9) simplifies to

$$E(n_{cis,GT5}) \approx \binom{n_{is,GT5}}{n} \frac{1}{n_{TSS}^{\,n-1}} \qquad (10)$$

Notice that formulas (6)–(10) do not depend on the spatial distribution of the IS within the T5. (It is unnecessary to account for the closeness of IS within T5 because any pair – or triple, quadruple etc., for that matter – of IS within a T5 yields a CIS.)

Clearly, the expected value $E(n_{cis,Mix})$ is not independent of the distance between the IS and the TSS. Thus, inevitably, assumptions regarding the spatial distribution of the IS will influence its value. In what follows, a formula for $E(n_{cis,Mix})$ shall be derived for the case $n = 2$. As before, CIS of order 2 are defined by a maximum distance $d_2$ of 30 kb between the IS. If the TSS are indistinguishable with respect to the probability distribution of the integrations, then

$$E(n_{cis,Mix}) = n_{is,GT5}\; n_{is,Comp}\; n_{TSS}\; p_{Mix} \qquad (11)$$

where $p_{Mix}$ denotes the probability that an arbitrary pair of IS (with one element in GT5 and one element in the complement of GT5) forms a CIS of order 2 around a fixed TSS. We will assume that the distributions of IS within a T5 and within +/-35 kb around a TSS are symmetric. Then, again using kb as unit of distance,

[formula (12)]

In formula (12) the points $x = 0$ and $y = 0$ correspond to the TSS-5; $f(x)$ designates the probability density function of vector integrations in T5; and $g(y)$ designates the corresponding density function in [TSS-35, TSS-5]. Formula (12) shall be evaluated for two special cases:

Case 1: Vector integrations are uniformly distributed in GT5 and in the complement of GT5, respectively.
That is, $f(x)$ and $g(y)$ are constant: [formula]. Solving the integrals in formula (12) we have [formula].

Case 2: As above, vector integrations in the complement of GT5 are assumed to be uniformly distributed. However, a triangular distribution is assumed for $f(x)$. The corresponding formula is easily calculated: [formula]. By plugging this into (12) we get [formula].

It may be surprising that a triangular distribution in T5 results in a higher expected value for $n_{cis,Mix}$ than a uniform distribution. However, this becomes more plausible if one notes that a higher value is also obtained if the IS are concentrated in an extreme manner within the T5, viz. in a one-point distribution with total mass in the TSS. In this special case (which is particularly easy to evaluate), $p_{Mix} = 50/(n_{TSS}(g - n_{TSS}))$.

If, with respect to the formation of CIS, the complement of GT5 could be regarded as a continuum, the expected value of $n_{cis,Comp}$ would be given by the formulas developed in Part 1 of this contribution. In the case of retroviral (MLV) vectors, however, the complement of GT5 has rather to be viewed as a partitioned set consisting of approximately $n_{TSS}$ disjoint intervals. It follows that the residual term on the right-hand side of equation (4) (Part 1) may no longer be negligible. Note, however, that the assumption of a continuum clearly tends to lead to an overestimation of the number of CIS, because the boundaries of the components reduce the number of CIS occurring in their neighborhood. It follows that the formulas derived in Part 1 form an upper bound for $E(n_{cis,Comp})$. In particular, the true p-values are less than or equal to the values calculated by means of the formulas derived in Part 1. Therefore, any positive statements regarding statistical significance remain valid. Moreover, the overestimation is probably fairly small given that the sections of the complement of GT5 located between the TSS are mostly rather wide compared to the length defining a CIS.
Indeed, the null hypothesis of non-uniform allocation for IS distribution does not substantially change the results we have obtained based on the hypothesis of a random uniform allocation for CIS formation in our clinical samples (Table 2), as is shown in Table 3.
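The GT5 term of the sum can be illustrated numerically. The sketch below compares the binomial sum of formula (9) with the simplified formula (10), both as read from the derivation above: n_TSS times the binomial expectation of "i choose n", and "n_is,GT5 choose n" divided by n_TSS^(n-1). The function names are hypothetical, and the value n_TSS = 25000 is purely an assumed figure for illustration because the text does not state the number of TSS used. The 35 IS correspond to 25% of the 140 IS in the first row of Table 3.

```python
from math import comb


def expected_cis_gt5_exact(n_is_gt5, n_tss, n):
    """Binomial sum: n_TSS * E[C(i, n)] with i ~ B(n_is_gt5, 1 / n_tss)."""
    p = 1.0 / n_tss
    total = 0.0
    for i in range(n, n_is_gt5 + 1):
        # Probability that exactly i of the GT5 integrations hit one fixed T5.
        prob_i = comb(n_is_gt5, i) * p ** i * (1.0 - p) ** (n_is_gt5 - i)
        total += comb(i, n) * prob_i
    return n_tss * total


def expected_cis_gt5_approx(n_is_gt5, n_tss, n):
    """Simplified form: C(n_is_gt5, n) / n_tss^(n - 1)."""
    return comb(n_is_gt5, n) / n_tss ** (n - 1)


if __name__ == "__main__":
    N_TSS = 25_000   # assumed number of TSS, not taken from the paper
    N_IS_GT5 = 35    # 25% of 140 IS, following the skew assumed for Table 3
    print(expected_cis_gt5_exact(N_IS_GT5, N_TSS, 2))
    print(expected_cis_gt5_approx(N_IS_GT5, N_TSS, 2))
```

Under these assumptions the two values agree to within floating-point rounding, so in practice the simplified form should be sufficient.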

Table 3

Formula-based statistical analysis of the results on CIS formation in clinical samples derived from 2 clinical X-SCID gene therapy studies [unpublished data].

CIS	IS	MV Uniform*	MV Triangular§	p-Value Uniform*	p-Value Triangular§
3	140	0.191	0.212	0.001	0.0014
1	134	0.175	0.195	0.161	0.177
4	102	0.101	0.124	4.0 × 10⁻⁶	6.1 × 10⁻⁶
15	304	0.905	1.006	7.4 × 10⁻¹⁴	3.3 × 10⁻¹³
102	572	3.212	3.568	<10⁻¹⁶	<10⁻¹⁶

Calculations were performed on the haploid size of the human genome (3.12 × 10⁶ kb) and on the basis of an IS skewing (25% of all IS) to the +/− 5 kb TSS region, for which a (*) uniform or a (§) triangular IS distribution, respectively, was assumed. 75% of IS were assumed to be uniformly distributed over the remaining human genome. The genomic window size chosen for CIS of 2nd order was 30 kb. CIS, number of identified CIS of 2nd order in patient and control samples pre- and post-transplant; IS, number of all unique identified integration sites in patient and control samples pre- and post-transplant; MV, mean value.

Our mathematical formulas allow a reliable, straightforward calculation of non-randomness in CIS and other genomic event distributions under the null hypothesis of uniform and non-uniform allocation. Using formula-based workspaces (available on request), expected values and p-values can be calculated with ease in real time. They may be preferable to computer simulations when (routine) high-speed processing of large quantities of analyses is needed. Our approach enables a closely problem-oriented, highly exact evaluation of non-randomness that is useful for assessing IS distribution in clinical trials and for assessing the distribution of any sequence motif of interest in a natural or artificial genome.

35 in total

1. Initial sequencing and analysis of the human genome.

Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H 
Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal: Nature Date: 2001-02-15 Impact factor: 49.962

2. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

Authors: A A Camargo; H P Samaia; E Dias-Neto; D F Simão; I A Migotto; M R Briones; F F Costa; M A Nagai; S Verjovski-Almeida; M A Zago; L E Andrade; H Carrer; H F El-Dorry; E M Espreafico; A Habr-Gama; D Giannella-Neto; G H Goldman; A Gruber; C Hackel; E T Kimura; R M Maciel; S K Marie; E A Martins; M P Nobrega; M L Paco-Larson; M I Pardini; G G Pereira; J B Pesquero; V Rodrigues; S R Rogatto; I D da Silva; M C Sogayar; M F Sonati; E H Tajara; S R Valentini; F L Alberto; M E Amaral; I Aneas; L A Arnaldi; A M de Assis; M H Bengtson; N A Bergamo; V Bombonato; M E de Camargo; R A Canevari; D M Carraro; J M Cerutti; M L Correa; R F Correa; M C Costa; C Curcio; P O Hokama; A J Ferreira; G K Furuzawa; T Gushiken; P L Ho; E Kimura; J E Krieger; L C Leite; P Majumder; M Marins; E R Marques; A S Melo; M B Melo; C A Mestriner; E C Miracca; D C Miranda; A L Nascimento; F G Nobrega; E P Ojopi; J R Pandolfi; L G Pessoa; A C Prevedel; P Rahal; C A Rainho; E M Reis; M L Ribeiro; N da Ros; R G de Sa; M M Sales; S C Sant'anna; M L dos Santos; A M da Silva; N P da Silva; W A Silva; R A da Silveira; J F Sousa; D Stecconi; F Tsukumo; V Valente; F Soares; E S Moreira; D N Nunes; R G Correa; H Zalcberg; A F Carvalho; L F Reis; R R Brentani; A J Simpson; S J de Souza; M Melo
Journal: Proc Natl Acad Sci U S A Date: 2001-10-09 Impact factor: 11.205

3. Murine leukemia induced by retroviral gene marking.

Authors: Zhixiong Li; Jochen Düllmann; Bernd Schiedlmeier; Manfred Schmidt; Christof von Kalle; Johann Meyer; Martin Forster; Carol Stocking; Anke Wahlers; Oliver Frank; Wolfram Ostertag; Klaus Kühlcke; Hans-Georg Eckert; Boris Fehse; Christopher Baum
Journal: Science Date: 2002-04-19 Impact factor: 47.728

4. Initial sequencing and comparative analysis of the mouse genome.

Authors: Robert H Waterston; Kerstin Lindblad-Toh; Ewan Birney; Jane Rogers; Josep F Abril; Pankaj Agarwal; Richa Agarwala; Rachel Ainscough; Marina Alexandersson; Peter An; Stylianos E Antonarakis; John Attwood; Robert Baertsch; Jonathon Bailey; Karen Barlow; Stephan Beck; Eric Berry; Bruce Birren; Toby Bloom; Peer Bork; Marc Botcherby; Nicolas Bray; Michael R Brent; Daniel G Brown; Stephen D Brown; Carol Bult; John Burton; Jonathan Butler; Robert D Campbell; Piero Carninci; Simon Cawley; Francesca Chiaromonte; Asif T Chinwalla; Deanna M Church; Michele Clamp; Christopher Clee; Francis S Collins; Lisa L Cook; Richard R Copley; Alan Coulson; Olivier Couronne; James Cuff; Val Curwen; Tim Cutts; Mark Daly; Robert David; Joy Davies; Kimberly D Delehaunty; Justin Deri; Emmanouil T Dermitzakis; Colin Dewey; Nicholas J Dickens; Mark Diekhans; Sheila Dodge; Inna Dubchak; Diane M Dunn; Sean R Eddy; Laura Elnitski; Richard D Emes; Pallavi Eswara; Eduardo Eyras; Adam Felsenfeld; Ginger A Fewell; Paul Flicek; Karen Foley; Wayne N Frankel; Lucinda A Fulton; Robert S Fulton; Terrence S Furey; Diane Gage; Richard A Gibbs; Gustavo Glusman; Sante Gnerre; Nick Goldman; Leo Goodstadt; Darren Grafham; Tina A Graves; Eric D Green; Simon Gregory; Roderic Guigó; Mark Guyer; Ross C Hardison; David Haussler; Yoshihide Hayashizaki; LaDeana W Hillier; Angela Hinrichs; Wratko Hlavina; Timothy Holzer; Fan Hsu; Axin Hua; Tim Hubbard; Adrienne Hunt; Ian Jackson; David B Jaffe; L Steven Johnson; Matthew Jones; Thomas A Jones; Ann Joy; Michael Kamal; Elinor K Karlsson; Donna Karolchik; Arkadiusz Kasprzyk; Jun Kawai; Evan Keibler; Cristyn Kells; W James Kent; Andrew Kirby; Diana L Kolbe; Ian Korf; Raju S Kucherlapati; Edward J Kulbokas; David Kulp; Tom Landers; J P Leger; Steven Leonard; Ivica Letunic; Rosie Levine; Jia Li; Ming Li; Christine Lloyd; Susan Lucas; Bin Ma; Donna R Maglott; Elaine R Mardis; Lucy Matthews; Evan Mauceli; John H Mayer; Megan McCarthy; W Richard McCombie; Stuart McLaren; 
Kirsten McLay; John D McPherson; Jim Meldrim; Beverley Meredith; Jill P Mesirov; Webb Miller; Tracie L Miner; Emmanuel Mongin; Kate T Montgomery; Michael Morgan; Richard Mott; James C Mullikin; Donna M Muzny; William E Nash; Joanne O Nelson; Michael N Nhan; Robert Nicol; Zemin Ning; Chad Nusbaum; Michael J O'Connor; Yasushi Okazaki; Karen Oliver; Emma Overton-Larty; Lior Pachter; Genís Parra; Kymberlie H Pepin; Jane Peterson; Pavel Pevzner; Robert Plumb; Craig S Pohl; Alex Poliakov; Tracy C Ponce; Chris P Ponting; Simon Potter; Michael Quail; Alexandre Reymond; Bruce A Roe; Krishna M Roskin; Edward M Rubin; Alistair G Rust; Ralph Santos; Victor Sapojnikov; Brian Schultz; Jörg Schultz; Matthias S Schwartz; Scott Schwartz; Carol Scott; Steven Seaman; Steve Searle; Ted Sharpe; Andrew Sheridan; Ratna Shownkeen; Sarah Sims; Jonathan B Singer; Guy Slater; Arian Smit; Douglas R Smith; Brian Spencer; Arne Stabenau; Nicole Stange-Thomann; Charles Sugnet; Mikita Suyama; Glenn Tesler; Johanna Thompson; David Torrents; Evanne Trevaskis; John Tromp; Catherine Ucla; Abel Ureta-Vidal; Jade P Vinson; Andrew C Von Niederhausern; Claire M Wade; Melanie Wall; Ryan J Weber; Robert B Weiss; Michael C Wendl; Anthony P West; Kris Wetterstrand; Raymond Wheeler; Simon Whelan; Jamey Wierzbowski; David Willey; Sophie Williams; Richard K Wilson; Eitan Winter; Kim C Worley; Dudley Wyman; Shan Yang; Shiaw-Pyng Yang; Evgeny M Zdobnov; Michael C Zody; Eric S Lander
Journal: Nature Date: 2002-12-05 Impact factor: 49.962

5. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes.

Authors: Kevin C Miranda; Tien Huynh; Yvonne Tay; Yen-Sin Ang; Wai-Leong Tam; Andrew M Thomson; Bing Lim; Isidore Rigoutsos
Journal: Cell Date: 2006-09-22 Impact factor: 41.582

6. The genome sequence of the malaria mosquito Anopheles gambiae.

Authors: Robert A Holt; G Mani Subramanian; Aaron Halpern; Granger G Sutton; Rosane Charlab; Deborah R Nusskern; Patrick Wincker; Andrew G Clark; José M C Ribeiro; Ron Wides; Steven L Salzberg; Brendan Loftus; Mark Yandell; William H Majoros; Douglas B Rusch; Zhongwu Lai; Cheryl L Kraft; Josep F Abril; Veronique Anthouard; Peter Arensburger; Peter W Atkinson; Holly Baden; Veronique de Berardinis; Danita Baldwin; Vladimir Benes; Jim Biedler; Claudia Blass; Randall Bolanos; Didier Boscus; Mary Barnstead; Shuang Cai; Angela Center; Kabir Chaturverdi; George K Christophides; Mathew A Chrystal; Michele Clamp; Anibal Cravchik; Val Curwen; Ali Dana; Art Delcher; Ian Dew; Cheryl A Evans; Michael Flanigan; Anne Grundschober-Freimoser; Lisa Friedli; Zhiping Gu; Ping Guan; Roderic Guigo; Maureen E Hillenmeyer; Susanne L Hladun; James R Hogan; Young S Hong; Jeffrey Hoover; Olivier Jaillon; Zhaoxi Ke; Chinnappa Kodira; Elena Kokoza; Anastasios Koutsos; Ivica Letunic; Alex Levitsky; Yong Liang; Jhy-Jhu Lin; Neil F Lobo; John R Lopez; Joel A Malek; Tina C McIntosh; Stephan Meister; Jason Miller; Clark Mobarry; Emmanuel Mongin; Sean D Murphy; David A O'Brochta; Cynthia Pfannkoch; Rong Qi; Megan A Regier; Karin Remington; Hongguang Shao; Maria V Sharakhova; Cynthia D Sitter; Jyoti Shetty; Thomas J Smith; Renee Strong; Jingtao Sun; Dana Thomasova; Lucas Q Ton; Pantelis Topalis; Zhijian Tu; Maria F Unger; Brian Walenz; Aihui Wang; Jian Wang; Mei Wang; Xuelan Wang; Kerry J Woodford; Jennifer R Wortman; Martin Wu; Alison Yao; Evgeny M Zdobnov; Hongyu Zhang; Qi Zhao; Shaying Zhao; Shiaoping C Zhu; Igor Zhimulev; Mario Coluzzi; Alessandra della Torre; Charles W Roth; Christos Louis; Francis Kalush; Richard J Mural; Eugene W Myers; Mark D Adams; Hamilton O Smith; Samuel Broder; Malcolm J Gardner; Claire M Fraser; Ewan Birney; Peer Bork; Paul T Brey; J Craig Venter; Jean Weissenbach; Fotis C Kafatos; Frank H Collins; Stephen L Hoffman
Journal: Science Date: 2002-10-04 Impact factor: 47.728

7. High-throughput retroviral tagging to identify components of specific signaling pathways in cancer.

Authors: Harald Mikkers; John Allen; Puck Knipscheer; Like Romeijn; Augustinus Hart; Edwin Vink; Anton Berns; Lieke Romeyn
Journal: Nat Genet Date: 2002-08-19 Impact factor: 38.330

8. New genes involved in cancer identified by retroviral tagging.

Authors: Takeshi Suzuki; Haifa Shen; Keiko Akagi; Herbert C Morse; James D Malley; Daniel Q Naiman; Nancy A Jenkins; Neal G Copeland
Journal: Nat Genet Date: 2002-08-19 Impact factor: 38.330

9. Genome-wide retroviral insertional tagging of genes involved in cancer in Cdkn2a-deficient mice.

Authors: Anders H Lund; Geoffrey Turner; Alla Trubetskoy; Els Verhoeven; Ellen Wientjens; Danielle Hulsman; Robert Russell; Ronald A DePinho; Jack Lenz; Maarten van Lohuizen
Journal: Nat Genet Date: 2002-08-19 Impact factor: 38.330

10. The genome sequence of Drosophila melanogaster.

Authors: M D Adams; S E Celniker; R A Holt; C A Evans; J D Gocayne; P G Amanatides; S E Scherer; P W Li; R A Hoskins; R F Galle; R A George; S E Lewis; S Richards; M Ashburner; S N Henderson; G G Sutton; J R Wortman; M D Yandell; Q Zhang; L X Chen; R C Brandon; Y H Rogers; R G Blazej; M Champe; B D Pfeiffer; K H Wan; C Doyle; E G Baxter; G Helt; C R Nelson; G L Gabor; J F Abril; A Agbayani; H J An; C Andrews-Pfannkoch; D Baldwin; R M Ballew; A Basu; J Baxendale; L Bayraktaroglu; E M Beasley; K Y Beeson; P V Benos; B P Berman; D Bhandari; S Bolshakov; D Borkova; M R Botchan; J Bouck; P Brokstein; P Brottier; K C Burtis; D A Busam; H Butler; E Cadieu; A Center; I Chandra; J M Cherry; S Cawley; C Dahlke; L B Davenport; P Davies; B de Pablos; A Delcher; Z Deng; A D Mays; I Dew; S M Dietz; K Dodson; L E Doup; M Downes; S Dugan-Rocha; B C Dunkov; P Dunn; K J Durbin; C C Evangelista; C Ferraz; S Ferriera; W Fleischmann; C Fosler; A E Gabrielian; N S Garg; W M Gelbart; K Glasser; A Glodek; F Gong; J H Gorrell; Z Gu; P Guan; M Harris; N L Harris; D Harvey; T J Heiman; J R Hernandez; J Houck; D Hostin; K A Houston; T J Howland; M H Wei; C Ibegwam; M Jalali; F Kalush; G H Karpen; Z Ke; J A Kennison; K A Ketchum; B E Kimmel; C D Kodira; C Kraft; S Kravitz; D Kulp; Z Lai; P Lasko; Y Lei; A A Levitsky; J Li; Z Li; Y Liang; X Lin; X Liu; B Mattei; T C McIntosh; M P McLeod; D McPherson; G Merkulov; N V Milshina; C Mobarry; J Morris; A Moshrefi; S M Mount; M Moy; B Murphy; L Murphy; D M Muzny; D L Nelson; D R Nelson; K A Nelson; K Nixon; D R Nusskern; J M Pacleb; M Palazzolo; G S Pittman; S Pan; J Pollard; V Puri; M G Reese; K Reinert; K Remington; R D Saunders; F Scheeler; H Shen; B C Shue; I Sidén-Kiamos; M Simpson; M P Skupski; T Smith; E Spier; A C Spradling; M Stapleton; R Strong; E Sun; R Svirskas; C Tector; R Turner; E Venter; A H Wang; X Wang; Z Y Wang; D A Wassarman; G M Weinstock; J Weissenbach; S M Williams; K C Worley; D Wu; S Yang; Q A Yao; J Ye; R F Yeh; J S Zaveri; M 
Zhan; G Zhang; Q Zhao; L Zheng; X H Zheng; F N Zhong; W Zhong; X Zhou; S Zhu; X Zhu; H O Smith; R A Gibbs; E W Myers; G M Rubin; J C Venter
Journal: Science Date: 2000-03-24 Impact factor: 47.728

19 in total

1. A largely random AAV integration profile after LPLD gene therapy.

Authors: Christine Kaeppel; Stuart G Beattie; Raffaele Fronza; Richard van Logtenstein; Florence Salmon; Sabine Schmidt; Stephan Wolf; Ali Nowrouzi; Hanno Glimm; Christof von Kalle; Harald Petry; Daniel Gaudet; Manfred Schmidt
Journal: Nat Med Date: 2013-06-16 Impact factor: 53.440

2. Reduced genotoxicity of avian sarcoma leukosis virus vectors in rhesus long-term repopulating cells compared to standard murine retrovirus vectors.

Authors: Jingqiong Hu; Gabriel Renaud; Theotonius J Gomes; Andrea Ferris; Paul C Hendrie; Robert E Donahue; Stephen H Hughes; Tyra G Wolfsberg; David W Russell; Cynthia E Dunbar
Journal: Mol Ther Date: 2008-06-24 Impact factor: 11.454

3. Integrase-deficient lentiviral vectors mediate efficient gene transfer to human vascular smooth muscle cells with minimal genotoxic risk.

Authors: Helen E Chick; Ali Nowrouzi; Raffaele Fronza; Robert A McDonald; Nicole M Kane; Raul Alba; Christian Delles; William C Sessa; Manfred Schmidt; Adrian J Thrasher; Andrew H Baker
Journal: Hum Gene Ther Date: 2012-10-26 Impact factor: 5.695

4. Vector integration and tumorigenesis.

Authors: Christof von Kalle; Annette Deichmann; Manfred Schmidt
Journal: Hum Gene Ther Date: 2014-06 Impact factor: 5.695

5. Comparing DNA integration site clusters with scan statistics.

Authors: Charles C Berry; Karen E Ocwieja; Nirav Malani; Frederic D Bushman
Journal: Bioinformatics Date: 2014-01-30 Impact factor: 6.937

6. Bioinformatic clonality analysis of next-generation sequencing-derived viral vector integration sites.

Authors: Anne Arens; Jens-Uwe Appelt; Cynthia C Bartholomae; Richard Gabriel; Anna Paruzynski; Derek Gustafson; Nathalie Cartier; Patrick Aubourg; Annette Deichmann; Hanno Glimm; Christof von Kalle; Manfred Schmidt
Journal: Hum Gene Ther Methods Date: 2012-05-04 Impact factor: 2.396

Review 7. Stem cell gene therapy for Fanconi anemia: report from the 1st international Fanconi anemia gene therapy working group meeting.

Authors: Jakub Tolar; Jennifer E Adair; Michael Antoniou; Cynthia C Bartholomae; Pamela S Becker; Bruce R Blazar; Juan Bueren; Thomas Carroll; Marina Cavazzana-Calvo; D Wade Clapp; Robert Dalgleish; Anne Galy; H Bobby Gaspar; Helmut Hanenberg; Christof von Kalle; Hans-Peter Kiem; Dirk Lindeman; Luigi Naldini; Susana Navarro; Raffaele Renella; Paula Rio; Julián Sevilla; Manfred Schmidt; Els Verhoeyen; John E Wagner; David A Williams; Adrian J Thrasher
Journal: Mol Ther Date: 2011-05-03 Impact factor: 11.454

8. Insertion sites in engrafted cells cluster within a limited repertoire of genomic areas after gammaretroviral vector gene therapy.

Authors: Annette Deichmann; Martijn H Brugman; Cynthia C Bartholomae; Kerstin Schwarzwaelder; Monique M A Verstegen; Steven J Howe; Anne Arens; Marion G Ott; Dieter Hoelzer; Reinhard Seger; Manuel Grez; Salima Hacein-Bey-Abina; Marina Cavazzana-Calvo; Alain Fischer; Anna Paruzynski; Richard Gabriel; Hanno Glimm; Ulrich Abel; Claudia Cattoglio; Fulvio Mavilio; Barbara Cassani; Alessandro Aiuti; Cynthia E Dunbar; Christopher Baum; H Bobby Gaspar; Adrian J Thrasher; Christof von Kalle; Manfred Schmidt; Gerard Wagemaker
Journal: Mol Ther Date: 2011-08-23 Impact factor: 11.454

9. Recombinant AAV Integration Is Not Associated With Hepatic Genotoxicity in Nonhuman Primates and Patients.

Authors: Irene Gil-Farina; Raffaele Fronza; Christine Kaeppel; Esperanza Lopez-Franco; Valerie Ferreira; Delia D'Avola; Alberto Benito; Jesus Prieto; Harald Petry; Gloria Gonzalez-Aseguinolaza; Manfred Schmidt
Journal: Mol Ther Date: 2016-03-07 Impact factor: 11.454

10. The MYC, TERT, and ZIC1 genes are common targets of viral integration and transcriptional deregulation in avian leukosis virus subgroup J-induced myeloid leukosis.

Authors: Yuhao Li; Xuemei Liu; Zhen Yang; Chenggang Xu; Di Liu; Jianru Qin; Manman Dai; Jianyong Hao; Min Feng; Xiaorong Huang; Liqiang Tan; Weisheng Cao; Ming Liao
Journal: J Virol Date: 2013-12-26 Impact factor: 5.103