Tracy L Bergemann, Timothy K Starr, Haoyu Yu, Michael Steinbach, Jesse Erdmann, Yun Chen, Robert T Cormier, David A Largaespada, Kevin A T Silverstein.
Abstract
Insertional mutagenesis screens in mice are used to identify individual genes that drive tumor formation. In these screens, candidate cancer genes are identified if their genomic location is proximal to a common insertion site (CIS) defined by high rates of transposon or retroviral insertions in a given genomic window. In this article, we describe a new method for defining CISs based on a Poisson distribution, the Poisson Regression Insertion Model, and show that this new method is an improvement over previously described methods. We also describe a modification of the method that can identify pairs and higher orders of co-occurring common insertion sites. We apply these methods to two data sets, one generated in a transposon-based screen for gastrointestinal tract cancer genes and another based on the set of retroviral insertions in the Retroviral Tagged Cancer Gene Database. We show that the new methods identify more relevant candidate genes and candidate gene pairs than found using previous methods. Identification of the biologically relevant set of mutations that occur in a single cell and cause tumor progression will aid in the rational design of single and combinatorial therapies in the upcoming age of personalized cancer therapy.
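As a rough illustration of the windowed Poisson idea described in the abstract, the sketch below bins insertion positions into fixed-size windows and computes an upper-tail Poisson p-value for each window. It is not the Poisson Regression Insertion Model itself: the function names, the single genome-wide background rate, the non-overlapping aligned windows, and the Bonferroni threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow in lam**k / k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0 and (total == 0.0 or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)

def candidate_windows(positions, genome_length, window_size, alpha=0.05):
    """Return {window_index: (count, p_value)} for windows passing a Bonferroni cut.

    Assumption: one uniform background rate for every window (hypothetical
    simplification; the source model adjusts the rate, e.g. for TA count).
    """
    n_windows = math.ceil(genome_length / window_size)
    lam = len(positions) / n_windows
    counts = Counter(pos // window_size for pos in positions)
    threshold = alpha / n_windows
    hits = {}
    for window, k in counts.items():
        p = poisson_upper_tail(k, lam)
        if p <= threshold:
            hits[window] = (k, p)
    return hits
```

Summing the tail directly, rather than subtracting a cumulative probability from 1, keeps very small p-values from being lost to floating-point cancellation, which matters when the threshold is divided by tens of thousands of windows.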
Year: 2012 PMID: 22241771 PMCID: PMC3351147 DOI: 10.1093/nar/gkr1295
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of windows containing the indicated number of transposon insertions and the subset identified as statistically significant CISs in the GI tumor data set
| Window size (kb) | Total number of windows | Number of windows with indicated number of insertions (number of statistically significant CIS windows) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 120 774 | 106 582 | 12 782 | 1235 | 145 | 21 | 2 | 0 | 1 | 2 | 0 |
| 50 | 48 313 | 35 910 | 9773 | 2080 | 400 | 98 | 33 | 11 | 3 | 1 | 0 |
| 70 | 34 516 | 23 034 | 8319 | 2364 | 565 | 156 | 45 | 20 | 8 | 0 | 1 |
| 100 | 24 161 | 13 703 | 6859 | 2458 | 764 | 247 | 75 | 28 | 13 | 9 | 1 |
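
| Window size (kb) | Number of windows with indicated number of insertions (continued) | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 50 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| 70 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 |
| 100 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
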
(a) Number of statistically significant CISs based on the PRIM are in parentheses.
(b) Total number of windows in genome based on window size.
Figure 1. For various window sizes, a plot of the average rate of insertion for each mouse chromosome using the 15 857 insertions from the Starr et al. (10) data set. Conceptually, the rate parameter reflects the number of insertions per window, adjusting for the TA count. Chromosome 1 was dropped from the plot because for many mice this was where the donor transposon concatamer resided. All insertions that appeared on the same chromosome as their donor concatamer were removed in order to eliminate local-hopping artifacts. The local-hopping phenomenon is explained in more detail in Starr et al. (10).
Number of CCIs in a given genomic window in 2, 3, 4, 5 or 7 tumors from the GI tumor data set
| Window size (kb) | Number of tumors | ||||
|---|---|---|---|---|---|
| | 2 | 3 | 4 | 5 | 7 |
| 20 | 605 | 1 | 0 | 0 | 0 |
| 50 | 2972 | 26 | 2 | 0 | 0 |
| 70 | 5235 | 55 | 1 | 1 | 0 |
| 100 | 9009 | 121 | 6 | 1 | 1 |
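The tallies in a table like this can be reproduced in outline by counting, for every pair of windows, how many tumors carry an insertion in both. The sketch below is a minimal version of that counting step only; it assumes non-overlapping windows aligned to multiples of the window size and a hypothetical input of `(tumor_id, chromosome, position)` tuples, and it performs no significance testing.

```python
from collections import Counter
from itertools import combinations

def cooccurring_window_pairs(insertions, window_size):
    """Count, for each pair of windows, the tumors with insertions in both.

    insertions: iterable of (tumor_id, chromosome, position) tuples
    (hypothetical input layout; chromosome labels must share one type).
    """
    windows_by_tumor = {}
    for tumor, chrom, pos in insertions:
        windows_by_tumor.setdefault(tumor, set()).add((chrom, pos // window_size))
    pair_counts = Counter()
    for windows in windows_by_tumor.values():
        # sorted() gives each unordered pair one canonical orientation
        for pair in combinations(sorted(windows), 2):
            pair_counts[pair] += 1
    return pair_counts

def tally_by_tumor_count(pair_counts, min_tumors=2):
    """Map 'number of tumors' to 'number of window pairs seen in that many'."""
    return Counter(n for n in pair_counts.values() if n >= min_tumors)
```

Each tumor contributes a given window pair at most once, because its windows are stored as a set, so several insertions in the same window of the same tumor are not double counted.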
Three CCIs occurring in GI tumor data set
| Window size (kb) | Locus A of CCI | | | Locus B of CCI | | | Number of tumors |
|---|---|---|---|---|---|---|---|
| | Chr | Start address | Gene name | Chr | Start address | Gene name | |
| 70 | 10 | 122 140 001 | Ppm1h | 15 | 42 970 001 | Rspo2 | 4 |
| 100 | 5 | 148 300 001 | Pan3 | 11 | 86 500 001 | Cltc | 4 |
| 100 | 18 | 34 300 001 | Apc | 18 | 34 400 001 | Apc | 7 |
(a) Locus A and Locus B are the pair of loci composing the CCI.
(b) When multiple window sizes find the same CCI, the largest window size is reported.
(c) Physical address of start of CCI based on NCBIM37 genome build.
(d) Candidate gene in locus.
Chr = Chromosome.
Number of co-occurring insertions and the subset identified as statistically significant CCIs in the RTCGD
| Window size (kb) | Number of tumors with a co-occurring pair of insertions | |||||||
|---|---|---|---|---|---|---|---|---|
| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 20 | 49 | 4 | 1 | 1 | 0 | 0 | 0 | 0 |
| 50 | 64 | 10 | 2 | 1 | 0 | 0 | 1 | 0 |
| 70 | 82 | 7 | 2 | 2 | 1 | 0 | 0 | 1 |
| 100 | 98 | 10 | 4 | 3 | 0 | 0 | 0 | 1 |
(a) Number of statistically significant CISs based on the PRIM are in parentheses.
Empirical FDR of PRIM using various sized simulated data sets
| Number of tumors | Number of insertions | |||
|---|---|---|---|---|
| | 7500 | 15 000 | 16 000 | 30 000 |
| 50 | 0.036 | 0.011 | ND | 0 |
| 135 | ND | ND | 0.002 | ND |
| 150 | 0.016 | 0.002 | ND | 0.001 |
| 300 | 0.002 | 0 | ND | 0 |
(a) FDR is based on 1000 simulations and is calculated as (number of simulations that produced a CCI)/(number of total simulations). Every simulation that identified a CCI found exactly one, except for a single simulation with 50 tumors and 7500 insertions, which yielded two CCIs.
(b) ND indicates simulations were not done.
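The FDR definition in the footnote above reduces to a simple proportion, sketched here. The `simulate_once` callable is a hypothetical stand-in for generating one random data set and running the CCI call on it; it is assumed to return the number of CCIs found. Nothing about how the source simulations were generated is implied.

```python
import random

def empirical_fdr(simulate_once, n_simulations=1000, seed=0):
    """Fraction of simulations that produced at least one CCI.

    simulate_once: hypothetical callable taking a random.Random instance
    and returning the number of CCIs called in one simulated data set.
    """
    rng = random.Random(seed)
    n_with_cci = sum(1 for _ in range(n_simulations) if simulate_once(rng) > 0)
    return n_with_cci / n_simulations
```

Under this definition a simulation that yields two CCIs counts once, the same as a simulation that yields one, so the value is a per-simulation rate and not a per-call rate.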
Number of simulations with pairs of insertions in the indicated number of tumors
| Number of tumors | Number of insertions | |||
|---|---|---|---|---|
| | 7500 | 15 000 | 16 000 | 30 000 |
| 50 | 681 (2) | 944 (3) | ND | 935 (4) |
| | 319 (3) | 56 (4) | | 65 (5) |
| 135 | ND | ND | 130 (2) | ND |
| | | | 868 (3) | |
| | | | 2 (4) | |
| 150 | 984 (2) | 374 (2) | ND | 380 (3) |
| | 16 (3) | 624 (3) | | 168 (4) |
| | | 2 (4) | | 2 (5) |
| 300 | 998 (2) | 899 (2) | ND | 992 (3) |
| | 2 (3) | 101 (3) | | 8 (4) |
(a) For example, in 1000 simulations using 50 tumors and 7500 insertions there are 681 simulations with a pair of insertions occurring in 2 tumors and 319 simulations with a pair of insertions occurring in 3 tumors.
(b) ND indicates simulations were not done.
Figure 2. Location of transposon insertions in CCIs. (A) Seven tumors had insertions in both the 3′ and 5′ regions of Apc. (B) Four tumors had insertions in both the upstream promoter of Rspo2 and in intron 1 of Ppm1h. All four insertions in the Rspo2 promoter inserted with the transposon viral promoter in the same orientation as the gene. (C) Four tumors had insertions in Cltc, two of which had insertions in Flt1 and the other two in Pan3. Insertions are depicted by a bent arrow, which points in the direction of the transposon promoter. Insertion numbers indicate tumors. Red arrow indicates direction of transcription. Solid black lines depict introns while dashed black lines depict intergenic DNA. Black boxes depict exons (A and B) while blue boxes depict genes (C). Arrow on bottom indicates length of DNA.