Bing-Jian Feng, Sean V Tavtigian, Melissa C Southey, David E Goldgar.
Abstract
Massively Parallel Sequencing (MPS) now allows entire exomes and genomes to be sequenced at reasonable cost, and its utility for identifying genes responsible for rare Mendelian disorders has been demonstrated. However, for a complex disease, study designs need to accommodate substantial degrees of locus, allelic, and phenotypic heterogeneity, as well as complex relationships between genotype and phenotype. Such considerations include careful selection of samples for sequencing and a well-developed strategy for identifying the few "true" disease susceptibility genes from among the many irrelevant genes that will be found to harbor rare variants. To examine these issues, we performed simulation-based analyses comparing several strategies for MPS in complex disease. Factors examined include genetic architecture, sample size, number and relationship of individuals selected for sequencing, and a variety of filters based on variant type, multiple observations of genes, and concordance of genetic variants within pedigrees. A two-stage design was assumed in which genes from the MPS analysis of high-risk families are evaluated in a secondary screening phase of a larger set of probands with more modest family histories. Designs were evaluated using a cost function that assumes the cost of sequencing the whole exome is 400 times that of sequencing a single candidate gene. Results indicate that while requiring variants to be identified in multiple pedigrees and/or in multiple individuals in the same pedigree is an effective strategy for reducing false positives, there is a danger of over-filtering so that most true susceptibility genes are missed. In most cases, sequencing more than two individuals per pedigree results in reduced power without any benefit in terms of reduced overall cost. Further, our results suggest that although no single strategy is optimal, simulations can provide important guidelines for study design.
Year: 2011 PMID: 21850262 PMCID: PMC3151293 DOI: 10.1371/journal.pone.0023221
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Genetic Models Examined.
| Model | GRR | Penetrance | FRR | No. of loci for FRRtotal = 1.33 |
| I | 20 | 0.4 | 1.036 | 8 |
| II | 15 | 0.3 | 1.02 | 15 |
| III | 10 | 0.2 | 1.008 | 35 |
| IV | 7.5 | 0.15 | 1.0036 | 68 |
| V | 5 | 0.10 | 1.0016 | 179 |
Susceptibility allele frequency = 0.0001, sporadic rate = 0.02. GRR: genotype relative risk; FRR: familial relative risk.
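The last column of the table can be related to the other columns with a short calculation. The sketch below assumes, purely as an illustration and not as the authors' stated method, that FRR is the sibling recurrence risk ratio for a single rare dominant locus in Hardy-Weinberg proportions, and that per-locus FRRs combine multiplicatively to reach the total of 1.33. The function names `sibling_frr` and `loci_needed` are hypothetical.

```python
import math

ALLELE_FREQ = 1e-4    # susceptibility allele frequency (table footnote)
SPORADIC_RATE = 0.02  # disease risk in non-carriers (table footnote)

def sibling_frr(penetrance, q=ALLELE_FREQ, f0=SPORADIC_RATE):
    """Sibling recurrence risk ratio for one rare dominant locus.

    Assumption: FRR = 1 + (VA/2 + VD/4) / K^2, the standard
    variance-components expression for sibling pairs.
    """
    d = penetrance - f0                        # excess risk in carriers
    k = f0 + d * (1 - (1 - q) ** 2)            # population prevalence K
    half_va = q * (1 - q) ** 3 * d ** 2        # additive variance / 2
    quarter_vd = (q * (1 - q) * d) ** 2 / 4    # dominance variance / 4
    return 1 + (half_va + quarter_vd) / k ** 2

def loci_needed(frr, total=1.33):
    """Number of equal, multiplicative loci needed to reach `total` FRR."""
    return math.log(total) / math.log(frr)

# Penetrance per model, copied from the table (GRR = penetrance / 0.02).
MODELS = {"I": 0.4, "II": 0.3, "III": 0.2, "IV": 0.15, "V": 0.10}

for name, penetrance in MODELS.items():
    frr = sibling_frr(penetrance)
    print(name, round(frr, 4), round(loci_needed(frr), 1))
```

Under these assumptions the locus counts come out within one of the tabulated 8, 15, 35, 68 and 179. The tabulated FRR for model IV (1.0036) is the one value this approximation does not reproduce: it gives roughly 1.0042, which is also what 68 multiplicative loci would imply.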
Figure 1. Pedigrees used in stage I and stage II.
Panel A: pedigree structure for stage I. Individuals sequenced are indicated by an arrow: ID 8 is sequenced when one individual is sequenced; IDs 8 and 11 when two are sequenced; IDs 8, 5 and 11 when three are sequenced. Panel B: pedigree structure for stage II. ID 8 is analyzed.
Table 2. Selected filters examined in the simulation.
| Filter | One Case per pedigree | Two Cases per Pedigree |
| N1RV | All genes with a RV | All genes with a concordant RV |
| N1TS | All genes with a TSJ | All genes with a concordant TSJ |
| N2RV | All genes with RVs in 2+ pedigrees | All genes with RVs in 2+ pedigrees, at least one pedigree concordant |
| N2TS | All genes with RVs in 2+ pedigrees, at least 1 RV is TSJ | All genes with RVs in 2+ pedigrees, at least 1 RV is TSJ, at least one pedigree concordant |
| N3RV | All genes with RVs in 3+ pedigrees | All genes with RVs in 3+ pedigrees, at least one pedigree concordant |
RV: rare variant; TSJ: truncating/splice junction variant.
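The filter definitions above can be written as simple predicates over per-gene variant observations. The following is a minimal sketch, not the authors' simulation code: the `Observation` record, the function names, and the gene and pedigree labels are all hypothetical. It reads the two-case N2TS rule literally, so the TSJ variant and the concordant pedigree need not coincide, and it treats every variant as concordant when only one case per pedigree is sequenced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """One rare variant (RV) seen in one pedigree for a given gene."""
    pedigree: str
    is_tsj: bool      # truncating/splice junction variant
    concordant: bool  # carried by every sequenced case in the pedigree;
                      # set to True when only one case is sequenced

def passes(filter_name, observations):
    """Return True if a gene's observations satisfy the named filter."""
    n_pedigrees = len({o.pedigree for o in observations})
    any_concordant = any(o.concordant for o in observations)
    any_tsj = any(o.is_tsj for o in observations)
    if filter_name == "N1RV":
        return any_concordant
    if filter_name == "N1TS":
        return any(o.is_tsj and o.concordant for o in observations)
    if filter_name == "N2RV":
        return n_pedigrees >= 2 and any_concordant
    if filter_name == "N2TS":
        return n_pedigrees >= 2 and any_tsj and any_concordant
    if filter_name == "N3RV":
        return n_pedigrees >= 3 and any_concordant
    raise ValueError(f"unknown filter: {filter_name}")

def genes_passing(filter_name, calls):
    """Genes (keys of `calls`) that would be passed on to stage II."""
    return {gene for gene, obs in calls.items() if passes(filter_name, obs)}

# Toy input for a two-cases-per-pedigree design (hypothetical genes).
calls = {
    "GENE_A": [Observation("ped1", True, True), Observation("ped2", False, False)],
    "GENE_B": [Observation("ped1", False, True)],
    "GENE_C": [Observation("ped3", False, False)],
}
for name in ("N1RV", "N1TS", "N2RV", "N2TS", "N3RV"):
    print(name, sorted(genes_passing(name, calls)))
```

Each stricter filter can only shrink the set of genes passed on, which is the trade-off between false positives and over-filtering that the abstract describes.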
Effect of different sample sizes and validation thresholds on type I and type II error in stage II.
| | | Number of variants needed for validation | | |
| Model | nPeds | ≥2 | ≥3 | ≥4 |
| Not Associated | 150 | 0.010 | 0.0004 | 0.00002 |
| Not Associated | 250 | 0.026 | 0.002 | 0.0001 |
| Not Associated | 350 | 0.049 | 0.005 | 0.0005 |
| I | 150 | 1.0 | 0.99 | 0.98 |
| I | 250 | 1.0 | 1.0 | 1.0 |
| I | 350 | 1.0 | 1.0 | 1.0 |
| II | 150 | 0.98 | 0.94 | 0.85 |
| II | 250 | 1.0 | 1.0 | 0.99 |
| II | 350 | 1.0 | 1.0 | 1.0 |
| III | 150 | 0.78 | 0.54 | 0.32 |
| III | 250 | 0.95 | 0.86 | 0.70 |
| III | 350 | 0.99 | 0.96 | 0.90 |
| IV | 150 | 0.49 | 0.23 | 0.09 |
| IV | 250 | 0.76 | 0.52 | 0.30 |
| IV | 350 | 0.90 | 0.74 | 0.54 |
| V | 150 | 0.17 | 0.04 | 0.01 |
| V | 250 | 0.36 | 0.13 | 0.04 |
| V | 350 | 0.52 | 0.26 | 0.10 |
nPeds: number of pedigrees in stage II. Entries are the probability of meeting the specified validation criteria for a given sample size and model. Models are described in Table 1.
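One way to read these entries is as upper-tail probabilities of a count of variant carriers among the stage II probands. The sketch below assumes a simple binomial model, which may differ from how the simulated values were obtained. The per-proband carrier probability of 0.001 for an unassociated gene is a hypothetical value chosen for illustration and is not stated in the text; `prob_at_least` is likewise a hypothetical helper.

```python
from math import comb

def prob_at_least(k, n_probands, carrier_prob):
    """P(X >= k) for X ~ Binomial(n_probands, carrier_prob)."""
    below = sum(
        comb(n_probands, i)
        * carrier_prob ** i
        * (1 - carrier_prob) ** (n_probands - i)
        for i in range(k)
    )
    return 1.0 - below

# Hypothetical carrier probability for a gene that is not associated.
NULL_CARRIER_PROB = 0.001

for n_peds in (150, 250, 350):
    tails = [prob_at_least(k, n_peds, NULL_CARRIER_PROB) for k in (2, 3, 4)]
    print(n_peds, [round(t, 5) for t in tails])
```

With that assumed carrier probability, the ≥2 column comes out near 0.010, 0.026 and 0.049 for 150, 250 and 350 probands, similar in magnitude to the "Not Associated" rows. The deeper tails are more sensitive to the assumed value and match less closely.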
Number of false genes passed to stage II for various filtering strategies, averaged across the five disease models.
| Filter | Ns | Np = 10 | Np = 20 | Np = 30 | Np = 40 | Np = 60 |
| N1RV | 1 | 1359 | 2641 | 3857 | 5006 | 7119 |
| N1RV | 2 | 174 | 346 | 520 | 687 | 1022 |
| N1RV | 3 | 87 | 175 | 262 | 347 | 518 |
| N1TS | 1 | 199 | 396 | 591 | 785 | 1169 |
| N1TS | 2 | 25 | 50 | 75 | 100 | 149 |
| N1TS | 3 | 12 | 24 | 38 | 50 | 74 |
| N2RV | 1 | 34 | 138 | 305 | 527 | 1111 |
| N2RV | 2 | 15 | 60 | 133 | 225 | 460 |
| N2RV | 3 | 12 | 45 | 96 | 159 | 316 |
| N2TS | 1 | 9 | 37 | 83 | 144 | 309 |
| N2TS | 2 | 4 | 17 | 37 | 65 | 138 |
| N2TS | 3 | 3 | 13 | 28 | 48 | 102 |
| N3RV | 1 | 0 | 5 | 16 | 36 | 117 |
| N3RV | 2 | 1 | 5 | 18 | 41 | 123 |
| N3RV | 3 | 1 | 6 | 20 | 43 | 124 |
Np: number of pedigrees in stage I; Ns: number of individuals sequenced per pedigree in stage I; Filters are described in Table 2.
Number of true genes found and study cost as a function of filter, number of pedigrees, and number of sequenced individuals for model II (GRR = 15, number of susceptibility genes = 15).
| nPeds | nCs | nExomes | N1RV | | N1TS | | N2RV | | N2TS | | N3RV | |
| | | | cost | nTrue | cost | nTrue | cost | nTrue | cost | nTrue | cost | nTrue |
| 10 | 1 | 10 | 3.5 | 6.98 | 0.5 | 4.11 | 0.1 | 1.80 | 0.1 | 1.24 | 0.0 | 0.31 |
| 10 | 2 | 20 | 0.5 | 5.32 | 0.1 | 3.06 | 0.1 | 1.73 | 0.1 | 1.34 | 0.1 | 0.31 |
| 10 | 3 | 30 | 0.4 | 5.10 | 0.2 | 2.71 | 0.2 | 1.93 | 0.1 | 1.33 | 0.1 | 0.32 |
| 20 | 1 | 20 | 6.7 | 10.87 | 1.1 | 7.07 | 0.4 | 5.40 | 0.2 | 4.12 | 0.1 | 1.85 |
| 20 | 2 | 40 | 1.0 | 8.72 | 0.3 | 5.24 | 0.3 | 5.15 | 0.2 | 4.25 | 0.2 | 2.21 |
| 20 | 3 | 60 | 0.7 | 8.17 | 0.3 | 4.93 | 0.4 | 5.59 | 0.3 | 4.10 | 0.3 | 2.39 |
| 30 | 1 | 30 | 9.8 | 12.97 | 1.6 | 9.09 | 0.9 | 8.34 | 0.3 | 6.58 | 0.2 | 4.15 |
| 30 | 2 | 60 | 1.6 | 10.95 | 0.4 | 7.54 | 0.6 | 8.41 | 0.3 | 6.95 | 0.3 | 4.53 |
| 30 | 3 | 90 | 1.0 | 10.45 | 0.5 | 6.92 | 0.6 | 8.34 | 0.4 | 6.78 | 0.4 | 4.93 |
| 40 | 1 | 40 | 12.7 | 13.86 | 2.2 | 10.67 | 1.5 | 10.76 | 0.5 | 9.26 | 0.3 | 6.67 |
| 40 | 2 | 80 | 2.1 | 12.33 | 0.6 | 8.64 | 0.9 | 10.64 | 0.5 | 9.21 | 0.4 | 7.37 |
| 40 | 3 | 120 | 1.4 | 11.65 | 0.6 | 8.41 | 0.9 | 10.46 | 0.6 | 9.26 | 0.6 | 7.70 |
| 60 | 1 | 60 | 18.1 | 14.69 | 3.2 | 12.61 | 3.0 | 13.26 | 1.0 | 12.03 | 0.6 | 10.92 |
| 60 | 2 | 120 | 3.1 | 13.93 | 0.9 | 11.22 | 1.7 | 13.33 | 0.9 | 11.69 | 0.8 | 11.17 |
| 80 | 1 | 80 | 22.9 | 14.90 | 4.2 | 13.66 | 5.0 | 14.53 | 1.7 | 13.39 | 1.0 | 13.24 |
GRR: genotype relative risk; nPeds: number of pedigrees sequenced; nCs: number of cases sequenced per pedigree; Cost is in million USD; nTrue: number of true susceptibility genes passed to the validation in stage II. Filters are described in Table 2.
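The cost column can be read as stage I exome sequencing plus stage II screening of every gene that the filter passes on. The sketch below assumes that additive form. Only the 400:1 exome-to-gene cost ratio comes from the abstract; the 10 USD per-gene cost and the 250 stage II probands are placeholder values chosen for illustration, and `study_cost` is a hypothetical helper.

```python
EXOME_TO_GENE_COST_RATIO = 400  # stated in the abstract

def study_cost(n_exomes, n_genes_to_stage2, n_stage2_probands, gene_cost_usd):
    """Total cost in USD of a two-stage design under an additive cost model."""
    # Stage I: each exome costs 400 times one candidate gene.
    stage1 = n_exomes * EXOME_TO_GENE_COST_RATIO * gene_cost_usd
    # Stage II: every gene passed on is sequenced in every proband.
    stage2 = n_genes_to_stage2 * n_stage2_probands * gene_cost_usd
    return stage1 + stage2

# Placeholder inputs (assumptions, not values given in the text).
GENE_COST_USD = 10
N_STAGE2_PROBANDS = 250

# 10 pedigrees, one case each, N1RV filter: about 1,359 false genes
# plus about 7 true genes are passed to stage II.
total = study_cost(10, 1366, N_STAGE2_PROBANDS, GENE_COST_USD)
print(f"{total / 1e6:.2f} million USD")
```

With these placeholder inputs the total is dominated by stage II screening rather than by exome sequencing, which illustrates why permissive filters such as N1RV are expensive even when few exomes are sequenced. The result happens to fall near the first N1RV entry in the table, but that should not be taken as confirmation of the assumed prices.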
Study cost and proportion of true genes passed to validation in stage II as a function of filter and genetic model, for selected sample sizes.
| Model | nPeds | nCs | N1RV | | N1TS | | N2RV | | N2TS | | N3RV | |
| | | | cost | pr | cost | pr | cost | pr | cost | pr | cost | pr |
| GRR = 20, nGenes = 8 | 60 | 1 | 18.1 | 1.00 | 3.2 | 0.97 | 3.0 | 0.99 | 1.0 | 0.98 | 0.6 | 0.97 |
| GRR = 20, nGenes = 8 | 30 | 2 | 1.6 | 0.91 | 0.4 | 0.74 | 0.6 | 0.86 | 0.3 | 0.76 | 0.3 | 0.66 |
| GRR = 20, nGenes = 8 | 20 | 3 | 0.7 | 0.81 | 0.3 | 0.55 | 0.4 | 0.67 | 0.3 | 0.55 | 0.3 | 0.41 |
| GRR = 15, nGenes = 15 | 60 | 1 | 18.1 | 0.98 | 3.2 | 0.84 | 3.0 | 0.88 | 1.0 | 0.80 | 0.6 | 0.73 |
| GRR = 15, nGenes = 15 | 30 | 2 | 1.6 | 0.73 | 0.4 | 0.50 | 0.6 | 0.56 | 0.3 | 0.46 | 0.3 | 0.30 |
| GRR = 15, nGenes = 15 | 20 | 3 | 0.7 | 0.54 | 0.3 | 0.33 | 0.4 | 0.37 | 0.3 | 0.27 | 0.3 | 0.16 |
| GRR = 10, nGenes = 35 | 60 | 1 | 18.1 | 0.84 | 3.2 | 0.55 | 3.1 | 0.52 | 1.1 | 0.42 | 0.6 | 0.25 |
| GRR = 10, nGenes = 35 | 30 | 2 | 1.6 | 0.39 | 0.4 | 0.21 | 0.6 | 0.20 | 0.3 | 0.17 | 0.3 | 0.07 |
| GRR = 10, nGenes = 35 | 20 | 3 | 0.7 | 0.24 | 0.3 | 0.13 | 0.4 | 0.11 | 0.3 | 0.08 | 0.3 | 0.03 |
| GRR = 7.5, nGenes = 68 | 60 | 1 | 18.1 | 0.65 | 3.2 | 0.35 | 3.1 | 0.28 | 1.0 | 0.20 | 0.5 | 0.09 |
| GRR = 7.5, nGenes = 68 | 30 | 2 | 1.6 | 0.21 | 0.4 | 0.11 | 0.6 | 0.09 | 0.3 | 0.06 | 0.3 | 0.02 |
| GRR = 7.5, nGenes = 68 | 20 | 3 | 0.7 | 0.12 | 0.3 | 0.06 | 0.4 | 0.05 | 0.3 | 0.03 | 0.3 | 0.01 |
| GRR = 5, nGenes = 179 | 60 | 1 | 18.2 | 0.45 | 3.2 | 0.17 | 3.1 | 0.12 | 1.0 | 0.07 | 0.5 | 0.02 |
| GRR = 5, nGenes = 179 | 30 | 2 | 1.6 | 0.08 | 0.4 | 0.04 | 0.6 | 0.03 | 0.3 | 0.02 | 0.3 | 0.00 |
| GRR = 5, nGenes = 179 | 20 | 3 | 0.7 | 0.04 | 0.3 | 0.02 | 0.4 | 0.01 | 0.3 | 0.01 | 0.3 | 0.00 |
GRR: genotype relative risk; nGenes: number of susceptibility genes; nPeds: number of pedigrees sequenced; nCs: number of cases sequenced per pedigree; pr: proportion of true susceptibility genes passed to the validation in stage II; cost is in million USD. Filters are described in Table 2.
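To compare designs on a single scale, one can divide cost by the expected number of true genes passed to stage II (pr × nGenes). This ratio is an illustrative summary introduced here, not a metric defined in the text, and it ignores whether the genes are later validated in stage II. The sketch uses a few entries copied from the GRR = 15 rows; `cost_per_true_gene` is a hypothetical helper.

```python
N_GENES_MODEL_II = 15  # susceptibility genes under GRR = 15

# (nPeds, nCs, filter) -> (cost in million USD, pr), from the GRR = 15 rows.
designs = {
    (60, 1, "N1RV"): (18.1, 0.98),
    (60, 1, "N2TS"): (1.0, 0.80),
    (60, 1, "N3RV"): (0.6, 0.73),
    (30, 2, "N1RV"): (1.6, 0.73),
    (20, 3, "N1RV"): (0.7, 0.54),
}

def cost_per_true_gene(cost_musd, pr, n_genes=N_GENES_MODEL_II):
    """Million USD spent per true gene expected to reach stage II."""
    return cost_musd / (pr * n_genes)

# Cheapest design per expected true gene first.
ranked = sorted(designs, key=lambda d: cost_per_true_gene(*designs[d]))
for design in ranked:
    cost, pr = designs[design]
    print(design, round(pr * N_GENES_MODEL_II, 1),
          round(cost_per_true_gene(cost, pr), 3))
```

On this summary the permissive N1RV filter with 60 single-case pedigrees retains the most true genes but at by far the highest cost per gene, while the stricter filters trade some true genes for a much lower cost.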
Number of true genes identified as a function of proportion of pathogenic mutations that are type TSJ.
| | Np(Ns) = 20(1) | | Np(Ns) = 20(2) | | Np(Ns) = 40(1) | |
| | nTrue | Cost | nTrue | Cost | nTrue | Cost |
| Filter = N2RV, pTSJ = 0.5 | 5.2 | 4.4 | 5.2 | 3.2 | 10.6 | 15.0 |
| Filter = N1TS, pTSJ = 0.3 | 4.9 | | 3.3 | | 7.9 | |
| Filter = N1TS, pTSJ = 0.5 | 7.1 | 10.9 | 5.2 | 3.0 | 10.7 | 21.5 |
| Filter = N1TS, pTSJ = 0.7 | 8.7 | | 6.7 | | 12.5 | |
pTSJ: proportion of pathogenic mutations that are type TSJ; Np: number of pedigrees in stage I; Ns: number of individuals sequenced per pedigree in stage I. nTrue: number of true susceptibility genes passed to the validation in stage II; cost is in 100k USD.
Where a cost cell is empty, cost does not vary with pTSJ, so the value equals that shown for Filter = N1TS, pTSJ = 0.5. Filters are described in Table 2.