Magdalena Harakalova, Isaäc J Nijman, Jelena Medic, Michal Mokry, Ivo Renkens, Jan D Blankensteijn, Wigard Kloosterman, Annette F Baas, Edwin Cuppen.
Abstract
The costs and efforts for sample preparation of hundreds of individuals, their genomic enrichment for regions of interest, and sufficiently deep sequencing place a significant burden on next-generation sequencing-based experiments. We investigated whether pooling of samples at the level of genomic DNA would be a more versatile strategy for lowering the costs and efforts of detecting rare variants associated with common disease in candidate genes or associated loci in a substantial patient cohort. We performed a pilot experiment using five pools of 20 abdominal aortic aneurysm (AAA) patients that were enriched on separate microarrays for the reported 9p21.3 associated locus and 42 additional AAA candidate genes, and sequenced on the SOLiD platform. Here, we discuss challenges and limitations connected to this approach and show that two factors can limit successful variant prioritization and confirmation: the high number of novel variants detected per pool, and the shift of allele frequencies into the range that, in non-pooled samples, is usually a cut-off region dominated by false positives. We conclude that barcode indexing of individual samples before pooling followed by a multiplexed enrichment strategy should be preferred for detection of rare genetic variants in larger sample sets rather than a genomic DNA pooling strategy.
Year: 2011 PMID: 21360310 PMCID: PMC3099005 DOI: 10.1007/s12265-011-9263-5
Source DB: PubMed Journal: J Cardiovasc Transl Res ISSN: 1937-5387 Impact factor: 4.132
Fig. 1 Schematic overview of the pathway model for common diseases. The pathway model of common diseases assumes that disease of a specific organ or tissue can be caused by impairment of any pathway influencing the physiological function of that organ or tissue. Each pathway contains several genes, and any point mutation or structural variation in relevant genes can contribute to the onset of the disease. Additionally, not only mutations within genes but also intergenic mutations having a trans-effect on protein function should be considered. In addition to genetic factors, environmental factors are also expected to have a significant effect.
Clinical information about the AAA patient set
| Patient pool | Diameter of aorta [mm] (a) | Gender f/m | Operated | Rupture | Ever smoked | Hypertension | Cardiovascular disease | Familial AAA |
|---|---|---|---|---|---|---|---|---|
| 1 | 59.9 | 5/15 | 16 | 0 | 0 | 13 | 10 | 8 |
| 2 | 61.2 | 4/16 | 14 | 0 | 14 | 14 | 8 | 20 |
| 3 | 60.8 | 0/20 | 15 | 0 | 20 | 8 | 20 | 0 |
| 4 | 51.4 | 2/18 | 11 | 0 | 12 | 20 | 11 | 1 |
| 5 | 60.6 | 2/18 | 14 | 2 | 13 | 10 | 9 | 0 |
(a) Diameter of infrarenal aorta. The cut-off for diagnosis of AAA was set at 30 mm.
Overview of the locus and genes included in the array design
| Gene name | Gene description | Ensembl gene ID | GRCh37 location |
|---|---|---|---|
| 1. 9p21.3 locus | | | Chromosome 9: 21,750,000–22,400,000 |
| Containing genes | |||
| CDKN2A | Cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4) | ENSG00000147889 | Chromosome 9: 21,967,752–21,995,300 |
| CDKN2B | Cyclin-dependent kinase inhibitor 2B (p15, inhibits CDK4) | ENSG00000147883 | Chromosome 9: 22,002,902–22,009,280 |
| CDKN2BAS | CDKN2B antisense RNA (non-protein coding) | ENSG00000240498 | Chromosome 9: 21,994,790–22,121,096 |
| 2. CDK pathway | |||
| CDK1 | Cyclin-dependent kinase 1 | ENSG00000170312 | Chromosome 10: 62,538,101–62,554,610 |
| CDK2 | Cyclin-dependent kinase 2 | ENSG00000123374 | Chromosome 12: 56,360,556–56,366,565 |
| CDK3 | Cyclin-dependent kinase 3 | ENSG00000108504 | Chromosome 17: 73,975,312–74,002,080 |
| CDK4 | Cyclin-dependent kinase 4 | ENSG00000135446 | Chromosome 12: 58,142,005–58,146,164 |
| CDK5 | Cyclin-dependent kinase 5 | ENSG00000164885 | Chromosome 7: 150,750,899–150,754,996 |
| CDK6 | Cyclin-dependent kinase 6 | ENSG00000105810 | Chromosome 7: 92,234,239–92,465,941 |
| CDK7 | Cyclin-dependent kinase 7 | ENSG00000134058 | Chromosome 5: 68,530,622–68,573,256 |
| CDKN2C | Cyclin-dependent kinase inhibitor 2C | ENSG00000123080 | Chromosome 1: 51,426,417–51,440,309 |
| CDKN2D | Cyclin-dependent kinase inhibitor 2D | ENSG00000129355 | Chromosome 19: 10,677,139–10,679,655 |
| TP53 | Tumor protein p53 | ENSG00000141510 | Chromosome 17: 7,565,257–7,590,863 |
| MDM1 | Mdm1 nuclear protein homolog (mouse) | ENSG00000111554 | Chromosome 12: 68,688,346–68,726,161 |
| MDM2 | Mdm2 p53 binding protein homolog (mouse) | ENSG00000135679 | Chromosome 12: 69,201,980–69,234,214 |
| 3. TGFbeta pathway | |||
| TGFB1 | Transforming growth factor, beta 1 | ENSG00000105329 | Chromosome 19: 41,836,651–41,859,816 |
| TGFB2 | Transforming growth factor, beta 2 | ENSG00000092969 | Chromosome 1: 218,519,391–218,617,959 |
| TGFB3 | Transforming growth factor, beta 3 | ENSG00000119699 | Chromosome 14: 76,424,442–76,448,092 |
| FBN1 | Fibrillin 1 | ENSG00000166147 | Chromosome 15: 48,700,505–48,937,918 |
| LTBP1 | Latent transforming growth factor beta binding protein 1 | ENSG00000049323 | Chromosome 2: 33,359,706–33,624,576 |
| THBS1 | Thrombospondin 1 | ENSG00000137801 | Chromosome 15: 39,873,280–39,889,665 |
| DCN | Decorin | ENSG00000011465 | Chromosome 12: 91,539,036–91,576,806 |
| ACVRL1 | Activin A receptor type II-like 1 | ENSG00000139567 | Chromosome 12: 52,301,202–52,317,145 |
| ACVR1B | Activin A receptor, type IB | ENSG00000135503 | Chromosome 12: 52,345,486–52,390,857 |
| TGFBR1 | Transforming growth factor, beta receptor I | ENSG00000106799 | Chromosome 9: 101,867,412–101,916,474 |
| TGFBR2 | Transforming growth factor, beta receptor II | ENSG00000163513 | Chromosome 3: 30,647,994–30,735,634 |
| TGFBR3 | Transforming growth factor, beta receptor III | ENSG00000069702 | Chromosome 1: 92,145,900–92,371,559 |
| ENG | Endoglin | ENSG00000106991 | Chromosome 9: 130,577,291–130,617,047 |
| SMAD2 | SMAD family member 2 | ENSG00000175387 | Chromosome 18: 45,359,466–45,456,926 |
| SMAD3 | SMAD family member 3 | ENSG00000166949 | Chromosome 15: 67,358,195–67,487,532 |
| SMAD4 | SMAD family member 4 | ENSG00000141646 | Chromosome 18: 48,556,583–48,611,415 |
| SMAD6 | SMAD family member 6 | ENSG00000137834 | Chromosome 15: 66,994,634–67,074,323 |
| SMAD7 | SMAD family member 7 | ENSG00000101665 | Chromosome 18: 46,446,224–46,477,081 |
| 4. Other candidate genes | |||
| ELN | Elastin | ENSG00000049540 | Chromosome 7: 73,442,119–73,484,237 |
| ACTA2 | Actin, alpha 2, smooth muscle, aorta | ENSG00000107796 | Chromosome 10: 90,694,831–90,751,147 |
| MYH11 | Myosin, heavy chain 11, smooth muscle | ENSG00000133392 | Chromosome 16: 15,796,992–15,950,890 |
| ACE | Angiotensin I converting enzyme (peptidyl-dipeptidase A) 1 | ENSG00000159640 | Chromosome 17: 61,554,432–61,599,209 |
| NOS3 | Nitric oxide synthase 3 (endothelial cell) | ENSG00000164867 | Chromosome 7: 150,688,083–150,711,676 |
| MMP2 | Matrix metallopeptidase 2 (gelatinase A, 72 kDa gelatinase, 72 kDa type IV collagenase) | ENSG00000087245 | Chromosome 16: 55,512,883–55,540,603 |
| MMP9 | Matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | ENSG00000100985 | Chromosome 20: 44,637,547–44,645,200 |
| TIMP1 | TIMP metallopeptidase inhibitor 1 | ENSG00000102265 | Chromosome X: 47,441,712–47,446,188 |
| MTHFR | Methylenetetrahydrofolate reductase (NAD(P)H) | ENSG00000177000 | Chromosome 1: 11,845,780–11,866,977 |
| ABCA1 | ATP-binding cassette, sub-family A (ABC1), member 1 | ENSG00000165029 | Chromosome 9: 107,543,283–107,690,518 |
| ABCB5 | ATP-binding cassette, sub-family B (MDR/TAP), member 5 | ENSG00000004846 | Chromosome 7: 20,654,830–20,816,658 |
| ABCC6 | ATP-binding cassette, sub-family C (CFTR/MRP), member 6 | ENSG00000091262 | Chromosome 16: 16,243,422–16,317,328 |
Fig. 2 Sequencing, mapping and enrichment statistics. The total number of reads produced, the number of reads mapping to the reference genome and the number of reads overlapping with the enrichment design (a), the percentage of mappability and enrichment efficiency (b), the design footprint covered by ≥1 read, by ≥20 reads, and by ≥500 reads (c), mean and median bp coverage (d), and evenness score (e) show a similar pattern for all five patient pools, indicating the robustness of the method. The number of novel variants detected in all five patient pools correlates with the size of the coding sequence of genes (f), indicating a random distribution of detected variants. It is not possible to test statistically for enrichment of variants in genes since we did not include control genes or control samples (full line, trend line; dashed line, mean value line; R², trend line equation)
Fig. 3 Distribution of sequencing coverage. The plot shows the percentage of bases in the design covered at a given depth, with depth normalized to the average coverage. Example: on average, 99% of all bp are covered by at least 10% of the mean coverage (251.4 ± 48.1 per pool), and approximately 40% of all bp reach the mean coverage (2,514 ± 481 per pool)
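The normalization described in the Fig. 3 caption can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `fraction_covered` and the toy depth values are assumptions made for the example, and real input would be per-base depths over the enrichment design.

```python
def fraction_covered(depths, relative_threshold):
    """Return the fraction of positions whose depth is at least
    relative_threshold times the mean depth of all positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    cutoff = relative_threshold * mean_depth
    return sum(1 for d in depths if d >= cutoff) / len(depths)


# Hypothetical per-base depths for a five-base design (not data from the study).
toy_depths = [0, 100, 200, 300, 400]

# Fraction of bases reaching 10% of the mean depth, and the mean depth itself.
at_10_percent_of_mean = fraction_covered(toy_depths, 0.1)
at_mean = fraction_covered(toy_depths, 1.0)
```

Evaluating the function over a range of thresholds would give one point per threshold on a curve of the kind the caption describes.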
List of mutations detected in five pools
| Type of mutation | Pool 1 | Pool 2 | Pool 3 | Pool 4 | Pool 5 |
|---|---|---|---|---|---|
| Non-synonymous coding (a) | 229 | 193 | 265 | 239 | 236 |
| Stop-gained | 5 | 9 | 7 | 7 | 5 |
| Stop-lost | 1 | 1 | 0 | 0 | 0 |
| Essential splice site | 2 | 3 | 2 | 4 | 5 |
| Splice site | 9 | 11 | 7 | 11 | 14 |
| Synonymous coding | 61 | 56 | 61 | 57 | 64 |
| 5'UTR | 48 | 31 | 47 | 60 | 54 |
| 3'UTR | 191 | 185 | 201 | 201 | 222 |
| Intronic | 79 | 80 | 91 | 87 | 89 |
| Downstream | 222 | 228 | 251 | 276 | 255 |
| Upstream | 285 | 273 | 329 | 319 | 299 |
| Intergenic | 578 | 582 | 649 | 739 | 626 |
| Within non-coding gene | 357 | 352 | 380 | 403 | 406 |
| Small InDels | 28 | 31 | 29 | 30 | 27 |
| All mutations | 2,095 | 2,035 | 2,319 | 2,433 | 2,302 |
(a) Ensembl predicted variation consequences
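As a consistency check on the table above, the per-category counts can be summed per pool and compared with the reported "All mutations" row. The snippet below is a small sketch that copies the values from the table; the variable names are chosen for illustration only.

```python
# Counts per mutation type for pools 1 to 5, copied from the table above.
mutation_counts = {
    "Non-synonymous coding": [229, 193, 265, 239, 236],
    "Stop-gained": [5, 9, 7, 7, 5],
    "Stop-lost": [1, 1, 0, 0, 0],
    "Essential splice site": [2, 3, 2, 4, 5],
    "Splice site": [9, 11, 7, 11, 14],
    "Synonymous coding": [61, 56, 61, 57, 64],
    "5'UTR": [48, 31, 47, 60, 54],
    "3'UTR": [191, 185, 201, 201, 222],
    "Intronic": [79, 80, 91, 87, 89],
    "Downstream": [222, 228, 251, 276, 255],
    "Upstream": [285, 273, 329, 319, 299],
    "Intergenic": [578, 582, 649, 739, 626],
    "Within non-coding gene": [357, 352, 380, 403, 406],
    "Small InDels": [28, 31, 29, 30, 27],
}

# The "All mutations" row as reported in the table.
reported_totals = [2095, 2035, 2319, 2433, 2302]

# Sum each pool's column across all mutation types.
column_sums = [sum(column) for column in zip(*mutation_counts.values())]

totals_match = column_sums == reported_totals
```

With the values as transcribed here, the column sums agree with the reported totals for all five pools.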
Fig. 4 Non-reference allele percentage distribution of detected variants in AAA patient pools. The non-reference allele (NRA) percentage distribution of detected variants in AAA pools deviates significantly from the pattern commonly observed in non-pooled experiments, which shows a clear peak for heterozygous mutations at 30–50% NRA and for homozygous mutations close to 100% NRA. Dashed lines indicate the distribution of NRA% of novel and known variants from exome sequencing of a healthy individual
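The shift towards low NRA% in pooled samples follows from simple allele counting, which the sketch below illustrates. It assumes an autosomal locus, an equal DNA contribution from each of the 20 individuals in a pool, and no capture or sequencing bias; none of these assumptions is verified here, and the function names are hypothetical.

```python
def expected_nra_percent(het_carriers, pool_size=20, hom_carriers=0):
    """Expected non-reference allele percentage in a pool of diploid samples.

    Assumes an autosomal locus and equal DNA input per individual.
    """
    if het_carriers < 0 or hom_carriers < 0:
        raise ValueError("carrier counts must be non-negative")
    if het_carriers + hom_carriers > pool_size:
        raise ValueError("more carriers than individuals in the pool")
    total_alleles = 2 * pool_size
    non_reference_alleles = het_carriers + 2 * hom_carriers
    return 100.0 * non_reference_alleles / total_alleles


def expected_non_reference_reads(nra_percent, depth):
    """Expected number of reads carrying the non-reference allele."""
    return nra_percent / 100.0 * depth


# One heterozygous carrier among 20 patients: 1 of 40 alleles.
single_carrier = expected_nra_percent(1)

# Expected supporting reads at the mean coverage given in the Fig. 3 caption.
reads_at_mean_depth = expected_non_reference_reads(single_carrier, 2514)
```

Under these assumptions a variant carried by a single heterozygous patient is expected at 2.5% NRA, far below the 30–50% peak seen for heterozygous variants in non-pooled samples.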
List of novel variants selected for capillary sequencing confirmation
| Chr. | Position (a) | Allele change | Amino acid change | Amino acid position (b) | Gene name | Gerp1 (c) | Gerp2 (d) | Mutation effect | Pool 1 NRA% | Pool 2 NRA% | Pool 3 NRA% | Pool 4 NRA% | Pool 5 NRA% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 218520046 | G/A | MET/ILE | 1/443 | TGFB2 | 2.82 | 1.165 | Non-synonymous coding | 12 | 13 | 11 | 11e | 8 |
| 2 | 33477855 | G/C | CYS/SER | 378/1397 | LTBP1 | 2.82 | 2.171 | Non-synonymous coding | 8 | 16 | 13 | 15e | – |
| 3 | 30732960 | C/A | PRO/THR | 550/593 | TGFBR2 | 2.9 | 2.094 | Non-synonymous coding | 7e | – | – | – | – |
| 3 | 148459750 | A/T | LYS/stop | 310/360 | AGTR1 | No score | No score | Stop-gained | – | – | – | 3e | – |
| 5 | 68555715 | A/C | LYS/THR | 160/347 | CDK7 | 2.82 | 1.788 | Non-synonymous coding | – | 14 | – | 15e | – |
| 7 | 150698940 | A/G | THR/ALA | 512/1204 | NOS3 | 2.32 | 1.41 | Non-synonymous coding | 10e | 5 | – | 7 | 5 |
| 9 | 101891203 | T/G | PHE/CYS | 55/427 | TGFBR1 | 2.9 | 1.796 | Non-synonymous coding | 5 | 5 | 8 | 6e | 5 |
| 9 | 101891219 | G/C | GLU/ASP | 60/427 | TGFBR1 | 1.96 | 1.601 | Non-synonymous coding | – | – | 6e | 4 | – |
| 9 | 101894941 | C/T | SER/LEU | 165/504 | TGFBR1 | 2.9 | 1.614 | Non-synonymous coding | 5 | – | 4e | – | 3 |
| 12 | 52309230 | A/T | LYS/stop | 332/504 | ACVRL1 | 1.95 | 1.444 | Stop-gained | – | 6 | 5e | – | 3 |
| 12 | 68707507 | T/C | LYS/ARG | 509/715 | MDM1 | −1.15 | 0.841 | Non-synonymous coding | 18e | 19 | – | 15 | – |
| 15 | 48787411 | A/T | CYS/stop | 862/2872 | FBN1 | 1.66 | 1.497 | Stop-gained | 3 | 3 | 4 | 3e | – |
| 15 | 48807601 | A/T | LEU/HIS | 484/2872 | FBN1 | 2.9 | 1.508 | Non-synonymous coding | – | 3 | 3e | – | 3 |
| 15 | 48888508 | G/C | TYR/stop | 170/2872 | FBN1 | 1.11 | 1.114 | Stop-gained | – | 3 | 4e | – | – |
| 15 | 48888520 | A/T | CYS/stop | 166/2872 | FBN1 | −0.67 | 0.961 | Stop-gained | 4 | 4 | 7e | – | 8 |
| 15 | 67073635 | A/G | ASP/GLY | 157/236 | SMAD6 | 1.87 | 1.144 | Non-synonymous coding | 5e | – | 4 | 3 | 6 |
| 16 | 15841741 | C/G | CYS/SER | 754/1946 | MYH11 | 2.82 | 2.118 | Non-synonymous coding | 6 | 8 | 11 | 13e | 11 |
| 16 | 15932031 | G/A | GLN/stop | 27/1946 | MYH11 | 2.82 | 2.017 | Stop-gained | – | – | – | 3e | – |
| 16 | 16282720 | A/G | PHE/LEU | 583/1504 | ABCC6 | 2.82 | 1.713 | Non-synonymous coding | 5e | 6 | – | 7 | 6 |
| 16 | 55530885 | T/C | LEU/PRO | 457/611 | MMP2 | 2.9 | 1.361 | Non-synonymous coding | 4 | 4 | 5 | 5e | 6 |
(a) Ensembl 59 coordinate
(b) The first number indicates the position of the mutation, the second the total number of amino acids in the transcript
(c) Ensembl Genomic Evolutionary Rate Profiling (GERP) score
(d) Ensembl predicted variation consequences
(e) Pool used for capillary sequencing confirmation