John W Cole, O Colin Stine, Xinyue Liu, Abhishek Pratap, Yuching Cheng, Luke J Tallon, Lisa K Sadzewicz, Nicole Dueker, Marcella A Wozniak, Barney J Stern, James F Meschia, Braxton D Mitchell, Steven J Kittner, Jeffrey R O'Connell.
Abstract
The genetic architecture of ischemic stroke is complex and is likely to include rare or low frequency variants with high penetrance and large effect sizes. Such variants are likely to provide important insights into disease pathogenesis compared to common variants with small effect sizes. Because a significant portion of human functional variation may derive from the protein-coding portion of genes, we undertook a pilot study to identify variation across the human exome (i.e., the coding exons across the entire human genome) in 10 ischemic stroke cases. Our efforts focused on evaluating the feasibility and identifying the difficulties of this type of research as it applies to ischemic stroke. The cases included 8 African-Americans and 2 Caucasians selected on the basis of similar stroke subtypes and by implementing a case selection algorithm that emphasized the genetic contribution to stroke risk. Following construction of paired-end sequencing libraries, all predicted human exons in each sample were captured and sequenced. Sequencing generated an average of 25.5 million read pairs (75 bp×2) and 3.8 Gbp per sample. After passing quality filters, screening the exomes against dbSNP demonstrated an average of 2839 novel SNPs among African-Americans and 1105 among Caucasians. In an aggregate analysis, 48 genes were identified to have at least one rare variant across all stroke cases. One gene, CSN3, identified by screening our prior GWAS results in conjunction with our exome results, was found to contain an interesting coding polymorphism as well as excess rare variation compared with the other genes evaluated. In conclusion, while rare coding variants may predispose to the risk of ischemic stroke, this has yet to be definitively proven. Our study demonstrates the complexities of such research and highlights that while exome data can be obtained, the optimal analytical methods have yet to be determined.
Year: 2012 PMID: 22536414 PMCID: PMC3334983 DOI: 10.1371/journal.pone.0035591
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
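As a quick consistency check on the sequencing figures quoted in the abstract, the per-sample yield follows from the read-pair count and the read length. The sketch below assumes every read is exactly 75 bp (the sequencing table also lists 76 bp reads); the function name `yield_gbp` is illustrative and not taken from the study.

```python
def yield_gbp(read_pairs: float, read_length_bp: int = 75) -> float:
    """Total sequenced bases, in Gbp, for a paired-end run."""
    # Each read pair contributes two reads of read_length_bp bases.
    return read_pairs * 2 * read_length_bp / 1e9


# Average of 25.5 million read pairs per sample, as stated in the abstract.
average_yield = yield_gbp(25.5e6)
print(f"{average_yield:.1f} Gbp per sample")
```

Under that assumption the product agrees with the 3.8 Gbp per sample reported in the abstract.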
GEOS population characteristics by case-control status.
| Characteristic | Cases (n = 889) | Controls (n = 927) | P value |
| Age (mean ± SD, years) | 41.3±6.9 | 39.6±6.8 | <0.001 |
| Female (%) | 41.5 | 43.6 | 0.37 |
| Self-reported race (%) | | | 0.22 |
| European ancestry | 52.42 | 56.42 | |
| African ancestry | 42.41 | 38.51 | |
| Others | 5.17 | 5.07 | |
| TOAST Subtype (%) | | --- | --- |
| Cardioembolic | 20.0 | | |
| Large Artery | 7.1 | | |
| Lacunar | 16.1 | | |
| Other Known Causes | 6.5 | | |
| Undetermined Causes | 50.3 | | |
| Hypertension (%) | 42.7 | 19.2 | <0.001 |
| Diabetes mellitus (%) | 16.7 | 5.1 | <0.001 |
| Angina/MI (%) | 5.3 | 0.7 | <0.001 |
| Current smokers (%) | 42.5 | 28.6 | <0.001 |
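The case-control comparisons in this table can be roughly reproduced from the percentages and group sizes alone. The record does not say which statistical test the authors used, so the pooled two-proportion z-test below is an assumption, as is treating all 889 cases and 927 controls as having complete data. The function name is hypothetical.

```python
import math


def two_proportion_p(pct1: float, n1: int, pct2: float, n2: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    p1, p2 = pct1 / 100, pct2 / 100
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


N_CASES, N_CONTROLS = 889, 927  # group sizes from the table header

p_female = two_proportion_p(41.5, N_CASES, 43.6, N_CONTROLS)
p_hypertension = two_proportion_p(42.7, N_CASES, 19.2, N_CONTROLS)

print(f"Female: p = {p_female:.2f}")
print(f"Hypertension: p < 0.001 is {p_hypertension < 0.001}")
```

Agreement with the tabulated values would only show that the figures are internally consistent; it does not establish that this is the test the authors applied.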
Characteristics of the 10 male stroke cases included in the pilot study.
| Study-ID | Ethnicity | Age | Positive Family History | Hypertension | Diabetes | Smoking | TOAST Stroke Type |
| M-006 | African American | 46 | No | No | No | Former | Dissection |
| M-0114 | African American | 48 | Yes | Yes | Yes | Never | Dissection |
| M-0379 | African American | 44 | Yes | No | No | Never | Cryptogenic |
| M-0432 | African American | 42 | Yes | No | No | Current | Lacunar |
| M-0823 | African American | 41 | Yes | Yes | No | Never | Cardioembolic |
| M-1012 | African American | 45 | Yes | No | No | Never | Lacunar |
| M-1096 | African American | 42 | Yes | No | No | Never | Cryptogenic |
| M-1107 | African American | 47 | Yes | Yes | No | Never | Lacunar |
| SW-393 | Caucasian | 57 | Yes | Yes | No | Never | Lacunar |
| SW-708 | Caucasian | 61 | Yes | Yes | No | Former | Lacunar |
Figure 1. Summary of the pipeline steps.
Sequencing and Alignment Statistics for Ten Stroke Exomes.
| Sample ID | M006 | M0114 | M0379 | M0432 | M0823 | M1012 | M1096 | M1107 | SW393 | SW708 |
| Total Reads (75/76 bp) | 43,863,768 | 37,155,698 | 40,247,747 | 39,894,677 | 34,450,161 | 41,907,790 | 39,676,183 | 34,271,695 | 40,443,932 | 44,516,721 |
| Mapping % | 98.18% | 96.64% | 98.10% | 95.69% | 97.41% | 96.58% | 96.49% | 97.34% | 98.30% | 98.48% |
| Good pairs % | 97.50% | 96.01% | 97.12% | 94.90% | 96.90% | 95.89% | 95.79% | 96.92% | 97.75% | 98.08% |
| Singletons % | 0.55% | 0.53% | 0.66% | 0.74% | 0.45% | 0.65% | 0.65% | 0.39% | 0.49% | 0.32% |
| % reads in targeted regions | 49.74 | 53.92 | 54.10 | 54.94 | 61.36 | 59.61 | 55.54 | 53.34 | 49.97 | 54.65 |
| % reads in targeted regions +/−100 bp | 64.55 | 66.54 | 65.77 | 61.80 | 69.39 | 67.09 | 61.96 | 60.48 | 60.39 | 65.07 |
| % reads in targeted regions +/−200 bp | 71.14 | 72.78 | 71.21 | 66.80 | 74.93 | 72.46 | 66.88 | 65.18 | 65.18 | 69.84 |
| Overall coverage of target region | 42.6× | 38.58× | 42.56× | 41.45× | 41.02× | 39.62× | 42.37× | 43.01× | 43.65× | 43.2× |
| % of targeted baits with >20× coverage | 76.84 | 75.26 | 79.22 | 78.20 | 78.56 | 77.13 | 78.25 | 77.98 | 78.67 | 76.83 |
| % of targeted baits with >30× coverage | 64.37 | 57.81 | 65.88 | 64.07 | 63.60 | 61.15 | 65.06 | 64.98 | 66.37 | 65.29 |
| % of baits with no reads aligning | 0.76 | 0.56 | 0.44 | 0.62 | 0.70 | 0.71 | 0.67 | 0.72 | 0.49 | 0.95 |
SNP Summary for SNPs in target exome regions only.
| SAMPLE ID | M006 | M0114 | M0379 | M0432 | M0823 | M1012 | M1096 | M1107 | SW393 | SW708 |
| | 31,990 | 31,097 | 31,756 | 31,255 | 32,368 | 31,385 | 30,942 | 31,595 | 26,472 | 26,942 |
| | 27,364 | 26,887 | 27,506 | 26,260 | 27,139 | 26,198 | 26,045 | 26,797 | 24,116 | 24,509 |
| | 28,103 | 26,844 | 28,205 | 26,668 | 28,022 | 26,787 | 26,449 | 27,574 | 23,262 | 23,697 |
| | 25,097 | 24,486 | 25,370 | 23,899 | 24,935 | 23,830 | 23,674 | 24,651 | 22,182 | 22,567 |
| Novel SNPs (not in dbSNP) | 3,006 | 2,358 | 2,835 | 2,769 | 3,087 | 2,957 | 2,775 | 2,923 | 1,080 | 1,130 |
| | 8,313 | 6,033 | 6,583 | 4,750 | 4,909 | 5,193 | 4,737 | 4,823 | 5,214 | 5,599 |
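The row of per-sample counts beginning 3,006 appears to be the source of the novel-SNP averages quoted in the abstract (2839 among African-Americans, 1105 among Caucasians). The sketch below recomputes those averages. The grouping of samples by ethnicity is taken from the case characteristics table, where the M-prefixed cases are African American and the SW-prefixed cases are Caucasian. The variable names are illustrative.

```python
from statistics import mean

# Per-sample counts copied from the row beginning 3,006 in the SNP summary.
novel_snps = {
    "M006": 3006, "M0114": 2358, "M0379": 2835, "M0432": 2769,
    "M0823": 3087, "M1012": 2957, "M1096": 2775, "M1107": 2923,
    "SW393": 1080, "SW708": 1130,
}

# SW-prefixed samples are the two Caucasian cases; the rest are African American.
african_american = [n for sample, n in novel_snps.items() if sample.startswith("M")]
caucasian = [n for sample, n in novel_snps.items() if sample.startswith("SW")]

aa_mean = mean(african_american)
caucasian_mean = mean(caucasian)

print(f"African American mean: {aa_mean:.0f}")
print(f"Caucasian mean: {caucasian_mean:.0f}")
```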
Figure 2. SNPs in the 10 case exomes vs. 3 control populations.
Select Gene Analyses: 10 Exome Cases vs. 3 Control Populations.
| Exon bases | 10 exome snps | 10 exome rs | 10 exome rs (%) | 10 exome novel | ASW snps | YRI snps | CEU snps | Shared ASW-YRI | Shared ASW-10exome | Shared YRI-10exome | Shared ASW-YRI-10exome | Shared CEU-ASW | Shared CEU-YRI | Shared CEU-10exome | Shared All | Non-Common SNP density | |
| 162,201 | 122,052 | 75.2 | 40,149 | 60,197 | 80,180 | 72,800 | 56,102 | 50,586 | 62,066 | 49,853 | 63,931 | 58,131 | 47,762 | 41,936 | ||
| |||||||||||||||||
| 1312 |
| 7 | 70.0 | 3 | 4 | 5 | 5 | 4 | 3 | 4 | 3 | 4 | 5 | 4 |
| 8622 | 35 | 28 | 80.0 | 7 | 10 | 17 | 17 | 10 | 10 | 16 | 10 | 10 | 16 | 15 | 10 | 2.9 |
| 3512 | 20 | 20 | 100 | 0 | 19 | 18 | 12 | 18 | 16 | 15 | 15 | 12 | 12 | 11 | 11 | 2.7 |
| 2025 | 46 | 46 | 100 | 0 | 6 | 16 | 13 | 6 | 5 | 14 | 5 | 6 | 13 | 11 | 5 | 20.3 |
| 2300 | 11 | 10 | 90.9 | 1 | 3 | 7 | 5 | 3 | 3 | 7 | 3 | 2 | 5 | 5 | 2 | 3.9 |
| 9798 | 25 | 15 | 60.0 | 10 | 2 | 10 | 7 | 2 | 2 | 9 | 2 | 2 | 4 | 3 | 2 | 2.4 |
| 4582 | 8 | 4 | 50.0 | 4 | 3 | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 1.3 |
| 4332 | 17 | 14 | 82.4 | 3 | 8 | 13 | 15 | 8 | 5 | 9 | 5 | 8 | 13 | 9 | 5 | 2.8 |
| 2821 | 7 | 3 | 42.9 | 4 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 2.1 |
| |||||||||||||||||
| 9915 | 31 | 28 | 90.3 | 3 | 10 | 14 | 13 | 9 | 9 | 12 | 9 | 8 | 11 | 11 | 8 | 2.3 |
| 1782 | 2 | 2 | 100 | 0 | 2 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1.1 |
| 4976 | 8 | 7 | 87.5 | 1 | 8 | 8 | 9 | 6 | 4 | 4 | 3 | 7 | 6 | 4 | 3 | 1.0 |
| 13243 | 48 | 42 | 87.5 | 6 | 34 | 36 | 44 | 26 | 22 | 28 | 21 | 31 | 28 | 27 | 20 | 2.1 |
| 11020 | 41 | 37 | 90.2 | 4 | 28 | 39 | 34 | 27 | 25 | 33 | 25 | 26 | 33 | 30 | 23 | 1.6 |
| 6226 | 21 | 17 | 81.0 | 4 | 10 | 12 | 12 | 9 | 8 | 9 | 8 | 10 | 11 | 9 | 8 | 2.1 |
| 1189 | 5 | 5 | 100 | 0 | 3 | 3 | 4 | 2 | 3 | 3 | 2 | 3 | 3 | 4 | 2 | 2.5 |
| 4666 | 19 | 18 | 94.7 | 1 | 10 | 12 | 13 | 10 | 10 | 12 | 10 | 7 | 9 | 10 | 7 | 2.6 |
| 1277 | 2 | 2 | 100 | 0 | 3 | 2 | 3 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 0.0 |
| 8896 | 22 | 22 | 100 | 0 | 9 | 12 | 11 | 7 | 6 | 7 | 5 | 7 | 7 | 6 | 3 | 2.1 |
| 3054 | 8 | 6 | 75.0 | 2 | 5 | 7 | 7 | 5 | 3 | 3 | 3 | 3 | 4 | 4 | 3 | 1.6 |
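The three "10 exome" derived columns in this table appear to follow from two raw counts: the percentage column is the share of SNPs carrying a dbSNP rs identifier, and the novel column is the remainder. That reading of the column headers is an assumption, checked below against a handful of rows copied from the table. The helper name `derive` is illustrative.

```python
# (total SNPs, SNPs with an rs ID, reported rs %, reported novel count),
# copied from rows of the table above.
rows = [
    (162201, 122052, 75.2, 40149),
    (35, 28, 80.0, 7),
    (25, 15, 60.0, 10),
    (17, 14, 82.4, 3),
    (7, 3, 42.9, 4),
]


def derive(total: int, with_rs: int) -> tuple[float, int]:
    """Return (percent with an rs ID, novel count) from the two raw counts."""
    return 100 * with_rs / total, total - with_rs


for total, with_rs, pct_reported, novel_reported in rows:
    pct, novel = derive(total, with_rs)
    # Reported percentages are rounded to one decimal place.
    consistent = abs(pct - pct_reported) <= 0.05 and novel == novel_reported
    print(f"{total:>7} SNPs: {pct:.1f}% with rs ID, {novel} novel, consistent={consistent}")
```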
Genes (isoforms) in which every case had at least two novel variants.
| Sample ID | ||||||||||
| Gene/Isoform | M006 | M0114 | M0379 | M0432 | M0823 | M1012 | M1096 | M1107 | SW393 | SW708 |
| OR4C3/NM_001004702 | 6 | 4 | 4 | 4 | 3 | 5 | 4 | 10 | 4 | 5 |
| CTBP2/NM_001083914 | 8 | 7 | 5 | 10 | 7 | 8 | 9 | 7 | 7 | 9 |
| FRG2C/NM_001124759 | 7 | 6 | 8 | 7 | 5 | 8 | 6 | 8 | 11 | 7 |
| AQP7/NM_001170 | 6 | 8 | 11 | 8 | 7 | 6 | 8 | 4 | 6 | 7 |
| CTBP2/NM_001329 | 8 | 7 | 5 | 10 | 7 | 8 | 9 | 7 | 7 | 9 |
| ARSD/NM_001669 | 10 | 9 | 9 | 9 | 9 | 10 | 7 | 11 | 11 | 11 |
| HLA-A/NM_002116 | 6 | 5 | 7 | 7 | 8 | 6 | 2 | 4 | 3 | 5 |
| ARSD/NM_009589 | 9 | 9 | 9 | 9 | 9 | 10 | 7 | 11 | 11 | 11 |
| CTBP2/NM_022802 | 6 | 6 | 3 | 7 | 5 | 7 | 7 | 6 | 5 | 8 |
Figure 3. Compound heterozygotes in exome data for genes in which every case had at least two novel variants in the same gene isoform; only 6 genes (9 total isoforms) satisfied this criterion.
The figure illustrates the 9 isoforms (left y-axis) per sample (x-axis) by variant type (left y-axis). Notably, two of the CTBP2 isoforms (NM_001329 and NM_001083914) were seen to have a nonsense codon occurring in all 10 samples.
Figure 4. The start region of the CTBP2 gene (chromosome 10) in the Integrative Genomics Viewer.
The top three tracks show aligned sequence coverage for three of the pilot exomes and demonstrate the T to A substitution resulting in the nonsense codon as compared to the two bottom tracks showing the annotated RefSeq and codon positions.