Zubin H Patel, Leah C Kottyan, Sara Lazaro, Marc S Williams, David H Ledbetter, Hbgerard Tromp, Andrew Rupert, Mojtaba Kohram, Michael Wagner, Ammar Husami, Yaping Qian, C Alexander Valencia, Kejian Zhang, Margaret K Hostetter, John B Harley, Kenneth M Kaufman.
Abstract
Next-generation sequencing studies generate large quantities of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge for these studies is that current technologies generate sequencing artifacts. To identify and characterize the properties that distinguish false positive variants from true variants, we sequenced a child and both parents (one trio) using DNA isolated from three sources (blood, buccal cells, and saliva). The trio strategy allowed us to identify variants in the proband that could not have been inherited from the parents (Mendelian errors) and therefore most likely indicate sequencing artifacts. We examined quality control measurements and found three that identified the greatest number of Mendelian errors: read depth, genotype quality score, and alternate allele ratio. Filtering the variants on these measurements removed ~95% of the Mendelian errors while retaining 80% of the called variants. These filters were applied independently. After filtering, the concordance between identical samples isolated from different sources was 99.99%, compared with 87% before filtering. This high concordance suggests that different sources of DNA can be used in trio studies without affecting the ability to identify causative polymorphisms. To facilitate analysis of next-generation sequencing data, we developed the Cincinnati Analytical Suite for Sequencing Informatics (CASSI) to store sequencing files and metadata (e.g., relatedness information) and to handle file versioning, data filtering, variant annotation, and the identification of candidate causative polymorphisms that follow de novo, rare recessive homozygous, or compound heterozygous inheritance models.
We conclude that the data cleaning process improves the signal-to-noise ratio in terms of variants and facilitates the identification of candidate disease-causative polymorphisms.
Keywords: CASSI; Mendel errors; Mendelian errors; disease causative polymorphisms; next-generation sequencing; variant filtering; whole exome sequencing
Year: 2014 PMID: 24575121 PMCID: PMC3921572 DOI: 10.3389/fgene.2014.00016
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
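The trio check and the three quality measurements named in the abstract can be sketched in Python as follows. This is a minimal illustration and not the authors' CASSI code: the `Call` record, the function names, and every threshold value are hypothetical (the abstract does not state the cut-offs), and genotypes are assumed to be biallelic with alleles coded 0 (reference) and 1 (alternate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One genotype call (hypothetical record, not a CASSI type)."""
    genotype: tuple   # pair of alleles, e.g. (0, 1)
    depth: int        # read depth at the site
    gq: float         # genotype quality score
    alt_ratio: float  # fraction of reads supporting the alternate allele


# Illustrative thresholds only; the abstract does not give the real values.
MIN_DEPTH = 15
MIN_GQ = 20.0
HET_ALT_RATIO = (0.3, 0.7)


def is_mendelian_error(child, father, mother):
    """True if no pairing of one paternal and one maternal allele
    reproduces the child's genotype."""
    target = sorted(child)
    return not any(sorted((f, m)) == target for f in father for m in mother)


def failed_filters(call):
    """Evaluate each filter on its own and return the names of those that fail."""
    failed = []
    if call.depth < MIN_DEPTH:
        failed.append("read_depth")
    if call.gq < MIN_GQ:
        failed.append("genotype_quality")
    lo, hi = HET_ALT_RATIO
    # Only heterozygous calls are checked against the allele-ratio window here.
    if sorted(call.genotype) == [0, 1] and not lo <= call.alt_ratio <= hi:
        failed.append("alt_allele_ratio")
    return failed
```

A heterozygous child call with two homozygous-reference parents, `is_mendelian_error((0, 1), (0, 0), (0, 0))`, is flagged as a Mendelian error. Returning the list of failed filters rather than a single boolean keeps each measurement's contribution visible.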
Sequencing quality parameters for all three individuals in the blood, buccal, and saliva trios.
| Sample | Percentage of reads with GQS > 30 (%) | Mean GQS | Percentage of targeted sequence covered (%) | Mean read depth |
|---|---|---|---|---|
| Blood – proband | 84.19 | 33.46 | 97.07 | 167 |
| Blood – father | 83.58 | 33.25 | 96.46 | 146 |
| Blood – mother | 84.57 | 33.57 | 94.69 | 150 |
| Buccal – proband | 84.12 | 33.4 | 95.89 | 90 |
| Buccal – father | 84.82 | 33.63 | 96.73 | 155 |
| Buccal – mother | 85.04 | 33.71 | 97.35 | 136 |
| Saliva – proband | 83.38 | 33.17 | 95.95 | 106 |
| Saliva – father | 84.21 | 33.46 | 95.93 | 154 |
| Saliva – mother | 84.31 | 33.48 | 96.3 | 132 |
Concordance analysis: DNA from three individuals was collected from three biological sources and sequenced.
| DNA source | Sample | Concordance rate, no filter applied (%) | Concordance rate, all filters applied, not including variants that are unique to a single DNA source (%) | Concordance rate, all filters applied, including variants that are unique to a single DNA source (%) |
|---|---|---|---|---|
| Blood vs. buccal | Individual 1 | 96.23 | 99.99 | 84.06 |
| | Individual 2 | 96.61 | 99.99 | 84.71 |
| | Individual 3 | 96.62 | 99.98 | 84.48 |
| Blood vs. saliva | Individual 1 | 96.30 | 99.99 | 84.22 |
| | Individual 2 | 96.53 | 99.99 | 84.42 |
| | Individual 3 | 96.50 | 99.99 | 83.79 |
| Buccal vs. saliva | Individual 1 | 96.27 | 99.99 | 84.14 |
| | Individual 2 | 96.86 | 99.98 | 85.05 |
| | Individual 3 | 96.73 | 99.99 | 84.22 |
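One plausible way to compute a pairwise concordance rate of this kind is sketched below. The exact definition the authors used is not given in this record, so the formula here is an assumption: matching genotypes divided by the sites compared, where the sites compared are either those called in both sources or those called in at least one. The function name and the site-key layout are hypothetical.

```python
def concordance(calls_a, calls_b, include_unique=False):
    """Percentage of compared sites with identical genotypes.

    calls_a, calls_b: dicts mapping a site key, e.g. (chrom, pos, ref, alt),
    to a genotype string such as "0/1".
    include_unique: if True, sites called in only one source count as
    discordant; if False, only sites called in both sources are compared.
    """
    shared = calls_a.keys() & calls_b.keys()
    matches = sum(1 for site in shared if calls_a[site] == calls_b[site])
    if include_unique:
        total = len(calls_a.keys() | calls_b.keys())
    else:
        total = len(shared)
    return 100.0 * matches / total if total else float("nan")


# Toy data: two shared sites that agree, plus one site unique to each source.
blood = {("1", 100, "A", "G"): "0/1", ("1", 200, "C", "T"): "1/1", ("2", 50, "G", "A"): "0/1"}
buccal = {("1", 100, "A", "G"): "0/1", ("1", 200, "C", "T"): "1/1", ("3", 75, "T", "C"): "0/1"}

print(concordance(blood, buccal))                       # shared sites only
print(concordance(blood, buccal, include_unique=True))  # unique sites count against
```

On the toy data the first call gives 100.0 and the second 50.0, the same direction of difference seen between the last two columns of the table.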
DNA from the proband (child) was collected from three biological sources and sequenced.
| DNA source | Data | Unique compared to blood | Unique compared to buccal | Unique compared to saliva | Unique compared to the other two sources |
|---|---|---|---|---|---|
| Blood | Unfiltered | – | 2636 | 1997 | 1095 |
| | Filtered | – | 10 | 4 | 2 |
| Buccal | Unfiltered | 1267 | – | 1437 | 535 |
| | Filtered | 0 | – | 2 | 0 |
| Saliva | Unfiltered | 1268 | 1438 | – | 669 |
| | Filtered | 1 | 0 | – | 0 |
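Counts of this kind can be expressed as set differences. The sketch below assumes each source's variant calls are held as a Python set of site keys and that "unique compared to the other two sources" means present in one source and absent from both others; the function name and key layout are hypothetical.

```python
def unique_counts(source, other_a, other_b):
    """Count variants in `source` that are missing from each other source,
    and from both other sources at once."""
    return {
        "vs_a": len(source - other_a),
        "vs_b": len(source - other_b),
        "vs_both": len(source - (other_a | other_b)),
    }


# Toy variant sets keyed by (chrom, pos, alt).
blood = {("1", 100, "G"), ("1", 200, "T"), ("2", 50, "A"), ("4", 10, "C")}
buccal = {("1", 100, "G"), ("1", 200, "T")}
saliva = {("1", 100, "G"), ("2", 50, "A")}

print(unique_counts(blood, buccal, saliva))
```

Because a variant absent from both other sources is also absent from each one individually, the last count can never exceed either of the first two, which is consistent with every row of the table.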
Candidate causative sequence variants were identified in unfiltered and filtered data from the same trio that was sequenced three times using different DNA sources.
| Data | DNA source | De novo: called | De novo: unique to a single DNA source | De novo: common to all DNA sources | Recessive homozygous: called | Recessive homozygous: unique to a single DNA source | Recessive homozygous: common to all DNA sources | Compound heterozygous: called | Compound heterozygous: unique to a single DNA source | Compound heterozygous: common to all DNA sources |
|---|---|---|---|---|---|---|---|---|---|---|
| Unfiltered | Blood | 321 | 228 | 12 | 32 | 3 | 23 | 242 | 47 | 153 |
| | Buccal | 306 | 219 | 12 | 36 | 12 | 23 | 285 | 79 | 153 |
| | Saliva | 304 | 230 | 12 | 28 | 4 | 23 | 284 | 80 | 153 |
| Filtered | Blood | 1 | 0 | 0 | 3 | 0 | 3 | 47 | 21 | 17 |
| | Buccal | 0 | 0 | 0 | 3 | 0 | 3 | 39 | 11 | 17 |
| | Saliva | 1 | 0 | 0 | 3 | 0 | 3 | 45 | 16 | 17 |
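The three inheritance models in the table can be illustrated with a simplified trio classifier. This is a sketch under stated assumptions and not CASSI's implementation: genotypes are biallelic sorted tuples, the rarity (population frequency) check implied by "rare recessive" is omitted, and the function names and the per-gene input layout are hypothetical.

```python
from collections import defaultdict

HOM_REF, HET, HOM_ALT = (0, 0), (0, 1), (1, 1)


def classify_site(child, father, mother):
    """Label a single site as de novo or recessive homozygous, else None."""
    if child == HET and father == HOM_REF and mother == HOM_REF:
        return "de_novo"
    if child == HOM_ALT and father == HET and mother == HET:
        return "recessive_homozygous"
    return None


def compound_het_genes(variants):
    """Genes in which the child carries at least one heterozygous variant
    inherited only from the father and at least one only from the mother.

    variants: iterable of (gene, child, father, mother) genotype tuples.
    """
    from_father = defaultdict(int)
    from_mother = defaultdict(int)
    for gene, child, father, mother in variants:
        if child != HET:
            continue
        if father == HET and mother == HOM_REF:
            from_father[gene] += 1
        elif mother == HET and father == HOM_REF:
            from_mother[gene] += 1
    return sorted(gene for gene in from_father if from_mother.get(gene, 0) > 0)
```

For example, a gene with one paternal-only and one maternal-only heterozygous variant in the child is reported by `compound_het_genes`, while a gene with two paternal-only variants is not, since both would sit on the same inherited haplotype.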