Melanie Bahlo, Rick Tankard, Vesna Lukic, Karen L Oliver, Katherine R Smith.
Abstract
High-throughput sequencing (HTS) studies have been highly successful in identifying the genetic causes of human disease, particularly of diseases following Mendelian inheritance. To date, many HTS studies have been performed without using the family relationships available between samples. Here, we discuss the many merits and occasional pitfalls of using identity-by-descent (IBD) information in conjunction with HTS studies. These methods are applicable not only to family studies but also to cohorts of apparently unrelated, 'sporadic' cases and to small families that are underpowered for linkage analysis; they also allow relationships between individuals to be inferred. Incorporating familial/pedigree information not only provides powerful options for filtering the extensive variant lists that HTS usually produces, but also enables valuable quality control checks and gives insight into the genetic model and into the genotypic status of individuals of interest. In particular, these methods are valuable in challenging discovery scenarios in HTS analysis, such as studies of populations that are poorly represented in the variant databases typically used for filtering, and studies with poor-quality HTS data.
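The abstract notes that relationship information supports quality control checks. As one simple illustration (a generic check, not the authors' method), a claimed parent-offspring pair can be tested by counting SNPs at which the two individuals are opposite homozygotes (IBS0): because a parent always transmits one allele to the child, such sites should arise essentially only from genotyping errors. The genotype coding, the function names and the 1% tolerance below are hypothetical choices for this sketch.

```python
from typing import Optional, Sequence

# Genotypes coded as alternate-allele counts (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def ibs0_fraction(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> float:
    """Fraction of jointly called SNPs at which the two individuals are
    opposite homozygotes (0 vs 2), i.e. share no allele identical by state."""
    called = 0
    opposite = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # ignore sites missing in either individual
        called += 1
        if abs(a - b) == 2:
            opposite += 1
    if called == 0:
        raise ValueError("no jointly called SNPs")
    return opposite / called


def consistent_parent_offspring(parent: Sequence[Genotype],
                                child: Sequence[Genotype],
                                max_ibs0: float = 0.01) -> bool:
    """A parent always transmits one allele to the child, so IBS0 sites should
    be (near) absent; the 1% tolerance for genotype errors is a hypothetical choice."""
    return ibs0_fraction(parent, child) <= max_ibs0


# Toy genotypes (made up for illustration, not study data).
parent = [0, 1, 2, 1, 0, 2, 1, None]
child = [1, 1, 1, 2, 0, 2, 0, 1]
unrelated = [2, 1, 0, 0, 2, 0, 1, 1]

print(consistent_parent_offspring(parent, child))      # True: no IBS0 sites
print(consistent_parent_offspring(parent, unrelated))  # False: 4 of 7 sites are IBS0
```

In practice, dedicated tools estimate the full IBD sharing proportions genome-wide; the IBS0 count is only the simplest signal they build on.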
Year: 2014 PMID: 25129038 PMCID: PMC4185103 DOI: 10.1007/s00439-014-1479-4
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
Fig. 1 Identification of a causal variant in a family study: (a) prior to the advent of HTS, using linkage analysis and Sanger sequencing; (b) at the advent of HTS, sequencing only a proband, or one or two individuals from a family, and ignoring the relationships between them, which leads to lengthy lists of putative causal variants; and (c) in mature HTS analysis, where family information is incorporated by using genotypes derived from the HTS data to perform a linkage or IBD analysis, which leads to a much shorter list of variants
Fig. 2 Dual workflow for HTS data analysis that takes advantage of HapMap SNPs for linkage mapping (a) and then uses this information in the filtering steps (b)
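Panel (b) of Fig. 2 uses the linkage/IBD results to filter the variant list. A minimal sketch of that step, assuming the shared regions are already available as (chromosome, start, end) intervals with inclusive 1-based coordinates, keeps only variants that fall inside one of them. The `Variant` type, the function names and the coordinate convention are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive, 1-based (assumed)


class Variant(NamedTuple):
    chrom: str
    pos: int
    name: str


def merge_regions(regions: List[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group regions by chromosome, sort them and merge overlapping intervals."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out: List[Tuple[int, int]] = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], end))  # extend previous interval
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def filter_to_regions(variants: List[Variant], regions: List[Region]) -> List[Variant]:
    """Keep only variants that lie inside a linkage/IBD region."""
    merged = merge_regions(regions)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    kept: List[Variant] = []
    for v in variants:
        intervals = merged.get(v.chrom)
        if not intervals:
            continue  # no shared region on this chromosome
        # Merged intervals are disjoint and sorted, so only the last one
        # starting at or before v.pos can contain it.
        i = bisect_right(starts[v.chrom], v.pos) - 1
        if i >= 0 and v.pos <= intervals[i][1]:
            kept.append(v)
    return kept


# Toy regions and variants (hypothetical).
regions = [("2", 1_000_000, 5_000_000), ("2", 4_500_000, 6_000_000), ("7", 100, 200)]
variants = [Variant("2", 999_999, "v1"), Variant("2", 5_500_000, "v2"),
            Variant("7", 150, "v3"), Variant("X", 150, "v4"), Variant("7", 201, "v5")]
print([v.name for v in filter_to_regions(variants, regions)])  # ['v2', 'v3']
```

Merging first keeps the lookup a single binary search per variant, which matters when filtering tens of thousands of exome variants.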
HapMap SNPs available for linkage and IBD analysis, selected by requiring ≥10× average coverage of the targeted regions in at least 50% of samples
| Population | HapMap SNPs | SNPs covered | Covered (% of HapMap SNPs) | Covered SNPs with heterozygosity > 0.3 | Heterozygosity > 0.3 (% of covered SNPs) |
|---|---|---|---|---|---|
| CEU_2 | 3801563 | 153388 | 4.03 | 50231 | 32.75 |
| CHB_2 | 3827537 | 153484 | 4.01 | 47498 | 30.95 |
| JPT_2 | 3827726 | 153689 | 4.02 | 47351 | 30.81 |
| YRI_2 | 3750555 | 152239 | 4.06 | 51241 | 33.66 |
| CEU_3 | 1520715 | 88970 | 5.85 | 39174 | 44.03 |
| ASW_3 | 1463106 | 83303 | 5.69 | 41279 | 49.55 |
| CHB_3 | 1519591 | 88653 | 5.83 | 36806 | 41.52 |
| CHD_3 | 1246085 | 69281 | 5.56 | 35887 | 51.80 |
| GIH_3 | 1337706 | 74005 | 5.53 | 39370 | 53.20 |
| JPT_3 | 1518437 | 88455 | 5.83 | 36720 | 41.51 |
| LWK_3 | 1440446 | 83045 | 5.77 | 39271 | 47.29 |
| MEX_3 | 1380212 | 79198 | 5.74 | 38910 | 49.13 |
| MKK_3 | 1451099 | 81916 | 5.65 | 40321 | 49.22 |
| TSI_3 | 1347642 | 74645 | 5.54 | 38875 | 52.08 |
| YRI_3 | 1520811 | 89382 | 5.88 | 39684 | 44.40 |
SNP counts are given for the four HapMap Phase II populations (suffix _2) and the eleven HapMap Phase III populations (suffix _3). Data are based on the analysis of 20 whole-exome sequencing (WES) samples captured with Agilent V5 + UTR and sequenced on an Illumina HiSeq 2000 at the Australian Genome Research Facility, Melbourne, Australia.
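The table's selection criteria and its two percentage columns can be sketched as follows. This is a hypothetical illustration: it assumes per-sample read depths are available at each SNP, interprets the coverage rule as depth ≥10× at the SNP in at least half of the samples, and assumes "heterozygosity" is the expected heterozygosity 2p(1-p) computed from a HapMap allele frequency p. The last call reproduces the CEU_3 row's percentages from its counts.

```python
from typing import Dict, List, Sequence, Tuple

MIN_DEPTH = 10         # >=10x read depth (from the table caption)
MIN_SAMPLE_FRAC = 0.5  # ...in at least 50% of samples
MIN_HET = 0.3          # heterozygosity threshold used in the table


def expected_het(p: float) -> float:
    """Expected heterozygosity 2p(1-p) of a biallelic SNP (assumed meaning of 'Het')."""
    return 2.0 * p * (1.0 - p)


def is_covered(depths: Sequence[int]) -> bool:
    """True if the SNP reaches MIN_DEPTH in at least MIN_SAMPLE_FRAC of the samples."""
    n_ok = sum(d >= MIN_DEPTH for d in depths)
    return n_ok >= MIN_SAMPLE_FRAC * len(depths)


def select_snps(snps: Dict[str, Tuple[Sequence[int], float]]) -> Tuple[List[str], List[str]]:
    """snps maps a SNP id to (per-sample depths, HapMap allele frequency).
    Returns (covered SNPs, covered SNPs with heterozygosity > MIN_HET)."""
    covered = [sid for sid, (depths, _) in snps.items() if is_covered(depths)]
    informative = [sid for sid in covered if expected_het(snps[sid][1]) > MIN_HET]
    return covered, informative


def table_row(available: int, covered: int, informative: int) -> Tuple[float, float]:
    """The table's two percentage columns, rounded to two decimals."""
    return (round(100 * covered / available, 2),
            round(100 * informative / covered, 2))


# Toy input with four samples (hypothetical IDs and values).
toy = {
    "snpA": ([12, 15, 3, 20], 0.50),  # 3/4 samples >=10x, heterozygosity 0.50
    "snpB": ([5, 8, 30, 2], 0.40),    # only 1/4 samples >=10x
    "snpC": ([10, 10, 0, 0], 0.05),   # 2/4 samples >=10x, heterozygosity 0.095
}
print(select_snps(toy))                  # (['snpA', 'snpC'], ['snpA'])
print(table_row(1520715, 88970, 39174))  # (5.85, 44.03), the CEU_3 row
```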