Fuyuki Miya1, Mitsuhiro Kato2, Tadashi Shiohama3, Nobuhiko Okamoto4, Shinji Saitoh5, Mami Yamasaki6, Daichi Shigemizu1, Tetsuo Abe1, Takashi Morizono1, Keith A Boroevich1, Kenjiro Kosaki7, Yonehiro Kanemura8, Tatsuhiko Tsunoda1.
Abstract
Whole-exome sequencing (WES) is a useful method for identifying disease-causing mutations; however, often no candidate mutations are identified using commonly available targeted probe sets. In a recent analysis, we likewise could not find candidate mutations for 20.9% (9/43) of our pedigrees with congenital neurological disorders using pre-designed capture probes (SureSelect V4 or V5). One possible cause for this lack of candidates is that standard WES cannot sequence all protein-coding sequences (CDS) because of capture probe design and regions of low coverage; these unsequenced regions account for approximately 10% of all CDS regions. In this study, we combined a selective circularization-based target enrichment method (HaloPlex) with a hybrid capture method (SureSelect V5; WES) and achieved more complete coverage of CDS regions (~97% of all CDS). We applied this approach to the 7 pedigrees analyzed with SureSelect V5, out of the 9 pedigrees for which standard WES analysis yielded no candidates, and identified novel pathogenic mutations in one pedigree. This effective combination of targeted enrichment methodologies can be expected to aid in the identification of novel pathogenic mutations previously missed by standard WES analysis.
Year: 2015 PMID: 25786579 PMCID: PMC4365396 DOI: 10.1038/srep09331
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
CDS coverage of each platform
| Platform | On-target region size | CDS coverage predicted by probe design | Read bases of raw NGS data | Mean depth of CDS region [median] | CDS coverage of depth ≥10 called data | CDS coverage of depth ≥15 called data |
|---|---|---|---|---|---|---|
| Agilent SureSelect V4 | 87.49 Mb | 93.88% | 6.84 Gb | 79.15 [61.78] | 90.57% | 87.56% |
| Agilent SureSelect V5 | 89.48 Mb | 96.53% | 6.38 Gb | 73.10 [65.02] | 93.85% | 91.90% |
| NimbleGen SeqCap ez Human Library v2 | 81.58 Mb | 96.08% | 18.68 Gb | 199.07 [172.00] | 93.54% | 92.98% |
| Illumina TruSeq | 100.33 Mb | 94.28% | 11.40 Gb | 61.72 [60.00] | 90.23% | 88.63% |
*Within 100 bp upstream and downstream of the capture targets;
†SureSelect V4 and V5 data indicate average of 38 and 104 samples of our experiments, respectively;
‡the original data were obtained and deposited by Clark et al.9 and were re-analyzed with our analysis pipeline. More details are shown in Supplementary Table S1.
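The two rightmost columns of the table report the fraction of CDS bases covered at a minimum read depth. As a minimal sketch of that calculation, assuming per-base depths are already available as a list of integers (the function name `coverage_at_depth` and the toy depths are hypothetical, not taken from the study's pipeline):

```python
from typing import Sequence


def coverage_at_depth(depths: Sequence[int], min_depth: int) -> float:
    """Return the fraction of CDS bases whose read depth is >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths for a tiny 10-base CDS region (not real data).
toy_depths = [0, 3, 9, 10, 12, 15, 15, 40, 80, 120]

print(f"depth >= 10: {coverage_at_depth(toy_depths, 10):.0%}")  # 7 of 10 bases
print(f"depth >= 15: {coverage_at_depth(toy_depths, 15):.0%}")  # 5 of 10 bases
```

Raising the threshold from 10 to 15 can only lower the covered fraction, which matches the pattern in every row of the table.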
Figure 1. Analysis pipeline for combination of WES and CCCS.
WES was performed for all pedigrees. Complementary CDS sequencing (CCCS) was then performed in those where no candidate mutations were identified. The three steps, PCR duplication filter, adapter sequence removal and remapping using BLAT, differ between WES and CCCS analysis.
Figure 2. Sequencing of CDS regions.
(a) Read depth distribution of on-target CDS regions in WES and CCCS. The blue and orange lines indicate the proportion and the filled areas indicate cumulative frequency. (b) CDS coverage by WES and CCCS. The red shaded area indicates the bases sequenced only with CCCS. (c) Combined sequencing coverage for WES and CCCS by chromosome. Bar lengths reflect the total number of CDS bases, not chromosome length. Lime green bars and percentages indicate the proportion of RD10 (read depth ≥10) bases for each chromosome. All values are averages across the 28 tested samples.
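Panels (b) and (c) describe coverage gained by adding CCCS to WES. The sketch below illustrates one way to tabulate that gain. It assumes that a base counts as covered when either assay alone reaches the depth threshold; this rule, the function name `combined_coverage`, and the toy depths are illustrative assumptions, not the study's actual procedure.

```python
from typing import Dict, Sequence


def combined_coverage(
    wes: Sequence[int], cccs: Sequence[int], min_depth: int = 10
) -> Dict[str, float]:
    """Summarize per-base coverage for two assays over the same CDS positions.

    Assumption: a base is covered if either assay alone reaches min_depth.
    """
    if len(wes) != len(cccs) or not wes:
        raise ValueError("depth lists must be non-empty and of equal length")
    n = len(wes)
    wes_ok = [d >= min_depth for d in wes]
    cccs_ok = [d >= min_depth for d in cccs]
    cccs_only = sum(c and not w for w, c in zip(wes_ok, cccs_ok))
    either = sum(w or c for w, c in zip(wes_ok, cccs_ok))
    return {
        "wes": sum(wes_ok) / n,      # covered by WES
        "cccs_only": cccs_only / n,  # covered only by CCCS
        "combined": either / n,      # covered by at least one assay
    }


# Hypothetical depths at eight CDS positions (not real data).
wes_depths = [20, 0, 5, 30, 12, 0, 50, 8]
cccs_depths = [0, 25, 11, 0, 40, 3, 0, 9]

summary = combined_coverage(wes_depths, cccs_depths)
print(summary)
```

In this toy input, the "cccs_only" fraction plays the role of the red shaded area in panel (b), and "combined" equals "wes" plus "cccs_only".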
Figure 3. Identified mutation in a family with microcephaly.
(a) Family tree of the pedigree with microcephaly. Shaded symbols denote affected individuals. Asterisks denote individuals for whom NGS was performed. (b) Sagittal T1-weighted brain magnetic resonance image (MRI) of individual II-2 at 4 years of age, showing frontal sloping and reduced volume of the brain, particularly the frontal lobe of the cerebrum. (c) Filtering of the candidate mutations for individual II-2. The numbers in parentheses represent the number of variants called with CCCS. Overlapping variants between WES and CCCS are not excluded. The other individuals are shown in Supplementary Fig. S3. The top row shows the variant counts called by ‘WES’ and ‘WES and CCCS’. The second row shows counts after excluding known variants found in databases, except for known pathogenic mutations. The third row shows variant counts after excluding synonymous changes. Finally, the last row shows the counts of variants consistent with the phenotype in the pedigree (i.e., the total number of autosomal recessive and compound heterozygous variants). (d) The ASPM gene in the human genome. Gray and black boxes indicate exonic untranslated regions (UTR) and CDS regions, respectively. Red triangles indicate the loci of the identified mutation. The blue arrow (<<) indicates the coding direction. (e) Domains and mutations in the ASPM protein. Red pins indicate the loci of known nonsense mutations in the HGMD and ClinVar databases as of June 2014. Red triangles indicate the loci of the identified mutation. The orange pentagon, black hexagon, green hexagon and magenta oval denote calponin homology (IPR001715: InterPro ID), calmodulin-regulated spectrin-associated protein CH (IPR022613), P-loop containing nucleoside triphosphate hydrolase (IPR027417) and armadillo-type fold (IPR016024) domains, respectively. The blue oval denotes the IQ motif, EF-hand binding site (IPR000048). (f) Sanger sequencing data of the identified mutation.
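The four filtering rows described for panel (c) can be sketched as a simple cascade. This is an illustration only: the `Variant` fields, the gene names, and the inheritance check (homozygous, or at least two heterozygous variants in the same gene, in the affected individual alone) are simplifying assumptions. A real analysis would also use the genotypes of the other family members.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str
    zygosity: str           # "hom" or "het" in the affected individual
    in_database: bool       # already listed in a variant database
    known_pathogenic: bool  # listed as a known pathogenic mutation
    synonymous: bool


def filter_cascade(variants: List[Variant]) -> Tuple[List[Variant], Dict[str, int]]:
    """Apply the four rows of the filter and record the count after each."""
    counts = {"called": len(variants)}
    # Row 2: drop known database variants unless they are known pathogenic.
    kept = [v for v in variants if not v.in_database or v.known_pathogenic]
    counts["after_database_filter"] = len(kept)
    # Row 3: drop synonymous changes.
    kept = [v for v in kept if not v.synonymous]
    counts["after_synonymous_filter"] = len(kept)
    # Row 4 (simplified): homozygous, or two or more het variants in one gene.
    het_per_gene = Counter(v.gene for v in kept if v.zygosity == "het")
    kept = [v for v in kept if v.zygosity == "hom" or het_per_gene[v.gene] >= 2]
    counts["inheritance_consistent"] = len(kept)
    return kept, counts


# Hypothetical variants with placeholder gene names (not real data).
toy_variants = [
    Variant("GENE_A", "het", False, False, False),
    Variant("GENE_A", "het", False, False, False),
    Variant("GENE_B", "het", False, False, False),
    Variant("GENE_C", "hom", True, False, False),
    Variant("GENE_D", "hom", True, True, False),
    Variant("GENE_E", "hom", False, False, True),
]

candidates, counts = filter_cascade(toy_variants)
print(counts)
```

Each row can only keep or reduce the count from the row above, which is the shape the caption describes for both the ‘WES’ and ‘WES and CCCS’ call sets.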