Tejasvi S Niranjan, Cindy Skinner, Melanie May, Tychele Turner, Rebecca Rose, Roger Stevenson, Charles E Schwartz, Tao Wang.
Abstract
X-linked Intellectual Disability (XLID) is a group of genetically heterogeneous disorders caused by mutations in genes on the X chromosome. Deleterious mutations in ~10% of X chromosome genes are implicated in causing XLID disorders in ~50% of known and suspected XLID families. The remaining XLID genes are expected to be rare and even private to individual families. To systematically identify these XLID genes, we sequenced the X chromosome exome (X-exome) in 56 well-established XLID families (a single affected male from 30 families and two affected males from 26 families) using an Agilent SureSelect X-exome kit and the Illumina HiSeq 2000 platform. To enrich for disease-causing mutations, we first utilized variant filters based on dbSNP, the male-restricted portions of the 1000 Genomes Project, or the Exome Variant Server datasets. However, these databases present limitations as automatic filters for enrichment of XLID genes. We therefore developed and optimized a strategy that uses a cohort of affected male kindred pairs and an additional small cohort of affected unrelated males to enrich for potentially pathological variants and to remove neutral variants. This strategy, which we refer to as Affected Kindred/Cross-Cohort Analysis, achieves a substantial enrichment for potentially pathological variants in known XLID genes compared to variant filters from public reference databases, and it has identified novel XLID candidate genes. We conclude that Affected Kindred/Cross-Cohort Analysis can effectively enrich for disease-causing genes in rare Mendelian disorders, and that public reference databases can be used effectively, but cautiously, as automatic filters for X-linked disorders.
Year: 2015 PMID: 25679214 PMCID: PMC4332666 DOI: 10.1371/journal.pone.0116454
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. XLID Cohort for X Chromosome Exome Sequencing.
| Relationship of Samples | Number (Pairs) |
|---|---|
| Affected Sporadic Cases | 30 |
| Affected Pairs | 52 (26) |
| Brothers | 44 (22) |
| Maternal Half-Brothers | 4 (2) |
| Maternal Male First Cousins | 2 (1) |
| Uncle-Nephew | 2 (1) |
All samples are from individuals diagnosed with X-linked intellectual disability (XLID). Criteria for X-linkage are described in Materials and Methods.
Fig 1. Relatedness between Study Samples.
The estimated relatedness of samples is calculated by comparing the percentage of shared variants between two samples. The vertical axis shows the percentage of shared variants between two samples. The horizontal axis shows individual samples (n = 82) in this study cohort. Sporadic cases are samples that are not related to any other sample (left) while kindred pairs are two related affected males (right) (Table 1). The color-coded alphanumeric labels designate individual samples along the X-axis. There are 30 sporadic cases [1–9, 0, a–t; black labels] and 52 kindred samples (26 pairs) [u–z, A–Z, symbols; colored labels]. Box and whisker plots indicate overall identity between a proband and all other samples in the cohort. Identity ≈ (2 × number of variants identical between both samples) / (sum of variants of both samples). Sporadic cases generally share low identity with other samples. Paired kindred generally show the highest identity with each other. Paired kindred are juxtaposed with each other with the same color on the X-axis to simplify visualization of relationships. Outlier labels located above the hollow plots indicate the identifier for the sample that shares the highest identity, which is consistent with the known family relationship.
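The identity measure in the Fig 1 caption can be illustrated with a short sketch. This is not the authors' code: the representation of a variant as a `(position, ref, alt)` tuple, the function names, and the toy data are assumptions made only for illustration.

```python
def identity(variants_a, variants_b):
    """Identity ~ (2 * shared variants) / (variants in A + variants in B)."""
    a, b = set(variants_a), set(variants_b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # no variants in either sample; identity is undefined, report 0
    return 2 * len(a & b) / total


def best_match(sample_id, cohort):
    """Return the id of the other sample that shares the highest identity."""
    others = [s for s in cohort if s != sample_id]
    return max(others, key=lambda s: identity(cohort[sample_id], cohort[s]))


# Toy cohort (hypothetical data): "u1" and "u2" stand in for a kindred pair.
cohort = {
    "u1": {(100, "A", "G"), (200, "C", "T"), (300, "G", "A")},
    "u2": {(100, "A", "G"), (200, "C", "T"), (400, "T", "C")},
    "s1": {(500, "A", "C"), (200, "C", "T")},
}
```

With this toy data, `best_match("u1", cohort)` picks `"u2"`, mirroring the caption's observation that paired kindred share the highest identity with each other.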
Enrichment of Potential Pathological Variants in X-Exome of XLID Cohort with Different Variant Filters.
| Application of Variant Filters | Non-Synonymous or Splicing Variants | Other Variants | Total Variants | % Original |
|---|---|---|---|---|
| Strand and Proximity Pre-filters Only | 221.8 ± 30.8 | 724.5 ± 137.0 | 946.3 ± 167.8 | 100.0% |
| + Shared Segment Filter | 160.1 ± 65.0 | 511.3 ± 222.1 | 671.4 ± 287.1 | 71.0% |
| + [1000G] Male-162 Internal Exome Filter | 62.5 ± 19.9 | 645.6 ± 129.2 | 708.1 ± 149.1 | 74.8% |
| + Exome Variant Server (Male Only) Filter | 18.8 ± 4.6 | 545.9 ± 108.8 | 564.7 ± 113.4 | 59.7% |
| + “Non-clinical” dbSNP Filter | 11.9 ± 5.4 | 48.8 ± 18.5 | 60.7 ± 23.9 | 6.4% |
| + Affected Kindred/Cross-Cohort Filter | 7.5 ± 2.4 | 25.4 ± 2.2 | 32.9 ± 4.6 | 3.5% |
| All Filters | 2.1 ± 1.7 | 12.1 ± 10.0 | 14.2 ± 11.7 | 1.5% |
Average number of variants remaining per sample after sequential or aggregate filtering steps.
1 Strand and Proximity Pre-Filters are applied universally on top of all other filters. The percent of variants remaining after a particular filter is relative to the variant output after application of the Strand and Proximity Pre-Filters and is provided in column 5.
2 Shared Segment Filter: for demonstration purposes, results of this filter are provided separately from the rest of the Affected Kindred/Cross-Cohort Filter.
3 [1000G] Male-162 Internal Exome Filter: removes variants from the XLID cohort shared in common with 162 males from the 1000 Genomes.
4 Exome Variant Server (Male Only) Filter: removes variants from the XLID cohort shared in common with variants of the male fraction of EVS.
5 “Non-Clinical” dbSNP Filter: removes variants from the XLID cohort that are present in dbSNP Build 137 after known, probable, or potentially pathological variants have been excluded from dbSNP.
6 Affected Kindred/Cross-Cohort Filter: results exclude the Shared Segment Filter component (see Row 2).
7 All filters, including re-introduction of known rare pathological variants (from dbSNP) that are inappropriately eliminated by the Affected Kindred/Cross-Cohort Filter.
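As footnote 1 states, the "% Original" column is each row's total relative to the total remaining after the Strand and Proximity Pre-Filters. A minimal sketch of that arithmetic follows; the dictionary keys and the function name are hypothetical labels, while the numbers are the mean totals from the table above.

```python
# Mean total variants per sample, taken from the "Total Variants" column.
BASELINE = 946.3  # Strand and Proximity Pre-filters Only

totals = {
    "shared_segment": 671.4,
    "1000g_male_162": 708.1,
    "evs_male_only": 564.7,
    "non_clinical_dbsnp": 60.7,
    "affected_kindred_cross_cohort": 32.9,
    "all_filters": 14.2,
}


def percent_of_baseline(total, baseline=BASELINE):
    """Percent of pre-filtered variants that remain, rounded to one decimal."""
    return round(100 * total / baseline, 1)


percent_remaining = {name: percent_of_baseline(t) for name, t in totals.items()}
```

Recomputing the column this way is a quick consistency check on the table values.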
Fig 2. Shared Segment Filter and Error Reduction by Strand/Proximity Pre-Filter.
The Shared Segment Filter (component of the Affected Kindred/Cross-Cohort Filter) retains chromosomal segments shared as Identical by Descent between two related samples in the XLID cohort. In this example, Panels A and B each reflect the same kindred pair, two brothers. The X-axis is position along the X chromosome exome. The Y-axis indicates the allelic status of a given variant for both siblings. Each point in the graph is a variant site for at least one sample. R|R allelic status indicates that the given point (genomic site) matches the reference sequence (hg19) in both samples (both samples are wildtype). A|A allelic status indicates the given point (variant site) is alternate to hg19 in both samples (both samples are hemizygous mutant). A|R allelic status indicates the given point matches reference in one sample and is alternate in the kindred sample (the samples are genotypically discordant). The orange blocks delineate chromosomal segments devoid of A|R points. All sequence in that segment is Identical by Descent between the two samples. The Shared Segment Filter retains variants (A|A) within the orange block. Panel A shows variant allele status in the Shared Segment Filter prior to the application of the strand- and proximity-based pre-filters. With the exception of the rare de novo mutation, there should be no discordant (A|R) variants within the orange block. Such variants are likely erroneous. Panel B shows the Shared Segment Filter after application of the strand- and proximity-based pre-filters. The A|R variants previously present in the orange block are eliminated, reflecting a reduction in erroneous variant calls as a result of these pre-filters.
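The logic described in the Fig 2 caption can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the caption only says that shared blocks are segments devoid of A|R points, so the `min_sites` threshold for calling a block, the function name, and the input layout are all assumptions.

```python
from itertools import groupby


def shared_segment_filter(sites, min_sites=3):
    """Retain A|A variants that lie in runs of sites free of A|R calls.

    sites: list of (position, status) tuples sorted by position, where status
    is "A|A", "A|R", or "R|R" for the kindred pair.
    min_sites: hypothetical minimum number of concordant sites needed to treat
    a run as a shared (Identical by Descent) block.
    """
    retained = []
    # Consecutive sites are grouped by whether they are discordant (A|R).
    for discordant, run in groupby(sites, key=lambda s: s[1] == "A|R"):
        run = list(run)
        if discordant or len(run) < min_sites:
            continue  # discordant site, or a concordant run too short to trust
        retained.extend(pos for pos, status in run if status == "A|A")
    return retained


# Toy example (hypothetical positions along the X-exome).
sites = [
    (100, "A|A"), (200, "R|R"), (300, "A|A"),
    (400, "A|R"),
    (500, "A|A"),
    (600, "A|R"),
    (700, "A|A"), (800, "A|A"), (900, "R|R"),
]
```

Here the isolated A|A call at position 500 is dropped because it sits between two discordant sites, while the A|A calls inside the two longer concordant runs are kept.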
Ambiguous Variant Calls in the Public 1000 Genomes Variant Dataset.
| Sex | Chromosome | Heterozygous (Variant Call = 1) | Homozygous (Variant Call = 2) | Ambiguous (Variant Call = Others) |
|---|---|---|---|---|
| Males | X Chromosome | 39.49% | 51.55% | 8.96% |
| Females | X Chromosome | 90.54% | 9.46% | 0% |
| Males | Autosomes | 92.34% | 7.66% | 0% |
| Females | Autosomes | 92.34% | 7.66% | 0% |
Variant call = 1: Percent of variant alleles present as one copy in a sample (heterozygous state). Variant call = 2: Percent of variant alleles present as two copies in a sample (homozygous state). Variant call = Others: Percent of variant alleles present in copies other than 1 or 2, including non-integer counts. All values are evaluated exclusively from coding sequence variants for the respective chromosomes and sexes. Only the male X chromosome dataset possesses ambiguous genotypes. All variants were obtained from the 1000 Genomes variant dataset, pre-separated by chromosome [Integrated Phase 1, version 3: 20101123].
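The three categories defined above can be expressed as a small classifier. This is a sketch under assumptions: the input is taken to be a list of alternate-allele copy counts per sample genotype, calls with zero copies are treated as non-variant and skipped, and the function names are hypothetical.

```python
from collections import Counter

CATEGORIES = ("heterozygous", "homozygous", "ambiguous")


def classify_call(alt_copies):
    """Map an alternate-allele copy count to the table's genotype category."""
    if alt_copies == 1:
        return "heterozygous"
    if alt_copies == 2:
        return "homozygous"
    return "ambiguous"  # any other count, including non-integer values


def call_percentages(alt_copy_counts):
    """Percent of variant calls in each category (zero-copy calls skipped)."""
    variant_calls = [c for c in alt_copy_counts if c != 0]
    counts = Counter(classify_call(c) for c in variant_calls)
    n = len(variant_calls)
    if n == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    return {cat: 100 * counts[cat] / n for cat in CATEGORIES}
```

For example, `call_percentages([1, 1, 2, 0.5, 0])` yields 50% heterozygous, 25% homozygous, and 25% ambiguous, with the non-integer count of 0.5 falling in the ambiguous class.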
Fig 3. Schematics of Variant Calling and Affected Kindred/Cross-Cohort Analysis.
Panel A: Illumina FASTQ sequenced read files are aligned to the human reference genome (hg19) using bowtie2, followed by removal of PCR duplicates, read group adjustment, InDel realignment, and base recalibration. Variant calling is conducted using the Unified Genotyper (parameters provided in ). Variant calling is conducted in parallel on all alignments. Panel B: The Affected Kindred/Cross-Cohort Filter makes use of known relatedness. Unshared variants between related samples are removed. Shared variants between unrelated samples are removed. Shared variants between related samples are retained. The Affected Kindred/Cross-Cohort Filter accommodates the possibility that the absence of a variant in a related sample may be due to insufficient coverage or variant quality in the related sample. All retained variants are subsequently run through the Shared Segment Filter.
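The Panel B rules can be sketched as set operations. This is a simplified illustration rather than the authors' implementation: the function name, the representation of variants as hashable keys, the `relative_covered` argument used to model adequate coverage or quality in the relative, and the treatment of sporadic cases (no relative available) are all assumptions.

```python
def kindred_cross_cohort_filter(proband, relative, unrelated, relative_covered=None):
    """Apply the Panel B rules to one proband's variants.

    proband: set of variants called in the proband.
    relative: set of variants called in the affected relative, or None for a
        sporadic case (assumption: only the cross-cohort rule applies then).
    unrelated: iterable of variant sets from unrelated cohort samples.
    relative_covered: variants at which the relative has adequate coverage and
        quality; None means the relative is treated as covered everywhere.
    """
    seen_in_unrelated = set().union(*unrelated)
    kept = set()
    for variant in proband:
        if variant in seen_in_unrelated:
            continue  # shared between unrelated samples: removed
        if relative is None or variant in relative:
            kept.add(variant)  # shared between related samples: retained
        elif relative_covered is not None and variant not in relative_covered:
            kept.add(variant)  # absence may reflect poor coverage: retained
        # otherwise: confidently absent in the relative, so removed
    return kept


# Toy example with hypothetical variant labels.
proband = {"a", "b", "c", "d"}
relative = {"a", "b"}
unrelated = [{"b"}, {"x"}]
relative_covered = {"a", "b", "c"}
```

In this toy example `"a"` survives because both relatives carry it, `"b"` is removed because an unrelated sample also carries it, `"c"` is removed because the relative is well covered there yet lacks it, and `"d"` survives because the relative has no usable coverage at that site.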
Fig 4. Schematic of Variant Reduction Using a Combined Filter.
The Combined Filter sequentially applies all the filters described in this study. Vertical colored bars reflect relative changes in the content of the variant pool after each filter step. Horizontal colored bars reflect rejected variants upon each filter step. The Strand and Proximity Pre-Filters are applied universally. Then the Affected Kindred/Cross-Cohort Filter (with Shared Segment Filter) is applied. The rejected variant pool in this step primarily eliminates neutral variants. Nonetheless, this rejected pool of variants is assessed for co-occurrence with rare dbSNP variants with known pathological function. Rejected variants that positively co-occur in the Rare Clinical Variants dataset are re-introduced (thin red arrow). Database-dependent filters are sequentially applied. Red bars reflect potential XLID variants that may be of functional interest. Green bars reflect variants that are likely sequencing errors. Blue bars reflect variants that are likely neutral in XLID etiology.
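The order of operations in the Combined Filter, including the re-introduction step, can be sketched with sets. The function and argument names are hypothetical, and each filter is reduced to a precomputed set, which is an assumption made to keep the sketch short.

```python
def combined_filter(prefiltered, kindred_pass, rare_clinical, database_filters):
    """Sketch of the Combined Filter's sequence of steps.

    prefiltered: variants left after the Strand and Proximity Pre-Filters.
    kindred_pass: variants passing the Affected Kindred/Cross-Cohort Filter
        (with Shared Segment Filter).
    rare_clinical: rare dbSNP variants with known pathological function.
    database_filters: list of variant sets from the database-dependent
        filters, applied sequentially.
    """
    kept = prefiltered & kindred_pass
    rejected = prefiltered - kindred_pass
    # Rejected variants that co-occur in the rare clinical set are re-introduced.
    kept |= rejected & rare_clinical
    for database_variants in database_filters:
        kept -= database_variants
    return kept


# Toy example with hypothetical integer variant ids.
result = combined_filter(
    prefiltered={1, 2, 3, 4, 5},
    kindred_pass={1, 2, 3},
    rare_clinical={5},
    database_filters=[{2}, {3, 4}],
)
```

Here variant 5 is rejected by the kindred step but rescued because it appears in the rare clinical set, so the final pool is `{1, 5}`.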
Identification of Known and Potentially Novel Genes for XLID Using X Chromosome Exome Sequencing and Affected Kindred/Cross-Cohort Analysis.
| Name | Abbrev. | Map | Known or Predicted Function | XLID Gene | No. of Mutations | Mutation Nomenclature | SIFT Prediction | PolyPhen-2 Prediction | Variant Segregates with Disease in Family | Primary Isoform | Diseases or Phenotype | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C4H2 domain-containing zinc finger | ZC4H2 | Xq11.2 | Zinc finger transcription factor | | 3 | IVS+5G>A; p.L66H; p.R190W | NA; Damaging; Damaging | NA; Prob. Damaging; Prob. Damaging | Yes; Yes; Yes | NM_001178033 | Wieacker-Wolfe Syndrome | Hirata et al, 2013 |
| Alpha thalassemia / mental retardation syndrome X-linked | ATRX | Xq21.1 | ATP-dependent helicase; chromosome remodeling | Yes | 2 | p.R37X; p.S1606N | Known Damaging; Damaging | Known Damaging; Prob. Damaging | Yes; Yes | NM_000489 | XLID with alpha thalassemia | Gibbons et al, 1995 |
| Ubiquitin-conjugating enzyme E2A | UBE2A | Xq24 | Ubiquitin-conjugating enzyme | Yes | 2 | p.P68R; p.Q110E | Damaging; Tolerated | Prob. Damaging; Prob. Damaging | Yes; Yes | NM_003336 | XLID, Nascimento-type | Nascimento et al, 2006 |
| Filamin A | FLNA | Xq28 | Actin-binding protein; cytoskeletal reorganization | Yes | 2 | p.G1576R; p.F2228L | Damaging low conf; Tolerated | Prob. Damaging; Prob. Damaging | Yes | NM_001110556 | Multiple congenital malformation syndromes | Fox et al, 1998 |
| Transcription Initiation TFIID Subunit 1 | TAF1 | Xq13.1 | Initiation of transcription by RNA Polymerase II and cell cycle control | | 2 | p.M21L; p.Q1428P | Tolerated; Damaging | Benign; Prob. Damaging | Yes | NM_138923 | | |
| Methyl CpG-Binding Protein 2 | MECP2 | Xq28 | Chromatin-based transcriptional regulation | Yes | 1 | p.K268E | Damaging | Prob. Damaging | Yes | NM_001110792 | Rett Syndrome, XLID | Amir et al, 1999; Schule et al, 2008; Orrico et al, 2000 |
| Host cell factor C1 | HCFC1 | Xq28 | Cell cycle control | Yes | 1 | p.G342S | Damaging | Prob. Damaging | Yes | NM_005334 | XLID with methylmalonic acidemia | Huang et al, 2012; Yu et al, 2013 |
| Zinc finger protein 711 | ZNF711 | Xq21.1 | Zinc finger transcription factor | Yes | 1 | p.N601S | Damaging | Prob. Damaging | Yes | NM_021998 | X-linked intellectual disability | Tarpey et al, 2009 |
| Rho Guanine Nucleotide Exchange Factor 9 | ARHGEF9 | Xq11.1-q11.2 | Brain-specific regulation of glycine and GABA receptor clusters | Yes | 1 | p.R236W | Damaging | Prob. Damaging | Yes | NM_001173480 | XLID, Epileptic encephalopathy | Shimojima et al, 2011; Marco et al, 2008 |
| E3 Ubiquitin Ligase | HUWE1 | Xp11.22 | Degradation of proteins involved in apoptosis and DNA maintenance | Yes | 1 | p.R4187H | Tolerated | Prob. Damaging | Yes | NM_031407 | XLID, Turner-type | Turner et al, 1994 |
| Ephrin B1 | EFNB1 | Xq13.1 | Ligand of Eph-related receptor tyrosine kinases | | 1 | p.G290R | Tolerated | Prob. Damaging | Yes | NM_004429 | Craniofrontonasal Syndrome | Wieland et al, 2004; Twigg et al, 2004, 2013 |
| Plexin A3 | PLXNA3 | Xq28 | Semaphorin receptor; cytoskeletal remodeling | | 1 | p.V1304M | Tolerated | Benign | TBD | NM_017514 | | |
| Ring finger protein 128 | RNF128 | Xq22.3 | E3 Ubiquitin protein ligase | | 1 | p.R12H | Damaging | Prob. Damaging | TBD | NM_194463 | | |
| Prickle homolog 3 | PRICKLE3 | Xp11.23 | LIM domain-containing protein | | 1 | p.R175C | Damaging | Prob. Damaging | Yes | NM_006150 | | |
| Zinc finger, RNA-binding motif and serine/arginine rich 2 | ZRSR2 | Xp22.2 | Essential splicing factor | | 1 | p.R440Q | Tolerated | Benign | Yes | NM_005089 | | |
| Glutamate receptor interacting protein associated protein 1 | GRIPAP1 | Xp11.23 | Interaction with AMPA receptor complex | | 1 | p.R822Q | Tolerated | Prob. Damaging | Yes | NM_020137 | | |
| O-linked N-acetylglucosamine Transferase | OGT | Xq13.1 | Post-translational glycosylation | | 1 | p.L244F | Tolerated | Prob. Damaging | Yes | NM_181673 | | |
| SRSF (Ser/Arg Splicing Factors) Protein Kinase 3 | SRPK3 | Xq28 | Homolog of SRPK1 with possible role in splicing regulation | | 1 | p.H159D | Damaging | Poss. Damaging | Yes | NM_014370 | | |