
CATCHing putative causative variants in consanguineous families.

Federico Andrea Santoni, Periklis Makrythanasis, Stylianos E Antonarakis.

Abstract

BACKGROUND: Consanguinity is an important risk factor for autosomal recessive (AR) disorders. Extended genomic regions identical by descent (IBD) in the offspring of consanguineous parents give rise to recessive disorders with identical (homozygous) pathogenic variants in both alleles. However, many clinical phenotypes presenting in the offspring of consanguineous couples are still of unknown etiology. Advances in high-throughput sequencing now provide an excellent opportunity to achieve a molecular diagnosis or to identify novel candidate genes.
RESULTS: To exploit all available information from the family structure we developed CATCH, an algorithm that combines genotyped SNPs of all family members for the optimal detection of Runs Of Homozygosity (ROH) and exome sequencing data from one affected individual to identify putative causative variants in consanguineous families.
CONCLUSIONS: CATCH proved to be effective in discovering known or putative new causative variants in 43 out of 50 consanguineous families. Among them, novel variants causative of familial thrombocytopenia and sclerosing bone dysplasia were identified, as well as the first homozygous loss-of-function mutation in FGFR3 in humans, causing severe skeletal deformities, tall stature and hearing impairment.

Year:  2015        PMID: 26415661      PMCID: PMC4587650          DOI: 10.1186/s12859-015-0727-5

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

The investigation of the molecular basis of monogenic disorders has succeeded in identifying thousands of pathogenic variants in protein-coding genes that cause these disorders. There are, however, thousands of additional (near) Mendelian phenotypes for which the molecular genetics is still unknown. Indeed, the rarity of many such disorders, the lack of statistical power due to the non-availability of large families, locus heterogeneity, and the limitations of sequencing technologies have hindered the search for “Mendelian” pathogenic variants. Nevertheless, extended genomic regions identical by descent (IBD) in the offspring of consanguineous matings give rise to recessive disorders with identical (homozygous) pathogenic variants in both alleles. Consanguinity is practiced in a large proportion of human populations; rates reach 20-50 % in much of the Mediterranean basin [1]. Therefore, in a consanguineous family, the power to find the unknown causative gene is increased. The typical two-step approach is first to identify extended homozygous genomic regions (ROH, Runs of Homozygosity) by genotyping all available family members with SNP arrays. Putative candidate regions are then the ROHs that are shared among all affected individuals. Second, the causative variant is discovered by Sanger sequencing the genes inside the candidate regions. This slow and laborious task can now be replaced by Whole Exome Sequencing (WES) of one of the affected individuals. Indeed, it has recently been shown that combining SNP arrays and WES data is a successful approach to the identification of causative variants in homozygosity [2]. Some attempts have been made to extract ROHs from WES data alone, but the accuracy of these methods has proven to be suboptimal compared with the use of SNP arrays [3].
In the future, Whole Genome Sequencing (WGS) will provide both the variants and a more accurate ROH estimation than WES-based approaches but, at the moment, this procedure is far from cost-effective. In order to integrate WES sensitivity with the optimal delineation of ROHs by SNP arrays in a comprehensive computational tool, we developed CATCH (Consanguinity Analysis Through Common Homozygosity). The algorithm recognizes affected-specific ROHs from SNP array data and, inside these selected ROHs, identifies putative candidate genes by integrating the exome-sequenced and annotated variants of one affected individual per consanguineous family.

Implementation

Input

CATCH takes as input: 1) the variants packaged in the standard Variant Call Format (VCF) for one affected individual of the family; 2) a PED formatted file (http://pngu.mgh.harvard.edu/~purcell/plink/) describing the pedigree structure and the genotypes of all informative members of the family; and 3) ROH (Runs Of Homozygosity) regions as calculated by PLINK from the PED file and SNP array data. In this study, we used the HumanOmniExpress Bead Chip by Illumina Inc. (San Diego, CA) to genotype all family members. This SNP array tests 720 K SNPs with a mean distance of 4 kb between the SNPs. We defined homozygous regions as regions with 50 consecutive homozygous SNPs. The exome was captured using SureSelect Human All Exons. Sequencing was performed with the Illumina HiSeq2000 and raw reads were aligned with BWA [4]. Variant calling was performed with SAMtools [5] and Pindel [6].
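The ROH definition above (50 consecutive homozygous SNPs) can be illustrated with a minimal sketch. It is not PLINK's sliding-window algorithm and not CATCH's code: the function name `find_roh`, the two-character genotype strings, and the choice to let a missing call ("00") break a run are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_roh(genotypes: List[str], min_snps: int = 50) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) SNP index pairs for every run of at
    least `min_snps` consecutive homozygous genotypes.

    `genotypes` holds one two-character call per SNP in chromosomal order,
    e.g. "AA", "AG", or "00" for a missing call (hypothetical encoding).
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, gt in enumerate(genotypes):
        # Assumption: a missing call is treated like a heterozygous one.
        homozygous = len(gt) == 2 and gt[0] == gt[1] and gt[0] != "0"
        if homozygous:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


# Toy example with a lowered threshold so the runs are easy to see.
toy = ["AA"] * 3 + ["AG"] + ["CC"] * 5 + ["00"] + ["TT"] * 2
toy_runs = find_roh(toy, min_snps=3)
```

With the array density quoted above (a mean spacing of 4 kb), a run of 50 SNPs corresponds to roughly 200 kb, although the threshold is applied to SNP counts rather than to physical length.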

Data processing

CATCH makes use of Annovar [7] to annotate sequenced variants. It then discards variants that are neither exonic nor splicing, as well as synonymous variants, heterozygous variants, and variants that are frequent in the general population (variants with MAF < 2 % in 1000 Genomes are retained [www.1000genomes.org; the results presented here were obtained with the April 2012 release]). Furthermore, CATCH does not consider variants that are in duplicated regions or exceedingly strand biased (i.e., 0 reads supporting the alternative allele on one strand). For each selected variant found in the genome of the sequenced (affected) individual, CATCH retrieves the ROH containing it and calculates the overlap with the ROHs of the other affected family members (if available) and the intersection with the respective ROHs of all remaining unaffected individuals of the family. If an overlap is found, to exclude the possibility that the regions are merely identical by state (IBS), CATCH additionally considers the SNPs in the ROH surrounding the variant and evaluates their concordance with the haplotypes of all family members, allowing for 1 % mismatch (Fig. 1). An important exception is when the ROH of the unaffected individual is smaller than the overlapping ROH of the affected individual. In this case affected and unaffected individuals may be identical by state (IBS) for that haplotype block but the origin of the haplotype is actually different. In general, haplotype size depends on age: shorter haplotypes are older and longer ones are younger [8]. Therefore, long and younger haplotypes could include a recent, deleterious variant that can be transmitted to the affected individuals along with its entire haplotype block in homozygosity through the inbreeding loops [9]. Unaffected individuals may inherit one copy of this haplotype and one copy of the older one, thus being IBS for the smaller haplotype. We found an example of such a variant in the gene VLDLR [10].
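The variant-level filters listed in this paragraph can be sketched as a single predicate. This is only an illustration under stated assumptions: the `Variant` record, its field names, and the category labels ("exonic", "splicing", "synonymous") are hypothetical and do not reproduce Annovar's output columns or CATCH's internal data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Variant:
    # All field names are hypothetical, chosen for this sketch only.
    gene: str
    region: str                 # e.g. "exonic", "splicing", "intronic"
    effect: str                 # e.g. "nonsynonymous", "synonymous"
    homozygous: bool
    maf: float                  # population allele frequency, 0.0 if unseen
    in_duplicated_region: bool
    alt_reads_forward: int      # reads supporting the ALT allele, + strand
    alt_reads_reverse: int      # reads supporting the ALT allele, - strand


def passes_filters(v: Variant, max_maf: float = 0.02) -> bool:
    """Return True if the variant survives every filter described in the text."""
    if v.region not in ("exonic", "splicing"):
        return False            # neither exonic nor splicing
    if v.effect == "synonymous":
        return False            # synonymous change
    if not v.homozygous:
        return False            # heterozygous call
    if v.maf >= max_maf:
        return False            # only variants with MAF < 2 % are retained
    if v.in_duplicated_region:
        return False            # lies in a duplicated region
    if v.alt_reads_forward == 0 or v.alt_reads_reverse == 0:
        return False            # strand bias: ALT seen on one strand only
    return True


# Usage: start from a variant that passes and flip one property at a time.
candidate = Variant("GENE1", "exonic", "nonsynonymous", True, 0.001, False, 12, 9)
kept = passes_filters(candidate)
dropped_as_common = passes_filters(replace(candidate, maf=0.05))
```

Keeping each filter as a separate early return makes it straightforward to log which criterion rejected a variant, which may help when a known pathogenic variant unexpectedly disappears from the output.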
Fig. 1

Schematic showing the CATCH strategy for the identification of putative causative variants. Variants provided by a standard variant calling pipeline (e.g. BWA + SAMtools or GATK) are annotated by Annovar and filtered according to user preferences. ROHs are calculated from SNP array data by PLINK for all available affected and unaffected family members. CATCH classifies every variant according to its presence/absence in ROHs as depicted in the figure. Green and red areas represent affected (A) and unaffected (U) ROHs, respectively.

In summary, each homozygous variant is assigned to one of the following classes:
Class1 (Putative): neither overlap with ROH regions nor IBD has been detected with unaffected individuals.
Class2 (Common): IBD with some unaffected individual has been detected.
Class3 (Inside): the ROH of the affected individual is longer than the overlapping ROH of the unaffected individual (IBS).
The output is provided as a comma-separated plain text file containing the annotated variants and the class assigned to each by CATCH.
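One possible reading of the three classes is sketched below. It is an interpretation of the description in this section, not CATCH's actual implementation: the names `UnaffectedRelative`, `concordant` and `classify`, the base-pair interval representation, and the use of pre-aligned allele strings for the SNPs shared with the affected ROH are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # (start, end) in base pairs on one chromosome


@dataclass
class UnaffectedRelative:
    roh: Optional[Region]  # ROH covering the variant position, or None
    alleles: str           # alleles at the SNPs shared with the affected ROH


def overlaps(a: Region, b: Region) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def concordant(a: str, b: str, max_mismatch: float = 0.01) -> bool:
    """Haplotypes agree if at most 1 % of the compared SNPs differ."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches <= max_mismatch * min(len(a), len(b))


def classify(affected_roh: Region, affected_alleles: str,
             unaffected: List[UnaffectedRelative]) -> str:
    inside = False
    affected_len = affected_roh[1] - affected_roh[0]
    for rel in unaffected:
        if rel.roh is None or not overlaps(affected_roh, rel.roh):
            continue  # no homozygous overlap with this relative
        if not concordant(affected_alleles, rel.alleles):
            continue  # homozygous, but for a different haplotype
        if rel.roh[1] - rel.roh[0] < affected_len:
            inside = True  # shorter (older) block: treated as IBS
        else:
            return "Class2 (Common)"  # same haplotype shared by an unaffected
    return "Class3 (Inside)" if inside else "Class1 (Putative)"


# Usage with toy coordinates and a ten-SNP haplotype.
aff_roh, aff_hap = (1000, 9000), "ACGTACGTAC"
label_none = classify(aff_roh, aff_hap, [])
label_common = classify(aff_roh, aff_hap, [UnaffectedRelative((500, 9500), aff_hap)])
label_inside = classify(aff_roh, aff_hap, [UnaffectedRelative((3000, 5000), aff_hap)])
```

In this sketch a single concordant unaffected relative with an equally long or longer ROH is enough to label the variant as Common, which mirrors the wording "IBD with some unaffected individual" above.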

Ethics approval

The study was approved by the Bioethics Committee of the University Hospitals of Geneva (Protocol number: CER 11–036).

Results

As its first application, CATCH was applied to samples collected from 50 consanguineous families suggestive of AR inheritance and presenting a wide spectrum of AR phenotypes [10]. Briefly, all samples were genotyped with a dense SNP array (HumanOmniExpress Bead Chip by Illumina) to identify Runs of Homozygosity, and exome sequencing on the Illumina HiSeq2000 was performed on one affected individual per family. Prior to CATCH, raw fastq files were processed through a custom pipeline composed of BWA [4], samtools [5] rmdup, and (i) samtools mpileup for the detection of Single Nucleotide Variants (SNVs) and (ii) Pindel [6] for the detection of insertions and deletions. All tools were run with default parameters. On average, 21,719 variants were identified per patient. ROHs were calculated by PLINK as stretches of 50 consecutive homozygous SNPs irrespective of the total length of the genomic region, allowing for one mismatch. We considered this a reasonable trade-off between catching a significant amount of ROH (Additional file 1: Figure S1) and limiting the number of small IBS regions that are common to all individuals. Only relatively frequent SNPs (MAF > 0.3) were included in the analysis. The ROHs were further defined as genomic regions demarcated by the first encountered heterozygous SNPs flanking each established homozygous region. The variants that CATCH reported as belonging to Class 1 (Putative) or Class 3 (Inside) were ranked according to the following criteria: 1) pathogenic variants: known pathogenic variant or variant in a known pathogenic gene according to the phenotype; 2) strong candidate variants: variant in a gene likely involved in the pathology according to supporting literature data; 3) Variant of Unknown Significance - VUS: variant predicted to be pathogenic but in a gene not known to be related to the phenotype (Additional file 1: Figure S1).
For strong candidate variants, we combined information about any known function of the gene and the gene’s family, data coming from animal models or other in vitro experiments, and tissue expression. Functional validation and further investigations of the clinical relevance of these variants are still ongoing.
In 18 families, CATCH clearly identified the pathogenic variant in known disease-causing genes (Class 1 - DMP1, ARFGEF, FKTN, SEPSECS, GUCY2D, BBS4, SYNE1, POMGNT, MTFMT, TACO1, PYGM, PRX, TUSC3, STRA6, ALDH3A2, RNASET2, MMP2 and Class 3 - VLDLR). Detailed information about the variants is reported in Additional file 2: Table S1. In 5 families, strong candidates were identified in genes functionally related to the phenotype and, in a further 22 families, variants of predicted pathogenicity according to SIFT [11], PolyPhen [12] and Mutation Taster [13] were labeled as VUS. In 5 families, no reasonable candidates or VUS were identified. All discovered variants and the predicted segregations were further validated with conventional sequencing. Overall, CATCH suggested at least one causative variant in 36 % of families, which represents a substantial improvement in the ability to diagnose recessively inherited disorders in consanguineous families [14].
In three additional studies, CATCH discovered the causative variants associated with three different genetic diseases. A highly consanguineous family from Northern Iraq presented in several members with familial thrombocytopenia with small-sized platelets. CATCH identified one homozygous pathogenic variant in FYB [15], a gene encoding a cytosolic adaptor molecule that is expressed by T, natural killer (NK) and myeloid cells and platelets, is involved in platelet activation, and controls the expression of interleukin-2. Knock-out mice were reported to show isolated thrombocytopenia.
Two sisters from a consanguineous Lebanese family were previously reported as presenting a new atypical form of sclerosing bone dysplasia [16]. CATCH identified a potential causative variant in the gene DMP1, a transcriptional activator of osteoblast-specific genes such as alkaline phosphatase and osteocalcin [17], already associated with Autosomal Recessive Hypophosphatemic Rickets (ARHR) [18]. The variant causes the loss of a highly conserved signal sequence of 16 amino acids, resulting in a complete absence of secretion of the protein and its retention within the cells. The diagnosis was corrected accordingly, demonstrating the importance of this approach in the delineation of the molecular basis of rare diseases, especially when the clinical presentation is unclear.
Two affected brothers born to first-cousin parents originating from Egypt presented with severe skeletal deformities, tall stature and hearing impairment. CATCH identified the first homozygous (predicted) loss-of-function mutation in FGFR3 in humans [19]. This gene is one of many physiological regulators of linear bone growth and normally functions as an inhibitor, acting negatively on both proliferation and terminal differentiation of growth plate chondrocytes [20]. Before this finding, all pathogenic FGFR3 mutations in humans were associated with constitutive FGFR3 activation, which impairs endochondral bone growth.

Conclusions

The use of whole exome sequencing to detect causative variants in homozygosity is effective when combined with segregation data in a familial context. Highly consanguineous relatives share several long Runs Of Homozygosity and thus bear a large number of potential causative variants. Additional exome sequencing of non-affected relatives would dramatically reduce the number of false positives. However, the same result may be obtained at a considerably lower cost by genotyping these individuals and restricting exome sequencing to only one affected patient. CATCH is the first computational tool that processes ROH, genotyping and exome sequencing data in an integrated way. It is convenient and efficient, needing less than 5 min to analyze a nuclear family after annotation. It is written in Python and can run on a standard computer with a reasonable amount of RAM (>1 GB). CATCH is released as a Linux executable.

Availability of the software

Project name: CATCH
Project home page: http://seaseq.unige.ch/~fsantoni/CATCH
Operating system(s): Linux
Programming language: Python
Other requirements: Python 2.6 or higher
License: GNU GPL
Any restrictions to use by non-academics: license needed

Consent to publish

All patients and/or parents provided their written informed consent for the analyses performed and for the publication of the results.

Availability of supporting data

All the variants mentioned in this study have been submitted to LOVD (http://databases.lovd.nl/whole_genome/genes).

References (20 in total; 10 listed)

1.  Long runs of homozygosity are enriched for deleterious variation.

Authors:  Zachary A Szpiech; Jishu Xu; Trevor J Pemberton; Weiping Peng; Sebastian Zöllner; Noah A Rosenberg; Jun Z Li
Journal:  Am J Hum Genet       Date:  2013-06-06       Impact factor: 11.025

2.  Autozygosity mapping with exome sequence data.

Authors:  Ian M Carr; Sanjeev Bhaskar; James O'Sullivan; Mohammed A Aldahmesh; Hanan E Shamseldin; Alexander F Markham; David T Bonthron; Graeme Black; Fowzan S Alkuraya
Journal:  Hum Mutat       Date:  2012-10-22       Impact factor: 4.878

3.  A novel homozygous mutation in FGFR3 causes tall stature, severe lateral tibial deviation, scoliosis, hearing impairment, camptodactyly, and arachnodactyly.

Authors:  Periklis Makrythanasis; Samia Temtamy; Mona S Aglan; Ghada A Otaify; Hanan Hamamy; Stylianos E Antonarakis
Journal:  Hum Mutat       Date:  2014-06-28       Impact factor: 4.878

4.  Diagnostic exome sequencing to elucidate the genetic basis of likely recessive disorders in consanguineous families.

Authors:  Periklis Makrythanasis; Mari Nelis; Federico A Santoni; Michel Guipponi; Anne Vannier; Frédérique Béna; Stefania Gimelli; Elisavet Stathaki; Samia Temtamy; André Mégarbané; Amira Masri; Mona S Aglan; Maha S Zaki; Armand Bottani; Siv Fokstuen; Lorraine Gwanmesia; Konstantinos Aliferis; Mariana Bustamante Eduardo; Georgios Stamoulis; Stavroula Psoni; Sofia Kitsiou-Tzeli; Helen Fryssira; Emmanouil Kanavakis; Nasir Al-Allawi; Abdelaziz Sefiani; Sana' Al Hait; Siham C Elalaoui; Nadine Jalkh; Lihadh Al-Gazali; Fatma Al-Jasmi; Habiba Chaabouni Bouhamed; Ebtesam Abdalla; David N Cooper; Hanan Hamamy; Stylianos E Antonarakis
Journal:  Hum Mutat       Date:  2014-08-18       Impact factor: 4.878

5.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.

Authors:  Kai Wang; Mingyao Li; Hakon Hakonarson
Journal:  Nucleic Acids Res       Date:  2010-07-03       Impact factor: 16.971

6.  Human haplotype block sizes are negatively correlated with recombination rates.

Authors:  Tiffany A Greenwood; Brinda K Rana; Nicholas J Schork
Journal:  Genome Res       Date:  2004-07       Impact factor: 9.043

7.  Clinical whole-exome sequencing for the diagnosis of mendelian disorders.

Authors:  Yaping Yang; Donna M Muzny; Jeffrey G Reid; Matthew N Bainbridge; Alecia Willis; Patricia A Ward; Alicia Braxton; Joke Beuten; Fan Xia; Zhiyv Niu; Matthew Hardison; Richard Person; Mir Reza Bekheirnia; Magalie S Leduc; Amelia Kirby; Peter Pham; Jennifer Scull; Min Wang; Yan Ding; Sharon E Plon; James R Lupski; Arthur L Beaudet; Richard A Gibbs; Christine M Eng
Journal:  N Engl J Med       Date:  2013-10-02       Impact factor: 91.245

8.  Exome sequencing identifies CCDC8 mutations in 3-M syndrome, suggesting that CCDC8 contributes in a pathway with CUL7 and OBSL1 to control human growth.

Authors:  Dan Hanson; Philip G Murray; James O'Sullivan; Jill Urquhart; Sarah Daly; Sanjeev S Bhaskar; Leslie G Biesecker; Mars Skae; Claire Smith; Trevor Cole; Jeremy Kirk; Kate Chandler; Helen Kingston; Dian Donnai; Peter E Clayton; Graeme C M Black
Journal:  Am J Hum Genet       Date:  2011-07-07       Impact factor: 11.025

9.  Exome sequencing reveals a mutation in DMP1 in a family with familial sclerosing bone dysplasia.

Authors:  Marie-Hélène Gannagé-Yared; Periklis Makrythanasis; Eliane Chouery; Cristina Sobacchi; Cybel Mehawej; Federico A Santoni; Michel Guipponi; Stylianos E Antonarakis; Hanan Hamamy; André Mégarbané
Journal:  Bone       Date:  2014-08-30       Impact factor: 4.398

10.  Recessive thrombocytopenia likely due to a homozygous pathogenic variant in the FYB gene: case report.

Authors:  Hanan Hamamy; Periklis Makrythanasis; Nasir Al-Allawi; Abdulrahman A Muhsin; Stylianos E Antonarakis
Journal:  BMC Med Genet       Date:  2014-12-17       Impact factor: 2.103
