Olorin: combining gene flow with exome sequencing in large family studies of complex disease.

James A Morris, Jeffrey C Barrett.

Abstract

MOTIVATION: The existence of families with many individuals affected by the same complex disease has long suggested the possibility of rare alleles of high penetrance. In contrast to Mendelian diseases, however, linkage studies have identified very few reproducibly linked loci in diseases such as diabetes and autism. Genome-wide association studies have had greater success with such diseases, but these results explain neither the extreme disease load nor the within-family linkage peaks of some large pedigrees. Combining linkage information with exome or genome sequencing from large complex disease pedigrees might finally identify family-specific, high-penetrance mutations.
RESULTS: Olorin is a tool that integrates gene flow within families with next generation sequencing data to enable the analysis of complex disease pedigrees. Users can interactively filter and prioritize variants based on haplotype sharing across selected individuals and other measures of importance, including predicted functional consequence and population frequency.

Year:  2012        PMID: 23052039      PMCID: PMC3519455          DOI: 10.1093/bioinformatics/bts609

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Next generation sequencing has rapidly become the standard approach for identifying mutations responsible for Mendelian diseases (Bamshad et al.). Although software and file formats for processing raw sequence data are relatively robust (Danecek et al.; Li et al.), there is currently a lack of easy-to-use software for downstream analysis of these data. For some study designs, such as focused analysis of fully penetrant de novo mutations or autosomal recessive inheritance, exome sequence data can be analysed and filtered relatively simply. Increasingly, however, sequence-based approaches are being applied to complex diseases such as autism (Neale et al.), which are unlikely to follow a simple genetic model, and to more complicated scenarios, such as large pedigrees with incomplete penetrance. These studies require new tools that enable the diverse community of researchers working on such families to analyse next generation sequence data interactively and comprehensively. Figure 1 shows how our new program, Olorin, integrates within-family linkage analysis with exome sequencing in a user-friendly package.
Fig. 1.

Olorin uses patterns of gene flow estimated by MERLIN to identify genomic regions shared by affected individuals in large pedigrees. This information is combined with next generation sequence data, and only those variants that lie within shared regions are analysed. Users can further refine the list of variants using Olorin’s real-time filtering tools.

2 FEATURES

2.1 File formats

Olorin uses four types of data file: two that provide information about the gene flow calculated by MERLIN (Abecasis et al.), one defining the pedigree structure, and one listing the variants identified by sequencing. MERLIN’s haplotyping functionality is used to compute haplotype inheritance within the pedigree. Details of the genomic markers used in the estimation of haplotypes, and pedigree information about the relationships between individuals and their disease status, are read from standard MERLIN-format .map and .ped files. All variants identified from sequencing across samples need to be provided as a single variant call format (VCF) file (version 4.0 or greater) (Danecek et al.).
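Reading the VCF input amounts to splitting each tab-delimited data line into its fixed columns and unpacking the semicolon-separated INFO field. The Python sketch below illustrates this under stated assumptions: it follows the column layout of the VCF 4.x specification, keeps only the GT sub-field of each sample, and treats INFO keys without a value as flags. It is not Olorin's own (Java) parser, and the names `VcfRecord`, `parse_info`, `parse_vcf_line` and `read_vcf` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alt: list
    info: dict
    genotypes: dict = field(default_factory=dict)  # sample name -> GT string


def parse_info(info_field: str) -> dict:
    """Split a VCF INFO column into a dict; flag keys (no '=') map to True."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str, samples: list) -> VcfRecord:
    """Parse one data line; only the GT sub-field of each sample column is kept."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
    genotypes = {}
    if len(cols) > 9:
        fmt_keys = cols[8].split(":")
        if "GT" in fmt_keys:
            gt_index = fmt_keys.index("GT")
            for name, sample_col in zip(samples, cols[9:]):
                values = sample_col.split(":")
                # Trailing FORMAT sub-fields may be omitted in a sample column.
                genotypes[name] = values[gt_index] if gt_index < len(values) else "."
    return VcfRecord(chrom, int(pos), ref, alt.split(","), parse_info(info), genotypes)


def read_vcf(lines) -> list:
    """Parse an iterable of VCF lines, taking sample names from the #CHROM header."""
    samples = []
    records = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            samples = line.rstrip("\n").split("\t")[9:]
            continue
        if line.strip():
            records.append(parse_vcf_line(line, samples))
    return records
```

For example, an INFO column of `AF=0.001;DB` becomes `{"AF": "0.001", "DB": True}`. Values are left as strings so that later filters can decide how to interpret each field.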

2.2 Workflow

2.2.1 Selecting individuals

On loading data, Olorin automatically generates an interactive pedigree using standard conventions for information such as sex and disease status. Users can obtain additional information, such as whether a particular individual has been sequenced, via a mouseover popup box. To begin filtering variants, the user first needs to select individuals to be used in searching for shared genomic segments by clicking on them in the pedigree (Fig. 2).
Fig. 2.

Screenshot of Olorin running on OS X. (A) the interactive pedigree panel, (B) the general options tab of the filtering dialog, (C) the dynamic filtering panel, (D) genome-wide segments display, highlighting shared segments in green and (E) the variants table

Screenshot of Olorin running on OS X. (A) the interactive pedigree panel, (B) the general options tab of the filtering dialog, (C) the dynamic filtering panel, (D) genome-wide segments display, highlighting shared segments in green and (E) the variants table

2.2.2 Initial variant filtering

After selecting individuals, the user can customize the analysis via a filtering dialog (Fig. 2). First, they set the minimum number of individuals required to share a segment. This enables searches for variants of incomplete penetrance if the threshold is set below the total number of affected individuals in the pedigree. Next, the user can select which information fields from the VCF will be included for subsequent filtering and display. A population frequency cut-off can also be specified at this point if (as is often the case) the study design is focused on variants expected to be rare in healthy individuals.
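The two thresholds set in this dialog can be illustrated with a short sketch. It assumes, hypothetically, that MERLIN's haplotyping output has already been reduced to a pair of founder-haplotype labels (paternal, maternal) per individual at each marker. A marker counts as shared when at least `min_sharers` of the selected individuals carry the same label, and consecutive shared markers on one chromosome are merged into a segment. This is a simplification (it does not require the same label, or the same subset of individuals, to persist along the run). The INFO key holding the population frequency is assumed, for illustration only, to be `AF`; all function names and the data layout are illustrative rather than Olorin's.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # position of the first shared marker
    end: int    # position of the last shared marker


class Variant(NamedTuple):
    chrom: str
    pos: int
    info: dict


def shared_segments(markers, flow, selected, min_sharers):
    """markers: (chrom, pos) tuples in genomic order.
    flow: individual -> list of (paternal, maternal) founder labels, one per marker."""
    segments = []
    run_start = None  # index of the first marker in the current shared run
    for i, (chrom, _pos) in enumerate(markers):
        # Count each selected individual at most once per founder label.
        counts = Counter(label for ind in selected for label in set(flow[ind][i]))
        shared = max(counts.values(), default=0) >= min_sharers
        # Close the open run if sharing stops or a new chromosome begins.
        if run_start is not None and (not shared or markers[run_start][0] != chrom):
            first, last = markers[run_start], markers[i - 1]
            segments.append(Segment(first[0], first[1], last[1]))
            run_start = None
        if shared and run_start is None:
            run_start = i
    if run_start is not None:
        first, last = markers[run_start], markers[-1]
        segments.append(Segment(first[0], first[1], last[1]))
    return segments


def variants_in_segments(variants, segments, max_freq=None, freq_key="AF"):
    """Keep variants inside a shared segment and, optionally, at or below a
    frequency cut-off. Variants without a frequency annotation are kept."""
    kept = []
    for v in variants:
        if not any(s.chrom == v.chrom and s.start <= v.pos <= s.end for s in segments):
            continue
        freq = v.info.get(freq_key)
        if max_freq is not None and freq is not None and float(freq) > max_freq:
            continue
        kept.append(v)
    return kept
```

Setting `min_sharers` below the number of selected individuals is what admits segments that some affected relatives do not carry, which corresponds to the incomplete-penetrance search described above.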

2.2.3 Dynamic variant filtering

Olorin populates an analysis table (Fig. 2) with variants found in the shared segments. This table can be sorted on any column, and variants can be filtered out in real time using a number of filtering tools (Fig. 2), which are dynamically generated based on the user-selected data fields. Depending on the genetic model under consideration, Olorin can show variants discovered in any or all of the selected individuals.
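One way to model this kind of real-time filtering is to treat each filtering tool as a predicate over table rows and re-evaluate all active predicates whenever a control changes. The sketch below assumes that rows carry the user-selected INFO fields plus per-sample GT strings; `carrier_filter` implements the "any or all of the selected individuals" choice and `numeric_range_filter` stands in for a generated range control. These names are hypothetical and do not describe Olorin's Java classes.

```python
from typing import Callable, NamedTuple


class Row(NamedTuple):
    chrom: str
    pos: int
    fields: dict     # user-selected INFO fields, values as strings
    genotypes: dict  # sample -> GT string, e.g. "0/1"


Filter = Callable[[Row], bool]


def is_carrier(gt: str) -> bool:
    """True if the genotype contains at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def carrier_filter(selected, mode="any") -> Filter:
    """mode='any': at least one selected sample carries the variant;
    mode='all': every one does. Samples absent from genotypes count as non-carriers."""
    def check(row: Row) -> bool:
        flags = [is_carrier(row.genotypes.get(s, ".")) for s in selected]
        return all(flags) if mode == "all" else any(flags)
    return check


def numeric_range_filter(key, lo=None, hi=None) -> Filter:
    """Keep rows whose numeric field lies in [lo, hi]; rows lacking the field are hidden."""
    def check(row: Row) -> bool:
        value = row.fields.get(key)
        if value is None:
            return False
        x = float(value)
        return (lo is None or x >= lo) and (hi is None or x <= hi)
    return check


def apply_filters(rows, filters, sort_key=None, reverse=False):
    """Return the visible rows: those passing every active filter, optionally sorted."""
    visible = [r for r in rows if all(f(r) for f in filters)]
    if sort_key is not None:
        visible.sort(key=sort_key, reverse=reverse)
    return visible
```

Because each filter is an independent function, changing one control only requires re-running `apply_filters` over the rows from the shared segments, and sorting on a column is simply a different `sort_key`.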

2.2.4 Predicted variant effects

Because the ‘consequence’ strings in the VCF information field contain a wealth of parseable information, Olorin supports further processing of two variant consequence string formats: the UK10K analysis pipeline format and the Ensembl Variant Effect Predictor format (McLaren et al.). As each variant can have multiple consequences, Olorin automatically selects and displays only the most damaging effect for each variant, showing the remainder in a popup box.
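Selecting the most damaging effect requires a severity ranking of consequence terms. The sketch below assumes an abbreviated ranking of Sequence Ontology terms, ordered as in the current Ensembl VEP documentation; Olorin's own ranking, and the exact UK10K and VEP string layouts it parses, may differ. In particular, splitting on `&` or `,` in the hypothetical helper `split_terms` is an assumption, not a description of either format.

```python
import re

# Assumed, abbreviated severity order (most damaging first), modelled on the
# Ensembl VEP consequence ranking; not necessarily the list Olorin uses.
SEVERITY_ORDER = [
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
    "intergenic_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


def split_terms(consequence_string: str) -> list:
    """Assumption: individual consequence terms are separated by '&' or ','."""
    return [t for t in re.split(r"[&,]", consequence_string) if t]


def most_damaging(consequences):
    """Return (worst term, remaining terms in severity order).
    Terms missing from RANK sort after every ranked term."""
    if not consequences:
        raise ValueError("no consequences to rank")
    ordered = sorted(consequences, key=lambda c: RANK.get(c, len(RANK)))
    return ordered[0], ordered[1:]
```

For example, `most_damaging(split_terms("intron_variant&missense_variant,stop_gained"))` returns `("stop_gained", ["missense_variant", "intron_variant"])`: the first element would fill the table cell and the rest the popup.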

3 IMPLEMENTATION

Olorin is written in Java and will work on any platform with Java 1.6 or later installed. The interactive pedigree is drawn using PedVizAPI (Fuchsberger et al.). The genome-wide sharing plots are generated using source code from the visualization tool IdeogramBrowser (Müller et al.).

REFERENCES

1.  Merlin--rapid analysis of dense genetic maps using sparse gene flow trees.

Authors:  Gonçalo R Abecasis; Stacey S Cherny; William O Cookson; Lon R Cardon
Journal:  Nat Genet       Date:  2001-12-03       Impact factor: 38.330

2.  Visualization of genomic aberrations using Affymetrix SNP arrays.

Authors:  André Müller; Karlheinz Holzmann; Hans A Kestler
Journal:  Bioinformatics       Date:  2006-11-30       Impact factor: 6.937

3.  PedVizApi: a Java API for the interactive, visual analysis of extended pedigrees.

Authors:  Christian Fuchsberger; Mario Falchi; Lukas Forer; Peter P Pramstaller
Journal:  Bioinformatics       Date:  2007-11-22       Impact factor: 6.937

4.  Exome sequencing as a tool for Mendelian disease gene discovery.

Authors:  Michael J Bamshad; Sarah B Ng; Abigail W Bigham; Holly K Tabor; Mary J Emond; Deborah A Nickerson; Jay Shendure
Journal:  Nat Rev Genet       Date:  2011-09-27       Impact factor: 53.242

5.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

6.  Patterns and rates of exonic de novo mutations in autism spectrum disorders.

Authors:  Benjamin M Neale; Yan Kou; Li Liu; Avi Ma'ayan; Kaitlin E Samocha; Aniko Sabo; Chiao-Feng Lin; Christine Stevens; Li-San Wang; Vladimir Makarov; Paz Polak; Seungtai Yoon; Jared Maguire; Emily L Crawford; Nicholas G Campbell; Evan T Geller; Otto Valladares; Chad Schafer; Han Liu; Tuo Zhao; Guiqing Cai; Jayon Lihm; Ruth Dannenfelser; Omar Jabado; Zuleyma Peralta; Uma Nagaswamy; Donna Muzny; Jeffrey G Reid; Irene Newsham; Yuanqing Wu; Lora Lewis; Yi Han; Benjamin F Voight; Elaine Lim; Elizabeth Rossin; Andrew Kirby; Jason Flannick; Menachem Fromer; Khalid Shakir; Tim Fennell; Kiran Garimella; Eric Banks; Ryan Poplin; Stacey Gabriel; Mark DePristo; Jack R Wimbish; Braden E Boone; Shawn E Levy; Catalina Betancur; Shamil Sunyaev; Eric Boerwinkle; Joseph D Buxbaum; Edwin H Cook; Bernie Devlin; Richard A Gibbs; Kathryn Roeder; Gerard D Schellenberg; James S Sutcliffe; Mark J Daly
Journal:  Nature       Date:  2012-04-04       Impact factor: 49.962

7.  Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor.

Authors:  William McLaren; Bethan Pritchard; Daniel Rios; Yuan Chen; Paul Flicek; Fiona Cunningham
Journal:  Bioinformatics       Date:  2010-06-18       Impact factor: 6.937

8.  The variant call format and VCFtools.

Authors:  Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937
