FamAgg: an R package to evaluate familial aggregation of traits in large pedigrees.

Johannes Rainer¹, Daniel Taliun², Yuri D'Elia¹, Cristian Pattaro¹, Francisco S Domingues¹, Christian X Weichenberger¹.

Abstract

Familial aggregation analysis is the first fundamental step to perform when assessing the extent of genetic background of a disease. However, there is a lack of software to analyze the familial clustering of complex phenotypes in very large pedigrees. Such pedigrees can be utilized to calculate measures that express trait aggregation on both the family and individual level, providing valuable directions in choosing families for detailed follow-up studies. We developed FamAgg, an open source R package that contains both established and novel methods to investigate familial aggregation of traits in large pedigrees. We demonstrate its use and interpretation by analyzing a publicly available cancer dataset with more than 20 000 participants distributed across approximately 400 families.

AVAILABILITY AND IMPLEMENTATION: The FamAgg package is freely available at the Bioconductor repository, http://www.bioconductor.org/packages/FamAgg

CONTACT: Christian.Weichenberger@eurac.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

MeSH terms: Pedigree; Software

Year: 2016; PMID: 26803158; PMCID: PMC4866523; DOI: 10.1093/bioinformatics/btw019

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

1 Introduction

The investigation of whether a disease or symptom trait recurs more often among close relatives than in the general population is a deeply rooted subject in genetic epidemiology, often termed familial aggregation analysis (Khoury ). While segregation analysis was the tool of choice to identify patterns of Mendelian diseases, there is no unique method to highlight familial clusters for complex diseases, especially in situations involving very large pedigrees lacking a regular family structure. In this setting, more than three decades ago, a computational method was first developed to highlight familial aggregation of various cancer types (Hill, 1980). The method was based on the kinship coefficient $\Phi_{ij}$, which is the probability that two subjects i and j share the same allele identical-by-descent at one locus, and represents a suitable measure to quantify the relationship between two individuals in the pedigree (Malécot, 1948). In this early approach, the average kinship between all affected pairs was compared to the mean kinship of multiple sets of randomly selected matched controls (Hill, 1980). This and other kinship-based methods have been successfully applied to very large pedigrees to assess whether diseases such as autism (Jorde et al., 1990) or Parkinson’s disease (Sveinbjörnsdottir et al., 2000) showed evidence of familial aggregation. The kinship-based approach was also extended to time-to-event data: the presence of familial clustering is assessed based on disease incidence rates, thus accounting for the time to disease onset (Kerber, 1995). Driven by the lack of open access tools for familial aggregation analyses in large pedigrees, we have developed an R package providing this functionality. Besides basic pedigree analysis, subsetting and plotting methods, it implements the previously published methods based on average kinship and disease incidence rates, as well as two novel approaches that detect familial aggregation using statistics based on kinship coefficients combined with Monte Carlo simulation techniques.
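To make the kinship coefficient concrete, the following is a minimal Python sketch of the standard recursive computation on a toy pedigree. It is an illustration only and not the FamAgg or kinship2 implementation; the function name `kinship_matrix`, the toy pedigree and its identifiers are assumptions introduced here.

```python
from fractions import Fraction


def kinship_matrix(pedigree):
    """Return {(i, j): kinship coefficient} for every pair in a pedigree.

    `pedigree` maps each individual to a (father, mother) tuple, with None
    for unknown parents. Parents must be listed before their children.
    """
    ids = list(pedigree)
    phi = {}
    for index, a in enumerate(ids):
        father, mother = pedigree[a]
        # Self-kinship: 1/2 * (1 + kinship between the two parents).
        if father is None or mother is None:
            phi[(a, a)] = Fraction(1, 2)
        else:
            phi[(a, a)] = Fraction(1, 2) * (1 + phi[(father, mother)])
        # Kinship with each earlier individual b (never a descendant of a):
        # average of the kinship between b and each known parent of a.
        for b in ids[:index]:
            value = Fraction(0)
            if father is not None:
                value += phi[(father, b)] / 2
            if mother is not None:
                value += phi[(mother, b)] / 2
            phi[(a, b)] = phi[(b, a)] = value
    return phi


# Hypothetical toy pedigree: two founders, two full siblings, one grandchild.
pedigree = {
    "dad": (None, None),
    "mom": (None, None),
    "sib1": ("dad", "mom"),
    "sib2": ("dad", "mom"),
    "spouse": (None, None),
    "child": ("spouse", "sib1"),
}
phi = kinship_matrix(pedigree)
print(phi[("sib1", "sib2")], phi[("child", "sib1")], phi[("child", "sib2")])
```

In a pedigree without inbreeding this gives 1/4 for full siblings and for parent and child, and 1/8 for aunt and niece, the values that appear in the worked kinship sums of Figure 1B.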

2 Implementation

The FamAgg package implements five familial aggregation detection methods that can be run on a single family or on sets of families and that allow stratification according to conditions such as gender, age and generation.

The kinship sum (KS) test assesses whether an affected subject is more closely related to other affected individuals than to unaffected individuals in the pedigree. Let A be the set of affected subjects and N the number of simulation steps. The kinship sum of subject i to all other affected cases is $S_i = \sum_{j \in A, j \neq i} \Phi_{ij}$, whose null distribution $S_0$ is obtained by N-fold random sampling of #(A) affected cases from the complete pedigree without replacement. An empirical p-value for $S_i$ is obtained as $p_i = P(S_0 \geq S_i)$.

In the kinship group (KG) test, the most distant affected relative k of each affected individual i is identified. We then define a group $G_i$ that includes all individuals j such that $\Phi_{ij} \geq \Phi_{ik}$. For each group $G_i$, we calculate two null distributions, based on repeated random sampling of #(A) affected individuals from the complete pedigree. First, for each group $G_i$ we compute the distribution of the number of affected cases from the random sampling, which yields an empirical p-value for finding by chance at least the number of cases observed in $G_i$. Second, we provide a means to detect clusters of closely related affected family members: for each group $G_i$ we derive the distribution of kinship coefficients $\Phi_{ia}$ from the random sampling for all affected individuals a. From this distribution we calculate the empirical p-value of finding a closer affected relative than in the observed case.

The genealogical index of familiality (GIF) test (Hill, 1980) is a purely family-based test. It computes the mean kinship $K_F$ (Malécot, 1948) for a selected family F, defined as the average kinship coefficient between all possible pairs of affected individuals i and j, and creates a null distribution $K_0$ of mean kinships of N sets of randomly selected (optionally matched) controls. An empirical p-value is derived as $p = P(K_0 \geq K_F)$.

The familial incidence rate (FIR) approach introduced by Kerber (1995) concentrates on familial aggregation for individuals in longitudinal studies. It is based on the incidence rate $I = C/T$, where C is the number of incident cases and T is the total number of years an individual was exposed to the risk of disease (person-years). This measure has been refined by weighting each individual’s contribution and time spent in the study by the kinship coefficient $\Phi_{ij}$ to arrive at a familial incidence rate $FR_i$ for any individual i.

Finally, we provide a convenience interface to compute the exact probability of familial clustering (PFC) of phenotypes as provided in the gap R package (Yu and Zelterman, 2002). It contrasts the number of affected cases against family sizes in a contingency table. Because the method is based on the exact test for multinomial distributions, its computational demand is high and an exact p-value can be estimated only for families of limited size. Its application to large pedigrees is therefore possible only with the aid of pedigree splitting software such as Jenti (Falchi and Fuchsberger, 2008).

With the exception of the GIF method, which identifies aggregation of a trait in the full pedigree, all kinship-based methods are applied at the level of individuals. They thus make it possible to identify individuals in families with significant aggregation (KS test), to identify groups of highly clustered affected individuals within families (KG test), or to assess the risk for individuals given their relation to affected individuals in the pedigree (FIR).

In addition to these familial aggregation methods, FamAgg provides functions to subset pedigrees, to identify common ancestors for any given list of individuals, to identify matched controls within pedigrees and to convert pedigrees into graphs, which opens pedigree analysis to graph-theory methods. FamAgg uses the kinship2 R package (Sinnwell et al., 2014) for kinship coefficient calculation and plotting, and provides a transparent interface to the HaploPainter software (Thiele and Nürnberg, 2004). The open, object-oriented software architecture of the FamAgg package invites contribution of additional tests from the research community. Extensive documentation and examples are distributed with the FamAgg package, which is available as supplementary material.
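The two resampling tests that are simplest to state, KS and GIF, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions and not the FamAgg code: the function names, the toy population and the exact null model (draw #(A) individuals without replacement, treat them as affected, and pool the kinship sums of all drawn individuals) are choices made here, and matching or stratification is left out.

```python
import random
from itertools import combinations


def kinship_sum(i, affected, phi):
    """Summed kinship between subject i and all other affected subjects."""
    return sum(phi[(i, j)] for j in affected if j != i)


def mean_kinship(group, phi):
    """Average kinship over all pairs of individuals in the group."""
    pairs = list(combinations(group, 2))
    return sum(phi[(i, j)] for i, j in pairs) / len(pairs)


def ks_test(affected, everyone, phi, n_sim=5000, seed=1):
    """Empirical p-value per affected subject: P(null kinship sum >= observed)."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_sim):
        sample = rng.sample(everyone, len(affected))  # without replacement
        null.extend(kinship_sum(i, sample, phi) for i in sample)
    observed = {i: kinship_sum(i, affected, phi) for i in affected}
    return {i: sum(s >= s_i for s in null) / len(null)
            for i, s_i in observed.items()}


def gif_test(affected, everyone, phi, n_sim=5000, seed=1):
    """Observed mean kinship and empirical P(null mean kinship >= observed)."""
    rng = random.Random(seed)
    k_obs = mean_kinship(affected, phi)
    hits = sum(
        mean_kinship(rng.sample(everyone, len(affected)), phi) >= k_obs
        for _ in range(n_sim)
    )
    return k_obs, hits / n_sim


# Hypothetical toy population: three affected sisters and seven unrelated people.
everyone = ["a", "b", "c"] + [f"u{k}" for k in range(1, 8)]
phi = {(i, j): 0.0 for i in everyone for j in everyone if i != j}
for i, j in combinations(["a", "b", "c"], 2):
    phi[(i, j)] = phi[(j, i)] = 0.25  # full siblings
affected = ["a", "b", "c"]

ks_p = ks_test(affected, everyone, phi)
k_obs, gif_p = gif_test(affected, everyone, phi)
print(ks_p, k_obs, gif_p)
```

In this toy setting only one of the 120 possible draws of three individuals reproduces the observed clustering, so both empirical p-values should come out close to 1/120. With N simulation steps, the resolution of such p-values is limited to multiples of 1/N (or of 1/(N × #(A)) for the pooled KS null used here).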

3 Applications

We applied the KS, GIF, KG and FIR tests from the FamAgg package to the publicly available Minnesota Breast Cancer dataset (Sellers et al., 1999), which contains genealogical information from 426 unrelated affected founders whose families entered a longitudinal study on cancer in the state of Minnesota (USA) in 1944. There are 1376 cases spread over these 426 families, with a median family size of 53 members; the largest family comprises 382 individuals in six generations. The tests did not use sampling stratification, and the null distributions were calculated with N = 50 000 sampling steps. Runtimes on a single 2.4 GHz processor of a MacBook Pro with 16 GB of memory were as follows: FIR test, 2 s; GIF test, 7 min; KS test, 23 min; and KG test, 3 h. At a significance level of 0.05, the KS test and the GIF test identified 42 and 34 families with a significant enrichment of cases, respectively. Figure 1A highlights with filled symbols the 14 families in which both the KS and the GIF tests identified significant familial aggregation. Figure 1B provides an example of a smaller family with breast cancer aggregation: the p-values are 2.4 × 10⁻³ for the GIF test and 1.3 × 10⁻² for individual 410 according to the KS test.
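The FIR values in this analysis rest on a kinship-weighted incidence rate. A minimal Python sketch follows, assuming the single-stratum form FR_i = Σ_j Φ_ij C_j / Σ_j Φ_ij T_j over all relatives j ≠ i, where C_j is 1 for an incident case and 0 otherwise and T_j is the person-years contributed by j. This form is an assumption made for illustration; the weighting in Kerber (1995) and in FamAgg may differ in detail, for example through stratification. The function name and all data below are hypothetical.

```python
def familial_incidence_rate(i, phi, is_case, person_years):
    """Kinship-weighted incidence rate for individual i (simplified sketch)."""
    relatives = [j for j in person_years if j != i]
    weighted_cases = sum(phi[(i, j)] * is_case[j] for j in relatives)
    weighted_time = sum(phi[(i, j)] * person_years[j] for j in relatives)
    # Without any related person-years the rate is undefined.
    return weighted_cases / weighted_time if weighted_time > 0 else float("nan")


# Hypothetical proband "p" with a mother, a sister, an aunt and an unrelated person.
phi = {("p", "mother"): 0.25, ("p", "sister"): 0.25,
       ("p", "aunt"): 0.125, ("p", "unrelated"): 0.0}
is_case = {"mother": 1, "sister": 0, "aunt": 1, "unrelated": 0}
person_years = {"mother": 40, "sister": 30, "aunt": 50, "unrelated": 60}

fr_p = familial_incidence_rate("p", phi, is_case, person_years)
crude = sum(is_case.values()) / sum(person_years.values())  # I = C / T
print(fr_p, crude)
```

Here the weighted rate is 0.375 / 23.75, which is higher than the crude rate of 2 / 180 because both cases are relatives of the proband while the unrelated person contributes only unweighted follow-up time.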

Fig. 1.

Familial aggregation in the Minnesota Breast Cancer dataset. (A) Scatter plot of −log10(p-values) from the KS test (x-axis) and the GIF test (y-axis) computed for all 426 families. Because the KS test provides a p-value for each affected subject, the lowest p-value in each family is displayed. At a significance level of 0.05 (dashed lines), the GIF test identifies 34 families whereas the KS test identifies 42 families. Filled circles and family identifiers mark the 14 families in which both tests are significant. For example, family 432 is top-ranked by both tests: p-value = 1.3 × 10⁻³ and 9.6 × 10⁻⁵ with the GIF and KS test, respectively. Non-significant family clusters are shaded gray. (B) Pedigree of family 13, which is ranked second by the GIF test (p-value = 2.4 × 10⁻³). The family comprises 29 phenotyped members and includes five affected females. If known, age of cancer onset (cases) or age of demise is indicated below individuals’ identifiers. For subject 410, S = 1.0 (0.25 × 3 affected sisters + 0.25 × 1 affected daughter), with p-value = 1.3 × 10⁻². Sisters 406, 408 and 409 have equal S = 3 × 0.25 + 0.125 = 0.875 (p-value = 2.4 × 10⁻²), as they are aunts of subject 419. The familial incidence rate of individual 410 is FR = 8.7 × 10⁻³, which is in the top percentile of all computed values in the Minnesota Breast Cancer dataset.
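The kinship sums quoted in the caption can be checked with a few lines of Python. The sketch encodes only the relationships stated in the caption (410 and her three affected sisters, and 419 as the affected daughter of 410) and assumes no inbreeding and no further relationships among the five affected women. The helper `phi` is hypothetical.

```python
SISTERS = {406, 408, 409, 410}
AFFECTED = [406, 408, 409, 410, 419]  # 419 is the daughter of 410


def phi(i, j):
    """Kinship coefficient between two affected members of family 13."""
    pair = {i, j}
    if pair <= SISTERS:
        return 0.25   # full sisters
    if pair == {410, 419}:
        return 0.25   # mother and daughter
    return 0.125      # aunt and niece


kinship_sums = {
    i: sum(phi(i, j) for j in AFFECTED if j != i) for i in AFFECTED
}
print(kinship_sums)
```

This reproduces S = 1.0 for subject 410 and S = 0.875 for sisters 406, 408 and 409. Under the same assumptions the daughter 419 would have S = 0.25 + 3 × 0.125 = 0.625, a value the caption does not report.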

References

1. Yu C, Zelterman D. Statistical inference for familial disease clusters. Biometrics, 2002.

2. Thiele H, Nürnberg P. HaploPainter: a tool for drawing pedigrees with complex haplotypes. Bioinformatics, 2004.

3. Falchi M, Fuchsberger C. Jenti: an efficient tool for mining complex inbred genealogies. Bioinformatics, 2008.

4. Sinnwell JP, Therneau TM, Schaid DJ. The kinship2 R package for pedigree data. Hum Hered, 2014.

5. Sellers TA, King RA, Cerhan JR, Chen PL, Grabrick DM, Kushi LH, Oetting WS, Vierkant RA, Vachon CM, Couch FJ, Therneau TM, Olson JE, Pankratz VS, Hartmann LC, Anderson VE. Fifty-year follow-up of cancer incidence in a historical cohort of Minnesota breast cancer families. Cancer Epidemiol Biomarkers Prev, 1999.

6. Kerber RA. Method for calculating risk associated with family history of a disease. Genet Epidemiol, 1995.

7. Sveinbjörnsdottir S, Hicks AA, Jonsson T, Pétursson H, Guðmundsson G, Frigge ML, Kong A, Gulcher JR, Stefansson K. Familial aggregation of Parkinson's disease in Iceland. N Engl J Med, 2000.

8. Jorde LB, Mason-Brothers A, Waldmann R, Ritvo ER, Freeman BJ, Pingree C, McMahon WM, Petersen B, Jenson WR, Mo A. The UCLA-University of Utah epidemiologic survey of autism: genealogical analysis of familial aggregation. Am J Med Genet, 1990.
