Lei Sun, Apostolos Dimitromanolakis.
Abstract
Pedigree errors and cryptic relatedness often appear in families or population samples collected for genetic studies. If not identified, these issues can lead to either increased false negatives or false positives in both linkage and association analyses. To identify pedigree errors and cryptic relatedness among individuals from the 20 San Antonio Family Studies (SAFS) families and cryptic relatedness among the 157 putatively unrelated individuals, we apply PREST-plus to the genome-wide single-nucleotide polymorphism (SNP) data and analyze estimated identity-by-descent (IBD) distributions for all pairs of genotyped individuals. Based on the given pedigrees alone, PREST-plus identifies the following putative pairs: 1091 full-sib, 162 half-sib, 360 grandparent-grandchild, 2269 avuncular, 2717 first cousin, 402 half-avuncular, 559 half-first cousin, 2 half-sib+first cousin, 957 parent-offspring and 440,546 unrelated. Using the genotype data, PREST-plus detects 7 mis-specified relative pairs, with their IBD estimates clearly deviating from the null expectations, and it identifies 4 cryptically related pairs involving 7 individuals from 6 families.
Year: 2014 PMID: 25519375 PMCID: PMC4143714 DOI: 10.1186/1753-6561-8-S1-S23
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Table 1. IBD distribution and kinship coefficient for the relationship types (reltype) considered by PREST-plus
| reltype coding in PREST-plus | Relationship type | IBD sharing, $p_0$ | IBD sharing, $p_1$ | IBD sharing, $p_2$ | Kinship coefficient, φ |
|---|---|---|---|---|---|
| 11 | MZ-twin (MZ) | 0.000 | 0.000 | 1.000 | 0.50000 |
| 10 | parent-offspring (PO) | 0.000 | 1.000 | 0.000 | 0.25000 |
| 1 | full-sib (FS) | 0.250 | 0.500 | 0.250 | 0.25000 |
| 9 | half-sib+first cousin (HSFC) | 0.375 | 0.500 | 0.125 | 0.18750 |
| 2 | half-sib (HS) | 0.500 | 0.500 | 0.000 | 0.12500 |
| 3 | grandparent-grandchild (GPC) | 0.500 | 0.500 | 0.000 | 0.12500 |
| 4 | avuncular (AV) | 0.500 | 0.500 | 0.000 | 0.12500 |
| 5 | first cousin (FC) | 0.750 | 0.250 | 0.000 | 0.06250 |
| 7 | half-avuncular (HAV) | 0.750 | 0.250 | 0.000 | 0.06250 |
| 8 | half-first cousin (HFC) | 0.875 | 0.125 | 0.000 | 0.03125 |
| 6 | unrelated (UN) | 1.000 | 0.000 | 0.000 | 0.00000 |
| 99 | other types (Others) | NA | NA | NA | NA |
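The kinship coefficients in the last column are consistent with the standard identity φ = p1/4 + p2/2 for non-inbred pairs, where p1 and p2 are the probabilities of sharing one and two alleles IBD. The following is a minimal Python sketch of that check; the names `EXPECTED_IBD` and `kinship_from_ibd` are illustrative and are not part of PREST-plus.

```python
# Expected IBD distributions (p0, p1, p2) copied from the table above.
# The dictionary and function names are illustrative, not PREST-plus identifiers.
EXPECTED_IBD = {
    "MZ":   (0.000, 0.000, 1.000),
    "PO":   (0.000, 1.000, 0.000),
    "FS":   (0.250, 0.500, 0.250),
    "HSFC": (0.375, 0.500, 0.125),
    "HS":   (0.500, 0.500, 0.000),
    "GPC":  (0.500, 0.500, 0.000),
    "AV":   (0.500, 0.500, 0.000),
    "FC":   (0.750, 0.250, 0.000),
    "HAV":  (0.750, 0.250, 0.000),
    "HFC":  (0.875, 0.125, 0.000),
    "UN":   (1.000, 0.000, 0.000),
}


def kinship_from_ibd(p0, p1, p2):
    """Kinship coefficient of a non-inbred pair: phi = p1/4 + p2/2."""
    return p1 / 4 + p2 / 2


for name, ibd in EXPECTED_IBD.items():
    print(f"{name:5s} phi = {kinship_from_ibd(*ibd):.5f}")
```

Because half-sib, grandparent-grandchild and avuncular pairs share the same expected distribution, they also share the same kinship coefficient and cannot be separated by these three probabilities alone.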
Figure 1. Results of analysis 1: relationship IBD estimation within and between the 20 SAFS families using PREST-plus. The figures are stratified by the null putative relationship, R, as defined by the given pedigrees. The red cross marks the expected IBD distribution for R as provided in Table 1. Each black dot plots a pair of estimated IBD probabilities, based on the observed genotype data, for one of the 455,535 genotyped pairs analyzed, including 1091 full-sib, 162 half-sib, 360 grandparent-grandchild, 2269 avuncular, 2717 first-cousin, 440,546 unrelated (from both within and across families), 402 half-avuncular, 559 half-first cousin, 2 half-sib+first cousin, 957 parent-offspring, and 6470 other types of pairs. Blue circles mark the obvious outliers as detailed in Table 2.
Table 2. Relationship estimation results for clear outliers in Figure 1 identified by analysis 1.
| FID1 (a) | IID1 (b) | FID2 (a) | IID2 (b) | reltype (c) | commark (d) | Estimated $p_0$ | Estimated $p_1$ | Estimated $p_2$ |
|---|---|---|---|---|---|---|---|---|
| 3 | T2DG0300174 | 3 | T2DG0300175 | 1 | 49009 | 0.0000 | 0.0000 | 1.0000 |
| 4 | T2DG0400281 | 4 | T2DG0400282 | 1 | 48996 | 0.0000 | 0.0000 | 1.0000 |
| 4 | T2DG0400265 | 4 | T2DG0400266 | 2 | 48994 | 0.358 | 0.4511 | 0.1909 |
| 21 | T2DG2100946 | 21 | T2DG2100947 | 2 | 48957 | 0.3112 | 0.5566 | 0.1322 |
| 21 | T2DG2100952 | 21 | T2DG2100966 | 4 | 48949 | 0.9876 | 0.0109 | 0.0015 |
| 4 | T2DG0400207 | 4 | T2DG0400260 | 6 | 48955 | 0.4759 | 0.5157 | 0.0084 |
| 4 | T2DG0400207 | 4 | T2DG0400247 | 6 | 47503 | 0.6094 | 0.3723 | 0.0182 |
These 7 pairs of individuals have their estimated IBD distributions clearly deviating from the null expected values as specified in Table 1.
a Family ID.
b Individual ID.
c Relationship type as in Table 1.
d The number of common markers genotyped for both individuals.
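One simple way to quantify how far each of these pairs sits from its null expectation is the Euclidean distance between the estimated and expected IBD distributions. This is only an illustrative sketch: the distance is not the test statistic used by PREST-plus, and the names `NULL_IBD`, `OUTLIERS` and `ibd_deviation` are hypothetical.

```python
import math

# Null expected (p0, p1, p2) by reltype code, for the codes that occur
# among the seven outlier pairs.
NULL_IBD = {
    1: (0.25, 0.50, 0.25),  # full-sib
    2: (0.50, 0.50, 0.00),  # half-sib
    4: (0.50, 0.50, 0.00),  # avuncular
    6: (1.00, 0.00, 0.00),  # unrelated
}

# (IID1, IID2, null reltype code, estimated (p0, p1, p2)) from the table above.
OUTLIERS = [
    ("T2DG0300174", "T2DG0300175", 1, (0.0000, 0.0000, 1.0000)),
    ("T2DG0400281", "T2DG0400282", 1, (0.0000, 0.0000, 1.0000)),
    ("T2DG0400265", "T2DG0400266", 2, (0.358, 0.4511, 0.1909)),
    ("T2DG2100946", "T2DG2100947", 2, (0.3112, 0.5566, 0.1322)),
    ("T2DG2100952", "T2DG2100966", 4, (0.9876, 0.0109, 0.0015)),
    ("T2DG0400207", "T2DG0400260", 6, (0.4759, 0.5157, 0.0084)),
    ("T2DG0400207", "T2DG0400247", 6, (0.6094, 0.3723, 0.0182)),
]


def ibd_deviation(estimated, expected):
    """Euclidean distance between two IBD distributions (illustrative only)."""
    return math.dist(estimated, expected)


deviations = [
    (iid1, iid2, ibd_deviation(est, NULL_IBD[code]))
    for iid1, iid2, code, est in OUTLIERS
]

for iid1, iid2, dev in deviations:
    print(f"{iid1} {iid2} deviation = {dev:.4f}")
```

A distance of this kind only ranks pairs by how far they deviate; deciding whether a deviation is significant still requires a formal test such as the simulation-based one reported by the authors.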
Figure 2. Results of analysis 2: relationship IBD estimation among the 141 genotyped putatively unrelated individuals in the "UNREL.txt" file. The red cross marks the IBD distribution expected for unrelated pairs. Each black dot plots a pair of estimated IBD probabilities, based on the observed genotype data, for one of the 9870 putatively unrelated pairs analyzed by PREST-plus (left) and PLINK (right). Blue circles mark the obvious outliers as detailed in Table 3.
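The pair counts quoted in the two figure captions can be checked with simple arithmetic: 141 individuals give 141 × 140 / 2 pairs, and the per-relationship counts for analysis 1 should add up to the stated total. A short Python sketch follows; the variable names are illustrative.

```python
from math import comb

# Analysis 2: all unordered pairs among the 141 genotyped individuals.
pairs_analysis_2 = comb(141, 2)

# Analysis 1: per-relationship pair counts as listed in the Figure 1 caption.
counts_analysis_1 = {
    "full-sib": 1091,
    "half-sib": 162,
    "grandparent-grandchild": 360,
    "avuncular": 2269,
    "first cousin": 2717,
    "unrelated": 440_546,
    "half-avuncular": 402,
    "half-first cousin": 559,
    "half-sib+first cousin": 2,
    "parent-offspring": 957,
    "other types": 6470,
}
total_analysis_1 = sum(counts_analysis_1.values())

print(pairs_analysis_2)   # number of pairs in analysis 2
print(total_analysis_1)   # number of pairs in analysis 1
```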
Table 3. Relationship estimation results for clear outliers in Figure 2 identified by analysis 2.
| FID1 | IID1 | FID2 | IID2 | reltype | commark | PREST-plus estimated $p_0$ | PREST-plus estimated $p_1$ | PREST-plus estimated $p_2$ | PLINK estimated $p_0$ | PLINK estimated $p_1$ | PLINK estimated $p_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | T2DG0901244 | 10 | T2DG1000566 | 6 | 48912 | 0.8159 | 0.1735 | 0.0105 | 1.0000 | 0.0000 | 0.0000 |
| 8 | T2DG0800497 | 9 | T2DG0901244 | 6 | 48957 | 0.8304 | 0.1696 | 0.0000 | 1.0000 | 0.0000 | 0.0000 |
| 21 | T2DG2100951 | 25 | T2DG2501033 | 6 | 48940 | 0.8174 | 0.1826 | 0.0000 | 0.7713 | 0.2287 | 0.0000 |
| 4 | T2DG0400207 | 4 | T2DG0400247 | 6 | 47503 | 0.6142 | 0.3673 | 0.0185 | 0.7460 | 0.1972 | 0.0568 |
These four pairs of individuals have their estimated IBD distributions clearly deviating from the values expected for unrelated pairs.
Table 4. Relationship testing results for clear outliers in Figure 1 identified by analysis 1, and in Figure 2 identified by analysis 2.
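| FID1 | IID1 | FID2 | IID2 | null reltype (code) | null reltype | empirical p-value (null) | plausible reltype (code) | plausible reltype | empirical p-value (plausible) |
|---|---|---|---|---|---|---|---|---|---|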
| The 7 outliers identified by analysis 1 | |||||||||
| 3 | T2DG0300174 | 3 | T2DG0300175 | 1 | full-sib | 0 | 11 | MZ-twins | N/A |
| 4 | T2DG0400281 | 4 | T2DG0400282 | 1 | full-sib | 0 | 11 | MZ-twins | N/A |
| 4 | T2DG0400265 | 4 | T2DG0400266 | 2 | half-sib | 0 | 9 | half-sib+first cousin | 0.254 |
| 21 | T2DG2100946 | 21 | T2DG2100947 | 2 | half-sib | 0 | 9 | half-sib+first cousin | 0.432 |
| 21 | T2DG2100952 | 21 | T2DG2100966 | 4 | avuncular | 0 | 6 | unrelated | 0.891 |
| 4 | T2DG0400207 | 4 | T2DG0400260 | 6 | unrelated | 0 | 2 | half-sib | 0.328 |
| 4 | T2DG0400207 | 4 | T2DG0400247 | 6 | unrelated | 0 | 5 | first cousin | 0.752 |
| The 4 outliers identified by analysis 2 | |||||||||
| 9 | T2DG0901244 | 10 | T2DG1000566 | 6 | unrelated | 0 | 8 | half-first cousin | 0.112 |
| 8 | T2DG0800497 | 9 | T2DG0901244 | 6 | unrelated | 0.007 | 8 | half-first cousin | 0.673 |
| 21 | T2DG2100951 | 25 | T2DG2501033 | 6 | unrelated | 0 | 5 | first cousin | 0.633 |
| 4 | T2DG0400207 | 4 | T2DG0400247 | 6 | unrelated | 0 | 5 | first cousin | 0.712 |
Empirical p-values are based on 25,000 simulated replicates, with genotype data simulated under a specified relationship type. The simulating relationship type can be the null relationship defined by the given pedigrees (i.e., the null reltype) or another relationship type (i.e., the plausible reltype) as listed in Table 1. The plausible relationship type is not unique; the table reports the one with the highest p-value. A small p-value for testing the null reltype suggests that the observed genotype data are not compatible with the null relationship defined by the given pedigree, whereas a large p-value for testing the plausible reltype suggests that the observed genotype data are compatible with the proposed alternative.
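The general idea of an empirical p-value can be sketched as the fraction of simulated replicates whose test statistic is at least as extreme as the observed one. The Python sketch below makes several assumptions that are not taken from the source: the function name `empirical_p_value` is hypothetical, larger statistics are treated as more extreme, and the replicates are placeholder Gaussian draws rather than genotype data simulated under a relationship type.

```python
import random


def empirical_p_value(observed_stat, simulated_stats):
    """Fraction of simulated statistics >= the observed statistic.

    Assumes larger values are more extreme; this is an illustrative
    convention, not necessarily the one used by PREST-plus.
    """
    if not simulated_stats:
        raise ValueError("need at least one simulated replicate")
    n_extreme = sum(1 for s in simulated_stats if s >= observed_stat)
    return n_extreme / len(simulated_stats)


# Placeholder replicates: 25,000 standard-normal draws stand in for
# statistics computed on data simulated under the tested relationship.
rng = random.Random(0)
simulated = [rng.gauss(0.0, 1.0) for _ in range(25_000)]

p_value = empirical_p_value(1.5, simulated)
print(p_value)
```

With 25,000 replicates the smallest nonzero value this estimator can return is 1/25,000 = 0.00004, so a reported p-value of 0 means that no replicate was as extreme as the observed data.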