Natalia Blay, Eduard Casas, Iván Galván-Femenía, Jan Graffelman, Rafael de Cid, Tanya Vavouri.
Abstract
Analysis of RNA sequencing (RNA-seq) data from related individuals is widely used in clinical and molecular genetics studies. Prediction of kinship from RNA-seq data would be useful for confirming the expected relationships in family-based studies and for highlighting samples from related individuals in case-control or population-based studies. Currently, reconstruction of pedigrees is largely based on SNPs or microsatellites obtained from genotyping arrays, whole genome sequencing and whole exome sequencing. Potential problems with using RNA-seq data for kinship detection are the low proportion of the genome that it covers, the highly skewed coverage of exons of different genes depending on expression level, and allele-specific expression. In this study we assess the use of RNA-seq data to detect kinship between individuals through pairwise identity by descent (IBD) estimates. First, we obtained high-quality SNPs after successive filters to minimize the effects of allelic imbalance as well as errors in sequencing, mapping and genotyping. Then, we used these SNPs to calculate pairwise IBD estimates. By analysing both real and simulated RNA-seq data we show that it is possible to identify up to second-degree relationships using RNA-seq data of even low to moderate sequencing depth.
Year: 2019 PMID: 31501877 PMCID: PMC6868348 DOI: 10.1093/nar/gkz776
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of data workflow for kinship detection and pedigree reconstruction using RNA-seq data. GQ: genotype quality, DP: depth, IBD: identity by descent, MAF: minor allele frequency.
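The filtering stage named in the Figure 1 caption (GQ, DP and MAF) can be illustrated with a minimal Python sketch. The threshold values, the `Call` record and all function names below are hypothetical and chosen only for illustration; the text above does not state the cutoffs or the tools used in the actual workflow.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds, for illustration only (not taken from the source).
MIN_GQ = 20     # minimum genotype quality
MIN_DP = 10     # minimum read depth at the site
MIN_MAF = 0.05  # minimum minor allele frequency across samples


@dataclass
class Call:
    """One genotype call for one sample at one SNP (hypothetical record)."""
    genotype: int  # copies of the alternate allele: 0, 1 or 2
    gq: int        # genotype quality
    dp: int        # read depth


def filter_calls(calls: List[Call]) -> List[Optional[int]]:
    """Set genotypes failing the GQ or DP filter to missing (None)."""
    return [
        c.genotype if (c.gq >= MIN_GQ and c.dp >= MIN_DP) else None
        for c in calls
    ]


def minor_allele_frequency(genotypes: List[Optional[int]]) -> float:
    """Minor allele frequency over the non-missing genotypes."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def keep_snp(calls: List[Call]) -> bool:
    """Keep a SNP only if it is still polymorphic enough after call filtering."""
    return minor_allele_frequency(filter_calls(calls)) >= MIN_MAF
```

In this sketch, low-quality calls are masked per sample before the MAF is computed, so a SNP is judged only on the genotypes that survive the GQ and DP filters.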
Figure 2. Structure of datasets used for the assessment of kinship detection using RNA-seq data. (A) Structure of the extended CEPH/UTAH family 1463 with empirical RNA-seq data (26). Additional empirical RNA-seq datasets are included in Supplementary Figure S2. (B) List of real related pairs of individuals with simulated RNA-seq data. (C) Simulated families with simulated RNA-seq data (pedigree types 1–4). Real individuals with simulated RNA-seq data are highlighted in grey. Simulated individuals with simulated RNA-seq data are highlighted in black.
Figure 3. Ternary diagrams of IBD estimates for (A) CEPH/UTAH family 1463, (B) simulated RNA-seq data from real related pairs of individuals and (C) simulated RNA-seq data from simulated related individuals from different pedigree types (type 1–type 4). Note that in ternary diagrams data points that have very similar Z0, Z1 and Z2 values overlap and may not be visible. Ternary diagrams for five additional empirical RNA-seq datasets are shown in Supplementary Figure S2.
Figure 4. Ternary diagrams of IBD estimates for each sequencing depth (top) and number of SNPs used for pairwise comparison per sequencing depth (bottom). Green indicates that the pedigree was correctly reconstructed, yellow indicates that the groups did not overlap but the pedigree could not be reconstructed, and red indicates that the groups overlapped. The box with a sequencing depth of 52 corresponds to the original data; this value is the mean of the original sequencing depths (from 37M to 67M reads).
Figure 5. Pedigree reconstruction using RNA-seq data. (A) The matrices show the correctly reconstructed simulated pedigrees with our default workflow (left panel), with five additional unrelated individuals (middle panel) and with a higher relatedness cutoff (0.375 instead of 0.2) (right panel). (B) The boxplots summarize the distributions of the number of SNPs used for pairwise comparisons and IBD estimation for each pair of individuals in each simulated pedigree.
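To make the link between the IBD estimates (Z0, Z1, Z2) and a relatedness cutoff concrete, the following Python sketch uses the standard proportion of alleles shared identical by descent, Z1/2 + Z2. The cutoff values 0.2 and 0.375 come from the Figure 5 caption, but applying them to this particular quantity, and the function names, are assumptions made for illustration rather than a description of the authors' implementation.

```python
def relatedness(z0: float, z1: float, z2: float) -> float:
    """Proportion of alleles shared IBD, given the probabilities of
    sharing 0, 1 or 2 alleles identical by descent."""
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("Z0, Z1 and Z2 must sum to 1")
    return z1 / 2 + z2


def is_related(z0: float, z1: float, z2: float, cutoff: float = 0.2) -> bool:
    """Assumed decision rule: call a pair related if its relatedness
    exceeds the cutoff (0.2 here; 0.375 is the stricter alternative)."""
    return relatedness(z0, z1, z2) > cutoff


# Theoretical IBD probabilities for common relationships.
EXPECTED = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "second degree": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

for name, z in EXPECTED.items():
    print(name, relatedness(*z), is_related(*z), is_related(*z, cutoff=0.375))
```

Under these theoretical values, a second-degree pair has a relatedness of 0.25, so it passes a 0.2 cutoff but not a 0.375 cutoff, whereas first-degree pairs (0.5) pass both.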