Timothy J Peters1, Hugh J French1,2, Stephen T Bradford1,3, Ruth Pidsley1, Clare Stirzaker1,4, Hilal Varinli1,3,5,6, Shalima Nair1, Wenjia Qu1, Jenny Song1, Katherine A Giles1, Aaron L Statham1, Helen Speirs7, Terence P Speed8,9, Susan J Clark1,4.
Abstract
MOTIVATION: A synoptic view of the human genome benefits chiefly from the application of nucleic acid sequencing and microarray technologies. These platforms allow interrogation of patterns such as gene expression and DNA methylation at the vast majority of canonical loci, allowing granular insights and opportunities for validation of original findings. However, problems arise when validating against a "gold standard" measurement, since this immediately biases all subsequent measurements towards that particular technology or protocol. Since all genomic measurements are estimates, in the absence of a "gold standard" we instead empirically assess the measurement precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. Both cross-platform and cross-locus comparisons can be made across all common loci, allowing identification of technology- and locus-specific tendencies.
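To make the consensus idea concrete, the following is a minimal sketch of a row-linear fit for a single locus. It is not the authors' implementation. It assumes the usual formulation in which each platform's measurements across samples are regressed on the centred per-sample consensus (the mean over platforms), giving a height a, a slope b (read here as sensitivity) and a residual standard deviation d (read here as precision). The function name `row_linear_fit` and the toy data are hypothetical.

```python
import numpy as np


def row_linear_fit(z):
    """Fit a row-linear model to one locus (illustrative sketch only).

    z is a (platforms x samples) array of measurements for a single locus.
    Returns per-platform arrays (a, b, d): height, slope, residual SD.
    Needs at least 3 samples and a consensus that varies across samples.
    """
    z = np.asarray(z, dtype=float)
    consensus = z.mean(axis=0)            # per-sample mean over platforms
    x = consensus - consensus.mean()      # centred consensus values
    sxx = np.sum(x ** 2)

    a = z.mean(axis=1)                    # height: each platform's mean
    b = (z @ x) / sxx                     # slope of each row on the consensus
    fitted = a[:, None] + b[:, None] * x[None, :]
    resid = z - fitted
    dof = z.shape[1] - 2                  # two fitted parameters per row
    d = np.sqrt(np.sum(resid ** 2, axis=1) / dof)
    return a, b, d


# Toy data (hypothetical): three platforms measuring five samples.
truth = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = np.vstack([
    truth,               # platform that tracks the consensus exactly
    0.5 * truth + 1.0,   # compressed response (lower slope)
    1.5 * truth - 1.0,   # expanded response (higher slope)
])
a, b, d = row_linear_fit(z)
```

In this noise-free toy case the slopes differ between platforms while the residual standard deviations are essentially zero, which shows that the two quantities are estimated separately. Because the consensus is the mean over platforms, the slopes average to one by construction.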
Year: 2019 PMID: 30084929 PMCID: PMC6378945 DOI: 10.1093/bioinformatics/bty675
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Description of biological samples used in this study. Datasets are named T1 and T2(A, B) for transcription, M1 for methylation and IL1 for interlaboratory testing
Fig. 2. Graphical depictions of row-linear fits for genes (a) NUBP1 and (b) HCN4 from Dataset T1. (c) Marginal and (d) joint distributions for parameter a, (e) marginal and (f) joint distributions for parameter b, (g) marginal and (h) joint distributions for parameter d, for the entirety of Dataset T1
Fig. 3. Marginal distributions of (a) b and (b) d for Dataset T2. Boxplots separating loci into coding and noncoding targets over all platforms for (c) Dataset T2 sensitivity, (d) Dataset T2 precision, (e) Dataset T2B sensitivity and (f) Dataset T2B precision
Fig. 4. Marginal distributions for (a) parameter a, (b) parameter b and (c) parameter d, for the entirety of Dataset M1. Graphical depictions of row-linear fits for array-discordant CpG sites for which WGBS data favours (d) the EPIC array and (e) the 450K array. (f) DeFinetti diagram showing the proportions of b described by the three platforms in Dataset M1. We show backtransformed axes to the more interpretable methylation domain (0, 1) in (a), (d) and (e)
Fig. 5. Effect of array normalisation on Dataset M1. Precision of raw 450K and EPIC data (a) split by Type I and Type II probes and (b) total intensity (methylated + unmethylated channel) of Type II probes, and (c and d) the same values post-normalisation
Fig. 6. Effect of (a) repeat regions and (b) cross-hybridisation on array sensitivity from Dataset M1. (c) Sensitivity of cross-hybridising probes against the LASSO coefficient of target WGBS values from sparse linear modelling. (d) Effect of cross-hybridisation on the predictive capacity of WGBS measurements for their matched microarray measurements, via LASSO. (e) Precision of WGBS against mean coverage of the samples, for individual CpG loci
Fig. 7. Scatterplots depicting (a) b and (b) d of twenty-one laboratories for both KRAS genotype abundances, from Dataset IL1. Number plotted denotes laboratory ID