Olga Chervova, Lucia Conde, José Afonso Guerra-Assunção, Ismail Moghul, Amy P Webster, Alison Berner, Elizabeth Larose Cadieux, Yuan Tian, Vitaly Voloshin, Tiago F Jesus, Rifat Hamoudi, Javier Herrero, Stephan Beck.
Abstract
Integrative analysis of multi-omics data is a powerful approach for gaining functional insights into biological and medical processes. Conducting these multifaceted analyses on human samples is often complicated by the fact that the raw sequencing output is rarely available under open access. The Personal Genome Project UK (PGP-UK) is one of the few resources that recruits its participants under open consent and makes the resulting multi-omics data freely and openly available. As part of this resource, we describe the PGP-UK multi-omics reference panel, which consists of genomic, methylomic and transcriptomic data from ten participants. Specifically, we outline the data processing, quality control and validation procedures that were implemented to ensure data integrity and exclude sample mix-ups. In addition, we provide a REST API to facilitate the download of the entire PGP-UK dataset. The data are also available from two cloud-based environments, providing platforms for free integrated analysis. In conclusion, the genotype-validated PGP-UK multi-omics human reference panel described here provides a valuable new open access resource for integrated analyses in support of personal and medical genomics.
Year: 2019 PMID: 31672996 PMCID: PMC6823446 DOI: 10.1038/s41597-019-0205-4
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. PGP-UK workflow. Horizontal panels depict the general sample/data categories and options (e.g. blood and/or saliva) and vertical panels depict specific data types and their flow from start to end.
PGP-UK data identifiers for the reference panel comprised of 10 PGP-UK participants.
| Sample ID | EBI ID | Tissue | WGS (ENA) | WGBS (ENA) | 450 k (ArrayExpress) | RNA-seq (ENA) |
|---|---|---|---|---|---|---|
| uk35C650 | SAMEA4545245 | blood | ERX1796409 | ERX2408504 | 101130760050_R04C02 | ERX2373318 |
| | | saliva | | | 101130760049_R03C01 | |
| uk2E2AAE | SAMEA4545246 | blood | ERX1796410 | ERX2408505 | 101130760050_R05C02 | ERX2373321 |
| | | saliva | | | 101130760050_R03C01 | |
| uk2DF242 | SAMEA4545247 | blood | ERX1796411 | ERX2408506 | 101130760049_R06C02 | ERX2373317 |
| | | saliva | | | 101130760049_R03C02 | |
| uk740176 | SAMEA4545248 | blood | ERX1796412 | ERX2408507 | 101130760050_R06C02 | ERX2373324 |
| | | saliva | | | 101130760050_R06C01 | |
| uk33D02F | SAMEA4545249 | blood | ERX1796413 | ERX2408508 | 101130760049_R05C02 | ERX2373316 |
| | | saliva | | | 101130760049_R04C02 | |
| uk0C72FF | SAMEA4545250 | blood | ERX1796414 | ERX2408509 | 101130760049_R06C01 | ERX2373322 |
| | | saliva | | | 101130760050_R01C01 | |
| uk1097F9 | SAMEA4545251 | blood | ERX1796415 | ERX2408510 | 101130760050_R02C01 | ERX2373320 |
| | | saliva | | | 101130760050_R01C02 | |
| uk174659 | SAMEA4545252 | blood | ERX1796416 | ERX2408511 | 101130760050_R05C01 | ERX2373325 |
| | | saliva | | | 101130760049_R05C01 | |
| uk85AA3B | SAMEA4545253 | blood | ERX1796417 | ERX2408512 | 101130760049_R02C02 | ERX2373323 |
| | | saliva | | | 101130760049_R01C01 | |
| uk481F67 | SAMEA4545254 | blood | ERX1796418 | ERX2408513 | 101130760049_R02C01 | ERX2373319 |
| | | saliva | | | 101130760050_R02C02 | |
For each participant, WGS, WGBS and RNA-seq data were obtained from blood samples, and methylation profiles were obtained using the 450 k array from both blood and saliva samples. The table contains ENA accession numbers for WGS, WGBS and RNA-seq; for 450 k data it shows Sentrix IDs and positions, separated by an underscore.
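As a small illustration of the 450 k identifier format described above, the following Python sketch splits one table entry into its Sentrix ID and array position. The function name `split_sentrix` is hypothetical and is not part of any PGP-UK tooling.

```python
def split_sentrix(barcode: str) -> tuple[str, str]:
    """Split '<Sentrix ID>_<position>' at the first underscore."""
    sentrix_id, position = barcode.split("_", 1)
    return sentrix_id, position


# Blood 450 k entry for participant uk35C650, taken from the table above.
sentrix_id, position = split_sentrix("101130760050_R04C02")
print(sentrix_id, position)
```

Splitting only at the first underscore keeps the position string intact even if it were to contain further underscores.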
Fig. 2. PGP-UK QC images for WGS, WGBS, RNA-seq and 450 k methylation data. (a) WGS coverage depth plot. (b) WGBS coverage depth plot. (c) RNA-seq reads distribution over the different genome features. (d) Density plot for Illumina 450 k methylation profiles.
Quality control metrics summary of the WGS data derived from blood samples of 10 PGP-UK participants.
| Sample ID | Median coverage depth | Bases covered at ≥30X | Duplicated reads, % (Read 1) | Duplicated reads, % (Read 2) | GC content, % (Read 1) | GC content, % (Read 2) |
|---|---|---|---|---|---|---|
| uk35C650 | 32.0X | 64% | 8.0% | 6.3% | 40% | 41% |
| uk2E2AAE | 47.0X | 95% | 18.3% | 18.4% | 41% | 41% |
| uk2DF242 | 35.0X | 75% | 10.2% | 13.6% | 41% | 41% |
| uk740176 | 35.0X | 80% | 8.3% | 9.6% | 40% | 41% |
| uk33D02F | 31.0X | 58% | 11.2% | 12.1% | 41% | 41% |
| uk0C72FF | 31.0X | 57% | 3.7% | 8.1% | 41% | 41% |
| uk1097F9 | 39.0X | 85% | 4.5% | 12.7% | 40% | 41% |
| uk174659 | 35.0X | 78% | 8.5% | 15.1% | 41% | 41% |
| uk85AA3B | 37.0X | 85% | 6.1% | 3.2% | 41% | 41% |
| uk481F67 | 30.0X | 54% | 8.6% | 7.0% | 41% | 41% |
The table contains the median coverage depth, the percentage of bases covered at a depth of at least 30X, and the percentages of duplicated reads and GC content for both forward and reverse reads for each sample.
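The two coverage metrics in this table can be illustrated with a minimal Python sketch. It assumes per-base depths are already available as a list of integers; the function name `coverage_summary` and the toy depths are hypothetical and do not come from the PGP-UK data or pipeline.

```python
from statistics import median


def coverage_summary(depths, threshold=30):
    """Return (median depth, % of positions with depth >= threshold)."""
    covered = sum(1 for d in depths if d >= threshold)
    return median(depths), 100.0 * covered / len(depths)


# Toy per-base depths, for illustration only (not real PGP-UK values).
toy_depths = [28, 31, 35, 30, 12, 40, 33, 29]
median_depth, pct_at_30x = coverage_summary(toy_depths)
print(f"median {median_depth}X, {pct_at_30x}% of bases at >=30X")
```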
Quality control metrics summary of the WGBS data derived from blood samples of 10 PGP-UK participants.
| Sample ID | Median coverage depth | Bases covered at ≥30X | Duplicated reads, % (Read 1) | Duplicated reads, % (Read 2) | GC content, % (Read 1) | GC content, % (Read 2) |
|---|---|---|---|---|---|---|
| uk35C650 | 10.0X | 15% | 27.3% | 13.3% | 26% | 29% |
| uk2E2AAE | 15.0X | 20% | 39.4% | 20.3% | 24% | 27% |
| uk2DF242 | 16.0X | 23% | 28.0% | 12.4% | 24% | 27% |
| uk740176 | 15.0X | 20% | 25.8% | 12.6% | 25% | 27% |
| uk33D02F | 16.0X | 20% | 26.3% | 13.1% | 24% | 27% |
| uk0C72FF | 14.0X | 18% | 26.8% | 11.4% | 25% | 28% |
| uk1097F9 | 14.0X | 15% | 26.0% | 10.8% | 24% | 27% |
| uk174659 | 14.0X | 17% | 27.1% | 15.5% | 24% | 27% |
| uk85AA3B | 16.0X | 19% | 28.3% | 14.9% | 24% | 27% |
| uk481F67 | 15.0X | 25% | 31.6% | 17.4% | 26% | 29% |
The table contains the median coverage depth, the percentage of bases covered at a depth of at least 30X, and the percentages of duplicated reads and GC content for both forward and reverse reads for each sample.
Quality control metrics summary of the RNA-seq data derived from blood samples of 10 PGP-UK participants.
| Sample ID | RIN | Uniquely aligned, % | Duplicated reads, % (Read 1) | Duplicated reads, % (Read 2) | GC content, % (Read 1) | GC content, % (Read 2) |
|---|---|---|---|---|---|---|
| uk35C650 | 8.8 | 88.8% | 83.2% | 80.6% | 53% | 56% |
| uk2E2AAE | 9.1 | 89.3% | 85.9% | 82.3% | 53% | 56% |
| uk2DF242 | 9.2 | 90.0% | 86.3% | 81.9% | 53% | 56% |
| uk740176 | 8.5 | 90.0% | 84.8% | 80.6% | 53% | 56% |
| uk33D02F | 8.3 | 87.0% | 85.5% | 82.6% | 53% | 56% |
| uk0C72FF | 7.9 | 86.7% | 85.0% | 82.5% | 53% | 56% |
| uk1097F9 | 8.7 | 86.1% | 86.5% | 82.6% | 54% | 57% |
| uk174659 | 9.3 | 90.4% | 84.4% | 81.3% | 53% | 56% |
| uk85AA3B | 8.6 | 89.0% | 84.9% | 81.2% | 53% | 56% |
| uk481F67 | 7.1 | 90.4% | 87.3% | 83.7% | 52% | 55% |
The table contains the RIN value, the percentage of uniquely aligned bases, and the percentages of duplicated reads and GC content for both forward and reverse reads for each sample.
Quality control metrics summary of the Illumina 450 k data derived from blood and saliva samples of 10 PGP-UK participants.
| Sample ID | Tissue | Detection p-values available, % | Detection p < 0.01, % | Bead counts available, % | Bead count ≥ 3, % |
|---|---|---|---|---|---|
| uk35C650 | blood | 100% | 99.98476% | 99.91370% | 100% |
| | saliva | 100% | 99.97899% | 99.93710% | 100% |
| uk2E2AAE | blood | 100% | 99.92297% | 99.92503% | 100% |
| | saliva | 100% | 99.93491% | 99.93670% | 100% |
| uk2DF242 | blood | 100% | 99.96478% | 99.90896% | 100% |
| | saliva | 100% | 99.97178% | 99.92110% | 100% |
| uk740176 | blood | 100% | 99.91638% | 99.90361% | 100% |
| | saliva | 100% | 99.91638% | 99.91800% | 100% |
| uk33D02F | blood | 100% | 99.92791% | 99.91761% | 100% |
| | saliva | 100% | 99.92771% | 99.93120% | 100% |
| uk0C72FF | blood | 100% | 99.97467% | 99.90484% | 100% |
| | saliva | 100% | 99.98558% | 99.88910% | 100% |
| uk1097F9 | blood | 100% | 99.98929% | 99.92460% | 100% |
| | saliva | 100% | 99.98744% | 99.92150% | 100% |
| uk174659 | blood | 100% | 99.97714% | 99.94151% | 100% |
| | saliva | 100% | 99.97899% | 99.92190% | 100% |
| uk85AA3B | blood | 100% | 99.93327% | 99.89351% | 100% |
| | saliva | 100% | 99.94089% | 99.91670% | 100% |
| uk481F67 | blood | 100% | 99.98126% | 99.91514% | 100% |
| | saliva | 100% | 99.98105% | 99.92130% | 100% |
The table contains, for each sample, the percentages of available detection p-values and bead counts, together with the percentages of detection p-values below 0.01 and of bead counts of 3 and above.
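The following Python sketch shows one way such per-sample percentages could be computed. It is not the authors' pipeline: the function name `probe_qc`, the use of `None` for a missing value, the toy inputs, and the choice to express the passing percentage relative to available values (rather than all probes) are all assumptions made for illustration.

```python
def probe_qc(values, passes):
    """Return (% of values available, % of available values passing `passes`)."""
    available = [v for v in values if v is not None]  # None marks a missing value
    pct_available = 100.0 * len(available) / len(values)
    pct_passing = 100.0 * sum(1 for v in available if passes(v)) / len(available)
    return pct_available, pct_passing


# Toy inputs, for illustration only (not real PGP-UK values).
detection_p = [0.0, 0.001, 0.5, 0.0001]
bead_counts = [5, None, 3, 12]

print(probe_qc(detection_p, lambda p: p < 0.01))  # detection p-value threshold
print(probe_qc(bead_counts, lambda n: n >= 3))    # bead count threshold
```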
Fig. 3. Multi-Omics Data Matching. (a) Multi-Omics Data Matching Schema. 65 loci were used in matching WGS with 450 k methylation and WGBS data; 279 loci were used in matching WGS with RNA-seq data. (b) Correlation plot displaying matching results for WGS vs. 450 k datasets. (c) Correlation plot displaying matching results for WGS vs. WGBS datasets. (d) Correlation plot displaying matching results for WGS vs. RNA-seq datasets. On the correlation plots (b–d), the scale is represented by the combination of ball size and colour (from white to dark blue) and goes from 0 (0% match) to 1 (perfect 100% match).
Summary of data cross-validation between 450 k, WGBS and RNA-seq against WGS.
| Sample ID | WGS vs. 450 k: Loci, n | WGS vs. 450 k: Loci, % | WGS vs. 450 k: matched, % | WGS vs. WGBS: Loci, n | WGS vs. WGBS: Loci, % | WGS vs. WGBS: matched, % | WGS vs. RNA-seq: Loci, n | WGS vs. RNA-seq: Loci, % | WGS vs. RNA-seq: matched, % |
|---|---|---|---|---|---|---|---|---|---|
| uk35C650 | 65 | 100 | 100 | 52 | 80 | 100 | 161 | 57.71 | 81.99 |
| uk2E2AAE | 65 | 100 | 100 | 51 | 78.46 | 100 | 172 | 61.65 | 75.58 |
| uk2DF242 | 65 | 100 | 100 | 58 | 89.23 | 100 | 183 | 65.59 | 70.49 |
| uk740176 | 65 | 100 | 100 | 61 | 93.85 | 98.36 | 152 | 54.48 | 80.26 |
| uk33D02F | 65 | 100 | 100 | 53 | 81.54 | 100 | 188 | 67.38 | 69.68 |
| uk0C72FF | 65 | 100 | 100 | 57 | 87.69 | 98.25 | 159 | 56.99 | 81.13 |
| uk1097F9 | 65 | 100 | 100 | 54 | 83.08 | 98.15 | 190 | 68.10 | 73.68 |
| uk174659 | 65 | 100 | 100 | 52 | 80 | 100 | 197 | 70.61 | 74.62 |
| uk85AA3B | 65 | 100 | 100 | 60 | 92.30 | 100 | 167 | 59.86 | 83.23 |
| uk481F67 | 65 | 100 | 100 | 53 | 81.54 | 100 | 169 | 60.57 | 71.01 |
The columns "Loci, n" and "Loci, %" contain the numbers and percentages of loci used for matching (out of 65 loci for WGS vs. 450 k and WGS vs. WGBS, and out of 279 loci for WGS vs. RNA-seq).
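The relationship between the "Loci, n" and "Loci, %" columns can be checked with a short Python sketch. The totals (65 and 279) come from the text above; the function and dictionary names are hypothetical.

```python
# Total loci per comparison, as stated in the table note.
TOTAL_LOCI = {"450k": 65, "WGBS": 65, "RNA-seq": 279}


def loci_percent(n_used: int, comparison: str) -> float:
    """Percentage of the comparison's total loci that were used for matching."""
    return round(100.0 * n_used / TOTAL_LOCI[comparison], 2)


# Values for participant uk35C650, taken from the table.
print(loci_percent(65, "450k"))      # all 65 loci used
print(loci_percent(52, "WGBS"))      # 52 of 65 loci
print(loci_percent(161, "RNA-seq"))  # 161 of 279 loci
```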
| Measurement(s) | DNA methylation profiling data • whole genome sequencing assay • bisulfite sequencing assay • transcription profiling assay |
| Technology Type(s) | DNA methylation profiling assay • DNA sequencing • RNA sequencing |
| Factor Type(s) | age • sex • smoking status |
| Sample Characteristic - Organism | Homo sapiens |