Yi Jiang, Gina Giase, Kay Grennan, Annie W Shieh, Yan Xia, Lide Han, Quan Wang, Qiang Wei, Rui Chen, Sihan Liu, Kevin P White, Chao Chen, Bingshan Li, Chunyu Liu.
Abstract
Studies of complex disorders benefit from integrative analyses of multiple omics data. Yet sample mix-ups frequently occur in multi-omics studies, weakening statistical power and risking false findings. Accurately aligning sample information, genotype, and the corresponding omics data is critical for integrative analyses. We developed DRAMS (https://github.com/Yi-Jiang/DRAMS) to Detect and Re-Align Mixed-up Samples and so address the sample mix-up problem. It uses a logistic regression model followed by a modified topological sorting algorithm to identify the potential true IDs from the relationships among the multi-omics data. In tests on simulated data, DRAMS performed better as more types of omics data were used and as the proportion of mix-ups decreased. Applying DRAMS to real data from the PsychENCODE BrainGVEX project, we detected and corrected 201 mix-ups (12.5% of the total data generated). Of the 21 mix-ups involving errors of racial identity, DRAMS re-assigned all data to the correct racial group in the 1000 Genomes project. After correction, the number of quantitative trait loci (QTLs) detected at FDR<0.01 increased by an average of 1.62-fold. Using DRAMS in multi-omics studies will strengthen the statistical power of a study and improve the quality of its results. Although few studies currently have multi-omics data in place, we expect such data to grow quickly, and with them the need for DRAMS.
Year: 2020 PMID: 32282793 PMCID: PMC7179940 DOI: 10.1371/journal.pcbi.1007522
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 3. Summary of highly related data pairs among the six omics types. A highly related data pair was defined as a data pair with a genetic relatedness score > 0.65. M: matched pairs (highly related data pairs that have the same individual IDs). Mis: mismatched pairs (highly related data pairs with different individual IDs).
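The matched/mismatched labelling in the Fig 3 caption can be sketched as a small classification step. This is only an illustration of the stated definition, not DRAMS code: the `DataPair` type, the `classify_pair` function, the sample IDs, and the relatedness scores are all hypothetical, and only the 0.65 cutoff comes from the caption.

```python
from typing import NamedTuple

# Cutoff taken from the Fig 3 caption; everything else here is illustrative.
RELATEDNESS_CUTOFF = 0.65


class DataPair(NamedTuple):
    """Two omics datasets and the genetic relatedness score between them."""
    id_a: str          # individual ID recorded for the first dataset
    id_b: str          # individual ID recorded for the second dataset
    relatedness: float


def classify_pair(pair: DataPair, cutoff: float = RELATEDNESS_CUTOFF) -> str:
    """Return 'M', 'Mis', or 'not highly related' for one data pair."""
    if pair.relatedness <= cutoff:
        return "not highly related"
    # Highly related: matched if the recorded IDs agree, mismatched otherwise.
    return "M" if pair.id_a == pair.id_b else "Mis"


# Made-up pairs for demonstration only.
pairs = [
    DataPair("S001", "S001", 0.93),  # same ID, highly related
    DataPair("S002", "S007", 0.88),  # different IDs, highly related
    DataPair("S003", "S003", 0.12),  # same ID, but not highly related
]
labels = [classify_pair(p) for p in pairs]
```

A mismatched pair (`Mis`) is the signal of interest: two datasets that appear to come from the same individual but carry different IDs.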
Summary of samples and sample corrections in BrainGVEX.
| Data type | Number of samples | Number of contaminated samples | Number of sex-matched samples | Number of sex-mismatched samples | Number of samples missing sex information | Proportion of sex-matched samples | Number of samples switched IDs | Number of samples unrelated to any other | Number of |
|---|---|---|---|---|---|---|---|---|---|
| WGS | 285 | 1 | 256 | 24 | 5 | 0.914 | 54 (19.0%) | 0 | 0 |
| PsychChip | 263 | 0 | 244 | 19 | 0 | 0.928 | 43 (16.3%) | 0 | 0 |
| Affymetrix | 137 | 0 | - | - | - | - | 0 | 1 | - |
| ATAC-Seq | 295 | 0 | 260 | 28 | 7 | 0.903 | 55 (18.6%) | 3 | 15 |
| RNA-Seq | 426 | 0 | 414 | 3 | 9 | 0.993 | 3 (0.7%) | 0 | 3 |
| Ribo-Seq | 197 | 0 | - | - | - | - | 50 (25.4%) | 4 | - |
| Total | 1603 | 1 | 1174 | 74 | 21 | - | 201 (12.5%) | 8 | 18 |
Note: “Number of sex-matched samples” indicates the number of samples with the same reported sex and SNP-inferred sex.
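The "Proportion of sex-matched samples" column appears to be the sex-matched count divided by the number of samples with usable sex information. The sketch below checks that reading against the table. It assumes that the count following the sex-matched count is the number of sex-mismatched samples, which the arithmetic suggests but the extracted header does not state; the function name is hypothetical.

```python
def sex_matched_proportion(matched: int, mismatched: int) -> float:
    """Proportion of sex-matched samples among those with sex information.

    Samples missing sex information are excluded from the denominator
    (an assumption inferred from the table values).
    """
    return matched / (matched + mismatched)


# (sex-matched, assumed sex-mismatched) counts copied from the table.
counts = {
    "WGS": (256, 24),
    "PsychChip": (244, 19),
    "ATAC-Seq": (260, 28),
    "RNA-Seq": (414, 3),
}

proportions = {
    data_type: round(sex_matched_proportion(m, mis), 3)
    for data_type, (m, mis) in counts.items()
}

for data_type, value in proportions.items():
    print(f"{data_type}: {value:.3f}")
```

Under this reading the four computed values agree with the reported proportions (0.914, 0.928, 0.903, 0.993).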
Increased number of cis-QTLs after correcting data IDs.
| QTL type | Category | Before | After | Fold change | Novel QTLs after correcting IDs (π1 in GTEx) | Discarded QTLs after correcting IDs (π1 in GTEx) |
|---|---|---|---|---|---|---|
| WGS vs. RNA-Seq | Sample size | 278 | 273 | - | - | - |
| WGS vs. RNA-Seq | #QTLs (FDR<0.01) | 57,209 | 96,242 | 1.68 | 43,266 (0.608) | 4,233 (0.246) |
| WGS vs. RNA-Seq | #QTLs (FDR<0.05) | 90,231 | 147,942 | 1.64 | 66,993 (0.475) | 9,282 (0.213) |
| WGS vs. Ribo-Seq | Sample size | 191 | 187 | - | - | - |
| WGS vs. Ribo-Seq | #QTLs (FDR<0.01) | 18,178 | 31,345 | 1.72 | - | - |
| WGS vs. Ribo-Seq | #QTLs (FDR<0.05) | 30,641 | 48,306 | 1.58 | - | - |
| PsychChip vs. RNA-Seq | Sample size | 259 | 253 | - | - | - |
| PsychChip vs. RNA-Seq | #QTLs (FDR<0.01) | 48,742 | 76,995 | 1.58 | 31,801 (0.638) | 3,548 (0.581) |
| PsychChip vs. RNA-Seq | #QTLs (FDR<0.05) | 77,925 | 117,711 | 1.51 | 49,682 (0.519) | 9,896 (0.246) |
| PsychChip vs. Ribo-Seq | Sample size | 177 | 172 | - | - | - |
| PsychChip vs. Ribo-Seq | #QTLs (FDR<0.01) | 15,028 | 20,447 | 1.36 | - | - |
| PsychChip vs. Ribo-Seq | #QTLs (FDR<0.05) | 26,350 | 32,209 | 1.22 | - | - |
| Total | #QTLs (FDR<0.01) | 139,157 | 225,029 | 1.62 | 59,399 | 17,020 |
| Total | #QTLs (FDR<0.05) | 225,147 | 346,168 | 1.54 | 95,495 | 34,684 |
Note: Only chromosome 1 was used to save computing time.
* π1 was estimated using the “qvalue” package in R.
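The fold changes in the cis-QTL table can be reproduced as the ratio of QTL counts after and before ID correction. The sketch below does this for the FDR<0.01 rows, using counts copied from the table. The variable and function names are hypothetical, and treating the net gain as novel minus discarded QTLs is a reading of the column headers rather than something the record states.

```python
def fold_change(before: int, after: int) -> float:
    """Ratio of QTL counts after vs. before correcting data IDs."""
    return after / before


# (before, after) QTL counts at FDR<0.01, copied from the table.
qtl_counts = {
    "WGS vs. RNA-Seq": (57_209, 96_242),
    "WGS vs. Ribo-Seq": (18_178, 31_345),
    "PsychChip vs. RNA-Seq": (48_742, 76_995),
    "PsychChip vs. Ribo-Seq": (15_028, 20_447),
}

fold_changes = {
    name: round(fold_change(before, after), 2)
    for name, (before, after) in qtl_counts.items()
}

# Overall fold change from the summed counts (the "Total" row).
total_before = sum(before for before, _ in qtl_counts.values())
total_after = sum(after for _, after in qtl_counts.values())
total_fold = round(fold_change(total_before, total_after), 2)

# For WGS vs. RNA-Seq, the net gain equals novel minus discarded QTLs.
novel, discarded = 43_266, 4_233
net_gain_matches = (96_242 - 57_209) == (novel - discarded)

print(fold_changes, total_fold, net_gain_matches)
```

The overall ratio of the summed counts corresponds to the 1.62-fold average increase quoted in the abstract.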