Jin Lu¹, Jiangwen Sun¹, Xinyu Wang¹, Henry Kranzler², Joel Gelernter³, Jinbo Bi⁴.
Abstract
BACKGROUND: Although substance use disorders (SUDs) are heritable, few genetic risk factors for them have been identified, in part due to the small sample sizes of study populations. To address this limitation, researchers have aggregated subjects from multiple existing genetic studies, but these subjects can have missing phenotypic information, including diagnostic criteria for certain substances that were not originally a focus of study. Recent advances in addiction neurobiology have shown that comorbid SUDs (e.g., the abuse of multiple substances) have similar genetic determinants, which makes it possible to infer missing SUD diagnostic criteria using criteria from another SUD and patient genotypes through statistical modeling.
Keywords: Addiction; Matrix completion; Parallel computing; Phenotype imputation; Substance use disorder
Year: 2018 PMID: 30463556 PMCID: PMC6249733 DOI: 10.1186/s12918-018-0623-5
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Inferring phenotypes for diagnoses of substance use disorders (e.g., opioid and cocaine, illustrated here) via matrix completion. Phenotypes for a set of patients are organized into a matrix F with rows for m patients and columns for n diagnostic symptoms. Related features that describe patients and symptoms are available, such as genotypes of individuals in the matrix X and pairwise similarities between diagnostic symptoms in the matrix Y. A bilinear model, F = XGY + N, is used, where G and N are model parameters to be learned; they quantify the impact of genotype-symptom interactions and the residual from any other effect (mainly random environmental effects), respectively.
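To make the shapes in the bilinear model concrete, here is a minimal NumPy sketch under simplifying assumptions: the residual N is fixed at zero, phenotypes are treated as real-valued rather than binary, and G is estimated by a closed-form ridge least-squares fit on the observed entries only. This is not the authors' StoLADMM/LADMM procedure; the helper names `bilinear_predict` and `fit_G_ridge`, the dimensions, and the synthetic data are all hypothetical.

```python
import numpy as np

def bilinear_predict(X, G, Y, N=None):
    """F_hat = X @ G @ Y (+ N): X is m x d (genotypes), G is d x k, Y is k x n (symptom similarity)."""
    F_hat = X @ G @ Y
    return F_hat if N is None else F_hat + N

def fit_G_ridge(F, observed, X, Y, lam=1e-8):
    """Estimate G from the observed cells of F by ridge least squares (assumption: N = 0).

    Uses vec(X G Y) = kron(Y.T, X) @ vec(G) with column-major (Fortran-order) vec.
    """
    d, k = X.shape[1], Y.shape[0]
    obs = observed.flatten(order="F")
    design = np.kron(Y.T, X)[obs]            # one regression row per observed cell
    target = F.flatten(order="F")[obs]
    lhs = design.T @ design + lam * np.eye(d * k)
    g = np.linalg.solve(lhs, design.T @ target)
    return g.reshape((d, k), order="F")

# Synthetic, noise-free toy problem (hypothetical sizes).
rng = np.random.default_rng(0)
m, d, n = 30, 4, 5
X = rng.standard_normal((m, d))
S = rng.standard_normal((n, n))
Y = S @ S.T + np.eye(n)                      # symmetric, well-conditioned "similarity" matrix
G_true = rng.standard_normal((d, n))
F_full = bilinear_predict(X, G_true, Y)
observed = rng.random((m, n)) < 0.7          # about 70% of cells observed

G_hat = fit_G_ridge(F_full, observed, X, Y)
# Keep observed cells, fill the missing ones from the fitted model.
F_filled = np.where(observed, F_full, bilinear_predict(X, G_hat, Y))
```

Because vec(XGY) is linear in vec(G), fitting G reduces to ordinary regression over the observed cells. The explicit Kronecker design matrix has (m·n) × (d·k) entries, so this direct form only suits toy sizes; large problems call for iterative solvers such as the ADMM-type methods compared in the tables.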
Sample size by study and race: African-Americans (AAs) and European-Americans (EAs)
| Study | AAs | EAs |
|---|---|---|
| CUD association, microarray | 2718 | 2037 |
| CUD association, exome sequencing | 940 | 1395 |
| OUD association, microarray | 1398 | 1756 |
| OUD association, exome sequencing | 540 | 1190 |
| Phenome inference | 1149 | 2292 |
Fig. 2 Comparison of the average RMSE values, with standard deviations shown as bars, in Synthetic Experiments I, II, and III
Fig. 3 Heatmaps of the true G and the recovered G matrices in Synthetic Experiment I
Comparison of RMSE values and computation times of different methods in Synthetic Experiment IV
| Percentage | Metric | StoLADMM | LADMM | DirtyIMC | IMC | MAXIDE |
|---|---|---|---|---|---|---|
| 10% | RMSE | 0.061 | 0.062 | 0.419 | 0.402 | - |
| 10% | Time (s) | 1.773 | 20.727 | 0.827 | 4.750 | - |
| 20% | RMSE | 0.095 | 0.098 | 0.453 | 0.468 | - |
| 20% | Time (s) | 1.475 | 26.781 | 0.757 | 4.297 | - |
| 30% | RMSE | 0.085 | 0.076 | 0.499 | 0.402 | - |
| 30% | Time (s) | 1.447 | 18.807 | 0.406 | 4.750 | - |
| 40% | RMSE | 0.089 | 0.069 | 0.593 | 0.620 | - |
| 40% | Time (s) | 1.452 | 18.976 | 0.420 | 4.796 | - |
| 50% | RMSE | 0.081 | 0.076 | 0.716 | 0.700 | - |
| 50% | Time (s) | 1.382 | 22.057 | 0.248 | 3.156 | - |
Computation time is measured in seconds; ‘-’ denotes a run failure, i.e., the method failed because it ran out of memory.
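The tables report RMSE on imputed entries. A minimal sketch of how such a score could be computed, assuming a protocol in which a random fraction of the observed entries is hidden and RMSE is taken over the hidden entries only; the exact evaluation protocol behind the tables is not given in this excerpt, and `holdout_split` and `masked_rmse` are hypothetical names.

```python
import numpy as np

def holdout_split(observed, frac, rng):
    """Hide a random fraction of the observed entries (hypothetical evaluation protocol).

    Returns boolean masks (train, test) that partition `observed`.
    """
    idx = np.flatnonzero(observed)                     # flat indices of observed cells
    n_test = int(round(frac * idx.size))
    test_idx = rng.choice(idx, size=n_test, replace=False)
    test = np.zeros(observed.shape, dtype=bool)
    test.flat[test_idx] = True
    train = observed & ~test
    return train, test

def masked_rmse(F_true, F_hat, mask):
    """Root-mean-squared error over the entries selected by `mask`."""
    diff = (F_true - F_hat)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Tiny check: errors 0.5 and 1.0 on the two masked cells -> sqrt((0.25 + 1.0) / 2)
F_true = np.array([[1.0, 0.0], [1.0, 1.0]])
F_hat = np.array([[0.5, 0.0], [1.0, 0.0]])
mask = np.array([[True, False], [False, True]])
print(masked_rmse(F_true, F_hat, mask))  # about 0.7906
```

A model would be fitted on `train` only and scored with `masked_rmse(..., test)`, so the score reflects genuinely unseen entries.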
Comparison of imputation results by different methods on the Opioid-Cocaine SUD dataset
| Percentage | Metric | StoLADMM | LADMM | DirtyIMC | IMC | MAXIDE | NM |
|---|---|---|---|---|---|---|---|
| 20% | RMSE | 0.236 | 0.231 | 0.297 | 0.230 | 0.235 | 0.567 |
| 20% | Time (s) | 30.938 | 664.515 | 45.366 | 21.053 | 4732.718 | - |
| 40% | RMSE | 0.226 | 0.234 | 0.298 | 0.235 | 0.236 | 0.582 |
| 40% | Time (s) | 29.953 | 982.212 | 21.063 | 20.803 | 3772.202 | - |
| 60% | RMSE | 0.228 | 0.236 | 0.301 | 0.237 | 0.235 | 0.581 |
| 60% | Time (s) | 28.719 | 815.841 | 20.269 | 36.737 | 4718.916 | - |
| 80% | RMSE | 0.236 | 0.237 | 0.303 | 0.239 | 0.241 | 0.585 |
| 80% | Time (s) | 30.547 | 877.886 | 23.906 | 32.872 | 4011.692 | - |
| 100% | RMSE | 0.223 | 0.239 | 0.303 | 0.246 | 0.242 | 0.574 |
| 100% | Time (s) | 30.172 | 489.770 | 22.922 | 24.653 | 3695.292 | - |
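As a reading aid, the snippet below averages each method's RMSE and run time over the five rows of the Opioid-Cocaine table and reports StoLADMM's speed-up relative to each other method. All numbers are copied from the table; NM is left out of the timing summary because its time entries are ‘-’.

```python
from statistics import mean

# Values copied from the Opioid-Cocaine SUD table (rows 20%, 40%, 60%, 80%, 100%).
RMSE = {
    "StoLADMM": [0.236, 0.226, 0.228, 0.236, 0.223],
    "LADMM":    [0.231, 0.234, 0.236, 0.237, 0.239],
    "DirtyIMC": [0.297, 0.298, 0.301, 0.303, 0.303],
    "IMC":      [0.230, 0.235, 0.237, 0.239, 0.246],
    "MAXIDE":   [0.235, 0.236, 0.235, 0.241, 0.242],
    "NM":       [0.567, 0.582, 0.581, 0.585, 0.574],
}
# Time in seconds; NM omitted because the table lists '-' for it.
TIME = {
    "StoLADMM": [30.938, 29.953, 28.719, 30.547, 30.172],
    "LADMM":    [664.515, 982.212, 815.841, 877.886, 489.770],
    "DirtyIMC": [45.366, 21.063, 20.269, 23.906, 22.922],
    "IMC":      [21.053, 20.803, 36.737, 32.872, 24.653],
    "MAXIDE":   [4732.718, 3772.202, 4718.916, 4011.692, 3695.292],
}

mean_rmse = {method: mean(vals) for method, vals in RMSE.items()}
mean_time = {method: mean(vals) for method, vals in TIME.items()}
best = min(mean_rmse, key=mean_rmse.get)

# Ratio > 1 means StoLADMM is faster than that method on average.
speedup = {m: mean_time[m] / mean_time["StoLADMM"] for m in TIME if m != "StoLADMM"}

for method in RMSE:
    t = mean_time.get(method)
    t_str = f"{t:9.3f}" if t is not None else "        -"
    print(f"{method:9s} mean RMSE {mean_rmse[method]:.4f}  mean time (s) {t_str}")
print("lowest mean RMSE:", best)
```

On these averages StoLADMM has the lowest mean RMSE (about 0.230) and runs roughly 25 times faster than LADMM and about 139 times faster than MAXIDE, although it is not the lowest-RMSE method in every individual row (IMC is lower at 20%).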
Fig. 4 The G recovered by our method for the Cocaine-Opioid SUD dataset. Columns C1-C11 represent 11 CUD diagnostic criteria, and columns O1-O11 represent 11 OUD diagnostic criteria. C1/O1: Cocaine/Opioid used in larger amounts or for longer than intended; C2/O2: Failed efforts to stop Cocaine/Opioid use; C3/O3: Much time spent in Cocaine/Opioid-related activities; C4/O4: Strong desire to use Cocaine/Opioid; C5/O5: Cocaine/Opioid effects interfered with life; C6/O6: Cocaine/Opioid use despite its interference; C7/O7: Major activities reduced by Cocaine/Opioid use; C8/O8: Physical hazard caused by Cocaine/Opioid use; C9/O9: Cocaine/Opioid use despite knowing it threatens health; C10/O10: Cocaine/Opioid tolerance; C11/O11: Cocaine/Opioid withdrawal syndrome
Fig. 5 The top 30 rows of the G recovered by our method for the Cocaine-Opioid SUD dataset. Columns correspond to the diagnostic criteria for CUD and OUD, whereas rows correspond to the candidate genetic variants. The right-hand side gives the locations of these genetic variants and their p-values obtained in the GWAS.
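Fig. 5 shows a subset of the rows of G, but the ranking criterion is not stated in this excerpt. One plausible choice, assumed here purely for illustration, is to rank variants (rows) by the Euclidean norm of their row of G and report, for each, the criterion column with the largest absolute weight. Column labels follow the Fig. 4 layout (C1-C11, then O1-O11); `top_variants` and the variant names are hypothetical placeholders.

```python
import numpy as np

# Column order assumed to follow Fig. 4: 11 CUD criteria, then 11 OUD criteria.
CRITERIA = [f"C{i}" for i in range(1, 12)] + [f"O{i}" for i in range(1, 12)]

def top_variants(G, variant_names, k):
    """Rank rows of G by L2 norm (an assumed criterion) and return the top k.

    Each result is (variant name, row norm, criterion with the largest |weight|).
    """
    if G.shape[1] != len(CRITERIA):
        raise ValueError("expected one column per diagnostic criterion")
    norms = np.linalg.norm(G, axis=1)
    order = np.argsort(-norms, kind="stable")[:k]
    return [
        (variant_names[i], float(norms[i]), CRITERIA[int(np.argmax(np.abs(G[i])))])
        for i in order
    ]

# Toy G with three placeholder variants.
G = np.zeros((3, len(CRITERIA)))
G[0, 5] = 0.1    # weak weight on C6
G[1, 13] = 5.0   # strong weight on O3
G[2, 0] = -2.0   # negative weight on C1
print(top_variants(G, ["var_a", "var_b", "var_c"], k=2))
```

Using the absolute value keeps strongly negative weights (protective associations, if the model is read that way) from being ranked below weak positive ones.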
Fig. 6 Gene expression distribution (RPKM, Reads Per Kilobase of transcript per Million mapped reads) of C8orf48 across human tissues
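RPKM normalizes a gene's read count by both gene length and sequencing depth: RPKM = (reads mapped to the gene × 10^9) / (gene length in bp × total mapped reads). A small sketch of that formula, with made-up illustrative numbers rather than any values behind Fig. 6:

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total reads must be positive")
    per_kilobase = gene_reads / (gene_length_bp / 1_000)       # reads per kb of gene
    return per_kilobase / (total_mapped_reads / 1_000_000)     # ... per million mapped reads

# Illustrative numbers only: 500 reads on a 2,000 bp gene in a 10-million-read library.
print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Normalizing by length lets genes of different sizes be compared within a sample, and normalizing by library size lets the same gene be compared across tissues sequenced to different depths.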