Andrew D McEachran, Ilya Balabin, Tommy Cathey, Thomas R Transue, Hussein Al-Ghoul, Chris Grulke, Jon R Sobus, Antony J Williams.
Abstract
Confident identification of unknown chemicals in high resolution mass spectrometry (HRMS) screening studies requires cohesive workflows and complementary data, tools, and software. Chemistry databases, screening libraries, and chemical metadata have become fixtures in identification workflows. To increase confidence in compound identifications, the use of structural fragmentation data collected via tandem mass spectrometry (MS/MS or MS2) is vital. However, the availability of empirically collected MS/MS data for identification of unknowns is limited. Researchers have therefore turned to in silico generation of MS/MS data for use in HRMS-based screening studies. This paper describes the generation en masse of predicted MS/MS spectra for the entirety of the US EPA's DSSTox database using competitive fragmentation modelling and a freely available open source tool, CFM-ID. The generated dataset comprises predicted MS/MS spectra for ~700,000 structures, and mappings between predicted spectra, structures, associated substances, and chemical metadata. Together, these resources facilitate improved compound identifications in HRMS screening studies. These data are accessible via an SQL database, a comma-separated export file (.csv), and EPA's CompTox Chemicals Dashboard.
Year: 2019 PMID: 31375670 PMCID: PMC6677792 DOI: 10.1038/s41597-019-0145-z
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. Search results from an MS-Ready Formula search of C15H16O2. Candidate chemical structures are denoted by a DTXSID and preferred name and contain linked metadata such as the Number of Sources, CPDat Count, and PubMed Ref. Count. Rank ordering by metadata brings the most likely chemicals to the top of the search results list.
Fig. 2. Chemical structure metadata information followed by predicted MS/MS data included in the .dat ASCII prediction files, using the example of DTXCID80539702 in ESI-positive mode. Only the first ~50 lines of predictions are shown; the structural annotations with SMILES that follow the predictions are not included in the image.
Fig. 3. Enhanced Entity Relationship (EER) diagram of the MySQL database created to host predicted MS/MS data generated using CFM-ID.
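
The rank ordering mentioned in the Fig. 1 caption can be illustrated with a minimal Python sketch. The `Candidate` type, the `rank_candidates` helper, and the choice of sort keys (data source count first, PubMed article count as a tie-breaker) are assumptions made for illustration and are not taken from the Dashboard itself. The three records reuse values from the metadata table in this record; in a real formula search all candidates would share one MS-Ready formula.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical container for one search hit and two metadata counts."""
    dtxsid: str
    name: str
    data_sources: int
    pubmed_articles: int


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidates so that the best-supported chemicals come first.

    Assumption: more data sources means a more likely candidate, and the
    PubMed article count breaks ties. Other metadata could be used instead.
    """
    return sorted(
        candidates,
        key=lambda c: (c.data_sources, c.pubmed_articles),
        reverse=True,
    )


# Values copied from the DATA_SOURCES and NUMBER_OF_PUBMED_ARTICLES columns.
candidates = [
    Candidate("DTXSID0020311", "Monuron", 72, 24),
    Candidate("DTXSID0020440", "Dichlorprop", 77, 89),
    Candidate("DTXSID0020446", "Diuron", 110, 1257),
]

ranked = rank_candidates(candidates)
for position, candidate in enumerate(ranked, start=1):
    print(position, candidate.dtxsid, candidate.name, candidate.data_sources)
```

Sorting on a tuple keeps the ranking rule in one place, so a different metadata field can be swapped in without touching the rest of the code.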
Chemical metadata for a subset of chemicals defined by DTXCID.
| DTXCID | DTXSID | PREFERRED_NAME | CASRN | MS_READY_MOLECULAR_FORMULA | MS_READY_MONOISOTOPIC_MASS | MS_READY_SMILES | DATA_SOURCES | NUMBER_OF_PUBMED_ARTICLES | PUBCHEM_DATA_SOURCES | CPDAT_COUNT | SUSDAT | STOFFIDENT | TOXCAST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DTXCID8068549 | DTXSID00146058 | Tetrazepam | 10379-14-3 | C16H17ClN2O | 288.10294 | CN1C2=C(C=C(Cl)C=C2)C(=NCC1=O)C1=CCCCC1 | 26 | 85 | 30 | 5 | Y | Y | — |
| DTXCID0077853 | DTXSID00155362 | N(4)-Acetylsulfadiazine | 127-74-2 | C12H12N4O3S | 292.06301 | CC(=O)NC1=CC=C(C=C1)S(=O)(=O)NC1=NC=CC=N1 | 19 | 7 | 51 | — | Y | Y | — |
| DTXCID20208682 | DTXSID00173127 | N-L-Alanyl-L-alanine | 1948-31-8 | C6H12N2O3 | 160.08479 | CC(N)C(=O)NC(C)C(O)=O | 13 | 172 | 53 | — | Y | — | — |
| DTXCID10104684 | DTXSID00182193 | (8,8′-Bi-2H-1-benzopyran)-2,2′-dione, 4,4′,7,7′-tetramethoxy-5,5′-dimethyl-, (+)- (9CI) | 27909-08-6 | C24H22O8 | 438.13147 | COC1=CC(=O)OC2=C1C(C)=CC(OC)=C2C1=C(OC)C=C(C)C2=C1OC(=O)C=C2OC | 7 | — | 15 | — | Y | — | — |
| DTXCID40105487 | DTXSID00182996 | Methyl naphthoate | 28804-90-2 | C12H10O2 | 186.06808 | COC(=O)C1=CC=CC2=CC=CC=C12 | 12 | — | 49 | — | Y | — | — |
| DTXCID60122353 | DTXSID00199862 | Dioxypyramidon | 519-65-3 | C13H17N3O3 | 263.12699 | CN(C)C(=O)C(=O)N(N(C)C(C)=O)C1=CC=CC=C1 | 12 | — | 18 | — | Y | Y | — |
| DTXCID6022 | DTXSID0020022 | Acifluorfen | 50594-66-6 | C14H7ClF3NO5 | 360.99648 | OC(=O)C1=C(C=CC(OC2=CC=C(C=C2Cl)C(F)(F)F)=C1)[N+]([O-])=O | 65 | 50 | 74 | 36 | Y | Y | Y |
| DTXCID5074 | DTXSID0020074 | Gabapentin | 60142-96-3 | C9H17NO2 | 171.12593 | NCC1(CC(O)=O)CCCCC1 | 53 | 3053 | 177 | 29 | Y | Y | — |
| DTXCID9076 | DTXSID0020076 | Amitrole | 61-82-5 | C2H4N4 | 84.043596 | NC1=NNC=N1 | 88 | 7089 | 200 | 28 | Y | — | Y |
| DTXCID00209011 | DTXSID0020107 | Aspartame | 22839-47-0 | C14H18N2O5 | 294.12157 | COC(=O)C(CC1=CC=CC=C1)NC(=O)C(N)CC(O)=O | 59 | 862 | 111 | 84 | Y | Y | Y |
| DTXCID40232 | DTXSID0020232 | Caffeine | 58-08-2 | C8H10N4O2 | 194.08038 | CN1C=NC2=C1C(=O)N(C)C(=O)N2C | 116 | 21207 | 287 | 2384 | Y | Y | Y |
| DTXCID80311 | DTXSID0020311 | Monuron | 150-68-5 | C9H11ClN2O | 198.05599 | CN(C)C(=O)NC1=CC=C(Cl)C=C1 | 72 | 24 | 77 | 47 | Y | Y | Y |
| DTXCID20440 | DTXSID0020440 | Dichlorprop | 120-36-5 | C9H8Cl2O3 | 233.98505 | CC(OC1=C(Cl)C=C(Cl)C=C1)C(O)=O | 77 | 89 | 105 | 73 | Y | Y | Y |
| DTXCID80442 | DTXSID0020442 | 2,4-Dichlorophenoxyacetic acid | 94-75-7 | C8H6Cl2O3 | 219.9694 | OC(=O)COC1=C(Cl)C=C(Cl)C=C1 | 115 | 2614 | 175 | 173 | Y | Y | Y |
| DTXCID00446 | DTXSID0020446 | Diuron | 330-54-1 | C9H10Cl2N2O | 232.01702 | CN(C)C(=O)NC1=CC(Cl)=C(Cl)C=C1 | 110 | 1257 | 132 | 252 | Y | Y | Y |
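
The MS_READY_MONOISOTOPIC_MASS column can be cross-checked against MS_READY_MOLECULAR_FORMULA with a short Python sketch. The helper names `monoisotopic_mass` and `ppm_error` are hypothetical, the isotope masses are standard values for the most abundant isotopes, and the parser is a simplification that assumes plain formulas without parentheses, charges, or isotope labels.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
# Only the elements that occur in the table above are listed.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
    "F": 18.99840322,
    "S": 31.97207100,
    "Cl": 34.96885268,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return total


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Formula and tabulated mass pairs copied from the table above.
table_rows = {
    "Caffeine": ("C8H10N4O2", 194.08038),
    "Diuron": ("C9H10Cl2N2O", 232.01702),
    "Amitrole": ("C2H4N4", 84.043596),
}

for name, (formula, tabulated) in table_rows.items():
    computed = monoisotopic_mass(formula)
    print(name, formula, round(computed, 5), round(ppm_error(tabulated, computed), 3))
```

The same `ppm_error` helper could be applied to an observed HRMS mass to decide whether a candidate falls within an instrument's mass tolerance; the tolerance itself is instrument-specific and is not given in this record.
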

| Field | Value |
|---|---|
| Design Type(s) | chemical structure classification objective • modeling and simulation objective • mass spectrometry data analysis objective |
| Measurement Type(s) | tandem mass spectrometry |
| Technology Type(s) | digital curation |
| Factor Type(s) | chemical entity |
| Sample Characteristic(s) | |