R Alizadehsani, M Roshanzamir, M Abdar, A Beykikhoshk, A Khosravi, M Panahiazar, A Koohestani, F Khozeimeh, S Nahavandi, N Sarrafzadegan.
Abstract
We present the coronary artery disease (CAD) database, a comprehensive resource comprising 126 papers and 68 datasets relevant to CAD diagnosis, extracted from the scientific literature published between 1992 and 2018. These data were collected to help advance research on CAD-related machine learning and data mining algorithms and, we hope, ultimately to advance clinical diagnosis and early treatment. To aid users, we have also built a web application that presents the database through various reports.
Year: 2019 PMID: 31645559 PMCID: PMC6811630 DOI: 10.1038/s41597-019-0206-3
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
The fields of “Journals/Conferences” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| Journal/ConferenceID | ✓ | | The ID defined for each journal or conference |
| Journal/ConferenceType | | | This field indicates whether the record is a journal or a conference |
| Journal/ConferenceName | | | The name of the journal or conference |
| Publisher | | | The publisher of the journal or conference |
The fields of “Papers_ImportantFeatures” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| PaperID | ✓ | Papers (PaperID) | The ID defined for each paper |
| DiseaseID | ✓ | Diseases (DiseaseID) | The ID defined for each disease |
| DatasetID | ✓ | Datasets (DatasetID) | The ID defined for each dataset |
| FeatureID | ✓ | Features (FeatureID) | The ID defined for each feature |
| FeatureRank | | | The reported rank of the feature |
Based on these tables, we prepared various reports that extract important information from them. The most important reports are described in Table 16. Because many reports are extracted from this database, descriptions of the other reports are available in the help section of our web application.
The fields of “Authors” table, their properties and descriptions.
| Field name | P. K. | F. K. from Table | Description |
|---|---|---|---|
| AuthorID | ✓ | | The ID defined for each author |
| AuthorName | | | The name of the author |
The fields of “Dataset” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| DatasetID | ✓ | | The ID defined for each dataset |
| DatasetName | | | The name of the dataset |
| DatasetSampleSize | | | Number of records in each dataset |
| Country | | | The name of the country where the dataset was collected |
The fields of “Disease” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| DiseaseID | ✓ | | The ID defined for each disease |
| DiseaseName | | | The name of the disease (for now, only CAD or stenosis of LAD, LCX or RCA) |
The fields of “Features” table, their properties and descriptions.
| Field name | P. K. | F. K. from Table | Description |
|---|---|---|---|
| FeatureID | ✓ | | The ID defined for each feature |
| FeatureName | | | The name of each feature |
| Abbreviation | | | Abbreviation of each feature (if it exists and is commonly used) |
| FeatureCategory | | | The category that the feature belongs to |
The fields of “Methods” table, their properties and descriptions.
| Field name | P. K. | F. K. from Table | Description |
|---|---|---|---|
| MethodID | ✓ | | The ID defined for each method |
| MethodName | | | The name of the machine learning method that was used in each study |
| MethodCategory | | | The category that this method belongs to |
The fields of “Feature Selection Algorithms” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| FeatureSelectionID | ✓ | | The ID defined for the feature selection algorithm |
| FeatureSelectionName | | | The name of the feature selection algorithm |
The fields of “Papers” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Definition |
|---|---|---|---|
| PaperID | ✓ | | The ID defined for each paper |
| PaperName | | | Title of the paper |
| FirstAuthorID | | Authors (AuthorID) | The ID of the first author |
| Year | | | The year the research was published |
| Journal/ConferenceID | | JournalsConferences (Journal/ConferenceID) | The ID of the journal or conference in which the research was published |
| Train-Test Separation Method | | | The method used to separate the training and test data |
| ShortDescriptionAboutMainMethod | | | A short description of the main method |
| ConclusionsReportedByAuthors | | | A short description of the conclusion |
| NumberOfCitation | | | Number of citations of this research |
The fields of “Review Articles” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| PaperID | ✓ | | The ID defined for each paper |
| PaperName | | | Title of the paper |
| FirstAuthorID | | Authors (AuthorID) | The ID of the first author |
| Year | | | The year the research was published |
| Journal/ConferenceID | | JournalsConferences (Journal/ConferenceID) | The ID of the journal or conference in which the research was published |
| InvestigatedResearchFrom | | | The year the investigation begins |
| InvestigatedResearchTo | | | The year the investigation ends |
| Number of investigated papers | | | Number of papers investigated in each review paper |
| Number of citations | | | Number of citations of each review paper |
| NotableConclusion | | | Notable conclusion of each review paper |
The fields of “Papers_Datasets_Diseases_Methods” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| PaperID | ✓ | Papers (PaperID) | The ID defined for each paper |
| DatasetID | ✓ | Datasets (DatasetID) | The ID defined for each dataset |
| DiseaseID | ✓ | Disease (DiseaseID) | The ID defined for each disease |
| MethodID | ✓ | Methods (MethodID) | The ID defined for each method |
| IsMainMethod | | | Whether this method is the main method (the method with the highest performance) of the research |
| Accuracy% | | | The reported accuracy |
| Sensitivity(Recall)% | | | The reported sensitivity |
| Specificity% | | | The reported specificity |
| F-Measure% | | | The reported F-Measure |
| AUC | | | The reported AUC |
| Precision% | | | The reported precision |
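The composite primary key and the `IsMainMethod` flag described above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module, which is an assumption made here for illustration; the source does not state which database engine backs the web application. The column `Accuracy` stands in for `Accuracy%`, only a subset of columns is modelled, and all rows are invented sample data, not values from the CAD database. The query approximates the "Frequency of winning machine learning methods" report by counting, for each method, the papers that used it and the papers in which it was the main method.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two simplified tables (hypothetical subset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Methods (
            MethodID   INTEGER PRIMARY KEY,
            MethodName TEXT NOT NULL
        );
        CREATE TABLE Papers_Datasets_Diseases_Methods (
            PaperID      INTEGER NOT NULL,
            DatasetID    INTEGER NOT NULL,
            DiseaseID    INTEGER NOT NULL,
            MethodID     INTEGER NOT NULL REFERENCES Methods (MethodID),
            IsMainMethod INTEGER NOT NULL DEFAULT 0,  -- 1 = best method in the paper
            Accuracy     REAL,                        -- stands in for "Accuracy%"
            PRIMARY KEY (PaperID, DatasetID, DiseaseID, MethodID)
        );
        """
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO Methods VALUES (?, ?)",
        [(1, "SVM"), (2, "Naive Bayes"), (3, "Decision tree")],
    )
    conn.executemany(
        "INSERT INTO Papers_Datasets_Diseases_Methods VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 1, 1, 90.0),  # paper 1: SVM is the main method
            (1, 1, 1, 2, 0, 85.0),
            (2, 1, 1, 1, 0, 80.0),
            (2, 1, 1, 3, 1, 88.0),  # paper 2: decision tree is the main method
            (3, 2, 1, 1, 1, 92.0),  # paper 3: SVM is the main method
        ],
    )
    return conn


def winning_method_frequency(conn: sqlite3.Connection) -> list:
    """Per method: number of papers using it and number of papers it 'won'."""
    return conn.execute(
        """
        SELECT m.MethodName,
               COUNT(DISTINCT r.PaperID) AS papers_using,
               COUNT(DISTINCT CASE WHEN r.IsMainMethod = 1 THEN r.PaperID END)
                   AS papers_won
        FROM Papers_Datasets_Diseases_Methods AS r
        JOIN Methods AS m ON m.MethodID = r.MethodID
        GROUP BY m.MethodID, m.MethodName
        ORDER BY papers_won DESC, m.MethodName
        """
    ).fetchall()


conn = build_demo_db()
rows = winning_method_frequency(conn)
for name, papers_using, papers_won in rows:
    print(f"{name}: used in {papers_using} paper(s), main method in {papers_won}")
```

Because the four ID columns together form the primary key, the same method can appear once per paper, dataset and disease combination, so a paper that evaluates one method on several datasets contributes several rows; counting distinct `PaperID` values keeps such a paper from being counted more than once.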
The fields of “Datasets_Features” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| DatasetID | ✓ | Datasets (DatasetID) | The ID defined for each dataset |
| FeatureID | ✓ | Features (FeatureID) | The ID defined for each feature |
The fields of “Papers_Authors” table, their properties and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| PaperID | ✓ | Papers (PaperID) | The ID defined for each paper |
| AuthorID | ✓ | Authors (AuthorID) | The ID defined for each author |
The fields of “Papers_Authors (review articles)” table, their properties, and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| PaperID | ✓ | | The ID defined for each paper |
| AuthorID | ✓ | | The ID defined for each author |
The fields of “Papers_FeatureSelectionAlgorithms” table, their properties, and descriptions.
| Field name | P. K. | F. K from Table (Field) | Description |
|---|---|---|---|
| paperID | ✓ | Papers (PaperID) | The ID defined for each paper |
| FeatureSelectionID | ✓ | FeatureSelectionAlgorithms (FeatureSelectionID) | The ID defined for each feature selection algorithm |
Some of the most important reports and their descriptions.
| Report name | Report description |
|---|---|
| Frequency of winning machine learning methods | This report shows how popular/successful a machine learning technique is. For each machine learning technique, it reports on the total number of papers that have used it in their analysis, and how many times it outperformed the other techniques. |
| Important features reported for a specific disease in a specific country | This table gives a detailed list of what features are collected in each country for each disease. Moreover, one can see the number of papers that have emphasized the importance of each feature, as well as the mean of ranks given to that feature as a proxy of feature importance in that country. |
| Comparison of machine learning methods in each dataset | Usually, each paper reports the results of applying multiple machine learning methods on a dataset. This table allows us to compare how these algorithms perform on a dataset. It reports the paper title, the dataset that it has used, and the difference between the accuracy of two selected methods. |
| Important features in each disease | This report lists the set of features that are reported to be important for each heart disease. First, a disease needs to be selected from the drop-down menu. The resulting table will show the feature name, the number of papers that reported the feature to be important for the selected disease, and the mean of the reported ranks for that feature. The smaller the rank, the more important the feature is. |
| Papers vs. Specific feature category | Feature categories represent the set of features that are obtained from the same resources. For example, ECG category represents the set of features that are obtained from electrocardiography. For each feature category, this table reports the paper titles and the accuracy they have achieved. |
| Number of papers using a specific algorithm by year | For a given machine learning method, this table reports the number of papers using that method per year, the title of publication with the best performance and the highest accuracy achieved. |
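As a rough illustration of the "Important features in each disease" report, the sketch below computes, for a selected disease, the number of papers that reported each feature as important and the mean of the reported ranks, sorted so that the smallest mean rank comes first. It uses Python's `sqlite3` module as an assumed stand-in for the actual database engine, names the disease table `Diseases` (the source uses both "Disease" and "Diseases"), and relies on invented feature names and ranks that are not taken from the CAD database.

```python
import sqlite3


def build_feature_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified feature-importance schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Diseases (
            DiseaseID   INTEGER PRIMARY KEY,
            DiseaseName TEXT NOT NULL
        );
        CREATE TABLE Features (
            FeatureID   INTEGER PRIMARY KEY,
            FeatureName TEXT NOT NULL
        );
        CREATE TABLE Papers_ImportantFeatures (
            PaperID     INTEGER NOT NULL,
            DiseaseID   INTEGER NOT NULL REFERENCES Diseases (DiseaseID),
            DatasetID   INTEGER NOT NULL,
            FeatureID   INTEGER NOT NULL REFERENCES Features (FeatureID),
            FeatureRank INTEGER,  -- smaller rank = more important
            PRIMARY KEY (PaperID, DiseaseID, DatasetID, FeatureID)
        );
        """
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO Diseases VALUES (?, ?)",
        [(1, "CAD"), (2, "LAD stenosis")],
    )
    conn.executemany(
        "INSERT INTO Features VALUES (?, ?)",
        [(1, "Age"), (2, "Typical chest pain"), (3, "Hypertension")],
    )
    conn.executemany(
        "INSERT INTO Papers_ImportantFeatures VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 2, 1),  # paper 1, CAD: chest pain ranked first
            (1, 1, 1, 1, 2),
            (1, 1, 1, 3, 3),
            (2, 1, 1, 2, 1),  # paper 2, CAD: chest pain ranked first again
            (2, 1, 1, 1, 3),
            (3, 2, 1, 1, 1),  # paper 3 concerns a different disease
        ],
    )
    return conn


def important_features(conn: sqlite3.Connection, disease_name: str) -> list:
    """Feature name, number of papers, and mean reported rank for one disease."""
    return conn.execute(
        """
        SELECT f.FeatureName,
               COUNT(DISTINCT pif.PaperID) AS n_papers,
               AVG(pif.FeatureRank)        AS mean_rank
        FROM Papers_ImportantFeatures AS pif
        JOIN Features AS f ON f.FeatureID = pif.FeatureID
        JOIN Diseases AS d ON d.DiseaseID = pif.DiseaseID
        WHERE d.DiseaseName = ?
        GROUP BY f.FeatureID, f.FeatureName
        ORDER BY mean_rank ASC, f.FeatureName
        """,
        (disease_name,),
    ).fetchall()


feature_conn = build_feature_db()
report = important_features(feature_conn, "CAD")
for feature, n_papers, mean_rank in report:
    print(f"{feature}: {n_papers} paper(s), mean rank {mean_rank:.2f}")
```

The disease name is passed as a query parameter, mirroring the drop-down selection described in the report; the country-specific variant of the report would presumably add a join to the dataset table and a filter on its `Country` field.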
Fig. 1. Source of our dataset and its distribution worldwide. (a) The number of sources included in the database by year of publication. (b) The datasets’ distribution in different countries. Most datasets were collected in the USA, followed by India, China, Turkey, and Iran.
Fig. 2. Structure of the database (the relationships between tables). The key icons in the tables show the primary keys of those tables, and the key icons in the relationships between tables show the source tables of foreign keys in the tables.
Fig. 3. The front page of our web application in different modes. (a) The front page of our web application before login. There are four options. The first is a link to the home page. The second shows the list of reports that all users can see. The third is contact information, and the fourth is used to log in to or out of the system. (b) The front page after login; two more options appear. The first shows the list of tables, and the second shows the email address of the logged-in user. (c) The login page. Currently, only the administrator can log in to add, edit and remove data and reports. Other users do not need to log in.
Fig. 4. List of tables and reports. (a) A screenshot of the list of tables. Guests cannot see the list of tables. (b) A screenshot of the list of reports. (c) A screenshot of the list of reports for the administrator. Note that the list contains an additional option that the administrator can use to manage the reports.
Fig. 5. The facility prepared for an administrator to add, edit, and remove the data. The first option is used to add a new record to the table. The second option is used to determine the number of rows shown on a page. The third option can be used to export the table to a CSV file. The fourth option can be used to edit or delete a specific record, and the fifth option can be used to search the table.
Fig. 6. Reports. (a) Shows the output of a report as a table. (b) Shows output as a chart. The first and the second options in this form determine the horizontal and vertical axes, respectively. (c) In some reports, filters must be applied to data. For example, in this report, we need to specify the disease and country in which we are interested to see the most important features reported.
| Measurement(s) | coronary artery disease |
| Technology Type(s) | digital curation |
| Factor Type(s) | year • disease |
| Sample Characteristic - Organism | Homo sapiens |