Yin Jin1,2, Wang Junren2,3, Jiang Jingwen2,3, Sun Yajing2,3, Chen Xi4, Qin Ke1.
Abstract
Drawing on the Biomedical Big Data Center of West China Hospital, this paper examines in depth how to construct and apply a breast cancer-specific database system that covers the full data lifecycle, including the establishment of data standards, data fusion and governance, a multi-modal knowledge graph, secure data sharing, and value-driven application of the database. The work proceeded by establishing breast cancer master data and metadata standards; collecting, mapping, and governing structured and unstructured clinical data; parsing and processing electronic medical records with natural language processing (NLP) or other applicable methods; and constructing the breast cancer-specific database system. The system supports the use of data in clinical practice, scientific research, and teaching in hospitals, and realizes the value of the medical big data held by the Biomedical Big Data Center of West China Hospital.
Keywords: breast cancer; data governance; data security sharing; disease-specific database; knowledge graph; metadata
Year: 2021 PMID: 34322474 PMCID: PMC8311352 DOI: 10.3389/fpubh.2021.712827
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Characteristics of breast cancer patients at recruitment.
| Age, median (IQR) | | 41.0 (47.0, 55.0) |
| Sex | Female | 7697 (99.6%) |
| BMI, median (IQR) | | 20.83 (22.86, 24.97) |
| Menopause status, No. (%) | Yes | 2999 (38.8%) |
| | No | 4693 (60.7%) |
| | Unknown | 38 (0.5%) |
| Stage, No. (%) | 0 | 0 |
| | I | 1619 (25%) |
| | II | 3427 (33.5%) |
| | III | 1927 (29.8%) |
| | Unknown | 757 (11.8%) |
| pT status, No. (%) | 0 | 308 (4.0%) |
| | 1 | 2532 (32.8%) |
| | 2 | 3633 (47.0%) |
| | 3 | 336 (4.3%) |
| | 4 | 440 (5.7%) |
| | Unknown | 481 (6.2%) |
| pN status, No. (%) | 0 | 3748 (48.5%) |
| | 1 | 2227 (28.9%) |
| | 2 | 856 (11.1%) |
| | 3 | 783 (10.1%) |
| | Unknown | 116 (1.4%) |
| ER | Negative (-) | 2323 (30.1%) |
| | Positive (+) | 5094 (65.9%) |
| | Unknown | 313 (4.0%) |
| PR | Negative (-) | 2680 (34.7%) |
| | Positive (+) | 4737 (61.3%) |
| | Unknown | 313 (4.0%) |
| HER2 | Negative (-) | 4555 (58.9%) |
| | Positive (+) | 1856 (24.0%) |
| | Unknown | 1319 (17.1%) |
| Ki-67 | <14% | 1370 (17.8%) |
| | ≥14% | 5752 (74.4%) |
| | Unknown | 608 (7.8%) |
ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor-2; Ki-67, marker of proliferation Ki-67.
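The "count (percent)" cells in the table above can be reproduced from raw category counts. The following is a minimal sketch, assuming the denominator for a characteristic is the sum of its categories (7,730 for menopause status); the helper name `n_percent` is hypothetical and is not part of the database system described in the paper.

```python
def n_percent(counts):
    """Format each category count as 'n (p%)', using the sum of all categories as the denominator."""
    total = sum(counts.values())
    return {
        category: f"{n} ({n / total * 100:.1f}%)"
        for category, n in counts.items()
    }


# Counts taken from the menopause status rows of the table above.
menopause = {"Yes": 2999, "No": 4693, "Unknown": 38}
summary = n_percent(menopause)
```

Keeping the raw counts and deriving the percentages at report time, rather than storing both, avoids the two drifting apart when records are corrected during data governance.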
Figure 1. Overall design of the breast cancer-specific database system. HIS, Hospital Information System; EMPI, Enterprise Master Patient Index.
Some referenced dataset standards.
| National health industry standards | Basic Dataset of Basic Information–Personal Information (WS 371-2012) |
| International oncology specialized standards | SEER Program Coding and Staging Manual of National Cancer Institute (NCI) |
| Other standards | Supplement to current hospital datasets |
WS and WS/T refer to standards issued by the Chinese Professional Committee on Health Information Standards; T/CHIA refers to standards issued by the Chinese Health Information Association.
Some collected fields of the breast cancer-specific dataset.
| Basic information | Patient ID, patient name, home address, contact number, first contact name, relationship with patient, ID number, gender, date of birth, height, weight, age, marital status, occupation, ethnicity, ancestral home, nationality, etc. | Patient standard database |
| Inpatient information | Patient ID, time of medical records, reliability, medical unit admitted, nursing unit admitted, time of admission by the current medical unit, current medical unit, current nursing unit, date of diagnosis, time of admission, length of stay, time of discharge, transfer of departments, date of transfer from current medical unit, medical unit transferred, nursing unit transferred, attending physician, way of discharge, number of operations, etc. | Patient standard database |
| Progress note | Patient ID, chief complaint, clinical pathway ID, medical advice, observations, results of ward round, dosing regimen, summary opinions, etc. | Disease-specific database |
| Nursing assessment | Details of occupational exposure, smoking status, duration of smoking, average number of cigarettes, smoking cessation, duration of smoking cessation, drinking, duration of drinking, average number of drinks, allergy history, details of allergy history, diet, general health condition, vaccination history, past history of serious illness, details of serious illness, history of blood transfusion, trauma history, history of infectious diseases, details of infectious diseases, history of surgery, details of surgery, etc. | Patient standard database |
| Diagnostic information | Diagnosis category, diagnosis code, diagnosis name, pre- and post-operation diagnostic accordance, outpatient diagnostic accordance, clinical case diagnostic accordance, radiopathological diagnostic accordance, discrepancy between admission diagnosis and primary discharge diagnosis, cataloged diagnosis name splicing, cataloged diagnosis code splicing, first page diagnosis name splicing, tumor morphological code name, tumor morphological code, etc. | Disease-specific database |
| Physical examination | Body temperature, pulse rate, respiratory rate, blood pressure, general condition, skin mucosa, lymph nodes, head, hair distribution, eyes, ears, nose, mouth, face, neck, chest, lungs, heart, blood vessels, abdomen, genitalia, anorectum, spine and extremities, nervous system, routine examinations, specialist examinations, etc. | Patient standard database |
| Testing | LIS reported DR, test time, item number in test results, item name in test results, sample code, sample name, reference value range, quantitative result, item unit, label, result, etc. | Patient standard database |
| Examination | Mass size, distribution of lesions (single or multicenter), tumor location, presence of distant metastasis, etc. | Image database |
| Surgical anesthesia | Date, operation level, anesthesia level, incision type, anesthesiologist, operation code & name, operation time, surgeon, preoperative and postoperative diagnosis, preoperative chemotherapy, radiotherapy, anesthesia method, intraoperative bleeding, blood transfusion, etc. | Disease-specific database |
| Treatment information | Inpatient diagnosis and treatment plan, type of medical advice, item name of medical advice, frequency, usage, implementation date of medical advice, invalidation date of medical advice (for long-term treatment), source of medical advice, treatment means, etc. | Disease-specific database |
| Postoperative radiotherapy for tumor | Measurement of radiotherapy (single and cumulative), start and end time, adverse reactions, etc. | Disease-specific database |
| Postoperative chemotherapy for tumor | Chemotherapy regimen (i.e., drug type and dosage, route of administration), cycle, start and end time, adverse reactions, etc. | Disease-specific database |
| Disease progression and outcome | Conditions at admission, chief complaint, summary of medical record, course of disease (not available), discharge summary, main discharge diagnosis and treatment, etc. | Disease-specific database |
| Follow-up visit | Time, survival status, recurrence, metastasis, adverse reactions, etc. | Follow-up visit database |
| Charges | Charges for outpatient service, hospitalization, operation, examination, testing, drugs, etc. | Disease-specific database |
LIS, Laboratory Information Management System; DR, Digital radiography.
Partial attributes of some standard data elements.
| Gender code | varchar | 10 | Y | GB/T 2261.1-2003 |
| Marital status code | varchar | 10 | Y | GB/T 2261.2-2003 |
| Health insurance category code | varchar | 10 | N | CV02.01.204 Table for Health Insurance Category Code |
| Medical history | varchar | 200 | N | 0-No, 1-Yes |
| Registration category | varchar | 20 | Y | 01-General clinic, 02-Emergency, 03-Specialty clinic, 04-Specialist clinic, 05-VIP clinic, 06-Disease-specific clinic, 09-Others |
| Diagnosis basis | varchar | 20 | Y | CT05.01.001 |
| Prescription type/name | varchar | 20 | Y | 01-General prescription, 02-Pediatric prescription, 03-Emergency prescription, 04-Narcotic drug prescription (Class I psychotropic drug prescription), 05-Narcotic drug prescription (Class II psychotropic drug prescription), 99-Others |
| Dosing frequency code | varchar | 20 | N | CV06.00.228 |
| Examination site code | varchar | 60 | N | CV06.00.227 |
| Surgical procedure code | varchar | 20 | Y | ICD-9-CM-3 |
| Anesthesia mode code | varchar | 20 | Y | CV06.00.103 |
| Surgical position code | varchar | 10 | Y | CV06.00.223 |
| ASA physical status classification code | varchar | 10 | Y | CV05.10.021 |
GB/T refers to standards issued by the China National Institute of Standardization; CV and CT refer to classification code tables in standards issued by the China National Institute of Standardization.
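Attributes like those in the table above lend themselves to automated checks during data governance. The sketch below is illustrative only: it assumes the numeric column is a maximum character length and the Y/N column marks whether a value is required, neither of which the table states explicitly. The class `DataElement`, the field names, and the function `validate` are hypothetical; the code values are copied from the table rows.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class DataElement:
    name: str
    max_length: int                                 # assumed meaning of the numeric column
    required: bool                                  # assumed meaning of the Y/N column
    allowed_codes: Optional[FrozenSet[str]] = None  # value domain, when enumerated


ELEMENTS = [
    DataElement("registration_category", 20, True,
                frozenset({"01", "02", "03", "04", "05", "06", "09"})),
    DataElement("medical_history", 200, False, frozenset({"0", "1"})),
    # Coded with ICD-9-CM-3; the code list is not enumerated in this sketch.
    DataElement("surgical_procedure_code", 20, True),
]


def validate(record: dict) -> List[str]:
    """Return a list of human-readable problems found in one record."""
    errors = []
    for element in ELEMENTS:
        value = record.get(element.name)
        if value is None or value == "":
            if element.required:
                errors.append(f"{element.name}: missing required value")
            continue
        if len(value) > element.max_length:
            errors.append(f"{element.name}: longer than {element.max_length} characters")
        if element.allowed_codes is not None and value not in element.allowed_codes:
            errors.append(f"{element.name}: code {value!r} not in value domain")
    return errors
```

For example, `validate({"registration_category": "07"})` reports both the out-of-domain registration code and the missing surgical procedure code, while a record with valid values for the required elements returns an empty list.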
Figure 2. Data governance framework. ETL, Extract-Transform-Load.
Figure 3. Database cleaning results.
Figure 4. Parsing results of some electronic medical records. ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor-2; Ki-67, marker of proliferation Ki-67.
Figure 5. Image data governance process. PACS, Picture Archiving and Communication Systems.
Figure 6. Partial breast cancer knowledge graph.
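To illustrate the kind of structured output that parsing electronic medical records can yield (Figure 4), the sketch below pulls ER, PR, HER2, and Ki-67 values out of free text with a regular expression. This is a simple rule-based stand-in, not the NLP method used in the paper, which this record does not specify. The sample sentence, the pattern, and the function names `extract_markers` and `ki67_category` are all hypothetical; only the 14% Ki-67 cut-off comes from the patient characteristics table.

```python
import re

# Marker name, an optional ':' or '(' separator, then a percentage, a score such as 2+,
# a bare +/- sign, or the words positive/negative.
PATTERN = re.compile(
    r"\b(ER|PR|HER2|Ki-?67)\b\s*[:(]?\s*"
    r"(\d+(?:\.\d+)?\s*%|[0-3]\+|\+|-|positive|negative)",
    re.IGNORECASE,
)


def extract_markers(text):
    """Return the first value found for each marker, keyed by a normalized marker name."""
    results = {}
    for marker, value in PATTERN.findall(text):
        key = "Ki-67" if marker.lower().startswith("ki") else marker.upper()
        results.setdefault(key, value.replace(" ", "").lower())
    return results


def ki67_category(value):
    """Map a percentage string such as '30%' to the <14% / >=14% groups used in the table."""
    return "≥14%" if float(value.rstrip("%")) >= 14 else "<14%"


# Invented example sentence, for illustration only.
sample = "Immunohistochemistry: ER (+), PR (-), HER2 (2+), Ki-67 (30%)."
markers = extract_markers(sample)
```

Real pathology reports vary far more in wording than this pattern allows, so a rule set like this would typically serve only as a baseline or as a check on a trained model.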