Abstract
Advances in data science and digitalization are transforming the world, and the pharmaceutical industry is no exception. Sensor-equipped manufacturing processes and laboratory analyses are the main sources of primary data, and both were used to build the presented dataset of 1005 actual production batches of a selected medicine. The dataset includes incoming raw material quality results, compression process time series and final product quality results for the selected product. Its value lies in providing insight into the process trajectory at 10-second intervals for 1005 actual production batches, together with product quality data collected over several years. It therefore offers an opportunity to develop advanced analysis models and procedures that could replace current conventional, time-consuming laboratory testing. The benefits for both industry and patients are reduced product lead times and lower manufacturing costs.
Year: 2022 PMID: 35322032 PMCID: PMC8943063 DOI: 10.1038/s41597-022-01203-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Overview of data sources for a pharmaceutical product.
| Excipients’ characteristics |
| Active pharmaceutical ingredient characteristics |
| Process time series |
| Intermediate product characteristics |
| Laboratory analysis |
| Product quality |
Sources of data in the process and analysis flow explained.
| Active pharmaceutical ingredient, excipients | |
| 1.1. Quality analysis | Water and impurities content, pH, particle size and density |
| Process time series derived from tablet press sensors | |
| 2.1. Intermediate product control | Hardness, thickness, diameter, weight |
| Film-coated tablets | |
| 3.1. Quality analysis | Drug release, impurities content, related substances content |
Overview of laboratory data.
| Number of rows | Number of parameters | Number of categorical parameters | Number of numeric parameters | Number of independent variables | Number of dependent variables |
|---|---|---|---|---|---|
Detailed description of Laboratory.csv file content.
| List of parameters in laboratory data file | Unit of Measure | Short description | Source of data |
|---|---|---|---|
| batch | N/A | Index column, identifies every product batch number. | |
| code | N/A | Groups batches into so-called product sub-families defined by product code. | |
| strength | mg/unit | Strength of the product (i.e., mg of Active Pharmaceutical Ingredient (API) per tablet). | |
| size | tablets | Target number of tablets produced per batch. | |
| start | N/A | Starting time of production in date-time format. | |
| api_code, api_batch | N/A | Active pharmaceutical ingredient (API) material code and batch number. | |
| smcc_batch, lactose_batch, starch_batch | N/A | Silicified microcrystalline cellulose (SMCC), lactose and starch batch numbers. | |
| api_water | % | Content of water in API measured with loss on drying method. | |
| api_total_impurities, api_l_impurity | % | API total impurities and L impurity content, measured with High performance liquid chromatography (HPLC) method. | |
| api_content | % | Active ingredient content in raw material (excluding impurities, water, etc.) in %. | |
| api_ps01, api_ps05, api_ps09 | µm | Particle diameter in microns at 10% cumulative volume (ps01), 50% (ps05), and 90% (ps09). | |
| lactose_water | % | Lactose water content measured with loss on drying method in %. | |
| lactose_sieve0045, lactose_sieve015, lactose_sieve025 | % | Lactose particle size; % of weighted residual on one of the three sieves: 0.045 mm, 0.15 mm, 0.25 mm. | |
| smcc_water | % | Silicified microcrystalline cellulose water content in %. | |
| smcc_td, smcc_bd | g/ml | Silicified microcrystalline cellulose tap (td) and bulk density (bd). | |
| smcc_ps01, smcc_ps05, smcc_ps09 | µm | Particle diameter in microns at 10% cumulative volume (ps01), 50% (ps05), and 90% (ps09). | |
| starch_ph | N/A | Starch pH value. | |
| starch_water | % | Starch water content measured with loss on drying method. | |
| tbl_min_thickness, tbl_max_thickness | mm | Tablet core min and max thickness measured during compression in millimeters. | |
| fct_min_thickness, fct_max_thickness | mm | Film coated tablets min and max thickness measured after coating in millimeters. | |
| tbl_min_weight, tbl_max_weight | mg | Tablet core weight minimum and maximum measured during compression. | |
| tbl_rsd_weight | % | Tablet core weight relative standard deviation (RSD) measured during compression. | |
| fct_rsd_weight | % | Film-coated tablet weight RSD measured after coating process. | |
| tbl_min_hardness, tbl_max_hardness, tbl_av_hardness | N | Tablet core hardness min, max and average measured during compression; in Newtons. | |
| fct_min_hardness, fct_max_hardness, fct_av_hardness | N | Film-coated tablets hardness min, max and average measured after coating; in Newtons. | |
| tbl_tensile, fct_tensile | N/A | Normalized hardness parameter calculated for tablet core and FCT: tensile strength. | |
| tbl_yield, batch_yield | % | Yield based on target quantity for compression process (tbl) and whole batch (batch) expressed in %. | |
| dissolution_av | % | Drug release from final tablet in defined time: average (calculated) % of API released in 30 minutes. | |
| dissolution_min | % | Drug release from final tablet in defined time: minimum % of API released in 30 minutes. | |
| residual_solvent | % | Residual solvent content in final product measured with gas chromatography (GC) method. | |
| impurities_total | % | Total impurities content in final product measured with HPLC method. | |
| impurity_o | % | Content of impurity O in final product measured with HPLC method. | |
| impurity_l | % | Content of impurity L in final product measured with HPLC method. |
Detailed description of process time series.
| Parameters in time series files | Unit of measure | Short description |
|---|---|---|
| timestamp | N/A | Index column; identifier of every 10 s entry. |
| campaign | N/A | Campaign number groups several batches (e.g., 5–15) into one manufacturing cycle; the batches belonging to the same campaign were manufactured one after the other. |
| batch | N/A | Batch number identifies the batch of the final product. |
| code | N/A | Product code number defines the product sub-family to which the batch belongs. Every time series dataset file has the same product code and contains all batches within the same product code. |
| tbl_speed | tablets/hour | Tablet press speed: indicates when the process is running and when it has stopped. Many changes to this parameter or many stoppages suggest that material handling is challenging, which may indicate suboptimal product quality. |
| fom | rpm | Filling device speed in rotations per minute: similar to tablet press speed. If the process is running, so is the filling device. This parameter generally does not change and is only set during the start-up. If many changes (during the start-up) are observed, this again indicates potential difficulties with incoming material handling. |
| main_comp | kN | Main compression force – mean value: the more constant this parameter is, the more homogeneous is the incoming material blend in terms of physical properties. |
| tbl_fill | mm | Tablet fill depth: defines the volume of filled blended material to be compressed. If flow properties of material are poor, this parameter will vary throughout the batch and will consequently impact tablet hardness and weight. |
| SREL | % | Main compression force – standard relative deviation: this parameter is calculated by the tablet press itself from the main compression force mean values. It indicates how uniformly the tablets are compacted. |
| pre_comp | kN | Pre-compression force – mean value: if pre-compression force is used for tablet compaction, this parameter will be greater than 1 and will give a similar indication as main compression force. It is not routinely used for the product in scope. |
| produced | tablets | Good production: cumulative number of acceptable tablets produced up to that particular timestamp. |
| waste | tablets | Bad production: tablets that do not pass the set tablet press parameters (i.e., max % deviation from the set main compression force – mean value). This is also a cumulative parameter and gives information about all rejected tablets at that particular time. |
| cyl_main | mm | Cylindrical height – main compression: cylindrical height of the tablet (main compression station) in mm. The height and hardness of the tablet are changed by changing the cylindrical height. |
| cyl_pre | mm | Cylindrical height – pre-compression: cylindrical height of the tablet (pre-compression station) in mm. |
| stiffness | N | Bottom punch stiffness in Newton: when the limit is reached, the press is stopped with suitable diagnosis. An equipment parameter. |
| ejection | N | Maximum tablet ejection force: if this parameter rises, the tablet ejection friction is higher, which could mean that some minor sticking of the tablet has occurred on the tablet tooling. |
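
Because `waste` is described above as a cumulative counter (and `produced` appears to behave the same way), per-interval counts can be recovered by differencing within each batch. The following is a minimal sketch only: it assumes the counters restart at each batch, which the table does not state, and the function name `interval_counts` and the demo values are hypothetical.

```python
import pandas as pd

def interval_counts(ts: pd.DataFrame) -> pd.DataFrame:
    """Turn cumulative `produced` and `waste` counters into per-10-s counts."""
    out = ts.sort_values(["batch", "timestamp"]).copy()
    for col in ("produced", "waste"):
        # Difference between consecutive entries of the same batch.
        step = out.groupby("batch")[col].diff()
        # The first entry of a batch has no predecessor; keep its own value
        # (assumption: counters start from zero for every batch).
        out[f"{col}_step"] = step.fillna(out[col])
    return out

# Hypothetical four-entry excerpt of one batch, for illustration only.
demo = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 08:00:00", "2020-01-01 08:00:10",
        "2020-01-01 08:00:20", "2020-01-01 08:00:30",
    ]),
    "batch": [1, 1, 1, 1],
    "produced": [0, 100, 250, 250],
    "waste": [0, 2, 2, 5],
})
result = interval_counts(demo)
print(result[["timestamp", "produced_step", "waste_step"]])
```

A negative step in real data would suggest a counter reset inside a batch and would need separate handling.
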
Description of attributes derived from time series per batch.
| New time series-derived attributes | Unit of Measure | Short description |
|---|---|---|
| tbl_speed_mean | tablets/hour | Mean tablet speed excluding tablet speed values of 0. |
| tbl_speed_change | N/A | Number of changes of tablet speed, normalized with batch size. |
| tbl_speed_0_duration | N/A | Duration of tablet speed at 0, normalized with batch size; weekends excluded. |
| total_waste | tablets | Total number of rejected tablets per batch, normalized with batch size. |
| startup_waste | tablets | Total number of rejected tablets during the start-up of the tablet press. |
| weekend | N/A | Weekend batch run: categorical variable (yes/no). |
| fom_mean | rpm | Mean value of filling device speed, excluding time when tablet press speed was 0. |
| fom_change | N/A | Number of filling device speed changes (during the start-up). |
| SREL_startup_mean | % | Mean standard relative deviation of main compression force (SREL) value during the start-up phase of the compression process. |
| SREL_production_mean | % | Mean SREL value during the production phase of the compression process. |
| SREL_production_max | % | Max SREL value during the production phase of the compression process. |
| main_CompForce mean | kN | Main compression force mean value during the production phase of the process. |
| main_CompForce_sd | kN | Main compression force standard deviation during the production phase of the process. |
| main_CompForce_median | kN | Main compression force median during the production phase of the process. |
| pre_CompForce_mean | kN | Pre-compression force mean value during the production phase of the process. |
| tbl_fill_mean | mm | Tablet fill depth volume mean value during the production phase of the process. |
| tbl_fill_sd | mm | Tablet fill depth volume standard deviation during the production phase of the process. |
| cyl_height_mean | mm | Cylindrical height mean value during the production phase of the process. |
| stiffness_mean | N | Mean bottom punch stiffness during the production phase of the process. |
| stiffness_max | N | Max bottom punch stiffness during the production phase of the process. |
| stiffness_min | N | Min bottom punch stiffness during the production phase of the process. |
| ejection_mean | N | Ejection force mean value (production phase of the process). |
| ejection_max | N | Ejection force max value (production phase of the process). |
| ejection_min | N | Ejection force min value (production phase of the process). |
| Startup_tbl_fill_maxDifference | mm | Maximum difference between min and max tablet fill depth value (during the start-up phase of the process). |
| Startup_main_CompForce_mean | kN | Main compression force mean value during the start-up phase. |
| Startup_tbl_fill_mean | mm | Tablet fill depth mean value during the start-up phase. |
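
A few of the attributes in the table above can be sketched from the raw time series as follows. This is an illustration under stated assumptions, not the authors' procedure: "normalized with batch size" is taken to mean division by the target batch size (`size` in the laboratory file), the total waste is read as the last value of the cumulative `waste` counter, and the function name `derive_batch_attributes` and all demo values are hypothetical.

```python
import pandas as pd

def derive_batch_attributes(ts: pd.DataFrame, batch_size: dict) -> pd.DataFrame:
    """Aggregate a 10-s time series into one row of attributes per batch."""
    rows = []
    for batch, g in ts.sort_values("timestamp").groupby("batch"):
        speed = g["tbl_speed"]
        running = speed > 0  # entries where the press was actually running
        # Count every entry whose speed differs from the previous entry.
        n_changes = int((speed.diff().fillna(0) != 0).sum())
        rows.append({
            "batch": batch,
            "tbl_speed_mean": speed[running].mean(),
            # Assumption: normalization is a plain division by batch size.
            "tbl_speed_change": n_changes / batch_size[batch],
            "fom_mean": g.loc[running, "fom"].mean(),
            # Assumption: cumulative counter, so its maximum is the total.
            "total_waste": g["waste"].max() / batch_size[batch],
        })
    return pd.DataFrame(rows).set_index("batch")

# Hypothetical five-entry excerpt of one batch with a short stoppage.
demo = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 08:00:00", "2020-01-01 08:00:10", "2020-01-01 08:00:20",
        "2020-01-01 08:00:30", "2020-01-01 08:00:40",
    ]),
    "batch": [1, 1, 1, 1, 1],
    "tbl_speed": [0, 100000, 100000, 0, 130000],
    "fom": [0, 40, 40, 0, 43],
    "waste": [0, 10, 20, 20, 50],
})
attrs = derive_batch_attributes(demo, batch_size={1: 1000})
print(attrs)
```

The start-up and production phase attributes are not covered here, because the table does not define where the start-up phase ends.
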
Fig. 1 Process capability graph and Ppk calculation for drug release average, where the X-axis shows the % of drug released and the Y-axis the % of all results.
Fig. 2 Process capability graph and Ppk calculation for drug release minimum, where the X-axis shows the % of drug released and the Y-axis the % of all results.
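
For readers unfamiliar with Ppk, the sketch below uses the standard textbook definition: the distance from the mean to the nearer specification limit, divided by three overall standard deviations. The specification limits for drug release are not given in this record, so the lower limit of 80% and the sample values are purely hypothetical placeholders.

```python
import statistics

def ppk(values, lsl=None, usl=None):
    """Process performance index based on the overall sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    candidates = []
    if usl is not None:
        candidates.append((usl - mean) / (3 * sd))
    if lsl is not None:
        candidates.append((mean - lsl) / (3 * sd))
    if not candidates:
        raise ValueError("at least one specification limit is required")
    # With two limits, the nearer one determines the index.
    return min(candidates)

# Hypothetical drug release results (%) and a hypothetical lower limit.
release = [88.0, 90.0, 92.0, 94.0, 91.0]
value = ppk(release, lsl=80.0)
print(round(value, 2))
```

With only a lower limit, as in this example, the index reduces to the one-sided form (mean − LSL) / (3 · sd).
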
Statistical analysis and performance qualification.
| List of parameters in laboratory data file (units of measure) | Average | Standard deviation | Relative standard deviation (%) | Min | Max |
|---|---|---|---|---|---|
| API (Active pharmaceutical ingredient) water (%) | 1.5 | 0.4 | 29.7 | 0.0 | 2.7 |
| API total impurities (%) | 0.2 | 0.1 | 50.1 | 0.1 | 0.5 |
| API L impurity (%) | 0.1 | 0.0 | 35.3 | 0.0 | 0.1 |
| API content (%) | 94.4 | 0.4 | 0.4 | 93.3 | 95.6 |
| API particle size 0.1 (µm) | 2.7 | 1.2 | 45.6 | 0.0 | 6.0 |
| API particle size 0.5 (µm) | 39.7 | 11.7 | 29.6 | 8.3 | 67.0 |
| API particle size 0.9 (µm) | 159.7 | 25.1 | 15.7 | 79.1 | 232.0 |
| Lactose water (%) | 0.1 | 0.0 | 14.6 | 0.0 | 0.1 |
| Lactose sieve 0.045 mm (%) | 17.5 | 1.1 | 6.2 | 15.0 | 19.0 |
| Lactose sieve 0.15 mm (%) | 50.4 | 1.6 | 3.1 | 44.0 | 53.0 |
| Lactose sieve 0.25 mm (%) | 82.3 | 1.2 | 1.5 | 80.0 | 86.0 |
| SMCC (silicified microcrystalline cellulose) water (%) | 4.5 | 0.1 | 3.1 | 4.3 | 4.7 |
| SMCC tap density (g/ml) | 0.4 | 0.0 | 2.1 | 0.4 | 0.5 |
| SMCC bulk density (g/ml) | 0.3 | 0.0 | 3.3 | 0.3 | 0.4 |
| SMCC particle size 0.1 (µm) | 32.5 | 1.8 | 5.5 | 30.4 | 37.6 |
| SMCC particle size 0.5 (µm) | 120.1 | 4.5 | 3.7 | 111.4 | 126.8 |
| SMCC particle size 0.9 (µm) | 257.8 | 7.8 | 3.0 | 236.8 | 270.2 |
| Starch pH | 4.5 | 0.1 | 3.0 | 4.3 | 4.8 |
| Starch water (%) | 2.6 | 0.6 | 24.3 | 1.8 | 3.9 |
| Tablet core min and max thickness (mm) | A statistical analysis of these parameters is not applicable due to different target values for diameter, thickness, hardness and weight across four product sub-families. These data are included in a normalized parameter: tensile strength as explained before. The data are nonetheless provided in the dataset in case other researchers attempt to use them differently. | ||||
| FCT (film coated tablet) min and max thickness (mm) | |||||
| Tablet core weight min, max (mg) | |||||
| Tablet core RSD (%) | |||||
| FCT weight RSD (%) | |||||
| Tablet core hardness min, max, average (N) | |||||
| FCT hardness min, max, average (N) | |||||
| Tablet core tensile strength | 1.3 | 0.3 | 24.1 | 0.8 | 2.4 |
| FCT tensile strength | 1.7 | 0.4 | 22.3 | 1.0 | 3.0 |
| Tablet press yield (%) | 98.3 | 1.1 | 1.1 | 88.0 | 100.8 |
| Batch yield (%) | 98.3 | 1.1 | 1.1 | 88.0 | 100.9 |
| Drug release average (%) | 90.6 | 3.4 | 3.7 | 82.5 | 102.7 |
| Drug release min (%) | 85.6 | 4.2 | 4.9 | 74.0 | 100.0 |
| Residual solvent (%) | 0.0 | 0.0 | 91.1 | 0.0 | 0.2 |
| Total impurities (%) | 0.1 | 0.1 | 71.3 | 0.1 | 0.6 |
| Impurity O (%) | 0.1 | 0.0 | 17.7 | 0.0 | 0.2 |
| Impurity L (%) | 0.1 | 0.0 | 40.9 | 0.1 | 0.2 |
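
The columns of the table above (average, standard deviation, relative standard deviation, min, max) can be reproduced for any numeric parameter of the laboratory file along the following lines. This is a minimal sketch: the function name `summarize` and the three-row demo frame are hypothetical, and the sample (n − 1) standard deviation is an assumption, since the record does not say which estimator was used.

```python
import pandas as pd

def summarize(lab: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return mean, standard deviation, RSD (%), min and max per parameter."""
    stats = lab[columns].agg(["mean", "std", "min", "max"]).T
    # Relative standard deviation: standard deviation as a % of the mean.
    stats["rsd_pct"] = 100 * stats["std"] / stats["mean"]
    return stats[["mean", "std", "rsd_pct", "min", "max"]]

# Hypothetical values standing in for rows of Laboratory.csv.
lab = pd.DataFrame({
    "api_content": [94.0, 94.5, 95.0],
    "dissolution_av": [88.0, 90.0, 92.0],
})
stats = summarize(lab, ["api_content", "dissolution_av"])
print(stats.round(1))
```

The tabulated RSD values were presumably computed from unrounded means and standard deviations, which would explain why, for example, 100 × 0.4 / 1.5 does not reproduce the 29.7% reported for API water.
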
Fig. 3 Visualization example of process time series before preprocessing.
Fig. 4 Visualization example after preprocessing (time format correction).
| Measurement(s) | Incoming raw material quality (particle size distribution, water content, impurities level, residual solvents, pH) • In process control measurements of tablet core and film coated tablets (weight, thickness, diameter, hardness, yield of a process) • Final medicine quality on a representative sample of film coated tablets (drug release in defined time, active ingredient content, impurities level, residual solvent content) • Process time series of selected tablet compression parameters (every 10 s of the compression process) |
| Technology Type(s) | Laboratory based analysis (particle sizer using laser diffraction, loss on drying method, HPLC method, GC method) • Automatic IPC check machine (combining balance, hardness, thickness and diameter measurements) • HPLC (High performance liquid chromatography), GC (gas chromatography) • Tablet compression machine calibrated sensors for the following main parameters: main and pre-compression force, fill depth, cylindrical height, ejection force, number of wasted tablets. |
| Factor Type(s) | batch • code • strength • size • start • api_code • api_batch • smcc_batch • lactose_batch • starch_batch • api_water • api_total_impurities • api_l_impurity • api_content • api_ps01 • api_ps05 • api_ps09 • lactose_water • lactose_sieve0045 • lactose_sieve015 • lactose_sieve025 • smcc_water • smcc_td • smcc_bd • smcc_ps01 • smcc_ps05 • smcc_ps09 • starch_ph • starch_water • tbl_min_hardness • tbl_max_hardness • tbl_av_hardness • tbl_min_thickness • tbl_max_thickness • fct_min_thickness • fct_max_thickness • tbl_min_weight • tbl_max_weight • tbl_rsd_weight • fct_rsd_weight • fct_min_hardness • fct_max_hardness • fct_av_hardness • tbl_tensile • fct_tensile • tbl_yield • batch_yield • time series: tbl_speed • time series: fom • time series: main_comp • time series: tbl_fill • time series: SREL • time series: pre_comp • time series: produced • time series: waste • time series: cyl_main • time series: cyl_pre • time series: stiffness • time series: ejection |
| Sample Characteristic - Organism | Selected medicine |
| Sample Characteristic - Environment | manufacturing process |
| Sample Characteristic - Location | Pharmaceutical industry |