Edward J Beard, Jacqueline M Cole.
Abstract
The number of scientific publications reporting cutting-edge third-generation photovoltaic devices is increasing rapidly, owing to the pressing need to develop renewable-energy technologies that address the climate-change crisis. Consequently, the field could benefit from a central repository where photovoltaic-performance metrics, such as the power-conversion efficiency (η), are recorded. We present two automatically generated databases that contain photovoltaic properties and device material data for dye-sensitized solar cells (DSCs) and perovskite solar cells (PSCs), totalling 660,881 data entries representing 57,678 photovoltaic devices. The databases were generated by applying the text-mining toolkit ChemDataExtractor to a corpus of 25,720 articles. A multi-faceted evaluation, incorporating manual and automatic methods, was applied to ensure that the data contained therein were of the highest quality, with precision metrics ranging from 73.1% to 95.8%. The DSC database contains 475,045 entries representing 41,860 devices, and the PSC database contains 185,836 entries representing 15,818 devices. The databases are available in MongoDB and JSON formats, which can be queried in Python, R, Java and MATLAB for data-driven photovoltaic materials discovery.
Year: 2022 PMID: 35715446 PMCID: PMC9205998 DOI: 10.1038/s41597-022-01355-w
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1: Flow diagram detailing the classification algorithm for filtering the corpus. The various output options are outlined with a bold border and are coloured red, blue and indigo for the classes of dye-sensitized, perovskite and quantum dot solar cells, respectively.
Fig. 2: Extraction pipeline used to create database records from a research article.
List of supported properties for each database.
| Dye Sensitized Solar Cell Database | Perovskite Solar Cell Database |
|---|---|
| Open-circuit voltage (V_OC) | Open-circuit voltage (V_OC) |
| Short-circuit current density (J_SC) | Short-circuit current density (J_SC) |
| Fill factor (FF) | Fill factor (FF) |
| Power-conversion efficiency (PCE, η) | Power-conversion efficiency (PCE, η) |
| Short-circuit current (I_SC) | Short-circuit current (I_SC) |
| Power in (P_in) | Power in (P_in) |
| Maximum power (P_max) | Maximum power (P_max) |
| Active area | Active area |
| Solar simulator and irradiance | Solar simulator and irradiance |
| Series resistance (R_S) | Series resistance (R_S) |
| Specific series resistance | Specific series resistance |
| Charge-transfer resistance (R_CT) | Charge-transfer resistance (R_CT) |
| Specific charge-transfer resistance | Specific charge-transfer resistance |
| Reference | Reference |
| Substrate | Substrate |
| Counter electrode | Counter electrode |
| Dye | Perovskite |
| Semiconductor | Electron-transfer material (ETM) |
| Redox couple | Hole-transport material (HTM) |
| Electrolyte | |
| Dye loading | |
| Semiconductor thickness | |
| Exposure-time thickness | |
Properties applicable for contextual merging from the document.
| Dye Sensitized Solar Cell Database | Perovskite Solar Cell Database |
|---|---|
| Solar simulator (irradiance) | Solar simulator (irradiance) |
| Substrate | Substrate |
| Active area | Active area |
| Semiconductor | Counter electrode |
| Semiconductor thickness | |
| Dye loading | |
| Redox couple | |
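The contextual-merging table lists properties that can be taken from elsewhere in the article when a table row omits them. The following is a minimal Python sketch of that idea. It assumes that each table-derived record and the document-level context are flat dictionaries. Fields already present in the record are kept, and only missing mergeable fields are filled from the context. The snake_case key names, `DSC_MERGEABLE` and `merge_context` are hypothetical illustrations, not CDE-PV's actual schema or code.

```python
from copy import deepcopy
from typing import Any, Dict, Iterable

# Hypothetical snake_case keys for the DSC column of the contextual-merging table;
# CDE-PV's real field names may differ.
DSC_MERGEABLE = (
    "solar_simulator",
    "substrate",
    "active_area",
    "semiconductor",
    "semiconductor_thickness",
    "dye_loading",
    "redox_couple",
)


def merge_context(
    record: Dict[str, Any],
    document_context: Dict[str, Any],
    mergeable: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of `record` whose missing mergeable fields are filled
    from `document_context`; values already in the record take precedence."""
    merged = deepcopy(record)
    for key in mergeable:
        if merged.get(key) is None and key in document_context:
            merged[key] = deepcopy(document_context[key])
    return merged


# Illustrative values only (hypothetical record and context).
record = {"dye": "N719", "substrate": "FTO glass", "active_area": None}
context = {"substrate": "ITO glass", "active_area": "0.16 cm2", "redox_couple": "I-/I3-"}
merged = merge_context(record, context, DSC_MERGEABLE)
print(merged)
# {'dye': 'N719', 'substrate': 'FTO glass', 'active_area': '0.16 cm2', 'redox_couple': 'I-/I3-'}
```

In this sketch, a value reported in the table row wins over a document-level value, on the assumption that the row is more specific. This text does not say how CDE-PV resolves such conflicts.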
Description of data records.
| Key | Description | Data Type |
|---|---|---|
| Device Characteristics | General properties of the device structure, with no dependence on physical geometry. | Dict |
| DSC/PSC Material Components | Material components of the solar cell. | Dict |
| Device Metrology | Numerical data with a dependence on macroscopic attributes of the solar cell. | Dict |
| DSC/PSC Material Metrology | Numerical data with a dependence on microscopic attributes of the solar cell. | Dict |
| Table Data | Contextual information about the table from which the record was extracted. | Dict |
| Device Reference | Citation data extracted from within the table. | Dict |
| Article Info | The metadata extracted by CDE-PV. | Dict |
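The databases are released as JSON, so a record can be explored with Python's standard `json` module. The sketch below makes three assumptions. First, each record is a JSON object whose top-level keys match the table above. Second, "DSC" or "PSC" is substituted into the material keys. Third, the exact key spellings match those in the released files, which is not confirmed here. The toy record and the `sections_for` and `find_property` helpers are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple


def sections_for(kind: str) -> Tuple[str, ...]:
    """Top-level record keys from the data-record table; `kind` is "DSC" or "PSC"."""
    return (
        "Device Characteristics",
        f"{kind} Material Components",
        "Device Metrology",
        f"{kind} Material Metrology",
        "Table Data",
        "Device Reference",
        "Article Info",
    )


def find_property(
    record: Dict[str, Any], name: str, kind: str = "DSC"
) -> Optional[Tuple[str, Any]]:
    """Return (section, sub-record) for the first section containing `name`, else None."""
    for section in sections_for(kind):
        block = record.get(section) or {}
        if isinstance(block, dict) and name in block:
            return section, block[name]
    return None


# A toy record with made-up values; real records carry many more fields.
toy = json.loads(
    '{"Device Metrology": {"pce": {"raw_value": "7.2", "value": [7.2]}},'
    ' "DSC Material Components": {"dye": {"raw_value": "N719"}}}'
)
print(find_property(toy, "pce"))  # ('Device Metrology', {'raw_value': '7.2', 'value': [7.2]})
print(find_property(toy, "voc"))  # None
```

Searching every section avoids hard-coding where a given property lives. This matters because that placement is not specified in this text.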
Description of properties that can be present in quantitative data sub-records.
| Key | Description | Data Type |
|---|---|---|
| specifier | Extracted text used to identify the property. | String |
| raw_value | Extracted text containing the value information for this property. | String |
| raw_units | Extracted text indicating the units describing this property. | String |
| value | Numerical values of the property. | List[Float] |
| std_value | Numerical values of the property converted to the standard unit. | List[Float] |
| units | Unit data reported using CDE-PV's unit formatting. | String |
| std_units | Standardized unit data reported using CDE-PV's unit formatting. | String |
| error | Extracted numerical error of the property. | Float |
| std_error | Extracted numerical error of the property, converted to the standard unit. | Float |
| derived_value | Numerical values derived from other extracted properties. | List[Float] |
| derived_units | Unit data reported using CDE-PV's unit formatting, calculated from other extracted properties. | String |
| derived_error | Numerical estimation of error derived from other extracted properties. | Float |
| normalized | Dictionary of normalized PCE data with respect to a reference component (only found in some ‘pce’ sub-records). | Dict |
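The sub-record fields above offer up to three numerical views of a property:

- as reported (`value`/`units`);
- converted to a standard unit (`std_value`/`std_units`);
- derived from other properties (`derived_value`/`derived_units`).

The following is a minimal sketch of picking one view, assuming a sub-record is a plain dict that uses the field names listed in the table. The preference order (standardized, then reported, then derived) and the `pick_values` helper are illustrative choices, not documented CDE-PV behaviour. The unit strings in the example are also illustrative rather than CDE-PV's own formatting.

```python
from typing import Any, Dict, List, Optional, Tuple

# Value/unit field pairs from the sub-record table, in an assumed order of preference.
PREFERENCE = (
    ("std_value", "std_units"),
    ("value", "units"),
    ("derived_value", "derived_units"),
)


def pick_values(
    sub_record: Dict[str, Any],
) -> Tuple[List[float], Optional[str], Optional[str]]:
    """Return (values, units, source_field) from the first non-empty value field."""
    for value_key, unit_key in PREFERENCE:
        values = sub_record.get(value_key)
        if values:  # skips missing keys, None and empty lists
            return list(values), sub_record.get(unit_key), value_key
    return [], None, None


# 15.2 mA cm-2 equals 152 A m-2; unit strings here are illustrative only.
jsc = {"raw_value": "15.2", "value": [15.2], "units": "mA cm-2",
       "std_value": [152.0], "std_units": "A m-2"}
derived_only = {"derived_value": [1000.0], "derived_units": "W m-2"}
print(pick_values(jsc))           # ([152.0], 'A m-2', 'std_value')
print(pick_values(derived_only))  # ([1000.0], 'W m-2', 'derived_value')
```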
Precision, recall and F-score metrics of the sub-records in each sample set.
| Description | TP | FP | FN | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|---|---|---|
| DSC sample set | 1,518 | 67 | 75 | 95.8 | 95.3 | 95.5 |
| PSC sample set | 1,891 | 110 | 68 | 94.5 | 96.5 | 95.5 |
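The final column of the sub-record table is consistent with the F-score, the harmonic mean of precision and recall. The short Python check below recomputes all three percentages from the TP, FP and FN counts in that table. It is an arithmetic check only, not part of the authors' evaluation pipeline.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return precision, recall and F1 (harmonic mean of the two), each as a percentage."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1


for name, tp, fp, fn in [("DSC sample set", 1518, 67, 75), ("PSC sample set", 1891, 110, 68)]:
    p, r, f = precision_recall_f1(tp, fp, fn)
    print(f"{name}: precision {p:.1f}%, recall {r:.1f}%, F-score {f:.1f}%")
# DSC sample set: precision 95.8%, recall 95.3%, F-score 95.5%
# PSC sample set: precision 94.5%, recall 96.5%, F-score 95.5%
```

The complete-record tables report precision only. For example, 141 / (141 + 52) gives the 73.1% quoted for entire DSC records.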
Precision metrics of complete records for the DSC database sample set.
| Description | TP | FP | Precision (%) |
|---|---|---|---|
| Entire PV record | 141 | 52 | 73.1 |
| Correct dye | 162 | 31 | 83.9 |
| Correct dye, | 162 | 31 | 83.9 |
Precision metrics of complete records for the PSC database sample set.
| Description | TP | FP | Precision (%) |
|---|---|---|---|
| Entire PV record | 179 | 62 | 74.3 |
| Correct perovskite | 207 | 34 | 85.9 |
| Correct perovskite, | 202 | 39 | 83.8 |
DSC database - automated comparison between derived and extracted values of solar irradiance.
| Description | Correct | Incorrect | % Correct |
|---|---|---|---|
| Within ± 50 W m−2 | 14,329 | 3,209 | 81.7 |
| Within derived error | 14,498 | 3,040 | 82.7 |
| Within derived error or ± 50 W m−2 | 14,541 | 2,997 | 82.9 |
PSC database - automated comparison between derived and extracted values of solar irradiance.
| Description | Correct | Incorrect | % Correct |
|---|---|---|---|
| Within ± 50 W m−2 | 4,870 | 690 | 87.6 |
| Within derived error | 4,595 | 965 | 82.6 |
| Within derived error or ± 50 W m−2 | 4,887 | 673 | 87.9 |
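One way to obtain a derived irradiance is from the standard relation η = P_max / P_in with P_max = V_OC · J_SC · FF. This gives P_in = V_OC · J_SC · FF / η, which is in W m−2 when V_OC is in volts, J_SC is in A m−2 and η is a fraction. The sketch below does three things:

- applies that relation;
- applies the two acceptance criteria named in the irradiance tables (within ±50 W m−2, or within the derived error);
- recomputes the "% Correct" column from the counts.

Whether CDE-PV derives irradiance exactly this way is an assumption. The helper names are hypothetical, and the device values in the example are made up.

```python
from typing import Optional


def derived_irradiance(voc_v: float, jsc_a_per_m2: float, ff: float, pce_percent: float) -> float:
    """P_in = V_OC * J_SC * FF / eta in W m^-2; eta is converted from percent to a fraction."""
    return voc_v * jsc_a_per_m2 * ff / (pce_percent / 100.0)


def irradiance_agrees(
    extracted: float,
    derived: float,
    derived_error: Optional[float] = None,
    tolerance: float = 50.0,
) -> bool:
    """True if |extracted - derived| is within `tolerance` (W m^-2) or within the derived error."""
    diff = abs(extracted - derived)
    within_derived_error = derived_error is not None and diff <= derived_error
    return diff <= tolerance or within_derived_error


def percent_correct(correct: int, incorrect: int) -> float:
    """Share of records counted as correct, as a percentage."""
    return 100 * correct / (correct + incorrect)


# Made-up device: 0.72 V, 15.2 mA cm^-2 (= 152 A m^-2), FF 0.70, PCE 7.66 %.
p_in = derived_irradiance(0.72, 152.0, 0.70, 7.66)
print(round(p_in, 1))                            # 1000.1
print(irradiance_agrees(1000.0, p_in))           # True
print(irradiance_agrees(1000.0, 1080.0))         # False
print(irradiance_agrees(1000.0, 1080.0, 100.0))  # True

# "% Correct" from the table counts, e.g. the DSC "within derived error" row.
print(round(percent_correct(14_498, 3_040), 1))  # 82.7
```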
Fig. 3: Histograms showing the V_OC, J_SC, FF and PCE distributions for the DSC and PSC databases. First column: DSC database histograms. Second column: PSC database histograms. Third column: violin plots comparing the distributions for DSC (blue) and PSC (orange).
Fig. 4: Histograms showing the V_OC, J_SC, FF and PCE distributions for the lower ranges of values in the DSC database. This includes two histograms for and PCE with differing resolution.
Fig. 5: Histograms showing the V_OC, J_SC, FF and PCE distributions for the DSC and PSC databases, after enhancement using data from the first bin where appropriate. First column: DSC database histograms. Second column: PSC database histograms. Third column: violin plots comparing the distributions for DSC (blue) and PSC (orange).
Fig. 6: Bar charts of the most common materials in the DSC and PSC databases. Top left: the most common light-absorbing dye compounds. Top right: the most common light-absorbing perovskite materials. Bottom left: the most common hole-transporting materials in the PSC database. Bottom right: the most common electron-transporting materials in the PSC database. The compact form of TiO2 shown in the bottom-right bar chart refers to a dense, uniform arrangement of TiO2 that is typically used to create a thin film in a perovskite solar cell.
Fig. 7: Graphs comparing the new automatically created DSC database (left) to the manually created database ‘DSSCDB’[22] (right) for all instances of the most common dye, N719. First row: histograms that show the PCE of all records of the dye N719. Second row: 2-D histograms of the PCE and active area for all records of the dye N719. Third row: 2-D histograms of the PCE and semiconductor thickness for all devices that contain a TiO2 layer and the dye N719. The three plots on the right-hand side of this figure are reproduced from Fig. 6 of the paper by Venkatraman et al.[22], with permission under the terms of the Creative Commons Attribution 4.0 International Licence (http://creativecommons.org/licenses/by/4.0/).
| Measurement(s) | photovoltaic device parameters |
| Technology Type(s) | natural language processing |