| Literature DB >> 28393846 |
Muhammad Shaheen¹, Muhammad Shahbaz².
Abstract
The presence of hydrocarbons beneath the Earth's surface produces microbiological anomalies in soils and sediments. Detecting these microbial populations relies on purely biochemical processes that are specialized, expensive and time-consuming. This paper proposes a new algorithm for context-based association rule mining on non-spatial data. The algorithm is a modified form of a previously developed algorithm that applied only to spatial databases. It is used to mine context-based association rules from a microbial database in order to extract interesting and useful associations between microbial attributes and the existence of hydrocarbon reserves. The surface and soil manifestations caused by hydrocarbon-oxidizing microbes were selected from the existing literature and stored in a shared database. The algorithm is applied to this database to generate direct and indirect associations among the stored microbial indicators, and these associations are then correlated with the probability that hydrocarbons are present. Numerical evaluation shows that, on non-spatial data, the algorithm generates reliable and robust rules with better accuracy than conventional algorithms.
Year: 2017 PMID: 28393846 PMCID: PMC5385557 DOI: 10.1038/srep46108
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Microbial Indicators2152526.
| S. No | Name of Indicator | Abbreviation |
|---|---|---|
| 1. | Number of methane oxidizing bacteria | Nmob |
| 2. | Number of ethane oxidizing bacteria | Neob |
| 3. | Number of propane oxidizing bacteria | Npob |
| 4. | Number of butane oxidizing bacteria | Nbob |
| 5. | Number of sulphate reducing bacteria | Nsrb |
| 6. | Number of denitrifying bacteria | Ndb |
| 7. | Number of interstitial | Ni |
| 8. | Number of hydrocarbons dissolved in fluids | Nhdf |
| 9. | Number of occluded hydrocarbons | Noh |
| 10. | Number of adsorbed hydrocarbons | Nah |
| 11. | Rate of carbon dioxide production | Rcdp |
| 12. | Number of identified microscopic objects | Nimb |
| 13. | Amount of hydrogen sulphide | Ahs |
| 14. | Amount of nitrogen | An |
| 15. | Amount of nitrogen oxide | Ano |
| 16. | Amount of paraffin dirt | Apd |
| 17. | Oxidation-reduction potential (Eh) | Orp |
| 18. | pH of the system | Phs |
| 19. | pH/Eh ratio | Pheh |
| 20. | Rate of precipitation of silica | Rps |
| 21. | Amount of iron oxides | Aio |
| 22. | Amount of phosphates | Ap |
| 23. | Amount of carbonates | Ac |
| 24. | Quantity of clays | Qc |
| 25. | Precipitation of pyrite | Ppr |
| 26. | Precipitation of greigite | Pg |
| 27. | Precipitation of pyrrhotite | Ppy |
| 28. | Precipitation of maghemite | Pmg |
| 29. | Quantity of magnetic particles | Qmp |
Importance of the context variable in association rules. Prec = precipitation (the context variable); Range = 100–300 (context range); S = support of the rule; ms = minimum support.
| Rules | Parameters | Context |
|---|---|---|
| Rule1/T1: is_a (Apd, >=X1) ∧ is_a (Pheh, >=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) => found (MP, true) | S = 8, ms = 3 | Prec = 250 |
| Rule2/T1: is_a (Apd, >=X1) ∧ is_a (Pheh, >=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) ∧ is_a (Ni, >=X3) => found (MP, true) | S = 6, ms = 3 | Prec = 250 |
| Rule3/T1: is_a (Apd, >=X1) ∧ is_a (Pheh, <=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) => found (MP, true) | S = 6, ms = 3 | Prec = 450 |
| Rule4/T1: is_a (Apd, >=X1) ∧ is_a (Pheh, <=X2) ∧ is_a (Aio, >X4) ∧ color_of (soil, Y1) => found (MP, true) | S = 8, ms = 3 | Prec = 370 |
| Rule1/T2: is_a (Apd, >=X1) ∧ is_a (Pheh, >=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) => found (MP, true) | S = 7, ms = 3 | Prec = 250 |
| Rule2/T2: is_a (Apd, >=X1) ∧ is_a (Pheh, >=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) ∧ is_a (Ni, >=X3) => found (MP, true) | S = 4, ms = 3 | Prec = 250 |
| Rule3/T2: is_a (Apd, >=X1) ∧ is_a (Pheh, >=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) => found (MP, true) | S = 4, ms = 3 | Prec = 250 |
| Rule1/T3: is_a (Apd, >=X1) ∧ is_a (An, >=X5) ∧ close_to (Prospect, rock) => found (MP, true) | S = 6, ms = 3 | Prec = 250 |
| Rule2/T3: is_a (Apd, >=X1) ∧ is_a (Pheh, <=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) ∧ is_a (Ni, >=X3) => found (MP, true) | S = 5, ms = 3 | Prec = 450 |
| Rule1/T4: is_a (Apd, >=X1) ∧ is_a (Pheh, >=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) => found (MP, true) | S = 5, ms = 3 | Prec = 250 |
| Final rule set: Rule1: is_a (Apd, >=X1) ∧ is_a (Pheh, >=X2) ∧ diff_of (Aio.Rps, >X3) ∧ color_of (soil, Y1) => found (MP, true) | | |
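The rules in this table are conjunctions of predicates over indicator values, and S counts how many records satisfy a whole rule. A minimal sketch of that counting for the antecedent of Rule1 follows, assuming each record is a Python dict of indicator readings plus a boolean `MP` flag standing for `found (MP, true)`. The thresholds `X1`, `X2`, `X3` and the soil colour `Y1` are not given in the source, so the values below are hypothetical, and reading `diff_of (Aio.Rps, >X3)` as "Aio minus Rps exceeds X3" is also an assumption.

```python
from typing import Callable, Dict, List, Union

Record = Dict[str, Union[float, str, bool]]
Predicate = Callable[[Record], bool]

# Hypothetical thresholds: the source does not publish X1, X2, X3 or Y1.
X1, X2, X3, Y1 = 5.0, 0.5, 2.0, "grey"


def is_a_ge(attr: str, threshold: float) -> Predicate:
    """is_a (attr, >=threshold)."""
    return lambda r: float(r[attr]) >= threshold


def diff_of_gt(a: str, b: str, threshold: float) -> Predicate:
    """diff_of (a.b, >threshold), read here as r[a] - r[b] > threshold (assumption)."""
    return lambda r: float(r[a]) - float(r[b]) > threshold


def color_of_soil(value: str) -> Predicate:
    """color_of (soil, value), assuming a 'soil_color' field in each record."""
    return lambda r: r["soil_color"] == value


# Antecedent of Rule1: Apd >= X1 AND Pheh >= X2 AND (Aio - Rps) > X3 AND soil colour == Y1
rule1_body: List[Predicate] = [
    is_a_ge("Apd", X1),
    is_a_ge("Pheh", X2),
    diff_of_gt("Aio", "Rps", X3),
    color_of_soil(Y1),
]


def support_count(body: List[Predicate], records: List[Record], target: str = "MP") -> int:
    """Count records where every antecedent predicate holds and found (target, true)."""
    return sum(1 for r in records if r[target] is True and all(p(r) for p in body))


# Toy records with hypothetical values, not data from the paper.
records: List[Record] = [
    {"Apd": 6.0, "Pheh": 0.7, "Aio": 9.0, "Rps": 4.0, "soil_color": "grey", "MP": True},
    {"Apd": 6.0, "Pheh": 0.7, "Aio": 9.0, "Rps": 4.0, "soil_color": "grey", "MP": False},
    {"Apd": 3.0, "Pheh": 0.7, "Aio": 9.0, "Rps": 4.0, "soil_color": "grey", "MP": True},
]
ms = 3
s = support_count(rule1_body, records)
print(s, s >= ms)  # 1 False
```

Keeping each predicate as a small closure lets rules that share conditions, such as Rule1 and Rule2, reuse the same building blocks.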
Four Cases of Context Variable14.
| Rule type | Context | Proposed change in support |
|---|---|---|
| Positive rules | Context < CIV | Difference = (CIV − context) * 100/context; New_support_value = actual_support + (actual_support * Difference)/100 |
| Positive rules | Context > CFV | Difference = (context − CFV) * 100/context; New_support_value = actual_support − (actual_support * Difference)/100 |
| Negative rules | Context < CIV | Difference = (CIV − context) * 100/context; New_support_value = actual_support − (actual_support * Difference)/100 |
| Negative rules | Context > CFV | Difference = (context − CFV) * 100/context; New_support_value = actual_support + (actual_support * Difference)/100 |
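The four cases can be read as a single adjustment function. A minimal sketch follows, assuming CIV and CFV are the lower and upper bounds of the context range (for example 100 and 300 for the precipitation range given with the rule table) and that support is left unchanged when the context lies inside that range, a case the table does not list. The function name `adjust_support` is hypothetical.

```python
def adjust_support(actual_support: float, context: float, civ: float, cfv: float,
                   positive_rule: bool = True) -> float:
    """Apply the four context cases from the table to a rule's support.

    civ and cfv are assumed to be the lower and upper bounds of the context range.
    """
    if civ <= context <= cfv:
        return actual_support  # assumption: no change inside the range
    if context <= 0:
        raise ValueError("context must be positive: the formulas divide by it")
    if context < civ:
        difference = (civ - context) * 100 / context
        # Positive rules gain support below the range; negative rules lose it.
        sign = 1 if positive_rule else -1
    else:  # context > cfv
        difference = (context - cfv) * 100 / context
        # Positive rules lose support above the range; negative rules gain it.
        sign = -1 if positive_rule else 1
    return actual_support + sign * (actual_support * difference) / 100


# Supports and contexts taken from the rule table, with Range = 100-300.
CIV, CFV = 100, 300
for name, s, prec in [("Rule1/T1", 8, 250), ("Rule3/T1", 6, 450),
                      ("Rule4/T1", 8, 370), ("Rule2/T3", 5, 450)]:
    print(name, round(adjust_support(s, prec, CIV, CFV), 2))
# Rule1/T1 8
# Rule3/T1 4.0
# Rule4/T1 6.49
# Rule2/T3 3.33
```

Under these assumptions the rules observed outside the 100–300 range lose between about 19% and 33% of their support; these adjusted values are computed here for illustration and are not reported in the source.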
Figure 1. Process of context-based association rule mining for microbial energy prospection.
Figure 2. Data file for microbial prospection of energy (before processing).
Microbial Prospection Database Structure.
| No | Name of attribute | Data type | Length |
|---|---|---|---|
| 1. | ProspectID | Numeric | 10 |
| 2. | IndicatorID | Numeric | 20 |
| 3. | Indicator Range | Numeric | 10 |
| 4. | Prospect coordinates | DMS | 8 |
| 5. | Date of evaluation | Date | 8 |
| 6. | Context temp | Boolean | 2 |
| 7. | Context salinity | Boolean | 2 |
| 8. | Context humidity | Boolean | 2 |
| 9. | Context rainfall | Boolean | 2 |
| 10. | Context fossil | Boolean | 2 |
Microbial Indicator Table Structure.
| S. No | Name of attribute | Data type | Length | Constraint |
|---|---|---|---|---|
| 1. | IndicatorID | Numeric | 20 | Unique |
| 2. | Indicator Name | String | 30 | NIL |
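The two table structures above can be expressed as DDL. A minimal sketch using Python's standard `sqlite3` module follows; the table names `microbial_prospection` and `microbial_indicator`, the space-free column names, and the type mapping (Numeric to SQLite NUMERIC, DMS coordinates and dates to TEXT, Boolean to 0/1 INTEGER) are assumptions. SQLite does not enforce declared lengths, so the lengths from the tables are kept as comments.

```python
import sqlite3

SCHEMA = """
CREATE TABLE microbial_indicator (
    IndicatorID   NUMERIC UNIQUE,       -- Numeric, length 20, Unique
    IndicatorName TEXT                  -- String, length 30
);
CREATE TABLE microbial_prospection (
    ProspectID          NUMERIC,        -- Numeric, length 10
    IndicatorID         NUMERIC,        -- Numeric, length 20 (same key as microbial_indicator)
    IndicatorRange      NUMERIC,        -- Numeric, length 10
    ProspectCoordinates TEXT,           -- DMS, length 8
    DateOfEvaluation    TEXT,           -- Date, length 8 (ISO 8601 text assumed)
    ContextTemp         INTEGER,        -- Boolean stored as 0/1
    ContextSalinity     INTEGER,
    ContextHumidity     INTEGER,
    ContextRainfall     INTEGER,
    ContextFossil       INTEGER
);
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create both tables in a new (by default in-memory) SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_db()
conn.execute("INSERT INTO microbial_indicator VALUES (?, ?)",
             (1, "Number of methane oxidizing bacteria"))
row = conn.execute(
    "SELECT IndicatorName FROM microbial_indicator WHERE IndicatorID = ?", (1,)
).fetchone()
print(row[0])  # Number of methane oxidizing bacteria
```

The UNIQUE constraint on `IndicatorID` mirrors the "Unique" constraint in the indicator table; inserting a second row with the same ID raises `sqlite3.IntegrityError`.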
Summary of Benchmark.
| S. No | Item | Value |
|---|---|---|
| 1. | Total number of attributes in database | 29 |
| 2. | Total number of records | 4984 |
| 3. | Total number of sites | 28 |
| 4. | Sites with positive prospection results | 20 |
| 5. | Sites with negative prospection results | 8 |
| 6. | Available spatial records (converted to non-spatial) | 581,504 |
| 7. | Available non-spatial records | 28 |
Figure 3. Data view of microbial indicators for prospect and non-prospect sites, showing the replacement of positive values with “1”.
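The replacement shown in Figure 3 turns raw indicator readings into binary flags that association rule miners can consume. A minimal sketch follows, assuming each site is a dict keyed by the indicator abbreviations from the indicator table and that zero or negative readings become 0; the figure only states that positive values become “1”. The function names `binarize` and `to_transactions` are hypothetical.

```python
from typing import Dict, List, Set


def binarize(rows: List[Dict[str, float]]) -> List[Dict[str, int]]:
    """Replace every positive reading with 1 and every other reading with 0 (assumption)."""
    return [{name: 1 if value > 0 else 0 for name, value in row.items()} for row in rows]


def to_transactions(binary_rows: List[Dict[str, int]]) -> List[Set[str]]:
    """Represent each site as the set of indicators flagged 1, the usual input to Apriori-style miners."""
    return [{name for name, flag in row.items() if flag == 1} for row in binary_rows]


# Two hypothetical sites; the readings are illustrative, not from the paper.
sites = [
    {"Nmob": 120.0, "Nsrb": 0.0, "Apd": 3.5},
    {"Nmob": 0.0, "Nsrb": 45.0, "Apd": 0.0},
]
print(binarize(sites))
print([sorted(t) for t in to_transactions(binarize(sites))])
# [{'Nmob': 1, 'Nsrb': 0, 'Apd': 1}, {'Nmob': 0, 'Nsrb': 1, 'Apd': 0}]
# [['Apd', 'Nmob'], ['Nsrb']]
```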
Figure 4. Positive and negative association rule mining in microbial datasets.
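Figure 4 covers both positive rules (A ⇒ B) and negative rules involving absent items (A ⇒ ¬B, ¬A ⇒ B, ¬A ⇒ ¬B). A minimal sketch of their support and confidence over binary transactions follows, using the standard identities such as supp(A ∧ ¬B) = supp(A) − supp(A ∧ B). This is the textbook formulation of negative association rules, not necessarily the exact PNARM measure used in the paper, and the item `MP` marking a positive prospect is a hypothetical label.

```python
from typing import Dict, List, Set, Tuple


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def _ratio(num: float, den: float) -> float:
    """Safe division: confidence is reported as 0.0 when the antecedent never occurs."""
    return num / den if den > 0 else 0.0


def pos_neg_measures(a: Set[str], b: Set[str],
                     transactions: List[Set[str]]) -> Dict[str, Tuple[float, float]]:
    """(support, confidence) for A=>B, A=>~B, ~A=>B and ~A=>~B."""
    s_a = support(a, transactions)
    s_b = support(b, transactions)
    s_ab = support(a | b, transactions)
    s_a_not_b = s_a - s_ab                  # supp(A and not B)
    s_not_a_b = s_b - s_ab                  # supp(not A and B)
    s_not_a_not_b = 1 - s_a - s_b + s_ab    # supp(not A and not B)
    return {
        "A=>B": (s_ab, _ratio(s_ab, s_a)),
        "A=>~B": (s_a_not_b, _ratio(s_a_not_b, s_a)),
        "~A=>B": (s_not_a_b, _ratio(s_not_a_b, 1 - s_a)),
        "~A=>~B": (s_not_a_not_b, _ratio(s_not_a_not_b, 1 - s_a)),
    }


# Hypothetical transactions: 'MP' marks a site with a positive prospection result.
tx = [{"Nmob", "Apd", "MP"}, {"Nmob", "MP"}, {"Apd"}, {"Nsrb"}]
for rule, (supp, conf) in pos_neg_measures({"Nmob"}, {"MP"}, tx).items():
    print(rule, supp, conf)
# A=>B 0.5 1.0
# A=>~B 0.0 0.0
# ~A=>B 0.0 0.0
# ~A=>~B 0.5 1.0
```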
Figure 5. Number of rules versus minimum support for Apriori, PNARM and CBPNARM.
Figure 6. Minimum support and average confidence for Apriori, PNARM and CBPNARM.
Figure 7. Minimum support versus execution time for Apriori, PNARM and CBPNARM.
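Apriori, the baseline in Figures 5 to 7, finds frequent itemsets level by level and then derives rules from them, so raising the minimum support can only reduce the number of frequent itemsets and rules. A compact sketch of the textbook algorithm follows, with minimum support as an absolute count and minimum confidence as a fraction; it is not the implementation benchmarked in the paper, and the function names `apriori` and `generate_rules` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset whose support count is at least min_support."""
    tx = [frozenset(t) for t in transactions]
    counts: Dict[Itemset, int] = {}
    for t in tx:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count the surviving candidates with one pass over the transactions.
        counts = {c: sum(1 for t in tx if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent: Dict[Itemset, int],
                   min_confidence: float) -> List[Tuple[Itemset, Itemset, int, float]]:
    """Rules lhs => rhs from each frequent itemset with confidence >= min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                # lhs is frequent by downward closure, so its count is available.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, count, confidence))
    return rules


# Hypothetical transactions of indicator abbreviations plus a prospect flag 'MP'.
tx = [{"Nmob", "Apd", "MP"}, {"Nmob", "MP"}, {"Apd", "MP"}, {"Nmob", "Apd"}]
freq = apriori(tx, min_support=2)
print(len(freq), len(generate_rules(freq, min_confidence=0.6)))  # 6 6
```

Raising `min_support` to 3 on the same toy transactions leaves only the three single items and therefore no rules.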