Lorentz Jäntschi, Lavinia L Pruteanu, Alina C Cozma, Sorana D Bolboacă.
Abstract
Simple and multiple linear regression analyses are statistical methods used to investigate the link between the activity or property of active compounds and their structural chemical features. One assumption of linear regression is that the errors follow a normal distribution. This paper introduces a new approach to solving the simple linear regression in which no assumptions about the distribution of the errors are made. The proposed approach maximizes the probability of observing the event according to the random error. The use of the proposed approach is illustrated on ten classes of compounds with different activities or properties. The proposed method proved reliable and was shown to fit the observed data properly compared to the conventional approach, which assumes normally distributed errors.
Year: 2015 PMID: 26101543 PMCID: PMC4458545 DOI: 10.1155/2015/360752
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1: Flowchart of the implemented method. The starting values of the coefficients "a" (coefficient of the independent variable), "μ" (population mean), and "σ" (population standard deviation) are those obtained by the least squares estimation method, while the imposed value of the power of the errors is equal to 2. The algorithm that maximizes the likelihood finds the optimal solution for "q," "a," "μ," and "σ" that satisfies (6).
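The flow in Figure 1 can be sketched in code. Equation (6) is not reproduced in this section, so the sketch below assumes that the errors follow a generalized Gaussian (power exponential) law with power q, which reduces to the normal law at q = 2. The function names, the compass search used as optimizer, and the demo data are hypothetical choices made for illustration; they are not the authors' implementation.

```python
import math

def log_likelihood(x, y, a, mu, sigma, q):
    """Log-likelihood of y = a*x + mu + e, where e is ASSUMED to follow a
    generalized Gaussian law with standard deviation sigma and power q."""
    if sigma <= 0 or not (0.1 < q < 50):
        return -math.inf
    # Scale alpha chosen so that the standard deviation of the law is sigma.
    log_alpha = math.log(sigma) + 0.5 * (math.lgamma(1 / q) - math.lgamma(3 / q))
    alpha = math.exp(log_alpha)
    total = len(x) * (math.log(q) - math.log(2) - log_alpha - math.lgamma(1 / q))
    try:
        for xi, yi in zip(x, y):
            total -= (abs(yi - a * xi - mu) / alpha) ** q
    except OverflowError:
        return -math.inf
    return total

def least_squares(x, y):
    """Least squares slope, intercept, and population standard deviation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    mu = my - a * mx
    sigma = math.sqrt(sum((yi - a * xi - mu) ** 2 for xi, yi in zip(x, y)) / n)
    return a, mu, sigma

def maximize(x, y, iterations=200):
    """Compass search over (a, mu, sigma, q), started at least squares, q = 2."""
    a, mu, sigma = least_squares(x, y)
    params = [a, mu, sigma, 2.0]
    steps = [abs(p) * 0.1 + 1e-3 for p in params]
    start = best = log_likelihood(x, y, *params)
    for _ in range(iterations):
        for i in range(4):
            improved = False
            for direction in (1.0, -1.0):
                trial = list(params)
                trial[i] += direction * steps[i]
                value = log_likelihood(x, y, *trial)
                if value > best:  # keep only moves that raise the likelihood
                    params, best, improved = trial, value, True
                    break
            if not improved:
                steps[i] *= 0.5  # shrink the step when neither direction helps
    return params, start, best

# Hypothetical demo data, not taken from the paper.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y_demo = [2.3, 3.8, 6.4, 7.9, 10.5, 11.7, 14.2, 16.4, 17.6, 20.3]
(a_hat, mu_hat, sigma_hat, q_hat), ll_start, ll_best = maximize(x_demo, y_demo)
print(f"q={q_hat:.3f} a={a_hat:.3f} mu={mu_hat:.3f} sigma={sigma_hat:.3f}")
```

Because the search accepts only moves that increase the log-likelihood, the final value is never below the least squares starting point. A gradient-based or simplex optimizer could replace the compass search without changing the idea.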
Characteristics of the investigated classes of compounds.
| Set | n | Class | Activity/property, expressed as | Reference |
|---|---|---|---|---|
| 1a | 35 | Phenols | Toxicity on | [ |
| 1b | 126 | | | |
| 1c | 250 | | | |
| 2 | 24 | Organic compounds | Solubility, log | [ |
| 3 | 73 | Alkanes | Boiling point, BP | [ |
| 4a | 40 | Flavonoids | Solubility, log | [ |
| 4b | 30 | | Lethal Dose 50%, ln(LD50) | |
| 5 | 132 | Estrogen receptor (ER) | Binding affinities, log(RBA) | [ |
| 6 | 80 | Pyrrolo-pyrimidine derivatives | c-Src tyrosine kinase inhibitory activity, pIC50 = −log10(IC50) | [ |
| 7 | 47 | Substituted aromatic sulfonamides | Inhibition activity on carbonic anhydrase II, log | [ |
| 8 | 37 | Carboquinone derivatives | Molar concentration, log(1/MC) | [ |
| 9 | 47 | Dipeptides | ACE (angiotensin converting enzyme) inhibitory activity, ACE | [ |
| 10 | 60 | Mycotoxins compounds | Retention time, ln(RT) | [ |
Characteristics of the SLR-LS models used in the optimization study.
| Set | SLR model | R² | s | F | n |
|---|---|---|---|---|---|
| 1a | log(1/IGC50) = +0.677 · log | 0.90 | 0.22 | 287 | 35 |
| 1b | log(1/IGC50) = +0.647 · log | 0.84 | 0.30 | 666 | 126 |
| 1c | log(1/IGC50) = −0.443 · log | 0.53 | 0.57 | 276 | 250 |
| 2 | log | 0.53 | 0.43 | 25 | 24 |
| 3 | BP = +188.40 · lbMdsHg | 0.99 | 3.81 | 8050 | 73 |
| 4a | log | 0.71 | 0.32 | 92 | 40 |
| 4b | ln(LD50) = +0.0018 · SD − 61.168 | 0.41 | 0.98 | 19 | 30 |
| 5 | logRBA = +0.026 · TIC1 − 4.145 | 0.36 | 1.44 | 72 | 132 |
| 6 | pIC50 = +0.255 · DCW − 1.216 | 0.71 | 0.57 | 191 | 80 |
| 7 | log | 0.49 | 0.37 | 43 | 47 |
| 8 | log(1/MC) = −4.129 · TEuIFFDL | 0.65 | 0.38 | 64 | 37 |
| 9 | ACE = 47.5480 · IHMdpMg | 0.74 | 0.33 | 128 | 47 |
| 10 | ln(RT) = 0.348 · log | 0.56 | 0.50 | 75 | 60 |
SLR = simple linear regression.
log(1/IGC50) = concentrations (expressed as mM) producing a 50% growth inhibition on T. pyriformis.
MDF descriptors [33, 39, 40, 42].
SD = global correlation descriptor [35]; TIC1 = total information content index (neighborhood symmetry of 1-order).
DCW = flexible (activity dependent) descriptor.
std_dim3 = the square root of the third largest eigenvalue of the covariance matrix of the atomic coordinates [43].
R² = determination coefficient; s = standard error of the estimate.
F = Fisher's statistic of the regression model; n = sample size.
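The statistics defined in these footnotes can be computed from the least squares fit with textbook formulas. The sketch below assumes that s uses n − 2 degrees of freedom, the usual convention for simple linear regression; the function name `slr_summary` and the sample data are hypothetical and do not come from the investigated sets.

```python
import math

def slr_summary(x, y):
    """Least squares fit of y = a*x + b with R^2, s, and F (textbook formulas)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx                      # slope
    b = my - a * mx                    # intercept
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot         # determination coefficient
    s = math.sqrt(ss_res / (n - 2))    # standard error of the estimate (assumed n - 2)
    f = (ss_tot - ss_res) / (ss_res / (n - 2))  # Fisher's statistic, 1 and n - 2 d.f.
    return {"a": a, "b": b, "r2": r2, "s": s, "F": f, "n": n}

# Hypothetical sample data.
summary = slr_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(summary)
```

For a model with a single predictor, F can also be written as R²(n − 2)/(1 − R²), which gives a quick consistency check between the R², F, and n columns of the table.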
Optimization results: q = 2 versus q determined to satisfy (6).
| Set | n | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| 1a | 35 | 0.678 | −1.386 | 0.218 | 9.52 | 0.638 | −1.181 | 0.222 | 4.20 · 10^−54 |
| 1b | 126 | 0.647 | −1.050 | 0.298 | 4.36 | 0.647 | −1.029 | 0.298 | 3.07 · 10^−115 |
| 1c | 250 | 0.509 | −0.443 | 0.596 | 1.29 | 0.563 | −0.623 | 0.569 | 2.42 · 10^−53 |
| 2 | 24 | −0.004 | 2.095 | 0.414 | 0.61 | −0.005 | 2.270 | 0.516 | 1.76 · 10^−12 |
| 3 | 73 | 188.408 | −507.959 | 3.762 | 1.34 | 188.408 | −507.959 | 3.762 | 6.93 · 10^−2 |
| 4a | 40 | 1.000 | 5.232 | 0.308 | 2.81 | 1.041 | 5.338 | 0.308 | 1.30 · 10^−19 |
| 4b | 30 | 0.002 | −61.168 | 0.945 | 0.67 | 0.002 | −64.950 | 0.964 | 1.16 · 10^−8 |
| 5 | 132 | 0.024 | −3.812 | 1.374 | 1.70 | 0.026 | −3.967 | 1.374 | 7.33 · 10^−3 |
| 6 | 80 | 0.255 | −1.216 | 0.558 | 2.87 | 0.255 | −1.216 | 0.558 | 3.39 · 10^−23 |
| 7 | 47 | −0.578 | 2.646 | 0.360 | 3.43 | −0.555 | 2.594 | 0.353 | 1.06 · 10^−30 |
| 8 | 37 | −4.129 | 5.789 | 0.372 | 1.29 | −4.297 | 5.789 | 0.372 | 4.75 · 10^−14 |
| 9 | 47 | 47.561 | −0.169 | 0.319 | 3.17 | 49.502 | −0.279 | 0.319 | 9.01 · 10^−29 |
| 10 | 60 | 0.348 | 1.711 | 0.492 | 1.74 | 0.355 | 1.711 | 0.492 | 6.09 · 10^−5 |
q = power of the errors; a, b = coefficients in the simple linear model.
μ = population mean; σ = population standard deviation.
Figure 2: Distribution of the power of the errors according to iteration, shown in three parts: the phenols sets with 35 compounds (1a) and 126 compounds (1b); phenols (1c), organic compounds (2), alkanes (3), flavonoids (4a and 4b), estrogen receptor (5), pyrrolo-pyrimidine derivatives (6), and substituted aromatic sulfonamides (7); carboquinone derivatives (8), dipeptides (9), and mycotoxins compounds (10).
Figure 3: The lines of SLR-LS (q = 2) and SLR-MLE (q determined to satisfy (6)), shown in three parts: the phenols sets with 35 compounds (1a) and 126 compounds (1b); phenols (1c), organic compounds (2), alkanes (3), flavonoids (4a and 4b), estrogen receptor (5), pyrrolo-pyrimidine derivatives (6), and substituted aromatic sulfonamides (7); carboquinone derivatives (8), dipeptides (9), and mycotoxins compounds (10).