Erinija Pranckeviciene, Valentina Gineviciene, Audrone Jakaitiene, Laimonas Januska, Algirdas Utkus.
Abstract
Total genotype score (TGS) reflects the additive effect of genotypes in predicting a complex trait such as athletic performance. The scores assigned to genotypes in a TGS should represent the extent of each genotype's predisposition to the trait, so that the combined score ranks highly those individuals who express the trait. Genotypes are usually scored according to the evidence for a genotype-phenotype relationship published in scientific studies; when genotype data of athletes are available, the scores can also be revised computationally. From the available genotype data of 180 Lithuanian elite athletes, we created an endurance-mixed-power performance TGS profile based on the known ACE rs1799752, ACTN3 rs1815739, and AMPD1 rs17602729 markers and the emerging MB rs7293 gene marker. We analysed the ability of this TGS profile to stratify athletes according to the sport category they practise. Logistic regression classifiers were trained to compute genotype scores that represented the endurance versus power traits in the analysed group of athletes more accurately. We observed differences in TGS distributions between the female and male groups of athletes, and we describe the genotypes that may have different effects on athletic performance traits in females and males. Our data-driven analysis and TGS modelling tools are freely available to practitioners.
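The abstract states that logistic regression classifiers were trained to derive genotype scores. A minimal sketch of that idea, under stated assumptions: each athlete's four genotypes are one-hot encoded into 12 indicators (one per genotype, matching the 12 coefficient rows reported below), and an L2-penalised logistic regression without an intercept is fitted by plain gradient descent. The encoding, the penalty, the learning rate, and the toy data are hypothetical choices for illustration; the authors' actual fitting procedure and tooling may differ.

```python
import math

# Genotype coding for the four markers named in the abstract.
GENOTYPES = {
    "ACE": ("DD", "ID", "II"),
    "ACTN3": ("RR", "RX", "XX"),
    "AMPD1": ("CC", "CT", "TT"),
    "MB": ("AA", "AG", "GG"),
}
FEATURES = [(marker, g) for marker, gs in GENOTYPES.items() for g in gs]


def one_hot(profile):
    """Map {marker: genotype} to a 12-element 0/1 indicator vector."""
    return [1.0 if profile[m] == g else 0.0 for m, g in FEATURES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_genotype_coefficients(profiles, labels, lr=0.5, l2=0.1, epochs=2000):
    """Batch gradient descent on the L2-penalised mean log-loss (no intercept).

    labels: 1 for the first class (e.g. endurance), 0 for the second (e.g. power).
    Returns {(marker, genotype): coefficient b}.
    """
    X = [one_hot(p) for p in profiles]
    n = len(X)
    w = [0.0] * len(FEATURES)
    for _ in range(epochs):
        grad = [l2 * wj for wj in w]  # gradient of (l2 / 2) * ||w||^2
        for xi, yi in zip(X, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dict(zip(FEATURES, w))


# Hypothetical toy data (not from the study): 1 = endurance, 0 = power.
toy_profiles = [
    {"ACE": "DD", "ACTN3": "RX", "AMPD1": "CC", "MB": "AG"},
    {"ACE": "DD", "ACTN3": "RR", "AMPD1": "CC", "MB": "AA"},
    {"ACE": "ID", "ACTN3": "RX", "AMPD1": "CC", "MB": "AG"},
    {"ACE": "II", "ACTN3": "RX", "AMPD1": "CC", "MB": "AG"},
    {"ACE": "II", "ACTN3": "RR", "AMPD1": "CT", "MB": "GG"},
    {"ACE": "ID", "ACTN3": "XX", "AMPD1": "CT", "MB": "GG"},
]
toy_labels = [1, 1, 1, 0, 0, 0]

coef = fit_genotype_coefficients(toy_profiles, toy_labels)
for (marker, g), b in coef.items():
    print(f"{marker:5s} {g}: b = {b:+.3f}")
```

At the optimum each coefficient is proportional to the summed residuals (label minus predicted probability) of the athletes carrying that genotype, so genotypes seen only in the class coded 1 get positive coefficients, those seen only in the class coded 0 get negative ones, and a genotype that never occurs (here AMPD1 TT) stays at zero.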
Keywords: Lithuanian athletes; TGS; human athletic performance; logistic regression; polygenic profile; total genotype score
Year: 2021 PMID: 34356082 PMCID: PMC8306147 DOI: 10.3390/genes12071067
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
An example of a possible polygenic profile for tendon injury, with scored genotypes and their population frequencies from the Ensembl database.
| Gene Marker (Confidence) | Genotype | Genotype Score | Trait Outcome | Population Frequency |
|---|---|---|---|---|
| rs12722 (confidence 1) | TT | 1 | High risk | 0.157 |
| | CT | 0.5 | Moderate risk | 0.384 |
| | CC | −1 | Low risk | 0.458 |
| rs1800012 (confidence 1) | CC | 0.5 | Moderate risk | 0.833 |
| | AC | 0.5 | Moderate risk | 0.151 |
| | AA | −1 | Low risk | 0.016 |
| rs679620 (confidence 0.75) | AA | 1 | Increased risk | 0.126 |
| | AG | 0 | Moderate risk | 0.444 |
| | GG | −1 | Low risk | 0.430 |
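To see how such a profile spreads a population across score levels, the sketch below enumerates all 27 genotype combinations of the three tendon-injury markers in the table above, sums the genotype scores into a TGS, and weights each combination by the product of its Ensembl frequencies. Treating the loci as independent is an assumption of this sketch, and the Confidence column is ignored because this section does not say how it enters the score.

```python
from collections import defaultdict
from itertools import product

# (genotype score, Ensembl population frequency) taken from the table above.
TENDON_PROFILE = {
    "rs12722": {"TT": (1, 0.157), "CT": (0.5, 0.384), "CC": (-1, 0.458)},
    "rs1800012": {"CC": (0.5, 0.833), "AC": (0.5, 0.151), "AA": (-1, 0.016)},
    "rs679620": {"AA": (1, 0.126), "AG": (0, 0.444), "GG": (-1, 0.430)},
}


def tgs_distribution(profile):
    """Map each attainable TGS to its population probability.

    Assumes the loci are independent, so a combination's probability is
    the product of its genotype frequencies.
    """
    dist = defaultdict(float)
    for combo in product(*(genos.items() for genos in profile.values())):
        score = sum(s for _, (s, _) in combo)  # additive TGS
        prob = 1.0
        for _, (_, freq) in combo:
            prob *= freq
        dist[score] += prob
    return dict(sorted(dist.items()))


DIST = tgs_distribution(TENDON_PROFILE)
for score, prob in DIST.items():
    print(f"TGS {score:+.1f}: {prob:.4f}")
```

The probabilities sum to about 0.999 rather than 1 only because the rs12722 frequencies in the table are rounded.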
Coefficients of the logistic regression models fitted to the athletes' genotype data in three two-class classification tasks: endurance versus power (E vs. P), endurance versus mixed (E vs. M), and power versus mixed (P vs. M). Column b gives the coefficient of the corresponding genotype in the fitted logistic regression model. The class probability, $p = 1/(1 + e^{-b})$, measures how strongly the coefficient b alone predicts either of the two classes.
| Genotype | E(1) vs. P(0): b | Class Prob. | E(1) vs. M(0): b | Class Prob. | P(1) vs. M(0): b | Class Prob. |
|---|---|---|---|---|---|---|
| | −0.080 | 0.479 | −0.199 | 0.450 | 0.039 | 0.509 |
| | 0.289 | 0.571 | 0.241 | 0.560 | −0.071 | 0.482 |
| | −0.209 | 0.447 | −0.041 | 0.489 | 0.032 | 0.508 |
| | 0.288 | 0.571 | 0.222 | 0.555 | −0.047 | 0.488 |
| | 0.062 | 0.515 | −0.453 | 0.388 | −0.403 | 0.400 |
| | −0.350 | 0.413 | 0.230 | 0.557 | 0.451 | 0.610 |
| | 0.088 | 0.522 | 0.117 | 0.529 | −0.068 | 0.482 |
| | 0.223 | 0.555 | 0.095 | 0.523 | −0.125 | 0.468 |
| | −0.311 | 0.422 | −0.212 | 0.447 | 0.194 | 0.548 |
| | 0.415 | 0.602 | 0.418 | 0.603 | −0.071 | 0.482 |
| | 0.194 | 0.548 | −0.044 | 0.488 | −0.250 | 0.437 |
| | −0.610 | 0.351 | −0.374 | 0.407 | 0.321 | 0.579 |
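The class probabilities in the coefficient table are consistent with applying the logistic (sigmoid) function to each coefficient on its own, $p = 1/(1 + e^{-b})$. The sketch below checks every reported pair; the remaining differences (at most about 0.001) are attributable to b being rounded to three decimals.

```python
import math


def class_probability(b):
    """Logistic transform of a single coefficient: 1 / (1 + exp(-b))."""
    return 1.0 / (1.0 + math.exp(-b))


# (b, reported class probability) for E vs. P, E vs. M, P vs. M, row by row.
COEF_TABLE = [
    [(-0.080, 0.479), (-0.199, 0.450), (0.039, 0.509)],
    [(0.289, 0.571), (0.241, 0.560), (-0.071, 0.482)],
    [(-0.209, 0.447), (-0.041, 0.489), (0.032, 0.508)],
    [(0.288, 0.571), (0.222, 0.555), (-0.047, 0.488)],
    [(0.062, 0.515), (-0.453, 0.388), (-0.403, 0.400)],
    [(-0.350, 0.413), (0.230, 0.557), (0.451, 0.610)],
    [(0.088, 0.522), (0.117, 0.529), (-0.068, 0.482)],
    [(0.223, 0.555), (0.095, 0.523), (-0.125, 0.468)],
    [(-0.311, 0.422), (-0.212, 0.447), (0.194, 0.548)],
    [(0.415, 0.602), (0.418, 0.603), (-0.071, 0.482)],
    [(0.194, 0.548), (-0.044, 0.488), (-0.250, 0.437)],
    [(-0.610, 0.351), (-0.374, 0.407), (0.321, 0.579)],
]

max_gap = max(abs(class_probability(b) - p) for row in COEF_TABLE for b, p in row)
print(f"largest |sigmoid(b) - reported probability| = {max_gap:.4f}")
```

A positive b pushes the probability above 0.5 toward the class coded 1 in that task; a negative b pushes it toward the class coded 0.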
Genotype and allele counts of the gene markers ACE rs1799752, ACTN3 rs1815739, AMPD1 rs17602729, and MB rs7293 in Lithuanian elite athletes and controls. The p-values compare genotype and allele frequencies between elite athletes and controls; ** marks significance at the 0.05 level and * marks significance at the 0.1 level.
| Gene/Group | Genotype Counts | | | Genotype p-value | Allele Counts | | Allele p-value |
|---|---|---|---|---|---|---|---|
| ACE rs1799752 | DD | ID | II | 0.049 ** | D | I | 0.093 * |
| Athletes | 46 | 84 | 50 | | 176 | 184 | |
| Controls | 63 | 94 | 98 | | 220 | 290 | |
| ACTN3 rs1815739 | RR | RX | XX | 0.118 | R | X | 0.0786 * |
| Athletes | 56 | 102 | 22 | | 214 | 146 | |
| Controls | 104 | 125 | 26 | | 333 | 177 | |
| AMPD1 rs17602729 | CC | CT | TT | 0.625 | C | T | 0.539 |
| Athletes | 133 | 45 | 2 | | 311 | 49 | |
| Controls | 184 | 65 | 6 | | 433 | 77 | |
| MB rs7293 | AA | AG | GG | 0.0004 ** | A | G | 0.588 |
| Athletes | 35 | 116 | 29 | | 186 | 174 | |
| Controls | 69 | 116 | 70 | | 254 | 256 | |
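The caption does not name the statistical test. As a hedged cross-check, a Pearson chi-square test of independence without continuity correction, applied to the counts in the table above, reproduces the reported ACE and MB p-values to within rounding; the sketch also derives allele counts from genotype counts (each homozygote contributes two copies of its allele, each heterozygote one of each). Whether the authors used exactly this test is an assumption.

```python
import math


def allele_counts(hom_a, het, hom_b):
    """Allele counts from genotype counts (hom_a, het, hom_b)."""
    return (2 * hom_a + het, 2 * hom_b + het)


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) and its df."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_tot) - 1)


def chi2_pvalue(stat, df):
    """Upper-tail p-value; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Genotype counts (athletes, controls) copied from the table above.
COUNTS = {
    "ACE rs1799752": ((46, 84, 50), (63, 94, 98)),
    "MB rs7293": ((35, 116, 29), (69, 116, 70)),
}

RESULTS = {}
for marker, (athletes, controls) in COUNTS.items():
    geno_p = chi2_pvalue(*pearson_chi2([athletes, controls]))
    allele_table = [allele_counts(*athletes), allele_counts(*controls)]
    allele_p = chi2_pvalue(*pearson_chi2(allele_table))
    RESULTS[marker] = (allele_counts(*athletes), geno_p, allele_p)
    print(f"{marker}: athlete alleles {allele_counts(*athletes)}, "
          f"genotype p = {geno_p:.4f}, allele p = {allele_p:.4f}")
```

The 2 × 3 genotype table has 2 degrees of freedom and the 2 × 2 allele table has 1, which is why only those two closed forms are needed here.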
Figure 1. Number of athletes in endurance, mixed, and power sports carrying each specific genotype combination.
Binomial probability $P = \binom{n}{k} p^k (1-p)^{n-k}$ of observing $k$ individuals carrying a genotype combination of probability $p$ in a random group of $n$ individuals.
| Genotype Combination | $k$ | $p$ | $P$ |
|---|---|---|---|
| ID-RX-AG-CT | 12 | 0.021 | 0.000345 |
| ID-RX-AG-CC | 20 | 0.0594 | 0.002910 |
| II-RX-GG-CC | 1 | 0.0374 | 0.007327 |
| DD-RX-AG-TT | 2 | 0.0013 | 0.021598 |
| ID-RR-GG-CC | 1 | 0.0299 | 0.023502 |
| ID-RR-AG-CC | 13 | 0.0495 | 0.047800 |
| ID-XX-AG-CC | 5 | 0.0124 | 0.049164 |
| ID-RX-GG-CC | 10 | 0.0359 | 0.054216 |
| II-RX-AG-CC | 15 | 0.0618 | 0.055684 |
| DD-RX-GG-CC | 1 | 0.0240 | 0.055847 |
| II-RR-GG-CC | 2 | 0.0311 | 0.056269 |
| DD-RX-AG-CC | 10 | 0.0398 | 0.076339 |
| DD-RR-GG-CC | 1 | 0.0200 | 0.096777 |
| DD-RR-AG-CT | 4 | 0.0117 | 0.099880 |
| II-RR-AG-CC | 11 | 0.0515 | 0.104875 |
| DD-XX-AG-CC | 3 | 0.0083 | 0.125010 |
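The binomial table can be reproduced under two assumptions that this section does not state explicitly: $p$ is the product of the four genotype frequencies in Lithuanian controls (the Freq. Control column of the LR/WF TGS table), treating the loci as independent and rounding to four decimals as displayed, and $n = 180$, the number of athletes. The sketch computes both quantities; the constant and function names are hypothetical.

```python
from math import comb, prod

# Genotype frequencies in Lithuanian controls (Freq. Control column).
# Genotype labels are unique across ACE, ACTN3, AMPD1, and MB.
CONTROL_FREQ = {
    "DD": 0.247, "ID": 0.369, "II": 0.384,    # ACE rs1799752
    "RR": 0.408, "RX": 0.49, "XX": 0.102,     # ACTN3 rs1815739
    "CC": 0.722, "CT": 0.255, "TT": 0.024,    # AMPD1 rs17602729
    "AA": 0.271, "AG": 0.455, "GG": 0.275,    # MB rs7293
}
N_ATHLETES = 180  # assumption: n is the total number of athletes


def combination_probability(combo):
    """Probability of a combination such as 'ID-RX-AG-CT', assuming independent loci."""
    return prod(CONTROL_FREQ[g] for g in combo.split("-"))


def binomial_pmf(k, n, p):
    """Probability of exactly k carriers among n individuals."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for combo, k in [("ID-RX-AG-CT", 12), ("II-RX-GG-CC", 1), ("DD-RX-AG-TT", 2)]:
    p = round(combination_probability(combo), 4)
    print(f"{combo}: k = {k}, p = {p}, P = {binomial_pmf(k, N_ATHLETES, p):.6f}")
```

Small $P$ values flag combinations that are observed either far more often (ID-RX-AG-CT, 12 carriers against an expectation of about 3.8) or far less often (II-RX-GG-CC, 1 carrier against about 6.7) than the control frequencies would predict.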
Gene markers, their genotypes, genotype scores, and associated traits in the LR TGS and WF TGS models, together with the population genotype frequencies in the Ensembl database and in Lithuanian controls. ACE rs1799752 frequencies were obtained from rs4341, which is in complete (100%) linkage disequilibrium with the I/D polymorphism (II = CC, ID = GC, DD = GG). Abbreviations: LR, logistic regression; TGS, total genotype score; WF, Williams and Folland genotype score; Freq., genotype frequencies; End, endurance-oriented athletes; Mix, mixed athletes group; Pow, power-oriented athletes.
| Gene Marker | Genotype | LR TGS | Trait | WF TGS | Trait | Freq. Ensembl All | Freq. Control |
|---|---|---|---|---|---|---|---|
| ACE rs1799752 | DD | 0.4165 | End | 2 | End | 0.237 | 0.247 |
| | ID | −0.1470 | Mix | 1 | Mix | 0.466 | 0.369 |
| | II | −0.4655 | Pow | 0 | Pow | 0.297 | 0.384 |
| ACTN3 rs1815739 | RR | −0.0595 | Pow | 2 | End | 0.382 | 0.408 |
| | RX | 0.265 | End | 1 | Mix | 0.435 | 0.49 |
| | XX | −0.1205 | Pow | 0 | Pow | 0.183 | 0.102 |
| AMPD1 rs17602729 | CC | 0.255 | End | 2 | End | 0.927 | 0.722 |
| | CT | 0 | Mix | 1 | Mix | 0.07 | 0.255 |
| | TT | −0.4005 | Pow | 0 | Pow | 0.003 | 0.024 |
| MB rs7293 | AA | 0.1325 | End | 2 | End | 0.313 | 0.271 |
| | AG | 0.159 | End | 1 | Mix | 0.437 | 0.455 |
| | GG | −0.2525 | Pow | 0 | Pow | 0.251 | 0.275 |
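Because the TGS is described as additive, an athlete's score under either model can be sketched as the plain sum of the four genotype scores in the table above. Some published TGS variants additionally rescale this sum (for example to a 0 to 100 range); whether that is done here is not stated in this section, so the sketch keeps the raw sum. The two athlete profiles are hypothetical.

```python
# Genotype scores from the LR TGS and WF TGS columns of the table above.
LR_SCORES = {
    "DD": 0.4165, "ID": -0.1470, "II": -0.4655,   # ACE rs1799752
    "RR": -0.0595, "RX": 0.265, "XX": -0.1205,    # ACTN3 rs1815739
    "CC": 0.255, "CT": 0.0, "TT": -0.4005,        # AMPD1 rs17602729
    "AA": 0.1325, "AG": 0.159, "GG": -0.2525,     # MB rs7293
}
WF_SCORES = {
    "DD": 2, "ID": 1, "II": 0,
    "RR": 2, "RX": 1, "XX": 0,
    "CC": 2, "CT": 1, "TT": 0,
    "AA": 2, "AG": 1, "GG": 0,
}


def total_genotype_score(genotypes, scores):
    """Additive TGS: sum of the scores of an athlete's genotypes (one per marker)."""
    return sum(scores[g] for g in genotypes)


# Hypothetical athletes; genotype order: ACE, ACTN3, AMPD1, MB.
ATHLETE_A = ("DD", "RX", "CC", "AG")
ATHLETE_B = ("II", "RR", "CT", "GG")

for name, gts in (("A", ATHLETE_A), ("B", ATHLETE_B)):
    lr = total_genotype_score(gts, LR_SCORES)
    wf = total_genotype_score(gts, WF_SCORES)
    print(f"athlete {name}: LR TGS = {lr:+.4f}, WF TGS = {wf}")
```

Athlete B shows where the two models disagree: ACTN3 RR counts toward endurance in the WF model but toward power in the LR model, so B sits mid-range on the WF scale (3 of a possible 8) while being clearly power-leaning (negative) on the LR scale.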
Classification accuracy of logistic regression classifiers in three two-class classification tasks. Accuracy is presented as mean ± standard deviation. The confusion tables show how many athletes were classified correctly and incorrectly by the final LR classifiers trained on all data (rows: actual class; columns: predicted class).
| Classifier | Accuracy | Actual Class | Predicted: Class 0 | Predicted: Class 1 |
|---|---|---|---|---|
| Endurance (1) vs. Power (0) | | Pow | 14 | 30 |
| | | End | 6 | 75 |
| Endurance (1) vs. Mixed (0) | | Mix | 10 | 45 |
| | | End | 12 | 69 |
| Power (1) vs. Mixed (0) | | Mix | 43 | 12 |
| | | Pow | 24 | 20 |
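The confusion tables can be summarised directly. The sketch below derives overall accuracy, per-class recall, balanced accuracy, and the majority-class baseline from the counts above. These are resubstitution figures for the final classifiers trained on all data, so they are not the cross-validated mean ± standard deviation that the Accuracy column reports; the dictionary keys are hypothetical labels.

```python
# Rows = actual class (0, 1); columns = predicted class (0, 1), as in the table above.
CONFUSION = {
    "Endurance (1) vs. Power (0)": ((14, 30), (6, 75)),
    "Endurance (1) vs. Mixed (0)": ((10, 45), (12, 69)),
    "Power (1) vs. Mixed (0)": ((43, 12), (24, 20)),
}


def summarize(matrix):
    """Accuracy-type metrics from a 2 x 2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    n = tn + fp + fn + tp
    recall_0 = tn / (tn + fp)  # share of actual class 0 predicted as class 0
    recall_1 = tp / (fn + tp)  # share of actual class 1 predicted as class 1
    return {
        "n": n,
        "accuracy": (tn + tp) / n,
        "recall_class0": recall_0,
        "recall_class1": recall_1,
        "balanced_accuracy": (recall_0 + recall_1) / 2,
        "majority_baseline": max(tn + fp, fn + tp) / n,
    }


SUMMARY = {name: summarize(m) for name, m in CONFUSION.items()}
for name, s in SUMMARY.items():
    print(f"{name}: n = {s['n']}, accuracy = {s['accuracy']:.3f}, "
          f"balanced = {s['balanced_accuracy']:.3f}, baseline = {s['majority_baseline']:.3f}")
```

Because the classes are unequal in size, comparing accuracy with the majority-class baseline is informative: for endurance vs. mixed, the resubstitution accuracy (79/136, about 0.58) is slightly below the baseline of always predicting endurance (81/136, about 0.60).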
Figure 2. WF TGS value distributions, shown as violin and box-and-whisker plots, for females and males in different sport categories.
Figure 3. LR TGS value distributions, shown as violin and box-and-whisker plots, for females and males in different sport categories. Violin plots show where data points are most dense along the value axis.
Statistically significant Wilcoxon rank-sum test outcomes (p-values significant at the 0.05 or 0.1 level) for LR TGS and WF TGS value comparisons between males and females and between sport groups.
| Groups Compared (n) | LR TGS (p-value) | WF TGS (p-value) |
|---|---|---|
| Endurance (81) vs. power (44) | 0.0396 | 0.0528 |
| Endurance (81) vs. mixed (55) | 0.0861 | 0.105 |
| Power females (6) vs. males (38) | 0.042 | 0.0229 |
| Females endurance (27) vs. power (6) | 0.0152 | 0.0033 |
| Females power (6) vs. mixed (17) | 0.08 | 0.0022 |
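A minimal sketch of the Wilcoxon rank-sum (Mann-Whitney U) comparison used in the table above: pool both groups, assign average ranks to ties, and compare the rank sum of one group with its null expectation via a normal approximation with tie correction. The function name and toy values are hypothetical. With group sizes as small as 6, many implementations (for example R's `wilcox.test`) use an exact null distribution when there are no ties, so p-values from this approximation may differ from those reported.

```python
import math


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected,
    no continuity correction). Returns (U statistic for x, p-value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # every value tied: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical TGS values for two small groups (not study data).
u, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
print(f"U = {u}, p = {p:.4f}")
```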
Figure 4. Athletes ordered by LR TGS value. The graph shows the number of athletes from each sport sharing the same LR TGS value, ordered from negative TGS values, representing the power class, to positive values, representing the endurance class.
Figure 5. Athletes ordered by WF TGS value. The graph shows the number of athletes from each sport sharing the same WF TGS value, ordered from negative values, representing the power class, to positive values, representing the endurance class.
Distribution of athletes by genotype, by gender, and by sport category.
| Gene Marker | Genotype | Females: Endurance | Females: Power | Females: Mixed | Males: Endurance | Males: Power | Males: Mixed |
|---|---|---|---|---|---|---|---|
| ACE rs1799752 | DD | 10 | 2 | 4 | 15 | 7 | 8 |
| | ID | 15 | 1 | 7 | 24 | 16 | 21 |
| | II | 2 | 3 | 6 | 15 | 15 | 9 |
| ACTN3 rs1815739 | RR | 9 | 4 | 3 | 13 | 12 | 15 |
| | RX | 15 | 2 | 12 | 35 | 20 | 18 |
| | XX | 3 | 0 | 2 | 6 | 6 | 5 |
| AMPD1 rs17602729 | CC | 20 | 3 | 11 | 43 | 30 | 26 |
| | CT | 7 | 2 | 6 | 10 | 8 | 12 |
| | TT | 0 | 1 | 0 | 1 | 0 | 0 |
| MB rs7293 | AA | 8 | 0 | 5 | 8 | 9 | 5 |
| | AG | 16 | 1 | 10 | 38 | 25 | 26 |
| | GG | 3 | 5 | 2 | 8 | 4 | 7 |
Actual effect size and empirical power of the Wilcoxon test, given the available sample sizes (n1 and n2), for the LR TGS comparisons of the groups shown in Table 7.
| Groups | Sample Sizes (n1, n2) | Actual Effect Size | Actual α | Empirical Power |
|---|---|---|---|---|
| All athletes endurance vs. power | 81, 44 | 0.55 | 0.05 | 0.81 |
| All athletes endurance vs. mixed | 81, 55 | 0.379 | 0.1 | 0.68 |
| Power males vs. females | 38, 6 | 1.057 | 0.1 | 0.75 |
| Females endurance vs. power | 27, 6 | 1.54 | 0.05 | 0.88 |
| Females mixed vs. power | 17, 6 | 1.057 | 0.1 | 0.75 |
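This section does not describe how the empirical power was obtained. One common approach, sketched here purely as an assumption, is Monte Carlo simulation: draw two normal samples of sizes n1 and n2 whose means differ by the effect size in standard-deviation units (that is, treating the effect size as Cohen's d, itself an assumption), apply a rank-sum test at level α, and report the fraction of rejections. The function names, the normal data model, the number of replicates, and the seed are all hypothetical choices.

```python
import random
from statistics import NormalDist


def rank_sum_z(x, y):
    """Normal-approximation z for the Wilcoxon rank-sum statistic.
    Assumes continuous data, so ties are ignored."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    r1 = sum(rank for rank, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    n1, n2 = len(x), len(y)
    u1 = r1 - n1 * (n1 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (u1 - n1 * n2 / 2) / sd


def empirical_power(effect_size, n1, n2, alpha, reps=2000, seed=1):
    """Fraction of simulated two-sided tests that reject at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(effect_size, 1.0) for _ in range(n1)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        if abs(rank_sum_z(x, y)) > z_crit:
            rejections += 1
    return rejections / reps


# (group, n1, n2, effect size, alpha) taken from the table above.
DESIGNS = [
    ("All athletes endurance vs. power", 81, 44, 0.55, 0.05),
    ("All athletes endurance vs. mixed", 81, 55, 0.379, 0.1),
    ("Females endurance vs. power", 27, 6, 1.54, 0.05),
]
RESULTS = {name: empirical_power(d, n1, n2, a) for name, n1, n2, d, a in DESIGNS}
for name, power in RESULTS.items():
    print(f"{name}: simulated power = {power:.2f}")
```

The simulated values can be set beside the Empirical Power column; close agreement would be consistent with a similar simulation design, but this section does not confirm how the reported values were produced.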