Shenmin Guan1,2, Yingying Zhao1, Xiao Zhuo2, Wenhui Song2, Xiaorui Geng3, Huanming Yang4,5, Jian Wang4,5, Xinhua Wu6, Jinlong Yang7,8, Xin Song9, Le Cheng10,11,12,13.
Abstract
Regional gender differences in autosomal chromosome disorders have been observed repeatedly. However, the corresponding diversity changes remain unconfirmed. By analyzing previously published thalassemia data from the Dai people in Dehong and Xishuangbanna (two regions in Yunnan Province, China), we found that several sequence types, including HBA CNV and HBB mutations, depend significantly on gender in Xishuangbanna but not in Dehong. Together with supporting evidence from previous research, this leads us to conclude that certain mutations depend on gender regionally. This association is peculiar: it occurs within a single people on a small geographical scale, whereas other recorded gender differences in thalassemia vary by ethnicity and continent.
Year: 2019 PMID: 30940865 PMCID: PMC6445288 DOI: 10.1038/s41598-019-41905-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Diversity comparison. In panels A-E, the frequency of each sequence type is presented. “*” indicates that the difference is significant. Panel F shows that all the genotypes differed significantly between the two regions. The paired code names and type annotations listed in the legend for each genotype are used in the rest of this article, including in the description of genotypes and the identification of genotypes for association rules. (A) HBA1: hemoglobin, alpha 1. A100 = normal, A101 = c.95 + 1G > A, A102 = c.98T > A. (B) HBA2: hemoglobin, alpha 2. A200 = normal, A201 = c.427T > C (het), A202 = c.369C > G (het), A203 = c.369C > G (hom), A204 = c.1delA (hom), A205 = c.427T > C (hom), A206 = c.377T > C (hom). (C) HBA_CNV: hemoglobin, alpha, copy number variation. AC00 = normal, AC01 = αα/−α3.7, AC02 = αα/−α3.7 (α1 and α2 fusion after deletion in another chain), AC03 = αα/–SEA, AC04 = αα/−α4.2, AC05 = −α3.7/–SEA, AC06 = −α3.7/−α3.7, AC07 = αα/ααα. (D) HBB: hemoglobin, beta. B00 = normal, B01 = c.79G > A (het), B02 = c.217dupA (het), B03 = c.126_129delCTTT (het), B04 = c.52A > T (het), B05 = c.316-238C > T (het), B06 = c.79G > A (hom), B07 = c.380T > G (het), B08 = c.410G > A (het), B09 = c.126delC (het), B10 = c.92 + 1G > T (het). (E) HBB_CNV: hemoglobin, beta, copy number variation. BC00 = normal, BC01 = 1-HBB. Here, “het” and “hom” denote heterozygote and homozygote. The details of the code names can be seen in S2_Table.xls. (F) Each genotype is a combination of sequence types, and every combination is described with the involved sequence types. Genotype 1 = A100, A203, AC03, B00, BC00; Genotype 2 = A100, A202, AC00, B00, BC00; Genotype 3 = A100, A200, AC04, B00, BC00; Genotype 4 = A100, A200, AC03, B00, BC00; Genotype 5 = A100, A200, AC01, B01, BC00; Genotype 6 = A100, A200, AC01, B00, BC00; Genotype 7 = A100, A200, AC00, B04, BC00; Genotype 8 = A100, A200, AC00, B03, BC00; Genotype 9 = A100, A200, AC00, B01, BC00.
Figure 2. Scatter diagram and classification diagram for association rules. The support, confidence and lift shown in the scatter diagrams were computed by the apriori algorithm and are the most important properties of an association rule. A high lift indicates that the identified rule is unlikely to be due to chance. In the classification diagrams, the association rules were grouped by the left-hand side (LHS) of the implication. Rules that share more property-value pairs in the LHS are considered more similar.
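For an association rule LHS => RHS, the standard measures are support (the share of samples matching both sides), confidence (the share of LHS samples that also match the RHS) and lift (confidence divided by the overall RHS frequency). The following is a minimal sketch of how these quantities could be computed for a single rule. The records are invented toy data, not the study's data, and the helper names `matches` and `rule_metrics` are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def matches(record: Record, items: Dict[str, str]) -> bool:
    """True if the record carries every property-value pair in `items`."""
    return all(record.get(key) == value for key, value in items.items())


def rule_metrics(records: List[Record], lhs: Dict[str, str], rhs: Dict[str, str]) -> Dict[str, float]:
    """Support, confidence and lift of the rule lhs => rhs."""
    n = len(records)
    n_lhs = sum(matches(r, lhs) for r in records)
    n_rhs = sum(matches(r, rhs) for r in records)
    n_both = sum(matches(r, lhs) and matches(r, rhs) for r in records)

    support = n_both / n                                # P(LHS and RHS)
    confidence = n_both / n_lhs if n_lhs else 0.0       # P(RHS | LHS)
    lift = confidence / (n_rhs / n) if n_rhs else 0.0   # P(RHS | LHS) / P(RHS)
    return {"support": support, "confidence": confidence, "lift": lift}


# Invented toy records (assumption): 3 males, 5 females.
records = [
    {"Sex": "Male", "HBA_CNV": "AC03"},
    {"Sex": "Male", "HBA_CNV": "AC03"},
    {"Sex": "Male", "HBA_CNV": "AC00"},
    {"Sex": "Female", "HBA_CNV": "AC00"},
    {"Sex": "Female", "HBA_CNV": "AC00"},
    {"Sex": "Female", "HBA_CNV": "AC03"},
    {"Sex": "Female", "HBA_CNV": "AC00"},
    {"Sex": "Female", "HBA_CNV": "AC00"},
]

metrics = rule_metrics(records, {"Sex": "Male"}, {"HBA_CNV": "AC03"})
print(metrics)
```

A lift above 1 means the RHS is more frequent among samples matching the LHS than in the whole sample; whether that excess is statistically meaningful is a separate question that the lift alone does not answer.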
Figure 3. Probability comparison for significant association rules. Each pair of bars represents a significant rule. This figure shows the difference between the joint probability and the conditional probability corresponding to each significant association rule. The joint probability is described by the percentage of the samples corresponding to the right-hand side (RHS) of the rule, which indicates the frequency in all samples. The conditional probability is described by the percentage of the samples corresponding to both the LHS and the RHS (left- and right-hand sides of the rule), which indicates the frequency according to the rule. Rule 1 {Sex = Female, HBA_CNV = AC03, HBB = B00} => {HBA2 = A203}, Rule 2 {HBA2 = A203} => {HBA_CNV = AC03}, Rule 3 {Sex = Male, Age = Y} => {HBB = B01}, Rule 4 {Sex = Male, HBA2 = A200, HBA_CNV = AC00} => {HBB = B04}, Rule 5 {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03}, Rule 6 {Sex = Male, Age = O, HBB = B00} => {HBA_CNV = AC03}, Rule 7 {Sex = Male, Age = O, HBA2 = A200} => {HBA_CNV = AC03}, Rule 8 {Sex = Male, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03}, Rule 9 {Sex = Male, Age = O} => {HBA_CNV = AC03}, Rule 10 {Sex = Male, HBB = B00} => {HBA_CNV = AC03}, Rule 11 {HBA2 = A200, HBA_CNV = AC03} => {Sex = Male}, Rule 12 {Sex = Male} => {HBA_CNV = AC03}. The significant association rules found in Xishuangbanna were divided into three groups here.
Figure 4. Data analysis process. The process includes multiple steps: data cleaning, association mining, association rule significance testing, association regionality discussion, and association rule replacement discussion. The data were cleaned mainly by removing the items that lacked the needed information. The associations were mined with the apriori algorithm. The association rules identified by apriori were tested with Fisher’s exact test. Then, the regionality and replacement of the significant rules were discussed. When discussing regionality, each significant rule found in one region was tested in the other region. If a rule was significant in one region but not in the other, it was considered regional. When discussing whether an apriori rule could reasonably be replaced with a simpler rule, two conditional probabilities were compared: one for the apriori rule and the other for a given simpler rule that could replace it. If the two probabilities differed significantly, the replacement was considered not reasonable. Finally, a permutation test was designed for the Xishuangbanna rules that include “Sex”.
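The significance-testing and regionality steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the two-sided Fisher's exact test is written out directly from the hypergeometric distribution, the 2×2 counts for the two regions are invented, and the 0.05 cutoff and the names `fisher_exact_two_sided` and `is_regional` are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: LHS present / LHS absent. Columns: RHS present / RHS absent.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def is_regional(p_here: float, p_there: float, alpha: float = 0.05) -> bool:
    """A rule counts as regional if it is significant in exactly one region.

    The cutoff `alpha` is an assumption made for this sketch.
    """
    return (p_here < alpha) != (p_there < alpha)


# Invented counts (assumption) for one rule tested in two regions.
p_region_a = fisher_exact_two_sided(12, 3, 4, 11)
p_region_b = fisher_exact_two_sided(7, 8, 8, 7)
print(p_region_a, p_region_b, is_regional(p_region_a, p_region_b))
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used; the explicit version is given here only to keep the example dependency-free.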
Rule comparison.
| No | Rule | Simpler Rule | P Value |
|---|---|---|---|
| 1 | {Sex = Female, HBA_CNV = AC03, HBB = B00} => {HBA2 = A203} | {Sex = Female} => {HBA2 = A203} | 0.0252 |
| 2 | {Sex = Female, HBA_CNV = AC03, HBB = B00} => {HBA2 = A203} | {HBB = B00} => {HBA2 = A203} | 0.026 |
| 3 | {Sex = Female, HBA_CNV = AC03, HBB = B00} => {HBA2 = A203} | {Sex = Female, HBB = B00} => {HBA2 = A203} | 0.0346 |
| 4 | {Sex = Male, Age = Y} => {HBB = B01} | {Age = Y} => {HBB = B01} | 0.0332 |
| 5 | {Sex = Male, HBA2 = A200, HBA_CNV = AC00} => {HBB = B04} | {HBA2 = A200} => {HBB = B04} | 0.0484 |
| 6 | {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {Age = O} => {HBA_CNV = AC03} | 0.0164 |
| 7 | {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {HBA2 = A200} => {HBA_CNV = AC03} | 0.0028 |
| 8 | {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {HBB = B00} => {HBA_CNV = AC03} | 0.0045 |
| 9 | {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {Age = O, HBA2 = A200} => {HBA_CNV = AC03} | 0.0177 |
| 10 | {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {Age = O, HBB = B00} => {HBA_CNV = AC03} | 0.0251 |
| 11 | {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | 0.0044 |
| 12 | {Sex = Male, Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {Age = O, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | 0.0442 |
| 13 | {Sex = Male, Age = O, HBB = B00} => {HBA_CNV = AC03} | {Age = O} => {HBA_CNV = AC03} | 0.0236 |
| 14 | {Sex = Male, Age = O, HBB = B00} => {HBA_CNV = AC03} | {HBB = B00} => {HBA_CNV = AC03} | 0.0089 |
| 15 | {Sex = Male, Age = O, HBA2 = A200} => {HBA_CNV = AC03} | {Age = O} => {HBA_CNV = AC03} | 0.0291 |
| 16 | {Sex = Male, Age = O, HBA2 = A200} => {HBA_CNV = AC03} | {HBA2 = A200} => {HBA_CNV = AC03} | 0.0077 |
| 17 | {Sex = Male, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {HBA2 = A200} => {HBA_CNV = AC03} | 0.0041 |
| 18 | {Sex = Male, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {HBB = B00} => {HBA_CNV = AC03} | 0.0101 |
| 19 | {Sex = Male, HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | {HBA2 = A200, HBB = B00} => {HBA_CNV = AC03} | 0.0098 |
| 20 | {Sex = Male, HBB = B00} => {HBA_CNV = AC03} | {HBB = B00} => {HBA_CNV = AC03} | 0.0126 |
| 21 | {HBA2 = A200, HBA_CNV = AC03} => {Sex = Male} | {HBA2 = A200} => {Sex = Male} | 0.0183 |
Because “Sex” was thought to be essential in Xishuangbanna, a permutation test was applied to the significant rules containing “Sex” (see Method). The three rules of greatest interest are {Sex = Male} => {HBA_CNV = AC03}, {Sex = Male, Age = O} => {HBA_CNV = AC03}, and {Sex = Male, Age = Y} => {HBB = B01}. After shuffling “Sex”, the proportions of their Fisher test p values below 0.01 were 9e-04, 4e-04 and 1e-03, respectively; the proportions below 0.03 were 0.0011, 0.0019 and 0.0066, respectively; and the proportions below 0.05 were 0.0019, 0.0067 and 0.0129, respectively (see PermutationTestBN in S1_Table.xls). These low proportions suggest that “Sex” is indeed related to “AC03” and “B01”.
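One way to read the permutation test above is: shuffle the “Sex” labels many times, recompute the Fisher test p value for the rule after each shuffle, and report the proportion of shuffles whose p value falls below each threshold. The sketch below follows that reading for a single rule of the form {Sex = Male} => {carrier}. It is an assumption-laden illustration rather than the procedure from the Method section: the data are invented, and the number of permutations, the seed and the function names are hypothetical.

```python
import random
from math import comb
from typing import Dict, Sequence


def fisher_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    probs = [comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    p_obs = comb(row1, a) * comb(row2, col1 - a) / denom
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def permutation_proportions(is_male: Sequence[bool], is_carrier: Sequence[bool],
                            thresholds=(0.01, 0.03, 0.05),
                            n_perm: int = 2000, seed: int = 0) -> Dict[float, float]:
    """Proportion of label shuffles whose Fisher p value is below each threshold."""
    rng = random.Random(seed)
    labels = list(is_male)
    hits = {t: 0 for t in thresholds}
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real link between sex and genotype
        a = sum(1 for s, g in zip(labels, is_carrier) if s and g)
        b = sum(1 for s, g in zip(labels, is_carrier) if s and not g)
        c = sum(1 for s, g in zip(labels, is_carrier) if not s and g)
        d = len(labels) - a - b - c
        p = fisher_p(a, b, c, d)
        for t in thresholds:
            if p < t:
                hits[t] += 1
    return {t: hits[t] / n_perm for t in thresholds}


# Invented toy data (assumption): 20 males with 12 carriers, 20 females with 4 carriers.
is_male = [True] * 20 + [False] * 20
is_carrier = [True] * 12 + [False] * 8 + [True] * 4 + [False] * 16

proportions = permutation_proportions(is_male, is_carrier)
print(proportions)
```

If the observed rule is significant while shuffled labels rarely produce a small p value, the association is unlikely to be an artifact of the sample composition alone.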