Florence Lichou¹, Sébastien Orazio², Stéphanie Dulucq¹, Gabriel Etienne¹, Michel Longy¹, Christophe Hubert³, Alexis Groppi⁴, Alain Monnereau², François-Xavier Mahon¹, Béatrice Turcq⁵.
Abstract
BACKGROUND: Targeted therapies have greatly improved the prognosis of cancer patients. For instance, chronic myeloid leukemia (CML) is now effectively treated with imatinib, a tyrosine kinase inhibitor, and around 80% of patients reach complete remission. However, despite the drug's high efficacy, some patients are resistant to it. This heterogeneity in response might be associated with pharmacokinetic parameters that vary between individuals because of genetic variants. To assess this issue, next-generation sequencing (NGS) of large gene panels can be performed on patient samples. However, a common problem in pharmacogenetic studies is that sample availability is often limited. The result is large sequencing datasets obtained from small numbers of samples, to which classical statistical analyses cannot be applied to identify targets of interest. To overcome this problem, we describe original and underused statistical methods for analyzing large sequencing datasets from a restricted number of samples.
Keywords: Chronic myeloid leukemia; Factorial correspondence analysis; Hierarchical clustering on principal components; Next-generation sequencing; Pharmacogenetics; Rank products; Small sample size; Statistics
Year: 2019 PMID: 31470908 PMCID: PMC6717342 DOI: 10.1186/s40246-019-0235-1
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
List and characteristics of the 48 sequenced genes (from the GeneCards® database)
| Gene symbol | Chromosomal location | Gene name |
|---|---|---|
| Plasma proteins, membrane transporters, and regulators (n = 10) | | |
| ABCB1 | 7q21.12 | ATP-binding cassette subfamily B member 1 |
| ABCC2 | 10q24 | ATP-binding cassette subfamily C member 2 |
| ABCG2 | 4q22.1 | ATP-binding cassette subfamily G member 2 (Junior blood group) |
| HFE | 6p21.3 | Hemochromatosis |
| HIF1A | 14q23.2 | Hypoxia inducible factor 1 alpha subunit |
| ORM1 | 9q32 | Orosomucoid 1 |
| SLC22A1 | 6q25.3 | Solute carrier family 22 member 1 |
| SLC22A4 | 5q23.3 | Solute carrier family 22 member 4 |
| SLCO1A2 | 12p12 | Solute carrier organic anion transporter family member 1A2 |
| SLCO1B1 | 12p12 | Solute carrier organic anion transporter family member 1B1 |
| Metabolism enzymes and regulators (n = 12) | | |
| CYP1A1 | 15q24.1 | Cytochrome P450 family 1 subfamily A member 1 |
| CYP1A2 | 15q24.1 | Cytochrome P450 family 1 subfamily A member 2 |
| CYP2C19 | 10q24 | Cytochrome P450 family 2 subfamily C member 19 |
| CYP2C8 | 10q24.1 | Cytochrome P450 family 2 subfamily C member 8 |
| CYP2C9 | 10q24.1 | Cytochrome P450 family 2 subfamily C member 9 |
| CYP2D6 | 22q13.1 | Cytochrome P450 family 2 subfamily D member 6 |
| CYP3A4 | 7q21.1 | Cytochrome P450 family 3 subfamily A member 4 |
| CYP3A5 | 7q21.1 | Cytochrome P450 family 3 subfamily A member 5 |
| NR1I2 | 3q12-q13.3 | Nuclear receptor subfamily 1 group I member 2 |
| NR1I3 | 1q23.3 | Nuclear receptor subfamily 1 group I member 3 |
| UGT1A1 | 2q37.1 | UDP glucuronosyltransferase family 1 member A1 |
| UGT1A9 | 2q37 | UDP glucuronosyltransferase family 1 member A9 |
| Cell cycle and proliferation (n = 5) | | |
| CCND1 | 11q13 | Cyclin D1 |
| PPP2R2A | 8p21.2 | Protein phosphatase 2 regulatory subunit B alpha |
| RPA1 | 17p13.3 | Replication protein A1 |
| RPA2 | 1p35 | Replication protein A2 |
| RPA3 | 7p21.3 | Replication protein A3 |
| DNA repair (n = 10) | | |
| ERCC2 | 19q13.3 | Excision repair cross-complementation group 2 |
| ERCC3 | 2q21 | Excision repair cross-complementation group 3 |
| ERCC4 | 16p13.3 | Excision repair cross-complementation group 4 |
| ERCC5 | 13q22-q34 | Excision repair cross-complementation group 5 |
| ERCC6 | 10q11 | Excision repair cross-complementation group 6 |
| ERCC8 | 5q12.1 | Excision repair cross-complementation group 8 |
| LIG1 | 19q13.33 | DNA ligase 1 |
| RAD23B | 9p31.2 | RAD23 homolog B, nucleotide excision repair protein |
| XPA | 9p22.3 | Xeroderma pigmentosum complementation group A |
| XPC | 3p25.1 | Xeroderma pigmentosum complementation group C |
| Cytokine pathways (n = 6) | | |
| CXCL8 | 4q13-q21 | C-X-C motif chemokine ligand 8 |
| IFNG | 12q14 | Interferon gamma |
| IFNGR1 | 6q23-q24 | Interferon gamma receptor 1 |
| IFNGR2 | 21q22.1 | Interferon gamma receptor 2 (interferon gamma transducer 1) |
| SOCS1 | 16p13.13 | Suppressor of cytokine signaling 1 |
| SOCS2 | 12q | Suppressor of cytokine signaling 2 |
| Kinases and phosphatases (n = 5) | | |
| AKT1 | 14q32.33 | V-akt murine thymoma viral oncogene homolog 1 |
| ULK3 | 15q24.1 | Unc-51 like kinase 3 |
| PTPN1 | 20q12.1-q13.2 | Protein tyrosine phosphatase non-receptor type 1 |
| PTPN2 | 18p11.3-p11.2 | Protein tyrosine phosphatase non-receptor type 2 |
| PTPN22 | 1p13.2 | Protein tyrosine phosphatase non-receptor type 22 |
ANNOVAR annotations of all sequenced polymorphisms
| ANNOVAR annotation | Deletion | Insertion | SNP | Total | Percentage of the total |
|---|---|---|---|---|---|
| Upstream of the gene | 0 | 0 | 6 | 6 | 0.8 |
| Downstream of the gene | 0 | 0 | 3 | 3 | 0.4 |
| UTR5 | 1 | 2 | 53 | 56 | 7.9 |
| UTR3 | 1 | 2 | 29 | 32 | 4.5 |
| Exonic | 5 | 1 | 164 | 170 | 24.0 |
| Exonic splicing | 0 | 0 | 5 | 5 | 0.7 |
| Splicing | 0 | 0 | 1 | 1 | 0.1 |
| Intergenic | 0 | 0 | 3 | 3 | 0.4 |
| Intronic | 34 | 22 | 375 | 431 | 60.9 |
| ncRNA_exonic | 0 | 0 | 1 | 1 | 0.1 |
| Total | 41 | 27 | 640 | 708 | 100.0 |
| Percentage of the total | 5.8 | 3.8 | 90.4 | 100.0 | |
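Every percentage in this table is a count divided by the 708 sequenced polymorphisms. As a consistency check, the minimal Python sketch below recomputes the row and column percentages from the counts copied from the table; the dictionary keys are shortened labels chosen for the sketch.

```python
# Counts per ANNOVAR annotation, copied from the table: (deletion, insertion, SNP)
counts = {
    "Upstream": (0, 0, 6),
    "Downstream": (0, 0, 3),
    "UTR5": (1, 2, 53),
    "UTR3": (1, 2, 29),
    "Exonic": (5, 1, 164),
    "Exonic splicing": (0, 0, 5),
    "Splicing": (0, 0, 1),
    "Intergenic": (0, 0, 3),
    "Intronic": (34, 22, 375),
    "ncRNA_exonic": (0, 0, 1),
}


def percentages(table):
    """Return (grand total, row totals, row %, column totals, column %), % rounded to 0.1."""
    grand_total = sum(sum(row) for row in table.values())
    row_totals = {name: sum(row) for name, row in table.items()}
    row_pct = {name: round(100 * n / grand_total, 1) for name, n in row_totals.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    col_pct = [round(100 * n / grand_total, 1) for n in col_totals]
    return grand_total, row_totals, row_pct, col_totals, col_pct
```

With these counts, a single polymorphism represents 100/708 ≈ 0.14% of the total.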
ANNOVAR annotations of exonic sequenced polymorphisms
| Polymorphism type | Count (exonic and exonic splicing) | Percentage of all exonic polymorphisms | Percentage of all polymorphisms |
|---|---|---|---|
| Deletion | 5 | 2.9 | 0.7 |
| Frameshift deletion | 2 | 1.1 | 0.3 |
| Non-frameshift deletion | 2 | 1.1 | 0.3 |
| Stop-gain deletion | 1 | 0.6 | 0.1 |
| Insertion | 1 | 0.6 | 0.1 |
| Frameshift insertion | 1 | 0.6 | 0.1 |
| SNP | 169 | 96.6 | 23.9 |
| Non-synonymous SNP | 104 | 59.4 | 14.7 |
| Synonymous SNP | 64 | 36.6 | 9.0 |
| Stop-loss SNP | 1 | 0.6 | 0.1 |
| Total | 175 | 100.0 | 24.7 |
Sequenced polymorphisms with an alternative allele frequency (AltAF) in the 1000 Genomes (1000G) database, by ANNOVAR annotation
| ANNOVAR annotation | Total polymorphisms | Polymorphisms with AltAF in 1000G | Percentage with AltAF |
|---|---|---|---|
| Upstream of the gene | 6 | 3 | 50.0 |
| Downstream of the gene | 3 | 2 | 66.7 |
| UTR5 | 56 | 52 | 92.9 |
| UTR3 | 32 | 28 | 87.5 |
| Exonic | 170 | 138 | 81.2 |
| Exonic splicing | 5 | 3 | 60.0 |
| Splicing | 1 | 1 | 100.0 |
| Intergenic | 3 | 3 | 100.0 |
| Intronic | 431 | 347 | 80.5 |
| ncRNA_exonic | 1 | 1 | 100.0 |
| Total | 708 | 578 | 81.6 |
Fig. 1 Workflow to adjust the genotype matrix according to the alternative allele frequency (AltAF). First, a theoretical minor allele frequency (MAF) is assigned to polymorphisms with no AltAF. Second, genotypes of polymorphisms with an AltAF ≥ 0.5 are inverted (reference and variant alleles are swapped). Third, polymorphisms for which NGS identified no variant allele are excluded. Finally, variant allele frequencies (VAFs) are determined and reported in a contingency table. GT (genotype) ∈ {0_0, 0_1, 1_1}: 0_0, homozygous reference; 0_1, heterozygous; 1_1, homozygous variant.
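The four steps of this workflow can be sketched in Python as follows. This is a minimal illustration under assumptions, not the authors' code: the value of `THEORETICAL_MAF` is a hypothetical placeholder (the source does not give it), the `Polymorphism` class and `adjust` function are invented for the example, VAF is taken as the number of variant alleles divided by twice the number of patients in a group, and the last column of each row (the general population) is assumed to be the 1000G AltAF, itself inverted when the genotypes are.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical placeholder: the theoretical MAF value is not given in the source.
THEORETICAL_MAF = 0.01

# Inverting a genotype swaps the reference and variant alleles.
INVERT = {"0_0": "1_1", "0_1": "0_1", "1_1": "0_0"}
# Number of variant alleles carried by each genotype.
VARIANT_COUNT = {"0_0": 0, "0_1": 1, "1_1": 2}


@dataclass
class Polymorphism:
    name: str
    alt_af: Optional[float]            # AltAF from 1000G, or None if absent
    genotypes: Dict[str, List[str]]    # patient group -> one GT per patient


def adjust(p: Polymorphism) -> Optional[List[float]]:
    """Return one contingency-table row [VAF per group..., population AF], or None if excluded."""
    # Step 1: polymorphisms absent from 1000G receive the theoretical MAF.
    alt_af = THEORETICAL_MAF if p.alt_af is None else p.alt_af
    genotypes = p.genotypes
    # Step 2: when the variant allele is the major allele (AltAF >= 0.5),
    # invert genotypes so that "variant" always denotes the minor allele.
    if alt_af >= 0.5:
        genotypes = {g: [INVERT[gt] for gt in gts] for g, gts in genotypes.items()}
        alt_af = 1.0 - alt_af  # assumption: population frequency is inverted too
    # Step 3: exclude polymorphisms with no variant allele in any patient.
    if all(gt == "0_0" for gts in genotypes.values() for gt in gts):
        return None
    # Step 4: VAF = variant alleles / (2 x number of patients) in each group.
    vafs = [sum(VARIANT_COUNT[gt] for gt in gts) / (2 * len(gts)) for gts in genotypes.values()]
    return vafs + [alt_af]
```

For example, a polymorphism with AltAF = 0.75, genotypes `["1_1", "0_1"]` in the sensitive group and `["1_1", "1_1"]` in the resistant group, yields the row `[0.25, 0.0, 0.25]`: the genotypes are inverted because AltAF ≥ 0.5, and the population frequency becomes 1 − 0.75.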
Fig. 2 Overview of the two statistical strategies used to highlight the genes and polymorphisms most likely to be associated with imatinib (IM) resistance. First, either all polymorphisms or only those with variants causing protein alterations are selected. Second, two analyses can be performed. Factorial correspondence analysis (FCA) combined with hierarchical clustering on principal components (HCPC) displays the polymorphisms individually on a two-dimensional graph according to their VAFs. Simulation combined with the rank product (RP) method ranks genes according to their VAFs (mean VAF per gene, or sum of VAFs divided by gene size).
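For the rank-based strategy, polymorphism-level VAFs have to be collapsed into one value per gene, either as the mean VAF of the gene's polymorphisms or as their sum divided by the gene size. A minimal sketch of both options follows; the function name, the dictionary layout, the toy values and the unit of gene size (for example base pairs) are assumptions, and the simulation step named in the caption is not reproduced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Optional


def gene_scores(vaf: Dict[str, float], gene_of: Dict[str, str],
                gene_length: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Aggregate polymorphism-level VAFs into one score per gene.

    Without gene_length: mean VAF of the gene's polymorphisms.
    With gene_length: sum of the gene's VAFs divided by its size
    (the unit of size is left to the caller).
    """
    per_gene = defaultdict(list)
    for polymorphism, value in vaf.items():
        per_gene[gene_of[polymorphism]].append(value)
    if gene_length is None:
        return {g: mean(values) for g, values in per_gene.items()}
    return {g: sum(values) / gene_length[g] for g, values in per_gene.items()}
```

Dividing by gene size is one way to keep long genes, which simply carry more polymorphisms, from dominating the ranking; the mean achieves a similar effect without needing gene lengths.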
Fig. 3 Cluster plot of the distribution of the variant allele for each identified polymorphism (n = 684) among three groups: CML-sensitive patients, CML-resistant patients, and the general population. FCA and HCPC yielded three clusters: 1, highest variant frequency in CML-sensitive patients; 2, no difference between populations; 3, highest variant frequency in CML-resistant patients.
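Correspondence analysis (the "FCA" step) can be computed from the singular value decomposition of the standardized residuals of the contingency table. The NumPy sketch below is a generic textbook implementation given for illustration, not the study's code, and it stops before the clustering step; in R, the FactoMineR package provides `CA()` and `HCPC()` for the full workflow. With three columns (CML-sensitive, CML-resistant, general population), CA has at most min(rows, columns) − 1 = 2 non-trivial axes, which is why every polymorphism can be placed on a two-dimensional map.

```python
import numpy as np


def correspondence_analysis(table, n_dims=2):
    """Correspondence analysis of a nonnegative rows x columns table.

    Returns row principal coordinates, column principal coordinates and the
    inertia (squared singular value) of each retained axis. Every row and
    column must have a positive total.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    U, sigma, V = U[:, :n_dims], sigma[:n_dims], Vt[:n_dims].T
    rows = U * sigma / np.sqrt(r)[:, None]   # D_r^(-1/2) U Sigma
    cols = V * sigma / np.sqrt(c)[:, None]   # D_c^(-1/2) V Sigma
    return rows, cols, sigma ** 2
```

The row coordinates (one point per polymorphism) are what a method such as HCPC then clusters. When all non-trivial axes are kept, the returned inertias sum to the table's chi-square statistic divided by its grand total.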
Top genes identified by the rank product method
| Gene | RP/Rsumᵃ | FC (class 1/class 2)ᵇ | pfpᶜ | |
|---|---|---|---|---|
| | 1.000 | 0.2495 | 4.56E−05 | 1.63E−06 |
| | 2.000 | 0.4859 | 4.55E−03 | 3.25E−04 |
| | 3.000 | 0.6521 | 3.45E−02 | 3.70E−03 |
ᵃ RP/Rsum (rank product statistic): summarizes the ranks of the gene across all comparisons between samples of the two conditions; a value of 1 means that the gene was ranked first in every comparison. The lower the value, the greater the difference between the control and treated conditions.
ᵇ FC (class 1/class 2): fold change of the average “expression levels” between the two conditions.
ᶜ pfp: percentage of false predictions.
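The three quantities in the footnotes can be illustrated with a minimal pure-Python sketch, under stated assumptions rather than as the study's implementation. In every class 1 versus class 2 sample pair, genes are ranked by the difference in value (rank 1 = strongest decrease in class 1); using differences rather than ratios is a choice made here so that zero VAFs cause no division by zero. RP is the geometric mean of a gene's ranks, FC is the ratio of class means, and pfp is estimated as the expected number of genes reaching an RP at least as low under random rankings, divided by the gene's observed rank. Function names, the number of permutations, the seed and the toy data are illustrative.

```python
import math
import random
from bisect import bisect_right
from statistics import mean


def rank_products(class1, class2):
    """Rank product of each gene over all class1 x class2 sample pairs.

    class1, class2: dict gene -> per-sample values (same gene keys in both).
    In each comparison genes are ranked by class1 - class2 (ties keep dict order).
    Returns (dict gene -> RP, number of comparisons k).
    """
    genes = list(class1)
    n1, n2 = len(class1[genes[0]]), len(class2[genes[0]])
    log_sums = dict.fromkeys(genes, 0.0)
    for i in range(n1):
        for j in range(n2):
            ordered = sorted(genes, key=lambda g: class1[g][i] - class2[g][j])
            for rank, g in enumerate(ordered, start=1):
                log_sums[g] += math.log(rank)
    k = n1 * n2
    # Geometric mean of the k ranks: RP = 1 only if the gene is ranked first every time.
    return {g: math.exp(s / k) for g, s in log_sums.items()}, k


def fold_change(class1, class2):
    """Ratio of the class means (class 1 / class 2) for each gene."""
    return {g: mean(class1[g]) / mean(class2[g]) for g in class1}


def pfp(rp, k, n_perm=200, seed=0):
    """Expected count of genes with null RP <= RP_g, divided by the observed rank of g."""
    genes = list(rp)
    n = len(genes)
    rng = random.Random(seed)
    null_hits = dict.fromkeys(genes, 0)
    for _ in range(n_perm):
        log_sums = [0.0] * n
        for _ in range(k):
            ranks = list(range(1, n + 1))
            rng.shuffle(ranks)  # one random ranking per comparison
            for idx, r in enumerate(ranks):
                log_sums[idx] += math.log(r)
        null_rp = sorted(math.exp(s / k) for s in log_sums)
        for g in genes:
            null_hits[g] += bisect_right(null_rp, rp[g])
    order = sorted(genes, key=rp.__getitem__)
    return {g: min(1.0, null_hits[g] / n_perm / rank) for rank, g in enumerate(order, start=1)}
```

For instance, with toy values where one gene has the lowest class 1 value relative to class 2 in all four pairwise comparisons of a 2 versus 2 design, that gene receives RP = 1 exactly, and every other gene receives RP > 1.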
Patients’ characteristics
| Characteristics | All patients | Optimal response | Failure response |
|---|---|---|---|
| No. of patients (%) | 24 (100) | 12 (50) | 12 (50) |
| Gender | |||
| Male | 15 | 8 | 7 |
| Female | 9 | 4 | 5 |
| Median age at diagnosis (range) | 59 (19–86) | 61 (19–86) | 57 (20–77) |
| Sokal score (low/intermediate/high) | 8/7/9 | 6/2/4 | 2/5/5 |