Leila Maria Ferreira, Thelma Sáfadi, Juliano Lino Ferreira.
Abstract
We propose to evaluate genome similarity by combining the discrete non-decimated wavelet transform (NDWT) with the elastic net. Wavelets represent a signal at several levels of detail: decomposing the signal exposes hidden components, and each level captures a different characteristic. The main feature of the elastic net is that it groups correlated variables, even when the number of predictors exceeds the number of observations. Applied to the clustering analysis of Mycobacterium tuberculosis genome strains, the combination of the two methods proved very effective and identified clusters at each level of decomposition.
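To make the idea of a non-decimated decomposition concrete, the following is a minimal sketch in plain Python. It assumes a Haar filter, periodic boundaries, and a divide-by-two normalization; the abstract does not state which wavelet family or normalization the authors used, so these choices and the function name `haar_ndwt` are illustrative only.

```python
def haar_ndwt(signal, levels):
    """Non-decimated Haar transform with periodic boundaries (illustrative).

    Returns (details, smooth): details[j - 1] holds the level-j detail
    coefficients and smooth holds the smooth coefficients of the last level.
    Every vector keeps the length of the input because nothing is downsampled.
    """
    n = len(signal)
    smooth = list(signal)
    details = []
    for level in range(1, levels + 1):
        # Instead of downsampling, the filter taps are spread 2**(level-1) apart.
        step = 2 ** (level - 1)
        shifted = [smooth[(k + step) % n] for k in range(n)]
        details.append([(a - b) / 2 for a, b in zip(smooth, shifted)])
        smooth = [(a + b) / 2 for a, b in zip(smooth, shifted)]
    return details, smooth


# Toy signal (made-up values, not data from the paper).
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
details, smooth = haar_ndwt(x, levels=2)

# With this normalization the signal is the sum of all levels.
rebuilt = [smooth[k] + sum(d[k] for d in details) for k in range(len(x))]
```

With this normalization, each level of detail plus the final smooth adds back up to the original signal, which is why every level can be analysed as a separate, full-length view of the same sequence.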
Year: 2018 PMID: 30508009 PMCID: PMC6415607 DOI: 10.1590/1678-4685-GMB-2018-0035
Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771
Descriptions of the Mycobacterium tuberculosis strains.
| Sequences | Descriptions of the strains |
|---|---|
| Seq1_DS | Strain isolated in Russia, belonging to the AI family (according to RFLP genotyping); it is sensitive to all common drugs used in the treatment of tuberculosis. |
| Seq2_DS | Susceptible strain representing the largest portion of tuberculosis isolates recovered during an epidemic in the Western Cape of South Africa. |
| Seq3_DS | Susceptible strain belonging to the Beijing family, sequenced for comparative genomic studies. |
| Seq4_DR | Resistant strain isolated in 2004, referring to a patient with secondary pulmonary tuberculosis, sequenced for comparative genomic studies. |
| Seq5_DR | Drug-resistant strain with an accelerated rate of transmission between humans in crowded conditions. |
| Seq6_MDR | Strain from a single patient in KwaZulu-Natal, South Africa. |
| Seq7_XDR | Strain from a single patient in KwaZulu-Natal, South Africa. |
| Seq8_DS | Susceptible strain used for comparative genomic studies. |
| Seq9_DS | Susceptible strain derived from the original human-lung isolate H37, isolated in 1934. It has been widely used all over the world in biomedical research. Unlike some clinical isolates, it retains full virulence in animal models of tuberculosis and is susceptible to drugs and receptive to genetic manipulation. |
| Seq10_DS | An avirulent susceptible strain derived from its virulent parent strain H37 (isolated from a 19-year-old male patient with chronic pulmonary tuberculosis, named Edward R. Baldwin in 1905). This strain was obtained through an aging and dissociation process of an |
Figure 1. Geometry of the penalties. Source: Zou and Hastie (2005).
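For readers without access to the figure, the penalties it compares can be written out. The form below is the naive elastic net criterion as given by Zou and Hastie (2005); the symbols $y$, $X$, $\beta$, $\lambda_1$, $\lambda_2$, $\alpha$ and $t$ follow that reference and do not appear in the text extracted here.

$$
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1
$$

Setting $\alpha = \lambda_2 / (\lambda_1 + \lambda_2)$, this is equivalent to minimizing $\lVert y - X\beta \rVert_2^2$ subject to

$$
(1 - \alpha)\lVert \beta \rVert_1 + \alpha \lVert \beta \rVert_2^2 \le t \quad \text{for some } t .
$$

With $\alpha = 0$ the constraint region is the lasso diamond, with $\alpha = 1$ it is the ridge disc, and for $0 < \alpha < 1$ it keeps the corners of the lasso while having strictly convex edges, which is what encourages correlated predictors to be selected together.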
Description of the Mycobacterium tuberculosis genome.
| Sequence number | NCBI accession number | Resistance type | Total GC-content | Infraspecific name |
|---|---|---|---|---|
| Seq1 | CP002992.1 | DS | 0.6560 | CTRI-2 |
| Seq2 | CP000717.1 | DS | 0.6562 | F11 |
| Seq3 | CP001641.1 | DS | 0.6561 | CCDC5079 |
| Seq4 | CP001642.1 | DR | 0.6559 | CCDC5180 |
| Seq5 | CP001664.1 | DR | 0.6563 | str. Haarlem |
| Seq6 | CP001658.1 | MDR | 0.6561 | KZN 1435 |
| Seq7 | CP001976.1 | XDR | 0.6561 | KZN 605 |
| Seq8 | CP002884.1 | DS | 0.6561 | CCDC5079 |
| Seq9 | AL123456.3 | DS | 0.6561 | H37Rv |
| Seq10 | CP000611.1 | DS | 0.6561 | H37Ra |
Figure 2. GC-content sequence signal (10,000 bp window) of MTB strains.
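A GC-content signal of this kind can be sketched as follows. This is only an illustration: it assumes consecutive, non-overlapping windows and drops any incomplete trailing window, neither of which is specified in the caption, and the function name `gc_content_windows` is hypothetical.

```python
def gc_content_windows(sequence, window=10_000):
    """Return the GC fraction of each consecutive, non-overlapping window.

    Assumption: a trailing stretch shorter than `window` is discarded.
    """
    seq = sequence.upper()
    signal = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        signal.append(gc / window)
    return signal


# Toy sequence (not a real genome) with a small window for readability.
toy = "GGCC" * 5 + "ATAT" * 5 + "GCAT" * 5
toy_signal = gc_content_windows(toy, window=20)
print(toy_signal)  # [1.0, 0.0, 0.5]
```

For a real genome the input would be the full nucleotide string of one strain, and the resulting list is the signal that a wavelet decomposition would then take as input.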
Figure 3. Elastic net for: (a) signals of the GC-content sequences, (b) s5 coefficients.
Figure 4. Elastic net for: (a) d1, (b) d2, (c) d3, (d) d4, and (e) d5 coefficients.
Formation of the groups at each level of decomposition.
| Level | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 |
|---|---|---|---|---|---|
| 1 | DS{1} | DS{2, 3} DR{4, 5} | DS{8, 9, 10} {6-MDR, 7-XDR} | | |
| 2 | DS{2} | DS{3, 10} | DR{4} | {6-MDR, 7-XDR} | DS{1, 8, 9} DR{5} |
| 3 | DS{2} | DS{3, 10} {6-MDR, 7-XDR} | DS{1, 8, 9} DR{4, 5} | | |
| 4 | {6-MDR, 7-XDR} | DS{1, 2, 3, 8, 9, 10} DR{4, 5} | | | |
| 5 | {6-MDR, 7-XDR} | DS{1, 2, 3, 8, 9, 10} DR{4, 5} | | | |