Ioannis Kavakiotis, Aliki Xochelli, Andreas Agathangelidis, Grigorios Tsoumakas, Nicos Maglaveras, Kostas Stamatopoulos, Anastasia Hadzidimitriou, Ioannis Vlahavas, Ioanna Chouvarda.
Abstract
BACKGROUND: Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostic markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high-quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher-level integration procedure inspired by social choice theory. The procedure is applied in the Towards Analysis, our attempt to investigate, through SHM, the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families.
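The abstract says only that the higher-level integration step is inspired by social choice theory, and the keywords mention list aggregation; it does not name a specific rule. As an illustration under that assumption, the Borda count is a classic social-choice method for merging several ranked lists into one consensus list. The sketch below applies it to hypothetical rankings with placeholder gene names; it is not presented as the authors' procedure.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several ranked lists into one consensus ranking with the Borda count.

    In a list of length n, the item at position p (0-based) earns n - p points;
    items absent from a list earn nothing from it. Ties break alphabetically.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical per-sequence rankings of candidate genes (placeholder names).
example = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA", "geneD"],
    ["geneB", "geneC", "geneA"],
]
print(borda_aggregate(example))  # ['geneB', 'geneA', 'geneC', 'geneD']
```

The Borda count is used here only because it is simple and purely order-based; another rank-aggregation rule could take its place without changing the surrounding workflow.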
Keywords: CLL; Chronic lymphocytic leukaemia; Data integration; Feature extraction; List aggregation; Mutation patterns; Somatic hypermutation; SHM
Year: 2016 PMID: 27295298 PMCID: PMC4905615 DOI: 10.1186/s12859-016-1044-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Reference dataset. The reference dataset is organized hierarchically into alleles, genes, subgroups and clans. The figure presents the sub-tree specific to the IGHV4-34*01 and IGHV4-34*02 alleles.
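The alleles-genes-subgroups-clans hierarchy of Fig. 1 can be represented as a nested mapping. This minimal sketch assumes IMGT-style allele names (`gene*allele`, with the subgroup taken as the part of the gene name before the first hyphen) and the commonly used human IGHV clan grouping (IGHV1/5/7, IGHV2/4/6, IGHV3), with the document's Clan1, Clan2 and Clan3 labels assumed to correspond to clans I, II and III. The helper names are hypothetical.

```python
from typing import Dict, List, NamedTuple

# Assumption: standard human IGHV clan grouping, with Clan1/Clan2/Clan3
# taken to correspond to clans I/II/III.
CLAN_OF_SUBGROUP = {
    "IGHV1": "Clan1", "IGHV5": "Clan1", "IGHV7": "Clan1",
    "IGHV2": "Clan2", "IGHV4": "Clan2", "IGHV6": "Clan2",
    "IGHV3": "Clan3",
}


class Lineage(NamedTuple):
    clan: str
    subgroup: str
    gene: str
    allele: str


def lineage(allele: str) -> Lineage:
    """Split an IMGT-style allele name such as 'IGHV4-34*01' into its levels."""
    gene = allele.split("*", 1)[0]     # 'IGHV4-34'
    subgroup = gene.split("-", 1)[0]   # 'IGHV4'
    return Lineage(CLAN_OF_SUBGROUP[subgroup], subgroup, gene, allele)


def build_tree(alleles: List[str]) -> Dict[str, Dict[str, Dict[str, List[str]]]]:
    """Nest alleles as clan -> subgroup -> gene -> [alleles]."""
    tree: Dict[str, Dict[str, Dict[str, List[str]]]] = {}
    for name in alleles:
        lin = lineage(name)
        genes = tree.setdefault(lin.clan, {}).setdefault(lin.subgroup, {})
        genes.setdefault(lin.gene, []).append(lin.allele)
    return tree


print(build_tree(["IGHV4-34*01", "IGHV4-34*02"]))
# {'Clan2': {'IGHV4': {'IGHV4-34': ['IGHV4-34*01', 'IGHV4-34*02']}}}
```

Names whose subgroup is missing from `CLAN_OF_SUBGROUP` raise `KeyError`, so unmapped genes fail loudly instead of being silently dropped.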
Fig. 2 Integrated data sources
Fig. 3 Data preprocessing step. This step leads from the raw data to the data selected for feature extraction.
Fig. 4 Steps of the Towards Analysis. This process leads from the dataset generated by the feature extraction process to the final aggregated results.
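The captions describe the Towards Analysis only at the level of its steps. One plausible way to score how far a mutated sequence has moved towards another germline, sketched here purely as an assumption, is to count the somatic mutations whose new base matches that germline at the same aligned position. The function name, the gapless-alignment requirement and the toy sequences are all hypothetical; the paper's actual scoring may differ.

```python
from typing import Dict


def toward_counts(sequence: str, germline: str, candidates: Dict[str, str]) -> Dict[str, int]:
    """Count, per candidate germline, the somatic mutations that reproduce its base.

    Assumes all sequences are already aligned to the same length (no gaps).
    A position is a mutation when `sequence` differs from its own `germline`;
    it counts towards a candidate when the mutated base equals the candidate's base.
    """
    for seq in (germline, *candidates.values()):
        if len(seq) != len(sequence):
            raise ValueError("sequences must be pre-aligned to equal length")
    mutated = [i for i, (s, g) in enumerate(zip(sequence, germline)) if s != g]
    return {
        name: sum(1 for i in mutated if cand[i] == sequence[i])
        for name, cand in candidates.items()
    }


# Toy example with hypothetical gene names and sequences.
counts = toward_counts(
    "ACTTACGA",  # mutated sequence
    "ACGTACGT",  # its assigned germline (mutations at positions 2 and 7)
    {"geneA": "ACTTACGT", "geneB": "ACTTACGA", "geneC": "ACGTACGT"},
)
print(counts)  # {'geneA': 1, 'geneB': 2, 'geneC': 0}
print(sorted(counts, key=counts.get, reverse=True))  # ['geneB', 'geneA', 'geneC']
```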
Expected movement per clan (Whole gene area)
| Clan | CLL#4 | CLL#11 | CLL#16 | CLL#29 | CLL#201 |
|---|---|---|---|---|---|
| Clan1 | 40.64 | 45.42 | 46.52 | 44.82 | 29.47 |
| Clan3 | 20.08 | 20.35 | 11.55 | 20.94 | 21.18 |
| Clan2 | 39.27 | 34.23 | 41.93 | 34.24 | 49.36 |
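The three clan values in each subset column add up to 100 within rounding (for CLL#201, 29.47 + 21.18 + 49.36 = 100.01), which suggests they are per-subset percentages; that reading is inferred from the numbers rather than stated in the source. The short script below copies the table values, checks the column totals and picks the clan with the largest value in each subset.

```python
# Values copied from the "Expected movement per clan (Whole gene area)" table.
WHOLE_GENE = {
    "CLL#4":   {"Clan1": 40.64, "Clan3": 20.08, "Clan2": 39.27},
    "CLL#11":  {"Clan1": 45.42, "Clan3": 20.35, "Clan2": 34.23},
    "CLL#16":  {"Clan1": 46.52, "Clan3": 11.55, "Clan2": 41.93},
    "CLL#29":  {"Clan1": 44.82, "Clan3": 20.94, "Clan2": 34.24},
    "CLL#201": {"Clan1": 29.47, "Clan3": 21.18, "Clan2": 49.36},
}


def column_totals(table):
    """Sum the clan values of each subset, rounded to two decimals."""
    return {subset: round(sum(row.values()), 2) for subset, row in table.items()}


def dominant_clan(table):
    """Return the clan with the largest expected movement in each subset."""
    return {subset: max(row, key=row.get) for subset, row in table.items()}


print(column_totals(WHOLE_GENE))
print(dominant_clan(WHOLE_GENE))
# {'CLL#4': 'Clan1', 'CLL#11': 'Clan1', 'CLL#16': 'Clan1', 'CLL#29': 'Clan1', 'CLL#201': 'Clan2'}
```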
Expected movement per clan (Conserved gene area)
| Clan | CLL#4 | CLL#11 | CLL#16 | CLL#29 | CLL#201 |
|---|---|---|---|---|---|
| Clan1 | 45.86 | 46.70 | 55.15 | 50.51 | 36.71 |
| Clan3 | 20.43 | 25.51 | 10.92 | 20.25 | 23.69 |
| Clan2 | 33.71 | 27.78 | 33.93 | 29.23 | 39.60 |
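To compare the two clan tables directly, the script below subtracts each whole-gene value from the corresponding conserved-area value. It uses only the numbers printed in the two tables; with them, every subset shows a higher Clan1 value and a lower Clan2 value in the conserved area, while the sign of the Clan3 change varies between subsets.

```python
SUBSETS = ["CLL#4", "CLL#11", "CLL#16", "CLL#29", "CLL#201"]

# Rows copied from the two "Expected movement per clan" tables, in SUBSETS order.
WHOLE = {
    "Clan1": [40.64, 45.42, 46.52, 44.82, 29.47],
    "Clan3": [20.08, 20.35, 11.55, 20.94, 21.18],
    "Clan2": [39.27, 34.23, 41.93, 34.24, 49.36],
}
CONSERVED = {
    "Clan1": [45.86, 46.70, 55.15, 50.51, 36.71],
    "Clan3": [20.43, 25.51, 10.92, 20.25, 23.69],
    "Clan2": [33.71, 27.78, 33.93, 29.23, 39.60],
}


def shift(whole, conserved):
    """Conserved-area value minus whole-gene value, per clan and subset."""
    return {
        clan: dict(zip(SUBSETS, (round(c - w, 2) for w, c in zip(whole[clan], conserved[clan]))))
        for clan in whole
    }


for clan, deltas in shift(WHOLE, CONSERVED).items():
    print(clan, deltas)
```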
Fig. 5 Expected movement per clan (whole gene area)
Fig. 6 Expected movement per clan (conserved gene area)
Fig. 7 Clustering of subsets based on the distances of movement per clan (whole gene area)
Fig. 8 Clustering of subsets based on the distances of movement per clan (conserved gene area)
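The captions of Figs. 7 and 8 do not state the distance measure or linkage used. A minimal sketch, assuming Euclidean distance between the three clan values of each subset and average linkage (UPGMA), applied to the whole-gene-area table, is shown below in plain Python. Under these assumptions the merge order is CLL#11 with CLL#29, then CLL#4, then CLL#16, then CLL#201; this need not match Fig. 7 if the authors used a different metric or linkage.

```python
import math
from itertools import combinations

# (Clan1, Clan3, Clan2) values copied from the whole-gene-area table.
POINTS = {
    "CLL#4":   (40.64, 20.08, 39.27),
    "CLL#11":  (45.42, 20.35, 34.23),
    "CLL#16":  (46.52, 11.55, 41.93),
    "CLL#29":  (44.82, 20.94, 34.24),
    "CLL#201": (29.47, 21.18, 49.36),
}


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(points):
    """Agglomerative clustering with average linkage (UPGMA).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of subset labels.
    """
    def cluster_distance(c1, c2):
        total = sum(euclidean(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([label]) for label in points]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


for a, b, d in average_linkage(POINTS):
    print(sorted(a), "+", sorted(b), f"at {d:.2f}")
```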
Fig. 9 First ten toward germlines (TowGs) for every subset (whole gene area). The figure shows the set of the first ten TowGs of every subset. Note that it does not represent a ranked list, but rather the union of the highly ranked genes across subsets (with potentially different rankings per subset). A red cell indicates that the gene in that row is among the top ten of the subset in that column; hence, every column has exactly ten red cells.
Fig. 10 First ten toward germlines (TowGs) for every subset (conserved gene area). The figure shows the set of the first ten TowGs of every subset. Note that it does not represent a ranked list, but rather the union of the highly ranked genes across subsets (with potentially different rankings per subset). A red cell indicates that the gene in that row is among the top ten of the subset in that column; hence, every column has exactly ten red cells. Blue cells denote differences from the whole gene analysis, i.e., genes that are not in the top ten in the conserved analysis.
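The heat maps in Figs. 9 and 10 can be derived from per-subset ranked lists: take each subset's top k genes, form their union as rows, and mark a cell when the row gene is in that column's top k, so each column has exactly k marked cells. The function name and toy lists below are hypothetical, and k = 3 is used only to keep the example short (the figures use k = 10).

```python
def top_k_presence(ranked_by_subset, k=10):
    """Build the gene-by-subset presence matrix behind a 'top-k union' heat map.

    Rows are the union of every subset's top-k genes (sorted by name, not by
    rank); a cell is True when the row's gene is in that column's top k.
    """
    top = {subset: set(genes[:k]) for subset, genes in ranked_by_subset.items()}
    rows = sorted(set().union(*top.values()))
    matrix = {gene: {subset: gene in top[subset] for subset in top} for gene in rows}
    return rows, matrix


# Hypothetical ranked lists; k=3 keeps the toy example small.
ranked = {"S1": ["g1", "g2", "g3", "g4"], "S2": ["g2", "g5", "g1", "g3"]}
rows, matrix = top_k_presence(ranked, k=3)
for gene in rows:
    print(gene, ["X" if matrix[gene][s] else "." for s in ranked])
```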
Expected movement per functionality (Whole gene area)
| Functionality | CLL#4 | CLL#11 | CLL#16 | CLL#29 | CLL#201 |
|---|---|---|---|---|---|
| ORF (open reading frame) | 36.33 | 31.66 | 32.91 | 35.37 | 35.18 |
| P (pseudogene) | 37.28 | 37.76 | 39.75 | 34.39 | 38.29 |
| F (functional) | 26.39 | 30.57 | 27.33 | 30.24 | 26.53 |
Fig. 11 Expected movement per functionality (whole gene area)