Andrew R Dalby, Ibrahim Emam, Raimo Franke.
Abstract
Microarray data from cell lines of Non-Small Cell Lung Carcinoma (NSCLC) can be used to look for differences in gene expression between cell lines derived from different tumour samples, and to investigate whether these differences can be used to cluster the cell lines into distinct groups. Dividing the cell lines into classes can help to improve diagnosis and the development of screens for new drug candidates. The microarray data were first subjected to quality control analysis and then normalised using three alternative methods, to reduce the chance that differences are artefacts of the normalisation process. The final clustering into sub-classes was carried out in a conservative manner, such that sub-classes were consistent across all three normalisation methods. Any structure in the cell line population was expected to agree with histological classifications, but this was not found to be the case. To check the biological consistency of the sub-classes, the set of most strongly differentially expressed genes was identified for each pair of clusters, to see whether the genes that most strongly define the sub-classes have biological functions consistent with NSCLC.
Year: 2012 PMID: 23209689 PMCID: PMC3507731 DOI: 10.1371/journal.pone.0050253
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Values for the Filtering of the Microarray Probeset Level Data.
Table 1A
| Target number of probes | Normalisation Method | Ratio (r) | Difference (d) | Actual Number of Probes |
| 300 | rma | 6.2 | 64 | 290 |
| 300 | gcrma | 27 | 64 | 282 |
| 300 | farms | 2.5 | 64 | 282 |
| 1000 | rma | 4.2 | 64 | 932 |
| 1000 | gcrma | 20 | 64 | 921 |
| 1000 | farms | 2.0 | 32 | 957 |
Table 1B
| Target number of probes | Normalisation Method | | | | Actual Number of Probes |
| 300 | rma | 25% above 9 | 1.08 | >9.3 | 308 |
| 300 | gcrma | 25% above 9 | 1.4 | >9.3 | 326 |
| 300 | farms | 25% above 9 | 0.8 | >9.3 | 328 |
| 1000 | rma | 25% above 9 | 0.65 | >9.3 | 1025 |
| 1000 | gcrma | 25% above 9 | 0.85 | >9.3 | 1064 |
| 1000 | farms | 25% above 9 | 0.5 | >9.3 | 914 |
Table 1A gives the parameters for the Golub filtering and Table 1B gives the parameters for the median based fitting. Here r is the ratio between the highest and lowest level of expression for a particular probe across all the arrays, and d is the difference between the maximum and minimum expression levels. IQR is the interquartile range, and the lower threshold must be passed by at least 25% of the arrays.
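The two filters described in this caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the comparisons are assumed to be strict, the Golub filter is assumed to act on strictly positive unlogged intensities, and the second filter assumes that the numeric column after "25% above 9" in Table 1B is the IQR cut-off on the log2 scale. The ">9.3" column is not modelled because its meaning is not stated here.

```python
import numpy as np


def golub_filter(expr, r, d):
    """Return a boolean mask of probes passing a Golub-style filter.

    expr: 2-D array, probes x arrays, strictly positive intensities
          (assumed to be on the unlogged scale).
    A probe passes when max/min > r and max - min > d across the arrays.
    """
    hi = expr.max(axis=1)
    lo = expr.min(axis=1)
    return (hi / lo > r) & (hi - lo > d)


def iqr_filter(log_expr, lower, frac, min_iqr):
    """Return a boolean mask of probes passing an IQR-style filter.

    log_expr: 2-D array, probes x arrays, log2 expression values.
    A probe passes when at least `frac` of the arrays exceed `lower`
    and the interquartile range across the arrays exceeds `min_iqr`.
    """
    enough_expressed = (log_expr > lower).mean(axis=1) >= frac
    q75, q25 = np.percentile(log_expr, [75, 25], axis=1)
    return enough_expressed & (q75 - q25 > min_iqr)
```

For example, `golub_filter(expr, r=6.2, d=64)` mirrors the rma row for a target of 300 probes in Table 1A, and `iqr_filter(log_expr, lower=9, frac=0.25, min_iqr=1.08)` mirrors the corresponding row of Table 1B under the assumptions above.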
Figure 1. Boxplots of the log2 transformed data normalised using A) rma, B) gcrma, and C) farms.
Figure 2. Cluster dendrogram from hierarchical agglomerative clustering of the gcrma normalised data, filtered with interquartile range filtering to give 282 probesets.
Figure 3. Heatmap for the differentially expressed genes between clusters 1 and 2 for the rma normalised data filtered using the IQR method to give 1025 probes.
Figure 4. A flowchart summarising the quality control and normalisation steps of the data analysis.
The pink boxes indicate when decisions are made to exclude arrays from the analysis because of quality control issues.
Cluster Assignments and the Consensus From Analysis of the Dendrograms Produced by Agglomerative Hierarchical Clustering of the Normalised and Filtered Datasets.
| Filtering Method | Golub | Interquartile Range | |||||||||||
| Normalisation | Farms | RMA | GCRMA | Farms | RMA | GCRMA | |||||||
| No. of Probesets | L | S | L | S | L | S | L | S | L | S | L | S | |
| Array | Consensus | ||||||||||||
| GSM372745 | 0 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | |||||
| GSM372746 | 0 | 2 | 2 | 1 | 2 | 1 | 1 | ||||||
| GSM372747 | 0 | 2 | 1 | 3 | 3 | 3 | 3 | ||||||
| GSM372748 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| GSM372749 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| GSM372750 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | ||||
| GSM372751 | 0 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | |||||
| GSM372752 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | |
| GSM372753 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 3 | ||||
| GSM372754 | 1 | 1 | 3 | 1 | 3 | 1 | 3 | 3 | 3 | ||||
| GSM372755 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |||
| GSM372756 | 3 | 1 | 3 | 3 | 3 | 3 | 1 | 3 | 3 | ||||
| GSM372757 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | |
| GSM372758 | 0 | 2 | |||||||||||
| GSM372759 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |||
| GSM372760 | 0 | 1 | 3 | 1 | 1 | 1 | 3 | 3 | 3 | 3 | 3 | ||
| GSM372761 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
| GSM372762 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 |
| GSM372763 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| GSM372764 | 2 | 2 | 2 | 2 | 2 | 2 | |||||||
| GSM372765 | 3 | 1 | 1 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | |||
| GSM372766 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | ||
| GSM372767 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 1 |
| GSM372768 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | ||||
| GSM372769 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 3 | 3 | 3 |
| GSM372771 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | ||
| GSM372772 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | ||
| GSM372773 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
| GSM372774 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |||||
| GSM372775 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| GSM372777 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
| GSM372778 | 0 | 2 | 2 | 1 | 3 | 1 | 1 | 1 | 1 | ||||
| GSM372779 | 0 | 2 | 1 | 3 | 3 | 3 | 3 | ||||||
| GSM372780 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | |||||
| GSM372781 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | ||||
| GSM372782 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ||
| GSM372783 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | |
| GSM372784 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | ||||||
| GSM372785 | 0 | 1 | 2 | ||||||||||
| GSM372786 | 1 | 1 | 1 | 1 | 1 | 1 | |||||||
| GSM372787 | 0 | 2 | 1 | ||||||||||
| GSM372788 | 0 | 2 | 2 | 2 | 2 | 2 | |||||||
| GSM372789 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | |
| GSM372790 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | ||
| GSM372791 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 |
| GSM372792 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |||||
| GSM372793 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | ||
| GSM372795 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | ||||||
| GSM372796 | 1 | 1 | 1 | 1 | 2 | 2 | |||||||
| GSM372798 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | ||||||
The two filtering methods are the Golub method and the interquartile range method [21]. The normalisation methods are farms, rma or gcrma. The probeset sizes are approximately 1000 (L) or approximately 300 (S). Arrays are assigned to clusters 1, 2 or 3; a consensus of 0 means there is no consensus, and gaps indicate that no cluster was assigned from that dendrogram.
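Each of the twelve data columns in this table comes from cutting one dendrogram into clusters. A minimal sketch of that single step, using SciPy, might look as follows. The function name is hypothetical, and the average linkage, Euclidean distance and fixed cut into three clusters are assumptions, since the linkage and distance actually used are not stated here. Cluster numbers produced this way are arbitrary labels, so they need not match between dendrograms.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_arrays(expr, n_clusters=3, method="average"):
    """Cut an agglomerative clustering of the arrays into flat clusters.

    expr: 2-D array, probes x arrays, normalised and filtered expression.
    Returns one integer label (1..n_clusters) per array.
    """
    # The arrays are the objects being clustered, so they must be rows.
    tree = linkage(expr.T, method=method, metric="euclidean")
    # "maxclust" cuts the tree so that at most n_clusters groups remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Running this once per combination of filtering method, normalisation method and probeset size would give one label per array per dendrogram, which could then be compared across columns to look for a consensus.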
The three identified clusters and their annotations.
| Array | Cell Line | Type |
| Cluster 1 |
| GSM372754 | H1648 | Adenocarcinoma IIIA |
| GSM372755 | H1650 | Adenocarcinoma IIIB |
| GSM372761 | H1975 | Adenocarcinoma |
| GSM372763 | H2009 | Adenocarcinoma IV |
| GSM372769 | H2347 | Adenocarcinoma I |
| GSM372773 | H3122 | Adenocarcinoma IV |
| GSM372775 | H3255 | Adenocarcinoma IIIB |
| GSM372777 | H441 | Papillary Adenocarcinoma |
| GSM372780 | H820 | Papillary Adenocarcinoma |
| GSM372781 | HCC1171 | Adenocarcinoma I |
| GSM372782 | HCC1195 | Adenosquamous carcinoma I |
| GSM372786 | HCC193 | Adenocarcinoma |
| GSM372789 | HCC2450 | Adenosquamous carcinoma |
| GSM372790 | HCC2935 | Adenocarcinoma |
| GSM372792 | HCC4006 | Adenocarcinoma |
| GSM372796 | HCC78 | Adenocarcinoma |
| Cluster 2 |
| GSM372748 | Calu6 | NSCLC |
| GSM372749 | H1299 | Large cell carcinoma |
| GSM372753 | H157 | Squamous cell carcinoma |
| GSM372759 | H1792 | Adenocarcinoma IV |
| GSM372764 | H2052 | Mesothelioma IV |
| GSM372768 | H23 | Adenocarcinoma |
| GSM372771 | H2882 | Squamous cell carcinoma IV |
| GSM372783 | HCC1359 | Spindle-giant cell carcinoma |
| GSM372784 | HCC15 | Squamous cell carcinoma |
| GSM372791 | HCC366 | Adenosquamous carcinoma |
| GSM372793 | HCC44 | Adenocarcinoma |
| Cluster 3 |
| GSM372752 | H1437 | Adenocarcinoma I |
| GSM372756 | H1666 | Adenocarcinoma III |
| GSM372762 | H1993 | Adenocarcinoma IIIA |
| GSM372765 | H2087 | Adenocarcinoma I |
| GSM372766 | H2122 | Adenocarcinoma IV |
| GSM372767 | H2126 | Adenocarcinoma |
| GSM372795 | HCC515 | Adenocarcinoma |
Differentially Expressed Genes Between the Different Clusters.
| Differentially Expressed Between Cluster 1 and 2 | Function |
| SCNN1A | Sodium channel and ion regulation – signal transduction |
| SCEL | Sciellin – metal binding protein, epidermis development |
| KRT19 | Keratin 19 – cytoskeletal protein |
| RAB25 | Member of the RAS oncogene family |
| MAGE Family | Melanoma Antigen Family |
|
| TFF1 | Trefoil Factor One |
| CPE | Carboxypeptidase E |
| FGG | Fibrinogen Gamma Chain – coagulation |
| CPS1 | Carbamoyl-phosphate Synthase I – amino acid metabolism |
|
| TFF1 | Trefoil Factor One |
| FGG | Fibrinogen Gamma Chain – coagulation |
| AQP3 | Aquaporin 3 – water reabsorption |
| CPE | Carboxypeptidase E |
| FGB | Fibrinogen Beta Chain – coagulation |
| CPS1 | Carbamoyl-phosphate Synthase I – amino acid metabolism |
These genes were found to be differentially expressed under most of the normalisation methods, irrespective of the level of filtering.
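One simple way to rank probes by how strongly they separate a pair of clusters is a per-probe Welch t-statistic. The sketch below is illustrative only: the statistic used to select the genes in this table is not stated here, so the choice of test, the function name and the `top` parameter are all assumptions. It also assumes at least two arrays per cluster and non-constant expression within each cluster.

```python
import numpy as np


def rank_probes(expr, labels, a, b, top=5):
    """Rank probes by absolute Welch t-statistic between clusters a and b.

    expr:   2-D array, probes x arrays, log2 expression values.
    labels: one cluster label per array (same order as the columns).
    Returns (row indices of the top probes, their t-statistics).
    """
    labels = np.asarray(labels)
    xa = expr[:, labels == a]
    xb = expr[:, labels == b]
    diff = xa.mean(axis=1) - xb.mean(axis=1)
    # Welch standard error: the two cluster variances are not pooled.
    se = np.sqrt(xa.var(axis=1, ddof=1) / xa.shape[1]
                 + xb.var(axis=1, ddof=1) / xb.shape[1])
    t = diff / se
    order = np.argsort(-np.abs(t))[:top]
    return order, t[order]
```

Repeating the ranking for each normalisation method and filter level, and keeping only probes that recur near the top, would be one way to arrive at a list that is robust in the sense described above.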