Alfredo Benso, Paolo Cornale, Stefano Di Carlo, Gianfranco Politano, Alessandro Savino.
Abstract
Undirected gene coexpression networks, obtained from experimental expression data and coupled with efficient computational procedures, are increasingly used to identify potentially relevant biological information (e.g., biomarkers) for a particular disease. However, coexpression networks built from experimental expression data are generally large, highly connected networks with a high number of false-positive interactions (nodes and edges). To infer relevant information, the network must be properly filtered and its complexity reduced. Given the complexity and multivariate nature of the information contained in the network, this requires efficient feature selection algorithms that exploit the topological characteristics of the network to identify relevant nodes and edges. This paper proposes an efficient multivariate filtering algorithm that analyzes the topological properties of a coexpression network in order to identify potentially relevant genes for a given disease. The algorithm has been tested on three datasets for three well-known and well-studied diseases: acute myeloid leukemia, breast cancer, and diffuse large B-cell lymphoma. Results have been validated against bibliographic data automatically mined with the ProteinQuest literature mining tool.
Year: 2013 PMID: 24222912 PMCID: PMC3814072 DOI: 10.1155/2013/676328
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Example of construction of a MWNET. The expression data in this example do not come from real experiments; they only illustrate the process required to construct a MWNET starting from raw expression data.
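As a rough illustration of how an undirected coexpression network can be derived from raw expression data, the sketch below links two genes whenever the absolute Pearson correlation of their expression profiles reaches a threshold. This is a generic construction and not the MWNET procedure of Figure 1, whose details are not given here. The function names, the threshold of 0.9, and the toy expression values are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sx * sy)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is a hypothetical choice, not a value from the paper.
    """
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Toy expression profiles (invented values, one number per sample).
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [1.0, 3.0, 2.0, 3.0],  # only loosely related to the others
}

edges = coexpression_edges(toy_expression)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:+.2f}")
```

With real data the number of gene pairs grows quadratically with the number of genes, which is one reason such networks become large and densely connected and need filtering afterwards.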
List of samples for the AML dataset. All samples use cDNA 45K array technology.
| Sample number | GEO accession number | Experiment name |
|---|---|---|
| 1 | GSM6259 | AML 13 |
| 2 | GSM6266 | AML 28 |
| 3 | GSM6281 | AML 21 |
| 4 | GSM6284 | AML 112 |
| 5 | GSM6309 | AML 32 |
| 6 | GSM6317 | AML 20 |
| 7 | GSM6318 | AML 111 |
| 8 | GSM6319 | AML 18 |
| 9 | GSM6275 | AML 1 |
| 10 | GSM6285 | AML 25 |
| 11 | GSM6292 | AML 105 |
| 12 | GSM6311 | AML 24 |
| 13 | GSM6335 | AML 16 |
| 14 | GSM6337 | AML 114 |
List of samples for the BC dataset. All samples use cDNA 9K array technology.
| Sample number | GEO accession number | Experiment name |
|---|---|---|
| 1 | GSM73756 | BC-16 versus NF (svi114) |
| 2 | GSM73784 | 808A versus NF (svi060) |
| 3 | GSM73706 | 107B versus NF (svi032) |
| 4 | GSM73726 | 110B versus NF (svi033) |
| 5 | GSM73727 | 111A versus NF (svi034) |
| 6 | GSM73732 | 111B versus NF (svi035) |
| 7 | GSM73734 | 114A versus NF (svi037) |
| 8 | GSM73736 | 115B versus NF (svi038) |
| 9 | GSM73783 | 710A versus NF (svi056) |
| 10 | GSM73764 | 118B versus NF (svi041) |
| 11 | GSM73786 | 123B versus NF (svi043) |
| 12 | GSM73704 | 206A versus NF (svi045) |
| 13 | GSM73708 | 214B versus NF (svi048) |
| 14 | GSM73709 | 305A versus NF (svi049) |
| 15 | GSM73738 | 308B versus NF (svi050) |
| 16 | GSM73776 | 402B versus NF (svi052) |
| 17 | GSM73777 | 406A versus NF (svi053) |
| 18 | GSM73779 | 708B versus NF (svi054) |
| 19 | GSM73697 | 805A versus NF (svi058) |
| 20 | GSM73699 | 807A versus NF (svi059) |
List of samples for the DLCL dataset. All samples use cDNA 9K array technology.
| Sample number | GEO accession number | Experiment name |
|---|---|---|
| 1 | GSM2035 | DLCL-0047 |
| 2 | GSM2036 | DLCL-0042 |
| 3 | GSM1958 | DLCL-0040 |
| 4 | GSM1959 | DLCL-0036; OCT |
| 5 | GSM2037 | DLCL-0035 |
| 6 | GSM1994 | DLCL-0034 |
| 7 | GSM2038 | DLCL-0033 |
| 8 | GSM1995 | DLCL-0032 |
| 9 | GSM1996 | DLCL-0031 |
| 10 | GSM1997 | DLCL-0030 |
| 11 | GSM1998 | DLCL-0029 |
| 12 | GSM1960 | DLCL-0028 |
| 13 | GSM1999 | DLCL-0027 |
| 14 | GSM2039 | DLCL-0026 |
| 15 | GSM2040 | DLCL-0025 |
| 16 | GSM2000 | DLCL-0024 |
| 17 | GSM2001 | DLCL-0023 |
| 18 | GSM2041 | DLCL-0021 |
| 19 | GSM2043 | DLCL-0019 |
| 20 | GSM2044 | DLCL-0018 |
| 21 | GSM2045 | DLCL-0016 |
| 22 | GSM2047 | DLCL-0014 |
| 23 | GSM2048 | DLCL-0013 |
| 24 | GSM2049 | DLCL-0012 |
| 25 | GSM2050 | DLCL-0011 |
| 26 | GSM2051 | DLCL-0010 |
| 27 | GSM2052 | DLCL-0009 |
| 28 | GSM2053 | DLCL-0008 |
| 29 | GSM2055 | DLCL-0006 |
| 30 | GSM2056 | DLCL-0005 |
| 31 | GSM2058 | DLCL-0003 |
| 32 | GSM2059 | DLCL-0002 |
| 33 | GSM2060 | DLCL-0001 |
| 34 | GSM1965 | DLCL-0052 |
| 35 | GSM1967 | DLCL-0041 |
| 36 | GSM1968 | DLCL-0039 |
| 37 | GSM1969 | DLCL-0037 |
| 38 | GSM2072 | DLCL-0034 |
| 39 | GSM1972 | DLCL-0033 |
| 40 | GSM2073 | DLCL-0032 |
| 41 | GSM2074 | DLCL-0031 |
| 42 | GSM2016 | DLCL-0028 |
| 43 | GSM2077 | DLCL-0027 |
| 44 | GSM1974 | DLCL-0025 |
| 45 | GSM2078 | DLCL-0024 |
| 46 | GSM2079 | DLCL-0023 |
| 47 | GSM1976 | DLCL-0015 |
| 48 | GSM1977 | DLCL-0011 |
| 49 | GSM1978 | DLCL-0010 |
| 50 | GSM1979 | DLCL-0009 |
| 51 | GSM1982 | DLCL-0002 |
Aggregated filtering results for the three considered datasets.
| Dataset | Original number of genes | Filtered number of genes | Reduction ratio |
|---|---|---|---|
| AML | 39,028 | 505 | 98.70% |
| BC | 7,531 | 662 | 91.20% |
| DLCL | 6,826 | 115 | 98.31% |
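The reduction ratios in the table can be checked with a few lines of arithmetic. The sketch below assumes that the reduction ratio is defined as the share of genes removed by the filter, that is, 1 minus the filtered count divided by the original count; this definition is an assumption, chosen because it agrees with the tabulated values. The function and variable names are illustrative.

```python
def reduction_ratio(original, filtered):
    """Percentage of genes removed by filtering (assumed definition)."""
    return 100.0 * (1.0 - filtered / original)


# (original number of genes, filtered number of genes), copied from the table.
gene_counts = {
    "AML": (39028, 505),
    "BC": (7531, 662),
    "DLCL": (6826, 115),
}

ratios = {
    name: reduction_ratio(original, filtered)
    for name, (original, filtered) in gene_counts.items()
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}%")
```

Under this definition the computed values agree with the table to within 0.01 percentage points; the small last-digit differences suggest the tabulated figures may have been truncated rather than rounded.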
Figure 2. AML preliminary bibliometric validation.
Figure 3. BC preliminary bibliometric validation.
Figure 4. DLCL preliminary bibliometric validation.
Algorithm 1. ProteinQuest query example to obtain citation data for the AML dataset.
Citations of groups of filtered genes according to the disease.
| Group of genes | Disease: AML | Disease: DLCL | Disease: BC | Total |
|---|---|---|---|---|
| Number 1: AML filtered genes | 23,248 (49.72) [63.68] | 5,741 (12.28) [28.56] | 17,769 (38.00) [17.56] | 46,758 |
| Number 2: DLCL filtered genes | 2,470 (13.00) [6.77] | 10,347 (54.47) [51.47] | 6,180 (32.53) [6.11] | 18,997 |
| Number 3: BC filtered genes | 10,787 (11.72) [29.55] | 4,015 (4.36) [19.97] | 77,223 (83.92) [76.33] | 92,025 |
| Total | 36,505 | 20,103 | 101,172 | |
( ): percentage in rows; [ ]: percentage in columns.
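The row and column percentages of the citation table follow directly from the raw counts, and recomputing them is a quick consistency check. The sketch below is a minimal illustration using only the counts reported in the table; the dictionary layout and variable names are assumptions made for the example.

```python
# Citation counts copied from the table: gene group -> disease -> citations.
counts = {
    "AML filtered genes": {"AML": 23248, "DLCL": 5741, "BC": 17769},
    "DLCL filtered genes": {"AML": 2470, "DLCL": 10347, "BC": 6180},
    "BC filtered genes": {"AML": 10787, "DLCL": 4015, "BC": 77223},
}
diseases = ["AML", "DLCL", "BC"]

# Marginal totals per gene group (rows) and per disease (columns).
row_totals = {group: sum(row.values()) for group, row in counts.items()}
col_totals = {d: sum(counts[group][d] for group in counts) for d in diseases}

# ( ) values: share of a group's citations that mention each disease.
row_pct = {
    group: {d: 100.0 * counts[group][d] / row_totals[group] for d in diseases}
    for group in counts
}
# [ ] values: share of a disease's citations contributed by each group.
col_pct = {
    group: {d: 100.0 * counts[group][d] / col_totals[d] for d in diseases}
    for group in counts
}

for group in counts:
    cells = ", ".join(
        f"{d}: {counts[group][d]} ({row_pct[group][d]:.2f}) [{col_pct[group][d]:.2f}]"
        for d in diseases
    )
    print(f"{group}: {cells}")
```

In every row the largest percentage, both by row and by column, falls on the disease from which the gene group was filtered, which is the pattern the table is meant to show.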