Tim F Rayner, Philippe Rocca-Serra, Paul T Spellman, Helen C Causton, Anna Farne, Ele Holloway, Rafael A Irizarry, Junmin Liu, Donald S Maier, Michael Miller, Kjell Petersen, John Quackenbush, Gavin Sherlock, Christian J Stoeckert, Joseph White, Patricia L Whetzel, Farrell Wymore, Helen Parkinson, Ugis Sarkans, Catherine A Ball, Alvis Brazma.
Abstract
BACKGROUND: Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support.
Year: 2006 PMID: 17087822 PMCID: PMC1687205 DOI: 10.1186/1471-2105-7-489
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. An example of an investigation design graph for a simple one-channel array experiment. Three samples are used: liver, kidney, and brain. Labeled RNA extracts from each sample are hybridized on an array (type HG_U95A). The RNA extraction and labeling are described in the Material processing protocol, P-XMPL-1. Raw data files (Data1.cel, Data2.cel and Data3.cel) are obtained, and then normalized and summarized as described in the Normalization protocol, P-XMPL-2, generating the file FGDM.txt.
A spreadsheet representation of the investigation design graph shown in Figure 1.
| liver 1 | Homo sapiens | liver | P-XMPL-1 | hyb 1 | HG_U95A | Data1.cel | P-XMPL-2 | FGDM.txt |
| kidney 1 | Homo sapiens | kidney | P-XMPL-1 | hyb 2 | HG_U95A | Data2.cel | P-XMPL-2 | FGDM.txt |
| brain 1 | Homo sapiens | brain | P-XMPL-1 | hyb 3 | HG_U95A | Data3.cel | P-XMPL-2 | FGDM.txt |
Each initial sample has a Sample ID (the first column in the spreadsheet) and Characteristics – Organism (genus and species) and OrganismPart (the second and third columns). The terms used to annotate the characteristics can be obtained from the MGED Ontology [26], another suitable source of controlled vocabulary terms, or provided as user defined terms. The fourth column gives a reference to a relevant protocol, while the fifth gives the IDs of the three hybridizations performed. The reference to the array design type (HG_U95A) is given as a hybridization property, which is followed by the data file names, a reference to the data normalization protocol and the normalized data file.
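The column layout described above can be read with an ordinary tab-delimited parser. The following is a minimal sketch, assuming hypothetical column headings (the header row is not reproduced in the spreadsheet above); only the cell values are taken from the table.

```python
import csv
import io

# Hypothetical headings for illustration; the spreadsheet above shows no header row.
HEADER = ["Sample ID", "Organism", "OrganismPart", "Protocol", "Hybridization ID",
          "Array Design", "Raw Data File", "Normalization Protocol",
          "Normalized Data File"]
DATA = [
    ["liver 1", "Homo sapiens", "liver", "P-XMPL-1", "hyb 1", "HG_U95A",
     "Data1.cel", "P-XMPL-2", "FGDM.txt"],
    ["kidney 1", "Homo sapiens", "kidney", "P-XMPL-1", "hyb 2", "HG_U95A",
     "Data2.cel", "P-XMPL-2", "FGDM.txt"],
    ["brain 1", "Homo sapiens", "brain", "P-XMPL-1", "hyb 3", "HG_U95A",
     "Data3.cel", "P-XMPL-2", "FGDM.txt"],
]

# Build the tab-delimited text and read it back one record per row.
sdrf_text = "\n".join("\t".join(row) for row in [HEADER] + DATA)
rows = list(csv.DictReader(io.StringIO(sdrf_text), delimiter="\t"))

# Each sample maps to one raw data file; all rows share one normalized file.
raw_files_by_sample = {r["Sample ID"]: r["Raw Data File"] for r in rows}
normalized_files = {r["Normalized Data File"] for r in rows}
```

Because every row points to the same normalized file, `normalized_files` collapses to a single entry, mirroring the single FGDM.txt node in Figure 1.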
Figure 2. An example investigation design graph. This graph depicts two samples hybridized on an array (design name SMD-10K) labeled by Cy3 and Cy5, generating the data file Data.txt.
Figure 3. An investigation design graph representing a two-channel experiment with extract pooling and reference RNA. This investigation is similar to the example in the Introduction (Figure 1), except that it uses a two-channel array and an RNA reference. The extract pooling protocol has been omitted for clarity.
SDRF representation of the investigation design graph in Figure 3.
| liver 1 | Homo sapiens | liver | P-XMPL-1 | Extract 1 | P-XMPL-3 | Cy3 | Hyb 1 | SMD-10K | 1.txt | FGDM.txt |
| liver 2 | Homo sapiens | liver | P-XMPL-1 | Extract 2 | P-XMPL-3 | Cy3 | Hyb 1 | SMD-10K | 1.txt | FGDM.txt |
| kidney 1 | Homo sapiens | kidney | P-XMPL-1 | Extract 3 | P-XMPL-3 | Cy3 | Hyb 2 | SMD-10K | 2.txt | FGDM.txt |
| kidney 2 | Homo sapiens | kidney | P-XMPL-1 | Extract 4 | P-XMPL-3 | Cy3 | Hyb 2 | SMD-10K | 2.txt | FGDM.txt |
| brain 1 | Homo sapiens | brain | P-XMPL-1 | Extract 5 | P-XMPL-3 | Cy3 | Hyb 3 | SMD-10K | 3.txt | FGDM.txt |
| brain 2 | Homo sapiens | brain | P-XMPL-1 | Extract 6 | P-XMPL-3 | Cy3 | Hyb 3 | SMD-10K | 3.txt | FGDM.txt |
| | | | | Extract reference | P-XMPL-3 | Cy5 | Hyb 1 | SMD-10K | 1.txt | FGDM.txt |
| | | | | Extract reference | P-XMPL-3 | Cy5 | Hyb 2 | SMD-10K | 2.txt | FGDM.txt |
| | | | | Extract reference | P-XMPL-3 | Cy5 | Hyb 3 | SMD-10K | 3.txt | FGDM.txt |
Each 'layer' in the graph is represented by an ID column in the spreadsheet, followed by columns for each of the labels. Each path in the graph is represented by one row in the spreadsheet.
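The row-per-path rule can be illustrated by turning rows back into graph edges. This is a sketch only: the column names in `ID_COLUMNS` are hypothetical, label and protocol columns are dropped, and only the rows that feed Hyb 1 are used.

```python
from collections import defaultdict

# Hypothetical names for the ID columns; None marks an empty cell.
ID_COLUMNS = ("sample", "extract", "hybridization", "raw_file", "normalized_file")
rows = [
    ("liver 1", "Extract 1", "Hyb 1", "1.txt", "FGDM.txt"),
    ("liver 2", "Extract 2", "Hyb 1", "1.txt", "FGDM.txt"),
    (None, "Extract reference", "Hyb 1", "1.txt", "FGDM.txt"),
]

def rows_to_edges(rows):
    """Collect the distinct edges implied by reading each row left to right."""
    edges = set()
    for row in rows:
        # Tag each ID with its column so equal strings in different layers stay distinct.
        nodes = [(col, val) for col, val in zip(ID_COLUMNS, row) if val]
        edges.update(zip(nodes, nodes[1:]))
    return edges

edges = rows_to_edges(rows)

# Invert the edges to see which nodes feed into each node.
parents = defaultdict(set)
for src, dst in edges:
    parents[dst].add(src)
```

Here `parents[("hybridization", "Hyb 1")]` holds the two pooled liver extracts and the reference extract, and the shared downstream edges are stored only once even though they appear in three rows.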
Figure 5. An example of a more complex experimental design (data objects not shown). This is a real-world example, corresponding to the experiment with accession number E-MIMR-12 in ArrayExpress.
An example of an IDF.
| University of Heidelberg H sapiens TK6 | |||
| genetic_modification_design | time_series_design | ||
| GeneticModification | Time | ||
| Maier | Fleckenstein | Li | |
| Patrick | Katharina | Li | |
| +496213833773 | |||
| Theodor-Kutzer-Ufer 1–3 | |||
| Department of Radiation Oncology, University of Heidelberg | |||
| submitter; investigator | investigator | investigator | |
| biological_replicate | |||
| biological_replicate | |||
| 2005-02-28 | |||
| 2006-01-03 | |||
| 12345678 | |||
| Patrick Maier; Katharina Fleckenstein; Li Li; Stephanie Laufs; Jens Zeller; Stefan Fruehauf; Carsten Herskind; Frederik Wenz | |||
| submitted | |||
| Gene expression of TK6 cells transduced with an oncoretrovirus expressing MDR1 (TK6MDR1) was compared to untransduced TK6 cells and to TK6 cells transduced with an oncoretrovirus expressing the Neomycin resistance gene (TK6neo). Two biological replicates of each were generated and the expression profiles were determined using Affymetrix Human Genome U133 Plus2.0 GeneChip microarrays. Comparisons between the sample groups allow the identification of genes with expression dependent on the MDR1 overexpression. | |||
| GROWTHPRTCL10653 | EXTPRTCL10654 | TRANPRTCL10656 | |
| grow | nucleic_acid_extraction | bioassay_data_transformation | |
| TK6 cells were grown in suspension cultures in RPMI 1640 medium supplemented with 10% horse serum (Invitrogen, Karlsruhe, Germany). The cells were routinely maintained at 37 °C and 5% CO2. | Approximately 10 cells were lysed in RLT buffer (Qiagen). Total RNA was extracted from the cell lysate using an RNeasy kit (Qiagen). | Mixed Model Normalization with SAS Micro Array Solutions (version 1.3). | |
| media | Extracted Product; Amplification | ||
| e-mexp-428_tab.txt | |||
| CTO | MO | nci_meta | |
| 1.3.0.1 |
The first column gives the qualifier name, and its values are given starting from column 2 (if the qualifier has two or more values, each is given in a separate column).
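A reader for this row-oriented layout can be sketched in a few lines. The qualifier names below ("Person Last Name" and so on) are assumptions made for illustration, since the first column is not reproduced in the table above; the values are taken from it.

```python
# Qualifier names are assumed for illustration; values come from the table above.
IDF_TEXT = (
    "Person Last Name\tMaier\tFleckenstein\tLi\t\n"
    "Person First Name\tPatrick\tKatharina\tLi\t\n"
    "Person Roles\tsubmitter; investigator\tinvestigator\tinvestigator\t\n"
    "SDRF File\te-mexp-428_tab.txt\t\t\t\n"
)

def parse_idf(text):
    """Map each qualifier name (column 1) to its list of values (columns 2 onward)."""
    idf = {}
    for line in text.splitlines():
        name, *values = line.split("\t")
        # Drop only trailing empty cells so positional alignment is preserved.
        while values and not values[-1]:
            values.pop()
        idf[name] = values
    return idf

idf = parse_idf(IDF_TEXT)

# Values at the same position across related qualifiers describe the same person.
people = list(zip(idf["Person First Name"], idf["Person Last Name"], idf["Person Roles"]))
```

Stripping only trailing empty cells is a deliberate choice in this sketch: an empty cell in the middle of a row would still occupy its position, so the values for each person stay aligned across rows.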
An example of an ADF document.
| 1 | 1 | 1 | 1 | R1 | ATGGTTGGTTACGTGT | experimental | PTEN | |
| 1 | 1 | 1 | 2 | R2 | CCGCGTTGCCCCGCC | experimental | PAX2 | |
| 1 | 1 | 1 | 3 | R3 | CGTAGCTGATCGATGA | experimental | WWOX | |
| 1 | 1 | 1 | 4 | R4 | GGTTGGCTGAGATCGT | experimental | MAPK8 | |
| 1 | 1 | 2 | 1 | R1 | ATGGTTGGTTACGTGT | experimental | PTEN | |
| 1 | 1 | 2 | 2 | R2 | CCGCGTTGCCCCGCC | experimental | PAX2 | |
| 1 | 1 | 2 | 3 | R3 | CGTAGCTGATCGATGA | experimental | WWOX | |
| 1 | 1 | 2 | 4 | R4 | GGTTGGCTGAGATCGT | experimental | MAPK8 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4 | 6 | 20 | 20 | 462020 | TCCCTTCCGTTGTCCT | control | control_spike_calibration |
Note how the information about Reporter and CompositeElement is duplicated to indicate the fact that every synthetic sequence is spotted more than once on the array.
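The duplication noted above can be made explicit by grouping feature positions by reporter. This is a minimal sketch that assumes the first four columns are feature coordinates (the header row is not shown above, so the coordinate interpretation is an assumption) and uses only the first two blocks of rows from the table.

```python
from collections import defaultdict

# (coordinate columns 1-4, reporter name, reporter sequence); the meaning of the
# four coordinate columns is assumed, as the header row is not shown above.
adf_rows = [
    ((1, 1, 1, 1), "R1", "ATGGTTGGTTACGTGT"),
    ((1, 1, 1, 2), "R2", "CCGCGTTGCCCCGCC"),
    ((1, 1, 1, 3), "R3", "CGTAGCTGATCGATGA"),
    ((1, 1, 1, 4), "R4", "GGTTGGCTGAGATCGT"),
    ((1, 1, 2, 1), "R1", "ATGGTTGGTTACGTGT"),
    ((1, 1, 2, 2), "R2", "CCGCGTTGCCCCGCC"),
    ((1, 1, 2, 3), "R3", "CGTAGCTGATCGATGA"),
    ((1, 1, 2, 4), "R4", "GGTTGGCTGAGATCGT"),
]

features_by_reporter = defaultdict(list)
sequence_by_reporter = {}
for position, reporter, sequence in adf_rows:
    features_by_reporter[reporter].append(position)
    # A reporter repeated on several features should always carry the same sequence.
    previous = sequence_by_reporter.setdefault(reporter, sequence)
    if previous != sequence:
        raise ValueError(f"conflicting sequences for reporter {reporter}")

# Count how many times each reporter is spotted in this excerpt.
replicate_counts = {r: len(p) for r, p in features_by_reporter.items()}
```

In this excerpt every reporter is found at two positions, which is the redundancy the repeated rows are meant to express.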
Figure 4. Replicated design, dual channel with dye swap. Data objects are not shown as there is a simple one-to-one mapping between hybridizations and raw data files.
Replicated design, dual channel with dye swap. Data objects have been omitted for brevity.
| Source 1a | Sample 1a | Extract 1a | LabeledExtract 1a Cy3 | Cy3 | Hybridization 1 |
| Source 1b | Sample 1b | Extract 1b | LabeledExtract 1b Cy5 | Cy5 | Hybridization 1 |
| Source 1a | Sample 1a | Extract 1a | LabeledExtract 1a Cy5 | Cy5 | Hybridization 2 |
| Source 1b | Sample 1b | Extract 1b | LabeledExtract 1b Cy3 | Cy3 | Hybridization 2 |
| Source 2a | Sample 2a | Extract 2a | LabeledExtract 2a Cy3 | Cy3 | Hybridization 3 |
| Source 2b | Sample 2b | Extract 2b | LabeledExtract 2b Cy5 | Cy5 | Hybridization 3 |
| Source 2a | Sample 2a | Extract 2a | LabeledExtract 2a Cy5 | Cy5 | Hybridization 4 |
| Source 2b | Sample 2b | Extract 2b | LabeledExtract 2b Cy3 | Cy3 | Hybridization 4 |
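Dye-swap pairs can be recovered from such a table by comparing the label given to each extract across hybridizations. The sketch below is one possible check, not part of the format itself; it uses the Extract, Label and Hybridization columns of the table above, and the function name `dye_swap_pairs` is hypothetical.

```python
from collections import defaultdict

# (extract, label, hybridization) taken from the rows above.
rows = [
    ("Extract 1a", "Cy3", "Hybridization 1"),
    ("Extract 1b", "Cy5", "Hybridization 1"),
    ("Extract 1a", "Cy5", "Hybridization 2"),
    ("Extract 1b", "Cy3", "Hybridization 2"),
    ("Extract 2a", "Cy3", "Hybridization 3"),
    ("Extract 2b", "Cy5", "Hybridization 3"),
    ("Extract 2a", "Cy5", "Hybridization 4"),
    ("Extract 2b", "Cy3", "Hybridization 4"),
]

def dye_swap_pairs(rows):
    """Return pairs of hybridizations that use the same extracts with labels exchanged."""
    labels_by_hyb = defaultdict(dict)
    for extract, label, hyb in rows:
        labels_by_hyb[hyb][extract] = label
    hybs = sorted(labels_by_hyb)
    pairs = []
    for i, first in enumerate(hybs):
        for second in hybs[i + 1:]:
            a, b = labels_by_hyb[first], labels_by_hyb[second]
            # Same set of extracts, and every extract carries a different dye.
            if a.keys() == b.keys() and all(a[e] != b[e] for e in a):
                pairs.append((first, second))
    return pairs

pairs = dye_swap_pairs(rows)
```

For the table above this pairs Hybridization 1 with Hybridization 2 and Hybridization 3 with Hybridization 4.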
Representation of the investigation design in Figure 5 as an SDRF.
| BS_TKAC_13 | BSM_TKAC_02m | BSM_TKAC_23p | BSM_TKAC_23p | biotin | HFB2002012102A |
| BS_TKAC_13 | BSM_TKAC_03m | BSM_TKAC_24p | BSM_TKAC_24p | biotin | HFB2002012103A |
| BS_TKAC_13 | BSM_TKAC_04m | BSM_TKAC_25p | BSM_TKAC_25p | biotin | HFB2002012104A |
| BS_TKAC_13 | BSM_TKAC_05m | BSM_TKAC_26p | BSM_TKAC_26p | biotin | HFB2002012105A |
| BS_TKAC_13 | BSM_TKAC_06m | BSM_TKAC_27p | BSM_TKAC_27p | biotin | HFB2002012106A |
| BS_TKAC_13 | BSM_TKAC_07m | BSM_TKAC_28p | BSM_TKAC_28p | biotin | HFB2002012107A |
| BS_TKAC_13 | BSM_TKAC_08m | BSM_TKAC_29p | BSM_TKAC_29p | biotin | HFB2002012108A |
| BS_TKAC_13 | BSM_TKAC_09m | BSM_TKAC_30p | BSM_TKAC_30p | biotin | HFB2002012109A |
| BS_TKAC_13 | BSM_TKAC-10m | BSM_TKAC_31p | BSM_TKAC_31p | biotin | HFB2002012110A |
| BS_TKAC_14 | BSM_TKAC_02n | BSM_TKAC_23p | BSM_TKAC_23p | biotin | HFB2002012102A |
| BS_TKAC_14 | BSM_TKAC_03n | BSM_TKAC_24p | BSM_TKAC_24p | biotin | HFB2002012103A |
| BS_TKAC_14 | BSM_TKAC_04n | BSM_TKAC_25p | BSM_TKAC_25p | biotin | HFB2002012104A |
| BS_TKAC_14 | BSM_TKAC_05n | BSM_TKAC_26p | BSM_TKAC_26p | biotin | HFB2002012105A |
| BS_TKAC_14 | BSM_TKAC_06n | BSM_TKAC_27p | BSM_TKAC_27p | biotin | HFB2002012106A |
| BS_TKAC_14 | BSM_TKAC_07n | BSM_TKAC_28p | BSM_TKAC_28p | biotin | HFB2002012107A |
| BS_TKAC_14 | BSM_TKAC_08n | BSM_TKAC_29p | BSM_TKAC_29p | biotin | HFB2002012108A |
| BS_TKAC_14 | BSM_TKAC_09n | BSM_TKAC_30p | BSM_TKAC_30p | biotin | HFB2002012109A |
| BS_TKAC_14 | BSM_TKAC-10n | BSM_TKAC_31p | BSM_TKAC_31p | biotin | HFB2002012110A |
| BS_TKAC_15 | BSM_TKAC_02o | BSM_TKAC_23p | BSM_TKAC_23p | biotin | HFB2002012102A |
| BS_TKAC_15 | BSM_TKAC_03o | BSM_TKAC_24p | BSM_TKAC_24p | biotin | HFB2002012103A |
| BS_TKAC_15 | BSM_TKAC_04o | BSM_TKAC_25p | BSM_TKAC_25p | biotin | HFB2002012104A |
| BS_TKAC_15 | BSM_TKAC_05o | BSM_TKAC_26p | BSM_TKAC_26p | biotin | HFB2002012105A |
| BS_TKAC_15 | BSM_TKAC_06o | BSM_TKAC_27p | BSM_TKAC_27p | biotin | HFB2002012106A |
| BS_TKAC_15 | BSM_TKAC_07o | BSM_TKAC_28p | BSM_TKAC_28p | biotin | HFB2002012107A |
| BS_TKAC_15 | BSM_TKAC_08o | BSM_TKAC_29p | BSM_TKAC_29p | biotin | HFB2002012108A |
| BS_TKAC_15 | BSM_TKAC_09o | BSM_TKAC_30p | BSM_TKAC_30p | biotin | HFB2002012109A |
| BS_TKAC_15 | BSM_TKAC_10o | BSM_TKAC_31p | BSM_TKAC_31p | biotin | HFB2002012110A |
| BS_TKAC_16 | BSM_TKAC_02q | BSM_TKAC_23p | BSM_TKAC_23p | biotin | HFB2002012102A |
| BS_TKAC_16 | BSM_TKAC_03q | BSM_TKAC_24p | BSM_TKAC_24p | biotin | HFB2002012103A |
| BS_TKAC_16 | BSM_TKAC_04q | BSM_TKAC_25p | BSM_TKAC_25p | biotin | HFB2002012104A |
| BS_TKAC_16 | BSM_TKAC_05q | BSM_TKAC_26p | BSM_TKAC_26p | biotin | HFB2002012105A |
| BS_TKAC_16 | BSM_TKAC_06q | BSM_TKAC_27p | BSM_TKAC_27p | biotin | HFB2002012106A |
| BS_TKAC_16 | BSM_TKAC_07q | BSM_TKAC_28p | BSM_TKAC_28p | biotin | HFB2002012107A |
| BS_TKAC_16 | BSM_TKAC_08q | BSM_TKAC_29p | BSM_TKAC_29p | biotin | HFB2002012108A |
| BS_TKAC_16 | BSM_TKAC_09q | BSM_TKAC_30p | BSM_TKAC_30p | biotin | HFB2002012109A |
| BS_TKAC_16 | BSM_TKAC_10q | BSM_TKAC_31p | BSM_TKAC_31p | biotin | HFB2002012110A |
The bold highlighting indicates the materials linked to a single hybridization for ease of viewing this example.
Figure 6. Graph with four possible paths between nodes. While four paths are possible between the nodes in this graph [(a → c → d), (a → c → e), (b → c → d), and (b → c → e)], only two full paths, e.g., (a → c → d) and (b → c → e), are required to capture all of the existing relationships between the nodes.
SDRF representation of the DAG in Figure 6.
| a | c | d |
| b | c | e |
To describe the IDG in Figure 6, only two of the possible four paths need be shown; redundant edges in the graph may be omitted.
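The claim that two rows suffice can be checked directly: a set of rows is adequate when the edges it implies equal the edges of the graph. A minimal sketch, using the node names from Figure 6:

```python
def path_edges(path):
    """Edges implied by one SDRF row read left to right."""
    return set(zip(path, path[1:]))

# Edges of the graph in Figure 6.
graph_edges = {("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")}

# The two rows given in the SDRF above.
sdrf_rows = [("a", "c", "d"), ("b", "c", "e")]

# All four possible full paths through the graph.
all_paths = [(s, "c", t) for s in ("a", "b") for t in ("d", "e")]

covered_by_two = set().union(*(path_edges(r) for r in sdrf_rows))
covered_by_all = set().union(*(path_edges(p) for p in all_paths))

# Both row sets describe exactly the same graph.
two_rows_suffice = covered_by_two == graph_edges == covered_by_all
```

The other pairing, (a → c → e) with (b → c → d), covers the same four edges, so the choice of which two paths to list is not unique.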
Representing SDRF from Table 2 by a set of two SDRF files: first spreadsheet.
| liver 1 | Homo sapiens | liver | P-XMPL-1 | Extract 1 | P-XMPL-3 | Cy3 | Hyb 1 |
| liver 2 | Homo sapiens | liver | P-XMPL-1 | Extract 2 | P-XMPL-3 | Cy3 | Hyb 1 |
| kidney 1 | Homo sapiens | kidney | P-XMPL-1 | Extract 3 | P-XMPL-3 | Cy3 | Hyb 2 |
| kidney 2 | Homo sapiens | kidney | P-XMPL-1 | Extract 4 | P-XMPL-3 | Cy3 | Hyb 2 |
| brain 1 | Homo sapiens | brain | P-XMPL-1 | Extract 5 | P-XMPL-3 | Cy3 | Hyb 3 |
| brain 2 | Homo sapiens | brain | P-XMPL-1 | Extract 6 | P-XMPL-3 | Cy3 | Hyb 3 |
| | | | | Extract reference | P-XMPL-3 | Cy5 | Hyb 1 |
| | | | | Extract reference | P-XMPL-3 | Cy5 | Hyb 2 |
| | | | | Extract reference | P-XMPL-3 | Cy5 | Hyb 3 |
Such splitting of an SDRF spreadsheet can be done on any ID column, which becomes the last column in the first spreadsheet (this table) and is repeated as the first column in the second spreadsheet (Table 9).
Representing SDRF from Table 2 by a set of two SDRF files: second spreadsheet.
| Hyb 1 | SMD-10K | 1.txt | FGDM.txt |
| Hyb 2 | SMD-10K | 2.txt | FGDM.txt |
| Hyb 3 | SMD-10K | 3.txt | FGDM.txt |
See the legend to Table 8 for discussion. Because each Hybridization ID only needs to be represented once, the second partial spreadsheet has three rows instead of nine (discounting the header row).
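The split described in the two legends can be sketched as a small function. This is an illustration under simplifying assumptions: the characteristics and protocol columns are dropped, rows are generated in a different order from the table, and the helper name `split_sdrf` is hypothetical.

```python
# Rebuild a reduced version of the two-channel SDRF:
# (sample, extract, label, hybridization, array design, raw file, normalized file).
rows = []
for i, tissue in enumerate(["liver", "kidney", "brain"], start=1):
    tail = (f"Hyb {i}", "SMD-10K", f"{i}.txt", "FGDM.txt")
    for rep in (1, 2):
        extract_no = 2 * (i - 1) + rep
        rows.append((f"{tissue} {rep}", f"Extract {extract_no}", "Cy3") + tail)
    # Reference extract rows have an empty sample cell.
    rows.append(("", "Extract reference", "Cy5") + tail)

def split_sdrf(rows, split_index):
    """Split on an ID column: it ends the first sheet and starts the second."""
    first = [row[: split_index + 1] for row in rows]
    second, seen = [], set()
    for row in rows:
        tail = row[split_index:]
        if tail not in seen:  # each ID needs to appear only once in the second sheet
            seen.add(tail)
            second.append(tail)
    return first, second

first, second = split_sdrf(rows, split_index=3)

# Joining on the shared ID column restores the original rows.
lookup = {tail[0]: tail for tail in second}
rejoined = [head + lookup[head[-1]][1:] for head in first]
```

With the Hybridization column as the split point, `first` keeps nine rows while `second` shrinks to three, and `rejoined` equals the original `rows`.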
An example of a data matrix.
| Data1.cel | Data1.cel | Data2.cel | Data2.cel | Data3.cel | Data3.cel | |
| signal | p-value | signal | p-value | signal | p-value | |
| Gene 1 | x11 | p11 | x21 | p21 | x31 | p31 |
| Gene 2 | x12 | p12 | x22 | p22 | x32 | p32 |
| Gene 3 | x13 | p13 | x23 | p23 | x33 | p33 |
| ... | ... | ... | ... | ... | ... | ... |
| Gene n | x1n | p1n | x2n | p2n | x3n | p3n |
The first row gives references to objects in an SDRF file, for instance to ArrayData URIs in the SDRF in Table 1. The second row specifies the names of the quantitation types that are represented in each column. The first column gives the names of the biological objects these 'expression' measurements relate to, for instance the IDs of the reporters or composite elements in the ADF file or files describing the design of array(s) on which these measurements have been performed. Alternatively, this column may contain identifiers from public sequence databases, or chromosome coordinates from a specified genome build.
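The two-row header can be handled by pairing each data column with both its SDRF reference and its quantitation type. The sketch below assumes a tab-delimited file; the numeric values are made-up placeholders standing in for the symbolic entries (x11, p11, ...) in the table, and `parse_data_matrix` is a hypothetical helper.

```python
import csv
import io

# Placeholder numbers only; the table above gives symbolic values.
MATRIX_TEXT = (
    "\tData1.cel\tData1.cel\tData2.cel\tData2.cel\n"
    "\tsignal\tp-value\tsignal\tp-value\n"
    "Gene 1\t7.1\t0.01\t6.8\t0.02\n"
    "Gene 2\t3.4\t0.20\t3.9\t0.15\n"
)

def parse_data_matrix(text):
    """Return {(element, sdrf_reference, quantitation_type): value}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    references = next(reader)[1:]   # row 1: references to objects in the SDRF
    quant_types = next(reader)[1:]  # row 2: quantitation type of each column
    values = {}
    for line in reader:
        element, cells = line[0], line[1:]  # column 1: reporter or composite element
        for ref, qtype, cell in zip(references, quant_types, cells):
            values[(element, ref, qtype)] = float(cell)
    return values

values = parse_data_matrix(MATRIX_TEXT)
```

Keying on the triple keeps the link back to the SDRF explicit, so a value such as `values[("Gene 1", "Data2.cel", "signal")]` can be traced to the hybridization that produced Data2.cel.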
Experimental factor values example. The Characteristics categories used in column headings (i.e., the terms in square brackets) are taken from the MGED Ontology "BioMaterialCharacteristics" class [26]. The values contained in the body of these columns may be either free text, or terms from an ontology as indicated by an "OI" tag in the column heading (relating to the MAGEv2 concept "OntologyIndividual"). For example, the "OI:nci_meta" tag indicates that terms are taken from the NCI Metathesaurus [27]. The sources for these database tags ("nci_meta", "CTO") are defined in the IDF, as shown in Table 3. Biological replicates are indicated by shared experimental factor values ("Time" in this example; the columns containing experimental factors would be specified in the accompanying IDF). Most of the protocols have been omitted for brevity. Please see the detailed MAGE-TAB specification document [22] for more information.
| ARP1-0h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 0 | hours | H_ARP1-0h | A-AFFY-33 |
| ARP2-0h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 0 | hours | H_ARP2-0h | A-AFFY-33 |
| ARP3-0h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 0 | hours | H_ARP3-0h | A-AFFY-33 |
| ARP1-2h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 2 | hours | H_ARP1-2h | A-AFFY-33 |
| ARP2-2h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 2 | hours | H_ARP2-2h | A-AFFY-33 |
| ARP3-2h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 2 | hours | H_ARP3-2h | A-AFFY-33 |
| ARP1-4h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 4 | hours | H_ARP1-4h | A-AFFY-33 |
| ARP2-4h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 4 | hours | H_ARP2-4h | A-AFFY-33 |
| ARP3-4h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 4 | hours | H_ARP3-4h | A-AFFY-33 |
| ARP1-6h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 6 | hours | H_ARP1-6h | A-AFFY-33 |
| ARP2-6h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 6 | hours | H_ARP2-6h | A-AFFY-33 |
| ARP3-6h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 6 | hours | H_ARP3-6h | A-AFFY-33 |
| ARP1-8h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 8 | hours | H_ARP1-8h | A-AFFY-33 |
| ARP2-8h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 8 | hours | H_ARP2-8h | A-AFFY-33 |
| ARP3-8h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 8 | hours | H_ARP3-8h | A-AFFY-33 |
| ARP1-10h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 10 | hours | H_ARP1-10h | A-AFFY-33 |
| ARP2-10h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 10 | hours | H_ARP2-10h | A-AFFY-33 |
| ARP3-10h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 10 | hours | H_ARP3-10h | A-AFFY-33 |
| ARP1-12h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 12 | hours | H_ARP1-12h | A-AFFY-33 |
| ARP2-12h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 12 | hours | H_ARP2-12h | A-AFFY-33 |
| ARP3-12h | MOLT4 | T cell | acute lymphoblastic leukemia | Homo sapiens | P-XMPL-3 | 12 | hours | H_ARP3-12h | A-AFFY-33 |
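Since biological replicates are indicated by shared experimental factor values, they can be found by grouping rows on the factor value and its unit. A minimal sketch using the first two time points from the table above (sample name, Time value, unit, hybridization name):

```python
from collections import defaultdict

# (sample, Time value, unit, hybridization) for the 0 h and 2 h rows above.
rows = [
    ("ARP1-0h", 0, "hours", "H_ARP1-0h"),
    ("ARP2-0h", 0, "hours", "H_ARP2-0h"),
    ("ARP3-0h", 0, "hours", "H_ARP3-0h"),
    ("ARP1-2h", 2, "hours", "H_ARP1-2h"),
    ("ARP2-2h", 2, "hours", "H_ARP2-2h"),
    ("ARP3-2h", 2, "hours", "H_ARP3-2h"),
]

# Hybridizations sharing the same factor value (and unit) form one replicate group.
replicates = defaultdict(list)
for sample, time_value, unit, hyb in rows:
    replicates[(time_value, unit)].append(hyb)

group_sizes = {key: len(hybs) for key, hybs in replicates.items()}
```

Including the unit in the grouping key is a precaution taken in this sketch so that, for example, 2 hours and 2 minutes would not be merged.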