
Inconsistencies over time in 5% of NetAffx probe-to-gene annotations.

Carolina Perez-Iratxeta, Miguel A Andrade.   

Abstract

BACKGROUND: DNA microarray probes are designed to match particular mRNA transcripts, often based on expressed sequences such as ESTs or cDNAs, which are frequently incomplete. As a result, the relations between probes and genes can change as the sequence data are updated. However, the reported results of microarray analyses are frequently given just as lists of genes, without any reference to the underlying probes.
RESULTS: We show for a particular commercial microarray design that the number of probes associated with some genes changes over time. These changes affect approximately 5% of the probe sets across the history of annotation releases over a two-year span.
CONCLUSION: We recommend reporting probe set identifiers when publishing microarray results, and submitting those analyses to public microarray databases, to ensure that the interpretation of the data is updated with the latest set of annotations.

Year:  2005        PMID: 16033654      PMCID: PMC1188054          DOI: 10.1186/1471-2105-6-183

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

During a large-scale analysis of data obtained with the Affymetrix MOE430 murine DNA microarray [1], we detected striking differences in the resulting set of expressed genes depending on which release of the microarray probe annotations distributed by Affymetrix (NetAffx [2]) we used. The differences arise from probes that point to different genes in different versions of the NetAffx data (see, for example, the assignment of probe set 1433436_s_at in Table 1). Because researchers commonly report microarray results by gene name, we assessed the magnitude of these changes by measuring their kind and extent across the history of all annotation files, using the Affymetrix MOE430A/B chips as an example.
Table 1

Example of a pair of probe sets inconsistently annotated.

NetAffx release   1          2          3          4          5          6          7          8
Release date      17-Mar-03  9-Apr-03   25-Jun-03  9-Oct-03   11-Dec-03  9-Apr-04   19-May-04  23-Jun-04
1433436_s_at      Thtpa      Thtpa      Thtpa      Ap1g2      Ap1g2      Thtpa      Thtpa      Thtpa
1419113_at        Ap1g2      Ap1g2      Ap1g2      Ap1g2      Ap1g2      Ap1g2      Ap1g2      Ap1g2

An example of a pair of probe sets inconsistently annotated across the history of the eight NetAffx annotation files. Probe sets 1433436_s_at and 1419113_at were both assigned to gene Ap1g2, the gamma subunit of adaptor protein complex AP-1, in versions 4 and 5. This is a Golgi apparatus gene involved in protein transport. Thtpa is a hydrolase enzyme, the thiamine triphosphatase. In our experimental data, 1433436_s_at was detected as present and 1419113_at as absent. NetAffx releases of October 9th and December 11th 2003 would suggest that Ap1g2 was expressed, while any other release would give the opposite result.
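
The effect described in the caption of Table 1 can be reproduced with a minimal Python sketch. The gene symbols and detection calls are taken from Table 1 and its caption; the rule that a gene counts as expressed when at least one of its probe sets is detected as present is an assumption made for illustration, and the names ANNOTATIONS, DETECTED and expressed_genes are hypothetical.

```python
# Gene symbols per probe set for the eight releases, copied from Table 1.
ANNOTATIONS = {
    "1433436_s_at": ["Thtpa", "Thtpa", "Thtpa", "Ap1g2",
                     "Ap1g2", "Thtpa", "Thtpa", "Thtpa"],
    "1419113_at": ["Ap1g2"] * 8,
}

# Detection calls reported in the caption: True = present, False = absent.
DETECTED = {"1433436_s_at": True, "1419113_at": False}


def expressed_genes(release_index):
    """Return the genes with at least one 'present' probe set in a release.

    release_index is 0-based (0 is release 1, dated 17-Mar-03).
    Assumption: one present probe set is enough to call a gene expressed.
    """
    return {
        symbols[release_index]
        for probe_set, symbols in ANNOTATIONS.items()
        if DETECTED[probe_set]
    }


# One sorted list of expressed gene names per release.
calls_by_release = [sorted(expressed_genes(i)) for i in range(8)]

for number, genes in enumerate(calls_by_release, start=1):
    print(f"release {number}: {genes}")
```

With these inputs, Ap1g2 is reported as expressed only under releases 4 and 5, although the probe-level measurements are identical in every case.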

Affymetrix DNA microarrays include probes for the detection of target sequences that are mainly based on UniGene clusters [3]. UniGene is a database of gene-oriented clusters of GenBank sequences in which, in addition to sequences of well-characterized genes, hundreds of thousands of novel expressed sequence tag (EST) sequences have been included. Affymetrix probe sets are annotated according to the current UniGene and LocusLink records related to them, including genomic location, gene symbol, and function description, when available (NetAffx database, [2]).

We obtained all 8 NetAffx releases for the MOE430A/B microarray, dated from March 17th 2003 to June 23rd 2004 (kindly provided by Marco Raposo, Affymetrix). First, we observed that there was at least one gene name change for 13,699 of the approximately 45,000 probe sets included in the MOE430A/B chips. Many of these changes were simply probe sets initially without a gene name that were eventually associated with one. This reflects a general improvement in the functional annotation of the mouse genome. Other changes could be explained by the use of synonymous gene symbols. However, according to a table of synonymous gene symbols that we extracted from the LocusLink gene database [4], there was still a total of 2277 probe sets with gene name changes that could not be explained by the use of a synonym. This represents about 5% of the probe sets in the chip. The underlying problem is exemplified in Table 1, where it can be seen that at least one probe must have been temporarily assigned to the wrong transcript. These inconsistencies can be detected when two probe sets attached to the same gene in one version of the annotations are attached to different gene names in another version. Table 2 gives the number of such inconsistent pairs of probe sets observed from one version of the NetAffx annotations to the next, which runs into the thousands. This explains the variation in the biological interpretation of an Affymetrix microarray experiment depending on the version of the NetAffx annotations used.
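
The filtering step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used for the study: the function names are hypothetical, the synonym table is modelled as a plain alias-to-canonical dictionary, probe sets that merely go from unnamed to named are not counted as changes, and the probe sets prefixed with "hypo_" and their gene symbols are invented toy data. Only the two Table 1 rows and the figures 2277 and 45,000 come from the text.

```python
def canonical(symbol, synonyms):
    """Map a gene symbol to its canonical form; unknown symbols map to themselves."""
    return synonyms.get(symbol, symbol)


def unexplained_changes(history, synonyms):
    """Return probe sets whose gene name changes cannot be explained by synonyms.

    history: dict mapping probe set id -> list of gene symbols, one per release
             (None means the probe set had no gene name in that release).
    synonyms: dict mapping alias -> canonical symbol.
    """
    flagged = []
    for probe_set, symbols in history.items():
        # Ignore unnamed releases, then collapse synonyms to one canonical name.
        named = {canonical(s, synonyms) for s in symbols if s is not None}
        if len(named) > 1:
            flagged.append(probe_set)
    return flagged


history = {
    # Rows from Table 1.
    "1433436_s_at": ["Thtpa", "Thtpa", "Thtpa", "Ap1g2",
                     "Ap1g2", "Thtpa", "Thtpa", "Thtpa"],
    "1419113_at": ["Ap1g2"] * 8,
    # Hypothetical probe sets, for illustration only.
    "hypo_1_at": [None, None, "GeneA", "GeneA", "GeneA", "GeneA", "GeneA", "GeneA"],
    "hypo_2_at": ["GeneB", "GeneB", "GeneB_alias", "GeneB_alias",
                  "GeneB", "GeneB", "GeneB", "GeneB"],
}
synonyms = {"GeneB_alias": "GeneB"}  # hypothetical synonym table

flagged = unexplained_changes(history, synonyms)

# Fraction quoted in the text: 2277 of approximately 45,000 probe sets.
fraction = 2277 / 45000
print(flagged, f"{fraction:.1%}")
```

Only 1433436_s_at is flagged in this toy input: the newly named probe set and the synonym-only change are both treated as consistent.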
Table 2

Number of split and joined probe set pairs between consecutive versions of NetAffx.

NetAffx versions   Splits   Joins
1 → 2              0        0
2 → 3              5862     4140
3 → 4              2547     3575
4 → 5              1380     1742
5 → 6              5479     8787
6 → 7              0        0
7 → 8              4904     4553

Splits represent the number of pairs of probe sets that point to the same gene name in one NetAffx release but to different gene names in the following release. Joins represent the number of pairs of probe sets that point to different gene names in one release but to the same gene name in the following release. For this computation, each probe set with no gene name was treated as associated with its own distinct gene name. Dates of the NetAffx versions are given in Table 1.
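
The definitions of splits and joins given in the caption of Table 2 can be written down as a short Python sketch. The function names are hypothetical, and the pairwise loop is only one straightforward way to apply the definitions; it is not necessarily how the published counts were computed. Probe sets without a gene name are represented by None and never match any other probe set, following the convention stated in the caption.

```python
from itertools import combinations


def same_gene(a, b):
    """Two probe sets share a gene only if both are named and the names match."""
    return a is not None and a == b


def count_splits_and_joins(before, after):
    """Count probe set pairs that split or join between two releases.

    before, after: dict mapping probe set id -> gene symbol (None = no name).
    Only probe sets present in both releases are compared.
    """
    shared = sorted(set(before) & set(after))
    splits = joins = 0
    for p, q in combinations(shared, 2):
        together_before = same_gene(before[p], before[q])
        together_after = same_gene(after[p], after[q])
        if together_before and not together_after:
            splits += 1
        elif together_after and not together_before:
            joins += 1
    return splits, joins


# The two probe sets of Table 1, one gene symbol per release.
TABLE1 = {
    "1433436_s_at": ["Thtpa", "Thtpa", "Thtpa", "Ap1g2",
                     "Ap1g2", "Thtpa", "Thtpa", "Thtpa"],
    "1419113_at": ["Ap1g2"] * 8,
}


def release(number):
    """Annotations of the Table 1 probe sets in a given release (1-based)."""
    return {probe_set: symbols[number - 1] for probe_set, symbols in TABLE1.items()}


print(count_splits_and_joins(release(3), release(4)))  # the pair joins
print(count_splits_and_joins(release(5), release(6)))  # the pair splits
```

The loop visits every pair, so for the roughly 45,000 probe sets of the MOE430A/B chips it would examine about a billion pairs per comparison; grouping probe sets by gene name first would be a natural optimisation.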

The design of DNA microarray probe sets is often based on assembled groups of expressed sequences observed as ESTs or cDNAs, which might represent partial transcripts. Additional evidence in the form of new sequences, or even new gene predictions, can modify the preliminary assignment (for example, by revealing that two ESTs thought to represent different mRNA transcripts are actually part of the same one). Therefore, information assigned to a probe on the basis of gene predictions (such as a gene name) should be considered non-static and might change over time. Although annotations can be expected to improve over time as genomic assemblies become more accurate, changes will continue to occur for a while because a large fraction of genes are still only predicted. Probe sequences constitute the only static information attached to the microarray: this information is inherent to the design of the microarray and will not change over time. This was pointed out in the manuscript that describes the NetAffx annotation files [2], but currently there is no visible warning or reminder on the Affymetrix website.

Although these facts are implicitly well known in the bioinformatics community, experimental users of microarrays are less aware of the problem, probably because the surprisingly large extent of these changes has not been pointed out before. For example, the recent letter from the Microarray Gene Expression Data Society [5] explains that deposition of microarray data in public databases ensures data persistence, integration, accessibility, and standardization, but does not mention the problem of variable gene structure. Recent publications deal with the analysis of relations between Affymetrix probe sets and gene sequences [6-8], but they do not report the extent to which these relations vary over time, as we have done here. This variation, which could convince many microarray users to send their data to public databases, has not been well publicized. Depositing microarray data in public databases does much more than make the data public: it makes them genuinely useful to the scientific community. Those databases include the descriptions of probe sequences and constantly update the non-static information associated with them, thus allowing the data to be re-interpreted and solving the problem presented here.

Authors' contributions

CP and MA participated in the design of the study, the computations, and preparation of the manuscript. Both authors read and approved the final manuscript.

References

1.  NetAffx: Affymetrix probesets and annotations.

Authors:  Guoying Liu; Ann E Loraine; Ron Shigeta; Melissa Cline; Jill Cheng; Venu Valmeekam; Shaw Sun; David Kulp; Michael A Siani-Rose
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

2.  ProbeLynx: a tool for updating the association of microarray probes to genes.

Authors:  Fiona M Roche; Karsten Hokamp; Michael Acab; Lorne A Babiuk; Robert E W Hancock; Fiona S L Brinkman
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

3.  An open letter on microarray data from the MGED Society.

Authors:  Catherine Ball; Alvis Brazma; Helen Causton; Steve Chervitz; Ron Edgar; Pascal Hingamp; John C Matese; Carl Icahn; Helen Parkinson; John Quackenbush; Martin Ringwald; Susanna-Assunta Sansone; Gavin Sherlock; Paul Spellman; Christian Stoeckert; Yoshio Tateno; Ronald Taylor; Joseph White; Neil Winegarden
Journal:  Microbiology       Date:  2004-11       Impact factor: 2.777

4.  ADAPT: a database of affymetrix probesets and transcripts.

Authors:  Hui Sun Leong; Tim Yates; Claire Wilson; Crispin J Miller
Journal:  Bioinformatics       Date:  2005-03-03       Impact factor: 6.937

5.  Study of stem cell function using microarray experiments.

Authors:  Carolina Perez-Iratxeta; Gareth Palidwor; Christopher J Porter; Neal A Sanche; Matthew R Huska; Brian P Suomela; Enrique M Muro; Paul M Krzyzanowski; Evan Hughes; Pearl A Campbell; Michael A Rudnicki; Miguel A Andrade
Journal:  FEBS Lett       Date:  2005-03-21       Impact factor: 4.124

6.  Database resources of the National Center for Biotechnology Information: update.

Authors:  David L Wheeler; Deanna M Church; Ron Edgar; Scott Federhen; Wolfgang Helmberg; Thomas L Madden; Joan U Pontius; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Tugba O Suzek; Tatiana A Tatusova; Lukas Wagner
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

7.  Database resources of the National Center for Biotechnology Information.

Authors:  David L Wheeler; Tanya Barrett; Dennis A Benson; Stephen H Bryant; Kathi Canese; Deanna M Church; Michael DiCuccio; Ron Edgar; Scott Federhen; Wolfgang Helmberg; David L Kenton; Oleg Khovayko; David J Lipman; Thomas L Madden; Donna R Maglott; James Ostell; Joan U Pontius; Kim D Pruitt; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Steven T Sherry; Karl Sirotkin; Grigory Starchenko; Tugba O Suzek; Roman Tatusov; Tatiana A Tatusova; Lukas Wagner; Eugene Yaschenko
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

8.  A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array.

Authors:  Jeremy Harbig; Robert Sprinkle; Steven A Enkemann
Journal:  Nucleic Acids Res       Date:  2005-02-18       Impact factor: 16.971
