A discussion of recent advances and limitations in functional annotation and network reconstruction based on gene expression microarray data

Systems approaches for understanding biological complexity and studying diseases rely on iterative and extensive characterization of genes, transcripts, proteins and their interactions, generation of hypotheses about how they functionally inter-relate within subsystems, conversion of these hypotheses into formal mathematical models and their experimental testing (Auffray ). In this context, modeling of gene regulatory networks from functional annotations is currently performed top-down, studying global network architecture and performance (Bray, 2003), and bottom-up, identifying modular subsystems from functional genomics data (Alon, 2003).

Because transcription is the first step of gene expression and is subject to extensive regulation by internal and external factors, systems approaches rely heavily on gene expression data. Microarray technology has developed steadily for a decade to allow measurements of expression levels for thousands of genes in different biological contexts, and a wealth of such data is now available in public repositories (Ball ). The expectation is that microarray analysis will help elucidate what the genes do, and when, where and how they are expressed as elements of an orchestrated system under the effects of perturbations, and thus reveal the underlying transcriptional regulatory networks.

Two recent papers report attempts to develop an optimized framework for functional annotation and reconstruction of regulatory networks using large-scale expression data sets combined with protein interaction and phenotypic data in yeast (Tanay ; Zhou ). These approaches are designed to identify genes with similar functions, but not necessarily coexpressed, and to extract essential features of regulatory networks through analysis of independent data sets.
Zhou et al first identified a collection of coexpressed gene pairs (doublets) representing functional modules in individual data sets, based on expression correlation and functional annotation, most of which were functionally homogeneous. Then, using this first-order meta-information, they conducted a second-order expression analysis, assembling pairs of doublets (quadruplets) found tightly coregulated across multiple data sets into context-dependent regulatory modules. Similarly, Tanay et al used biclustering within a large microarray data compendium to identify relevant functional modules. These approaches generated functional predictions consistent with experimental studies, identified novel cross-doublet gene pairs missed in the standard analyses and allowed assignment of novel functions to a number of previously uncharacterized genes. These achievements represent significant improvements over previous studies, since only half of the globally coexpressed gene pairs identified by the standard methods are functionally homogeneous, and analysis of a compendium of yeast expression profiles yielded only a handful of functional assignments (Hughes ; Auffray ).

In addition, Zhou et al assembled gene regulatory networks by using the first-order expression correlation of target gene modules as an activity profile for the transcription factor regulating them, and second-order expression correlations between the activity profiles of transcriptional modules to measure cooperativity between transcription factors. Through integration with protein–protein and protein–DNA interaction data, functionally consistent transcription modules controlled by distinct transcription factors and displaying high second-order correlations were shown to participate in transcription cascades.
Thus, high-order clustering of transcriptional modules identifies potential interconnectivity between groups of genes participating in the same biological processes, and provides indirect assignment of transcription factors to these processes, overcoming their low expression levels. This represents another improvement over conventional approaches, which are limited by their inability to reconstruct the hidden organization of the regulatory signals (Wei ): high-order analyses go one step further to capture combinatorial coregulation of genes that do not exhibit identical expression patterns.

Despite the significant progress that such data-driven network assembly methods represent, the complexity of the underlying networks makes it extremely difficult to reconstruct complete regulatory networks based exclusively on the information available from microarrays, even when combined with other types of data in higher order analyses (Wei ; Papin ). This currently limits our ability to understand the biological significance of the topological properties of the reconstructed networks, which typically have scale-free and small-world architectures (Grigorov, 2005). Combinatorial expansion in the number of potential network structures and comprehensive evaluation of their consistency are key challenges that the approach developed by Zhou et al does not entirely overcome, particularly since the number of possible alternative genetic regulatory networks depends strongly on the size and type of the data sets and on the maximum number of regulatory inputs per gene (Orrell ).
Using a Bayesian modeling approach imposing severe constraints on network architecture, several groups have successfully overcome some of these limitations, inferring transcriptional regulatory modules through a high-order analysis of microarray data combined with genotyping and phenotypic data in recombinant inbred mice (Bystrykh ; Chesler ; Hubner ; Li ).

However, a great deal of biological information is most likely contained in the absolute expression levels, including the large number of those of low magnitude that are subject to chaotic fluctuations and trigger the emergence of self-organization in complex biological systems (Auffray ). Such fluctuations are unlikely to be captured by high-order expression analysis when it only considers functional links that are simultaneously turned on or off over various conditions, and when it is limited by the current inability of high-throughput technologies to provide the accurate and consistent data required (Jarvinen ). Because of insufficient standardization in experiment description, including array element description and annotation, and irregularities in data integrity (Brazma ; Grunenfelder and Winzeler, 2002), microarrays remain an incompletely mature technology that relies on a variety of platforms and analysis tools, which are often difficult to compare. Thus, poorly documented variations exist within any given microarray data set, especially when different generations of microarrays are considered together (Hwang ; Shi ). These variations are likely to influence both first-order and high-order analyses significantly, as shown by the influence of RNA integrity on expression level measurements (Imbeaud ). They should therefore be documented using vigilant experimental and data processing pipelines rather than masked, as is currently done in most microarray studies.
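The distinction between first-order and second-order expression correlation discussed in this commentary can be illustrated with a minimal sketch. It assumes Pearson correlation at both levels, a compendium represented as a list of genes-by-conditions NumPy arrays, and purely synthetic data; the function names (`first_order_profile`, `second_order_correlation`, `make_toy_datasets`) are hypothetical, and this is not the implementation used by Zhou et al.

```python
import numpy as np


def first_order_profile(datasets, gene_a, gene_b):
    """Pearson correlation of one gene pair (doublet) within each data set.

    Returns an array with one coefficient per data set.
    """
    return np.array([np.corrcoef(d[gene_a], d[gene_b])[0, 1] for d in datasets])


def second_order_correlation(datasets, doublet_1, doublet_2):
    """Pearson correlation between the first-order profiles of two doublets."""
    profile_1 = first_order_profile(datasets, *doublet_1)
    profile_2 = first_order_profile(datasets, *doublet_2)
    return np.corrcoef(profile_1, profile_2)[0, 1]


def make_toy_datasets(n_datasets=10, n_conditions=50, seed=0):
    """Synthetic compendium (an assumption, not real data): 4 genes per data set.

    In even-numbered data sets, gene 1 tracks gene 0 and gene 3 tracks gene 2.
    In odd-numbered data sets, all four genes are independent noise.
    Genes 0 and 2 are never coexpressed with each other.
    """
    rng = np.random.default_rng(seed)
    datasets = []
    for i in range(n_datasets):
        expr = rng.normal(size=(4, n_conditions))
        if i % 2 == 0:
            expr[1] = expr[0] + 0.1 * rng.normal(size=n_conditions)
            expr[3] = expr[2] + 0.1 * rng.normal(size=n_conditions)
        datasets.append(expr)
    return datasets


datasets = make_toy_datasets()
so = second_order_correlation(datasets, (0, 1), (2, 3))
cross = first_order_profile(datasets, 0, 2)
print(f"second-order correlation of doublets (0,1) and (2,3): {so:.2f}")
print(f"mean first-order correlation of genes 0 and 2: {cross.mean():.2f}")
```

In this toy compendium the two doublets are switched on and off in the same data sets, so their second-order correlation is high even though genes 0 and 2 show no first-order coexpression. This is the kind of relationship between genes that are "not necessarily coexpressed" that a first-order analysis alone would miss.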
Authors: T R Hughes; M J Marton; A R Jones; C J Roberts; R Stoughton; C D Armour; H A Bennett; E Coffey; H Dai; Y D He; M J Kidd; A M King; M R Meyer; D Slade; P Y Lum; S B Stepaniants; D D Shoemaker; D Gachotte; K Chakraburtty; J Simon; M Bard; S H Friend Journal: Cell Date: 2000-07-07 Impact factor: 41.582
Authors: A Brazma; P Hingamp; J Quackenbush; G Sherlock; P Spellman; C Stoeckert; J Aach; W Ansorge; C A Ball; H C Causton; T Gaasterland; P Glenisson; F C Holstege; I F Kim; V Markowitz; J C Matese; H Parkinson; A Robinson; U Sarkans; S Schulze-Kremer; J Stewart; R Taylor; J Vilo; M Vingron Journal: Nat Genet Date: 2001-12 Impact factor: 38.330
Authors: Charles Auffray; Sandrine Imbeaud; Magali Roux-Rouquié; Leroy Hood Journal: Philos Trans A Math Phys Eng Sci Date: 2003-06-15 Impact factor: 4.226
Authors: Catherine Ball; Alvis Brazma; Helen Causton; Steve Chervitz; Ron Edgar; Pascal Hingamp; John C Matese; Helen Parkinson; John Quackenbush; Martin Ringwald; Susanna-Assunta Sansone; Gavin Sherlock; Paul Spellman; Christian Stoeckert; Yoshio Tateno; Ronald Taylor; Joseph White; Neil Winegarden Journal: Environ Health Perspect Date: 2004-08 Impact factor: 9.031
Authors: Doblin Sandai; Zhikang Yin; Laura Selway; David Stead; Janet Walker; Michelle D Leach; Iryna Bohovych; Iuliana V Ene; Stavroula Kastora; Susan Budge; Carol A Munro; Frank C Odds; Neil A R Gow; Alistair J P Brown Journal: mBio Date: 2012-12-11 Impact factor: 7.867