Elise A R Serin1, Harm Nijveen2, Henk W M Hilhorst1, Wilco Ligterink1.
Abstract
Plants are fascinating and complex organisms. A comprehensive understanding of the organization, function and evolution of plant genes is essential to disentangle important biological processes and to advance crop engineering and breeding strategies. The ultimate aim in deciphering complex biological processes is the discovery of causal genes and regulatory mechanisms controlling these processes. The recent surge of omics data has opened the door to a system-wide understanding of the flow of biological information underlying complex traits. However, dealing with the corresponding large data sets is challenging and calls for the development of powerful bioinformatics methods. A popular approach is the construction and analysis of gene networks. Such networks are often used for genome-wide representation of the complex functional organization of biological systems. Networks based on similarity in gene expression are called (gene) co-expression networks. One of the major applications of gene co-expression networks is the functional annotation of unknown genes. Constructing co-expression networks is generally straightforward. In contrast, the resulting network of connected genes can become very complex, which limits its biological interpretation. Several strategies can be employed to enhance the interpretation of the networks. Inferring reliable networks requires a strategy that is consistent with the biological question addressed. Additional benefits can be gained from network-based strategies that use prior knowledge and data integration to further elucidate gene regulatory relationships. As a result, biological networks offer many more applications beyond the simple visualization of co-expressed genes. In this study we review the different approaches for co-expression network inference in plants. We analyse integrative genomics strategies used in recent studies that successfully identified candidate genes by taking advantage of gene co-expression networks. Additionally, we discuss promising bioinformatics approaches that predict networks for specific purposes.
Keywords: co-expression; gene expression; gene networks; gene prioritization; transcriptomics
Year: 2016 PMID: 27092161 PMCID: PMC4825623 DOI: 10.3389/fpls.2016.00444
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Co-expression network inference pipeline. The biological question addressed drives the strategy for the co-expression network analysis: prior knowledge can be used to identify guide-genes, and co-expression databases can be queried to investigate gene co-expression patterns across multiple conditions. Similarity in gene expression patterns is calculated using correlation coefficients (Pearson, Spearman, …). A user-defined threshold (in this example set at 0.8) enables the selection of genes with high co-expression scores. Significantly co-expressed gene pairs are reported in the binary adjacency matrix as 1. A clustering algorithm is applied to the adjacency matrix to infer networks of significantly co-expressed genes. In the resulting network, significantly co-expressed genes are depicted as numbered nodes (vertices) linked by edges (links). The length of an edge reflects the expression similarity of the connected genes, with a short edge corresponding to a high co-expression value. The length of a "path" corresponds to the number of edges connecting two nodes (the shortest path from node 9 to node 4 is 4 edges). Hubs are identified as highly connected nodes (node 1), and groups of connected genes form modules (nodes 1–7). Network properties can be described by different parameters such as:
• The connectivity of a network corresponds to the total number of links in the network.
• The node degree corresponds to the number of connections of a node with other nodes in the network (node 4 has a node degree of 3).
• The betweenness of a node corresponds to the number of shortest paths, over all pairs of nodes in the network, that pass through that specific node (the betweenness of node 8 counts the shortest paths connecting nodes 10–9, 3–9, 4–9, etc.).
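The pipeline in Figure 1 can be illustrated with a minimal sketch. The code below is not the implementation used in any of the reviewed studies; the toy expression values, the function names and the choice to threshold only positive Pearson correlations are assumptions made for illustration. It computes pairwise Pearson correlations, applies the 0.8 threshold from the caption to obtain a binary adjacency matrix, and then derives the node degree, the connectivity and a shortest path length.

```python
from collections import deque
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles.
    Assumes neither profile is constant (otherwise the denominator is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency_matrix(expr, threshold=0.8):
    """Binary adjacency matrix: 1 where the correlation reaches the threshold.
    Only positive correlations are kept here (an assumption of this sketch)."""
    n = len(expr)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(expr[i], expr[j]) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj

def node_degree(adj, node):
    """Number of connections of one node."""
    return sum(adj[node])

def connectivity(adj):
    """Total number of links (each undirected edge is stored twice)."""
    return sum(map(sum, adj)) // 2

def shortest_path_length(adj, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbour, linked in enumerate(adj[node]):
            if linked and neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None

# Hypothetical toy data: 4 genes (rows) measured in 6 samples (columns).
expression = [
    [1, 2, 3, 4, 5, 6],
    [1, 3, 2, 5, 4, 6],
    [1, 4, 2, 6, 3, 5],
    [6, 5, 4, 3, 2, 1],
]

adj = adjacency_matrix(expression, threshold=0.8)
print("adjacency:", adj)
print("degrees:", [node_degree(adj, i) for i in range(len(adj))])
print("connectivity:", connectivity(adj))
print("shortest path 0 -> 2:", shortest_path_length(adj, 0, 2))
```

In practice the threshold would be chosen with the biological question in mind, and a clustering algorithm would then be run on the adjacency matrix to extract modules; that step is left out of this sketch.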
Overview of available resources for co-expression network analysis.
| Resource | Description | Species | Reference |
|---|---|---|---|
| Data availability and data selection for co-expression network analysis | | | |
| BAR-eFP browser | Interactive visualization of gene expression | Arabidopsis | Winter et al., |
| GEO | Public functional genomics data repository | Several species | Edgar et al., |
| Genevestigator | Database for curated gene expression data | Several species | Hruz et al., |
| Phytozome | Comparative platform for plant genomics | Several species | Goodstein et al., |
| ArrayExpress | Database for large functional genomics | Several species | Brazma, |
| ATTED-II | Gene co-expression database | Several species | Obayashi et al., |
| Cressexpress | Co-expression analysis for Arabidopsis | Arabidopsis | Srinivasasainagendra et al., |
| GeneMANIA | Interactive network displaying various functional associations | Arabidopsis | Warde-Farley et al., |
| AraNet | Probabilistic functional gene network of Arabidopsis | Arabidopsis | Lee et al., |
| CORNET | Co-expression analysis on predefined or user-defined experiments | Arabidopsis | De Bodt et al., |
| PLANEX | Plant gene co-expression database | Several species | Yim et al., |
| Oryza Express | Gene expression database for rice | Rice | Hamada et al., |
| RiceFriend | Gene expression database for rice | Rice | Sato et al., |
| Cytoscape | Visualization and analysis of co-expression networks | | Shannon et al., |
| GraphViz | Visualization and analysis of co-expression networks | | Gansner and North, |
| Gene prioritization | | | |
| Blast2GO | Identify and visualize enriched GO terms in ranked lists of genes | | Conesa et al., |
| biNGO | | | Maere et al., |
| KEGG (pathways) | Collection of manually drawn pathways | Several species | Kanehisa and Goto, |
| BioCyc | Pathway and genome database | Several species | Caspi et al., |
| Mapman | Display large data sets on diagrams of metabolic maps | Several species | Thimm et al., |
| plantTFDB | Plant transcription factor database | Several species | Jin et al., |
| PLACE | Database of motifs found in cis-acting regulatory elements | Arabidopsis | Higo et al., |
| AGRIS and AtregNet | Information resource of Arabidopsis promoter sequences, transcription factors and targets | Arabidopsis | Palaniswamy et al., |
| PubTator | Web-based tool for accelerating manual literature curation | | Wei et al., |
| EVEX | Large-scale text mining resource | | Hakala et al., |
| TAIR | The Arabidopsis Information Resource for mutant phenotype information | Arabidopsis | Lamesch et al., |
| Comparative co-expression network analysis | | | |
| ComplEX | Explore and compare sub-networks of three species | Arabidopsis, poplar and rice | Netotea et al., |
| CoExpNetViz | Comparative co-expression analysis for bait genes | Several species | Tzfadia et al., |
| PLAZA | Database to explore gene families and genomic homology | Several species | Proost et al., |
Examples of strategies used for co-expression network analysis in regard to the respective biological question addressed.
| Approach | Biological question | Species | Strategy | Reference |
|---|---|---|---|---|
| Data availability for co-expression network analysis | Identify functional modules associated with germination and dormancy | Arabidopsis | Use of a condition-dependent approach | Bassel et al., |
| | Build a comprehensive and functional co-expression network | Arabidopsis, rice | Integration of multiple sources of data in the network construction to support functional gene linkage | Lee et al., |
| | Gene functional annotation | Rice | Comparison of condition-dependent and condition-independent network-based approaches | Childs et al., |
| | Maximize the capture of gene co-expression relationships | Arabidopsis | Pre-clustering of input expression samples to approximate a condition-dependent approach | Feltus et al., |
| Gene prioritization | Explore the modular biological organization | Arabidopsis | Arabidopsis gene co-expression network based on 1000 microarrays. Modules were extracted using the Markov Clustering Algorithm (MCL) | Mao et al., |
| | Infer gene regulatory relationships in gene co-expression modules | Arabidopsis | Identify gene expression modules driven by known | Ma et al., |
| | Gene functional annotation | Arabidopsis | Module enrichment for known | Vandepoele et al., |
| | Identify co-expression modules | Arabidopsis | Development of a heuristic clustering algorithm | Mutwil et al., |
| eQTL based co-expression networks | Identify causal genes responsible for glucosinolate variation | Arabidopsis | Use co-expression network as non-genetic (independent) filter to prioritize GWA mapping candidates | Chan et al., |
| | Identify candidates for shade avoidance | Arabidopsis | Prioritize genes underlying phenotypic QTL using co-expression network analysis, eQTL information and functional classification | Jimenez-Gomez et al., |
| | Examine natural variation in circadian clock function | Arabidopsis | eQTL mapping using | Kerwin et al., |
| | Examine transcriptional network response to biotic interactions | Arabidopsis | Perform a network eQTL analysis from | Kliebenstein et al., |
| | Identify novel abiotic stress genes | Arabidopsis | Network-guided genetic screen: gene ranking combined with co-expression network analysis | Ransbotyn et al., |
| Temporal resolution for co-expression network | Resolve the chronological regulatory mechanisms involved in the response to pathogen infection | Arabidopsis | Temporal clustering by combining extensive time series data and co-expression network analysis | Windram et al., |
| | Identify key genes regulating the acquisition of longevity during seed maturation | Medicago, Arabidopsis | Developmental time course data and cross-species comparison for co-expression network analysis | Righetti et al., |
| Spatial resolution for dynamic co-expression network | Identify cell-specific molecular mechanisms | Maize | Combine laser-capture microscopy with RNA-seq | Zhan et al., |
| Comparative co-expression network analysis | Knowledge transfer between species | Maize, rice | Global co-expression network alignment using both gene homology and network topology | Ficklin and Feltus, |
| | Identify conserved modules across species | Several species | Co-expressed node vicinity networks (NVNS) compared across species | Mutwil et al., |
Figure 2. Schematic representation of gene prioritization strategies. Gene sets of different expression values (shades of green) are used for co-expression network inference. Genes with co-expression values above a user-defined threshold (dark green nodes) form nodes and edges in the network. Various additional data can then be used to enrich the network and extract biologically relevant information from it. Enrichment analysis, for example of gene ontology terms (pink contour nodes), can be used to functionally annotate unknown genes (question-marked node) clustered in the vicinity. Prior knowledge can also help to highlight known gene-gene interactions (dotted line), and cis-regulatory motifs (purple flags) can suggest local regulatory interactions (arrows) between transcription factors (TF node) and their target genes (flagged nodes). Gene regulatory relationships can also be extracted from time series data: algorithms can infer causal regulatory relationships from shifted gene expression patterns. Co-localization of trans- and cis-eQTLs (hotspots) can also suggest regulatory relationships between genes with a cis-eQTL (orange contour node) and genes with trans-eQTLs (blue contour node). Additional information can be gained from comparisons with networks of other species (yellow nodes) by orthology and network alignment (dotted lines).
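The enrichment step in Figure 2 is commonly framed as a one-sided hypergeometric test: given a module of co-expressed genes, how likely is it to contain at least the observed number of genes carrying a given annotation by chance? The sketch below shows that calculation under stated assumptions. The gene identifiers, the function names and the single-term setup are hypothetical, and the tools listed in the resources table may use different statistics and apply multiple-testing correction, which is omitted here.

```python
from math import comb

def hypergeom_tail(population, annotated, sample, overlap):
    """P(X >= overlap) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, sample)

def term_enrichment(background, term_genes, module_genes):
    """Return (overlap, p-value) for one annotation term and one module.
    Genes outside the background set are ignored."""
    background = set(background)
    term = set(term_genes) & background
    module = set(module_genes) & background
    overlap = len(term & module)
    p_value = hypergeom_tail(len(background), len(term), len(module), overlap)
    return overlap, p_value

# Hypothetical example: 20 background genes, 5 of them annotated with a term,
# and a 4-gene module in which 3 genes carry that term.
background = [f"gene{i:02d}" for i in range(1, 21)]
term_genes = ["gene01", "gene02", "gene03", "gene04", "gene05"]
module_genes = ["gene01", "gene02", "gene03", "gene10"]

overlap, p_value = term_enrichment(background, term_genes, module_genes)
print(f"overlap = {overlap}, p = {p_value:.4f}")
```

A small p-value suggests that the term is over-represented in the module, which is the basis for transferring the annotation to unknown genes in the same module. When many terms are tested, the p-values would need to be corrected for multiple testing.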