Damien Chaussabel, Virginia Pascual, Jacques Banchereau.
Abstract
Blood is the pipeline of the immune system. Assessing changes in transcript abundance in blood on a genome-wide scale affords a comprehensive view of the status of the immune system in health and disease. This review summarizes the work that has used this approach to identify therapeutic targets and biomarker signatures in the field of autoimmunity and infectious disease. Recent technological and methodological advances that will carry the blood transcriptome research field forward are also discussed.Entities:
Mesh:
Substances:
Year: 2010 PMID: 20619006 PMCID: PMC2895587 DOI: 10.1186/1741-7007-8-84
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Figure 1. Blood is the pipeline of the immune system. Transcriptional profiling in the blood consists of measuring RNA abundance in circulating nucleated cells. Changes in transcript abundance can result from exposure to host- or pathogen-derived immunogenic factors (for example, pathogen-derived molecular patterns activating specialized pattern recognition receptors expressed at the surface of leukocytes) and/or changes in relative cellular composition (for example, influx of immature neutrophils occurring in response to bacterial infection). The main leukocyte populations circulating in the blood are represented in this figure. Each cell type has a specialized function. Eosinophils, basophils and neutrophils are innate immune effectors playing a key role in defense against pathogens. T lymphocytes are the mediators of the adaptive cellular immune response. Antibody-producing B lymphocytes (plasma cells) are key effectors of the humoral immune response. Monocytes, dendritic cells and B lymphocytes present antigens to T lymphocytes and play a central role in the development of the adaptive immune response. Blood leukocytes can be exposed in the circulation to factors released systemically from tissues where pathogenic processes take place. In addition, leukocytes will cross the endothelial barrier to reach local sites of inflammation. Dendritic cells exposed to inflammatory factors in tissues will be transported via the lymphatic system and reach lymph nodes via the afferent lymphatic vessels. These dendritic cells will encounter naïve T cells that are transported to the lymph node via high endothelial venules. 'Educated' T cells will then exit the lymph node via efferent lymph vessels that collect in the thoracic lymph duct, which in turn connects to the subclavian vein, at which point these T cells rejoin the blood circulation.
Figure 2. The immune profiling armamentarium. The number of high-throughput molecular and cellular profiling tools that can be used to profile the human immune system is increasing rapidly. Proteomic assays are used to determine antibody specificity or measure changes in serum levels of cytokines or chemokines using multiplex assays. Cellular profiling assays are used to phenotype immune cells based on intracellular or extracellular markers using polychromatic flow cytometry. In vitro cellular assays can measure innate or antigen-specific responsiveness in cells exposed to immunogenic factors. Genomic approaches consist of measuring abundance of cellular RNA and also microRNAs that are present in cells or in the serum. Other genomic approaches consist of determining gene sequence and function (for example, genome-wide association studies, RNA interference screens, exome sequencing).
Figure 3. RNA profiling technologies. Several technology platforms are available for measuring RNA abundance on large scales. Microarray technologies rely on dense arrays of oligonucleotide probes used to capture complementary sequences present in biological samples at various concentrations. Following extraction, RNA is used as a template and amplified in a labeling reaction. The labeled material captured by the microarray is imaged and relative abundance determined based on the strength of the signal produced by the fluorochromes that serve as reporters in this assay. The Nanostring technology measures RNA abundance at the single molecule level. RNA serves as starting material for this assay, which does not involve the use of enzymes for amplification or labeling. Capture and reporter probes form complexes in solution with RNA molecules. These complexes are captured on a solid surface and imaged. Molecule counts are generated based on the number of reporter probes detected on the image. The reporter consists of a string of seven fluorochromes, with four different colors available to fill each position. Up to 500 different transcripts can be detected in a single reaction on this platform. For RNA sequencing (RNA-seq) the starting RNA population must first be converted into a library of cDNA fragments. High-throughput sequencing of such fragments yields short sequences, or reads, that are typically 30 to 400 bp in length. For a given sample tens of millions of such sequences will then be uniquely mapped against a reference genome. The density of coverage for a given gene determines its relative level of expression. Similarities and differences between these technology platforms should be noted. For instance, microarrays and Nanostring technologies rely on oligonucleotide probes to capture complementary target sequences. Nanostring and RNA-seq technologies measure abundance at the single molecule level, with results expressed as molecule counts and sequence coverage, respectively. Microarray and RNA-seq technologies require extensive sample processing, which includes amplification steps. dsDNA, double-stranded DNA.
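The coverage-based quantification described for RNA-seq can be illustrated with a minimal sketch. The RPKM metric used here (reads per kilobase of transcript per million mapped reads) is one common way to turn read counts into a length- and depth-normalized expression level; the gene name and numbers are purely illustrative:

```python
def rpkm(counts, gene_lengths_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Normalizes raw read counts for gene length (longer genes
    accumulate more reads) and sequencing depth (deeper runs
    yield more reads overall).
    """
    return {
        gene: counts[gene]
        / (gene_lengths_bp[gene] / 1e3)      # per kilobase
        / (total_mapped_reads / 1e6)          # per million reads
        for gene in counts
    }

# Hypothetical example: 500 reads mapping to a 2 kb gene,
# out of 10 million uniquely mapped reads in the sample.
expr = rpkm({"GENE_A": 500}, {"GENE_A": 2000}, 10_000_000)
print(expr["GENE_A"])  # 500 / 2 kb / 10 M-reads = 25.0 RPKM
```

Note that RPKM allows comparison of relative coverage within a sample; cross-sample comparisons in practice use further normalization, as discussed in the data mining primer below.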
Figure 4. Data management is key to progress. Extensive cellular and molecular profiling of human subjects generates vast amounts of disparate data. Effective data management and integration solutions are essential to the preservation of this information in an interpretable form. Thus, data management efforts occurring 'behind the scenes' have an essential role to play in realizing the full potential of high throughput profiling approaches in human subjects.
A data mining primer: basic steps used for analysing microarray data
Here we provide basic analysis steps and important considerations for microarray data analysis:
- Per-chip normalization: This step controls for array-wide variations in intensity across multiple samples that form a given dataset. Arrays, as with all fluorescence-based assays, are subject to signal variation for a variety of reasons, including the efficiency of the labeling and hybridization reactions and possibly other, less well defined variables, such as reagent quality and sample handling. To control for this, samples are normalized by first subtracting background and then employing a normalization algorithm to rescale the difference in overall intensity to a fixed intensity level for all samples across multiple arrays.
- Data filtering: Typically more than half of the oligonucleotide probes present on a microarray do not detect a signal for any of the samples in a given analysis. Thus, a detection filter is applied to exclude these transcripts from the original dataset. This step avoids the introduction of unnecessary noise in downstream analyses.
- Unsupervised analysis: The aim of this analysis is to group samples on the basis of their molecular profiles without a priori knowledge of sample class or phenotype.
- Clustering: Clustering is commonly used for the discovery of expression patterns in large datasets. Hierarchical clustering is an iterative agglomerative clustering method that can be used to produce gene trees and condition trees. Condition tree clustering groups samples based on the similarity of their expression profiles across a specified gene list. Other commonly employed clustering algorithms include k-means clustering and self-organizing maps.
- Class comparison: Such analyses identify genes that are differentially expressed among study groups ('classes') and/or time points. The methods for analysis are chosen based on the study design. For studies with independent observations and two or more groups, a t-test (two groups) or analysis of variance (more than two groups) is typically applied to each transcript.
- Multiple testing correction: Multiple testing correction (MTC) methods mitigate the noise in sets of transcripts identified by class comparison by limiting the number of false positives. While it reduces noise, MTC promotes a higher false negative rate as a result of dampening the signal. The available methods vary in stringency, and therefore produce gene lists with different levels of robustness.
• Bonferroni correction is the most stringent method used to control the familywise error rate (probability of making one or more type I errors) and can drastically reduce false positive rates. Conversely, it increases the probability of having false negatives.
• Benjamini and Hochberg false discovery rate correction is less stringent: it controls the expected proportion of false positives among the transcripts declared significant, retaining more true positives at the cost of admitting some false positives.
- Class prediction: Class prediction analyses assess the ability of gene expression data to correctly classify a study subject or sample. K-nearest neighbors is a commonly used technique for this task. Other available class prediction procedures include, but are not limited to, discriminant analysis, general linear model selection, logistic regression, distance scoring, partial least squares, partition trees, and radial basis machine.
- Sample size: The number of samples necessary for the identification of a robust signature is variable. Indeed, sample size requirements will depend on the amplitude of the difference between, and the variability within, study groups.
A number of approaches have been devised for the calculation of sample size for microarray experiments, but to date little consensus exists.
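The core of the pipeline above (per-chip normalization, detection filtering, class comparison, and multiple testing correction) can be sketched end to end with NumPy on a toy dataset. This is a minimal illustration under simplifying assumptions, not a production workflow: the median-scaling normalization, the detection threshold, and the normal approximation to the t-test p-values are all placeholders for the more careful choices a real analysis would make.

```python
from math import erf, sqrt
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 200 probes x 10 samples (5 controls, then 5 patients).
# The first 20 probes are made differentially expressed in patients.
data = rng.lognormal(mean=5, sigma=0.3, size=(200, 10))
data[:20, 5:] *= 3.0
groups = np.array([0] * 5 + [1] * 5)

# 1. Per-chip normalization: rescale each array (column) so that
#    all samples share the same median intensity.
data = data * (np.median(data) / np.median(data, axis=0))

# 2. Detection filter: keep probes whose signal exceeds a floor
#    in at least one sample (the threshold here is arbitrary).
data = data[(data > 10).any(axis=1)]

# 3. Class comparison: two independent groups -> Welch t-statistic
#    on log2 intensities, computed per probe.
log2 = np.log2(data)
a, b = log2[:, groups == 0], log2[:, groups == 1]
va = a.var(axis=1, ddof=1) / a.shape[1]
vb = b.var(axis=1, ddof=1) / b.shape[1]
t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(va + vb)

# Two-sided p-values via a normal approximation to the t statistic
# (a real analysis would use the t distribution, e.g. via SciPy).
p = np.array([2 * (1 - 0.5 * (1 + erf(abs(ti) / sqrt(2)))) for ti in t])

# 4a. Bonferroni: multiply each p-value by the number of tests.
bonf = np.minimum(p * len(p), 1.0)

# 4b. Benjamini-Hochberg FDR: sort p-values, scale the i-th smallest
#     by m/i, then enforce monotonicity from the largest down.
m = len(p)
order = np.argsort(p)
q = p[order] * m / np.arange(1, m + 1)
bh = np.empty(m)
bh[order] = np.minimum(np.minimum.accumulate(q[::-1])[::-1], 1.0)

print("significant after Bonferroni:", int((bonf < 0.05).sum()))
print("significant after BH FDR:   ", int((bh < 0.05).sum()))
```

As the primer notes, the BH-adjusted list is never shorter than the Bonferroni-adjusted one: BH trades a controlled fraction of false positives for fewer false negatives.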
Figure 5. Blood transcriptional fingerprints of patients with S. aureus infection. Relative changes in transcript abundance in the blood of patients with S. aureus infection compared to that of healthy controls are recorded for a set of 28 transcriptional modules. Colored spots represent relative increase (red) or decrease (blue) in transcript abundance (P < 0.05, Mann-Whitney) within a module. The legend shows functional interpretation for this set of modules. Fingerprints have been generated for two independent cohorts of subjects (divided into a training set used in the discovery phase, n = 30, and an independent test set used in the validation phase, n = 32).