Amparo Martínez-Pérez, Olivia Estévez, África González-Fernández.
Abstract
While tuberculosis (TB) infection remains a serious challenge worldwide, big data and "omic" approaches have greatly contributed to the understanding of the disease. Transcriptomics has been used to address a wide variety of questions, including diagnosis, treatment evolution, latency and reactivation, novel target discovery, vaccine response, and biomarkers of protection. Although transcriptomics is a powerful tool, its high cost and the difficulty of interpreting the data may keep it from reaching its full potential. Technological progress and collaboration among multidisciplinary groups might be key to its exploitation. Here, we discuss the main fields explored in TB using transcriptomics and identify the challenges that need to be addressed for a real implementation in TB diagnosis, prevention and therapy.
Keywords: RNA-sequencing; drug resistance; immune response; microarray; mycobacteria; transcriptomics; tuberculosis
Year: 2022 PMID: 35283833 PMCID: PMC8908424 DOI: 10.3389/fmicb.2022.835620
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. Timeline of main milestones in technologies for gene expression. Polymerase Chain Reaction (PCR) was invented in 1984 by Mullis (1990). A decade later, Real-Time qPCR (RT-qPCR) technology enabled the detection of PCR products in real time (Higuchi et al., 1993). This allowed, for the first time, the quantification of mRNA expression of selected genes. Simultaneous analysis of multiple genes became possible with microarray, Expressed Sequence Tag (EST) (Okubo et al., 1992), Serial Analysis of Gene Expression (SAGE) (Velculescu et al., 1995), Massively Parallel Signature Sequencing (MPSS) (Brenner et al., 2000), Cap Analysis of Gene Expression (CAGE) (Shiraki et al., 2003) and Reverse-transcriptase multiplex ligation-dependent probe amplification (RT-MLPA) (Eldering et al., 2003) technologies. Although the DNA hybridization method was described earlier, microarray technology is considered to have been first commercialized by Affymetrix in 1994 (Lenoir and Giannella, 2006). It was promptly applied to measure gene expression (Schena et al., 1995) and remains one of the most popular methods for gene expression analysis, allowing hundreds or thousands of genes to be analyzed. The development of Next Generation Sequencing (NGS) in the early 2000s revolutionized both genomics and transcriptomics (Mardis, 2011). The first publication using NGS RNA-Seq technology appeared in 2006 (Bainbridge et al., 2006), but the term RNA-seq did not come into use until 2008 (Lister et al., 2008; Mortazavi et al., 2008). RNA-Seq was first applied to single cells in 2009 (Tang et al., 2009). Emerging applications, such as spatial transcriptomics (Ståhl et al., 2016), hold the promise of new advances in transcriptomics research.
Advantages and disadvantages of microarrays and RNA-sequencing for gene expression analysis.
| | Microarray | RNA-seq |
| Advantages | • Standardized analysis: Easier to analyze. | • Broader dynamic range (> 10⁵) and sensitivity ( |
| Disadvantages | • Lower dynamic range (10³–10⁴). | • Large size of files: Demands considerable amount of computer resources for storage and analysis. |
Public repositories for transcriptomic data in health sciences.
| Repository | Host institution | File storage | Data type |
| Gene Expression Omnibus (GEO) | National Center for Biotechnology Information (NCBI) | NCBI Sequence Read Archive (SRA) | Functional genomics data generated from microarray or NGS platforms |
| BioStudies (formerly ArrayExpress) | European Bioinformatics Institute (EMBL-EBI) | European Nucleotide Archive (ENA) | Functional genomics data generated from microarray or NGS platforms |
| DDBJ Sequence Read Archive (DRA) | DNA Data Bank of Japan (DDBJ) | DDBJ Sequence Read Archive (DRA) | Functional genomics data generated from NGS platforms |
| Genomic Data Commons (GDC) | National Cancer Institute (NCI) | Genomic Data Commons (GDC) | Functional genomics data generated from NGS platforms in cancer |
| Genome Sequence Archive (GSA) | National Genomics Data Center (NGDC), China National Center for Bioinformation (CNCB) | Genome Sequence Archive (GSA) | Raw sequence reads from diverse sequencing platforms |
NGS, Next-Generation Sequencing.
FIGURE 2. Workflow in RNA-Seq analysis. The transcriptome can be sequenced either from the messenger RNA (mRNA) fraction or from total RNA, which also includes ribosomal RNA and transfer RNA. (1) RNA sequencing generates a large amount of data from the millions of sequenced fragments (reads) and stores the information in a FASTQ file. (2) Pre-processing steps are commonly performed, including quality checks, trimming, filtering, or error correction. (3) If an annotated genome is available, the sequenced reads are mapped onto the reference genome to identify each transcript and its corresponding gene. In this case, splice-aware aligners, which align reads across splice junctions, are recommended. If no reference genome is available, the reads are instead assembled de novo from their overlapping regions to form contigs. (4) Next, quantification determines the number of raw reads that map to each transcript or gene and commonly normalizes them so that they can be compared between samples. The most commonly used normalizations are “Reads Per Kilobase Million” (RPKM), its alternative “Fragments Per Kilobase Million” (FPKM), and “Transcripts Per Million” (TPM). (5) Then, differential expression (DE) analysis uses statistical methods to identify the genes whose expression changes under particular circumstances, yielding the gene expression profile associated with a given condition. (6) The result of a DE analysis is a list of DE genes that can sometimes contain hundreds or even thousands of genes. A downstream analysis, such as Gene Set Analysis (GSA) or Gene Set Enrichment Analysis (GSEA), is usually needed to interpret the results. RNA-seq data also support many other analyses, such as the identification of single nucleotide polymorphisms or nucleotide insertions and deletions.
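The normalization in step (4) can be illustrated with a short calculation. The following is a minimal sketch, not taken from the article, of how RPKM and TPM are conventionally computed from raw read counts and gene lengths for a single sample. The function names and the toy counts and lengths are illustrative assumptions.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads, for one sample.

    counts     -- raw read counts per gene (hypothetical toy data below)
    lengths_bp -- gene lengths in base pairs, same order as counts
    """
    total_reads = sum(counts)
    if total_reads == 0:
        raise ValueError("sample has no mapped reads")
    # Divide by length in kilobases and by library size in millions:
    # count / (length / 1e3) / (total / 1e6) = count * 1e9 / (total * length)
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million, for one sample."""
    # First normalize by gene length (reads per kilobase) ...
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    if scale == 0:
        raise ValueError("sample has no mapped reads")
    # ... then rescale so the values of the sample sum to one million.
    return [r / scale * 1e6 for r in rates]


if __name__ == "__main__":
    # Toy example: three genes with the same read density
    # (100 reads per kilobase) but different lengths.
    counts = [100, 200, 700]
    lengths = [1000, 2000, 7000]
    print("RPKM:", rpkm(counts, lengths))
    print("TPM: ", tpm(counts, lengths))
```

In this toy sample the three genes have the same read density, so both measures assign them identical values even though their raw counts differ sevenfold. TPM values sum to one million within each sample by construction, whereas the sum of RPKM values varies from sample to sample.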
Summary of high-throughput transcriptomic applications in TB research.
| Field in TB research | Applications | References |
| Mechanisms of bacterial infection | • Characterize Mtb growing | |
| | • Characterize mutant isogenic strains and clinical isolates. | |
| | • Characterize Mtb infecting cells | |
| | • Gene expression changes during | |
| | • Biofilm production. | |
| Mechanisms of latency | • Model of latency | |
| Mechanisms of host response | • Characterize the host immune response to infection, analyzing blood or lung tissue. | |
| | • Understand the bases of early clearance. | |
| | • Function of host non-coding regulatory RNA. | |
| | • Dual transcriptomic analysis to comprehend host-bacteria interaction. | |
| Diagnosis | • Identify blood biomarkers that differentiate active, latent TB patients or healthy individuals. | |
| | • Biomarkers that differentiate TB from other infectious diseases. | |
| | • Biomarkers for extrapulmonary TB. | |
| | • Biomarkers for HIV-TB coinfection. | |
| Treatment evolution | • Identify biomarkers of success/failure to anti-TB treatment. | |
| Progression to TBI | • Finding biomarkers that predict progression to active TBI. | |
| | • Characterize patients at risk of recurrent TB. | |
| Drug resistance and search for novel drugs | • Understand mechanisms underlying Mtb single and multi-drug resistance | |
| | • Identify bacterial candidates for drug targeting. | |
| | • Unravel the mechanism of action of novel compounds. | |
| Vaccines and correlates of protection | • BCG vaccination effect | |
| | • Characterization of diverse BCG strains and effect in vaccination. | |
| | • Profile immune response generated by novel TB vaccine candidates. | |
| | • Search for correlates of protection for new vaccines design or therapies. | |
BCG, Bacillus Calmette-Guérin; HIV, Human Immunodeficiency Virus; Mtb, Mycobacterium tuberculosis; MTBC, Mycobacterium Tuberculosis Complex; TB, Tuberculosis.
FIGURE 3. Variables when designing high-throughput transcriptomic studies. Diverse parameters must be taken into account when designing new TB studies involving high-throughput transcriptomics (HTTr). They include: (1) the infection model: in vitro (cell infection) or in vivo (animals or TB patients); (2) the type of bacteria (i.e., strain, mutated or wild-type) and culture conditions; (3) the samples to be collected (saliva, sputum, blood, etc.), including analysis of immune cells or subpopulations, at basal level or after stimulation; (4) the stage of the disease (i.e., active disease, latency, early clearance); (5) the target element: bacteria, host, or both; and (6) the HTTr platform best suited to the study.