Craig P Hersh, Ian M Adcock, Juan C Celedón, Michael H Cho, David C Christiani, Blanca E Himes, Naftali Kaminski, Rasika A Mathias, Deborah A Meyers, John Quackenbush, Susan Redline, Katrina A Steiling, Holly K Tabor, Martin D Tobin, Mark M Wurfel, Ivana V Yang, Gerard H Koppelman.
Abstract
High-throughput, "next-generation" sequencing methods are now being broadly applied across all fields of biomedical research, including respiratory disease, critical care, and sleep medicine. Although there are numerous review articles and best practice guidelines related to sequencing methods and data analysis, there are fewer resources summarizing issues related to study design and interpretation, especially as applied to common, complex, nonmalignant diseases. To address these gaps, a single-day workshop was held at the American Thoracic Society meeting in May 2017, led by the American Thoracic Society Section on Genetics and Genomics. The aim of this workshop was to review the design, analysis, interpretation, and functional follow-up of high-throughput sequencing studies in respiratory, critical care, and sleep medicine research. This workshop brought together experts in multiple fields, including genetic epidemiology, biobanking, bioinformatics, and research ethics, along with physician-scientists with expertise in a range of relevant diseases. The workshop focused on application of DNA and RNA sequencing research in common chronic diseases and did not cover sequencing studies in lung cancer, monogenic diseases (e.g., cystic fibrosis), or microbiome sequencing. Participants reviewed and discussed study design, data analysis and presentation, interpretation, functional follow-up, and reporting of results. This report summarizes the main conclusions of the workshop, specifically addressing the application of these methods in respiratory, critical care, and sleep medicine research. This workshop report may serve as a resource for our research community as well as for journal editors and reviewers of sequencing-based manuscript submissions in our research field.
Keywords: RNA sequencing; bioinformatics; functional genomics; genetic epidemiology; whole-genome sequencing
Year: 2019 PMID: 30592451 PMCID: PMC6812157 DOI: 10.1513/AnnalsATS.201810-716WS
Source DB: PubMed Journal: Ann Am Thorac Soc ISSN: 2325-6621
Examples of human next-generation sequencing studies in respiratory, critical care, and sleep medicine
| Technology | Disease/Trait | Study Design/Subjects | Main Findings | Validation | Reference |
|---|---|---|---|---|---|
| Whole-exome sequencing | Narcolepsy | 18 Families | 8 Missense variants in | Resequencing in 250 cases, 150 control subjects; | |
| Whole-exome sequencing | Bronchopulmonary dysplasia | 50 Twin pairs, including 51 BPD cases | 258 Genes with rare nonsynonymous mutations | Lung gene expression in published human data and rat BPD model, mouse phenotype database | |
| Whole-exome sequencing | Airflow obstruction | 100 Heavy smokers with normal lung function | Nonsynonymous SNP in | Association testing in two additional studies. Immunohistochemistry in bronchial epithelial cells. | |
| Whole-exome sequencing | Idiopathic pulmonary fibrosis | 79 Probands with familial pulmonary fibrosis, 2,816 control subjects | Mutations in | Mutations segregated in families. Shorter leukocyte telomeres in mutation carriers. | |
| Whole-genome sequencing | Pulmonary vascular disease | 864 PAH, 16 PVOD/PCH, 7,134 control subjects | Phenotype association with younger age, reduced KCO, shorter survival | ||
| Whole-genome sequencing | Asthma | WGS in 8,453 Icelanders, imputation in >150 K | Rare variant in | Genotyping in 6,465 cases, >300 K control subjects; interleukin-33 gene expression; | |
| RNA sequencing | Smoking | Blood samples from 229 current, 286 former smokers | 171 DE genes, including 7 lncRNAs, 8 genes with differential exon use | Published microarray study | |
| RNA sequencing | COPD | Lung tissue from 98 cases, 91 control subjects | 2,312 DE genes | qPCR for seven genes | |
| Single-cell RNA sequencing | IPF | FACS-sorted lung epithelial cells from 6 IPF, 3 control subjects | 4 Cell clusters: AT2, basal, goblet, and indeterminate | Immunofluorescence confocal microscopy for epithelial cell markers | |
| miRNA sequencing | Sepsis | Plasma from 29 sepsis, 44 noninfective SIRS, 16 control subjects | 6 miRNAs distinguish sepsis from SIRS | qPCR, correlation with inflammatory cytokines | |
| miRNA sequencing | Exercise physiology | Plasma before/after treadmill exercise test, | miR-181b increased with exercise | qPCR in separate cohort ( | |
Definition of abbreviations: BPD = bronchopulmonary dysplasia; COPD = chronic obstructive pulmonary disease; DE = differentially expressed; FACS = fluorescence-activated cell sorter; IPF = idiopathic pulmonary fibrosis; KCO = carbon monoxide transfer coefficient; lncRNA = long noncoding RNA; PAH = pulmonary arterial hypertension; PCH = pulmonary capillary hemangiomatosis; PMVEC = pulmonary microvascular endothelial cells; PVOD = pulmonary veno-occlusive disease; qPCR = quantitative polymerase chain reaction; SIRS = systemic inflammatory response syndrome; SNP = single-nucleotide polymorphism; WGS = whole-genome sequencing.
Figure 1. Next-generation sequencing methodology (Illumina). Genomic DNA is fragmented, and sequencing adaptors are attached. The genomic library is then hybridized to complementary oligonucleotide probes in the flow cell chamber. Because there are adaptors on both ends, hybridization results in a bridge. Amplification leads to clusters of fragments with the same sequence. Clusters are denatured; then, sequencing-by-synthesis involves the addition of fluorescently labeled nucleotides, with serial imaging after the incorporation of each nucleotide. Reprinted by permission from Reference 116.
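In sequencing-by-synthesis, each imaging cycle yields one base call per cluster. The toy sketch below shows how a base and a Phred quality score, Q = -10 log10(p_err), could be derived from one cluster's four channel intensities per cycle and then written as a FASTQ quality string (Phred+33 encoding). It assumes one intensity per nucleotide in A, C, G, T channel order and uses a deliberately simple, hypothetical error model (the error probability is the fraction of signal outside the brightest channel). Real instruments use calibrated, chemistry-specific base callers, and the function names here are illustrative only.

```python
import math

CHANNELS = "ACGT"  # assumed order of the four fluorescence channels

def call_cycle(intensities):
    """Call one base from a cluster's four channel intensities in one imaging cycle.

    Toy error model (an assumption, not a real instrument's base caller):
    the error probability is the fraction of signal outside the brightest channel.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("no signal in this cycle")
    best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    p_err = 1.0 - intensities[best] / total
    p_err = max(p_err, 1e-4)  # floor the error probability, capping quality at Q40
    q = round(-10 * math.log10(p_err))  # Phred scale: Q = -10 * log10(p_err)
    return CHANNELS[best], q

def call_cluster(cycles):
    """Call a read from one cluster imaged over several cycles.

    Returns the base sequence and its FASTQ quality string (Phred+33 encoding).
    """
    bases, quals = [], []
    for intensities in cycles:
        base, q = call_cycle(intensities)
        bases.append(base)
        quals.append(chr(q + 33))  # Phred+33: Q10 -> '+', Q2 -> '#'
    return "".join(bases), "".join(quals)

if __name__ == "__main__":
    seq, qual = call_cluster([(900, 50, 30, 20), (40, 30, 880, 50), (300, 250, 200, 250)])
    print(seq, qual)  # AGA +*#
```

The third cycle shows why per-base qualities matter for raw-read quality control: the call is still "A", but the signal is split across channels, so its quality is very low.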
Figure 2. Workflow for a next-generation sequencing study in human disease.
Biobanks and commonly used databases for next-generation sequencing research
| Resource | URL |
|---|---|
| Biobanks and other large sequencing studies | |
| Centers for Common Disease Genomics | |
| China Kadoorie Biobank | |
| Genomics England (“100,000 Genomes Project”) | |
| Trans-Omics in Precision Medicine (TOPMed) | |
| U.K. Biobank | |
| Databases | |
| Database of Genotypes and Phenotypes (dbGaP) | |
| Ensembl genome browser | |
| Gene Expression Omnibus (GEO) | |
| Genome Aggregation Database (gnomAD) | |
| Genotype-Tissue Expression project (GTEx) | |
| Human Cell Atlas | |
| Lung Map | |
| Reference Sequence Database (RefSeq) | |
| Sequence Read Archive (SRA) | |
| University of California Santa Cruz (UCSC) Genome Browser | |
Minimal elements required in the reporting of high-throughput sequencing studies.
| Analytic Step | Required Elements |
|---|---|
| Whole-exome and genome sequencing | |
| Preprocessing and preanalysis quality control | Randomization of samples |
| | Target design, when applicable (e.g., whole-exome sequencing) |
| | Methods for quality assessment of: |
| | Raw reads |
| | Aligned reads and coverage |
| | Global data quality |
| | Ancestry of samples (comparison with study and to reference genomes) |
| Core analytics | Method of read alignment |
| | Method of variant calling |
| | Method of association analyses |
| Advanced analytics | Methods for integration with other data types |
| RNA sequencing | |
| Preprocessing and preanalysis quality control | Spike-in use |
| | Randomization of samples |
| | Number of raw reads |
| | Methods for quality assessment of: |
| | Raw reads |
| | Aligned reads |
| | Quantification of reads |
| | Reproducibility of replicates |
| | Global data quality |
| Core analytics | Method of transcript/gene identification |
| | Method of transcript/gene quantification |
| | Method of normalization |
| | Method of batch correction |
| | Method of detection of differential expression |
| Advanced analytics | Method of transcript/isoform discovery |
| | Method of indel detection |
| | Method of gene fusion detection |
| | Method of variant detection |
| | Method for single-cell analyses |
| | Methods for integration with other data types |
Required elements should also include the package or software name, version number, and settings used for the analysis.
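Randomization of samples appears in the checklist for both DNA and RNA sequencing. A minimal sketch of one common approach, stratified randomization of cases and controls across sequencing batches, is shown below, assuming a simple two-group design. The function name, the seed value, and the batch count are hypothetical; in practice the seed and assignment procedure would be reported alongside the software versions.

```python
import random
from collections import Counter

def randomize_to_batches(samples, n_batches, seed=2017):
    """Stratified randomization of samples to sequencing batches.

    samples: list of (sample_id, group) tuples, e.g. ("S01", "case").
    Within each group, sample order is shuffled and samples are dealt
    round-robin across batches, so every batch receives a similar mix of
    groups and batch is not confounded with case/control status.
    """
    rng = random.Random(seed)  # fixed, reportable seed makes the assignment reproducible
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append(sample_id)
    assignment = {}
    offset = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (offset + i) % n_batches
        offset += len(ids)  # next group continues the deal where this one stopped
    return assignment

if __name__ == "__main__":
    samples = [(f"S{i:02d}", "case" if i < 12 else "control") for i in range(24)]
    batches = randomize_to_batches(samples, n_batches=4)
    # Tally (batch, group) pairs to confirm the design is balanced
    print(sorted(Counter((batches[s], g) for s, g in samples).items()))
```

Dealing within groups, rather than shuffling all samples together, guarantees balance even in small studies where a simple shuffle could by chance place most cases in one batch.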
Software for DNA sequencing studies
| Task | Tools | URL |
|---|---|---|
| Alignment | BWA-MEM | |
| | Bowtie2 | |
| Quality control: raw reads | FastQC | |
| | FASTX-Toolkit | |
| Quality control: mapping | BamTools | |
| | Picard Tools | |
| Quality control: variants | GATK | |
| Variant calling | SAMtools | |
| | GATK UnifiedGenotyper, HaplotypeCaller, and Variant Quality Score Recalibration (VQSR) | |
| Visualization | Integrative Genomics Viewer (IGV) | |
| | UCSC Genome Browser | |
| Association analysis | PLINK 2 (common variants) | |
| | SKAT-O (rare variants) | |
| | GENESIS (rare variants) | |
| | BOLT-LMM | |
This table provides an overview of commonly used software tools for performing analysis of next-generation sequencing data. Because the field continues to evolve rapidly, additional tools not listed in this table may also be useful to researchers.
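Rare-variant tools such as SKAT-O and GENESIS aggregate variants within a gene because individual rare variants are too infrequent to test one at a time. The sketch below illustrates that aggregation idea with a simpler technique, a collapsing (carrier) burden test evaluated with Fisher's exact test; it is not SKAT-O or GENESIS. It assumes unrelated samples, hard-called genotypes, no covariates, and that the alternate allele is the minor allele; all function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one (tolerance for float error)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def burden_test(genotypes, is_case, max_maf=0.01):
    """Collapsing (carrier) burden test for one gene.

    genotypes: one list per sample of alternate-allele counts (0, 1, 2),
               one entry per variant in the gene.
    is_case:   one boolean per sample.
    A sample counts as a carrier if it has at least one alternate allele at any
    variant whose minor allele frequency in the study sample is below max_maf.
    Assumes the alternate allele is the minor allele at each rare variant.
    """
    n = len(genotypes)
    rare = []
    for j in range(len(genotypes[0])):
        af = sum(g[j] for g in genotypes) / (2 * n)  # alternate allele frequency
        if min(af, 1 - af) < max_maf:
            rare.append(j)
    carrier = [any(g[j] > 0 for j in rare) for g in genotypes]
    n_cases = sum(is_case)
    case_carriers = sum(1 for c, k in zip(carrier, is_case) if c and k)
    ctrl_carriers = sum(carrier) - case_carriers
    table = (case_carriers, n_cases - case_carriers,
             ctrl_carriers, (n - n_cases) - ctrl_carriers)
    return table, fisher_exact_two_sided(*table)
```

Collapsing gains power when most qualifying variants push risk in the same direction; variance-component tests such as SKAT-O were designed to stay robust when that assumption fails.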
Software for RNA sequencing studies
| Task | Tools | URL |
|---|---|---|
| Alignment | Bowtie | |
| | STAR | |
| | TopHat | |
| Transcript quantification | Cufflinks | |
| | eXpress | |
| | HTSeq-count | |
| | Kallisto | |
| | RSEM | |
| Quality control: raw reads | FastQC | |
| | FASTX-Toolkit | |
| Quality control: mapping | BamTools | |
| | Picard Tools | |
| | RSeQC | |
| Quality control: quantification | NOISeq | |
| Differential expression | DEGseq | |
| | DESeq2 | |
| | edgeR | |
| | limma/voom | |
| | PoissonSeq | |
| | NOISeq | |
| | Sleuth | |
| Alternative splicing | Cuffdiff 2 | |
| | DEXSeq | |
| | DSG-Seq | |
| | MISO | |
| | rSeqDiff | |
| | LeafCutter | |
| Visualization | CummeRbund | |
| | Integrative Genomics Viewer (IGV) | |
| | RNASeqViewer | |
| | SplicePlot | |
| | SpliceSeq | |
| | SplicingViewer | |
| | UCSC Genome Browser | |
This table provides an overview of commonly used software tools for performing analysis of RNA sequencing data. Because the field continues to evolve rapidly, additional tools not listed in this table may also be useful to researchers.
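"Method of normalization" is one of the required reporting elements for RNA sequencing. The sketch below, assuming raw integer gene counts and gene lengths in base pairs, shows three common schemes: counts per million (CPM), transcripts per million (TPM), and per-sample size factors by the median-of-ratios approach described for DESeq2. It is an illustration under those assumptions, not the DESeq2 or edgeR code, and the gene names in the usage are hypothetical.

```python
import math
from statistics import median

def cpm(counts):
    """Counts per million: scale a sample's gene counts by its total library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def tpm(counts, lengths_bp):
    """Transcripts per million: divide by gene length (kb) first, then rescale to 1e6."""
    rpk = {gene: c / (lengths_bp[gene] / 1000) for gene, c in counts.items()}
    scale = sum(rpk.values())
    return {gene: r / scale * 1e6 for gene, r in rpk.items()}

def median_of_ratios_size_factors(count_matrix):
    """Per-sample size factors by the median-of-ratios approach (as described for DESeq2).

    count_matrix: dict gene -> list of raw counts, one entry per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(count_matrix.values())))
    log_geo_means = {
        gene: sum(math.log(c) for c in cs) / n_samples
        for gene, cs in count_matrix.items()
        if all(c > 0 for c in cs)
    }
    # For each sample: median over genes of (count / geometric mean across samples)
    return [
        median([math.exp(math.log(count_matrix[gene][j]) - lg)
                for gene, lg in log_geo_means.items()])
        for j in range(n_samples)
    ]

if __name__ == "__main__":
    counts = {"G1": 100, "G2": 300, "G3": 600}
    print(cpm(counts))
    print(tpm(counts, {"G1": 1000, "G2": 3000, "G3": 2000}))
    print(median_of_ratios_size_factors({"G1": [10, 20], "G2": [100, 200], "G3": [50, 100]}))
```

CPM and TPM only rescale within a sample; median-of-ratios compares samples through a pseudo-reference, which makes it less sensitive to a few very highly expressed genes dominating the library size.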
Selected examples of omics integration and using omics for functional validation studies
| Technique | Example |
|---|---|
| Context-dependent eQTLs | Li and colleagues showed that cytokine production by peripheral blood mononuclear cells after stimulation depends on six specific SNPs. |
| Imputed gene expression (PrediXcan) | Ferreira and colleagues tested for associations between asthma and 17,190 genes found to have cis- and/or trans-eQTLs across 12 cell types relevant to asthma. |
| Gene knockdown | Dixit and colleagues investigated the effect of gene knockdown by CRISPR/Cas9 on single-cell RNA-seq expression profiles in LPS-stimulated mouse bone marrow-derived dendritic cells, a method they called Perturb-seq. |
Definition of abbreviations: eQTL = expression quantitative trait locus; LPS = lipopolysaccharide; QTL = quantitative trait locus; SNP = single-nucleotide polymorphism.
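PrediXcan-style analyses, listed in the table above, estimate each individual's genetically predicted expression of a gene as a weighted sum of allele dosages at predictor SNPs, with weights trained in a reference eQTL dataset, and then test predicted expression for association with a trait. The sketch below shows only the scoring step and a simple Pearson correlation with a quantitative trait; actual analyses use regression with covariates. The SNP identifiers, weights, and trait values are hypothetical.

```python
import math

def predicted_expression(dosages, weights):
    """Genetically predicted expression of one gene for one individual.

    dosages: dict SNP id -> allele dosage (0 to 2) for this individual.
    weights: dict SNP id -> prediction weight, assumed to come from a model
             trained in a reference eQTL dataset.
    A missing dosage raises KeyError rather than being silently treated as 0.
    """
    return sum(w * dosages[snp] for snp, w in weights.items())

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    weights = {"snp_1": 0.4, "snp_2": -0.2}  # hypothetical weights for one gene
    cohort = [{"snp_1": 2, "snp_2": 0}, {"snp_1": 1, "snp_2": 1}, {"snp_1": 0, "snp_2": 2}]
    trait = [3.0, 2.0, 1.0]  # hypothetical quantitative trait values
    expr = [predicted_expression(d, weights) for d in cohort]
    print(expr, pearson_r(expr, trait))
```

Because predicted expression depends only on genotype, an association with the trait points to a genetically regulated component of expression rather than to expression changes caused by the disease itself.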