
Integrative genomics viewer.

James T Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S Lander, Gad Getz, Jill P Mesirov.

Year: 2011 PMID: 21221095 PMCID: PMC3346182 DOI: 10.1038/nbt.1754

Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908


To the Editor

Rapid improvements in sequencing and array-based platforms are producing a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and non-coding RNAs, SNP and copy number profiling, and functional assays. Analysis of these large, diverse datasets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools that scale to very large datasets and flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of the data pose a significant challenge to the development of such tools. To address this challenge we developed the Integrative Genomics Viewer (IGV), a lightweight visualization tool that enables intuitive real-time exploration of diverse, large-scale genomic datasets on standard desktop computers. It supports flexible integration of a wide range of genomic data types, including aligned sequence reads, mutations, copy number, RNAi screens, gene expression, methylation, and genomic annotations (Figure S1). IGV uses efficient, multi-resolution file formats to enable real-time exploration of arbitrarily large datasets at all resolution scales while consuming minimal resources on the client computer (see Supplementary Text). Navigation through a dataset is similar to Google Maps: the user can zoom and pan seamlessly across the genome at any level of detail, from the whole genome down to individual base pairs (Figure S2).
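
To make the idea of multi-resolution data concrete, here is a minimal Python sketch of a precomputed zoom-level pyramid. It assumes a simple scheme in which each coarser level averages pairs of bins from the level below, and the viewer picks the coarsest level whose bins are no wider than one screen pixel. The function names `build_pyramid` and `pick_level` and the averaging rule are illustrative assumptions; this is not IGV's actual file format, which is described in the Supplementary Text.

```python
from statistics import mean


def build_pyramid(values, factor=2):
    """Return zoom levels as (bin_size_bp, summaries) pairs.

    Level 0 holds the raw per-base values; each coarser level averages
    `factor` adjacent bins of the level below (a trailing partial group
    is averaged over the bins it has), so bin width grows geometrically.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    current = list(values)
    bin_size = 1
    levels = [(bin_size, current)]
    while len(current) > 1:
        current = [mean(current[i:i + factor]) for i in range(0, len(current), factor)]
        bin_size *= factor
        levels.append((bin_size, current))
    return levels


def pick_level(levels, view_width_bp, screen_pixels):
    """Choose the coarsest level whose bins are no wider than one pixel."""
    bp_per_pixel = view_width_bp / screen_pixels
    chosen = levels[0]
    for level in levels:  # levels are ordered from finest to coarsest
        if level[0] <= bp_per_pixel:
            chosen = level
    return chosen


# Usage: 8 bp of per-base values viewed at different widths.
levels = build_pyramid(range(8))
print([size for size, _ in levels])   # [1, 2, 4, 8]
print(pick_level(levels, 8, 2))       # (4, [1.5, 5.5])
print(pick_level(levels, 8, 100)[0])  # 1
```

Because the number of bins drawn is bounded by the screen width rather than by the dataset size, rendering cost stays roughly constant as data grow; a real on-disk format would additionally store each level so that only the visible portion needs to be read.
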
Datasets can be loaded from local or remote sources, including cloud-based resources, enabling investigators to view their own genomic datasets alongside publicly available data from, for example, The Cancer Genome Atlas (TCGA)[1], 1000 Genomes (www.1000genomes.org/), and ENCODE[2] (www.genome.gov/10005107) projects. In addition, IGV allows collaborators to load and share data locally or remotely over the Web. IGV supports concurrent visualization of diverse data types across hundreds, and up to thousands, of samples, and correlation of these integrated datasets with clinical and phenotypic variables. A researcher can define arbitrary sample annotations and associate them with data tracks using a simple tab-delimited file format (see Supplementary Text). These might include, for example, a sample identifier (used to link different types of data for the same patient or tissue sample), phenotype, outcome, cluster membership, or any other clinical or experimental label. Annotations are displayed as a heatmap but, more importantly, are used for grouping, sorting, filtering, and overlaying diverse data types to yield a comprehensive picture of the integrated dataset. This is illustrated in Figure 1, a view of copy number, expression, mutation, and clinical data from 202 glioblastoma samples from the TCGA project in a 3 kb region around the EGFR locus[1, 3]. The investigator first grouped samples by tumor subtype, then by data type (copy number and expression), and finally sorted them by median copy number over the EGFR locus. A shared sample identifier links the copy number and expression tracks, maintaining their relative sort order within the subtypes. Mutation data are overlaid on the corresponding copy number and expression tracks based on shared participant identifier annotations. Several trends stand out, such as a strong correlation between copy number and expression and an overrepresentation of EGFR-amplified samples in the Classical subtype.
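
The grouping and sorting described for Figure 1 can be sketched in Python as follows. The attribute table layout (an `ID` column followed by `Subtype`, `DataType`, and `Patient` columns), the track names, and the copy number values are all hypothetical. The sketch only assumes, as the text describes, that a shared identifier links each expression track to the copy number track of the same patient, so both inherit the same sort order.

```python
import csv
import io
from statistics import median

# Hypothetical tab-delimited attribute file: one row per data track.
ATTRIBUTES_TSV = (
    "ID\tSubtype\tDataType\tPatient\n"
    "cn_A\tClassical\tCopyNumber\tP1\n"
    "ex_A\tClassical\tExpression\tP1\n"
    "cn_B\tClassical\tCopyNumber\tP2\n"
    "ex_B\tClassical\tExpression\tP2\n"
    "cn_C\tMesenchymal\tCopyNumber\tP3\n"
    "ex_C\tMesenchymal\tExpression\tP3\n"
)


def load_attributes(text):
    """Parse the table into {track_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["ID"]: row for row in reader}


def order_tracks(attrs, cn_over_locus, group_by=("Subtype", "DataType"), link="Patient"):
    """Group tracks by attributes, then sort by the linked median copy number (descending)."""
    # Score each patient by the median of their copy number values over the locus.
    score = {attrs[tid][link]: median(vals) for tid, vals in cn_over_locus.items()}

    def key(tid):
        row = attrs[tid]
        return tuple(row[col] for col in group_by) + (-score[row[link]],)

    return sorted(attrs, key=key)


# Hypothetical segmented copy number values over the locus for each copy number track.
cn = {"cn_A": [0.2, 0.4, 0.3], "cn_B": [1.5, 1.9, 1.7], "cn_C": [0.0, 0.1, -0.2]}
print(order_tracks(load_attributes(ATTRIBUTES_TSV), cn))
# ['cn_B', 'cn_A', 'ex_B', 'ex_A', 'cn_C', 'ex_C']
```

Sorting on a tuple key yields the nested grouping directly: tracks compare first by subtype, then by data type, and only then by the linked score, which is why the expression tracks mirror the copy number order within each subtype.
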

Figure 1

Copy number, expression, and mutation data grouped by tumor subtype

This figure illustrates an integrated, multi-modal view of 202 TCGA glioblastoma multiforme samples. Copy number data are segmented values from Affymetrix SNP 6.0 arrays. Expression data are limited to genes represented on all platforms employed by TCGA and are displayed across the entire gene locus. Red shading indicates relative up-regulation of a gene and the degree of copy gain of a region; blue shading indicates relative down-regulation and copy loss. Small black squares indicate the positions of point missense mutations. Samples are grouped by tumor subtype (second sample annotation column) and data type (first sample annotation column), and sorted by copy number of the EGFR locus. Linking via sample attributes ensures that the order of sample tracks is consistent across data types within their respective tumor subtypes.
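
The red/blue shading in the caption is a diverging color scale. A minimal Python sketch, assuming a linear fade from white and an illustrative saturation point `max_abs` (IGV's actual color-scale defaults are not stated here and may differ):

```python
def diverging_color(value, max_abs=1.5):
    """Map a signed value (e.g. a log ratio) to an (R, G, B) tuple.

    Zero maps to white; positive values fade toward red (gain or up-regulation)
    and negative values toward blue (loss or down-regulation). Values with
    |value| >= max_abs are fully saturated. max_abs is an illustrative choice.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    t = min(abs(value) / max_abs, 1.0)  # fraction of full saturation
    fade = round(255 * (1 - t))         # channels that drop toward 0 as t grows
    if value >= 0:
        return (255, fade, fade)
    return (fade, fade, 255)


print(diverging_color(0.0))   # (255, 255, 255)
print(diverging_color(3.0))   # (255, 0, 0)
print(diverging_color(-1.5))  # (0, 0, 255)
```
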

IGV’s scalable architecture makes it well suited for genome-wide exploration of next-generation sequencing (NGS) datasets, including both aligned read data and derived results such as read coverage. NGS datasets can approach terabytes in size, so careful data management is necessary to conserve compute resources and to prevent information overload. IGV therefore varies the displayed level of detail according to resolution scale. At very wide views, such as the whole genome, IGV represents NGS data by a simple coverage plot. Coverage data are often useful for assessing overall quality and diagnosing technical issues in sequencing runs (Figure S3), as well as for analyzing ChIP-Seq[4] and RNA-Seq[5] experiments (Figures S4 and S5). As the user zooms in below a range of about 50 kb, individual aligned reads become visible (Figure 2) and putative SNPs are highlighted as allele counts in the coverage plot. Alignment details for each read are available in popup windows (Figures S6 and S7). Zooming further, individual base mismatches become visible, colored by base call and shaded by quality. At this level, the investigator may sort reads by base, quality, strand, sample, and other attributes to assess the evidence for a variant. This type of visual inspection can be an efficient and powerful tool for validating variant calls, eliminating many false positives and aiding in the confirmation of true findings (Figures S6 and S7).
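
The level-of-detail switch, the allele-count flag, and sorting reads by base can be sketched in Python as below. The ~50 kb cutoff comes from the text; the 0.2 non-reference allele fraction used to flag a position, the gap-free `Read` representation, and the ordering used by `sort_by_base` (non-reference bases first) are assumptions made for illustration, not IGV's documented behavior.

```python
from collections import Counter
from dataclasses import dataclass

READ_VISIBILITY_BP = 50_000      # reads become visible below ~50 kb (from the text)
ALLELE_FRACTION_THRESHOLD = 0.2  # assumed cutoff for flagging a putative SNP


@dataclass
class Read:
    name: str
    start: int  # 0-based reference position of the first aligned base
    bases: str  # aligned bases; assumed to contain no insertions or deletions


def display_mode(view_width_bp):
    """Coverage plot for wide views, individual alignments when zoomed in."""
    return "alignments" if view_width_bp < READ_VISIBILITY_BP else "coverage"


def base_at(read, pos):
    """Base the read carries at reference position pos, or None if it does not cover pos."""
    offset = pos - read.start
    return read.bases[offset] if 0 <= offset < len(read.bases) else None


def allele_counts(reads, pos):
    """Count each base observed at pos across the reads that cover it."""
    return Counter(b for b in (base_at(r, pos) for r in reads) if b is not None)


def is_flagged(reads, pos, ref_base, threshold=ALLELE_FRACTION_THRESHOLD):
    """Flag pos when the fraction of non-reference bases exceeds the threshold."""
    counts = allele_counts(reads, pos)
    total = sum(counts.values())
    return total > 0 and (total - counts[ref_base]) / total > threshold


def sort_by_base(reads, pos, ref_base):
    """Non-reference reads first (grouped by base), then reference, then non-covering reads."""
    def key(read):
        b = base_at(read, pos)
        if b is None:
            return (2, "")
        return (0 if b != ref_base else 1, b)
    return sorted(reads, key=key)


reads = [
    Read("r1", 100, "ACGTA"),
    Read("r2", 101, "CGCTA"),
    Read("r3", 98, "GGACG"),
    Read("r4", 102, "GCAAC"),
]
print(display_mode(20_000))                             # alignments
print(allele_counts(reads, 103))                        # Counter({'C': 2, 'T': 1})
print(is_flagged(reads, 103, "T"))                      # True
print([r.name for r in sort_by_base(reads, 103, "T")])  # ['r2', 'r4', 'r1', 'r3']
```
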

Figure 2

View of aligned reads at 20 kb resolution

Coverage plot and alignments from paired-end reads for a matched tumor/normal pair. Sequencing was performed on an Illumina GA2 platform and reads were aligned with Maq. Alignments are represented as gray polygons, with mismatches to the reference indicated by color. Loci with a large percentage of mismatches relative to the reference are flagged in the coverage plot as color-coded bars. Alignments with unexpected inferred insert sizes are also indicated by color. There is evidence for an approximately 10 kb deletion (removing 2 exons of AIDA) in the tumor sample that is not present in the normal.

Many sequencing protocols produce reads from both ends (“paired ends”) of genomic fragments with a known size distribution. IGV uses this information to color-code paired ends whose insert sizes are larger than expected, whose mates fall on different chromosomes, or whose pair orientations are unexpected. Such pairs, when consistent across multiple reads, can indicate a genomic rearrangement. When coloring aberrant paired ends, each chromosome is assigned a unique color, so that intra-chromosomal (same color) and inter-chromosomal (different color) events are readily distinguished (Figures 2 and S8). We note that misalignments, particularly in repeat regions, can also yield unexpected insert sizes; such cases can be diagnosed with IGV (Figure S9).

There are a number of stand-alone desktop genome browsers available today[6], including Artemis[7], EagleView[8], MapView[9], Tablet[10], Savant[11], Apollo[12], and the Integrated Genome Browser[13]. Many have features that overlap with those of IGV, particularly for viewing NGS sequence alignments and genome annotations. The Integrated Genome Browser also supports viewing array-based data. See Supplementary Table 1 and Supplementary Text for more detail. IGV focuses on the emerging integrative nature of genomic studies, placing equal emphasis on array-based platforms (such as expression and copy-number arrays), next-generation sequencing, and clinical and other sample metadata. Indeed, an important and unique feature of IGV is the ability to view all these data types together and to use the sample metadata to dynamically group, sort, and filter datasets (Figure 1). Another important characteristic of IGV is fast data loading and real-time panning and zooming at all scales of genome resolution and for all dataset sizes, including datasets comprising hundreds of samples.
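
The paired-end color coding described above can be sketched in Python as follows. The expected orientation code `"FR"`, the 1,000 bp insert-size cutoff, and the hex palette are hypothetical placeholders (in practice the cutoff would be derived from the library's insert-size distribution). The sketch colors each aberrant pair by the chromosome of its mate, which is one way to realize the per-chromosome scheme in the text.

```python
from dataclasses import dataclass

EXPECTED_ORIENTATION = "FR"  # assumed: forward/reverse library
MAX_INSERT_BP = 1_000        # assumed upper bound of the expected insert size

# Hypothetical palette: one distinct color per chromosome.
CHROM_COLORS = {"chr1": "#1f77b4", "chr2": "#ff7f0e", "chr3": "#2ca02c"}


@dataclass
class Pair:
    chrom: str        # chromosome of this read
    mate_chrom: str   # chromosome of its mate
    insert_size: int  # absolute inferred insert size (bp)
    orientation: str  # e.g. "FR", "RF", "FF", "RR"


def classify(pair):
    """Return why a pair is aberrant, or 'normal'."""
    if pair.mate_chrom != pair.chrom:
        return "inter-chromosomal"
    if pair.insert_size > MAX_INSERT_BP:
        return "large insert"
    if pair.orientation != EXPECTED_ORIENTATION:
        return "unexpected orientation"
    return "normal"


def pair_color(pair):
    """Gray for normal pairs; otherwise the color assigned to the mate's chromosome."""
    if classify(pair) == "normal":
        return "gray"
    return CHROM_COLORS.get(pair.mate_chrom, "black")


pairs = [
    Pair("chr1", "chr1", 350, "FR"),     # normal
    Pair("chr1", "chr1", 10_400, "FR"),  # large insert, e.g. spanning a deletion
    Pair("chr1", "chr3", 0, "FR"),       # mate on another chromosome
    Pair("chr1", "chr1", 300, "RF"),     # unexpected orientation
]
for p in pairs:
    print(classify(p), pair_color(p))
```

Because intra-chromosomal aberrant pairs all take the color of the chromosome being viewed, a differently colored read stands out immediately as possible evidence of an inter-chromosomal event.
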
Finally, we have placed great emphasis on making IGV easy to install and use, so that end users without informatics expertise can view and share their own data. IGV is open-source software, freely available with full documentation at http://www.broadinstitute.org/igv/.


1. Artemis: sequence visualization and annotation.

Authors: K Rutherford; J Parkhill; J Crook; T Horsnell; P Rice; M A Rajandream; B Barrell
Journal: Bioinformatics Date: 2000-10 Impact factor: 6.937

2. The ENCODE (ENCyclopedia Of DNA Elements) Project.

Authors: ENCODE Project Consortium
Journal: Science Date: 2004-10-22 Impact factor: 47.728

3. MapView: visualization of short reads alignment on a desktop computer.

Authors: Hua Bao; Hui Guo; Jinwei Wang; Renchao Zhou; Xuemei Lu; Suhua Shi
Journal: Bioinformatics Date: 2009-04-15 Impact factor: 6.937

4. EagleView: a genome assembly viewer for next-generation sequencing technologies.

Authors: Weichun Huang; Gabor Marth
Journal: Genome Res Date: 2008-06-11 Impact factor: 9.043

5. Visualizing genomes: techniques and challenges.

Authors: Cydney B Nielsen; Michael Cantor; Inna Dubchak; David Gordon; Ting Wang
Journal: Nat Methods Date: 2010-02-25 Impact factor: 28.547

6. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1.

Authors: Roel G W Verhaak; Katherine A Hoadley; Elizabeth Purdom; Victoria Wang; Yuan Qi; Matthew D Wilkerson; C Ryan Miller; Li Ding; Todd Golub; Jill P Mesirov; Gabriele Alexe; Michael Lawrence; Michael O'Kelly; Pablo Tamayo; Barbara A Weir; Stacey Gabriel; Wendy Winckler; Supriya Gupta; Lakshmi Jakkula; Heidi S Feiler; J Graeme Hodgson; C David James; Jann N Sarkaria; Cameron Brennan; Ari Kahn; Paul T Spellman; Richard K Wilson; Terence P Speed; Joe W Gray; Matthew Meyerson; Gad Getz; Charles M Perou; D Neil Hayes
Journal: Cancer Cell Date: 2010-01-19 Impact factor: 31.743

7. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals.

Authors: Mitchell Guttman; Ido Amit; Manuel Garber; Courtney French; Michael F Lin; David Feldser; Maite Huarte; Or Zuk; Bryce W Carey; John P Cassady; Moran N Cabili; Rudolf Jaenisch; Tarjei S Mikkelsen; Tyler Jacks; Nir Hacohen; Bradley E Bernstein; Manolis Kellis; Aviv Regev; John L Rinn; Eric S Lander
Journal: Nature Date: 2009-02-01 Impact factor: 49.962

8. Tablet--next generation sequence assembly visualization.

Authors: Iain Milne; Micha Bayer; Linda Cardle; Paul Shaw; Gordon Stephen; Frank Wright; David Marshall
Journal: Bioinformatics Date: 2009-12-04 Impact factor: 6.937

9. The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets.

Authors: John W Nicol; Gregg A Helt; Steven G Blanchard; Archana Raja; Ann E Loraine
Journal: Bioinformatics Date: 2009-08-04 Impact factor: 6.937

10. Comprehensive genomic characterization defines human glioblastoma genes and core pathways.

Authors: Cancer Genome Atlas Research Network
Journal: Nature Date: 2008-09-04 Impact factor: 49.962
