Andrea Guarracino, Simon Heumos, Sven Nahnsen, Pjotr Prins, Erik Garrison.
Abstract
MOTIVATION: Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well supported by existing tools. Hence, fast and versatile software is required to ask advanced questions of such data in an efficient way.
Keywords: comparative genomics; liftover; pangenome visualization; pangenomics; variation graphs
Year: 2022 PMID: 35552372 PMCID: PMC9237687 DOI: 10.1093/bioinformatics/btac308
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Overview of the methods provided by ODGI (in black) and their supported input (in blue) and output (in red) data formats. (A color version of this figure appears in the online version of this article.)
Fig. 2. Visualizing the major histocompatibility complex (MHC) and complement component 4 (C4) pangenome graphs. (a) odgi draw layout of the MHC pangenome graph extracted from a whole human pangenome graph of 90 haplotypes. The red rectangle highlights the C4 region. (b–e) odgi viz visualizations of the C4 pangenome graph, where eight paths are displayed: two reference genomes (CHM13 and GRCh38 on the top) and six haplotypes of three diploid individuals. (b) odgi viz default modality: the image shows a largely linear graph. The links at the bottom indicate the presence of a structural variant (long link) with another structural variant nested inside it (short link on the left). (c) odgi viz color by path position: the top two reference genomes and one haplotype (HG01952#2) go from left to right, while five haplotypes go in the opposite direction, as indicated by the black color on their left. (d) odgi viz color by strandness: the red paths indicate the haplotypes that were assembled in reverse with respect to the two reference genomes. (e) odgi viz color by node depth: using the Spectra color palette with four levels of node depths, white indicates no depth, while gray, red and yellow indicate depth 1, 2 and greater than or equal to 3, respectively. Coloring by node depth, we can see that the two references present two different allele copies of the C4 genes, both of them including the HERV sequence. The entirely gray paths have one copy of these genes. HG01071#2 presents three copies of the locus (orange), of which one contains the HERV sequence (gray in the middle of the orange). In HG01952#1, the HERV sequence is absent. (f) Layout of the C4 pangenome graph made with the Bandage tool (Wick ) and annotated by using odgi position. Green nodes indicate the C4 genes (in red). The red rectangle highlights the regions where C4A and C4B genes differ.
(g) Annotated Bandage layout of the C4 region where C4A and C4B genes differ due to single nucleotide variants leading to changes in the encoded protein sequences. Node labels were annotated by using odgi position. (h) Visualization of odgi untangle output in the C4 pangenome graph: the plots show the copy number status of the sequences in the C4 region with respect to the GRCh38 reference sequence, making clear, for example, that in HG00438#2, the C4A gene is missing (no black lines in the region annotated in red). (A color version of this figure appears in the online version of this article.)
Fig. 3. Features of a 90-haplotype human pangenome graph of exon 1 of the huntingtin gene (HTTexon1): (a) excerpt of vital statistics of the HTTexon1 graph displayed by MultiQC’s ODGI module. (b) Per-nucleotide node degree distribution of CHM13 in the HTTexon1 graph. Around position 200, node degree varies widely. (c) Per-nucleotide node depth distribution of CHM13 in the HTTexon1 graph. The alternating depth around position 200 indicates polymorphic variation, complementing the node degree analysis in (b). (d) odgi viz visualization of the 23 largest gene alleles, CHM13 and GRCh38 of the HTTexon1 graph. (e) vg viz nucleotide-level visualization of 10 gene alleles, CHM13 and GRCh38 of the HTTexon1 graph, focusing on the CAG variable repeat region.
Fig. 4. Performance on a graph of human chromosome 6 from the HPRC. ODGI compares favorably to VG across all routine pangenomic tasks. Evaluations across threads were done using a graph of 64 human haplotypes. Evaluations across haplotypes were done using 16 threads. (a) Performance evaluation when translating a graph into the tools’ respective native formats. (b) Performance evaluation when extracting the centromeric region from the HPRC graph. (c) Performance evaluation when visualizing a graph. Both tools were run with only one thread. vg viz: *An 816 MB SVG was produced which cannot be opened by any program. **All produced SVGs only contain an XML header, nothing else.