Joseph Outten, Andrew Warren.
Abstract
Pangenomes are organized collections of the genomic information from related individuals or groups. Graphical pangenomics is the study of these pangenomes using graphical methods to identify and analyze genes, regions, and mutations of interest to an array of biological questions. This field has seen significant progress in recent years, including the development of graph-based models that better resolve biological phenomena and an explosion of new tools for mapping reads, creating graphical genomes, and performing pangenome analysis. In this review, we discuss recent developments in models, algorithms associated with graphical genomes, and comparisons between similar tools. In addition, we briefly discuss what these developments may mean for the future of genomics. © Indian Institute of Science 2021.
Keywords: Genome assembly; Graph genomes; Graphical pangenomics; Multiple sequence alignment; Pangenomics
Year: 2021 PMID: 34456520 PMCID: PMC8384392 DOI: 10.1007/s41745-021-00255-z
Source DB: PubMed Journal: J Indian Inst Sci ISSN: 0019-4964
Terminology, informed in part by [19, 52].
| Term | Description | Parent types* |
|---|---|---|
| de Bruijn graph | Nodes are sequence k-mers, and a directed edge connects one k-mer to another when the (k-1)-length suffix of the first matches the (k-1)-length prefix of the second | NA |
| Sequence graph | Edges or nodes are labelled with sequences. Used to compress sequence representation and express contiguity between segments with directed or bidirected edges | NA |
| Genome graph | Relates a genome’s sequence information to itself or other genomes | Sequence graph |
| Pangenome | A representation of the genetic information across a population | NA |
| Pangenome graph | Genome graphs explicitly involving more than one genome | Genome graph, sequence graph |
| Synteny graph | Relates blocks of conserved sequence | Sequence graph |
| Reference genome | Used as the standard for comparison in a species, e.g. GRCh38 | NA |
| Reference bias | Incomplete analysis or a lack of sensitivity caused by the use of a linear reference | NA |
| Variation graph | Bidirected graphs which embed linear sequences as paths | Pangenome graph, bidirected graph |
| Bidirected graph | Each edge has a discrete endpoint on either the left or right of a node | Sequence graph |
*Where applicable, parent types lists those other terms for which the term in question is a specific type
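The de Bruijn graph entry in the table above can be illustrated with a minimal Python sketch. The function name `build_de_bruijn` and the two toy sequences are hypothetical and chosen only for illustration; the sketch follows the table's definition directly (connect every pair of observed k-mers with a matching (k-1) overlap) and is not the construction used by any particular tool.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Return {k-mer: set of successor k-mers} for the input sequences.

    An edge u -> v exists when the (k-1)-suffix of u equals the
    (k-1)-prefix of v, as in the terminology table. Hypothetical helper.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    # Collect every distinct k-mer observed in any input sequence.
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])

    # Index k-mers by their (k-1)-prefix so successors are found by lookup.
    by_prefix = defaultdict(set)
    for kmer in kmers:
        by_prefix[kmer[:-1]].add(kmer)

    # Successors of a k-mer are all k-mers whose prefix equals its suffix.
    return {kmer: set(by_prefix.get(kmer[1:], ())) for kmer in kmers}


# Two toy sequences that differ at a single position (illustrative only).
graph = build_de_bruijn(["ACGTAC", "ACGGAC"], k=3)
for kmer in sorted(graph):
    print(kmer, "->", sorted(graph[kmer]))
```

In this toy input, the node `ACG` branches to `CGT` and `CGG`, which reflects the single differing base. The edges `TAC -> ACG` and `GAC -> ACG` also appear even though neither input sequence contains those adjacencies, a small instance of how repeated (k-1)-mers merge paths in this model.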
Figure 1: (i) An example of a genome graph at the resolution of single nucleotide polymorphisms. The mapping criterion, exact match, is used to define a frame of reference and the resulting nodes (similarity groups). (ii) An example of a larger structural variant. The colored bars represent larger graph structures, which themselves represent divergent sequences that do not meet the mapping criteria relative to one another.
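The single-nucleotide case described in Figure 1(i) can be sketched as a small sequence graph in which each genome is embedded as a path. This is a simplified illustration under stated assumptions: the class `SequenceGraph`, the node identifiers, and the genome names are hypothetical, and node orientation is omitted, so the sketch is directed rather than bidirected as a full variation graph would be.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph: labelled nodes, directed edges, embedded paths."""
    nodes: dict = field(default_factory=dict)  # node id -> sequence label
    edges: set = field(default_factory=set)    # (from id, to id) pairs
    paths: dict = field(default_factory=dict)  # genome name -> list of node ids

    def add_path(self, name, node_ids):
        """Embed a genome as a walk, adding the edges the walk requires."""
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges.add((a, b))
        self.paths[name] = list(node_ids)

    def spell(self, name):
        """Reconstruct a genome's linear sequence from its path."""
        return "".join(self.nodes[n] for n in self.paths[name])


# Nodes 1 and 4 are shared by both genomes under exact matching;
# nodes 2 and 3 hold the two alleles of a single-nucleotide difference.
g = SequenceGraph(nodes={1: "ACG", 2: "T", 3: "G", 4: "AC"})
g.add_path("genome_A", [1, 2, 4])
g.add_path("genome_B", [1, 3, 4])

print(g.spell("genome_A"))
print(g.spell("genome_B"))
```

Storing the paths alongside the nodes means each input genome remains recoverable from the graph, while the shared nodes record where the genomes satisfy the exact-match criterion.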
Figure 2: Examples of deconvolution. Regions of similarity have matching symbols and connecting edges; by logical extension, diverging paths represent regions of divergence. Boxes represent a similarity group, which forms the basis for including a region in a node of the multi-genome model. The extent of deconvolution is dictated by the incoming labels, the mapping criteria, and the frame of reference given by the algorithm. (i) Two input genomes, A and B, each with two replicons. (ii) In a de Bruijn graph model, similarity is dictated by the k-mer size parameter and the amount of repeat similarity. (iii) An example of the metagenomic level of resolution achievable in a genome graph when the provenance of contigs is unknown. (iv) Fully disambiguated groupings for assembled genomes given as input.