Bhavna Hurgobin, David Edwards.
Abstract
Increasing evidence suggests that a single individual is insufficient to capture the genetic diversity within a species because of gene presence/absence variation. To understand the extent to which genomic variation occurs in a species, the construction of its pangenome is necessary. The pangenome represents the complete set of genes of a species; it is composed of core genes, which are present in all individuals, and variable genes, which are present only in some individuals. Aside from variation at the gene level, single nucleotide polymorphisms (SNPs) are also an important form of genetic variation. The advent of next-generation sequencing (NGS), coupled with the heritability of SNPs, makes them ideal markers for genetic analysis of human, animal, and microbial data. SNPs have also been used extensively in crop genetics for association mapping, quantitative trait loci (QTL) analysis, analysis of genetic diversity, and phylogenetic analysis. This review focuses on the use of pangenomes for SNP discovery. It highlights the advantages of using a pangenome rather than a single reference for this purpose. This review also demonstrates how extra information not captured in a single reference can be used to provide additional support for linking genotypic data to phenotypic data.
Keywords: SNP discovery; assembly; copy number variation; core genome; gene; genetic diversity; pangenome; presence absence variation; single nucleotide polymorphism; variable genome
Year: 2017 PMID: 28287462 PMCID: PMC5372014 DOI: 10.3390/biology6010021
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. A normal reference with the correct number of genes, represented by coloured blocks (a), copy number variations (b) and presence/absence variations (c).
Figure 2. Different approaches to pangenome assembly. Three genomes (A, B and C) are shown and together they constitute a pangenome. Each genome consists of genomic segments that are marked by the same colour if present in multiple genomes. In the whole de novo assembly approach, the three genomes are assembled individually (a); in the de Bruijn graph approach, the genomes are broken down into segments and the relationships between segments can be traced back to the edges of the graph (b); in the iterative mapping and assembly approach, a single genome is used as the basis and reads from other genomes are sequentially mapped and assembled, creating a non-redundant pangenome (c).
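
The iterative approach in panel (c) and the core/variable split described in the abstract can be illustrated with a toy sketch. It assumes each genome is reduced to a list of segment labels (`s1`, `s2`, ...), which are hypothetical stand-ins for real sequence; an actual pipeline would map reads to the growing reference and assemble the unmapped reads, which this example does not attempt. The function names are illustrative only.

```python
from typing import Dict, List, Set, Tuple


def iterative_pangenome(genomes: Dict[str, List[str]], reference: str) -> List[str]:
    """Build a non-redundant segment list, starting from one reference genome.

    Segments from each further genome are added only if they are not
    already present (a stand-in for "reads that fail to map").
    """
    pangenome = list(genomes[reference])
    seen = set(pangenome)
    for name, segments in genomes.items():
        if name == reference:
            continue
        for segment in segments:
            if segment not in seen:
                seen.add(segment)
                pangenome.append(segment)
    return pangenome


def split_core_variable(genomes: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Core = segments found in every genome; variable = all the others."""
    segment_sets = [set(segments) for segments in genomes.values()]
    core = set.intersection(*segment_sets)
    variable = set.union(*segment_sets) - core
    return core, variable


# Hypothetical genomes A, B and C, mirroring the three genomes in Figure 2.
genomes = {
    "A": ["s1", "s2", "s3"],
    "B": ["s1", "s3", "s4"],
    "C": ["s1", "s2", "s5"],
}

pangenome = iterative_pangenome(genomes, reference="A")
core, variable = split_core_variable(genomes)

print("pangenome:", pangenome)
print("core:", sorted(core))
print("variable:", sorted(variable))
```

In this toy input only `s1` occurs in all three genomes, so it is the sole core segment; the segments `s4` and `s5` would be missed entirely if genome A alone served as the reference.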