P J Bradbury1, T Casstevens2, S E Jensen3, L C Johnson2, Z R Miller2, B Monier2, M C Romay2, B Song2, E S Buckler1,2,3.
Abstract
MOTIVATION: Pangenomes provide novel insights for population and quantitative genetics, genomics, and breeding that are not available from studying a single reference genome; a species is better represented by a pangenome, or collection of genomes. Unfortunately, managing and using pangenomes for genomically diverse species is computationally and practically challenging. We developed a trellis graph representation anchored to the reference genome that represents most pangenomes well and can be used to impute complete genomes from low-density sequence or variant data.
Year: 2022 PMID: 35748708 PMCID: PMC9344836 DOI: 10.1093/bioinformatics/btac410
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Diagram of a Practical Haplotype Graph (PHG). Rectangles containing sequence are graph nodes, which represent haplotypes. Edges connect adjacent nodes. Solid edges connect nodes from the same assembly while dashed edges represent recombinations. To avoid cluttering the graph, not all possible dashed edges are shown, except between reference ranges 1 and 2. The four reads represent sequence reads from a sample to be genotyped. These match sequence in the haplotypes (ATC - range 1, ATG - range 2, CAG - range 3, CAA - range 4). The thicker arrows along the assembly A and B edges show the most likely path through the graph given the sequence reads. Of all possible paths connecting the tagged haplotypes, it has the fewest recombinations. The imputed sequence for the sample is ‘GATCGATGCTACAGACCAAGG’.
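The path selection described in the Fig. 1 caption can be sketched as a small dynamic program over the trellis. This is an illustration only, not the PHG implementation: the haplotype sequences, the assembly C, the split of the imputed sequence into four reference ranges, and the function names are all assumptions invented for the example, and the most likely path is approximated by simply counting assembly switches (recombinations) among read-tagged haplotypes rather than by a probabilistic model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: one dict per reference range, mapping an assembly
# name to that assembly's haplotype sequence. All sequences are invented.
GRAPH: List[Dict[str, str]] = [
    {"A": "GATCG", "B": "GTTCG", "C": "GAACG"},
    {"A": "ATGCT", "B": "ACGCT", "C": "ATGCA"},
    {"A": "ACTGAC", "B": "ACAGAC", "C": "ACGGAC"},
    {"A": "CTAGG", "B": "CAAGG", "C": "CGAGG"},
]
READS = ["ATC", "ATG", "CAG", "CAA"]  # the four reads named in the caption


def tagged(nodes: Dict[str, str], reads: List[str]) -> Set[str]:
    """Assemblies whose haplotype contains at least one read (exact match)."""
    hits = {a for a, seq in nodes.items() if any(r in seq for r in reads)}
    return hits or set(nodes)  # no read evidence: allow every haplotype


def fewest_recombination_path(
    graph: List[Dict[str, str]], reads: List[str]
) -> Tuple[List[str], int]:
    """Return the assembly chosen per range and the number of switches."""
    # cost[a] = fewest switches of any path ending at assembly a so far
    cost = {a: 0 for a in tagged(graph[0], reads)}
    back: List[Dict[str, str]] = []
    for nodes in graph[1:]:
        new_cost: Dict[str, int] = {}
        pointers: Dict[str, str] = {}
        for a in sorted(tagged(nodes, reads)):
            # Staying on the same assembly is free; switching costs 1.
            prev = min(sorted(cost), key=lambda p: cost[p] + (p != a))
            new_cost[a] = cost[prev] + (prev != a)
            pointers[a] = prev
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final node.
    last = min(sorted(cost), key=lambda a: cost[a])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[last]


path, n_recombinations = fewest_recombination_path(GRAPH, READS)
imputed = "".join(nodes[a] for nodes, a in zip(GRAPH, path))
print(path, n_recombinations, imputed)
```

In this toy graph the read ATG tags two haplotypes in range 2, so the dynamic program has a real choice to make and keeps the one that avoids an extra switch. The toy sequences were chosen so that the concatenated haplotypes reproduce the imputed sequence quoted in the caption.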
Fig. 2. Building a PHG database
Fig. 3. Effect of parameters on accuracy. (A) Read mapping error rate for the whole genome as a function of the maximum diversity (mxDiv) parameter for determining consensus haplotypes, read type (paired-end, single) and distance matrix method (kmer, SNP). Read mapping error rate is the number of reads not mapping to the target haplotype divided by the total number of reads. NA labels the result of mapping against the original, non-consensus haplotypes. (B) Imputed SNP error rate for paired-end and single-end reads for different values of the diversity cutoff and consensus method. Error rate equals the number of wrong SNP calls divided by the number of base pairs of sequence. Where the B73 bar is absent, the error rate was zero. (C) Read mapping error as a function of minimizer redundancy controlled by the minimap2 f-parameter. The f-parameter values are (a) f1000,5000 [default]; (b) f5000,6000; (c) f10000,11000; (d) f15000,16000; (e) f20000,21000; (f) f25000,26000. mxDiv = 1e-4.
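The two error rates defined in the Fig. 3 caption are simple ratios, which can be written out as a minimal sketch. The function names and the input representation (one haplotype label per mapped read) are assumptions made for illustration, and the numbers in the usage lines are invented, not results from the paper.

```python
from typing import Sequence


def read_mapping_error_rate(mapped_haplotypes: Sequence[str], target: str) -> float:
    """Reads not mapping to the target haplotype divided by total reads."""
    if not mapped_haplotypes:
        raise ValueError("at least one read is required")
    wrong = sum(1 for hap in mapped_haplotypes if hap != target)
    return wrong / len(mapped_haplotypes)


def snp_error_rate(wrong_snp_calls: int, sequence_length_bp: int) -> float:
    """Wrong SNP calls divided by the number of base pairs of sequence."""
    if sequence_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return wrong_snp_calls / sequence_length_bp


# Invented example: four reads, one of which maps to a non-target haplotype.
mapping_error = read_mapping_error_rate(["B73", "B73", "other_line", "B73"], "B73")
# Invented example: 3 wrong SNP calls over 1,000,000 bp of sequence.
snp_error = snp_error_rate(3, 1_000_000)
print(mapping_error, snp_error)
```

Note that the SNP error rate is normalized by sequence length rather than by the number of SNP calls, as the caption states, so values from panels (A) and (B) are not on the same scale.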