Shengfeng Huang, Zelin Chen, Guangrui Huang, Ting Yu, Ping Yang, Jie Li, Yonggui Fu, Shaochun Yuan, Shangwu Chen, Anlong Xu.
Abstract
Whole-genome shotgun assembly has been a long-standing issue for highly polymorphic genomes, and the advent of next-generation sequencing technologies has made the issue more challenging than ever. Here we present an automated pipeline, HaploMerger, for reconstructing allelic relationships in a diploid assembly. HaploMerger combines a LASTZ-ChainNet alignment approach with a novel graph-based structure, which helps to untangle allelic relationships between two haplotypes and guides the subsequent creation of reference haploid assemblies. The pipeline provides flexible parameters and schemes to improve the contiguity, continuity, and completeness of the reference assemblies. We show that HaploMerger produces efficient and accurate results in simulations and has advantages over manual curation when applied to real polymorphic assemblies (e.g., 4%-5% heterozygosity). We also used HaploMerger to analyze the diploid assembly of a single Chinese amphioxus (Branchiostoma belcheri) and compared the resulting haploid assemblies with EST sequences, which revealed that the two haplotypes are not only divergent but also highly complementary to each other. Taken together, we have demonstrated that HaploMerger is an effective tool for analyzing and exploiting polymorphic genome assemblies.
Year: 2012 PMID: 22555592 PMCID: PMC3409271 DOI: 10.1101/gr.133652.111
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. A flowchart of the HaploMerger pipeline. The components required to generate a reference assembly are highlighted in gray. Users may choose a desired path through the pipeline, skip some components for a cursory run, or repeat some components with different parameters.
Figure 2. A schematic diagram showing the DGA graph-based procedure. This diagram shows how to reconstruct allelic relationships and create the reference haploid assembly for a tiny diploid polymorphic assembly with eight scaffolds. The original diploid assembly (A) is first duplicated into two copies (B); whole-genome pairwise alignments are then created between the two copies (C). Based on the alignments, a DGA graph is created (D), from which a reduced linearized DGA graph is subsequently derived (E); guided by the reduced DGA graph, a reference haploid assembly can finally be created (F). This assembly has been included in the HaploMerger package as a simple example for testing. Readers may refer to Supplemental Figure S2 for more details regarding the conversion from the initial complicated DGA graph (D) into the reduced linearized DGA graph (E).
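The idea of grouping scaffolds by their pairwise alignments can be illustrated with a toy sketch. This is not HaploMerger's DGA algorithm: it treats scaffolds as nodes and alignments as undirected edges, then collects connected components as candidate allelic groups. The function name `allelic_components`, the scaffold names, and the lengths are all hypothetical.

```python
from collections import defaultdict


def allelic_components(scaffolds, alignments):
    """Group scaffolds that are linked by at least one alignment.

    Toy simplification: a real DGA graph also tracks alignment
    coordinates and orientation, which are ignored here.
    """
    adjacency = defaultdict(set)
    for a, b in alignments:
        if a != b:  # ignore a scaffold aligning to itself
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(scaffolds):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited scaffold.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical scaffold lengths and alignments, for illustration only.
scaffold_lengths = {"s1": 900, "s2": 850, "s3": 400, "s4": 420, "s5": 300}
alignments = [("s1", "s2"), ("s3", "s4"), ("s1", "s1")]
components = allelic_components(scaffold_lengths, alignments)
print(components)
```

In this simplified view, a scaffold with no partner (here `s5`) forms its own group, which would correspond to sequence present in only one haplotype of the assembly.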
The application of HaploMerger to polymorphic diploid assemblies
Figure 3. A schematic diagram showing the algorithm used for detecting tandem alleles. The dot plots (A) and (B) show the self-alignments. An algorithm is used to slice the alignment panel into small cells based on the coordinates for the ends of alignment portions (C). A pair of tandem alleles are detected by the algorithm and shown in detail (D), where length_1, length_2, interval_1, interval_2, coverage_1, and coverage_2 are adjustable parameters used to detect potential cases of tandem assembled alleles.
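The slicing step in (C) can be sketched as follows, assuming each alignment portion is an axis-aligned block given by its start and end coordinates on both axes of the dot plot. The block representation and the name `slice_panel` are assumptions made for illustration. The sketch covers only the slicing, not the tandem-allele test itself, because the caption does not define how the length, interval, and coverage parameters are applied.

```python
from itertools import product


def slice_panel(blocks):
    """Cut the alignment panel into cells at every block end coordinate.

    Each block is (x_start, x_end, y_start, y_end). The result is a list
    of cells, each a pair ((x_lo, x_hi), (y_lo, y_hi)).
    """
    # Unique breakpoints along each axis, in ascending order.
    xs = sorted({c for x0, x1, _, _ in blocks for c in (x0, x1)})
    ys = sorted({c for _, _, y0, y1 in blocks for c in (y0, y1)})

    # Consecutive breakpoints bound one interval per axis.
    x_intervals = list(zip(xs, xs[1:]))
    y_intervals = list(zip(ys, ys[1:]))

    # Every combination of an x interval and a y interval is one cell.
    return list(product(x_intervals, y_intervals))


# Two hypothetical alignment portions from a self-alignment dot plot.
blocks = [(0, 100, 200, 300), (150, 250, 0, 100)]
cells = slice_panel(blocks)
print(len(cells), cells[0])
```

Because the cell boundaries coincide with alignment ends, each cell is either fully inside or fully outside any given block along each axis, which makes per-cell checks straightforward.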
The difference in transcript alignments between the original and the reference assemblies for B. belcheri
The allelic differences for transcript alignment to genome sequences