Abstract
Modern high-throughput DNA sequencing has made it possible to inexpensively produce genome sequences, but in practice many of these draft genomes are fragmented and incomplete. Genetic linkage maps based on recombination rates between physical markers have been used in biology for over 100 years, and a linkage map, when paired with a de novo sequencing project, can resolve mis-assemblies and anchor chromosome-scale sequences. Here, I summarize the methodology behind integrating de novo assemblies and genetic linkage maps, outline the current challenges, review the available software tools, and discuss new mapping technologies.
Keywords: draft genome; next-generation sequencing; optical mapping; physical mapping; scaffolds
Year: 2015 PMID: 26150829 PMCID: PMC4473057 DOI: 10.3389/fgene.2015.00220
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. (A) In whole-genome assembly, errors result from mis-joins and from residual alleles, which appear as discrete sequences in the reference. Small fragments have no genomic context and contribute little information. (B) Using a genetic linkage map to anchor a de novo assembly resolves errors in the reference sequence by giving small sequences genomic context, resolving allelism, and identifying mis-joins. Chromosome-scale assemblies can be constructed by ordering and orienting sequences with the linkage map. (C) A genetic linkage map can be estimated from a parental cross resulting in an F2, F3, or backcross (here, BC1) population. Estimating a genetic linkage map requires (D) genotyping individuals at discrete markers (here, six markers across eight individuals with missing data), (E) grouping markers into linkage groups, and then ordering and spacing markers within each linkage group. Estimating order and spacing is difficult because of missing data and because little recombination occurs between adjacent markers.
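The steps in panels (D) and (E) can be illustrated with a minimal sketch. It is not the method of any particular package: the genotype table is invented toy data (six markers, eight backcross individuals, `-` for missing calls), the function names and the `max_r` threshold are hypothetical, grouping uses simple single-linkage clustering, and spacing uses the Haldane mapping function as one common choice.

```python
import math
from itertools import combinations

# Toy backcross genotypes (invented for illustration): one string per marker,
# one character per individual. 'A' = homozygous, 'H' = heterozygous,
# '-' = missing call.
GENOTYPES = {
    "m1": "AAHHAHAH",
    "m2": "AAHHAHAA",
    "m3": "A-HHAHHA",
    "m4": "AHHAHHAA",
    "m5": "AHHAHH-A",
    "m6": "HHHAHHAA",
}


def recombination_fraction(a, b):
    """Share of individuals whose genotypes differ between two markers.

    Individuals with a missing call at either marker are skipped.
    Returns None when no individual is informative.
    """
    informative = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not informative:
        return None
    recombinants = sum(1 for x, y in informative if x != y)
    return recombinants / len(informative)


def haldane_cm(r):
    """Haldane map distance in centimorgans, defined for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def linkage_groups(genotypes, max_r=0.2):
    """Single-linkage grouping: join two markers when r <= max_r.

    max_r is an arbitrary illustrative threshold, not a recommended value.
    """
    parent = {m: m for m in genotypes}

    def find(m):
        # Follow parent links to the group representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for m1, m2 in combinations(genotypes, 2):
        r = recombination_fraction(genotypes[m1], genotypes[m2])
        if r is not None and r <= max_r:
            parent[find(m1)] = find(m2)

    groups = {}
    for m in genotypes:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(linkage_groups(GENOTYPES))
    r12 = recombination_fraction(GENOTYPES["m1"], GENOTYPES["m2"])
    print(r12, haldane_cm(r12))
```

With only eight individuals, each pairwise estimate rests on at most eight observations, and fewer when calls are missing, which is one way to see why ordering and spacing closely linked markers is hard. Real tools typically use likelihood-based (LOD) criteria rather than a raw threshold on the recombination fraction.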
Software packages for estimating genetic linkage maps.
| Strengths | Weaknesses |
| --- | --- |
| Written in R (user-friendly); High functionality; Integrated graphics; Transparent, open-source implementation; Supported and under current development | Difficulty handling >1000 markers; No methods to address bias in high-throughput DNA sequence markers |
| User-friendly Graphical User Interface (GUI); Efficient algorithms for grouping and ordering <3000 markers | Only available commercially; Not open-source; Difficulty handling >3000 markers; No methods to address bias in high-throughput DNA sequence markers |
| F1 crosses; Written in R; Integrates with R/qtl's functionality and graphics; Transparent, open-source implementation; Robust to genotyping errors and missing data | Difficulty handling >1000 markers; No methods to address bias in high-throughput DNA sequence markers |
| Efficient algorithms for linkage grouping and marker ordering; Can handle >10,000 markers | Cannot handle F1 crosses; Little documentation; Currently unsupported and may not be under further development; No methods to address bias in high-throughput DNA sequence markers |
| F1 crosses; Can handle >10,000 markers; Specialized module utilizes scaffold location of genetic markers in assigning linkage groups | Assumes no recombination in one parent (specialized Lepidopteran mating system; Suomalainen et al., |
| Can handle >1000 markers; Utilizes high-throughput sequencing errors in correcting genotyping errors and imputing missing data; Graphics and evaluation functions | Recently published and has not been widely tested |