Matthew W Hahn, Simo V Zhang, Leonie C Moyle.
Abstract
Current de novo whole-genome sequencing approaches are often inadequate for organisms lacking substantial preexisting genetic data. The problems with these methods take three forms: large numbers of scaffolds that are neither assigned to individual chromosomes nor ordered within them; misassembly of allelic sequences as separate loci when the sequenced individual(s) are heterozygous; and the collapse of recently duplicated sequences into a single locus, regardless of levels of heterozygosity. Here we propose a new approach for producing de novo whole-genome sequences, which we call recombinant population genome construction, that solves many of the problems encountered in standard genome assembly and that can be applied in model and nonmodel organisms. Our approach takes advantage of next-generation sequencing technologies to simultaneously barcode and sequence a large number of individuals from a recombinant population. The sequences of all recombinants can be combined to create an initial de novo assembly, followed by the use of individual recombinant genotypes to correct assembly splitting/collapsing and to order and orient scaffolds within linkage groups. Recombinant population genome construction can rapidly accelerate the transformation of nonmodel species into genome-enabled systems by simultaneously producing a high-quality genome assembly and providing genomic tools (e.g., high-confidence single-nucleotide polymorphisms) for immediate applications. In populations segregating for important functional traits, this approach also enables simultaneous mapping of quantitative trait loci. We demonstrate our method using simulated Illumina data from a recombinant population of Caenorhabditis elegans and show that the method can produce a high-fidelity, high-quality genome assembly for both parents of the cross.
Keywords: assembly; duplication; genetics; genome; next-generation sequencing
Year: 2014 PMID: 24531727 PMCID: PMC4059239 DOI: 10.1534/g3.114.010264
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. General outline of the steps involved in recombinant population genome construction. Each individually numbered step is described in detail in the text.
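The abstract notes that individual recombinant genotypes are used to place scaffolds within linkage groups. As a rough illustration only, and not the authors' procedure, the sketch below computes the fraction of individuals whose genotypes disagree between one marker on each of two scaffolds. The genotype coding (`"AA"`, `"AB"`, `"BB"`, `None` for missing), the function names, and the `max_discordance` cutoff are all assumptions made for this example; a real pipeline would estimate recombination fractions with dedicated linkage-mapping software.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # "AA", "AB", "BB", or None when uncalled (assumed coding)


def discordance(marker_a: Sequence[Genotype], marker_b: Sequence[Genotype]) -> float:
    """Fraction of individuals, called at both markers, whose genotypes differ."""
    if len(marker_a) != len(marker_b):
        raise ValueError("markers must be scored on the same individuals")
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no individuals called at both markers")
    return sum(a != b for a, b in pairs) / len(pairs)


def probably_linked(marker_a: Sequence[Genotype],
                    marker_b: Sequence[Genotype],
                    max_discordance: float = 0.25) -> bool:
    """Crude linkage proxy: low discordance suggests the markers co-segregate.

    The 0.25 cutoff is a hypothetical value chosen for illustration.
    """
    return discordance(marker_a, marker_b) <= max_discordance


# Toy F2 genotypes for markers on three hypothetical scaffolds.
scaffold_1 = ["AA", "AB", "BB", "AB", "AA", "AB", "BB", "AB"]
scaffold_2 = ["AA", "AB", "BB", "AB", "AA", "AB", "AB", "AB"]  # one mismatch
scaffold_3 = ["BB", "AA", "AB", "AB", "AB", "BB", "AA", "AA"]  # mostly mismatched

print(probably_linked(scaffold_1, scaffold_2))
print(probably_linked(scaffold_1, scaffold_3))
```

Markers that segregate independently in an F2 are expected to disagree in well over half of individuals, whereas tightly linked markers disagree rarely, which is why a simple cutoff can separate the two cases in a toy example.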
Figure 2. The pattern of segregation of collapsed duplicates through an F2 cross. The top panel shows the physical reality of genes arranged on chromosomes, with two duplicates present in each parent, F1, and F2. The bottom panel shows how, when duplicates are collapsed into a single locus, all individuals appear to be heterozygous at sites that differentiate the two copies.
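The signature described in the Figure 2 caption can be turned into a simple screen: at a true single-copy site in an F2, only about half of the individuals are expected to be heterozygous, whereas a collapsed duplicate makes every individual look heterozygous. The following is a minimal sketch under assumed conventions; the genotype coding, the function names, and the `min_het` and `min_called` thresholds are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # "AA", "AB", "BB", or None when uncalled (assumed coding)


def het_fraction(genotypes: Sequence[Genotype]) -> float:
    """Fraction of called individuals that appear heterozygous at a site."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    return sum(g == "AB" for g in called) / len(called)


def looks_collapsed(genotypes: Sequence[Genotype],
                    min_het: float = 0.95,
                    min_called: int = 10) -> bool:
    """Flag a site as a candidate collapsed duplicate.

    A site is flagged when nearly all called individuals appear heterozygous.
    Both thresholds are illustrative assumptions.
    """
    called = [g for g in genotypes if g is not None]
    if len(called) < min_called:
        return False  # too few calls to judge
    return het_fraction(called) >= min_het


# Toy F2 data: a single-copy site in 1:2:1 proportions vs. a collapsed duplicate.
single_copy_site = ["AA"] * 5 + ["AB"] * 10 + ["BB"] * 5
collapsed_site = ["AB"] * 20

print(looks_collapsed(single_copy_site))
print(looks_collapsed(collapsed_site))
```

Allowing slightly less than 100% apparent heterozygosity leaves room for genotyping error; a site that passes this screen is only a candidate and would still need confirmation.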
Figure 3. The pattern of segregation of split alleles through an F2 cross. The top panel shows the physical reality of a single gene that differs in allelic sequence between the parents. The F1 and half the F2s are heterozygous. The bottom panel shows how, when alleles are split into two loci, each parent appears to be missing a locus, whereas the F1 and half the F2s have both loci present.
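The presence/absence pattern in the Figure 3 caption also suggests a simple check for candidate split loci: each parent carries exactly one of the two apparent loci, and no F2 individual should lack both. The sketch below assumes presence/absence has already been scored per individual (for example from read depth); the tuple representation and the function name `looks_split` are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Sequence, Tuple

# (present at apparent locus X, present at apparent locus Y) for one individual
Presence = Tuple[bool, bool]


def looks_split(parent1: Presence, parent2: Presence, f2: Sequence[Presence]) -> bool:
    """Flag two apparent loci as candidate split alleles of one real locus.

    Requirements checked (illustrative, assumed criteria):
      * each parent has exactly one locus, and the parents have opposite ones;
      * no F2 individual is missing both loci;
      * all three F2 classes (X only, both, Y only) are observed.
    """
    one_each = parent1[0] != parent1[1] and parent2[0] != parent2[1]
    opposite = parent1[0] != parent2[0]
    if not (one_each and opposite):
        return False
    if any(ind == (False, False) for ind in f2):
        return False  # alleles of one locus cannot both be absent
    observed = set(f2)
    return {(True, False), (True, True), (False, True)} <= observed


# Toy F2 data matching the caption: half of the F2s carry both apparent loci.
f2_split = [(True, False)] * 5 + [(True, True)] * 10 + [(False, True)] * 5
# Two genuinely separate presence/absence loci can leave some individuals with neither.
f2_independent = f2_split + [(False, False)] * 2

print(looks_split((True, False), (False, True), f2_split))
print(looks_split((True, False), (False, True), f2_independent))
```

Real read-depth data are noisy, so a practical version would tolerate a small number of apparent double absences rather than rejecting on the first one.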
Summary of assemblies
| | Standard Assembly | RPGC Assembly |
|---|---|---|
| # scaffolds assigned to chromosomes | 0 | 110 |
| # scaffolds ordered within chromosomes | 0 | 107 |
| Proportion of scaffolds correctly ordered | N/A | 100% |
| # scaffolds oriented | 0 | 90 |
| Proportion of scaffolds correctly oriented | N/A | 100% |
| Final # scaffolds | 236 | 88 |
| Final total length of assembly (with gaps) | 99,320,007 bp | 98,533,986 bp |
Scaffolds on the X chromosome could have been assigned based on read-depth if males were sequenced. RPGC, recombinant population genome construction; N/A, not applicable.
Assembly error and corrections
| | Standard Assembly | RPGC Assembly |
|---|---|---|
| # pairs of candidate split loci | N/A | 31 |
| # pairs of candidate split loci with markers | N/A | 16 |
| # pairs of loci split | 9 | 9 |
| # pairs of loci correctly identified as split | N/A | 8 |
| Length of split loci in assembly | 19,503 bp | 32,354 bp |
| Length of split loci corrected in assembly | 0 | 30,927 bp |
| # pairs of candidate collapsed loci | N/A | 69 |
| # pairs of loci collapsed | 44 | 68 |
| # pairs of loci correctly identified as collapsed | N/A | 68 |
| Length of collapsed loci in assembly | 73,693 bp | 156,468 bp |
| Length of collapsed loci reassembled in assembly | 0 | 19,505 bp |
The total single-locus length of loci confirmed as split into two loci. RPGC, recombinant population genome construction; N/A, not applicable.