Thomas N Heider, James Lindsay, Chenwei Wang, Rachel J O'Neill, Andrew J Pask.
Abstract
INTRODUCTION: Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies.
Year: 2011 PMID: 21554765 PMCID: PMC3090765 DOI: 10.1186/1753-6561-5-S2-S7
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. The tammar wallaby. An adult female tammar wallaby of Abrolhos Island origin, Western Australia. Females weigh 4-6 kg and males 5-9 kg.
Figure 2. Phylogenetic tree of the three extant mammalian lineages. Marsupials form a lineage separate from the eutherian mammals, with which they last shared a common ancestor approximately 148 million years ago. This divergence makes marsupials powerful species for comparative genomics.
Summary statistics for scaffolding with sequence-based and non-sequence-based data
| Data set included | Number of paired-end reads | Total scaffold span | N50 scaffold span | Number of scaffolds | Percentage of original contigs included in scaffolds |
|---|---|---|---|---|---|
| | n/a | n/a | 36 kb | 277,711 | 0.0% |
| | 173,294 | 3,204 Mb | 39 kb | 271,687 | 2.2% |
| | 8,415,542 | 3,069 Mb | 49 kb | 165,909 | 40.2% |
| | 11,718,457 | 3,177 Mb | 52 kb | 202,026 | 27.2% |
| | 20,133,999 | 2,829 Mb | 78 kb | 129,290 | 53.4% |
| | 20,407,293 | 2,534 Mb | 105 kb | 124,099 | 55.2% |
| Ideal genome | n/a | 2,700 Mb | n/a | 8 | n/a |
This table shows how much each library contributes to reducing the number of scaffolds and increasing the N50. All data are derived from the Bambus output statistics file. The left column indicates the data set used to enhance the assembly with Bambus. The number of paired-end reads is the total number of paired data points used from each data set to enhance the assembly. The total scaffold span provides an indirect estimate of genome size based on the assembly, and Bambus used it to calculate the N50. The N50 is the length of the shortest scaffold in the smallest set of largest scaffolds whose combined length reaches 50% of the total scaffold span. The number of scaffolds is the number of independently ordered regions in the assembly. As each library is integrated, this number falls, which shows that the original contigs are being joined and ordered into larger scaffolds. Each row lists the total number of scaffolds generated, and the percentage reduction is calculated relative to the initial number of contigs in the input library. The bottom row lists the ideal genome size (2.7 Gb) and the ideal number of scaffolds (one per chromosome, i.e. 8).
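The N50 and percentage-reduction figures described in the caption above can be reproduced in a few lines. This is a minimal Python sketch, not Bambus's own implementation. The function names `n50` and `percent_reduction` are hypothetical, and the toy scaffold lengths are illustrative rather than tammar data. The only real values used are the scaffold counts from the table.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence (e.g. scaffold) lengths.

    The N50 is the length of the shortest sequence in the smallest set of
    longest sequences whose summed length is at least half the total span.
    """
    if not lengths:
        raise ValueError("N50 of an empty collection is undefined")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


def percent_reduction(initial_count: int, final_count: int) -> float:
    """Percentage drop in the number of independent pieces (contigs/scaffolds)."""
    return 100.0 * (initial_count - final_count) / initial_count


if __name__ == "__main__":
    # Hypothetical toy lengths: total 300, and 100 + 80 = 180 >= 150, so N50 = 80.
    print(n50([100, 80, 50, 40, 30]))  # 80
    # Scaffold counts from the first two rows of the table.
    print(round(percent_reduction(277_711, 271_687), 1))  # 2.2
```

Sorting in descending order means the first sequence to cross the halfway mark is, by construction, the shortest member of the smallest qualifying set.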
Figure 3. Analysis pipeline. This schematic represents the steps in preparing the non-sequence-based data for analysis by Bambus. Shaded ovals represent programs and scripts, rectangles represent data sets, and arrows represent the flow of information through the pipeline.
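The paper's own scripts for preparing non-sequence-based data are not reproduced in this record. The following is a minimal, hypothetical Python sketch of one way map information could be turned into paired-style ordering constraints for a scaffolder. It assumes each contig has already been anchored to a chromosome and a map position, for example from a linkage, physical, or conserved synteny map. `MapAnchor`, `PseudoLink`, and `anchors_to_links` are illustrative names. The output is a generic list of links, not Bambus's actual input format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MapAnchor:
    """A contig placed on a map (hypothetical record type for illustration)."""
    contig: str
    chromosome: str
    position: float  # map coordinate; units (cM, Mb, ...) depend on the map


@dataclass(frozen=True)
class PseudoLink:
    """An ordering constraint between two contigs, analogous to a mate pair."""
    left: str
    right: str
    map_distance: float


def anchors_to_links(anchors: Iterable[MapAnchor]) -> list[PseudoLink]:
    """Link each contig to its next neighbour along the same chromosome map."""
    by_chrom: dict[str, list[MapAnchor]] = defaultdict(list)
    for anchor in anchors:
        by_chrom[anchor.chromosome].append(anchor)

    links: list[PseudoLink] = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda a: a.position)
        # Consecutive contigs in map order become one pseudo paired link;
        # adjacent markers on the same contig do not produce a link.
        for a, b in zip(ordered, ordered[1:]):
            if a.contig != b.contig:
                links.append(PseudoLink(a.contig, b.contig, b.position - a.position))
    return links


if __name__ == "__main__":
    demo = [
        MapAnchor("ctg3", "chr1", 12.0),
        MapAnchor("ctg1", "chr1", 2.5),
        MapAnchor("ctg7", "chr2", 40.0),
        MapAnchor("ctg2", "chr1", 7.0),
        MapAnchor("ctg9", "chr2", 41.5),
    ]
    for link in anchors_to_links(demo):
        print(link)
```

Linking only nearest neighbours keeps the number of constraints linear in the number of anchored contigs. A real pipeline would also need to convert map distances into physical gap estimates and resolve conflicting placements. This sketch does neither.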