Yvette A Halley¹, Scot E Dowd², Jared E Decker³, Paul M Seabury⁴, Eric Bhattarai¹, Charles D Johnson⁵, Dale Rollins⁶, Ian R Tizard¹, Donald J Brightsmith¹, Markus J Peterson⁷, Jeremy F Taylor³, Christopher M Seabury¹.
Abstract
Wild populations of northern bobwhites (Colinus virginianus; hereafter bobwhite) have declined across nearly all of their U.S. range, and despite their importance as an experimental wildlife model for ecotoxicology studies, no bobwhite draft genome assembly currently exists. Herein, we present a bobwhite draft de novo genome assembly with annotation, comparative analyses including genome-wide analyses of divergence with the chicken (Gallus gallus) and zebra finch (Taeniopygia guttata) genomes, and coalescent modeling to reconstruct the demographic history of the bobwhite for comparison to other birds currently in decline (i.e., scarlet macaw; Ara macao). More than 90% of the assembled bobwhite genome was captured within <40,000 final scaffolds (N50 = 45.4 Kb) despite evidence for approximately 3.22 heterozygous polymorphisms per Kb, and three annotation analyses produced evidence for >14,000 unique genes and proteins. Analyses of divergence between the bobwhite and the chicken and zebra finch genomes revealed many extremely conserved gene sequences, as well as evidence for lineage-specific divergence of noncoding regions. Coalescent models reconstructing the demographic histories of the bobwhite and the scarlet macaw provided evidence for population bottlenecks that were temporally coincident with human colonization of the New World, the late Pleistocene collapse of the megafauna, and the last glacial maximum. The demographic trends predicted for the bobwhite and the scarlet macaw were also concordant with how opposing natural selection strategies (i.e., skewness in the r-/K-selection continuum) would be expected to shape genome diversity and effective population size in these species, which is directly relevant to future conservation efforts.
Year: 2014 PMID: 24621616 PMCID: PMC3951200 DOI: 10.1371/journal.pone.0090240
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of Illumina sequence data used for de novo assembly of the bobwhite genome.
| Data Source | Total Reads | Library Type | Insert Size: Paired-Distance Range (bp) | Average Read Length (bp) |
|---|---|---|---|---|
| Illumina HiSeq | 1,575,625,135 | Small Insert Paired End | 230–475 | 84 |
| Illumina HiSeq | 510,031,444 | Mate Pair (Small) | 2100–3100 | 49 |
| Illumina HiSeq | 276,134,302 | Mate Pair (Medium) | 4600–6000 | 50 |
Total Reads: usable reads after quality and adapter trimming, summed across all libraries (n = 2,361,790,881).
Insert Size: Paired-Distance Range: insert size and corresponding range of paired distances for each Illumina sequencing library.
Average Read Length: averages for quality- and adapter-trimmed reads, rounded to the nearest bp.
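As a quick consistency check, the per-library read counts in the table sum to the stated total, and multiplying reads by the average trimmed read length gives a rough estimate of sequenced bases and raw depth. The sketch below assumes a genome size of 1.195 Gbp (the midpoint of the 1.19–1.20 Gbp estimate given in the Figure 1 caption); the dictionary keys are hypothetical labels, and the depth is only an approximation that ignores duplicates and per-read length variation.

```python
# Per-library values from the table: (usable reads, average trimmed read length in bp).
# The dictionary keys are hypothetical labels for the three Illumina HiSeq libraries.
LIBRARIES = {
    "small_insert_pe": (1_575_625_135, 84),
    "mate_pair_small": (510_031_444, 49),
    "mate_pair_medium": (276_134_302, 50),
}

# Assumption: midpoint of the 1.19-1.20 Gbp genome size estimate.
GENOME_SIZE_BP = 1.195e9


def total_reads(libs):
    """Sum usable reads across all libraries."""
    return sum(reads for reads, _ in libs.values())


def total_bases(libs):
    """Approximate sequenced bases as reads x average read length, per library."""
    return sum(reads * length for reads, length in libs.values())


def raw_depth(libs, genome_size_bp):
    """Rough average sequencing depth (x-fold) over the assumed genome size."""
    return total_bases(libs) / genome_size_bp


if __name__ == "__main__":
    print(f"usable reads: {total_reads(LIBRARIES):,}")
    print(f"approx. bases: {total_bases(LIBRARIES):,}")
    print(f"approx. raw depth: {raw_depth(LIBRARIES, GENOME_SIZE_BP):.1f}x")
```

The read total reproduces the 2,361,790,881 figure; the base total comes to roughly 171 Gbp, or about 143× raw depth under the assumed genome size. This is read depth before assembly and is not directly comparable to the contig coverage values reported for NB1.0 and NB1.1.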
Summary data for the bobwhite de novo genome assembly with comparison to the initial turkey and chicken genome assemblies.
| Genome Characteristics | Simple Bobwhite 1.0 (NB1.0) | Scaffolded Bobwhite 1.1 (NB1.1) | Turkey 2.01 | Chicken 1.0 |
|---|---|---|---|---|
| Total Contig Length | 1.042 Gbp | 1.047 Gbp | 0.931 Gbp | 1.047 Gbp |
| Total Contigs >1 Kb | 198,672 | 65,833 | 128,271 | 98,612 |
| N50 Contig Size | 6,260 bp | 45,400 bp | 12,594 bp | 36,000 bp |
| Largest Contig | 163,812 bp | 600,691 bp | 90,000 bp | 442,000 bp |
| Total Contigs | 374,224 | 220,307 | 152,641 | NA |
| Contig Coverage | ≥100× | ≥77× | 17× | 7× |
| Cost (M = million) | <$0.020M | <$0.020M | <$0.250M | >$10M |
Simple Bobwhite 1.0 (NB1.0): no scaffolding procedure implemented.
Scaffolded Bobwhite 1.1 (NB1.1): scaffolding based on paired reads; no genome maps or BACs were available.
Total Contig Length (NB1.1): excludes gaps; the scaffolded assembly including gaps (i.e., N's) is 1.172 Gbp.
Total Contigs (Chicken 1.0): not provided; see [46].
Contig Coverage (NB1.0): median and average coverage, excluding contigs with coverage >300× (n = 4,293).
Contig Coverage (NB1.1): median and average coverage, excluding scaffolds with coverage >300× (n = 3,717).
Cost: the one-time cost of sequencing, which also reflects all library costs.
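The N50 values in the table summarize each assembly's length distribution. Under the standard definition, N50 is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the total assembled bases. A minimal Python sketch of that definition follows; the example lengths are hypothetical, and this is not the authors' assembly-statistics code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical toy assembly: 1,500 bp in five contigs.
    # 500 + 400 = 900 >= 750, so N50 = 400.
    print(n50([100, 200, 300, 400, 500]))
```

Because scaffolding merges contigs into fewer, longer sequences, it can raise N50 sharply, consistent with the 6,260 bp (NB1.0) versus 45,400 bp (NB1.1) values in the table.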
Figure 1. Relationship Between Total Contig Length (Kbp) and Total Contig Number for the Scaffolded Bobwhite (Colinus virginianus) Genome (NB1.1).
The y-axis represents total contig length, expressed in kilobase pairs (Kbp), and the x-axis represents the total number of scaffolds. The bobwhite genome was estimated to be 1.19–1.20 Gbp. For NB1.1 (1.172 Gbp), >90% of the assembled genome was captured within <40,000 scaffolds.
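A curve like the one in Figure 1 can be built by sorting scaffolds from longest to shortest and accumulating their lengths; the number of scaffolds needed to reach a given fraction of the assembly (here 90%) is then read off the curve. The sketch below assumes the fraction is taken relative to the total assembled length rather than the 1.19–1.20 Gbp genome estimate, and the example lengths are hypothetical.

```python
from itertools import accumulate


def cumulative_lengths(lengths):
    """Cumulative assembled length after adding scaffolds longest-first."""
    return list(accumulate(sorted(lengths, reverse=True)))


def scaffolds_to_fraction(lengths, fraction=0.9):
    """Smallest number of longest scaffolds whose summed length reaches
    `fraction` of the total assembled length."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    curve = cumulative_lengths(lengths)
    if not curve:
        raise ValueError("no scaffold lengths supplied")
    target = fraction * curve[-1]
    for count, running in enumerate(curve, start=1):
        if running >= target:
            return count
    return len(curve)  # guard against floating-point rounding at fraction == 1


if __name__ == "__main__":
    toy = [100, 200, 300, 400, 500]  # hypothetical scaffold lengths (bp)
    print(cumulative_lengths(toy))     # [500, 900, 1200, 1400, 1500]
    print(scaffolds_to_fraction(toy))  # 4, since 1400 >= 0.9 * 1500 = 1350
```

For NB1.1, the caption reports that fewer than 40,000 scaffolds reach this 90% threshold.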
Major classes of repetitive content predicted by RepeatMasker within the bobwhite NB1.1 scaffolded de novo assembly.
| Repeat Type | Total Predicted Elements | Total bp (% of Genome) |
|---|---|---|
| SINEs | 4,425 | 545,252 (0.047%) |
| LINEs (L2/CR1/Rex) | 172,398 | 44,762,255 (3.818%) |
| LTR Retroviral | 31,766 | 8,987,247 (0.767%) |
| DNA Transposons | 22,793 | 6,863,495 (0.585%) |
| Unclassified Interspersed Repeats | 2,096 | 337,844 (0.0288%) |
| Small RNA | 757 | 70,666 (0.006%) |
| Satellites | 3,624 | 580,253 (0.050%) |
| Low Complexity & Simple Repeats | 403,599 | 32,608,785 (2.781%) |
Scaffolded de novo assembly NB1.1 (1.17 Gb including gaps with N's).
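The percentage column can be recomputed from the bp counts. The sketch below assumes the denominator is the NB1.1 length including gaps (about 1.172 Gbp, per the footnote); because the exact denominator is not stated, recomputed values can differ from the table in the last reported digit. The summed total is simply the sum of the listed classes, not an overall masked total reported by RepeatMasker, which is not given here.

```python
# Masked bp per repeat class, copied from the table above.
REPEAT_BP = {
    "SINEs": 545_252,
    "LINEs (L2/CR1/Rex)": 44_762_255,
    "LTR Retroviral": 8_987_247,
    "DNA Transposons": 6_863_495,
    "Unclassified Interspersed Repeats": 337_844,
    "Small RNA": 70_666,
    "Satellites": 580_253,
    "Low Complexity & Simple Repeats": 32_608_785,
}

# Assumption: NB1.1 assembly length including gaps (~1.172 Gbp).
ASSEMBLY_BP = 1.172e9


def percent_of_assembly(bp, assembly_bp=ASSEMBLY_BP):
    """Express a base-pair count as a percentage of the assembly length."""
    return 100.0 * bp / assembly_bp


if __name__ == "__main__":
    for repeat_class, bp in REPEAT_BP.items():
        print(f"{repeat_class:<36} {bp:>12,} bp  {percent_of_assembly(bp):.3f}%")
    total = sum(REPEAT_BP.values())
    print(f"{'Sum of listed classes':<36} {total:>12,} bp  {percent_of_assembly(total):.2f}%")
```

Under this assumption the listed classes sum to about 94.8 Mbp, or roughly 8.1% of NB1.1.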
Figure 2. Autosomal Coverage and Quality Score Distributions for Variants Predicted in the Scaffolded Bobwhite (Colinus virginianus) Genome (NB1.1).
The y-axis shows the total number of genome-wide variants predicted within NB1.1, and the x-axes show coverage and quality score, respectively. Total variants include putative single nucleotide polymorphisms and small insertion/deletion mutations (indels; ≤5 bp) predicted within the repeat-masked NB1.1 assembly.
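Coverage and quality distributions like those in Figure 2 are typically used to choose filtering thresholds before summarizing variant density (the abstract reports approximately 3.22 heterozygous polymorphisms per Kb). The sketch below shows that kind of filter-then-normalize step. The `Variant` record, the depth and quality thresholds, and the callable-length input are all hypothetical; the authors' actual filtering criteria are described in their Methods and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical minimal variant record."""
    depth: int       # read coverage at the site
    quality: float   # variant quality score
    length: int      # 1 for SNPs; indel length in bp for small indels


def filter_variants(variants, min_depth, max_depth, min_quality, max_indel_bp=5):
    """Keep variants inside a depth window, at or above a quality floor, and no
    longer than max_indel_bp (5 bp matches the caption's indel limit)."""
    return [
        v for v in variants
        if min_depth <= v.depth <= max_depth
        and v.quality >= min_quality
        and v.length <= max_indel_bp
    ]


def variants_per_kb(n_variants, callable_bp):
    """Variant density per 1,000 callable (e.g., non-repeat-masked) bases."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return 1000.0 * n_variants / callable_bp


if __name__ == "__main__":
    calls = [
        Variant(depth=30, quality=50.0, length=1),   # passes all filters
        Variant(depth=4, quality=60.0, length=1),    # fails the depth floor
        Variant(depth=500, quality=99.0, length=1),  # fails the depth ceiling
        Variant(depth=25, quality=10.0, length=2),   # fails the quality floor
        Variant(depth=28, quality=45.0, length=8),   # indel longer than 5 bp
    ]
    kept = filter_variants(calls, min_depth=10, max_depth=200, min_quality=20.0)
    print(len(kept))                          # 1
    print(variants_per_kb(3_220, 1_000_000))  # 3.22
```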
Figure 3. Comparative Demographic History Analysis and PSMC Effective Population Size Estimates for Bobwhite (Colinus virginianus) (A) and Scarlet Macaw (Ara macao) (B).
Estimates of effective population size are presented on the y-axis as the scaled mutation rate. The bottom x-axis represents per-site pairwise sequence divergence and the top x-axis represents years before present, both on a log scale. Generation intervals of 1.22 years for the bobwhite (Colinus virginianus) and 12.7 years for the scarlet macaw (Ara macao) were used (see Methods). In the absence of known per-generation de novo mutation rates for the bobwhite and the scarlet macaw, we used two human mutation rates (μ) of 1.1 × 10⁻⁸ and 2.5 × 10⁻⁸ per site per generation [124], [125] (see Methods). Darker lines represent the population size inference, and lighter, thinner lines represent 100 bootstrap replicates used to quantify uncertainty in the inference.
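The axes in Figure 3 are related by standard coalescent scaling: per-site pairwise divergence d = 2μt (t in generations) converts to years as t × g, where g is the generation interval, and the scaled mutation rate θ = 4Neμ converts to effective population size as Ne = θ/(4μ). The sketch below applies these relations with the caption's generation intervals and mutation rates; it illustrates the scaling only and is not the PSMC software or its plotting script. The example divergence and θ values are hypothetical.

```python
# Values from the Figure 3 caption.
MUTATION_RATES = (1.1e-8, 2.5e-8)  # per site per generation (human-derived)
GENERATION_YEARS = {"bobwhite": 1.22, "scarlet_macaw": 12.7}


def divergence_to_years(d, mu, generation_years):
    """Convert per-site pairwise divergence d = 2 * mu * t to years before present."""
    generations = d / (2.0 * mu)
    return generations * generation_years


def theta_to_ne(theta, mu):
    """Convert a per-site scaled mutation rate theta = 4 * Ne * mu to Ne."""
    return theta / (4.0 * mu)


if __name__ == "__main__":
    d, theta = 1e-4, 1e-3  # hypothetical example values
    for mu in MUTATION_RATES:
        for species, g in GENERATION_YEARS.items():
            years = divergence_to_years(d, mu, g)
            ne = theta_to_ne(theta, mu)
            print(f"mu={mu:.1e} {species:<14} {years:>10,.0f} years  Ne={ne:,.0f}")
```

For a fixed divergence, the macaw's 12.7-year generation interval places an event roughly 10 times further back in time than the bobwhite's 1.22-year interval, and the higher μ (2.5 × 10⁻⁸) yields more recent dates and smaller Ne than the lower μ (1.1 × 10⁻⁸).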
Figure 4. Whole Genome Analysis of Divergence.
(Top) Genome-wide nucleotide-based divergence (CorrectedForAL) between the bobwhite (Colinus virginianus; NB1.0; simple de novo assembly) and the chicken genome (Gallus gallus 4.0). (Bottom) Genome-wide nucleotide-based divergence (CorrectedForAL) between the bobwhite (Colinus virginianus; NB1.0; simple de novo assembly) and the zebra finch genomes (Taeniopygia guttata 1.1, 3.2.4). Each histogram represents the full distribution of the composite variable CorrectedForAL, as defined in [42]. The left edges of the distributions represent extreme conservation, whereas the right edges indicate extreme putative divergence. The observed ranges of the composite variable were 2.19545 × 10⁻⁵ to 0.052631579 (chicken) and 4.28493 × 10⁻⁵ to 0.052631579 (zebra finch). Distributional outliers were predicted using a percentile-based approach (0.02nd and 99.98th percentiles) to construct interval bounds capturing >99% of the data points in each distribution.
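The percentile-based outlier step can be sketched as follows. The input is assumed to be a list of precomputed CorrectedForAL scores (the score itself is defined in [42] and is not reimplemented here). Percentiles use linear interpolation between ranks, and values strictly outside the 0.02nd and 99.98th percentile bounds are flagged; both the interpolation rule and the strict inequalities are assumptions, not details stated in the caption.

```python
def percentile_sorted(sorted_values, q):
    """Linear-interpolation percentile (0 <= q <= 100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("empty input")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def flag_outliers(scores, lower_q=0.02, upper_q=99.98):
    """Return (low_bound, high_bound, conserved, divergent).

    Low scores mark extreme conservation and high scores extreme
    putative divergence, following the Figure 4 caption.
    """
    ordered = sorted(scores)
    low = percentile_sorted(ordered, lower_q)
    high = percentile_sorted(ordered, upper_q)
    conserved = [s for s in scores if s < low]
    divergent = [s for s in scores if s > high]
    return low, high, conserved, divergent


if __name__ == "__main__":
    toy_scores = list(range(10_000))  # hypothetical stand-in for CorrectedForAL values
    low, high, conserved, divergent = flag_outliers(toy_scores)
    print(conserved, divergent)  # [0, 1] [9998, 9999]
```

With 10,000 values, these bounds flag the two lowest and two highest scores, so 99.96% of points fall inside the interval, consistent with the caption's >99%.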
Biologically relevant bobwhite NB1.0 simple de novo outliers from a genome-wide analysis of divergence with the chicken genome (G. gallus 4.0).
| Predicted Outlier Contig Genes | Known Function or GWAS Trait Classification | References |
|---|---|---|
|  | Aortic Stiffness |  |
|  | Cardiac Health and Development |  |
|  | Heart Ventricular Conduction |  |
|  | Cardiomyopathy |  |
|  | Heart Q-wave T-wave Interval Length |  |
|  | Coronary Artery Disease |  |
|  | Blood Pressure |  |
|  | Pulmonary Function and Health |  |
|  | Cognitive Abilities |  |
|  | Brain Structure |  |
|  | Brain Imaging |  |
|  | Working Memory |  |
Outlier for extreme nucleotide-based conservation.
Outlier for extreme nucleotide-based divergence.
See Table S11 for an exhaustive list of outlier contigs with annotation.
Biologically relevant bobwhite NB1.0 simple de novo outliers from a genome-wide analysis of divergence with the zebra finch genome (T. guttata 3.2.4).
| Predicted Outlier Contig Genes | Known Function or GWAS Trait Classification | References |
|---|---|---|
|  | Blood Pressure |  |
|  | Heart Ventricular Conduction |  |
|  | Aortic Stiffness |  |
|  | Resting Heart Rate |  |
|  | Bone Density |  |
|  | Bone Strength |  |
|  | Bone Mineral Density |  |
|  | Spinal Development |  |
|  | Osteogenic Differentiation and Regeneration |  |
|  | Height |  |
|  | Waist Circumference |  |
|  | Anthropometric Traits |  |
|  | Body Weight |  |
|  | Average Daily Gain |  |
|  | Age of Onset of Menarche |  |
|  | Reasoning |  |
Outlier for extreme nucleotide-based conservation.
Outlier for extreme nucleotide-based divergence.
See Table S11 for an exhaustive list of outlier contigs with annotation.