Christopher M Seabury, Scot E Dowd, Paul M Seabury, Terje Raudsepp, Donald J Brightsmith, Poul Liboriussen, Yvette Halley, Colleen A Fisher, Elaine Owens, Ganesh Viswanathan, Ian R Tizard.
Abstract
Data deposition to NCBI Genomes: This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AMXX00000000 (SMACv1.0, unscaffolded genome assembly). The version described in this paper is the first version (AMXX01000000). The scaffolded assembly (SMACv1.1) has been deposited at DDBJ/EMBL/GenBank under the accession AOUJ00000000, and is also the first version (AOUJ01000000). Strong biological interest in traits such as the acquisition and utilization of speech, cognitive abilities, and longevity catalyzed the utilization of two next-generation sequencing platforms to provide the first-draft de novo genome assembly for the large, new world parrot Ara macao (Scarlet Macaw). Despite the challenges associated with genome assembly for an outbred avian species, including 951,507 high-quality putative single nucleotide polymorphisms, the final genome assembly (>1.035 Gb) includes more than 997 Mb of unambiguous sequence data (excluding N's). Cytogenetic analyses including ZooFISH revealed complex rearrangements associated with two scarlet macaw macrochromosomes (AMA6, AMA7), which supports the hypothesis that translocations, fusions, and intragenomic rearrangements are key factors associated with karyotype evolution among parrots. In silico annotation of the scarlet macaw genome provided robust evidence for 14,405 nuclear gene annotation models, their predicted transcripts and proteins, and a complete mitochondrial genome. Comparative analyses involving the scarlet macaw, chicken, and zebra finch genomes revealed high levels of nucleotide-based conservation as well as evidence for overall genome stability among the three highly divergent species. Application of a new whole-genome analysis of divergence involving all three species yielded prioritized candidate genes and noncoding regions for parrot traits of interest (i.e., speech, intelligence, longevity) which were independently supported by the results of previous human GWAS studies. 
We also observed evidence for genes and noncoding loci that displayed extreme conservation across the three avian lineages, thereby reflecting their likely biological and developmental importance among birds.
Year: 2013 PMID: 23667475 PMCID: PMC3648530 DOI: 10.1371/journal.pone.0062415
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Consensus Scarlet Macaw (Ara macao) Karyotype.
Cytogenetic analyses indicate that the scarlet macaw diploid chromosome number is 2n = 62–64, as inferred from chromosome counts of multiple cells derived from three individuals, including the sequenced female macaw (Neblina). All investigated scarlet macaws had 22 macrochromosomes, which included 10 pairs of autosomes and the sex chromosomes, and approximately 40–42 microchromosomes, the numbers of which varied due to technical reasons such as metaphase overlaps, variation in staining, and chromosome spreading.
Figure 2. Chicken-Scarlet Macaw (Ara macao) Comparative Chromosome Painting (ZooFISH).
Using chicken flow sorted macrochromosomes (GGA1-GGA9) as well as GGAZ and GGAW, the homologous chromosome segments of the scarlet macaw were established via fluorescent in situ hybridization. All flow sorted probes were validated via hybridization to chicken metaphase spreads (see Figure S1).
Summary of Roche 454 Titanium and Illumina sequence data used for de novo assembly of the scarlet macaw genome.
| Data Source | Total Reads | Library Type | Insert Size (bp) | Average Read Length (bp) |
| Roche 454 | 4,489,636 | Random Shotgun | 500–600 | 301 |
| Illumina GA IIx | 59,090,507 | Small Insert Paired End | 250–450 | 116 |
| Illumina HiSeq | 132,052,204 | Small Insert Paired End | 250–450 | 93 |
| Illumina HiSeq | 116,445,199 | Mate Pair (Small) | 1100–2700 | 48 |
| Illumina HiSeq | 114,034,657 | Mate Pair (Medium) | 4000–5700 | 47 |
Total Reads: total usable reads after quality and adapter trimming (n = 426,112,203).
Insert Size (Roche 454): targeted fragment population after nebulization of high molecular weight genomic DNA.
Insert Size (Illumina): range of observed paired distances for each Illumina sequencing library.
Average Read Length: reflects the averages for quality and adapter trimmed reads.
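The reported total of usable reads can be checked directly from the per-library counts in the table. The following is a minimal Python sketch; the dictionary keys are illustrative labels, and the base-pair total is only a rough estimate obtained by multiplying each read count by its rounded average read length.

```python
# Read counts and average read lengths (bp), copied from the table above.
# The library labels are illustrative names, not identifiers from the paper.
libraries = {
    "Roche 454, random shotgun": (4_489_636, 301),
    "Illumina GA IIx, paired end": (59_090_507, 116),
    "Illumina HiSeq, paired end": (132_052_204, 93),
    "Illumina HiSeq, mate pair (small)": (116_445_199, 48),
    "Illumina HiSeq, mate pair (medium)": (114_034_657, 47),
}

# Sum of usable reads across all libraries.
total_reads = sum(reads for reads, _ in libraries.values())

# Rough sequenced-base estimate: reads x average trimmed read length.
# This is an approximation because the average lengths are rounded.
approx_total_bases = sum(reads * length for reads, length in libraries.values())

print(f"{total_reads:,}")  # 426,112,203
print(f"approx. {approx_total_bases / 1e9:.1f} Gbp of trimmed sequence")
```

The read total matches the n = 426,112,203 given in the table footnote.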
Summary data for the scarlet macaw first-generation draft de novo genome assembly with comparison to the initial turkey and chicken genome assemblies.
| Genome Characteristics | Scarlet Macaw 1.0 | Scarlet Macaw 1.1 | Turkey 2.01 | Chicken 2.1 |
| Total Contig Length (without gaps) | 1.035 Gbp | 0.997 Gbp | 0.931 Gbp | 1.047 Gbp |
| Total Contigs >1 Kb | 214,754 | 140,453 | 128,271 | 98,612 |
| N50 Contig Size | 6,366 bp | 15,968 bp | 12,594 bp | 36,000 bp |
| Largest Contig | 87,225 bp | 177,843 bp | 90,000 bp | 442,000 bp |
| Contig Coverage | 16x | 13x | 17x | 7x |
| Cost of Sequencing (M = million) | < $0.034M | < $0.034M | < $0.250M | > $10M |
SMACv1.0 is unscaffolded. The cost of sequencing also reflects all library costs.
SMACv1.1 is scaffolded based on paired-reads and is 1.205 Gbp including gaps.
SMACv1.0 = 282,983 total contigs; SMACv1.1 = 192,790 total scaffolded contigs.
Median value of average coverage across all genomic contigs.
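For readers unfamiliar with the N50 statistic reported in the table, the sketch below shows the conventional definition: the length of the contig at which the cumulative length, taken from the longest contig downward, first reaches half of the total assembly length. The function name `n50` and the toy input are illustrative; the source does not state which software computed the published values.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the summed length.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy example: total = 54, half = 27; 10 + 9 + 8 = 27, so N50 is 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Reproducing the tabulated values (6,366 bp for SMACv1.0 and 15,968 bp for SMACv1.1) would require the full list of contig lengths from each assembly, which is not part of this record.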
Figure 3. Relationship Between Total Contig Length (Kbp) and Total Contig Number for the Scaffolded Scarlet Macaw (Ara macao) Genome (SMACv1.1).
The y-axis represents total contig length, expressed in kilobase pairs (Kbp), whereas the x-axis represents the total number of scaffolded contigs. Based on the estimated size of the scarlet macaw genome (1.11–1.16 Gbp), ≥90% of the assembled genome was captured within approximately 95,000 contigs.
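The relationship plotted in this figure is a cumulative-length curve. A minimal sketch of how one might count the contigs needed to capture a given fraction of an estimated genome size is shown below; the function name, its parameters, and the toy lengths are hypothetical, and the choice of denominator (estimated genome size rather than assembled length) is an assumption based on the caption.

```python
def contigs_to_reach(contig_lengths, genome_size, fraction=0.90):
    """Count how many of the longest contigs are needed to cover
    `fraction` of `genome_size` (both in bp).

    Returns None if the contigs together never reach the target.
    """
    target = fraction * genome_size
    running = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count
    return None


# Toy example: a 100 bp "genome"; 50 + 30 + 10 = 90 bp reaches 90%.
print(contigs_to_reach([5, 50, 10, 30, 5], genome_size=100))  # 3
```

Applied to the SMACv1.1 scaffolded contig lengths with a genome size in the 1.11–1.16 Gbp range, a calculation of this kind would correspond to the approximately 95,000 contigs quoted in the caption.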
Major repetitive content predicted by RepeatMasker within the scarlet macaw first generation de novo genome assembly (SMACv1.0, SMACv1.1).
| Repeat Type Predicted | Total Elements (SMACv1.0) | Total bp (% of Genome, SMACv1.0) | Total Elements (SMACv1.1) | Total bp (% of Genome, SMACv1.1) |
| SINEs | 6,741 | 834,386 (0.08%) | 6,612 | 816,297 (0.07%) |
| LINEs (L2/CR1/Rex) | 189,424 | 35,772,793 (3.46%) | 156,131 | 32,227,015 (2.67%) |
| LTR Retroviral | 41,307 | 9,866,884 (0.95%) | 34,526 | 8,474,428 (0.70%) |
| DNA Transposons | 3,697 | 536,502 (0.05%) | 3,561 | 511,534 (0.04%) |
| Unclassified Interspersed Repeats | 2,561 | 421,784 (0.04%) | 2,482 | 406,059 (0.03%) |
| Satellites | 2,033 | 235,306 (0.02%) | 1,584 | 187,049 (0.02%) |
| Low Complexity & Simple Repeats | 142,486 | 6,275,857 (0.61%) | 114,664 | 5,076,913 (0.42%) |
Simple, unscaffolded (SMACv1.0) de novo assembly (1.035 Gb).
Scaffolded (SMACv1.1) de novo assembly (1.205 Gb including gaps with N’s).
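The percentages in the repeat table are consistent with dividing each repeat class's total bp by the assembly size given in the two notes above (1.035 Gb for SMACv1.0 and 1.205 Gb, including gaps, for SMACv1.1). The sketch below checks this for the LINE row; the helper name is illustrative, and the assembly sizes are the rounded figures from the notes, so only two-decimal agreement should be expected.

```python
# Rounded assembly sizes (bp) taken from the table notes.
ASSEMBLY_BP = {"SMACv1.0": 1.035e9, "SMACv1.1": 1.205e9}

# LINE (L2/CR1/Rex) totals in bp, copied from the table.
LINE_BP = {"SMACv1.0": 35_772_793, "SMACv1.1": 32_227_015}


def percent_of_assembly(repeat_bp, assembly_bp):
    """Express a repeat class's total length as a percentage of the assembly."""
    return 100 * repeat_bp / assembly_bp


for version, repeat_bp in LINE_BP.items():
    pct = percent_of_assembly(repeat_bp, ASSEMBLY_BP[version])
    print(f"{version}: LINEs = {pct:.2f}% of assembly")
```

This yields 3.46% and 2.67%, matching the tabulated values. The lower percentages for SMACv1.1 partly reflect its larger denominator, which includes gap N's.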
Figure 4. Whole Genome Analysis of Divergence.
(A) Genome-wide nucleotide-based divergence (CorrectedForAL) between the scarlet macaw (Ara macao; simple de novo assembly) and chicken genomes (Gallus gallus 2.1). (B) Genome-wide nucleotide-based divergence (CorrectedForAL) between the scarlet macaw (Ara macao; simple de novo assembly) and zebra finch genomes (Taeniopygia guttata 1.1, 3.2.4). Each histogram represents the full, ordered distribution of the composite variable defined as: . The observed ranges of the composite variable for panel (A) and panel (B) were 3.89591E-05–0.052631579, and 3.33792E-05–0.052631579, respectively. The left edges of the distributions represent extreme conservation, whereas the right edges indicate extreme putative divergence. Distributional outliers were predicted using a percentile-based approach (99.98th and 0.02th) to construct interval bounds capturing >99.9% of the total data points in each ordered distribution.
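The percentile-based outlier step described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the percentile is computed with linear interpolation between closest ranks (the source does not say which percentile definition was used), and the input is any list of per-contig divergence values.

```python
def percentile(sorted_values, q):
    """Percentile q (0-100) of an ascending list, using linear
    interpolation between the two closest ranks (an assumed convention)."""
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] + weight * (sorted_values[upper] - sorted_values[lower])


def flag_outliers(values, lower_q=0.02, upper_q=99.98):
    """Split values into extreme-conservation and extreme-divergence outliers.

    Values below the lower_q percentile are treated as extremely conserved;
    values above the upper_q percentile as extremely divergent.
    """
    ordered = sorted(values)
    low_bound = percentile(ordered, lower_q)
    high_bound = percentile(ordered, upper_q)
    conserved = [v for v in values if v < low_bound]
    divergent = [v for v in values if v > high_bound]
    return low_bound, high_bound, conserved, divergent


# Toy usage with wide bounds so the result is easy to follow by hand.
low, high, conserved, divergent = flag_outliers(list(range(11)), lower_q=10, upper_q=90)
print(low, high, conserved, divergent)
```

With the default bounds, the interval between the 0.02th and 99.98th percentiles spans 99.96% of the ordered data, which is consistent with the caption's statement that the bounds capture >99.9% of the data points.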
SMACv1.0 simple de novo outlier contigs from a genome-wide analysis of divergence with the chicken.
| Predicted Outlier Contig Genes and Pathways | Known Function or GWAS Trait Classification | References |
| Mitochondrial Genome | Energy Production | |
| | Neuronal Development | |
| | Neurological Disorders | |
| Intergenic | | |
| | Motor Coordination | |
| | Cognition, Learning | |
| | Intelligence | |
| | Speech | |
| | Brain Size | |
| | Hippocampal Volume and Intelligence | |
| | Height | |
| Intergenic | Heart Failure | |
| | Heart Ventricular Conduction | |
| | Heart Q-wave T-wave Interval Length | |
| | Blood Pressure | |
| Intergenic | Stroke | |
| | Bone Mass | |
| | Blood Traits | |
| | Diabetes | |
| | Response or Susceptibility to Viruses | |
| | Asthma, Lung Function, Respiratory | |
| | Age-Related Macular Degeneration | |
| | Longevity | |
For outlier direction, see Table S14.
SMACv1.0 simple de novo outlier contigs from a genome-wide analysis of divergence with the zebra finch.
| Predicted Outlier Contig Genes and Proteins | Known Function or GWAS Trait Classification | References |
| | Neuronal Development | |
| | Neuron Specification | |
| | Neurological Disorders | |
| | Human Developmental Anomalies | |
| | Hippocampal and Cognitive Aging | |
| UPF0632 Protein A | White Matter Integrity in Old Age | |
| | Brain Striatal Volume, Cognition | |
| | Susceptibility to Coronary Artery Disease | |
| | Cardiovascular-Left Ventricular Mass | |
| | Cardiomyopathy | |
| | Embryonic Cardiovascular Development | |
| | Heart Ventricular Conduction | |
| | Heart Q-wave T-wave Interval Length | |
| | Cerebrovascular Developmental Disorders | |
| | Developmental Disorders of the Genitalia | |
| | Osteogenic Differentiation, Regeneration | |
| | Myogenesis | |
| | Pigment Biosynthesis | |
| | Eye Development-Microphthalmia | |
| | Ocular Neural Pattern Development | |
| | Optic Disc Size-Cup Area | |
| | Innate Immunity | |
| | Tendon Differentiation and Development | |
| | Adiposity | |
| NPY2R | Feeding Behavior, Oral Feeding Success | |
| | Diabetes | |
For outlier direction, see Table S14.