Bradley M Colquitt1, David G Mets1, Michael S Brainard1,2.
Abstract
Background: Vocal learning in songbirds has emerged as a powerful model for sensorimotor learning. Neurobehavioral studies of Bengalese finch (Lonchura striata domestica) song, which is naturally more variable and plastic than the songs of other finch species, have demonstrated the importance of behavioral variability for initial learning, maintenance, and plasticity of vocalizations. However, the molecular and genetic underpinnings of this variability and the learning it supports are poorly understood.

Findings: To establish a platform for the molecular analysis of behavioral variability and plasticity, we generated an initial draft assembly of the Bengalese finch genome from a single male animal, sequenced to 151× coverage, with a scaffold N50 of 3.0 Mb. Furthermore, we developed an initial set of gene models using RNA-seq data from 8 samples that comprise liver, muscle, cerebellum, brainstem/midbrain, and forebrain tissue from juvenile and adult Bengalese finches of both sexes.

Conclusions: We provide a draft Bengalese finch genome and gene annotation to facilitate the study of the molecular-genetic influences on behavioral variability and the process of vocal learning. These data will directly support many avenues for the identification of genes involved in learning, including differential expression analysis, comparative genomic analysis (through comparison to existing avian genome assemblies), and derivation of genetic maps for linkage analysis. Bengalese finch gene models and sequences will be essential for subsequent manipulation (molecular or genetic) of genes and gene products, enabling novel mechanistic investigations into the role of variability in learned behavior.
Year: 2018 PMID: 29618046 PMCID: PMC5861438 DOI: 10.1093/gigascience/giy008
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: An adult male Bengalese finch (Lonchura striata domestica).
Figure 2: Comparison of Bengalese finch and Avian Phylogenomics Project assemblies. The distributions of sequencing depths (A), scaffold N50 (B), and number of annotated genes (C) are shown for the assemblies in the Avian Phylogenomics Project as of 14 September 2017. Vertical red line indicates the corresponding statistics for the Bengalese finch assembly and annotation described here.
Descriptions of libraries used for genome assembly and gene annotation.
| Library | Insert size (expected, bp) | Insert size (measured, bp) | Reads (M) | Sequence (Gbases) | Coverage (×) |
|---|---|---|---|---|---|
| Fragment 1 | 200 | 202 | 403 | 50 | 42 |
| Fragment 2 | 220 | 226 | 412 | 51 | 43 |
| Jumping 1 | 3000 | 3300 | 753 | 60 | 50 |
| Jumping 2 | 5000 | 5300 | 149 | 12 | 10 |
| Jumping 3 | 9000 | 9000 | 100 | 7 | 6 |
| Totals | | | 1817 | 180 | 151 |
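
The coverage column is the sequenced bases divided by the genome size. The following is a minimal sketch of that arithmetic. It assumes a genome size of about 1.2 Gb, which the library table does not state; the value is inferred from the assembly statistics, where a total scaffold length of 1 058 688 097 bases is reported as 88.30% of the assumed genome size. The function and variable names are illustrative only.

```python
# Assumed genome size in gigabases; inferred, not stated in the library table.
ASSUMED_GENOME_SIZE_GB = 1.2

# Sequence yield per library in gigabases, copied from the library table.
LIBRARY_GBASES = {
    "Fragment 1": 50,
    "Fragment 2": 51,
    "Jumping 1": 60,
    "Jumping 2": 12,
    "Jumping 3": 7,
}


def fold_coverage(gbases, genome_size_gb=ASSUMED_GENOME_SIZE_GB):
    """Return sequencing depth: sequenced bases divided by genome size."""
    return gbases / genome_size_gb


coverage = {name: fold_coverage(gb) for name, gb in LIBRARY_GBASES.items()}
total_coverage = fold_coverage(sum(LIBRARY_GBASES.values()))

for name, depth in coverage.items():
    print(f"{name}: {depth:.1f}x")
print(f"Total: {total_coverage:.1f}x")
```

Because the Gbases column is itself rounded, the recomputed depths should be close to the reported values rather than identical to them.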
Statistics of draft genome assembly
| ALLPATHS-LG output | |
|---|---|
| Number of contigs | 37 187 |
| Number of contigs per Mb | 35.1 |
| Number of scaffolds | 3016 |
| Total contig length | 1 027 319 005 |
| Total scaffold length, with gap | 1 058 688 097 |
| N50 scaffold size in kb, with gaps | 2953 |
| Number of scaffolds per Mb | 2.85 |
| Median size of gaps in scaffolds | 270 |
| % of bases in captured gaps | 2.94 |
| | |
| Total scaffold length as percentage of assumed genome size | 88.30% |
| % of estimated genome that is useful (>= 25 kb) | 87.60% |
| Longest scaffold | 15 662 897 |
| Shortest scaffold | 887 |
| Number of scaffolds > 1K nt | 2987 (99.0%) |
| Number of scaffolds > 10K nt | 1254 (41.6%) |
| Number of scaffolds > 100K nt | 719 (23.8%) |
| Number of scaffolds > 1M nt | 297 (9.8%) |
| Number of scaffolds > 10M nt | 3 (0.1%) |
| Mean scaffold size | 351 516 |
| Median scaffold size | 5349 |
| N50 scaffold length | 2 953 339 |
| L50 scaffold count | 103 |
| NG50 scaffold length | 2 494 006 |
| LG50 scaffold count | 129 |
| N50 scaffold—NG50 scaffold length difference | 459 333 |
| Scaffold %A | 28.31 |
| Scaffold %C | 20.13 |
| Scaffold %G | 20.09 |
| Scaffold %T | 28.24 |
| Scaffold %N | 2.94 |
| Percentage of assembly in scaffolded contigs | 99.60% |
| Percentage of assembly in unscaffolded contigs | 0.40% |
| Average number of contigs per scaffold | 10.5 |
| Average length of break (>25 Ns) between contigs in scaffold | 1082 |
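
N50 is the scaffold length at which scaffolds of that length or longer contain at least half of the assembled bases, and L50 is the number of scaffolds needed to reach that point. NG50 and LG50 use the same procedure but take the 50% threshold from the estimated genome size instead of the assembly size, which is why NG50 is smaller than N50 here: the assembly covers 88.30% of the assumed genome. The sketch below illustrates the calculation on made-up scaffold lengths; it is not the tool that produced the table, and the function name is hypothetical.

```python
def n50_l50(lengths, total=None):
    """Return (N50, L50) for a list of sequence lengths.

    If `total` is given (for example an estimated genome size), the 50%
    threshold is taken from it, so the result is (NG50, LG50) instead.
    Returns (None, None) if the sequences never reach half of the total.
    """
    threshold = (total if total is not None else sum(lengths)) / 2
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    return None, None


# Made-up scaffold lengths for illustration (sum = 300).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]

n50, l50 = n50_l50(toy_scaffolds)
# Pretend the true genome is larger than the assembly (400 vs 300).
ng50, lg50 = n50_l50(toy_scaffolds, total=400)

print(f"N50 = {n50}, L50 = {l50}")
print(f"NG50 = {ng50}, LG50 = {lg50}")
```

With a genome estimate larger than the assembly, more scaffolds are needed to reach the halfway point, so NG50 drops and LG50 rises relative to N50 and L50, matching the direction of the difference in the table.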
Repeat elements in the genome assembly identified by RepeatMasker
| Class | N | Total length (Mbases) | Percent of genome |
|---|---|---|---|
| DNA | 3460 | 0.31 | 0.03 |
| LINE | 118 051 | 32.03 | 3.03 |
| Low_complexity | 46 755 | 2.66 | 0.25 |
| LTR | 66 142 | 25.51 | 2.41 |
| Satellite | 3822 | 2.01 | 0.19 |
| Simple_repeat | 242 428 | 11.94 | 1.13 |
| SINE | 2163 | 0.15 | 0.01 |
| Unknown | 14 079 | 4.91 | 0.46 |
| Total | 496 900 | 79.52 | 7.52 |
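
The totals row can be checked by summing the per-class rows, and the percentage column can be approximated by dividing each length by the assembly length. The sketch below assumes the denominator is the total scaffold length with gaps (1 058 688 097 bases, from the assembly statistics); the repeat table does not state which denominator was used, so small differences from the reported percentages are expected. Names are illustrative.

```python
# Assumed denominator: total scaffold length with gaps, from the assembly table.
ASSEMBLY_LENGTH_BASES = 1_058_688_097

# (element count, total length in Mbases) per class, copied from the repeat table.
REPEATS = {
    "DNA": (3460, 0.31),
    "LINE": (118_051, 32.03),
    "Low_complexity": (46_755, 2.66),
    "LTR": (66_142, 25.51),
    "Satellite": (3822, 2.01),
    "Simple_repeat": (242_428, 11.94),
    "SINE": (2163, 0.15),
    "Unknown": (14_079, 4.91),
}


def percent_of_assembly(mbases, assembly_bases=ASSEMBLY_LENGTH_BASES):
    """Convert a length in Mbases to a percentage of the assembly length."""
    return 100 * mbases * 1e6 / assembly_bases


total_count = sum(n for n, _ in REPEATS.values())
total_mbases = round(sum(mb for _, mb in REPEATS.values()), 2)

for name, (n, mb) in REPEATS.items():
    print(f"{name}: {n} elements, {mb} Mb, {percent_of_assembly(mb):.2f}%")
print(f"Total: {total_count} elements, {total_mbases} Mb, "
      f"{percent_of_assembly(total_mbases):.2f}%")
```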
Figure 3: Flowchart of genome assembly and annotation. Experimental and computational approach used for genome assembly and gene annotation.