Swarnali Louha, David A Ray, Kevin Winker, Travis C Glenn.
Abstract
The song sparrow, Melospiza melodia, is one of the most widely distributed songbird species in North America and has been used in a wide range of behavioral and ecological studies. Its pronounced morphological and behavioral diversity across populations makes it a favorable candidate for several areas of biomedical research. We generated a high-quality de novo genome assembly of M. melodia from Illumina short-read sequences of genomic and in vitro proximity-ligation libraries. The assembled genome is 978.3 Mb, with a physical coverage of 24.9×, a scaffold N50 of 5.6 Mb, and a contig N50 of 31.7 kb. The assembly is highly complete: 87.5% of a set of 4,915 universal single-copy orthologs present in most avian genomes are recovered as full-length genes. We annotated the assembly and constructed 15,086 gene models, most of which show high homology to genes of the related birds Taeniopygia guttata and Junco hyemalis. In total, 83% of the annotated genes are assigned putative functions. Only ∼7% of the genome is repetitive; these regions and other non-coding functional regions are also identified. The high-quality M. melodia genome assembly and annotations reported here will serve as a valuable resource for studies of genome structure and evolution that can contribute to biomedical research, and as a reference for population genomic and comparative genomic studies of closely related species.
Keywords: Dovetail genomics; Melospiza melodia; Passeriformes; de novo assembly; whole genome sequencing
Year: 2020 PMID: 32075855 PMCID: PMC7144075 DOI: 10.1534/g3.119.400929
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Comparison of assembly quality statistics between the initial shotgun-sequencing assembly produced by Meraculous and the final Chicago HiRise assembly
| Statistic | Meraculous assembly | Chicago HiRise assembly |
|---|---|---|
| Total length | 972.4 Mb | 978.3 Mb |
| Scaffold N50 | 33 kb | 5.58 Mb |
| Scaffold N90 | 5 kb | 303 kb |
| Scaffold L50 | 7,552 scaffolds | 48 scaffolds |
| Scaffold L90 | 35,731 scaffolds | 324 scaffolds |
| Longest scaffold (bp) | 366,149 | 26,942,064 |
| Number of scaffolds | 74,832 | 13,785 |
| Number of scaffolds > 1 kb | 74,806 | 13,768 |
| Contig N50 | 22.5 kb | 31.7 kb |
| Number of gaps | 53,577 | 95,490 |
| Percent of genome in gaps | 1.427% | 1.847% |
| Number of N’s per 100 kbp | 1427.15 | 1847.03 |
| GC content | 41.07% | 41.08% |
Figure 1. Comparison of assembly contiguity.
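The contiguity statistics in the assembly table follow standard definitions: Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover x% of the assembly, and Lx is the number of scaffolds in that set. "Number of N's per 100 kbp" is simply 1,000 × "Percent of genome in gaps" (1.427% corresponds to 1,427.15). The Python sketch below computes these quantities from an in-memory dict of scaffold sequences. It assumes that gaps are runs of N, that contigs are the pieces left after splitting scaffolds at gaps, and that GC content is taken over non-N bases; the function names are hypothetical, and this is not the pipeline that produced the table.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a gap is any run of one or more N/n characters.
GAP = re.compile(r"[Nn]+")


def nx_lx(lengths: List[int], x: float) -> Tuple[int, int]:
    """Return (Nx, Lx) for a list of sequence lengths.

    Sequences are sorted longest first; Nx is the length at which the running
    total first reaches x% of the summed length, and Lx is how many sequences
    were needed to get there.
    """
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 100 >= total * x:
            return length, count
    raise ValueError("no sequences given")


def assembly_stats(scaffolds: Dict[str, str]) -> Dict[str, float]:
    """Summarize an assembly given as {scaffold name: sequence}."""
    seqs = [s.upper() for s in scaffolds.values()]
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    n_count = sum(s.count("N") for s in seqs)
    gc_count = sum(s.count("G") + s.count("C") for s in seqs)
    # Contigs are the gap-free pieces left after splitting scaffolds at N runs.
    contig_lengths = [len(c) for s in seqs for c in GAP.split(s) if c]
    n50, l50 = nx_lx(lengths, 50)
    n90, l90 = nx_lx(lengths, 90)
    return {
        "total_length": total,
        "scaffold_N50": n50,
        "scaffold_L50": l50,
        "scaffold_N90": n90,
        "scaffold_L90": l90,
        "longest_scaffold": max(lengths),
        "num_scaffolds": len(seqs),
        "num_scaffolds_over_1kb": sum(1 for n in lengths if n > 1_000),
        "contig_N50": nx_lx(contig_lengths, 50)[0],
        "num_gaps": sum(len(GAP.findall(s)) for s in seqs),
        "pct_in_gaps": 100 * n_count / total,
        "Ns_per_100kbp": 100_000 * n_count / total,
        # GC content over non-N bases (an assumption about the convention used).
        "gc_pct": 100 * gc_count / (total - n_count),
    }


if __name__ == "__main__":
    toy = {
        "s1": "ACGT" * 250 + "N" * 100 + "GGCC" * 100,
        "s2": "AT" * 200,
        "s3": "GC" * 50,
    }
    for key, value in assembly_stats(toy).items():
        print(f"{key}: {value}")
```

Sorting once and walking a running total keeps Nx and Lx consistent with each other; the same helper is reused for the contig N50 after splitting at gaps.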
Number and percentage of repeats in the M. melodia genome assembly
| Classification | Number of copies | Percentage of assembly (%) |
|---|---|---|
| LINEs | 104,032 | 3.01 |
| LTRs | 85,276 | 2.83 |
| SINEs | 6,695 | 0.08 |
| DNA Transposons | 13,521 | 0.21 |
| Unclassified | 4,884 | 0.12 |
| Satellites | 569 | 0.00 |
| Low complexity repeats | 38,561 | 0.20 |
| Microsatellites | 192,996 | 0.90 |
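Each "Percentage of assembly" value is the number of bases covered by a repeat class divided by the assembly length. Because repeat annotations can overlap, the Python sketch below merges overlapping intervals before counting. It assumes repeat hits are given as hypothetical (scaffold, start, end) tuples in 0-based, half-open coordinates. Summing the per-class percentages from the repeat table gives 7.35%, in line with the ∼7% repeat content stated in the abstract; that sum is exact only if the classes do not overlap one another.

```python
from typing import Iterable, Tuple

# Hypothetical repeat hit: (scaffold, start, end), 0-based half-open [start, end).
Hit = Tuple[str, int, int]


def covered_bases(hits: Iterable[Hit]) -> int:
    """Count bases covered by at least one hit, merging overlaps per scaffold."""
    total = 0
    block = None  # merged (scaffold, start, end) currently being extended
    for scaffold, start, end in sorted(hits):
        if block and scaffold == block[0] and start <= block[2]:
            block = (scaffold, block[1], max(block[2], end))
        else:
            if block:
                total += block[2] - block[1]
            block = (scaffold, start, end)
    if block:
        total += block[2] - block[1]
    return total


def percent_of_assembly(hits: Iterable[Hit], assembly_length: int) -> float:
    """Percentage of the assembly covered by the given hits."""
    return 100 * covered_bases(hits) / assembly_length


# Per-class percentages copied from the repeat table.
REPEAT_PCT = {
    "LINEs": 3.01, "LTRs": 2.83, "SINEs": 0.08, "DNA transposons": 0.21,
    "Unclassified": 0.12, "Satellites": 0.00, "Low complexity": 0.20,
    "Microsatellites": 0.90,
}

if __name__ == "__main__":
    print(round(sum(REPEAT_PCT.values()), 2))  # prints 7.35
```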
Figure 2. Comparison of the percentages of transposable elements (TEs) among related songbird genome assemblies. * Data from: Zhang et al. (2014), Science 346: 1311-1320.
Figure 3. Abundance of microsatellite repeat motif size classes in the M. melodia genome assembly (details are given in Supplemental File S16).
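Microsatellites are short tandem repeats, grouped here by motif size (mono- to hexanucleotide). As a rough illustration of how loci can be assigned to size classes, the Python sketch below scans a sequence with regular expressions. The minimum copy numbers in MIN_COPIES are hypothetical thresholds chosen for the example, not the parameters behind Figure 3, and the scan ignores imperfect (interrupted) repeats.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical minimum copy numbers for motif sizes 1-6 bp (tools differ).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def ssr_size_classes(seq: str) -> Dict[int, int]:
    """Count perfect tandem repeats per motif size (1 = mono, ..., 6 = hexa)."""
    seq = seq.upper()
    counts: Counter = Counter()
    for k, min_copies in MIN_COPIES.items():
        # A k-mer followed by at least (min_copies - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            # Skip non-primitive motifs such as 'AA' or 'ATAT'; with these
            # thresholds such runs also qualify under the shorter unit.
            if is_primitive(match.group(1)):
                counts[k] += 1
    return dict(counts)


if __name__ == "__main__":
    toy = "GG" + "A" * 12 + "C" + "CA" * 8 + "G" + "TTG" * 5 + "C"
    print(ssr_size_classes(toy))  # prints {1: 1, 2: 1, 3: 1}
```

Rejecting non-primitive motifs keeps a single locus from being counted once per motif size that divides its period.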
Characteristics of genes predicted in the M. melodia genome compared with Taeniopygia guttata (zebra finch), Ficedula albicollis (collared flycatcher), Manacus vitellinus (golden-collared manakin), and Geospiza fortis (medium ground finch)
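| Feature | M. melodia | T. guttata | F. albicollis | M. vitellinus | G. fortis |
|---|---|---|---|---|---|
| Number of genes | 15,086 | 17,561 | 16,763 | 18,976 | 14,388 |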
| Mean gene length (bp) | 14,457 | 26,458 | 31,394 | 27,847 | 30,164 |
| Mean CDS length (bp) | 1,325 | 1,677 | 1,942 | 1,929 | 1,766 |
| Number of exons | 131,940 | 171,767 | 189,043 | 190,390 | 164,721 |
| Mean exon length (bp) | 153 | 225 | 253 | 264 | 195 |
| Mean number of exons/gene | 8.67 | 10.25 | 12.22 | 11.51 | 11.41 |
| Number of introns | 116,724 | 153,909 | 171,236 | 171,089 | 149,563 |
| Mean intron length (bp) | 1,695 | 2,930 | 3,257 | 3,294 | 2,813 |
https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Taeniopygia_guttata/103/
https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Ficedula_albicollis/101/
https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Manacus_vitellinus/102/
https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Geospiza_fortis/101/
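The gene-structure statistics in this table can be derived from exon coordinates. The Python sketch below assumes one transcript per gene, with exons given as hypothetical 1-based inclusive (start, end) pairs; gene length is taken as the span from the first exon start to the last exon end, and introns as the gaps between consecutive exons. Published annotations may count isoforms or UTRs differently, so a calculation like this need not reproduce the tabulated ratios exactly (for example, 131,940 exons over 15,086 genes is about 8.75, whereas the table lists 8.67 exons per gene).

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical exon coordinates: 1-based, inclusive (start, end), GFF-style.
Exon = Tuple[int, int]


def gene_structure_stats(genes: Dict[str, List[Exon]]) -> Dict[str, float]:
    """Summarize gene models, assuming one transcript (exon list) per gene."""
    gene_lengths: List[int] = []
    exon_lengths: List[int] = []
    intron_lengths: List[int] = []
    for exons in genes.values():
        exons = sorted(exons)
        # Gene length: first exon start to last exon end, inclusive.
        gene_lengths.append(exons[-1][1] - exons[0][0] + 1)
        exon_lengths.extend(end - start + 1 for start, end in exons)
        # Introns: the gaps between consecutive exons.
        intron_lengths.extend(
            nxt[0] - prev[1] - 1 for prev, nxt in zip(exons, exons[1:])
        )
    return {
        "number_of_genes": len(genes),
        "mean_gene_length": mean(gene_lengths),
        "number_of_exons": len(exon_lengths),
        "mean_exon_length": mean(exon_lengths),
        "mean_exons_per_gene": len(exon_lengths) / len(genes),
        "number_of_introns": len(intron_lengths),
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0.0,
    }


if __name__ == "__main__":
    toy_genes = {"g1": [(1, 100), (201, 300), (401, 450)], "g2": [(1000, 1199)]}
    print(gene_structure_stats(toy_genes))
```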
Figure 4. Jupiter plot correlating the zebra finch and song sparrow genome assemblies, considering scaffolds longer than 100 kbp in the reference zebra finch genome and the largest scaffolds representing 85% of the song sparrow genome.
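The scaffold selection described in the Figure 4 caption can be sketched as two filters: keep reference scaffolds longer than 100 kbp, and take query scaffolds in decreasing order of length until their cumulative length reaches 85% of the assembly. The function names below are hypothetical; this illustrates the selection rule only, not the plotting tool itself.

```python
from typing import Dict, List


def scaffolds_longer_than(lengths: Dict[str, int], min_length: int = 100_000) -> List[str]:
    """Reference-side filter: names of scaffolds strictly longer than min_length."""
    return [name for name, length in lengths.items() if length > min_length]


def scaffolds_covering_fraction(lengths: Dict[str, int], fraction: float = 0.85) -> List[str]:
    """Query-side filter: the longest scaffolds, taken in decreasing order of
    length, until their combined length first reaches `fraction` of the total."""
    total = sum(lengths.values())
    chosen: List[str] = []
    running = 0
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        if running >= fraction * total:
            break
        chosen.append(name)
        running += length
    return chosen


if __name__ == "__main__":
    query = {"a": 500, "b": 300, "c": 150, "d": 50}
    print(scaffolds_covering_fraction(query))  # prints ['a', 'b', 'c']
```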
Number of ncRNAs predicted in the Melospiza melodia genome compared with Taeniopygia guttata (zebra finch) and Ficedula albicollis (collared flycatcher)
| ncRNA type | M. melodia | T. guttata | F. albicollis |
|---|---|---|---|
| tRNA | 267 | 184 | 179 |
| miRNA | 166 | 302 | 510 |
| snRNA | 16 | 44 | 32 |
| snoRNA | 154 | 241 | 199 |
| rRNA | 8 | 100 | 22 |
| lncRNA | 20 | 908 | 1473 |
http://useast.ensembl.org/info/data/ftp/index.html
https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Taeniopygia_guttata/103/
https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Ficedula_albicollis/101/