Guillermo Friis, Joel Vizueta, Ellen D Ketterson, Borja Milá.
Abstract
The dark-eyed junco (Junco hyemalis) is one of the most common passerines of North America and has served as a model organism in studies of ecophysiology, behavior, and evolutionary biology for over a century. It comprises at least 6 distinct, geographically structured forms of recent evolutionary origin that show remarkable variation in phenotypic traits, migratory behavior, and habitat. Here, we report a high-quality genome assembly and annotation of the dark-eyed junco generated using a combination of shotgun libraries and Chicago and Dovetail Hi-C proximity ligation libraries. The final assembly is ∼1.03 Gb in size, with 98.3% of the sequence located in 30 full or nearly full chromosome scaffolds, and has an N50/L50 of 71.3 Mb/5 scaffolds. We identified 19,026 functional genes by combining gene prediction and similarity approaches, of which 15,967 were associated with GO terms. Compared with the BUSCO avian dataset, the genome assembly and the set of annotated genes yielded completeness scores of 95.4% and 96.2%, respectively. This new assembly for J. hyemalis provides a valuable resource for analyzing genome evolution and for identifying functional genes involved in adaptive processes and speciation.
Keywords: Junco hyemalis; Hi-C; dark-eyed junco; genome assembly
Year: 2022 PMID: 35404451 PMCID: PMC9157146 DOI: 10.1093/g3journal/jkac083
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Fig. 1. Geographic distribution and phenotypic diversity in the dark-eyed junco. a) Distribution map of the main dark-eyed junco forms. Colored areas correspond to the approximate breeding ranges of each form. b) Photographs of male individuals of the 6 main dark-eyed junco forms (photos by BM).
Fig. 3. Changes in the ancestral effective population size of the dark-eyed junco. The dark red line represents the effective population size through time estimated from the original data, and the light red lines represent 100 bootstrap estimates. The indices g and µ denote the generation time and the mutation rate, respectively.
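The caption notes that g and µ are needed to express the curve in years and individuals. The caption does not name the method, so the following assumes a PSMC-style analysis. Under that assumption the usual scaling is N0 = θ0/(4µs), time in years = 2·N0·t·g, and Ne(t) = λ(t)·N0. Here θ0 is the scaled mutation rate per bin, s is the bin size (100 bp by default in PSMC), t is scaled time, and λ is the relative population size. The sketch below applies these relations. The function name `scale_psmc` and all input values are hypothetical and are not the parameters used for the junco.

```python
def scale_psmc(theta0: float, mu: float, g: float,
               points: list[tuple[float, float]],
               bin_size: int = 100) -> list[tuple[float, float]]:
    """Convert scaled (t, lambda) pairs into (years before present, Ne).

    Assumes PSMC-style scaling (an assumption, not stated in the caption):
      N0    = theta0 / (4 * mu * bin_size)
      years = 2 * N0 * t * g      (t is in units of 2*N0 generations)
      Ne    = lambda * N0         (lambda is in units of N0)
    mu is the per-site, per-generation mutation rate; g is years per generation.
    """
    if mu <= 0 or g <= 0 or bin_size <= 0:
        raise ValueError("mu, g and bin_size must be positive")
    n0 = theta0 / (4.0 * mu * bin_size)
    return [(2.0 * n0 * t * g, lam * n0) for t, lam in points]


# Hypothetical inputs chosen so that N0 = 1,000 individuals.
curve = scale_psmc(theta0=0.004, mu=1e-8, g=2.0, points=[(0.5, 3.0)])
```

With these made-up values N0 = 0.004 / (4 × 10⁻⁸ × 100) = 1,000. The scaled time t = 0.5 therefore maps to 2 × 1,000 × 0.5 × 2 = 2,000 years, and λ = 3 maps to Ne = 3,000. Because both axes scale linearly with N0, changing µ stretches the whole curve along both axes. Changing g rescales only the time axis.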
Summary statistics for the genome assembly of Junco hyemalis.
| Genome assembly | |
|---|---|
| Total length (bp) | 1,031,523,571 |
| Number of scaffolds | 4,684 |
| N50/L50 | 71,317,294 bp/5 scaffolds |
| N90/L90 | 14,099,349 bp/19 scaffolds |
| Chromosome-scale scaffolds | 1,013,712,310 bp/30 scaffolds |
| Longest scaffold (bp) | 152,011,357 |
| Missingness | 3.13% |
| GC content | 41.85% |
| BUSCO Eukaryota database | C: 92.9% [S: 92.5%, D: 0.4%], F: 3.9%, M: 3.2%, N: 255 |
| BUSCO Aves database | C: 95.4% [S: 95.2%, D: 0.2%], F: 1.6%, M: 3.0%, N: 8,338 |
CDS indicates protein-coding sequences. BUSCO parameters are C: Complete BUSCOs; S: Complete and single-copy BUSCOs; D: Complete and duplicated BUSCOs; F: Fragmented BUSCOs; M: Missing BUSCOs; and N: Total BUSCO groups searched.
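The N50/L50 and N90/L90 values in the table follow the usual contiguity definitions. Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together reach x% of the assembly length. Lx is the number of scaffolds in that set. The minimal sketch below assumes only a list of scaffold lengths. The function name `nx_lx` and the toy lengths are hypothetical and are not junco data.

```python
def nx_lx(lengths: list[int], x: float) -> tuple[int, int]:
    """Return (Nx, Lx) for a list of scaffold lengths.

    Scaffolds are sorted from longest to shortest and their lengths
    accumulated until the running sum reaches x% of the total length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    target = sum(lengths) * x / 100.0
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: the cumulative sum reaches the total")


toy = [100, 80, 60, 40, 20]  # hypothetical scaffold lengths
n50, l50 = nx_lx(toy, 50)    # (80, 2): 100 + 80 = 180 >= 150
n90, l90 = nx_lx(toy, 90)    # (40, 4): 100 + 80 + 60 + 40 = 280 >= 270
```

The same arithmetic gives the chromosome-scale fraction reported in the abstract: 1,013,712,310 / 1,031,523,571 ≈ 0.9827, or 98.3% of the assembled sequence in the 30 chromosome-scale scaffolds.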
Fig. 2. Circos plot showing synteny patterns between the zebra finch (left hemisphere) and the dark-eyed junco (right hemisphere) genome assemblies. Chromosome 3 is represented by 2 scaffolds (frA and frB). Only the 2 largest scaffolds of the 293 that mapped against the W chromosome are shown.
Summary statistics for the genome annotation of Junco hyemalis compared with other similarly sized avian species (Fringilla coelebs, Melospiza melodia, Taeniopygia guttata, Ficedula albicollis, Manacus vitellinus, and Geospiza fortis), modified from Recuerda .
| Genome annotation | J. hyemalis | F. coelebs | M. melodia | T. guttata | F. albicollis | M. vitellinus | G. fortis |
|---|---|---|---|---|---|---|---|
| Number of genes | 19,026 | 17,703 | 15,086 | 17,561 | 16,763 | 18,976 | 14,399 |
| Average gene length (bp) | 15,402 | 15,818 | 14,457 | 26,458 | 31,394 | 27,847 | 30,164 |
| Number of CDS | 23,245 | 17,703 | 15,086 | 17,561 | 16,763 | 18,976 | 14,399 |
| Average CDS length (bp) | 1,647 | 1,679 | 1,325 | 1,677 | 1,942 | 1,929 | 1,766 |
| Number of exons | 229,210 | 221,872 | 131,940 | 171,767 | 189,043 | 190,390 | 164,721 |
| Average exon length (bp) | 167 | 165 | 153 | 255 | 253 | 264 | 195 |
| Average number of exons/gene | 9.86 | 10.16 | 8.67 | 10.25 | 12.22 | 11.51 | 11.41 |
| Number of introns | 205,965 | 200,041 | 116,724 | 153,909 | 171,236 | 171,089 | 149,563 |
| Average intron length (bp) | 1,945 | 1,902 | 1,695 | 2,930 | 3,257 | 3,294 | 2,813 |
BUSCO parameters are C: Complete BUSCOs; S: Complete and single-copy BUSCOs; D: Complete and duplicated BUSCOs; F: Fragmented BUSCOs; M: Missing BUSCOs; and N: Total BUSCO groups searched. CDS indicates protein-coding sequences.
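The BUSCO notation is internally constrained. Complete BUSCOs are the sum of single-copy and duplicated ones (C = S + D), and C + F + M should total 100% up to rounding. Both summaries in the assembly table satisfy this: 92.5 + 0.4 = 92.9 and 92.9 + 3.9 + 3.2 = 100.0 for Eukaryota, and 95.2 + 0.2 = 95.4 and 95.4 + 1.6 + 3.0 = 100.0 for Aves. The minimal sketch below parses such a string and checks both identities. It assumes the strings are formatted exactly as in the assembly table. `parse_busco`, `is_consistent` and `BuscoSummary` are hypothetical helpers, not part of the BUSCO software.

```python
import re
from typing import NamedTuple


class BuscoSummary(NamedTuple):
    complete: float     # C, percent
    single: float       # S, percent
    duplicated: float   # D, percent
    fragmented: float   # F, percent
    missing: float      # M, percent
    n_groups: int       # N, total BUSCO groups searched


# Matches strings such as "C: 95.4% [S: 95.2%, D: 0.2%], F: 1.6%, M: 3.0%, N: 8,338".
_BUSCO_RE = re.compile(
    r"C:\s*([\d.]+)%\s*\[S:\s*([\d.]+)%,\s*D:\s*([\d.]+)%\],\s*"
    r"F:\s*([\d.]+)%,\s*M:\s*([\d.]+)%,\s*N:\s*([\d,]+)"
)


def parse_busco(summary: str) -> BuscoSummary:
    match = _BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    *percents, n = match.groups()
    # N may contain thousands separators, e.g. "8,338".
    return BuscoSummary(*(float(p) for p in percents), int(n.replace(",", "")))


def is_consistent(s: BuscoSummary, tol: float = 0.15) -> bool:
    """Check C == S + D and C + F + M == 100, allowing for 0.1% rounding."""
    return (abs(s.complete - (s.single + s.duplicated)) <= tol
            and abs(s.complete + s.fragmented + s.missing - 100.0) <= tol)
```

The tolerance of 0.15 percentage points reflects that each reported value is rounded to one decimal place. A sum of three rounded values can therefore drift slightly from the exact total.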