Literature DB >> 36100590

Draft genome assemblies of four manakins.

Xuemei Li1,2, Rongsheng Gao1,2, Guangji Chen1,2, Alivia Lee Price3, Daniel Bilyeli Øksnebjerg4, Peter Andrew Hosner5,6, Yang Zhou2, Guojie Zhang3,7,8,9, Shaohong Feng10,11,12.   

Abstract

Manakins are a family of small suboscine passerine birds characterized by their elaborate courtship displays, non-monogamous mating system, and sexual dimorphism. This family has served as a good model for the study of sexual selection. Here we present genome assemblies of four manakin species, including Cryptopipo holochlora, Dixiphia pipra (also known as Pseudopipra pipra), Machaeropterus deliciosus and Masius chrysopterus, generated by Single-tube Long Fragment Read (stLFR) technology. The assembled genome sizes ranged from 1.10 Gb to 1.19 Gb, with average scaffold N50 of 29 Mb and contig N50 of 169 Kb. On average, 12,055 protein-coding genes were annotated in the genomes, and 9.79% of the genomes were annotated as repetitive elements. We further identified 75 Mb of Z-linked sequences in manakins, containing 585 to 751 genes and an ~600 Kb pseudoautosomal region (PAR). One notable finding from these Z-linked sequences is that a possible Z-to-autosome/PAR reversal could have occurred in M. chrysopterus. These de novo genomes will contribute to a deeper understanding of evolutionary history and sexual selection in manakins.
© 2022. The Author(s).

Entities:  

Mesh:

Year:  2022        PMID: 36100590      PMCID: PMC9470731          DOI: 10.1038/s41597-022-01680-0

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   8.501


Background & Summary

Manakins (Aves: Pipridae), a family of Passeriformes, contain 17 genera and about 50 species distributed across the Neotropics, and have some unique behavioral and morphological features[1]. Most species in the family have sexual dimorphism in plumage color[2] and are polygynous[3,4]. Moreover, the complex courtship displays of males, which include high-speed movements, sophisticated acrobatics, coordinated movements of multiple males, mechanical and vocal sounds and constructed display site construction[5,6], makes this lineage a fascinating model for studying sexual selection. During mating periods, males hold territories or aggregate for competitive displays to attract females for the chance to mate[7]. Courtship varies substantially among genera and species[8-11]. For example, in genus Chiroxiphia, one male forms a partnership with another male and they perform elaborate courtship dances and sing common songs together[12]. In contrast, Corapipo gutturalis does not cooperate with other males during courtship displays[13]. Xenopipo atronitens males elaborate courtship displays by making mechanical sounds through flapping their wings[14], whereas Lepidothrix coronata males sing to attract females in addition to acrobatic displays[15]. Courtship behavior plays an important role in attracting the opposite sex, increasing the chance of producing offspring and improving the reproductive rate of birds[2,16]. At present, the courtship display of manakin species has been studied from the aspect of behavior observation[17,18], neuroendocrine[14,19,20] and physiology[2]. The genetic mechanisms have also been discussed[21-24], yet insights are lacking due to a lack of comparative genomic and transcriptomic data. As courtship displays are derived from sexual selection[25,26], we expect that investigating the evolution of their genomes, particularly the sex chromosomes, could bring insights to the understanding of underlying genetic mechanisms. To address this knowledge gap, we conducted whole genome sequencing of four species representing four manakin genera: C. holochlora, D. pipra, M. deliciosus and M. chrysopterus[27,28]. Genome sizes of these four manakin species were estimated to be 1.15 Gb, the contig N50 ranged from 125 Kb to 212 Kb, and the scaffold N50 ranged from 18.4 Mb to 36.6 Mb. We annotated about 12,055 protein-coding genes on each manakin genome. On average, 99.97% of the predicted protein-coding genes were successfully annotated by three functional databases (SwissProt, InterPro, and KEGG). About 75 Mb of Z-linked sequences, including an ~600 Kb PAR, were identified from the available female manakin genomes, including two published species (Corapipo altera and Neopelma chrysocephalum). These genomic resources will benefit research on genetic mechanisms of manakin courtship displays, and other behavioral and ecological aspects.

Methods

Sample collection, library construction, and sequencing

Tissue samples of four manakin species (C. holochlora, D. pipra, M. deliciosus and M. chrysopterus) were provided by the Natural History Museum of Denmark. High-molecular-weight genomic DNA of these samples was extracted with the Kingfisher Cell and Tissue DNA Kit Protocol. Single tube-Long Fragment Read (stLFR) technology[29] was used to construct the libraries for each sample. The resulting libraries underwent DNA Nanoball (DNB™) generation and DNBSEQ sequencing in 100 + 100 + 30 mode. On average, 149 Gb raw reads were produced for each species (Table 1).
Table 1

Sequencing reads statistics.

speciesCryptopipo holochloraDixiphia pipraMachaeropterus deliciosusMasius chrysopterus
Intitution acronymNHMDNHMDNHMDNHMD
SpecimenCodeB-126359B-126493B-125026B-125031
GenusCryptopipoDixiphiaMachaeropterusMasius
SpeciesNameholochlorapipradeliciosuschrysopterus
subspeciesholochloradiscolor/pax
DateCollected6-Jul-946-Jul-9415-Apr-9112-Sep-90
SexNot recordedNot recordedNot recordedNot recorded
Field numberNK4-15.11.94NK6-13.7.94NK10-15.4.91NK17-12.9.90
Sample inEDTAEDTAEDTAEDTA
voucherskin; MECN:6936Not recordedskeleton; Salango Museumskeleton; Museo Arqueológico Salango
CollectedByNiels KrabbeNiels KrabbeNiels KrabbeNiels Krabbe
LocationYasuní, Napo, EcuadorParque Nacional Yasuní, Napo, Ecuador9 km west of Piñas, El Oro, EcuadorAbove Chinapinza, Zamora-Chinchipe, Ecuador
LocLatitude−0.63333−0.63333−3.65−4.039
LocLongitude−76.43333−76.43333−79.75−78.583
Elevation3003009001700
strategystlFRstlFRstlFRstlFR
Sequencing platformDNBseqDNBseqDNBseqDNBseq
Library Insert Size (bp)200~2000200~2000200~2000200~2000
Raw readsTotal Data (Gb)155.92119.28152.54169.08
Reads Length (bp)100 + 100 + 30100 + 100 + 30100 + 100 + 30100 + 100 + 30
Sequence depth (×)11381109112
Clean readsTotal Data (Gb)111.6290108.28117.25
Reads Length (bp)100 + 100100 + 100100 + 100100 + 100
Sequence depth (×)10579101101
Sequencing reads statistics.

Genome assembly and quality evaluation

A series of filtering steps was applied to these stLFR reads prior to the downstream analyses using SOAPfilter2 package (v2.2). Remove reads with more than 10% of N bases; Remove reads with more than 40% low quality bases (Phred score < = 10); Remove reads with undersize insert size; Filter out the PCR duplicates. All cleaned stLFR library reads were transformed into 10X Genomics linked-reads format and passed into Supernova software (v2.0.1)[30] to assemble the genome under the “pseudohap” mode for each species. After removing scaffolds with “N” >80%, GapCloser (v1.12)[31] was used to close the intra-scaffold gaps. The size of the four assembled genomes are about 1.15 Gb, similar to the sizes of other avian genomes[32] (Fig. 1a, Table 2). The scaffold N50 of all species is higher than 18 Mb, with the largest scaffold N50 found in M. chrysopterus (36 Mb). The contig N50 of all species is higher than 124 Kb. (Fig. 1b, Table 2).
Fig. 1

Genome assembly statistics of four manakin genomes assembled in this study and three previously published genomes. (a) Comparison of genome sizes. (b) Distribution of N50 statistics of the manakin genomes. Each dot represents a manakin species, with the x-axis representing the value of scaffold N50 and the y-axis representing the value of contig N50. (c) BUSCO analysis of the seven manakin genomes. Assembly completeness is shown as the percentage of single, duplicated, fragmented and missing genes. Four newly assembled manakin genomes in this study were marked in red, while three published ones in black. Three published species are Corapipo altera (GCF_003945725.1), Manacus vitellinus (GCF_001715985.3) and Neopelma chrysocephalum (GCF_003984885.1).

Table 2

Genome assembly and BUSCO statistics.

SpeciesGenome assemblyBUSCO
Contig N50 (bp)Scaffold N50 (bp)Genome Size (bp)Single (%)Duplication (%)Fragmented (%)Missing (%)
Corapipo altera262,50112,385,8331,095,745,97696.100.600.902.40
Cryptopipo holochlora160,30335,963,5301,099,400,13795.400.301.402.90
Dixiphia pipra124,58518,411,0731,187,686,03288.405.102.803.70
Machaeropterus deliciosus178,41523,705,3761,116,150,81695.200.901.302.60
Masius chrysopterus212,47236,568,1891,189,220,94489.306.501.402.80
Manacus vitellinus290,58017,883,5821,072,328,54195.500.401.103.00
Neopelma chrysocephalum237,02910,517,2231,142,796,17995.500.901.002.60
Genome assembly statistics of four manakin genomes assembled in this study and three previously published genomes. (a) Comparison of genome sizes. (b) Distribution of N50 statistics of the manakin genomes. Each dot represents a manakin species, with the x-axis representing the value of scaffold N50 and the y-axis representing the value of contig N50. (c) BUSCO analysis of the seven manakin genomes. Assembly completeness is shown as the percentage of single, duplicated, fragmented and missing genes. Four newly assembled manakin genomes in this study were marked in red, while three published ones in black. Three published species are Corapipo altera (GCF_003945725.1), Manacus vitellinus (GCF_001715985.3) and Neopelma chrysocephalum (GCF_003984885.1). Genome assembly and BUSCO statistics. We applied BUSCO (v5.2.2)[33] to evaluate the completeness of these seven manakin genomes using aves_odb10 as the reference gene set. On average 92% of the core genes were assembled as complete single-copy genes in the four manakin genomes and only about 3% of the core genes could not be annotated on the four manakin genomes (Fig. 1c, Table 2). Therefore, the overall quality of the newly assembled genomes was high and comparable to other published manakin assemblies.

Repeat annotation

Tandem repeats were identified by Tandem Repeat Finder (TRF, v4.09.1)[34], and transposable elements (TEs) were annotated using a combination of homology-based RepeatMasker (v4.1.2)[35], and de novo methods with RepeatModeler (v2.0.2a)[36] and LTR_Finder(v1.07)[37]. The homology-based annotation of TEs was performed by RepeatMasker with its built-in library. RepeatModeler and LTR_Finder methods were used to build the de novo repeat library for each species, which was further used by RepeatMasker to predict repeats for each species. We found that the four species contained an average of 9.79% TEs in the genomes, with the proportions of each type being similar across these species (Fig. 2, Table 3). Long Interspersed Nuclear Elements (LINEs) accounted for most TEs, occupying about 6.79% of the genome.
Fig. 2

Distribution of divergence rate of four types of transposable elements (TEs) in the four manakin genomes. (a) The divergence rate was calculated between the identified TEs in the genome by homology-based method and the consensus sequence in the built-in RepeatMasker TE library. (b) The divergence rate was calculated between the identified TEs in the genome by de novo and the consensus sequence in the de novo TE library.

Table 3

Repeats statistic.

speciesDNALINESINELTRUnknowntotal
Length (bp)% in genomeLength (bp)% in genomeLength (bp)% in genomeLength (bp)% in genomeLength (bp)% in genomeLength (bp)% in genome
Cryptopipo holochlora3,878,9150.3576,847,3486.991,170,6560.1123,590,1722.158,465,3900.77106,843,7429.72
Dixiphia pipra3,856,2890.3273,600,5966.201,420,0490.1225,019,0522.119,522,4080.80107,604,2649.06
Machaeropterus deliciosus3,354,1420.3077,377,7826.931,362,7790.1235,341,8553.178,143,0000.73118,332,72610.60
Masius chrysopterus3,596,9700.3083,835,6057.051,338,8160.1123,337,4581.9610,145,4050.85116,163,5279.77
Distribution of divergence rate of four types of transposable elements (TEs) in the four manakin genomes. (a) The divergence rate was calculated between the identified TEs in the genome by homology-based method and the consensus sequence in the built-in RepeatMasker TE library. (b) The divergence rate was calculated between the identified TEs in the genome by de novo and the consensus sequence in the de novo TE library. Repeats statistic.

Protein-coding gene annotation

We applied the homolog-based approach to annotate the protein-coding genes by using the protein sequences of Gallus gallus, Taeniopygia guttata and Homo sapiens downloaded from Ensembl release 105 as the reference gene sets. The protein sequences of these reference genes were aligned to each genome using TBLASTN (v2.2.26)[38] with an e-value cut off 1e-5, and multiple adjacent hits of the same query were connected by genBlastA (v1.0.4)[39]. Homologous blocks with length greater than 30% of the query protein length were retained. The connected hit region was later extended to include its 2 Kb upstream and downstream flanking regions, on which gene structure was predicted by Genewise (v2.4.1)[40]. MUSCLE (v3.8.31)[41] was then used to align the annotated protein with the reference protein. Predicted proteins with length ≥30 amino acids and identity value ≥40% were retained. Pseudogenes (annotated genes containing >2 frame shifts or >1 premature stop codon) and retrogenes were further removed. To build a non-redundant gene set, we first used hierarchical clustering[42] to combine the homologous-based gene sets of G. gallus and T. guttata. The gene model with the highest identity to the query was preserved if a locus has been annotated with more than one gene model. By doing so, we obtained 8,250 protein-coding genes on average after removing the highly duplicated genes (genes had >10 duplicates, were single-exon genes, and overlapped with the repeats in >70% of coding region). In the end, the newly annotated loci from the human gene set, i.e., the gene model did not overlap with the above combined one, were added into the results. In summary, we predict an average of 12,055 protein-coding genes for each manakin with an average gene length of 22,952 bp. (Table 4).
Table 4

Protein-coding gene statistics.

species# Total gene# Single exon geneMean gene length (bp)Mean coding sequence length (bp)# Mean exons per geneMean exon length (bp)Mean intron length (bp)
Cryptopipo_holochlora11,68170222,8591,6779.891702,384
Dixiphia pipra11,77070322,8211,6699.841702,393
Machaeropterus_deliciosus11,98573022,8801,6899.911702,378
Masius_chrysopterus12,78576123,2471,6859.881712,428
Protein-coding gene statistics.

Gene function annotation

The translated gene coding sequences were aligned to the SwissProt database (release-2020_05)[43] using BLASTP (v2.2.26)[38] with e-value cutoff 1e-5. The best match was assigned as the function annotation for each gene. Motifs and domains of each gene was annotated with modules PRINTS, SMART, PANTHER, ProSiteProfiles, ProSitePatterns, CDD, SFLD, Gene3D, SUPERFAMILY, and TMHMM of InterPro (v5.52–86.0)[44]. To identify the pathways in which genes may be involved, we also aligned the protein sequence of each gene to the KEGG database (release-93)[45] using BLASTP (v2.2.26)[38] with e-value cutoff 1e-5. Overall, 99.97% of the protein-coding genes of the four manakin genomes were annotated by the functional databases (Table 5).
Table 5

Function annotation results.

speciesSwissprotKEGGInterproOverall
#gene%#gene%#gene%#gene%
Corapipo altera15,77296.2914,83590.5715,85896.8115,98797.60
Cryptopipo holochlora11,65299.7510,97293.9311,66999.9011,67799.97
Dixiphia pipra11,74299.7611,03693.7611,76299.9311,76899.98
Machaeropterus deliciosus11,95899.7511,27694.0711,98099.9311,98599.97
Manacus vitellinus14,20498.0313,29991.7914,25498.3814,31798.81
Masius chrysopterus12,75499.7612,02794.0712,77099.8812,78199.97
Neopelma chrysocephalum16,22996.0115,28790.4416,30296.4416,44797.30
Function annotation results.

Orthology assignment and phylogeny inference

To reconstruct the phylogenetic history of the seven genera in manakins, we chose one representative species for each genus, including the four species in this study and three published species (C. altera: GCF_003945725.1, M. vitellinus: GCF_001715985.3 and N. chrysocephalum: GCF_003984885.1). T. guttata (GCF_003957565.2) and Calypte. anna (GCF_003957555.1) were used as outgroups. The protein-coding gene sets of these species were obtained from NCBI. We used the T. guttata gene sets as the reference and performed a BLASTP (v2.2.26)[38] search on the protein sequences with an e-value cut-off of 1e-5. The reciprocal best hit (RBH) orthologs between T. guttata and every other species were identified following the published literature[46] but without the evidence of genomic synteny. In total, we obtained 9,654 one-to-one orthologs of these nine species by merging pairwise orthologs according to the reference T. guttata gene set. The phylogeny of nine species was inferred based on the coalescent-based method, ASTRAL-III (v5.14.2)[47]. First, ortholog alignments were generated as follows: (1) we aligned the protein sequences with MAFFT L-INS-I (v7.487)[48]; (2) we used trimAl (v1.4.rev15)[49] to achieve a column-based alignment filtering with the parameter “automated”, i.e., a heuristic selection of the automatic method based on similarity statistics; and (3) the nucleic acid alignments were back-translated from the trimmed protein alignments. After these steps, we obtained 9,653 trimmed ortholog alignments containing 805,481 parsimony informative sites in total. Then, we inferred the gene tree for each ortholog alignment using IQ-TREE (v1.6.12)[50] with ModelFinder[51] function to determine the best-fit model. The output gene trees were next used as the input for ASTRAL-III (v5.14.2)[47] with default parameters to infer the species tree shown in Fig. 3. As ASTRAL-III measures the branch lengths in coalescent units, we further ran RAxML (v8.2.12)[52] under GTR + GAMMA substitution model to estimate the branch lengths in substitution per site for the concatenated ortholog alignments by specifying the ASTRAL species tree (Fig. 4a). We also used DiscoVista[53] to analyze the discordance frequencies between the ASTRAL species tree and the 9,653 gene trees (Fig. 4b). The frequency of three potential topologies is inferred based on the focal internal branches of the species tree with the main topology (in red) and alternative topologies (in blues). More phylogenetic discordance can be observed in branch 5. Specifically, the frequency of the gene trees that support C. holochlora or M. vitellinus as the sister clade to D. pipra and M. deliciosus is close (Fig. 4c). In contrast to our species tree based on the coding regions, the UCEs-based topology published by Leite et al. 2021 concluded M. vitellinus as the sister clade to D. pipra and M. deliciosus[54]. Previous studies have suggested that such topological differences could result from data-type effects[55,56]. As in Leite et al. 2021 study, the UCE-based and exon-based topologies were not consistent either. Considering that our result still differed from their reported tree even based on coding regions, we assumed that such conflicts of C. holochlora and M. vitellinus could be caused by their limited parsimony informative sites, our restricted number of species, or the evolutionary forces (e.g. introgression and incomplete lineage sorting). More whole genome resources are needed to solve the phylogenomics of these genera.
Fig. 3

Species tree with courtship displays. The ASTRAL species tree has the local posterior probabilities of all branch support as 1.0 across the tree. C. anna: The male ascends and swoops over the female. As the male nears the bottom of the dive, it flies upwards and its tail feathers make a sound[90]. T. guttata: The male jumps in the direction of the female, rotating 180° with each hop, moving its head and tail, and singing. When facing a female, the male sings and rhythmically shakes its head[91]. N. chrysocephalum: The male flaps its wings in a vertical leap[92]. M. chrysopterus: I. The male performs a side-to-side bow, with his head down and his tail up, turning his body 90°–180° degrees as he bows. II. The male flies to the log, then jumps to another place, lands on the log and sings. This can be done by two males working together[93]. C. altera: The male flies up from the display log, following by a high speed descent, wings making a sound, turns in the air, and lands facing the original landing site[94]. M. deliciosus: I. The male produces mechanical sounds by flapping their wings. II. When the male stands perpendicular to the perch, he bends forward, jumps from side to side, as if to display the black and white markings on the wings. but makes no sound. III. The male first flies along the perch in a short distance and then flies vertically upward, turning its body 180° in the process[95]. D. pipra: The male low jumped forward and high jumped back, spins his body in the air in a somersault-like motion, then flies to the perch and lands on it[10]. C. holochlora: We don’t have much information about its courtship. M. vitellinus: The male performs snap-jump displays, jumping from one sapling to another, shaking its wings in midair. II. The male flaps its wings to produce mechanical sounds[96]. In the silhouette males are in blue and females are in pink.

Fig. 4

Discordance of gene trees with the species tree. (a) The branch lengths are estimated by RAxML based on the concatenated ortholog alignments. The branch length scale refers to substitutions per site. (b) Frequency of three potential topologies around focal internal branches of the ASTRAL species tree. Main topology (species tree) is shown in red, and the other two alternative topologies are shown in light and dark blue. The dotted line indicates the 1/3 threshold. The title of each subfigure indicates the label of the corresponding branch on the tree in panel a. Each internal branch has four neighboring branches which could be used to represent quartet topologies. On the x-axis the exact definition of each quartet topology is shown using the neighboring branch labels separated by “|”. (c) Example of the three topologies with the relative frequency values for internal branch 5, corresponding to panel 5 in b. The alternative topologies for other branches can be found in the Figshare database[89].

Species tree with courtship displays. The ASTRAL species tree has the local posterior probabilities of all branch support as 1.0 across the tree. C. anna: The male ascends and swoops over the female. As the male nears the bottom of the dive, it flies upwards and its tail feathers make a sound[90]. T. guttata: The male jumps in the direction of the female, rotating 180° with each hop, moving its head and tail, and singing. When facing a female, the male sings and rhythmically shakes its head[91]. N. chrysocephalum: The male flaps its wings in a vertical leap[92]. M. chrysopterus: I. The male performs a side-to-side bow, with his head down and his tail up, turning his body 90°–180° degrees as he bows. II. The male flies to the log, then jumps to another place, lands on the log and sings. This can be done by two males working together[93]. C. altera: The male flies up from the display log, following by a high speed descent, wings making a sound, turns in the air, and lands facing the original landing site[94]. M. deliciosus: I. The male produces mechanical sounds by flapping their wings. II. When the male stands perpendicular to the perch, he bends forward, jumps from side to side, as if to display the black and white markings on the wings. but makes no sound. III. The male first flies along the perch in a short distance and then flies vertically upward, turning its body 180° in the process[95]. D. pipra: The male low jumped forward and high jumped back, spins his body in the air in a somersault-like motion, then flies to the perch and lands on it[10]. C. holochlora: We don’t have much information about its courtship. M. vitellinus: The male performs snap-jump displays, jumping from one sapling to another, shaking its wings in midair. II. The male flaps its wings to produce mechanical sounds[96]. In the silhouette males are in blue and females are in pink. Discordance of gene trees with the species tree. (a) The branch lengths are estimated by RAxML based on the concatenated ortholog alignments. The branch length scale refers to substitutions per site. (b) Frequency of three potential topologies around focal internal branches of the ASTRAL species tree. Main topology (species tree) is shown in red, and the other two alternative topologies are shown in light and dark blue. The dotted line indicates the 1/3 threshold. The title of each subfigure indicates the label of the corresponding branch on the tree in panel a. Each internal branch has four neighboring branches which could be used to represent quartet topologies. On the x-axis the exact definition of each quartet topology is shown using the neighboring branch labels separated by “|”. (c) Example of the three topologies with the relative frequency values for internal branch 5, corresponding to panel 5 in b. The alternative topologies for other branches can be found in the Figshare database[89].

Selection analysis of plumage color related genes

Manakins are characterized by a variety of plumage colors[57-59]. To explore the possible genetic mechanism of the color diversity, we investigated the signatures of selection on 37 orthologous genes related to plumage color reported in previous studies[60-69]. With our phylogenetic tree, the maximum likelihood estimation of dN (non-synonymous substitution rate), dS (synonymous substitution rate), and ω (dN/dS) for each gene was performed under two branch models, one-ratio model (H0) and free-ratio model (H1), by using codeml program in PAML package (v4.9)[70]. Likelihood ratio test was used to test if H1 was significantly better than H0, and the output p-values were next corrected with the false-discovery rate (FDR) method. Under FDR-corrected p-value cutoff 0.05, if a branch showed ω > 1 in the branch model analysis, the gene was considered to be positively selected at this branch. We further filtered results with abnormally high ω values (ω > 3)[71]. We finally obtained four genes, TBC1D22A, EDA, SLC45A2 and GOLGB1, that were likely to have undergone positive selection during manakin evolution. Among them, SLC45A2 was found to be positively selected in M. deliciosus. The gene encodes a transporter protein that mediates melanin synthesis[66]. As pheomelanin is responsible for brown and reddish coloration[72,73], the positive selection signal in M. deliciosus may explain its unique reddish-brown body plumage among other studied manakin species. The other three genes were found under positive selection in the internal branches. TBC1D22A was positively selected in the most recent common ancestor (MRCA) of M. deliciosus, D. pipra and C. holochlora (branch 5 in Fig. 4a). EDA was positively selected in the MRCA of M. deliciosus, D. pipra, C. holochlora and M. vitellinus (branch 4 in Fig. 4a). GOLGB1 was positively selected in both M. chrysopterus and MRCA of M. deliciosus, D. pipra and C. holochlora (branch 5 in Fig. 4a).

Sex chromosomes

Unlike mammals where males are heterogametic (XY system), in birds the females are heterogametic (ZW system). The avian ZW chromosomes are evolved from a pair of ancestral autosomes about 102 million years ago[74]. During evolution, the differentiation of sex chromosomes is caused by recombination arrests on the W chromosome, resulting in the reduction of functional genes on the chromosome and the accumulation of repetitive elements. The Z and W chromosomes of the extant Neoaves are of great differences in length and gene content[74]. Only a small PAR remains for recombination during cell division in females[74]. We first confirmed the sex of the manakin samples by mapping the sequencing reads of the same individual to its genomes with BWA MEM (v0.7.17)[75]. Coverage information extracted by samtools (v1.9)[76] was calculated in 5 Kb non-overlapping windows with bedtools (v2.29.2)[77] and normalized by the peak coverage. We also softmasked the genomes and performed LASTZ(v1.04.00)[78] alignment with the manakin genomes using the T. guttata genome as a reference with parameter set ‘--step = 19 --hspthresh = 2200 --inner = 2000 --ydrop = 3400 --gappedthresh = 10000 --format = axt’. Based on the assumption that Z chromosomes are relatively conserved among avians, scaffolds mapped to the Z chromosome of T. guttata with the aligning rate >50% were treated as candidate Z-linked sequences. The distribution of normalized coverage of candidate Z and other (not_Z) sequences were then visualized to check the sex of the sequenced individuals. We confirmed most of the sex information was consistent with records except M. vitellinus (BioSample SAMN02299332). This sample is more likely to be a male instead of a female. Its normalized coverage distribution was similar between the Z and not_Z sequences, with both peaks at around one and without a rise at 0.5 (Fig. 5).
Fig. 5

Sex-related information of manakins. The sex of each sequenced bird is confirmed with the sequencing depth distribution of candidate Z (blue) and other sequence (not_Z, pink) shown in the middle. The putative Z chromosomes of species with a female sequenced are shown on the right. Each female avian species has a color-coded track showing the female read depth of chrZ under 100 Kb resolution, where PAR is shown in dark green, DR in light green and assembly gaps as blanks. One exception is found in M. chrysopterus where a large region shows normalized female sequencing depth around one, implying that a DR-to-autosome/PAR reversal might have happened in this species. The positions of the putative sex-determining gene DMRT1 were traced with the dotted red line.

Sex-related information of manakins. The sex of each sequenced bird is confirmed with the sequencing depth distribution of candidate Z (blue) and other sequence (not_Z, pink) shown in the middle. The putative Z chromosomes of species with a female sequenced are shown on the right. Each female avian species has a color-coded track showing the female read depth of chrZ under 100 Kb resolution, where PAR is shown in dark green, DR in light green and assembly gaps as blanks. One exception is found in M. chrysopterus where a large region shows normalized female sequencing depth around one, implying that a DR-to-autosome/PAR reversal might have happened in this species. The positions of the putative sex-determining gene DMRT1 were traced with the dotted red line. With the above procedures we identified about 75 Mb of Z-linked scaffolds containing 585 to 751 genes in the manakins species where a female was sequenced (Fig. 5, Table 6, Supplementary Tables 1 and 2). We further constructed these Z-linked sequences into pseudo-Z-chromosomes for visualization with Ragtag (v2.1.0)[79] using T. guttata Z chromosomes as reference under parameter set “-q 10 -d 100,000 -i 0.2 -a 0.0 -s 0.0 -g 100 -m 100000 –aligner minimap2 –mm2-params ‘-x asm5’”. To obtain the genomic coordinate of the avian candidate sex determining gene DMRT1[80], we used the DMRT1 protein sequence of G. gallus downloaded from UniPort as a query, and annotated the orthologous genes on the pseudo-Z-chromosome of manakins using Genewise.
Table 6

Z chromosome statistics.

speciesPAR candidatechrZ (PAR included)
length (bp)# genelength (bp)# gene
Corapipo altera674,2171674,797,643744
Machaeropterus deliciosus610,5631175,476,763585
Masius chrysopterus29,966,716*22874,940,613*602
Neopelma chrysocephalum660,1091776,014,776751
Calypte anna1,095,0002074,081,004703
Taeniopygia Guttata495,0001775,396,176802

*We suggest a differentiated region-to-autosome/pseudoautosomal region reversal has happened in this species.

Z chromosome statistics. *We suggest a differentiated region-to-autosome/pseudoautosomal region reversal has happened in this species. We also used the normalized coverage to identify PAR in the genomes assembled from female individuals. Z-linked scaffolds with normalized depth greater than 0.7 were identified as PAR candidates. We found that PAR is conserved between manakins and T. guttata with length of about 600 Kb and containing about 16 genes. However, one exception was found in M. chrysopterus where the candidate PAR is 30 Mb and contains 228 genes (Fig. 5, Table 6 & Supplementary Table 1). Most of the 30 Mb region has become differentiated region (DR) in the most recent common ancestor of Neoaves for about 69 million years[74], as well as the other manakins in this study. Thus, it is more likely that the region has reverted back to PAR or even autosome in M. chrysopterus. Such reversal is rare but has been found in other species[81,82]. Further exploration is required for the mechanism and explanation of this possible reversal.

Data Records

The genome sequencing data and assembly of the four manakin species has been deposited to CNSA (https://db.cngb.org/cnsa/) of CNGBdb[83] with accession number CNP0002887. The raw reads from DNBSEQ sequencing and the genome assembly of four manakins in this study was deposited to NCBI with SRA accession SRR19721507, SRR19721508, SRR19721509, SRR19721510, SRR19721511[84-88]. The annotation results of four manakin species, phylogenetic tree, discordance trees and the diploid assemblies were deposited in Figshare database[89].

Technical Validation

The assemblies of four manakins used in this study are the first version of the species. The average length of scaffold N50 and contig N50 were 29 Mb and 169 Kb, respectively. BUSCO analysis evaluated the genome assembly completeness. In total, about 95.23% core genes were assembled as complete genes of the four manakin genomes (single ~92.075%, duplicated ~3.200%, fragmented ~1.725%, missing ~ 3.000%). These results are comparable to those of three previously published manakins (Corapipo altera, Manacus vitellinus, and Neopelma chrysocephalum). Supplementary Table 1 Supplementary Table 2
Measurement(s)whole genome sequencing
Technology Type(s)BGISEQ-500 Sequencing
Sample Characteristic - OrganismPipridae
  61 in total

1.  Why Do Phylogenomic Data Sets Yield Conflicting Trees? Data Type Influences the Avian Tree of Life more than Taxon Sampling.

Authors:  Sushma Reddy; Rebecca T Kimball; Akanksha Pandey; Peter A Hosner; Michael J Braun; Shannon J Hackett; Kin-Lan Han; John Harshman; Christopher J Huddleston; Sarah Kingston; Ben D Marks; Kathleen J Miglia; William S Moore; Frederick H Sheldon; Christopher C Witt; Tamaki Yuri; Edward L Braun
Journal:  Syst Biol       Date:  2017-09-01       Impact factor: 15.683

2.  Phylogenomics of manakins (Aves: Pipridae) using alternative locus filtering strategies based on informativeness.

Authors:  Rafael N Leite; Rebecca T Kimball; Edward L Braun; Elizabeth P Derryberry; Peter A Hosner; Graham E Derryberry; Marina Anciães; Jessica S McKay; Alexandre Aleixo; Camila C Ribas; Robb T Brumfield; Joel Cracraft
Journal:  Mol Phylogenet Evol       Date:  2020-11-17       Impact factor: 4.286

3.  DiscoVista: Interpretable visualizations of gene tree discordance.

Authors:  Erfan Sayyari; James B Whitfield; Siavash Mirarab
Journal:  Mol Phylogenet Evol       Date:  2018-02-05       Impact factor: 4.286

4.  RepeatModeler2 for automated genomic discovery of transposable element families.

Authors:  Jullien M Flynn; Robert Hubley; Clément Goubert; Jeb Rosen; Andrew G Clark; Cédric Feschotte; Arian F Smit
Journal:  Proc Natl Acad Sci U S A       Date:  2020-04-16       Impact factor: 11.205

5.  Courtship dives of Anna's hummingbird offer insights into flight performance limits.

Authors:  Christopher James Clark
Journal:  Proc Biol Sci       Date:  2009-06-10       Impact factor: 5.349

6.  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

7.  Divergence in gene expression within and between two closely related flycatcher species.

Authors:  Severin Uebbing; Axel Künstner; Hannu Mäkinen; Niclas Backström; Paulina Bolivar; Reto Burri; Ludovic Dutoit; Carina F Mugal; Alexander Nater; Bronwen Aken; Paul Flicek; Fergal J Martin; Stephen M J Searle; Hans Ellegren
Journal:  Mol Ecol       Date:  2016-04-02       Impact factor: 6.185

8.  Combined transcriptomics and proteomics forecast analysis for potential genes regulating the Columbian plumage color in chickens.

Authors:  Xinlei Wang; Donghua Li; Sufang Song; Yanhua Zhang; Yuanfang Li; Xiangnan Wang; Danli Liu; Chenxi Zhang; Yanfang Cao; Yawei Fu; Ruili Han; Wenting Li; Xiaojun Liu; Guirong Sun; Guoxi Li; Yadong Tian; Zhuanjian Li; Xiangtao Kang
Journal:  PLoS One       Date:  2019-11-06       Impact factor: 3.240

9.  Genome-Wide Analysis Identifies Candidate Genes Encoding Feather Color in Ducks.

Authors:  Qixin Guo; Yong Jiang; Zhixiu Wang; Yulin Bi; Guohong Chen; Hao Bai; Guobin Chang
Journal:  Genes (Basel)       Date:  2022-07-14       Impact factor: 4.141

10.  An atypical mating system in a neotropical manakin.

Authors:  Milene G Gaiotti; Michael S Webster; Regina H Macedo
Journal:  R Soc Open Sci       Date:  2020-01-08       Impact factor: 2.963

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.