Literature DB >> 35857449

Rapid adaptive radiation of Darwin's finches depends on ancestral genetic modules.

Carl-Johan Rubin1,2, Erik D Enbody1, Mariya P Dobreva3, Arhat Abzhanov3, Brian W Davis4, Sangeet Lamichhaney5, Mats Pettersson1, Ashley T Sendell-Price1,6, C Grace Sprehn1, Carlos A Valle7, Karla Vasco7, Ola Wallerman1, B Rosemary Grant8, Peter R Grant8, Leif Andersson1,4,9.   

Abstract

Recent adaptive radiations are models for investigating mechanisms contributing to the evolution of biodiversity. An unresolved question is the relative importance of new mutations, ancestral variants, and introgressive hybridization for phenotypic evolution and speciation. Here, we address this issue using Darwin's finches and investigate the genomic architecture underlying their phenotypic diversity. Admixture mapping for beak and body size in the small, medium, and large ground finches revealed 28 loci showing strong genetic differentiation. These loci represent ancestral haplotype blocks with origins predating speciation events during the Darwin's finch radiation. Genes expressed in the developing beak are overrepresented in these genomic regions. Ancestral haplotypes constitute genetic modules for selection and act as key determinants of the unusual phenotypic diversity of Darwin's finches. Such ancestral haplotype blocks can be critical for how species adapt to environmental variability and change.

Entities:  

Mesh:

Year:  2022        PMID: 35857449      PMCID: PMC9269886          DOI: 10.1126/sciadv.abm5982

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.957


INTRODUCTION

Identification of factors that promote or constrain the process of adaptive radiation—rapid morphological and ecological diversification from a common ancestor—provides opportunities for understanding the origins of biodiversity. Species that radiate rapidly are thought to share some common features (), promoting their ability to evolve into diverse forms (, ), whereas depauperate clades may lack them (, ). One such feature, evolvability, may be determined in part by the modularity of phenotypic traits (), allowing some species to exploit ecological opportunity more readily (). Two factors that influence why some species exhibit greater evolvability than others are phenotypic plasticity and the genetic potential for diversification (). While rapid speciation in adaptive radiations provides limited time to generate de novo genetic variation, ancestral polymorphisms can facilitate rapid accumulation of diverse combinations of alleles (–). Under this model, ancestral variation is sorted in unique combinations in descendent lineages () and/or is transmitted across lineages through introgression (, ). Introgressive hybridization may lead to loss of genetic differentiation (, ); on the other hand, it enhances the potential for selection by increasing phenotypic and genetic variation (, ). Identification of genetic variants underlying phenotypic variation is essential for understanding the role of ancestral genetic variation in evolutionary change. This remains an outstanding challenge when comparing species because causal variants are greatly outnumbered by neutral variants. However, recent adaptive radiations, particularly those that still hybridize, are excellent groups for studying the origins of genetic variation and their effects on phenotype because gene flow has homogenized the genetic background, thus facilitating the identification of loci contributing to phenotypic differences among species (, ). The Darwin’s finch radiation comprises 18 species, 17 present in Galápagos and one on Cocos Island. The group is highly unusual in that no species is known to have become extinct because of human activities, in contrast to some other avian radiations (). The species have experienced current and historical gene flow (–), and diversification involved a key ecological trait, beak morphology, that mediates the efficient use of different food sources (insects, seeds of various sizes, cactus fruits, and even blood from other birds) (). Previous genetic studies have revealed a few loci where ancestral alleles explain variation in beak morphology: ALX1 affecting beak shape (), a genomic region controlling beak size including HMGA2 and three other genes (MSRB3, LEMD3, and WIF1) (, ) and, in addition, a number of suggestive loci under selection (, –). Whether the presence of ancestral alleles is a general property of the genetic architecture, governing phenotypic variation in the radiation is unknown. Here, we present a high-quality chromosome-scale reference genome and leverage a natural scaling transformation in beak size () across three species of ground finches (Geospiza) to identify 28 loci under selection. We show that the origin of these haplotype blocks linked to phenotypic divergence predates speciation events. These genetic modules have been reused over the past million years, were exchanged by gene flow, and contributed to the rapid phenotypic evolution and speciation among Darwin’s finches.

RESULTS

High-quality assembly of the Camarhynchus parvulus genome

The previously reported genome assembly based on Illumina short reads of a medium ground finch (Geospiza fortis) is highly fragmented (contig N50 = 30.5 kb) (). We therefore developed a high-quality, highly contiguous assembly for Darwin’s finches by combining long-read data with chromatin contact (HiC) data (fig. S1). Because of uncertainty in safely exporting tissue samples with intact long DNA molecules from the Galápagos National Park, we carried out Oxford Nanopore Technologies (ONT) sequencing at the Galápagos Science Center. Genomic DNA prepared from a male small tree finch (C. parvulus) was of high molecular weight, and this individual was selected for ONT sequencing. The close evolutionary relationship among all species of Darwin’s finches (pairwise interspecies d in the range of 0.2 to 0.3% ()) implies that this reference assembly can be used across the phylogeny. We generated 35x ONT long-read sequence coverage from the reference individual and assembled the haploid genome (fig. S1). Erroneous base calls were corrected using linked-read data, and chromosome-sized scaffolds were generated using HiC data. The resulting assembly is of similar quality to current state-of-the art genome assemblies in contiguity and accuracy [96% of the sequence assigned to chromosomes, scaffold N50 = 71.1 Mb, Benchmarking Universal Single-Copy Orthologs (BUSCO) = 96.1% complete, gaps = 0.01%] and shows a high degree of conserved synteny to the zebra finch genome assembly (Fig. 1A). Field-collected tissue samples were used to generate RNA sequencing (RNA-seq) data for annotation (table S1 and fig. S1). Genome annotation of these data using the Ensembl annotation pipeline () generated 17,167 gene modules that include noncoding RNA and microRNA. We further used 25 C. parvulus individuals to generate a linkage disequilibrium–based recombination map (fig. S2) (). Consistent with other avian species (, ), recombination rates are generally elevated at the ends of chromosomes and correlated with nucleotide diversity [coefficient of determination (R2) = 0.19, P < 0.001], particularly on the Z chromosome (R2 = 0.27, P < 0.001). This relationship is consistent with the widespread effects of background selection known in birds (, ).
Fig. 1.

Genome assembly, genetic diversity, and differentiation among three species of Darwin’s finches.

(A) Illustration of conserved synteny to zebra finch (Taenopygia guttata). (B) Genome-wide admixture mapping using three species sorted ascendingly by beak and body size: 0 = G. fuliginosa, 1 = G. fortis, and 2 = G. magnirostris. The red dashed line indicates the significance threshold set by permutation. Illustrations of the three species are adapted from P.R.G. and Darwin (). (C) Boxplot showing the difference in nucleotide diversity between the regions of association marked in (B) and regions outside the main area of association. Centerline indicates the median, bounded by the 25th and 75th percentiles, with whiskers extending to 1.5× the interquartile range. (D) Genome-wide Z-transformed FST for all three possible pairwise combinations of Geospiza considered here. Lines are colored by the comparison of interest. Many highly divergent regions are shared between contrasts and overlap with regions of association in (B) (data S2 and S3).

Genome assembly, genetic diversity, and differentiation among three species of Darwin’s finches.

(A) Illustration of conserved synteny to zebra finch (Taenopygia guttata). (B) Genome-wide admixture mapping using three species sorted ascendingly by beak and body size: 0 = G. fuliginosa, 1 = G. fortis, and 2 = G. magnirostris. The red dashed line indicates the significance threshold set by permutation. Illustrations of the three species are adapted from P.R.G. and Darwin (). (C) Boxplot showing the difference in nucleotide diversity between the regions of association marked in (B) and regions outside the main area of association. Centerline indicates the median, bounded by the 25th and 75th percentiles, with whiskers extending to 1.5× the interquartile range. (D) Genome-wide Z-transformed FST for all three possible pairwise combinations of Geospiza considered here. Lines are colored by the comparison of interest. Many highly divergent regions are shared between contrasts and overlap with regions of association in (B) (data S2 and S3).

Twenty-eight trait loci explaining phenotypic differentiation

We used admixture mapping () to search for loci contributing to genetic differentiation among three closely related species that differ primarily in a scaling factor for beak and body size from small to large (): the small, medium, and large ground finches (Geospiza fuliginosa, G. fortis, and Geospiza magnirostris, respectively) (fig. S3). This trio was selected on the basis of their notable phenotypic differentiation in beak and body traits alone and low genome-wide genetic differentiation (pairwise FST = 0.02 to 0.10) because this reduces the background noise due to genetic drift. In this study, we generated whole-genome, short-read sequence data from 28 individuals across the three species and combined them with previously published samples for a total of 75 birds on nine islands (mean coverage = 17 ± 9; table S2 and data S1) and applied phenotypic scores of 0, 1, and 2 to reflect the increasing beak and body size of G. fuliginosa < G. fortis < G. magnirostris because phenotypic data were not available for each individual. The experimental setup is similar to an earlier study of these three species from only one island and which used reduced representation sequencing (). Admixture mapping () revealed 28 loci that exceeded the significance threshold set by permutation (Fig. 1B) and represent independent loci (data S2). The size of these regions ranges from hundreds of kilobases (kb) to 2.7 Mb (Fig. 2) and contained between 0 and 35 genes (data S2). Outlier loci were clustered on macrochromosomes and included the previously described ALX1 and HMGA2 loci affecting the beak morphology, both located on chromosome 1A and only ~7 Mb apart (loci 4 and 9 in Fig. 1B). These loci are separated by a recombination hotspot (fig. S2), consistent with previous results showing that these loci do not show strong linkage disequilibrium (, ).
Fig. 2.

Haplotype variation across the Darwin’s finch phylogeny.

The heatmap displays average delta allele frequency (DAF) based on 34 to 328 SNPs/locus for each species compared to G. magnirostris, the species with the largest beak. On the right, bill size is presented according to a principal component (PC) analysis of three beak dimensions averaged across all island populations for each species. Only Geospiza species are shown. Above, the size of each genomic region (in mega-base-pairs) is marked in a dot plot. Right: Finch illustrations reproduced by permission of Lynx Edicions.

Haplotype variation across the Darwin’s finch phylogeny.

The heatmap displays average delta allele frequency (DAF) based on 34 to 328 SNPs/locus for each species compared to G. magnirostris, the species with the largest beak. On the right, bill size is presented according to a principal component (PC) analysis of three beak dimensions averaged across all island populations for each species. Only Geospiza species are shown. Above, the size of each genomic region (in mega-base-pairs) is marked in a dot plot. Right: Finch illustrations reproduced by permission of Lynx Edicions. Regions of association identified with admixture mapping largely mirrored the results of FST-based contrasts (Fig. 1D) and strongly correlated with per-window estimates in the two contrasts involving G. fuliginosa (R2 = 0.84, R2 = 0.69; data S2). We do not expect a perfect match between the results of admixture mapping and FST analysis because the former is based on a linear comparison of the trio, while the latter is derived from pairwise comparisons of species. In the 28 regions of association, G. fuliginosa and G. magnirostris were often homozygous for different haplotypes, while G. fortis exhibited intermediate allele frequencies (fig. S4 and data S3). This is highlighted at the HMGA2 locus on chromosome 1A, where measures of Tajima’s D are strongly negative for G. fuliginosa and G. magnirostris but strongly positive for G. fortis (fig. S4), consistent with balancing selection maintaining haplotype diversity in the phenotypically variable G. fortis population (). Nucleotide diversity was reduced in G. magnirostris relative to the genomic background in 23 of the 28 regions (Fig. 1C), consistent with selective sweeps in G. magnirostris or an ancestor. The 28 loci also fell in genome-wide low recombination regions (mean ρ in peaks = 1.4 compared with mean ρ outside = 2.0), with the exception of one peak on chromosome 25 (ρ = 4.7), consistent with previous studies in Darwin’s finches showing that regions of elevated differentiation often lie in recombination coldspots (). Low recombination in these regions has likely facilitated the persistence of large haplotype blocks despite high gene flow among Darwin’s finch species, as predicted from theoretical studies (). To explore the extent to which the loci detected in the ground finch contrast are also associated with phenotypic diversity among tree finches (Camarhynchus), we next performed a similar admixture analysis comparing small, medium, and large tree finches (C. parvulus, Camarhynchus pauper, and Camarhynchus psittacula, respectively), also classified as 0, 1, and 2, respectively, based on beak and body size. These samples were previously sequenced and include 46 individuals (n = 10 to 25 each) from eight islands. This replicated a signal for the HMGA2 locus affecting beak size (P = 4 × 10−16) (fig. S5), as expected from previous work (). No other locus showed such a notable signal of genetic differentiation, but we noted an overlap of higher genetic differentiation, approaching genome-wide significance, at several of the loci detected in the ground finch contrast (fig. S5). The identification of regions associated with phenotypic variation in size among Camarhynchus is hampered by the relatively small sample size (n = 46 versus n = 75 for Geospiza). Furthermore, we should not expect a perfect overlap because the beak proportions, skull architecture, and musculature of the three tree finches differ from those in the ground finches and thus may have a different genetic basis (). To determine the evolutionary origin of haplotypes at the 28 loci detected in the ground finch contrast, we compared allele frequencies at the most differentiated single-nucleotide polymorphisms (SNPs) across all 18 species of Darwin’s finches and two outgroups Loxigilla noctis and Tiaris bicolor. We also included the Big Bird hybrid lineage, which was formed by the mating of a Geospiza conirostris male and two G. fortis females (). The sample included genome sequences from previously published data and 62 new individuals (n = 321 in total; data S1). If genetic differentiation among Geospiza is caused primarily by de novo mutations that occurred after the split from Camarhynchus, then we would expect to find little shared haplotype structure in non-Geospiza species as a consequence of random accumulation of variants. In sharp contrast to this expectation, the allele frequency comparison revealed a nonrandom pattern (Fig. 2) and broad haplotype structures across the most differentiated SNPs at the 28 loci (data S4). Notably, a few G. magnirostris haplotypes are present at a relatively high frequency across the radiation (e.g., 10 and 25), while most are consistently highly differentiated from haplotypes in other species, except those with relatively large beaks. Furthermore, a large portion of G. magnirostris major alleles, 67% (1273 of 1914), were derived relative to outgroups L. noctis and T. bicolor, consistent with selective sweeps in the G. magnirostris lineage and after Galápagos colonization (data S5). Together, these results imply that the “large” and “small” haplotype blocks identified by admixture mapping in Geospiza predate the separation of Geospiza and Camarhynchus. The radiation of finches in the Galápagos proceeded rapidly, and phylogenetic reconstructions are characterized by short branch lengths, gene tree discordances, and incomplete lineage sorting (ILS). We therefore evaluated the hypothesis that these haplotype blocks are ancestral by examining the haplotype structure at each locus. Neighbor-joining trees based on the most differentiated SNPs at the 28 loci are not consistent with the genome-wide phylogeny of Darwin’s finch (fig. S6), which is driven by haplotype structuring at each locus (locus 24 in Fig. 3A and all 28 loci in data S4). We also find support for phylogenetic discordance at each locus using topology weighting, which demonstrates that the placement of G. magnirostris at each locus is inconsistent with the species tree (Fig. 4 and fig. S7). Next, we estimated the divergence time between haplotypes among all pairwise combinations of G. magnirostris, G. fuliginosa, and Certhidea olivacea (green warbler finch) based on pairwise nucleotide divergence (d). These data indicate that the divergence times between the G. magnirostris and G. fuliginosa haplotypes at the 28 loci tend to be older than the coalescent time for the two species (~260,000 years before present; Fig. 3A) and in the range of 380,000 to 800,000 years before present (Fig. 3B, fig. S8, and data S2), which means that the great majority predate the split between Geospiza and Camarhynchus about 400,000 years ago (Fig. 3A). The estimated divergence times involving C. olivacea haplotypes consistently exceeded the divergence time between haplotypes from the two Geospiza species, suggesting that they evolved after the split between warbler finches and other finches in the phylogeny. However, these data do not exclude the possibility of a divergence before this early split for some loci, because genetic exchange between haplotypes is possible in species that segregate for both alleles like many do in G. fortis (data S4). To explore this possibility, we estimated the time to most recent common ancestor (TMRCA) among all haplotypes of each strongly associated SNP using a probabilistic modeling approach that incorporates both mutation and recombination rate. We found that strongly associated SNPs have a broad distribution of TMRCA (fig. S9), but, on average, the 28 loci have an upper allele age of 937,000 years ago (SD = 206,000 years).
Fig. 3.

Characterization of 28 adaptive loci.

(A) Left: A neighbor-joining tree for all species of Darwin’s finches based on 4.9 million SNPs. The tree was converted to a chronogram using ape (), and branching times are from Lamichhaney et al. (). Right: A haplotype plot for each of the top associated variants on locus 24 with 0/0 for homozygous reference (yellow), 1/1 homozygous alternate allele (red), and 0/1 for heterozygous positions. Species names are colored by a-priori clade assignments, and a co-phylo diagram () highlights the changes in topology between the trees. (B) Time to most recent common ancestor (TMRCA) between G. fuliginosa and G. magnirostris haplotypes for all 28 loci. Time was estimated by the conversion of genetic divergence (d) to time using T = d/(2 μ) and a mutation rate (μ) of 2.04 × 10−9 (). The horizontal line marks 900,000 years, the approximate time of divergence between warbler finches and all other finches in the radiation (). (C) Fraction of introgression from G. magnirostris to G. propinqua on Genovesa Island for each of the 28 loci, as measured by d. Loci are arranged by the strength of the introgression, and a line is drawn at the median of genome-wide d = 0.04. The trio arrangement is written on the graph and ABBA/BABA statistics listed under Methods. In (B) and (C), error bars represent 95% confidence intervals.

Fig. 4.

Phylogenetic discordance at the eight loci on chromosome 1A.

(A) The three possible topologies representing relationships among four taxa: P. inornata, C. parvulus, G. magnirostris, and G. fuliginosa. Topology 1 is consistent with the genome-wide phylogeny (Fig. 3A), whereas topologies 2 and 3 are discordant, separating the monophyletic Geospiza species. (B) Violin/box plots showing weightings for topology 1 (the species tree) across the eight chromosomes containing the 28 loci regions. (C) Topology weightings calculated in nonoverlapping 100 SNP windows across chromosome 1A (chr1A). Locations of the 28 loci regions within this chromosome are indicated by gray shading.

Characterization of 28 adaptive loci.

(A) Left: A neighbor-joining tree for all species of Darwin’s finches based on 4.9 million SNPs. The tree was converted to a chronogram using ape (), and branching times are from Lamichhaney et al. (). Right: A haplotype plot for each of the top associated variants on locus 24 with 0/0 for homozygous reference (yellow), 1/1 homozygous alternate allele (red), and 0/1 for heterozygous positions. Species names are colored by a-priori clade assignments, and a co-phylo diagram () highlights the changes in topology between the trees. (B) Time to most recent common ancestor (TMRCA) between G. fuliginosa and G. magnirostris haplotypes for all 28 loci. Time was estimated by the conversion of genetic divergence (d) to time using T = d/(2 μ) and a mutation rate (μ) of 2.04 × 10−9 (). The horizontal line marks 900,000 years, the approximate time of divergence between warbler finches and all other finches in the radiation (). (C) Fraction of introgression from G. magnirostris to G. propinqua on Genovesa Island for each of the 28 loci, as measured by d. Loci are arranged by the strength of the introgression, and a line is drawn at the median of genome-wide d = 0.04. The trio arrangement is written on the graph and ABBA/BABA statistics listed under Methods. In (B) and (C), error bars represent 95% confidence intervals.

Phylogenetic discordance at the eight loci on chromosome 1A.

(A) The three possible topologies representing relationships among four taxa: P. inornata, C. parvulus, G. magnirostris, and G. fuliginosa. Topology 1 is consistent with the genome-wide phylogeny (Fig. 3A), whereas topologies 2 and 3 are discordant, separating the monophyletic Geospiza species. (B) Violin/box plots showing weightings for topology 1 (the species tree) across the eight chromosomes containing the 28 loci regions. (C) Topology weightings calculated in nonoverlapping 100 SNP windows across chromosome 1A (chr1A). Locations of the 28 loci regions within this chromosome are indicated by gray shading.

Hybridization and the mixing of ancestral haplotypes

The presence of distinct combinations of haplotypes across the phylogeny (Fig. 2 and data S4) indicates ILS or hybridization and introgression. The Genovesa cactus finch Geospiza propinqua has not only the fourth largest beak of all Geospiza but also the pointed beak characteristic of other cactus finches and carries a mixture of large and small haplotypes (Fig. 2). Gene flow from G. magnirostris to G. propinqua has been implicated from field observations (). We estimated the fraction of introgression from G. magnirostris to G. propinqua on Genovesa using d, which incorporates d into an extension of ABBA/BABA D statistics (), and found that regions of high G. magnirostris similarity share an excess of derived alleles (fig. 3C) and often reduced genetic divergence d (data S6), consistent with introgression and not ILS. In general, it remains an outstanding challenge to distinguish between ILS and introgression using D statistics (), but the genomic evidence presented here for introgression in G. propingua demonstrates that gene flow can transfer these haplotypes among species. The role of hybridization in generating distinct combinations of haplotypes is demonstrated in the Big Bird lineage (Fig. 2). The Big Bird lineage is characterized by a proportionally large beak for its body size (). The combination of G. conirostris, a sister species to G. magnirostris, and G. fortis alleles resulted in a unique phenotype and genotype with G. magnirostris haplotypes predominating at 22 of the 28 loci (Fig. 2).

Trait loci are enriched for developmental genes

The loci identified here segregate with phenotypic variation among the species and are expected to contain genes important for beak and body size variation. However, because this result is based on between-species contrasts, we cannot exclude the possibility that some loci reflect unmeasured phenotypic differences. We therefore explored previously described genotype-phenotype relationships in other taxa for the genes within these regions and their expression patterns. We conducted an enrichment analysis using mouse orthologs for the genes in the vicinity of the 28 loci using the software GREAT (genomic regions enrichment of annotations tool) () and found that deleterious mutations at these loci were significantly associated with abnormal development of cartilage and bones (Fig. 5A and fig. S10). Furthermore, these mouse orthologs were significantly enriched for genes expressed during craniofacial and limb development, consistent with our expectation that genetic changes at many of these 28 loci affect beak development and body size variation. The enrichment includes genes spread across the 28 loci (data S2).
Fig. 5.

Enrichment analysis and gene expression.

(A) Annotation term enrichment analysis. GREAT () was used to screen for enrichment of gene annotation terms associated with the 28 differentiated regions. −Log10 Bonferroni-corrected P values of significantly enriched terms are shown on the x axis. Fold enrichment is indicated using dot colors. Dot sizes indicate numbers of genes belonging to each annotation term. (B) Gene expression in Darwin’s finch upper beak versus other tissues. RNA-seq gene expression levels in upper beak (n = 9) were compared with expression levels in noncraniofacial tissues (n = 7, representing four different tissues. −Log10 (P values) for differential expression are shown on the y axis, and M values [log2(fold change beak samples versus other tissues)] are shown on the x axis. Dashed boxes show genes with M values < −5 and > +5, representing genes with lower (left) and higher (right) expression in beak samples (right) compared with other tissues. The asymmetry in −log10(P values) is a result of comparing one tissue (beak) to several other tissues. Gene names are shown for selected genes. * indicates separate genes belonging to a gene cluster of scale keratins. ** is a gene of uncertain function (see data S7). (C) In situ hybridization (ISH). Top: Schematic representation of the expression pattern of ALX1 (blue) in E7 embryos of zebra finch (n = 5) and Darwin’s finches (nine species, n = 21). Bottom: mRNA expression of ALX1 (dark purple) in mid-face longitudinal sections through the heads of Darwin’s finch embryos. The developing beak region is shown. Magnified area of the left images is shown on the right. Scale bars, 250 μm.

Enrichment analysis and gene expression.

(A) Annotation term enrichment analysis. GREAT () was used to screen for enrichment of gene annotation terms associated with the 28 differentiated regions. −Log10 Bonferroni-corrected P values of significantly enriched terms are shown on the x axis. Fold enrichment is indicated using dot colors. Dot sizes indicate numbers of genes belonging to each annotation term. (B) Gene expression in Darwin’s finch upper beak versus other tissues. RNA-seq gene expression levels in upper beak (n = 9) were compared with expression levels in noncraniofacial tissues (n = 7, representing four different tissues. −Log10 (P values) for differential expression are shown on the y axis, and M values [log2(fold change beak samples versus other tissues)] are shown on the x axis. Dashed boxes show genes with M values < −5 and > +5, representing genes with lower (left) and higher (right) expression in beak samples (right) compared with other tissues. The asymmetry in −log10(P values) is a result of comparing one tissue (beak) to several other tissues. Gene names are shown for selected genes. * indicates separate genes belonging to a gene cluster of scale keratins. ** is a gene of uncertain function (see data S7). (C) In situ hybridization (ISH). Top: Schematic representation of the expression pattern of ALX1 (blue) in E7 embryos of zebra finch (n = 5) and Darwin’s finches (nine species, n = 21). Bottom: mRNA expression of ALX1 (dark purple) in mid-face longitudinal sections through the heads of Darwin’s finch embryos. The developing beak region is shown. Magnified area of the left images is shown on the right. Scale bars, 250 μm. Because of the difficulties in interpreting gene-by-term enrichment with data from other species (), we performed two types of analysis to validate the association of beak gene expression with the differentiated loci. Using RNA-seq data from Darwin’s finches, we compared genes expressed at embryonic day 7 (E7) in the developing upper beak (nine embryos representing six species) with five other tissues (brain, gut, heart, left forelimb, and trunk from six embryos representing three species) (table S1). This analysis revealed that many of the genes within the 28 loci are expressed in the developing beak, and these loci were 14-fold enriched for genes with higher expression levels (M≥5 = fold change ≥32) in the developing beak compared with other tissues (χ2 test, P = 7.4 × 10−24, d.f. = 1; Fig. 5B). We carried out in situ hybridization (ISH) of two candidates for craniofacial development (ALX1, locus 7; RUNX2, locus 24). ISH data on a total of seven zebra finch (Taeniopygia gutatta) and 27 Darwin’s finch embryos (of nine species) revealed that ALX1 expression was strongly biased to the beak region over the developmental period when beak size and shape are established (E6 to E7) (Fig. 5C and figs. S11 and S12A) (). In addition, we confirmed a similar expression pattern in zebra finch embryos for RUNX2 (alias OSF2)—an essential gene for ossification of the mesenchyme (fig. S12B) () located within a strong signal of differentiation among Geospiza (fig. S4B).

DISCUSSION

We have identified ancestral haplotypes at 28 loci that contribute to the unique phenotypic diversity of the Darwin’s finch radiation. Our functional characterization of these loci contributes to a growing body of literature, suggesting that genetic differences between species of Darwin’s finches are enriched for genes involved in the key pathways for growth and development of beaks (, –, ). In this study, we show that the distribution of these genetic modules across the phylogeny reflects natural selection and most likely both ILS and introgression (, ). It is not only the presence/absence of these haplotype blocks that affects the phenotype but also their frequency within species, which is illustrated by intermediate haplotype frequencies at many of these loci in G. fortis (Fig. 2 and data S4 and S5). Intermediate frequencies, indicative of balanced polymorphism, provide the underlying variation for selection to sort adaptive haplotypes during speciation. In G. fortis, variation may have been crucial for survival during strong selection events (). Our findings support previous suggestions that ancestral variants contribute to phenotypic diversity, as indicated for pigmentation phenotypes among other songbird species (, ), color morphs in the common wall lizard (), color patterns in Heliconius butterflies (), various phenotypic traits in cichlids (, ), craniofacial morphology in pupfish (), winter coat in snowshoe hares (), and adaptation to high altitude in humans (). This type of ancestral variation is also retained within large populations preceding speciation, as illustrated in Atlantic herring and stickleback where ecotypes show differences in the frequency of haplotype blocks at hundreds of loci, all underlying ecological adaption (, ). Characteristic features of these 28 loci are the large size of the haplotype blocks, often spanning hundreds of kilobases, and their ancient origins (Fig. 3B). The haplotype structure is in contrast to another ancestral polymorphism in Darwin’s finches, at the BCO2 locus controlling nestling beak color, where a single base change constitutes the likely causal mutation (). The identification of causal variants within the haplotype blocks described here is challenging because of strong linkage disequilibrium among many sequence variants within each region. Such large haplotypes could include structural variants, which have been proposed as a key determinant of adaptive evolution and speciation (, ), but these 28 loci do not represent large inversions (>5 Mb) and tend to be relatively small (0.1 to 2.7 Mb) compared to previously described inversions in vertebrates (, –). Furthermore, none of the loci exhibit the sharp borders in our association analysis that are characteristic of inversions maintained as balanced polymorphisms (, ). The block structures at the 28 loci most likely reflect large-effect haplotypes composed of clusters of multiple causal variants that have accumulated during the evolution of Darwin’s finches (), similar to the evolution of alleles in domestic animals by the sequential accumulation of causal mutations during the past 10,000 years (). The occurrence of these haplotype blocks in low recombination regions likely facilitated their evolution (data S2). The reuse of ancestral genetic modules is a much faster route to adaptive change than the slow accumulation of adaptive de novo mutations (). Our study is comprehensive in surveying genomic variation across all 18 extant species of a single adaptive radiation, and yet, the principle finding, of repeated reassembly of ancient haplotype blocks in the formation of species, is likely to be a general feature of rapid radiations (, ) and of general importance for how species adapt to environmental variability and change ().

METHODS

Sample collection

Blood samples were collected from various Galápagos islands as part of sampling described elsewhere (, ) and stored on EDTA-soaked filter paper in DRIERITE to preserve red blood cells for DNA extraction later. This included 101 individuals of eight different species (data S1). An additional 18 samples of six species were captured using mist nests on San Cristóbal in 2018 for this study and resequenced using short reads (data S1). In total (combined with sequences from previous studies), our sampling includes all 18 species of Darwin’s finches, the hybrid Big Bird lineage, and two outgroup species that sum to a combined sampling of 321 individuals (data S1). Of the individuals captured on San Cristóbal in 2018, one small tree finch was sampled for targeted long-read sequencing (see below). Sampling was conducted in accordance with the protocols of Princeton University’s Animal Welfare Committee. Embryos for RNA-seq for annotation (n = 11, three species) and for differential expression (n = 9, six species) analyses and for ISH (n = 27, nine species) were collected on Santa Cruz, Genovesa, and Pinta as described (table S1) (). Embryos were stored in methanol or RNAlater (Thermo Fisher Scientific, CA) until further use.

ONT sequencing

Pilot experiments before the expedition indicated that avian DNA (chicken and Darwin’s finches) was not sequenced efficiently. We decided to develop a protocol to optimize yield from avian DNA sequencing with the MinION and hypothesized that the issue was partially due to avian DNA being enriched in molecules carrying certain motifs or creating certain types of secondary structures, which promoted blocking of nanopores and thus resulted in premature loss of actively sequencing nanopores and ultimately poor sequencing yields. We further hypothesized that T7 endonuclease I treatment of DNA before library generation would increase yields because T7 endonuclease I has been used to increase sequencing yield in an ONT protocol for sequencing DNA amplified using bacteriophage Φ29 polymerase and has a broad substrate specificity involving four-way junctions, various branched structures, and single-base mismatched heteroduplexes (). We first used g-TUBEs (Covaris) to achieve a controlled fragmentation of the isolated DNA to approximately 6 to 10 kb. Electrophoresis of the original g-TUBE–fragmented DNA side by side with the T7 endonuclease I–treated sample revealed a low-molecular weight smear unique to the treated sample. To efficiently remove DNA cut by T7 endonuclease I, we used a custom SPRI bead mixture (SeraMag SPRI beads, Thermo Fisher Scientific) capable of retaining DNA above the size of approximately 4 to 5 kb, thus selecting against the DNA cut by T7 endonuclease I. We used the T7 endonuclease I cleavage protocol to prepare DNA for library generation for libraries aimed to generate reads in the 6 to 10 kb size range. For libraries where we aimed at sequencing longer molecules, we did not conduct any T7 cleavage. In the case of T7 cleavage of DNA before library generation, we ran agarose gel electrophoresis before advancing to library isolation to ascertain successful removal of the low–molecular weight DNA smear by the SeraMag speed bead mixture. DNA was isolated by two main protocols, depending on whether we aimed to construct DNA libraries in the 6- to 10-, 20- to 30-, or 50+-kb size ranges. For all samples, we started out with small amounts of blood (5 to 20 μl) sampled from wing veins of birds using glass capillaries. Immediately following sampling, the blood was deposited in a tube containing 1 ml of cold phosphate-buffered saline (10 mM EDTA). The sampled blood was kept cold in a Styrofoam box containing ice packs until back at the laboratory facility. Nuclei were isolated by adapting the protocol for cell culture in the QIAGEN Genomic DNA Handbook using ice-cold buffer C1 (QIAGEN). Following isolation of nuclei, we used either blood and tissue spin columns (QIAGEN) or NaCl/ethanol precipitation to isolate <30 kb or high–molecular weight DNA, respectively. Regardless of means of DNA isolation, we included a DNA cleanup step using SPRI beads before library generation. SeraMag SPRI beads were prepared by mixing 10 ml of 5 M NaCl, 500 μl of 1 M tris, 100 μl of 0.5 M EDTA, and 10 ml of H2O in a 50-ml Falcon tube. SeraMag beads (1 ml) were washed five times in 1 ml of tris-EDTA buffer (10 mM tris, 1 mM EDTA) using a magnet stand, and the final 1-ml volume was added to the 20.6 ml of solution. A 50% (w/v) solution of PEG8000 in water was prepared, and 18 ml of this mix was added to the 21.6 ml of bead solution. The content was rigorously vortexed. Tween 20 (27.5 μl) was added to the mixture, and the volume was adjusted to 50 ml with H2O. The final mixture was vortexed and then aliquoted to 1.5-ml Eppendorf tubes. The performance of the bead mix at different bead/sample volume ratios was evaluated using a mixture of bacteriophage lambda DNA and 1 kb of DNA ladder. From one small tree finch (C. parvulus) individual (STF5), we produced four LSK-108 libraries from g-TUBE–fragmented DNA and five LSK-108 libraries made from high–molecular weight DNA using the protocol Genome sequencing by ligation, selecting for long reads (ONT). The libraries were sequenced on a GridION instrument (ONT) according to the manufacturer’s instructions. Together, from individual STF5, we generated 32.3 Gb of sequence data, corresponding to approximately 30× coverage.

HiC analysis

We were not able to perform HiC analysis at the Galápagos Science Center but were able to export fresh blood from a G. fortis individual; therefore, the HiC analysis was based on a G. fortis individual, although the long-read data were from a C. parvulus individual. The blood was kept refrigerated until departure from Galápagos and was kept chilled (2° to 10°C) until arrival at Uppsala University, where a nuclei isolation protocol (buffer C from the QIAGEN Genomics Buffer) was applied to isolate erythrocyte nuclei. Isolated nuclei were immediately snap frozen in liquid nitrogen and kept at −80°C until further use. Nuclei were used for HiC library generation using the Arima Genomics HiC Kit v1, and the resulting HiC library was sequenced on a NextSeq 2000 instrument (Illumina).

RNA sequencing

We prepared RNA-seq libraries for genome annotation using a variety of library preparation protocols, tissues, and finch species. We used a ribosomal RNA depletion kit from NEBNext for six samples of lower beak, jaw muscles, brain, gut and heart, left forelimb, and brain. An additional four samples were prepared using poly(A) enrichment protocols from NEBNext for brain and jaw muscle tissues. Last, a single G. fortis trunk sample was enriched using a Lexogon SENSE total poly(A)-enriched RNA-seq kit and sequenced at three times higher depth than the remaining samples. The specific kits and sample accessions are listed in table S1. All 11 libraries were multiplexed and sequenced on a flow cell of an Illumina (San Diego, CA) SP chip. For differential expression analysis, we extracted RNA from the upper beak primordia of nine embryos from six species of Darwin’s finches (table S1), prepared cDNA libraries with the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs, MA) with poly(A) selection, and sequenced them on HiSeq 4000 (Illumina, CA).

Contrasting gene expression in beak and other tissues

RNA-seq data were aligned to the genome assembly using STAR v2.7.2b (), guided by the GTF file from genome annotation. For each sample, uniquely mapping reads for each annotated gene were extracted using the STAR option “--quantMode GeneCounts.” Per-sample gene counts were normalized to transcripts per kilobase million (TPM) values, by first dividing observed counts with the longest isoform length in kilobases and then by dividing those values with the total numbers (in millions) of uniquely mapping reads. Obtained TPM values were used to contrast upper beak development samples (n = 9) with samples corresponding to other tissues (n = 7) (table S1). P values for differential expression from two-sided t tests were calculated for each gene. M values were calculated as −log2(average TPM beaks / average TPM other tissues).

Short-read sequencing

One hundred one individuals were extracted using a custom salt preparation method [described by Enbody et al. ()] and sequenced using the TruSeq Kit (Illumina, CA). Sixteen additional whole-genome libraries were prepared using a custom Tn5 transposon–based tagmentation protocol derived from Picelli et al. () and detailed by Enbody et al. (). Briefly, we assembled the Tn5 transposon construct using the stock Tn5 (prepared by Karolinska Institutet Protein Science Facility) and the following primers (): Tn5MErev: (5′-[phos]CTG TCTCTTATACACATCT-3′), Tn5ME-A (Illumina FC-121-1030) (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3′), and Tn5ME-B (Illumina FC-121-1031) (5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3′).

In situ hybridization

Tissue sectioning and fixation, probe hybridization, and signal detection were performed according to the previously published protocols () with probe concentration of 0.5 ng/μl. Digoxigenin (DIG)–labeled antisense riboprobes were generated by polymerase chain reaction followed by RNA synthesis according to the standard procedures using T3/T7 promoter primers combined with the following gene-specific primers: ALX1 [581 base pairs (bp)] (forward: 5′-CAGGACAGCAACGTCAACTA-3′; reverse: 5′-AAGCCTGTGTAGCCAGAATC-3′), COL2A1 (683 bp) (forward: 5′-GCAAGGCCAAGGAGAAGAA-3′; reverse: 5′-TGATTCTGGTGTTGGGATGAG-3′), COL9A1 (608 bp) (forward: 5′-CTGGCCCAAAGGGTAATAGAG-3′; reverse: 5′-ACCAAATTCTGGCCTCCTAAG-3′), and RUNX2 (619 bp) (forward: 5′-GAACCAGGTGGCCAGATTTA-3′; reverse: 5′-GACTGGCGGTGTATAGGTAAAG-3′).

Genome assembly

The methods used for genome assembly, gene and repeat annotation, and construction of linkage map are provided in Supplementary Methods.

Population genomics analysis

The Supplementary Methods contains a full description of the methods used for short-read variant analysis, genotype phasing, admixture mapping, genetic diversity and divergence analyses, allele age estimation, analysis of introgression, gene enrichment analysis, phylogenetic reconstruction, analyses of allele frequencies, and topology weighting.
  78 in total

1.  Collection of embryos from Darwin's finches (Thraupidae, Passeriformes).

Authors:  Arhat Abzhanov
Journal:  Cold Spring Harb Protoc       Date:  2009-03

2.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

Review 3.  Molecular consequences of animal breeding.

Authors:  Leif Andersson
Journal:  Curr Opin Genet Dev       Date:  2013-04-16       Impact factor: 5.578

4.  Massive haplotypes underlie ecotypic differentiation in sunflowers.

Authors:  Marco Todesco; Gregory L Owens; Natalia Bercovich; Jean-Sébastien Légaré; Shaghayegh Soudi; Dylan O Burge; Kaichi Huang; Katherine L Ostevik; Emily B M Drummond; Ivana Imerovski; Kathryn Lande; Mariana A Pascual-Robles; Mihir Nanavati; Mojtaba Jahani; Winnie Cheung; S Evan Staton; Stéphane Muños; Rasmus Nielsen; Lisa A Donovan; John M Burke; Sam Yeaman; Loren H Rieseberg
Journal:  Nature       Date:  2020-07-08       Impact factor: 49.962

5.  Chromosomal inversion differences correlate with range overlap in passerine birds.

Authors:  Daniel M Hooper; Trevor D Price
Journal:  Nat Ecol Evol       Date:  2017-08-28       Impact factor: 15.460

6.  pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data.

Authors:  Katharine L Korunes; Kieran Samuk
Journal:  Mol Ecol Resour       Date:  2021-02-05       Impact factor: 7.090

7.  Genomic evidence of speciation reversal in ravens.

Authors:  Anna M Kearns; Marco Restani; Ildiko Szabo; Audun Schrøder-Nielsen; Jin Ah Kim; Hayley M Richardson; John M Marzluff; Robert C Fleischer; Arild Johnsen; Kevin E Omland
Journal:  Nat Commun       Date:  2018-03-02       Impact factor: 14.919

8.  Regulatory changes in pterin and carotenoid genes underlie balanced color polymorphisms in the wall lizard.

Authors:  Pedro Andrade; Catarina Pinho; Guillem Pérez I de Lanuza; Sandra Afonso; Jindřich Brejcha; Carl-Johan Rubin; Ola Wallerman; Paulo Pereira; Stephen J Sabatino; Adriana Bellati; Daniele Pellitteri-Rosa; Zuzana Bosakova; Ignas Bunikis; Miguel A Carretero; Nathalie Feiner; Petr Marsik; Francisco Paupério; Daniele Salvi; Lucile Soler; Geoffrey M While; Tobias Uller; Enrique Font; Leif Andersson; Miguel Carneiro
Journal:  Proc Natl Acad Sci U S A       Date:  2019-02-28       Impact factor: 11.205

9.  Fast and accurate long-read assembly with wtdbg2.

Authors:  Jue Ruan; Heng Li
Journal:  Nat Methods       Date:  2019-12-09       Impact factor: 28.547

10.  Ecological adaptation in Atlantic herring is associated with large shifts in allele frequencies at hundreds of loci.

Authors:  Fan Han; Minal Jamsandekar; Mats E Pettersson; Leyi Su; Angela Fuentes-Pardo; Brian Davis; Dorte Bekkevold; Florian Berg; Michele Casini; Geir Dahle; Edward D Farrell; Arild Folkvord; Leif Andersson
Journal:  Elife       Date:  2020-12-04       Impact factor: 8.140

View more
  1 in total

1.  Spatiotemporal variations in retrovirus-host interactions among Darwin's finches.

Authors:  Jason Hill; Mette Lillie; Mats E Pettersson; Carl-Johan Rubin; B Rosemary Grant; Peter R Grant; Leif Andersson; Patric Jern
Journal:  Nat Commun       Date:  2022-10-13       Impact factor: 17.694

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.