Cryptic speciation may occur when reproductive isolation is recent or the accumulation of morphological differences between sister lineages is slowed by stabilizing selection preventing phenotypic differentiation. In North America, Bicknell's Thrush (Catharus bicknelli) and its sister species, the Gray-cheeked Thrush (Catharus minimus), are parapatrically breeding migratory songbirds, distinguishable in nature only by subtle differences in song and coloration, and were recognized as distinct species only in the 1990s. Previous molecular studies have estimated that the species diverged approximately 120,000-420,000 YBP and found very low levels of introgression despite their similarity and sympatry in the spring (prebreeding) migration. To further clarify the history, genetic divergence, genomic structure, and adaptive processes in C. bicknelli and C. minimus, we sequenced and assembled high-coverage reference genomes of both species and resequenced genomes from population samples of C. bicknelli, C. minimus, and two individuals of the Swainson's Thrush (Catharus ustulatus). The genome of C. bicknelli exhibits markedly higher abundances of transposable elements compared with other Catharus and chicken. Demographic and admixture analyses confirm moderate genome-wide differentiation (Fst ≈ 0.10) and limited gene flow between C. bicknelli and C. minimus, but suggest a more recent divergence than estimates based on mtDNA. We find evidence of rapid evolution of the Z-chromosome and elevated divergence consistent with natural selection on genomic regions near genes involved with neuronal processes in C. bicknelli. These genomes are a useful resource for future investigations of speciation, migration, and adaptation in Catharus thrushes.
Cryptic speciation may occur when reproductive isolation is recent or the accumulation of morphological differences between sister lineages is slowed by stabilizing selection preventing phenotypic differentiation. In North America, Bicknell's Thrush (Catharus bicknelli) and its sister species, the Gray-cheeked Thrush (Catharus minimus), are parapatrically breeding migratory songbirds, distinguishable in nature only by subtle differences in song and coloration, and were recognized as distinct species only in the 1990s. Previous molecular studies have estimated that the species diverged approximately 120,000-420,000 YBP and found very low levels of introgression despite their similarity and sympatry in the spring (prebreeding) migration. To further clarify the history, genetic divergence, genomic structure, and adaptive processes in C. bicknelli and C. minimus, we sequenced and assembled high-coverage reference genomes of both species and resequenced genomes from population samples of C. bicknelli, C. minimus, and two individuals of the Swainson's Thrush (Catharus ustulatus). The genome of C. bicknelli exhibits markedly higher abundances of transposable elements compared with other Catharus and chicken. Demographic and admixture analyses confirm moderate genome-wide differentiation (Fst ≈ 0.10) and limited gene flow between C. bicknelli and C. minimus, but suggest a more recent divergence than estimates based on mtDNA. We find evidence of rapid evolution of the Z-chromosome and elevated divergence consistent with natural selection on genomic regions near genes involved with neuronal processes in C. bicknelli. These genomes are a useful resource for future investigations of speciation, migration, and adaptation in Catharus thrushes.
SignificanceWe present the first high-coverage reference genomes for a cryptic avian species complex that is a model study system for ecological speciation. Our analyses of population samples of whole-genomes confirm the lack of genetic exchange between Bicknell’s Thrush and Gray-cheeked Thrush and suggest a role for neuronal differentiation in their divergence.
Introduction
The work by scientists and natural historians to understand the process of speciation, which stretches back to well before Charles Darwin, is advanced today by comparisons of the complete genome sequences of recently diverged sister species. Knowing how genomes have diverged and which genes are involved in reproductive isolation is the key to understanding how biodiversity is generated. Genomic comparisons of species complexes and hybridizing species have brought more and more-detailed sequence data to bear on the genetic and ecological mechanisms of speciation, clarifying the linkage between genetic divergence and reproductive isolation. In some cases, differences in genomic architecture, such as high numbers of transposable elements (TEs) or the plasticity of sex chromosomes, are thought to contribute to rapid evolution of reproductive isolation (Campbell et al. 2018). A growing number of cases have implicated diversifying selection on a small number of coding and regulatory genes that contribute to reproductive isolation, even in the face of ongoing gene flow between diverging lineages (Weber et al. 2017; Chapman et al. 2013; Lavretsky et al. 2015; Wellenreuther and Bernatchez 2018; Adrion et al. 2020; Martinsen et al. 2018; Katoh and Standley 2013; Capella-Gutierrez et al. 2009; Jehl et al. 2015; Stamatakis 2014; Broad Institute 2019; Korneliussen et al. 2014; Gautier et al. 2017; Szpiech 2021; Browning and Browning 2007; and Angiuoli and Salzberg 2011). In birds, whole-genome sequencing and gene expression approaches that permit comparison of thousands of loci among recently diverged species have identified gene regions with elevated levels of divergence relative to background levels across the genome (Ellegren et al. 2012). Several studies have implicated divergent selection at loci on the Z sex chromosome (Lavretsky et al. 2015) and on autosomal genes involved in physiological and developmental pathways related to plumage differences (Poelstra et al. 2014), or beak morphology (Abzhanov et al. 2004, 2006) as the basis for speciation. Much of the focus of research at this early stage of the speciation genomics era (Campbell et al. 2018) has focused on lineages in which species recently descended from a common ancestor differ conspicuously in beak morphology (Lamichhaney et al. 2015) or in plumage traits (Ellegren et al. 2012; Toews et al. 2016; Turbek et al. 2021), and on sister species known to hybridize in nature (Irwin et al. 2018; Baiz et al. 2020).Genomic analysis of cryptic species provides a different opportunity: to examine the genetic basis of reproductive isolation in taxa for which mating signals are nonvisual, or for which stabilizing selection results in morphological stasis despite deep divergence (Bickford et al. 2007). The definition of “cryptic species” is a matter of some debate (Struck et al. 2018), but phenotypically similar sibling species, regardless of the extent to which they are sympatric or distinguishable in nature, present the chance to characterize the barriers to gene flow that arise as populations adapt to local environmental conditions (Cicero 2004). Parapatric range boundaries and secondary contact zones between avian sibling species often occur at ecotones, and genomic comparisons may identify candidate genes under divergent environmental selection, as in the case of genes associated with osmoregulation and plumage melanism in sibling sparrow species that occupy freshwater versus tidal marsh habitats (Walsh et al. 2018, 2019). Alternatively, we may expect to find genomic divergence to be highly heterogeneous across the genome, and to reflect the accumulation of Dobzhansky–Muller incompatabilities that have arisen during the species’ long history of geographic segregation in different habitat types (Coyne and Orr 2004). Another possibility is that cryptic species may differ with respect to major genomic structural features such as chromosome inversions that act as postzygotic isolating mechanisms (Wellenreuther and Bernatchez 2018).North American thrushes in the genus Catharus are models for speciation and the evolution of seasonal migration and its role in population differentiation (Winker and Pruett 2006; Voelker et al. 2013; Ruegg et al. 2014). Bicknell’s Thrush (Catharus bicknelli) and Gray-cheeked Thrush (Catharusminimus) comprise a cryptic species complex because they were once classified as a single nominal species and are superficially indistinguishable in the field (supplementary fig. S1, Supplementary Material online). From its first description (Ridgeway 1882) on the basis of its smaller body size and distinct song (Bicknell 1882a, 1882b), Bicknell’s Thrush was considered a subspecies of the Gray-cheeked Thrush Catharus minimus bicknelli (Ridgeway 1882). Wallace (1939) clarified the distribution and geographic variation of the species complex, and recognized that there were three distinct forms, later classified as subspecies: the nominate form breeding on Newfoundland (C. minimus minimus) was intermediate in size and coloration between Gray-cheeked Thrushes breeding on the Canadian mainland, Alaska, and Siberia (C. minimus aliciae), and those breeding in high-elevation boreal forest isolates from Nova Scotia to the Catskills Mountains of New York (C. minimus bicknelli). Recognition by taxonomic authorities (American Ornithologists’ Union 1995) of Bicknell’s Thrush as a valid species (C.bicknelli) followed Ouelett’s (1993) careful analyses of new vocal recordings, phenotypic (color and size) and habitat variation, and the first mtDNA data (restriction fragment length polymorphism) from the species complex. Ouelett (1993) reported a 1.7% mitochondrial divergence between Bicknell’s Thrush and Gray-cheeked Thrush and noted the absence of known hybridization, but found size variation to be overlapping and clinal, with Newfoundland birds (C. m. minimus) occupying the intermediate range. Marshall (2001) doubted that the genetic differences were sufficient to merit designation of Bicknell’s Thrush as a full species, but subsequent DNA sequencing studies of Bicknell’s Thrush and Gray-cheeked Thrush have corroborated the deep mtDNA divergence, dating the split between the species to approximately 410,000–420,000 years ago (Voelker et al. 2013; FitzGerald 2017). Using approximately 2,000 ultraconserved elements (UCEs), Everson et al. (2019) estimated a divergence time between C. bicknelli and C. minimus of approximately 123 kyr, although this date likely represents a slight overestimate (see Discussion). FitzGerald et al. (2020) found very low levels of introgression at 5,633 anonymous SNP loci and only one putative hybrid (possibly an F1) among 264 birds sampled, a phenotypic Gray-cheeked Thrush breeding in southern Labrador.Recent analyses of breeding range climate and habitat characteristics have shown that Bicknell’s and Gray-cheeked thrushes occupy broadly different forest niches (FitzGerald 2017) and inhabit distinct local-scale habitats (Ralston et al. 2019). Pleistocene climate models suggest that the species occupied allopatric breeding refugia at the Last Glacial Maximum (Fitzgerald et al. 2020). The two species breed parapatrically, with a gap of only 60 km separating them in southern Quebec (Marshall 2001), and winter allopatrically in South America (C. minimus) and the Greater Antilles (C. bicknelli). It remains unclear whether breeding habitat specialization is a sufficient isolating mechanism in migratory birds that are sympatric at some periods of the annual migration cycle. Every spring, Gray-cheeked Thrushes pass through the breeding range of the Bicknell’s Thrush at a time when both species are reproductively active as evidenced by accumulating sperm in cloacal protuberances (Quay 1986); see maps in supplementary figure S1, Supplementary Material online. Moreover, Bicknell’s Thrushes are polygynandrous; both males and females mate with multiple partners, and most broods are sired and cared for by multiple males (Goetz et al. 2003). The high potential for extra-pair copulations on the Bicknell’s Thrush breeding grounds would seem to increase opportunities for introgression, and yet little has been found (Fitzgerald et al. 2020). Winker (2010) has proposed that the divergence of Bicknell’s and Gray-cheeked Thrush is an example of heteropatric speciation, in which reproductive isolation is a byproduct of adaptation to different environments that enhances breeding allopatry and allochrony despite periods of sympatry outside the breeding season.To investigate the extent to which divergent natural selection on individual loci and differences in major genomic features are associated with speciation, we sequenced and assembled reference genomes from Bicknell’s Thrush and Gray-cheeked Thrush, hereafter referred to as C. bicknelli and C. minimus, respectively. We characterized their TE landscapes, refined our understanding of genetic diversity and demographic history in the complex, and conducted multiple tests of natural selection on the genome to understand how selection has caused the species to diverge. We find strong evidence for rapid evolution and natural selection on the Z-chromosome and identify genomic regions undergoing strong allele or haplotype frequency shifts that implicate neurological processes accompanying speciation. Although our results do not unambiguously identify definitive drivers of speciation, they are consistent with a role for song divergence and potentially habitat- or altitudinal preferences in divergence of C. bicknelli.
Results
Standard and Reference-Guided Genome Assemblies
We assembled two reference genomes, from C. bicknelli and C. minimus, using fragment and jumping libraries sequenced with Illumina technology and assembled with AllPaths. Our phylogenetic analysis of UCEs from the two reference genomes and those from Everson et al. (2019) confirmed that our reference genomes grouped with their conspecifics with high support (supplementary fig. S7, Supplementary Material online). Whereas the assembly quality for C. minimus was average for short-read sequencing technology (Scaffold/ContigN50, 2.5/0.035 Mb; see Bravo et al. [2021] for comparison to other species), the quality of the C. bicknelli assembly was markedly lower (1.07/0.024 Mb; supplementary table S2, Supplementary Material online), despite similar DNA quality for both samples (supplementary fig. S2, Supplementary Material online). Despite the higher effective coverage of the C. bicknelli genome (42×) compared with the C. minimus reference (35×), BUSCO analysis revealed a higher proportion of fragmented genes in the C. bicknelli genome as compared with the C. minimus genome (supplementary fig. S3, Supplementary Material online). Unexpectedly for such closely related taxa, the genome size estimated from k-mers using AllPaths was substantially higher for C. bicknelli (1.5 Gb) than for C. minimus (1.21 Gb), with GC contents differing subtly (43.8% vs. 42.7%, see supplementary table S2, Supplementary Material online). Also surprising was the Allpaths report of 42% repetitive content in C. bicknelli as compared with 21% for C. minimus—both high for birds and unusually different for such closely related taxa (supplementary table S2, Supplementary Material online).Genome completeness improved for both species after we scaffolded the two reference genomes to the guided Catharusustulatus chromosome scale reference, with reduced numbers of fragmented and missing BUSCO genes and increased complete and single-copy genes as compared with without scaffolding (supplementary fig. S3, Supplementary Material online). Additionally, grouping scores for each scaffold—a measure of the specificity of mapping of each scaffold to the reference guide genome—were generally high and indicated that 75–90% of the scaffolds mapped entirely to single chromosomes of the guided reference genome. Catharusbicknelli and C. minimus pseudochromosome 19 had the lowest quality grouping score (supplementary fig. S4, Supplementary Material online). The lengths of C. bicknelli and C. minimus pseudochromosomes were similar to those of the C. ustulatus reference guide, although some pseudochromosomes were missing up to approximately 5 Mb. Only half of pseudochromosome 19 was recovered compared with the Swainson’s Thrush (supplementary fig. S4, Supplementary Material online), but the scaffolding procedure was efficient for the other chromosomes. We removed unplaced scaffolds and separated scaffolds mapping to the Z pseudochromosome and autosomes from all downstream analyses. Because of the greater fragmentation of the C. bicknelli assembly, we based downstream mapping and SNP calling on the C. minimus reference genome. All but two of the resequenced samples had high mapping rates (supplementary fig. S5, Supplementary Material online) and coverage between 5× and 19× (supplementary figs. S5, Supplementary Material online), indicating variable coverage per sample in our data set and necessitating the use of genotype likelihoods (GL) to call SNPs and generate summary statistics.
Evolution of TE Landscapes
The higher fragmentation of the C. bicknelli was associated with an estimate of 42% repeat content by Allpaths, a surprisingly high value for an avian genome. However, the estimates of TE abundance from RepeatMasker were generally lower (∼10–11%) (supplementary table S3A–F, Supplementary Material online). Overall, RepeatMasker results imply that the repeat content of the two species is broadly similar. Estimates of repeat content for the Allpaths assemblies were comparable with the Ragoo-assisted assemblies. We additionally interrogated the TE and repeat landscape of our data set using raw Illumina reads, dnaPipeTE (Goubert et al. 2015) and the same improved TE library constructed with EDTA that we used with RepeatMasker. Consistent with the pattern of fragmentation in the assemblies, we found that the repeat content of C. bicknelli was estimated to be considerably higher than those of the C. minimus, C. ustulatus or Catharusfuscescens genomes, as well as the chicken genome. Whereas the non-bicknelli Catharus genomes had annotated TE contents ranging from 11.88% in C. fuscescens to 14.06% in C. ustulatus, C. bicknelli exhibited up to 19.34% annotated TE content (supplementary table S4, Supplementary Material online). When we include the nonannotated TEs and those placed in the “other” category by dnaPipeTE, non-C. bicknelli genomes were estimated to contain from 15.11% TEs in C. minimus to 19.24% in C. fuscescens, whereas C. bicknelli possessed up to 35.97%, close to the value estimated by Allpaths. The chicken “control” was estimated to contain 9.16% TE and repeat content, close to previously published estimates (Wicker et al. 2005). Comparison of the four focal species in our comparative analysis (fig. 1) suggests a detectable increase in TE content in the lineage leading to C. bicknelli. This increase appeared to be distributed across all chromosomes; comparison of the TE proportion per chromosome between C. bicknelli and C. minimus indicated that TEs were distributed relatively uniformly across the chromosomes of both species (supplementary fig. S9, Supplementary Material online) and most of the pseudochromosomes yielded slightly higher proportions of TEs for C. bicknelli than for C. minimus (fig. 1supplementary fig. S10 and table S3, Supplementary Material online). The fraction of reads sampled had a minor effect on the estimated proportion of major classes of annotated TEs (fig. 1 and supplementary fig. S8, Supplementary Material online). However, the estimated proportion of unannotated repeats increased considerably with increased sampling of reads, in Catharus but not in chicken, suggesting that this genome component of Catharus was sensitive to intensity of sampling (supplementary fig. S8, Supplementary Material online). The distribution of annotated TE proportions decreased when using mapped Illumina reads as compared with raw Illumina reads (supplementary fig. S11, Supplementary Material online; −31.2% for bicknelli, −25.5% for minimus), although the typical flags for mitochondrial DNA in the reads were not evident (Goubert et al. 2015), perhaps because most of the reads were derived from blood, which in birds has a low fraction of mitochondrial DNA (Quinn and White 1987; supplementary fig. S8 and tables S4 and S5, Supplementary Material online). Regardless of the source of the reads, the higher proportion of TEs in bicknelli as compared with minimus remained, ranging from approximately 6% to 10% on autosomes (fig. 2 and supplementary table S4, Supplementary Material online). Whereas the Z chromosome was estimated to have slightly higher percentage of TEs compared with autosomes by RepeatMasker (supplementary table S3, Supplementary Material online), the Z chromosome proportion as estimated by dnaPipeTE was lower than for mapped autosomal reads (fig. 2 and supplementary table S4, Supplementary Material online). Overall, however, the proportions of TEs estimated by dnaPipeTE were higher than for RepeatMasker. All the previous analyses were conducted using an EDTA-augmented TE library from the C. bicknelli reference genome. However, the higher TE content of C. bicknelli relative to C. minimus did not result from the making of the TE reference library from C. bicknelli, because a similar pattern emerged when using a TE reference library made from C. minimus (see supplementary figs. 21 and 22, Supplementary Material online).
Abundances of TE and repeats in Catharus genomes. (A) Pie charts of estimated abundances of major TE classes using dnaPipeTE. Numbers around pie charts indicate percentages for major TE and repeat classes. Phylogeny from Everson et al. (2019). (B) Influences of sampling intensity on estimates of TE abundance by dnaPipeTE. Only the major TE classes are shown. Vertical gray dashed line and bold “0.2” indicate sampling intensity used for final analyses. The individuals used in panels (A) and (B) are: NYSM zt1435 (Catharus bicknelli), NYSM zt1220 (Catharus minimus), NCBI accession GCA_013398975.1 (Catharus fuscescens), and NYSM 15384 (Catharus ustulatus). See supplementary figures S6 and S7, Supplementary Material online, for additional analyses. (C) Distribution of TE proportions among pseudochromosomes of C. bicknelli and C. minimus, based on parsing of RepeatMasker analysis of the two reference genomes by the method of Bailly-Bechet et al. (2014). See also, supplementary figure S8, Supplementary Material online.
Individual variation in TE abundance in genomes from mapped reads of four species of Catharus thrushes and a Catharus bicknelli×Catharus fuscescens hybrid. Only major classes of TEs are shown. Individuals are identified by museum catalog number at the bottom as per supplementary table S1, Supplementary Material online. Individuals in red indicate individuals showing high levels of unannotated TEs (see supplementary fig. S11, Supplementary Material online). Dashed lines indicate mean TE content among individuals of C. bicknelli and Catharus minimus (see supplementary table S4, Supplementary Material online). Additional classes of TEs, repeats, and sources of sequence reads are depicted in supplementary figure S11, Supplementary Material online.
Abundances of TE and repeats in Catharus genomes. (A) Pie charts of estimated abundances of major TE classes using dnaPipeTE. Numbers around pie charts indicate percentages for major TE and repeat classes. Phylogeny from Everson et al. (2019). (B) Influences of sampling intensity on estimates of TE abundance by dnaPipeTE. Only the major TE classes are shown. Vertical gray dashed line and bold “0.2” indicate sampling intensity used for final analyses. The individuals used in panels (A) and (B) are: NYSM zt1435 (Catharus bicknelli), NYSM zt1220 (Catharus minimus), NCBI accession GCA_013398975.1 (Catharus fuscescens), and NYSM 15384 (Catharus ustulatus). See supplementary figures S6 and S7, Supplementary Material online, for additional analyses. (C) Distribution of TE proportions among pseudochromosomes of C. bicknelli and C. minimus, based on parsing of RepeatMasker analysis of the two reference genomes by the method of Bailly-Bechet et al. (2014). See also, supplementary figure S8, Supplementary Material online.Individual variation in TE abundance in genomes from mapped reads of four species of Catharus thrushes and a Catharus bicknelli×Catharus fuscescens hybrid. Only major classes of TEs are shown. Individuals are identified by museum catalog number at the bottom as per supplementary table S1, Supplementary Material online. Individuals in red indicate individuals showing high levels of unannotated TEs (see supplementary fig. S11, Supplementary Material online). Dashed lines indicate mean TE content among individuals of C. bicknelli and Catharus minimus (see supplementary table S4, Supplementary Material online). Additional classes of TEs, repeats, and sources of sequence reads are depicted in supplementary figure S11, Supplementary Material online.We further analyzed TE and repeat content in all individuals in our study using dnaPipeTE, as well as the hybrid C. bicknelli/fuscescens individual. This analysis confirmed the finding that C. bicknelli genomes on an average contain more TEs than those of the other three species, whether considering the unannotated component or not (fig. 2 and supplementary fig. S11, Supplementary Material online). Two-sample t-tests confirmed significantly higher TE content in C. bicknelli (n = 16, including reference) versus minimus (n = 15, including reference) for all genome-wide TE components except for LINE (LTR: t = 3.86, df = 15.6, P = 0.001; LINEs: t = −0.80, df = 20.9, P = 0.43; DNA elements: t = 8.21, df = 19.2, P = 1.03e-07; other TEs: t = 7.05, df = 21.3, P = 5.39e-07). To evaluate the extent of sampling error in the dnaPipeTE algorithm, we divided the fastq reads of a “high TE” and “low TE” C. bicknelli into four equal subsets and analyzed TE abundance using dnaPipeTE. We found that the coefficient of variation (CV) of the most abundant class of annotated TEs (LTRs) ranged from 4.4% to 10.5%, whereas the CV for the least abundant class (SINEs) was higher (26.4–27.6%) (supplementary table S6, Supplementary Material online). When all TEs are included, including the “other” category and unannotated TEs (supplementary table S6, Supplementary Material online), the CV for the low TE individual was 24.5% and 20.8 for the high TE individual. Across different read sources, thee hybrid C. bicknelli–C. fuscescens individual tended to contain the highest proportion of TEs (including annotated other) in the data set (26.6% on mapped autosomal reads; fig. 2). Including the unannotated TEs again caused the hybrid individual to have an unusually high percentage (46.9%); C. bicknelli individuals. NYSM zt1434 and 15344 (supplementary table S4 and fig. S11, Supplementary Material online) also exhibited unusual TE profiles when including unannotated TEs.
Genetic Structure, Diversity, and Admixture
After filtering the VCF file, we obtained a SNP variant data set in which 95–97% of reads from each individual were aligned to the reference genome of the C. minimus (supplementary fig. S5, Supplementary Material online). In the unpruned data set C. bicknelli 309,043 SNPs, whereas C. minimus had 321,601 SNPs. All individuals were deemed unrelated, with kinship values greater than degree 3, except for one pair of C. bicknelli, with an estimated second-kinship relatedness (supplementary fig. S12, Supplementary Material online). As expected (Tajima 1995), estimates of diversity calculated with and without one of this pair of related C. bicknelli did not differ substantially, and we elected to retain all individuals in our analyses.The principal components plot of the pruned SNP data set (fig. 3) clearly distinguished between the two cryptic species, with the first two eigenvectors explaining 30.45% and 5.10% of the genetic variance and strongly separating clusters of C. bicknelli and C. minimus individuals. The second eigenvector (but not the first) further distinguished among some individuals within species (fig. 3). A second principal components analysis that also included including the C. fuscescense×C. bicknelli hybrid also clearly separated clusters of individuals from each species and placed the hybrid closer to C. bicknelli but still well apart from both species (supplementary fig. S13, Supplementary Material online). The ADMIXTURE analysis of approximately 100,000 high-quality SNPs revealed a very low introgression between the two cryptic species (fig. 3). Only one C. bicknelli individual from Quebec, Canada, was minimally admixed, with less than approximately 0.1% of its genome attributable to the C. minimus genetic cluster (fig. 3). PCAs conducted on each pseudochromosome recapitulated the genome-wide patterns, with no obvious signatures of inversions or other rearrangements that can skew the PCA (Nowling et al. 2020; data on Dryad).
Summary of observed genetic variation in genomes of Catharus bicknelli (n = 16) and Catharus minimus (n = 15). (A) Principal component analysis of genetic variation in a pruned data set of 102,188 unlinked autosomal SNPs. (B) ADMIXTURE ancestry proportions in the same SNP data set. (C) Comparison of nucleotide diversity (mean and SE) in C. bicknelli, C. minimus, and Catharus ustulatus (n = 2) on autosomes and (D) the Z-chromosome.
Summary of observed genetic variation in genomes of Catharus bicknelli (n = 16) and Catharus minimus (n = 15). (A) Principal component analysis of genetic variation in a pruned data set of 102,188 unlinked autosomal SNPs. (B) ADMIXTURE ancestry proportions in the same SNP data set. (C) Comparison of nucleotide diversity (mean and SE) in C. bicknelli, C. minimus, and Catharus ustulatus (n = 2) on autosomes and (D) the Z-chromosome.The average autosomal nucleotide diversity was slightly higher for C. bicknelli (= ∼0.0003) than for C. minimus ( = ∼0.0002) (fig. 3). Nucleotide diversity was substantially higher than both these species for C. ustulatus ( = ∼0.00135) (fig. 3). A similar pattern of relative diversity among species was observed for the Z pseudochromosome, although in this case, C. bicknelli exhibited twice the diversity observed in C. minimus. For C. bicknelli and ustulatus, the diversity for the Z pseudochromosome was lower than those for the corresponding autosomal values by 71% and 57%, respectively, whereas for C. minimus, the values were nearly the same (fig. 3). The folded site frequency spectrum for C. bicknelli autosomes revealed a commonly encountered decrease in frequency of minor alleles, whereas the spectrum for C. minimus autosomes was characterized by an increased number of high-frequency alleles present in 31 or 32 copies (fig. 4). The Z pseudochromosome revealed a pattern similar to the autosomes for both species, with an uptick in the number of high-frequency alleles in C. minimus (fig. 4).
Folded site frequency spectrum of minor alleles on pseudochromosomes in Catharus bicknelli and Catharus minimus estimated using ANGSD. (A) Autosomes. (B) Z chromosome.
Folded site frequency spectrum of minor alleles on pseudochromosomes in Catharus bicknelli and Catharus minimus estimated using ANGSD. (A) Autosomes. (B) Z chromosome.
Comparative Demography
To obtain an overview of population demography in the C. bicknelli/minimus complex, we first examined the distribution of Tajima’s D across windows of our resequenced genomes. The distribution of Tajima’s D on the autosomes was similar between C. bicknelli and minimus, with a slightly lower mean in C. minimus (D = −0.4), than in C. bicknelli (D = −0.2) (supplementary fig. S14, Supplementary Material online). The Z pseudochromosome exhibited a bimodal distribution of Tajima’s D for both species, this time with the mean for bicknelli slightly more negative (Dz = −1.7) than for minimus (Dz = −1.2). This bimodality is visible on the scatterplot of Tajima’s D of the two species by scaffold (supplementary fig. S14, Supplementary Material online).We next applied the pairwise sequentially Markovian coalescent (PSMC) model of Li and Durbin (2011) and a mutation rate of 2.21×10−9 to the C. bicknelli and minimus reference genomes. Estimates of effective population size per species revealed by PSMC were similar between the two species (supplementary fig. S14C, Supplementary Material online), with a substantial reduction in effective population size just before and during the Last Glacial Period, between approximately 500,000 and approximately 100,000 YBP. Catharusbicknelli is estimated to have achieved a maximum Ne of approximately 1,400,000, whereas C. minimus had a somewhat lower estimate (∼800,000). Whereas the Ne of C. bicknelli remains low to the present at 200,000, that for C. minimus rises slightly toward the present to approximately 300,000. Applying the mutation rate to the estimates of nucleotide diversity in figure 3 suggested similar values for both species, about approximately 1,551,600 for C. bicknelli compared with approximately 1,704,000 for C. minimus. These estimates from nucleotide diversity are not expected to be similar to those of the most recent time frames for the PSMC analysis because they represent averages over the depth of the genome-wide coalescent tree (Felsenstein 2006).We then applied the IMcoalHMM algorithm of Mailund et al. (2012) to the reference genomes of C. bicknelli and minimus. We applied both the isolation and isolation–migration models to two 10-Mb blocks within each of the largest nine pseudochromosomes, as well as the Z chromosome, of C. bicknelli and minimus. IMcoalHMM uses single genomes of two species and models coalescent processes at pairs of sites to estimate demographic parameters and recombination rate. The isolation–migration model of IMcoalHMM models gene flow by assuming a period after divergence during which there is gene flow (migration period), followed by a period during which there is no gene flow (isolation period). We restricted the number of blocks to two per pseudochromosome so as to allow straightforward comparisons among chromosomes. The number of pairwise differences per bin between the two genomes ranged from 75,000 to 175,000, with pseudochromosome Z having the lowest number of differences between species (supplementary fig. S15, Supplementary Material online). This translates to a median level of sequence divergence between the two reference genomes of approximately 0.00003 substitutions per site for the autosomes and 0.0005 substitutions per site for the Z chromosome, very small values that reflect the cryptic status of these species (supplementary table S7, Supplementary Material online). These values, however, are only between the two reference genomes and do not incorporate information on allele frequency differences between the species.All pseudochromosomes except for 1, 5, and Z favored the isolation–migration model by AIC, but the difference between models was small in all cases (supplementary fig. S16, Supplementary Material online). Pseudochromosomes 3, 7, and 9 had AIC scores between the two models close to zero. The three main parameters in common between the pure isolation and isolation–migration models—effective population size of the common ancestor, number of substitutions since isolation and recombination rate—were similar on an average for the two models (fig. 5; supplementary table S7, Supplementary Material online), but varied with the number of pairwise differences between the two genomes found in different 10-Mb bins. For ancestral effective population size, both models yielded similarly large values (fig. 5), suggesting an ancestral Ne on the order of approximately 1.1 million (supplementary table S7, Supplementary Material online); variation in this estimate scaled positively with the number of pairwise differences per bin, as expected (supplementary fig. S15, Supplementary Material online). Time since isolation was slightly negatively correlated with the number of differences between the genomes (supplementary fig. S15, Supplementary Material online) and was smaller for the pure isolation model than for the isolation migration model (fig. 5 and supplementary table S7, Supplementary Material online), both estimates on the order of approximately 10,000 generations. Applying a mutation rate per year of 2.21×10−9, and one generation per year, the median estimated time since isolation for autosomes was only approximately 6,146 YBP, whereas it was approximately 15,000 generations for the isolation–migration model (supplementary table S7, Supplementary Material online). However, the isolation–migration model estimated a very long period, approximately 1.5 Ma, during which gene flow is estimated to have taken place prior to full isolation (supplementary table S7, Supplementary Material online). The estimate of the migration period in the isolation–migration model was substantially different between bins. Whereas pseudochromosomes 2 and 6 yielded estimates of migration interval less than 10,000 YBP after scaling by mutation rate, pseudochromosome 5 suggested a much older time of approximately 12,500,000 YBP (supplementary fig. S15 and table S7, Supplementary Material online). This may suggest that the migration pattern is difficult to estimate for most chromosomes and likely violates the assumptions of the isolation–migration model (see Discussion). Similarly, the actual rate of migration was difficult to estimate because the MCMC chains did not converge, hence we do not report it. Estimated recombination rates scaled positively with the number of differences per bin between genomes (supplementary fig. S15, Supplementary Material online) and was slightly higher for the isolation model (fig. 5). For all three parameters, estimates obtained for the Z pseudochromosome were outliers, with the lowest effective population size (fig. 5), largest number of substitutions since isolation (fig. 5) and lowest recombination rate (fig. 5).
Estimates of parameters for 10-Mb regions of ten pseudochromosomes (nine autosomes and the Z) for Catharus bicknelli and Catharus minimus using IMcoalHMM. Box plots (A, C, E) show parameter estimates for the final selected model (supplementary table S6, Supplementary Material online). Plots (B), (D), and (F) compare parameter estimates for the pure isolation and isolation-with-migration models on autosomes. (A) Effective population size. (B) Effective population size estimate for each model on autosomes. (C) Number of substitutions since divergence. (D) Number of substitutions since divergence for each model on the autosomes. (E) Recombination rate. (F) Recombination rate estimated for each model on autosomes.
Estimates of parameters for 10-Mb regions of ten pseudochromosomes (nine autosomes and the Z) for Catharus bicknelli and Catharus minimus using IMcoalHMM. Box plots (A, C, E) show parameter estimates for the final selected model (supplementary table S6, Supplementary Material online). Plots (B), (D), and (F) compare parameter estimates for the pure isolation and isolation-with-migration models on autosomes. (A) Effective population size. (B) Effective population size estimate for each model on autosomes. (C) Number of substitutions since divergence. (D) Number of substitutions since divergence for each model on the autosomes. (E) Recombination rate. (F) Recombination rate estimated for each model on autosomes.We examined the possibility of introgression from a different perspective with the ABBA-BABA test. A test using C. bicknelli and C. minimus as the putative introgressing lineages indicated minimal introgression, with a small excess of ABBA sites and a D = 0.021, which is marginally significant at Z = 6.32. As expected, higher values of introgression were observed in tests involving C. minimus and C.minimus aliciae, with a D of 0.125 and a significant Z = 14.92 for minimus to aliciae. Introgression from aliciae to minimus was also high, with a positive D = 0.103, and significant Z = 12.46.
Genome Scans for Selection
To obtain an overview of the landscape of departure from demographic processes with genome-wide signatures, we first examined the distribution of Tajima’s D across chromosomes. A scatterplot of autosomal Tajima’s D shows a correlation between values in C. bicknelli and minimus across the genome, with 27 10-kb regions with extreme negative values (<−2) in both species, an indication of nonequilibrium processes occurring in regions of pseudochromosome 1 to 8 and 17 and 18 (fig. 6). These likely represent regions of low recombination that either have undergone recent selective sweeps in both species or in which demographic processes interact with a low-recombination landscape (Thornton 2005). On the Z pseudochromosome, we detected 320 regions with extreme negative values (<−2) and two regions with positive extreme values (>2; fig. 6). The larger number of regions with extreme values of Tajima’s D is likely due to the lower recombination rate on the Z chromosome as well as its smaller effective population size.
Genome scan for natural selection in Catharus bicknelli and Catharus minimus. (A) Neutrality test with Tajima’s D plotted for both species on autosomes and (B) on the Z pseudochromosome. The highlighted dots have critical values of D (>2 and <−2). (C) Bayesian estimates using Bayescan of signals of diversifying selection for (A) 114 autosomal SNPs and (D) absence of significantly differentiated SNPs on the Z pseudochromosome.
Genome scan for natural selection in Catharus bicknelli and Catharus minimus. (A) Neutrality test with Tajima’s D plotted for both species on autosomes and (B) on the Z pseudochromosome. The highlighted dots have critical values of D (>2 and <−2). (C) Bayesian estimates using Bayescan of signals of diversifying selection for (A) 114 autosomal SNPs and (D) absence of significantly differentiated SNPs on the Z pseudochromosome.Genome-wide Fst, calculated in ANGSD with the weighted Fst statistic and with a prior of the 2D SFS from the site likelihoods, indicated that most of the regions on the upper 95% percentile of the empirical distribution were on the Z pseudochromosome, with a few additional regions on pseudochromosomes 1, 4, 5, 6, 10, 17, 21 (supplementary fig. S17, Supplementary Material online). The average length of the spline window used to detect outliers on all chromosomes was 5 kb (supplementary fig. S18, Supplementary Material online). The weighted Fst without including the SFS identified outlier regions on several of the same autosomes found in the previous analysis (fig. 7), but with higher variability in window size between pseudochromosomes (average length of the spline window, 30 kb; fig. S18B). The difference between the two analyses may suggest variability across the autosomes in nucleotide diversity that influences calculations of Fst with and without SFS. Plotting Fst in standard abutting 20-kb windows suggests a more even distribution of outliers across the chromosomes (supplementary fig. S17B, Supplementary Material online).
Genome-wide scans for natural selection in Catharus bicknelli and Catharus minimus. (A) Genome-wide Fst. Green dots represent the upper 95% percentile of the empirical distribution as the outliers in a variable window size from the cubic spline technique described in Methods. The dashed line indicates the average Fst. (B) Cross-population extended haplotype homozygosity (XP-EHH) test with green dots indicating outliers with a −log10 P value >5 with variable window size from the spline technique. (C) Standardized cross-population haplotype-base statistic (XP-nsl) indicating several outliers in the upper 95% percentile of the empirical distribution with a variable window size from the spline technique.
Genome-wide scans for natural selection in Catharus bicknelli and Catharus minimus. (A) Genome-wide Fst. Green dots represent the upper 95% percentile of the empirical distribution as the outliers in a variable window size from the cubic spline technique described in Methods. The dashed line indicates the average Fst. (B) Cross-population extended haplotype homozygosity (XP-EHH) test with green dots indicating outliers with a −log10 P value >5 with variable window size from the spline technique. (C) Standardized cross-population haplotype-base statistic (XP-nsl) indicating several outliers in the upper 95% percentile of the empirical distribution with a variable window size from the spline technique.We conducted a Bayesian analysis of Fst outliers at the level of SNPs using the multinomial-Dirichlet model in Bayescan. Using the approximately 100,000 SNP data set and a false discovery rate of 5%, we found signals of positive selection for 114 autosomal SNPs (fig. 6), whereas for the Z pseudochromosome no SNPs were detected as outliers (fig. 6). We then used GppFst to conduct neutral simulations to determine the null distribution of Fst, using demographic parameters estimated with SNAPP (supplementary table S8, Supplementary Material online). In contrast to the Manhattan plots (fig. 7), we found that the empirical distribution of Fst did not contain outliers when compared with the PPS distribution (supplementary fig. S19 and B, Supplementary Material online), suggesting that on the autosomes a neutral model is more likely to fit the data. However, on the Z pseudochromosome the empirical distribution of Fst does suggest an excess of outliers when compared with the PPS distribution (supplementary fig. S19 and D, Supplementary Material online), with 418 sites observed as compared with 14 false-positives in the simulations.We then applied two haplotype tests to the C. bicknelli and minimus resequencing data. With the extended haplotype homozygosity (XP-EHH) test, signals of selective sweeps (log P value >5) were detected on most of the pseudochromosomes and on the Z (fig. 7). The standardized cross-population haplotype-base statistic (XP-nsl) suggested several outliers on the upper 95% percentile of the empirical distribution, including regions on the Z pseudochromosome, for C. bicknelli (positive values in fig. 7), and in C.minimus (negative values in fig. 7) on pseudochromosomes 1–6. Two of the smallest pseudochromosomes (31 and 36) exhibited fewer outliers in C. bicknelli and on the Z pseudochromosome no outliers were detected in C. minimus. Overall, selective sweeps appeared stronger in C. bicknelli than in C. minimus (fig. 7).The spline windows used for the haplotype tests were substantially smaller than those used for the Fst scan (supplementary fig. S18 and D, Supplementary Material online), indicating more numerous break points in the data as expected for haplotype-based tests, as opposed to tests based on individual SNPs.
Gene Content of Genomic Regions with Signals of Natural Selection
Several genomic regions exhibiting signals of selection were shared between different statistical tests. 142 genes (28%) were shared between autosomal haplotype tests (XP-EHH and XP-nsl) in C. bicknelli, whereas only eight genes (4.2%) were shared between such tests for C. minimus (fig. 8). On the Z pseudochromosome only five genes (23%) were shared between haplotype tests for C. bicknelli, although these tests yielded small numbers of selected regions generally (fig. 8). Based on our genome annotations, CDS regions comprised the closest annotation to genomic outliers 95% of the time, whereas 5% of the features closest to outliers were outside CDS, ranging in distance from a few bp to over 100 kb.
Shared signals of selection between tests and functional enrichment of genes closest to the SNPs showing signatures of selection. (A) Venn diagrams showing the number of genes shared between selective sweep tests for autosomes of Catharus bicknelli and Catharus minimus. (B) Genomic regions showing signatures of selection shared in two haplotype tests on the Z pseudochromosome of C. bicknelli. (C) Gene ontology (GO) terms with significant P values obtained from the enrichment analysis of genes shared between tests for selective sweeps on autosomes and the Z pseudochromosome.
Shared signals of selection between tests and functional enrichment of genes closest to the SNPs showing signatures of selection. (A) Venn diagrams showing the number of genes shared between selective sweep tests for autosomes of Catharus bicknelli and Catharus minimus. (B) Genomic regions showing signatures of selection shared in two haplotype tests on the Z pseudochromosome of C. bicknelli. (C) Gene ontology (GO) terms with significant P values obtained from the enrichment analysis of genes shared between tests for selective sweeps on autosomes and the Z pseudochromosome.We performed an enrichment analysis on the final list of genes lying near selected regions shared between tests of selective sweeps on autosomes and on the Z pseudochromosome (supplementary table S9, Supplementary Material online). This analysis revealed significant P values on a variety of biological, cellular, and molecular processes related to neuronal activity and the function of neurotransmitters and ion channels (fig. 8), including genes such as GABRA1, GABRA2, GABRG1, and GABRG2, which contributed substantially to the significance of the enrichment scores (supplementary table S9, Supplementary Material online). The enrichment analysis of the relatively small number of genes (n = 32) shared genes between two neutrality test statistics (weighed Fst) on the Z pseudochromosome (supplementary fig. S20, Supplementary Material online) did not yield significant results.
Discussion
We have presented genome-wide analyses of genetic diversity, species differentiation, and natural selection in C. bicknelli and C. minimus, two thrush species whose taxonomic status has puzzled biologists for over a century. Our results, although based on a partial sampling of the geographic range of C. minimus, suggest subtly different demographic histories in the two species, and an overall low level of genetic diversity. A notable finding in our study is the overall higher level of TEs in the genomes of Catharus species compared with other birds, especially in C. bicknelli, where some estimates based on mapped autosomal reads suggested up to 37% of the genome comprised annotated TEs. These and the numerous, although sometimes ambiguous, signals of natural selection across the genome may suggest fruitful areas of future research for understanding the genomic drivers of speciation in this complex.
Higher TE Content in Bicknell’s and Other Catharus Thrushes
We were surprised by the overall high level of TEs detected in Catharus genomes detected by dnaPipeTE. The RepeatMasker analyses suggested autosomal levels of TEs in Catharus that were detectably to higher than found in other birds (13.1% in bicknelli; 12.2% in minimus; supplementary table S3, Supplementary Material online), despite the fact that TE abundance is known to be underestimated in metazoan genomes by RepeatMasker through a number of factors, especially the quality of the reference genome assemblies (Gao et al. 2017; Manthey et al. 2018; Sohn and Nam 2018; Wu and Lu 2019). We ran RepeatMasker on the latest assembly of the chicken genome and found that it recovered the expected 10% TE component found in many other studies (Hillier et al. 2004; Wicker et al. 2005; Chalopin et al. 2015; Sotero-Caio et al. 2017). This confirmation suggests that the quality of the Catharus genomes may be too low to accurately estimate TE abundance by methods assuming complete genome contiguity. Additionally, our analysis suggests that TE families not present in the standard reference database may comprise a substantial component of the Catharus TE landscape. For example, our annotation of TEs was greatly improved by including the repeat library constructed by EDTA (Ou et al. 2019); most TEs that went previously unannotated became annotated with this augmented TE library. The library produced by EDTA had a total of 2,783 TEs, of which only 221 (7.9%) were completely unclassified. Annotated TE numbers increased substantially when using this improved TE library.Given the challenges of obtaining highly contiguous assemblies, particularly for C. bicknelli, methods that estimate TE abundance directly from sequence reads may be advantageous, particularly for recently diversifying TE families (Goubert et al. 2015; Weilguny and Kofler 2019). Estimates of TE abundance on Catharus using dnaPipeTE on mapped autosomal reads were high when considering only the well-annotated TEs, varying among individuals of C. minimus from 13.7% to 15.5% s and among C. bicknelli individuals from 17.2% to 22.3% in. In contrast to the estimates for annotated TEs, the estimated abundance of unannotated TEs was very sensitive to sampling intensity of reads; the higher estimates with greater sampling likely represents the mistaken classification of distinct repeats when sampling loci in the genome that are heterozygous (Goubert et al. 2015). The inability of dnaPipeTE to distinguish heterozygous loci from novel repeats may explain the high abundance of TEs estimated for the hybrid C. bicknelli×C. fuscescens sample, thus making it possible that the hybrid values are artificially high in this and two other C. bicknelli individuals, especially for unannotated TEs (fig. 2 and supplementary figs. S8 and S11, Supplementary Material online). By contrast, the estimated abundances of annotated TEs increased only slightly upon denser read sampling (supplementary fig. S8 and table S5, Supplementary Material online). Even if we disregard the “other-annotated” category, estimates of TE abundance in C. bicknelli, and less so in the other Catharus species, were much higher than in chicken, whose repeat landscape was reliably captured at approximately 10% by dnaPipeTE. These estimates are comparable with or higher than those for bird clades with unusually high TE abundances, such as Piciformes (Jarvis et al. 2014; Manthey et al. 2018; Feng et al. 2020). Although our conclusions regarding the putatively high TE content in C. bicknelli must remain tentative, we find that the data strongly suggest a more diverse TE landscape in Catharus than in chicken and other birds. Our analysis of TE abundance across multiple individuals also provides a population perspective on TE abundance that is rare among studies of TE evolution (Bourgeois and Boissinot 2019).It is unclear why TE abundances in Catharus, and in particular in C. bicknelli, should have increased relative to the average for birds. It is particularly striking that C. bicknelli exhibits higher abundances relative to its closely related sister species, C. minimus, which this and previous analyses suggest diverged very recently (Fitzgerald et al. 2020). Such an increase might be driven by an extreme bottleneck, which could result in increased fixation rates of mildly deleterious TEs as predicted by the nearly neutral theory of molecular evolution (Ohta 1992; Lynch 2007; Akashi et al. 2012). However, our data do not include a strong signal of a bottleneck in C. bicknelli. The suggestion of high TE content in the hybrid C. bicknelli×C. fuscescens recalls repeated suggestions that hybridization and breakdown of isolating barriers can sometimes permit higher TE mobilization and diversification, particularly in Drosophila (Kofler et al. 2015; Warren et al. 2015; Bourgeois and Boissinot 2019), although this trend is not universal (Kawakami et al. 2011). However, as we mention above, this high value could be attributed to the presumably high heterozygosity of this individual, and requires further study. TEs have been implicated in rapid speciation in a variety of organisms (Jurka et al. 2007; Belyayev 2014; Hoffmann et al. 2015), and in woodpeckers and relatives is associated with increased diversification rate of an entire clade (Manthey et al. 2018). One clear implication of the higher TE abundance in C. bicknelli is that it is likely responsible for the lower contiguity of the C. bicknelli reference genome. Higher repeat and TE abundance is often associated with lower genome assembly quality (Peona et al. 2021), although it has rarely been observed to influence variation in assembly quality within a single avian genus. However, comparing TE landscapes across studies is made more difficult because of the lack of accessible archived TE libraries from those studies; this gap strongly hinders robust comparison among studies of birds and other groups.
Patterns of Genetic Diversity and Admixture
In general, our estimates of nucleotide diversity for the three Catharus thrushes for which we sequenced multiple individuals were low compared with what we know about intraspecific genomic diversity in birds (Wang et al. 2021). The relative magnitudes of diversity for C. ustulatus compared with C. minimus was similar to that found for allozyme diversity assayed by Avise et al. (1980), who estimated heterozygosity in C. ustulatus to be over twice as high as that in C. minimus. This was particularly striking given that we only sampled two C. ustulatus individuals. We were surprised that genetic diversity in C. bicknelli was slightly higher than that for C. minimus. We would expect C. minimus to have higher diversity than C. bicknelli if only because it has a much larger breeding range than C. bicknelli, which is confined to the northern Appalachian mountain chain and southern Canada; in recent studies, geographic range is often used as a proxy for genetic diversity, and there is a weak dependence of genetic diversity on range size across metazoans (Corbett-Detig et al. 2015; Buffalo 2021). The diversity in western C. minimus that we did not survey could account for this discrepancy. Everson et al. (2019) found roughly similar levels of genetic diversity in C. bicknelli and minimus, although their estimates were several fold higher than ours, perhaps because of their focus on highly variable flanking regions of UCEs. Genetic diversity in C. minimus bears signatures of a recent population expansion, given its slightly negative Tajima’s D, results similar to those of Everson et al. (2019). A recent expansion and concomitant low diversity in C. minimus would be expected given its breeding range in the boreal forest of Canada, which represents a classic example of a biome recently colonized once Pleistocene glaciers retreated, with expansions from glacial refugia observed in many high-latitude vertebrates (Lessa et al. 2003; Ralston and Kirchman 2012; Lait and Burg 2013). Tajima’s D in C. bicknelli was half that in minimus, suggesting less extreme expansion. The PSMC plots of the two species were similar, as expected due to their shared ancestry (Cahill et al. 2016), and exhibited a pattern found in many north temperate bird species, with a steady decline in effective population size around 100,000 YBP (Nadachowska-Brzyska et al. 2015). Our sampling of genomic diversity in C. minimus was intended to minimize population structure given available samples, and, without detailed geographic sampling, is somewhat in conflict with a full measurement of genetic diversity in this species (Edwards et al. 2021). Further surveys of C. minimus and other Catharus thrushes will be required to assess the true diversity of Catharus species.
Recent Divergence and Limited Gene Flow between Cryptic Species
A number of tests we conducted, including the admixture plot and ABBA-BABA test, revealed marginal introgression between C. bicknelli and C. minimus consistent with the small proportion of admixture found in the analyses of 5,600 SNPs from a much larger sampling of individuals by FitzGerald et al. (2020). Additionally, support for the isolation–migration model using IMcoalHMM was equivocal, with very low AIC values in favor of it on four of nine pseudochromosomes. In these cases, the estimate of the migration period—the time slice during which gene flow is estimated to have taken place—is quite long, about approximately 1.6 Ma on an average. This result may stem from the possibility that there was only a small amount of gene flow occurring between the two cryptic species, which may have taken place in such a way as to violate the assumptions of the IMcoalHMM model. Our average Fst between C. bicknelli and C. minimus was 0.14 (weighted Fst), which is somewhat lower than that estimated using ddRadseq by FitzGerald et al. (2020), who estimated values of 0.234–0.321, even when confined to comparisons between eastern C. minimus and C. bicknelli. FitzGerald et al. (2020) estimated haplotype-based Fst (Hudson et al. 1992; Bhatia et al. 2013), whereas here we used biallelic SNP-based Fst as implemented in ANGSD. Regardless of these uncertainties, the relatively low Fst underscores the recent divergence of the two cryptic species.We estimated divergence times between C. bicknelli and minimus at approximately 6,000–15,900 YBP using the number of substitutions per site estimated with IMcoalHMM and a mutation rate from the literature. This estimate, although not implausible, seems quite recent, even for cryptic species. Our estimates are more recent than the approximately 123 kyr divergence estimated by Everson et al. (2019). (We note that the divergence times in Everson et al. (2019) should be reduced by a factor of 2.5/4 because the generation time used was miscalculated; Winker K, personal communication). They are also substantially more recent than those estimated from the ND2 gene of mitochondrial DNA by FitzGerald et al. (2020), who also used a published mutation rate. It is common for estimates of mtDNA to exceed those for nuclear genes, in part because nuclear data often involve simple allele frequency differences and lack of reciprocal monophyly compared with mtDNA. Additionally, and most critically, FitzGerald et al. (2020) estimated the coalescence time of mtDNA, which by definition will exceed the population divergence time (Edwards and Beerli 2000), the parameter of interest measured by IMcoalHMM (Mailund et al. 2012). A recent survey of 952 avian sister species pairs found approximately 12 divergence times on the order of 10,000 YBP (McEntee et al. 2018), although these estimates are also based on coalescence times and thus even more divergences may fall into the time frame estimated in our study. Our estimate for the Z chromosome is much higher than that for the autosomes in our data set and is on par with the deeper times estimated for this species pair by Everson et al. (2019). This increased distance for the Z chromosome is perhaps expected, given the frequent observation of natural selection on the avian Z chromosome and its tendency to evolve faster than the autosomes (Mank et al. 2010; Ellegren 2013; Wang et al. 2014; Dean et al. 2015). In our IMcoalHMM analysis, the Z chromosome also exhibited lower recombination rate, another expected difference from the autosomes.
Selective Sweeps on the Z Chromosome and on Genomic Regions for Behavior in C. bicknelli
The Z chromosome showed signatures of selective sweeps and of positive selection in the two cryptic species. Additionally, selective sweeps for genomic regions encompassing genes for behavioral traits are more evident in C. bicknelli than in C. minimus. GppFst suggested few or no outliers, whereas Bayescan suggested 114 autosomal outliers, although none on the Z chromosome. The contradictory results between Bayescan and GppFst could be related to the sensitivity of Bayescan to false-positives when the demographic history is not modeled appropriately (Lotterhos and Whitlock 2014, 2015). Additionally, our results must be tempered by the fact that tests for selection are highly sensitive to assumptions about the local recombination rate and the demographic history (Nachman and Payseur 2012; Cruickshank and Hahn 2014; Booker et al. 2020). Tellingly, when we accounted for demographic history using posterior predictive simulations using GppFst, we found no autosomal outliers, suggesting that under an appropriate null model, the number of outliers is minimal. However, when using haplotype-based tests, found extensive evidence for selective sweeps, particularly on the Z chromosome. The signal for selective sweeps will be stronger in regions of low recombination, but such regions are also expected to yield higher rates of false positives (Booker et al. 2020). Fully understanding the selection landscape in Catharus genomes will require estimating the genome-wide recombination rate, which may be possible from moderate-coverage data such as ours (Adrion et al. 2020). Our use of a spline technique that modulates window size across genome scans presumably improves our detection of outliers and comparison between results of tests of selection, whereas choosing an arbitrary window size as is most often done could recover large numbers of false positives or complicate comparison between selection tests (Beissinger et al. 2015).The recent divergence of C. bicknelli and C. minimus provides a demographic setting conducive to detecting differences in sweep frequency between species. The haplotype-based tests we employed suggested substantially more selection on autosomes in C. bicknelli than in C. minimus. The outliers we did detect appeared to fall near genes involved with neuronal function and neural substrates for behavior in C. bicknelli, a result that could conceivably be connected to the divergence in song systems for these two species. Ouelett (1993) found that C. bicknelli breeding in southern Quebec responded aggressively to playbacks of recorded C. bicknelli vocalizations but not to recordings of C. minimus songs, but no one has yet to conduct reciprocal playback experiments to rigorously test for prezygotic reproductive isolation on the basis of song recognition. Our results mirrors patterns found in a variety of microevolutionary systems, such as song evolution in Zimmerius flycatchers (Rheindt et al. 2014) and natural selection induced by winter storms in Anolis lizards (Campbell-Staton et al. 2017), which also found enrichment functional categories related to the brain in selection scans or gene expression. Although we did not find strong evidence for a bottleneck in C. bicknelli, which could complicate the search for natural selection on the genome, its average autosomal Tajima’s D was slightly positive, which could compromise our search for selective sweeps. As with all such selection scans, our results primarily provide hypotheses for future study.In summary, we have produced draft reference genomes of two Catharus thrushes, Bicknell’s Thrush (C. bicknelli) and the Gray-cheeked Thrush (C. minimus). These genomes, as well as those of Swainson’s Thrush (C. ustulatus) appear high in TE abundance, particularly for C. bicknelli. Demographic analyses confirm a recent divergence of C. bicknelli and minimus, with low genetic diversity in both species. Major signatures of natural selection were found on the Z chromosome of both species and an enrichment of functional categories related to neuronal function provides a basis for further tests of the role of these genomic regions in cryptic speciation. Ours is a preliminary survey of genes that might explain the pattern of divergence in behavioral but not morphological or plumage traits in this complex, and which could ultimately be involved in the reduced gene flow observed between these two cryptic species. The genomic data we have assembled will be a useful resource for future genomic studies of Catharus thrushes.
Materials and Methods
Sample Selection and DNA Preparation
All new genome sequences were derived from samples archived in the frozen tissues collections of the New York State Museum (NYSM) or the Harvard Museum of Comparative Zoology (MCZ; supplementary table S1, Supplementary Material online). Each sample consisted of muscle, frozen at the time of specimen preparation or blood samples preserved in the field in cell lysis buffer, from banded and released birds. To obtain high-molecular weight DNA for reference genome assemblies, DNA was isolated from specimen-vouchered (study skin) tissue samples from one C. bicknelli (NYSM 10982) and one C. minimus (NYSM 17433) using a QIAGEN MagAttract HMW kit following the manufacturer’s protocol. DNA quality of the two reference genomes was similarly high as measured by TapeStation analysis, with DNA fragments falling between 39 and 50 kb (supplementary fig. S2, Supplementary Material online). We used a QIAGEN DNeasy blood and tissue kit to perform DNA isolation for genome resequencing of population samples of 15 C. bicknelli obtained in the breeding season throughout that species’ range (supplementary fig. S1, Supplementary Material online) and 14 C. minimus representing breeding birds from both the Newfoundland and mainland subspecies, and a smaller number of migrating and wintering birds. In addition, we sequenced one individual from a breeding population in Vermont identified by Martinsen et al. (2018) as a Bicknell’s Thrush×Veery hybrid (C. bicknelliXC. fuscescens) and two Swainson’s Thrushes (C. ustulatus) to serve as an outgroup for the tests of introgression (supplementary table S1, Supplementary Material online).
Library Preparation and Sequencing
DNA extracts were quantified using a fluorometer (Qubit 3.0) and fragment length evaluated using an Agilent TapeStation. Reference genomes from high-molecular weight extractions were derived from “jumping” libraries prepared with Nextera Mate Pair Library Preparation kit in combination with short-insert fragment libraries and were sequenced at approximately 50X coverage with paired-end sequencing on a Novaseq S4 platform. For all individuals, we sheared each DNA sample to a length of 300 bp by sonication (Covaris 2200) and prepared paired-end libraries on the automated Apollo 324 NGS library preparation system using the PrepX ILM 32i protocol and sequenced to approximately 15X coverage on four lanes of the Illumina HiSeq platform.
Genome Assembly
Read quality was assessed with FastQC (Andrews 2012) and adapter trimming was performed with Trimmomatic (Bolger et al. 2014). The assembly was produced with ALLPATHS-LG assembler (Butler et al. 2008) with default settings but including haploidify flag as true. All bioinformatic work was performed on the Harvard University Faculty of Arts and Sciences Cannon Research Computing cluster. Quality assessments of the two reference genome assemblies were performed with ALLPATHS-LG’s inbuilt assembly benchmarking statistics and the BUSCO v3 completeness assessment (Waterhouse et al. 2018) with default settings and including the chicken as the Augustus species gene finding. To improve our assembly, we generated “pseudochromosome” scaffolds with RaGOO v1.0 (Alonge et al. 2019) with default settings and using the chromosome-level long-read reference genome for C.ustulatus (GenBank assembly accession number GCA_009819885.2, coverage = 60.58X). This enabled us to remove unplaced scaffolds and perform downstream analyses separately for autosomes (1–40) and the Z sex chromosome. To reduce the influence of coverage depth on analyses of the Z chromosome, we removed the two female samples from Z pseudochromosome analyses. To confirm the authenticity and phylogenetic position of the two reference genomes, we used PHYLUCE (Faircloth 2016) to extract the 2,111 UCEs used by Everson et al. (2019) and compared them with the same loci those authors obtained from ten species of Catharus, including multiple individuals of C. bicknelli and C. minimus. We aligned UCEs with MAFFT v7.4 (Katoh and Standley 2013) (with flags –maxiterate 1,000 –localpair –adjustdirection), trimmed with trimal v1.2 with the flag -automated1 (Capella-Gutierrez et al. 2009), and checked for odd alignments with OD-seq (Jehl et al. 2015) with default settings. We concatenated the UCE alignment with the perl script BeforePhylo.pl v0.9 (https://github.com/qiyunzhu/BeforePhylo) with default settings and reconstructed the phylogeny with RAxML v8.2 (Stamatakis 2014) under a GTR+ɣ model and 1,000 rapid bootstrap replicates.
Annotation of TEs
We evaluated the repetitive fraction of the reference genomes of our two focal species and compared them with publicly available reference genomes of two other, closely related congeners: a Swainson’s Thrush (C. ustulatus) sequenced by the Vertebrate Genomes Project (GenBank accession number GCA_009819885.2) and a Veery (C. fuscescens) sequenced by the B10K consortium (GCA_013398975.1; Feng et al. 2020). We estimated TE content with RepeatMasker v. 4.1.1 as implemented in RepeatModeler2 (Flynn et al. 2020), which requires a genome assembly, as well as dnaPipeTE (Goubert et al. 2015), which works on unassembled sequence reads. However, both methods require a library of TEs for annotation; we built our custom library in three phases. First we used the RepBase-RepeatMasker database version 20181026, available on the GIRI web site (Jurka 2000). Second, we augmented this library with improved annotations of repeat content (including olfactory receptor genes) using CENSOR (Kohany et al. 2006) and TEclass (Abrusán et al. 2009) annotations of TEs unannotated by dnaPipeTE. In practice, we found that approximately 70–75% of TEs unclassified by dnaPipeTE could be reliably classified into major TE classes. Thus, we suspect that most of the TEs classified as “not annotated” by dnaPipeTE were likely bona fide TEs. Third and finally, we undertook a final round of annotation using the pipeline provided by Extensive de novo TE Annotator (EDTA; Ou et al. 2019), a pipeline that incorporates a variety of tools including GenomeTools (Gremme et al. 2013), LTR_FINDER (Xu and Wang 2007), LTR_retriever (Ou and Jiang 2018), Generic Repeat Finder (Shi and Liang 2019), TIR_Learner (Su et al. 2019), and Helitron Scanner (Xiong et al. 2014). All downstream analyses relied on this composite library, which greatly improved TE annotation, at the very least converting many TEs formerly classified as “unannotated” into “other—annotated.”We first applied RepeatMasker version 4.1.2-p1 as implemented in RepeatModeler2 v. 2.0.1 to the Ragoo-assisted reference assemblies of C. bicknelli and C. minimus. Output tables were then summarized with “onecodetofindthemall” (Bailly-Bechet et al. 2014), which attempts to join adjacent and nearby fragments of annotated TEs and repeats, thereby providing a better estimate of TE abundance than the raw output of RepeatMasker. We summarized all repeats in the RepeatMasker output table regardless of sequence divergence from the consensus element. This approach recorded TE abundances about approximately 30% higher than the raw RepeatMasker output, which we feel under-reports total TE abundances. The abundances estimated by this method were also more congruent with those estimated by our second TE assessment tool, dnaPipeTE. dnaPipeTE (Goubert et al. 2015) estimates TE and repeat content from unaligned and unassembled fastq reads, employing a series of BLAST analyses followed by cluster assemblies using Trinity (Grabherr et al. 2011) to estimate TE abundances and is recommended when reference genome quality is not high. All dnaPipeTE analyses were conducted on unpaired, single reads, and assumed a genome coverage of 0.2 for a approximately 1-Gb genome (-genome_coverage 0.2). Because dnaPipeTE can be sensitive to this parameter, we conducted extensive sensitivity analyses on two exemplar individuals to determine how estimated TE abundances vary with sampling fraction (supplementary table S6, Supplementary Material online). For both RepeatMasker and dnaPipeTE, we used the most recent assembly and associated short reads, respectively, of the chicken genome (GRCg6a; https://www.ncbi.nlm.nih.gov/grc/chicken; date accessed January 6, 2021), as a control to estimate TE abundance from a species whose TE landscape is well studied and is expected to comprise about 10% (Hillier et al. 2004; Wicker et al. 2005). Because reads from mitochondrial DNA can mimic repetitive DNA for methods like dnaPipeTE, all dnaPipeTE analyses were conducted on three data sets: raw, unmapped Illumina reads; reads mapped to the respective reference genome and aligning to autosomes; and mapped reads aligning to the Z chromosome. The analyses presented in this article all used an EDTA-augmented library from the C. bicknelli reference genome. We confirmed that use of an EDTA-augmented library from the C. minimus reference genome did not change results, still indicating higher TE abundances in C. bicknelli (supplementary figs. S21 and S22, Supplementary Material online).
SNP Calling
We mapped pair-end resequencing reads to the Gray-cheeked Thrush reference genome pseudochromosomes using BWA-mem V0.7 (Li and Durbin 2011) with default settings. We then used the Picard Toolkit (Broad Institute 2019) to calculate alignment metrics and remove duplicated sequences from the BAM files. We assessed mapping quality, coverage, and GC content with Qualimap v2.2.1 (García-Alcalde et al. 2012). Variant identification was performed via GL using ANGSD v.0.930 (Korneliussen et al. 2014) on two data sets: one with all 16 C. bicknelli and 15 C. minimus individuals, and the second with those same 31 birds plus one Swainson’s Thrush (C. ustulatus) individual as an outgroup. We used ANGSD v.0.930 to calculate the site allele frequency and infer major and minor alleles from the GLs. We filtered out reads with a minimum mapping quality of 30, bases with quality score less than 20, and sites with MAF below 0.05, and retained sites with a high probability of being variable (P = 1e-6) and that were present in 90% of the individuals. The output from ANGSD was put into VCF format with glactools (Renaud 2018), https://github.com/grenaud/glactools. For some downstream analyses the VCF file was reduced to one SNP per 10-kb window with VCFtools 0.1.14 (https://vcftools.github.io/index.html).
Population Admixture and Genome Scans for Selection
Using the reduced version of the variants data set (one SNP per 10-kb window), we performed a principal components analysis with SNPRelate (Zheng et al. 2012), and calculated ancestry proportions shared between individuals with ADMIXTURE (Alexander and Lange 2011). Additionally, we performed selection analyses with BayeScan (Foll and Gaggiotti 2008) on the reduced version of the variants data set. Allele frequency distributions were calculated with the folded site frequency spectrum with ANGSD from the sequences aligned to the C. bicknelli and C. minimus reference genomes separately. We implemented Tajima’s D neutrality test and Fst statistics with ANGSD and vcftools to detect outliers. To estimate parameters of a demographic model for additional selection tests, such as the neutral mutation parameter and divergence times between species, we used the multispecies coalescent model from SNAPP (Bryant et al. 2012). The multispecies coalescent was implemented in BEAST2 (Bouckaert et al. 2014) from biallelic SNP data thinned to one per 50-kb window. SNAPP was run for 50 million generations, sampling every 1,000 steps. Tracer (Rambaut et al. 2018) was used to summarize the posterior estimates of the various parameters sampled by the Markov Chain and to assess convergence. We then used these parameters as a framework to conduct posterior predictive simulations of Fst with the R package GppFst (Adams et al. 2016). To identify putative regions of recent or ongoing positive selection under a selective sweep model, we used two haplotype-based statistics for each pseudochromosome: the cross-population Extended Haplotype Homozygosity (XP-EHH) test (Sabeti et al. 2007) with the R package rehh v2.0 (Gautier et al. 2017) and the related cross-population haplotype-based statistic XP-nSL (Ferrer-Admetlla et al. 2014) with selscan 2.0 (Szpiech 2021), neither of which requires a genetic recombination map to detect soft sweeps (see Ferrer-Admetlla et al. [2014]). For these haplotype calculations per chromosomes, we phased genotypes with Beagle v5 (Browning and Browning 2007) without a reference panel. To visualize genome scans, we used the spline-based window size determined by GenWin (Beissinger et al. 2015), which scales window size across the genome based on statistically guided breakpoints in the data. Scaling window sizes in this way increases sensitivity to detecting outliers and lowers the false discovery rate compared with using arbitrary window sizes.
Reconstructing Demographic History
We estimated changes in the effective population size through time with PSMC (Li and Durbin 2011) using the highest coverage resequencing data of to avoid introducing bias from lower coverage samples. We used a yearly mutation rate of 2.21×10−9 originally calculated for the zebra finch genome (Nam et al. 2010) and a 1-year generation time generally employed in studies of passerine birds, including the North American Turdidae (Topp et al. 2013). The mutation rate used here is similar to the value used for noncoding DNA in previous phylogeographic studies of birds (Lee and Edwards 2008), adjusted in this case from estimates for avian introns in Ellegren (2007). Additionally, we used IMcoalHMM (Mailund et al. 2012) to estimate divergence time, effective population size, migration interval, and recombination rate under a pure isolation model and isolation with migration model, comparing approximately 10-Mb regions from each pseudochromosome of both reference genomes. We first aligned each pseudochromosme from the reference genomes with Mugsy (Angiuoli and Salzberg 2011), and filtered the alignments using MafFilter (Dutheil et al. 2014) and SeqKit (Shen et al. 2016). We then selected the 20 pseudochromosome alignments (two for each of nine autosomes and the Z chromosome) with the best quality between 8 and 10 Mb length. For the isolation model, chains were run for 50 million generations with a temperature scale of 8 million, and 500 MCMC generations between samples, for a total of 5,000 samples. For the isolation–migration model, we needed ran chains for 1 billion generation, a temperature scale of 30 billion, 1,000 steps between samples, yielding a total of 5,000 samples. For both models, we used a burn-in of 100. We used Tracer (Rambaut et al. 2018) to assess convergence and ensure that the effective sample size of all parameters was >200.Finally, to corroborate the low level of introgression between C. bicknelli and C. minimus estimated from IMcoalHMM and to examine possible asymmetry of introgression in the species complex, we conducted ABBA-BABA tests using ANGSD to compare samples from each of the two C. minimus subspecies, one C. bicknelli and one C. ustulatus as the outgroup.
Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.Click here for additional data file.
Authors: Sangeet Lamichhaney; Jonas Berglund; Markus Sällman Almén; Khurram Maqbool; Manfred Grabherr; Alvaro Martinez-Barrio; Marta Promerová; Carl-Johan Rubin; Chao Wang; Neda Zamani; B Rosemary Grant; Peter R Grant; Matthew T Webster; Leif Andersson Journal: Nature Date: 2015-02-11 Impact factor: 49.962
Authors: Pardis C Sabeti; Patrick Varilly; Ben Fry; Jason Lohmueller; Elizabeth Hostetter; Chris Cotsapas; Xiaohui Xie; Elizabeth H Byrne; Steven A McCarroll; Rachelle Gaudet; Stephen F Schaffner; Eric S Lander; Kelly A Frazer; Dennis G Ballinger; David R Cox; David A Hinds; Laura L Stuve; Richard A Gibbs; John W Belmont; Andrew Boudreau; Paul Hardenbol; Suzanne M Leal; Shiran Pasternak; David A Wheeler; Thomas D Willis; Fuli Yu; Huanming Yang; Changqing Zeng; Yang Gao; Haoran Hu; Weitao Hu; Chaohua Li; Wei Lin; Siqi Liu; Hao Pan; Xiaoli Tang; Jian Wang; Wei Wang; Jun Yu; Bo Zhang; Qingrun Zhang; Hongbin Zhao; Hui Zhao; Jun Zhou; Stacey B Gabriel; Rachel Barry; Brendan Blumenstiel; Amy Camargo; Matthew Defelice; Maura Faggart; Mary Goyette; Supriya Gupta; Jamie Moore; Huy Nguyen; Robert C Onofrio; Melissa Parkin; Jessica Roy; Erich Stahl; Ellen Winchester; Liuda Ziaugra; David Altshuler; Yan Shen; Zhijian Yao; Wei Huang; Xun Chu; Yungang He; Li Jin; Yangfan Liu; Yayun Shen; Weiwei Sun; Haifeng Wang; Yi Wang; Ying Wang; Xiaoyan Xiong; Liang Xu; Mary M Y Waye; Stephen K W Tsui; Hong Xue; J Tze-Fei Wong; Luana M Galver; Jian-Bing Fan; Kevin Gunderson; Sarah S Murray; Arnold R Oliphant; Mark S Chee; Alexandre Montpetit; Fanny Chagnon; Vincent Ferretti; Martin Leboeuf; Jean-François Olivier; Michael S Phillips; Stéphanie Roumy; Clémentine Sallée; Andrei Verner; Thomas J Hudson; Pui-Yan Kwok; Dongmei Cai; Daniel C Koboldt; Raymond D Miller; Ludmila Pawlikowska; Patricia Taillon-Miller; Ming Xiao; Lap-Chee Tsui; William Mak; You Qiang Song; Paul K H Tam; Yusuke Nakamura; Takahisa Kawaguchi; Takuya Kitamoto; Takashi Morizono; Atsushi Nagashima; Yozo Ohnishi; Akihiro Sekine; Toshihiro Tanaka; Tatsuhiko Tsunoda; Panos Deloukas; Christine P Bird; Marcos Delgado; Emmanouil T Dermitzakis; Rhian Gwilliam; Sarah Hunt; Jonathan Morrison; Don Powell; Barbara E Stranger; Pamela Whittaker; David R Bentley; Mark J Daly; Paul I W de Bakker; Jeff Barrett; Yves R Chretien; Julian Maller; Steve McCarroll; Nick Patterson; Itsik Pe'er; Alkes Price; Shaun Purcell; Daniel J Richter; Pardis Sabeti; Richa Saxena; Stephen F Schaffner; Pak C Sham; Patrick Varilly; David Altshuler; Lincoln D Stein; Lalitha Krishnan; Albert Vernon Smith; Marcela K Tello-Ruiz; Gudmundur A Thorisson; Aravinda Chakravarti; Peter E Chen; David J Cutler; Carl S Kashuk; Shin Lin; Gonçalo R Abecasis; Weihua Guan; Yun Li; Heather M Munro; Zhaohui Steve Qin; Daryl J Thomas; Gilean McVean; Adam Auton; Leonardo Bottolo; Niall Cardin; Susana Eyheramendy; Colin Freeman; Jonathan Marchini; Simon Myers; Chris Spencer; Matthew Stephens; Peter Donnelly; Lon R Cardon; Geraldine Clarke; David M Evans; Andrew P Morris; Bruce S Weir; Tatsuhiko Tsunoda; Todd A Johnson; James C Mullikin; Stephen T Sherry; Michael Feolo; Andrew Skol; Houcan Zhang; Changqing Zeng; Hui Zhao; Ichiro Matsuda; Yoshimitsu Fukushima; Darryl R Macer; Eiko Suda; Charles N Rotimi; Clement A Adebamowo; Ike Ajayi; Toyin Aniagwu; Patricia A Marshall; Chibuzor Nkwodimmah; Charmaine D M Royal; Mark F Leppert; Missy Dixon; Andy Peiffer; Renzong Qiu; Alastair Kent; Kazuto Kato; Norio Niikawa; Isaac F Adewole; Bartha M Knoppers; Morris W Foster; Ellen Wright Clayton; Jessica Watkin; Richard A Gibbs; John W Belmont; Donna Muzny; Lynne Nazareth; Erica Sodergren; George M Weinstock; David A Wheeler; Imtaz Yakub; Stacey B Gabriel; Robert C Onofrio; Daniel J Richter; Liuda Ziaugra; Bruce W Birren; Mark J Daly; David Altshuler; Richard K Wilson; Lucinda L Fulton; Jane Rogers; John Burton; Nigel P Carter; Christopher M Clee; Mark Griffiths; Matthew C Jones; Kirsten McLay; Robert W Plumb; Mark T Ross; Sarah K Sims; David L Willey; Zhu Chen; Hua Han; Le Kang; Martin Godbout; John C Wallenburg; Paul L'Archevêque; Guy Bellemare; Koji Saeki; Hongguang Wang; Daochang An; Hongbo Fu; Qing Li; Zhen Wang; Renwu Wang; Arthur L Holden; Lisa D Brooks; Jean E McEwen; Mark S Guyer; Vivian Ota Wang; Jane L Peterson; Michael Shi; Jack Spiegel; Lawrence M Sung; Lynn F Zacharia; Francis S Collins; Karen Kennedy; Ruth Jamieson; John Stewart Journal: Nature Date: 2007-10-18 Impact factor: 49.962
Authors: Jennifer Walsh; Gemma V Clucas; Matthew D MacManes; W Kelley Thomas; Adrienne I Kovach Journal: Ecol Evol Date: 2019-11-07 Impact factor: 2.912