Clemens Küpper1,2, Michael Stocks1, Judith E Risse3, Natalie Dos Remedios1, Lindsay L Farrell1,4, Susan B McRae5, Tawna C Morgan4,6, Natalia Karlionova7, Pavel Pinchuk7, Yvonne I Verkuil8, Alexander S Kitaysky6, John C Wingfield9, Theunis Piersma8,10, Kai Zeng1, Jon Slate1, Mark Blaxter3, David B Lank4, Terry Burke1. 1. Department of Animal & Plant Sciences, University of Sheffield, Sheffield, UK. 2. Institute of Zoology, University of Graz, Graz, Austria. 3. Edinburgh Genomics, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK. 4. Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada. 5. Department of Biology, East Carolina University, Greenville, North Carolina, USA. 6. Department of Biology and Wildlife, Institute of Arctic Biology, Fairbanks, Alaska, USA. 7. Scientific and Practical Center for Bioresources, Minsk, Belarus. 8. Conservation Ecology Group, Groningen Institute for Evolutionary Life Sciences, Groningen, The Netherlands. 9. Department of Neurobiology, Physiology and Behavior, University of California Davis, Davis, California, USA. 10. Department of Marine Ecology, NIOZ Royal Netherlands Institute for Sea Research, Den Burg, Texel, The Netherlands.
Abstract
Three strikingly different alternative male mating morphs (aggressive 'independents', semicooperative 'satellites' and female-mimic 'faeders') coexist as a balanced polymorphism in the ruff, Philomachus pugnax, a lek-breeding wading bird. Major differences in body size, ornamentation, and aggressive and mating behaviors are inherited as an autosomal polymorphism. We show that development into satellites and faeders is determined by a supergene consisting of divergent alternative, dominant and non-recombining haplotypes of an inversion on chromosome 11, which contains 125 predicted genes. Independents are homozygous for the ancestral sequence. One breakpoint of the inversion disrupts the essential CENP-N gene (encoding centromere protein N), and pedigree analysis confirms the lethality of homozygosity for the inversion. We describe new differences in behavior, testis size and steroid metabolism among morphs and identify polymorphic genes within the inversion that are likely to contribute to the differences among morphs in reproductive traits.
Three strikingly different alternative male mating morphs (aggressive 'independents', semicooperative 'satellites' and female-mimic 'faeders') coexist as a balanced polymorphism in the ruff, Philomachus pugnax, a lek-breeding wading bird. Major differences in body size, ornamentation, and aggressive and mating behaviors are inherited as an autosomal polymorphism. We show that development into satellites and faeders is determined by a supergene consisting of divergent alternative, dominant and non-recombining haplotypes of an inversion on chromosome 11, which contains 125 predicted genes. Independents are homozygous for the ancestral sequence. One breakpoint of the inversion disrupts the essential CENP-N gene (encoding centromere protein N), and pedigree analysis confirms the lethality of homozygosity for the inversion. We describe new differences in behavior, testis size and steroid metabolism among morphs and identify polymorphic genes within the inversion that are likely to contribute to the differences among morphs in reproductive traits.
Stable genetic polymorphisms for alternative reproductive strategies, involving differences in body size, structure, and behaviour, have been identified in a range of taxa[9,10]. These traits are clearly associated with fitness and putatively maintained by frequency-dependent selection. However, little is known about their genomic basis or mode of evolution, or why examples are relatively rare, in contrast to the widespread phenotypic plasticity in reproductive strategy in response to environmental variation[10].The ruff (Philomachus pugnax) is a Eurasian sandpiper with a highly polygynous, lek-based mating system involving competition and cooperation among three male morphs (Fig. 1), and extensive female mate choice and genetic polyandry[1,2,11,12]. Territorial “Independent” males, with hypervariable but predominantly dark ornamental plumage, defend small mating courts on leks. Non-territorial “Satellite” males, with predominantly white ornamental plumage, join Independents on courts, co-displaying to attract females, but competing for matings. Rare small female-mimicking “Faeder” males attend Independents’ courts and attempt rapid copulations when females solicit matings from ornamented displaying males[3].
Figure 1
Comparison of of Independent, Satellite, and Faeder ruff males. a, Representative breeding plumages. b, Body size distributions[5] (see Methods). c, Comparisons of morph behavioural profiles for Aggression (“Aggr.” – black), Display (“Disp.” – red), Proximity (“Prox.” – brown), and Alert stance (“Alert” – green). Morph-specific boxplots are plotted for each behaviour from the means of individual male ruffs, standardized across the sample population by subtracting the global mean and dividing by s.d.. Morphs differed in their distributions of behaviour (MANOVA on individual bird mean values; Wilks’s Lambda = 0.33, approximate F6,44 = 5.37, P < 0.0003). d, Residuals of the logarithm of testes volume index corrected for date differ among morphs (F2,37 = 6.31, P = 0.0045). Testes of Independents (n = 29) are substantially smaller than those of Satellites (n = 4, Tukey-adjusted P = 0.006) and Faeders (n = 6, P = 0.02). Boxplots indicate median (bold line), 25% and 75% quartiles (box) and ranges (whiskers). e, f, Seasonal patterns of circulating steroid concentrations in blood plasma of male ruffs: e, androstenedione (A4), f, testosterone plus dihydrotestosterone (T+dHT). Points are daily means±s.d. Independents (n = 11: red, jittered −1 day; Satellites (n = 9): blue; Faeders (n = 2): black, jittered +1 day. Independents have higher date-specific levels of T than Satellites (F1,97 = 10.01, P = 0.002), and Satellites higher levels of A4 than Independents (F1,97 = 17.03, P < 0.0001; Supplementary Table 1); Faeders appear similar to Satellites, but were not tested statistically due to n = 2.
Two autosomal Mendelian factors, Faeder (F) and Satellite (S), have been shown to be dominant to independent (i), but the precise genetic architecture of morph determination remains unclear[4,5]. We have maintained a pedigreed captive breeding population of several hundred ruffs since 1985. Adult males were phenotyped for mating behaviour (Independent, Satellite, Faeder), and the presence or absence of ornamental plumage (ornamented vs Faeder). Some adult females were also phenotyped, with small size indicating carriers of Faeder[5], and aggressive behaviour following testosterone implantation permitting classification as Independents or Satellites[13].We explored the reproductive physiology of the morphs, with a particular focus on steroid metabolites known to affect these phenotypes[13]. The testes of Satellites, like those of Faeders[3], were larger than those of Independents, despite their smaller body sizes (Fig. 1b, Supplementary Fig. 1), presumably to increase the efficiency of less frequent or effective copulations by producing larger numbers of sperm. Breeding Independents had higher circulating testosterone concentrations, while Satellites and Faeders had higher concentrations of androstenedione (Fig. 1e, f; Supplementary Table 1), suggesting fundamental morph differences in the regulation of the hypothalamus–pituitary–gonadal system that drives seasonal reproduction[14,15].To identify and characterise the loci controlling morph divergence we used a combination of genetics and genomics, based on both our captive population and wild ruffs. We generated a reference ruff genome from a single Independent male in the captive colony, using Illumina paired-end, 3-kb and 5-kb mate-pair data, and long-read PacBio data (Supplementary Fig. 2; Supplementary Table 2). The assembled genome has a span of ~1.17 Gb in 12,085 scaffolds longer than 1 kb, with 292 scaffolds longer than 877 kb (Supplementary Table 3). Genes were predicted using extensive new transcriptome data (Supplementary Table 4). Assessment of assembly quality using genomic read mapping, transcriptome read and assembly mapping, and comparison to the high-quality chicken (Gallus gallus) genome assembly suggest high completeness and contiguity (Supplementary Table 5; Supplementary Fig. 3). The draft genome was ordered and orientated using the chicken reference to yield a chromosome-level assembly. We identified and typed single nucleotide polymorphisms (SNPs) in the pedigree population using reduced representation, restriction-site associated DNA (RAD) sequencing[16] at low density (i.e. based on SbfI restriction sites), identifying 1,068,556 SNPs. We mapped Faeder and Satellite to a genetic map based on 3,948 biallelic SNPs with minor allele frequency ≥ 0.1, which had been typed in 286 individuals with a success rate of ≥ 0.99.Faeder and Satellite each mapped significantly and uniquely (LOD >3), to the same region of chromosome 11 (Fig. 2; Supplementary Fig. 4), coincident with the region previously identified for Faeder using microsatellite markers[17]. The region of linkage encompasses about one fifth of the chromosome. Independent estimation of genetic maps for cohorts containing or excluding birds carrying Satellite or Faeder gave very different recombination lengths for this region (Fig. 2b), suggesting that it might contain a genomic rearrangement refractory to recombination between haplotypes.
Figure 2
Genetic mapping and genome-wide association analysis identify the genomic region determining ruff reproductive morphs. a, The ruff linkage mapping pedigree. Paternal links are shown as red lines and maternal links as blue lines. b, Linkage maps (in centiMorgans) of the inversion region on ruff chromosome 11 support the presence of the inversion polymorphism, indicated by an ~70% shorter linkage map when only Independents are considered. SNPs on the same contig share the same colour. c, Association between markers and morphs based on 41 unrelated males (10 Faeder, 10 Satellite, 21 Independent). Alternating shades of grey indicate different contigs, ordered based on synteny with the chicken genome. The top panel shows −log10P-values of association across the entire ruff genome; middle panel (Faeder) and bottom (Satellite) show blow ups of the associated peak (0.703–0.713 Gbp, indicated by dashed lines in the top panel) for comparisons with independent. Red and blue dots indicate genome-wide significance (P < 0.05, 1000 permutations) for the Faeder and Satellite loci, respectively.
A separate genome-wide association study, using densely sampled SNPs (from RAD sequencing at PstI restriction sites) in a sample of 41 unrelated Faeder, Satellite and Independent birds, identified genome-wide significance of association with the mating morphs on contigs from the same section of chromosome 11 (Fig. 2c). The high resolution provided by the unrelated individuals revealed specific segments of this region uniquely associated with either Faeder or Satellite, indicating the presence of morph-specific haplotypes spanning several megabases of the chromosome.To map finely the variation in the region, we whole-genome sequenced an additional five males (one Independent, two Satellites and two Faeders) at 80x genome coverage (Supplementary Table 2). Nucleotide variation (measured by Watterson’s theta) was substantially greater in Faeder and Satellite individuals within the region of interest (Faeder = 0.011; Satellite = 0.011), compared to regions adjacent to the inversion (Faeder = 0.003; Satellite = 0.002). Divergence (measured by Dxy) between Faeder and the other morphs was high across the entire region (Fig. 3a), with morph-specific lineage sorting occurring within the region (Fig. 3c). These patterns are consistent with the presence of a large non-recombining inversion, and this was confirmed by the orientations of read pairs across the breakpoints (Supplementary Fig. 4).
Figure 3
Sequence divergence among morph-determining haplotypes. a, Divergence (Dxy) among morphs across the inversion (4.4 Mbp) and flanking regions. Vertical dotted lines refer to the two inversion breakpoints. The gene CENP-N crosses the breakpoint located on Contig 3357.Each line is staggered by 15 kb to make each line visible. b, c, d, Evolutionary relationships among the three morphs. Maximum-likelihood trees showing the relationship among the sequences from two resequenced Satellites, two Faeders, and the non-inverted reference (Independent 1) and a resequenced Independent (Independent 2) for different regions within and around the inversion: b, adjacent to the inverted region (Contig 3913: 140,000–150,000), c, within regions of the inversion exhibiting high divergence from the reference (Contig 3357: 280,000–290,000), d, within regions of the inversion where Satellites show low divergence from Independents (Contig 3208). e, f, g, h, Divergence patterns between Faeder and Satellite across regions containing candidate loci involved in steroid hormone metabolism (PLCG2, SDR42E1, HSD17B2 and CYB5B), sperm motility (GAS8) and pigmentation (MC1R), plus also transcription factors expressed in the gonads (ZFPM1) and in proximity to MC1R showing Satellite-specific divergence (TCF25). Arrows indicate the presence of morph-specific deletions affecting the coding region of PLCG2 (see Supplementary Fig. 6) and deletions in noncoding regions surrounding HSD17B2[20]. Dots indicate the presence of morph-specific nucleotide substitutions (Satellite – blue; Faeder – red).
One inversion breakpoint disrupts the Centromere protein N (CENP-N) gene between its fourth and fifth exons. CENP-N is essential for mitotic centromere assembly[18]. We therefore predicted that homozygosity for the inversion haplotypes would be lethal and our breeding data confirmed a complete absence of inversion homozygotes (Table 1). The other breakpoint appears to be in a non-coding repeat.
Table 1
Segregation ratios demonstrating apparent lethality of inversion genotypes in matings between heterozygotes identified with diagnostic SNPs.
Phenotypeassociatedwith SNP
SNP*
Progeny
Genotypes
Deviation from1:2:1
Deviation from1:2†
χ 22
P
χ 21
P
Faeder
++
+I
II
A & B
All
Obs
20
16
0
22.67
1.2 × 10−5
8.00
0.005
Exp
9
18
9
Males
Obs
8
7
0
8.60
0.014
2.70
0.100
Exp
3.75
7.5
3.75
Females
Obs
12
9
0
14.14
8.5 × 10−4
5.36
0.021
Exp
5.25
10.5
5.25
Satellite
++
+I
II
C
All
Obs
20
18
0
21.16
2.5 × 10−5
6.37
0.012
Exp
9.5
19
9.5
Males
Obs
9
9
0
9.00
0.011
2.25
0.134
Exp
4.5
9
4.5
Females
Obs
11
9
0
12.30
0.002
4.23
0.040
Exp
5
10
5
Data were obtained by pooling all offspring across reproductive events where both parents were heterozygous for diagnostic SNPs. I represents the inversion haplotype (either Faeder or
Satellite associated), + is the ancestral-ordered haplotype (independent associated).
SNPs used: A, contig 1270:576,631; B, contig 3047:368,535; C, contig 1270:576,631. The sex ratios produced by these crosses show no suggestion of sex-specific lethality, i.e. no sex difference in the ratio of ++:+I (SNPs A, B and C Fisher Exact test, P = 1.00)
excluding inversion homozygote (II) class.
Since recombination is completely suppressed close to inversion breakpoints, regions in linkage disequilibrium with breakpoints are expected to have high divergence[7,19]. This expectation was supported by divergence analyses using our resequencing data. We identified 44,433 SNPs specific to one of the three haplotypes in the assembled reference (independent) inversion region. The inverted haplotypes showed several structural changes, i.e. regions with large (> 100 bp) deletions and duplications (Supplementary Fig. 5). Within some regions of the inversion, Satellites showed greater similarity to Independents (Fig. 3a, d), suggesting that Satellite originated through recombination or gene conversion between inverted Faeder and non-inverted independent alleles[20].We phased SNPs located in the inversion into longer haplotypic sequences and compared 100 gene sequences directly between the three morphs. Of these, 78% showed consistent morph-specific differences, including amino-acid substitutions, insertions and deletions (Supplementary Table 6). In addition, two of six genes adjacent to the inversion showed Faeder-specific protein sequence differences. Under neutral drift and in the absence of recombination we would expect consistent divergence across the inverted region. However, the genes in the inversion varied widely in their divergence among morphs, with several showing high divergence (Fig. 3e-h).Several divergent protein-coding genes have predicted functions in hormonal systems, such as androsteroid homeostasis and plumage development relevant to ruff morph phenotypes (Fig. 3e-h and Supplementary Table 7). A key candidate is Estradiol 17-beta-dehydrogenase 2 (HSD17B2), encoding an enzyme that preferentially inactivates testosterone to androstenedione and estradiol to estrone[21]. Also present are Short-chain dehydrogenase reductase (SDR42E1), Palmitoyltransferase (ZDHHC7)[22] and Cytochrome b5 (CYB5B) (Supplementary Table 7). The morph-specific alleles of these enzymes may alter steroid secretion levels and/or receptor responsiveness, driving morphological and neurological mechanisms responsible for contrasting anatomical, plumage and behavioural profiles (Fig. 1c). Genes involved in steroid metabolism have also been implicated in another polymorphic vertebrate – the white-throated sparrow Zonotrichia albicollis[15] – in which the two morphs are similarly associated with an inversion. As in ruffs, the sparrow morphs differ in aggression and testosterone during the breeding season[23]. This suggests that convergent molecular pathways may contribute to the evolution of behavioural variation during reproduction. Another gene located in the ruff inversion, Melanocortin-1 receptor (MC1R), a locus that controls colour polymorphisms in other birds[24], might account for the reduced melanin in Satellite display feathers (Fig. 1a). In Faeder (but not Satellite), the gene for 1-Phosphatidylinositol 4,5-bisphosphate phosphodiesterase γ-2 (PLCG2) has experienced complex deletions and rearrangement, including the loss of an exon encoding an SH3 protein-interaction domain, and is probably a loss-of-function allele (Supplementary Fig. 6). PLCG2 is a transmembrane signalling enzyme involved in cell receptor activation[25] and interacts with epidermal growth factor receptor EGFR[. As EGFR is involved in the formation of feather arrays[27], this locus is a candidate for the loss of secondary sexual expression of display feathers and behaviour in Faeders. Additional genes with roles in sperm motility and gonadal expression are also present (Supplementary Table 7).As inversion homozygotes appear to be lethal, to maintain allelic frequencies, the fitness of individuals carrying the inversion, in one or both sexes, must exceed that of ancestral independent homozygotes. Heterozygous carriers of the inversion also have poor survival in crosses (Table 1). Higher reproductive success by Satellite and Faeder males is a likely explanation for how the survival disadvantage is offset, and their larger testis sizes suggest that they might be more successful in sperm competition, despite equal or lower mating rates[2,11]. Selection should also favour disassortative mating by individuals carrying the inversion, particularly by females. Since some ruff females mate with multiple morphs[12], morph discrimination of mates, if it occurs, is not ubiquitous. Strong disassortative mating is a key feature of the white-throated sparrow system, although the causative inversion in this species is not lethal[15].Alternative reproductive morphs are predicted to evolve when strong reproductive skews provide low thresholds for invading forms[9]. The lek mating system of ancestral Independents would have satisfied this condition, but numerous other species that lack genetically polymorphic alternatives also do so[10]. The occurrence of genomic rearrangements that can provide viable substrates for differentiation must be rare, but by suppressing recombination and combining the fates of loci within the same genomic region, the inversion in the ruff enabled a phenotypically complex alternative strategy to evolve through the coevolution of genes affecting male behaviour, morphology, and fertility. As shown here and independently elsewhere[20], the initial occurrence of one alternative can facilitate the evolution of multiple morphs,which has also occurred in other species[9,10,28,29,30].
URLs
Assemblage gene prediction pipeline, https://github.com/sujaikumar/assemblage/blob/master/README-annotation.md; CLC Bio Assembler, http://www.clcbio.com/products/clc-assembly-cell/; Slide, https://github.com/mspopgen/slide; Evolib, https://github.com/mspopgen/Evolib; GWAS and population genetic pipelines, https://github.com/mspopgen/kuepper2015.
ONLINE METHODS
Samples
DNA samples were obtained from the Simon Fraser University colony, which was founded with 110 ruffs hatched from wild eggs collected in Finland prior to 1990[4,5] plus two Faeder males from the Netherlands in 2006 (under permits from the Canadian Food Inspection Agency, Canadian Wildlife Service, and Simon Fraser University Animal Care Committee). Additional blood samples were obtained from wild males in breeding plumage caught in the Netherlands (10 Faeders and 10 Satellites) between 2004–2008 or Belarus (one Independent and one Satellite) in 2014 (under permits from Dutch Ringing Centre, Animal Experimentation Committee of the University of Groningen and Belarus Bird Ringing Centre).
Male morph determination
Behavioural phenotypes of captive male ruffs were determined during the breeding season[1,3,13,32] (Fig. 1c). Classification of ornamented males was based on ethological displays[13], with some wild males assigned from plumage[32,33,34,35]. Faeders were definitively identified by small size, and lack of ornamental plumage, seasonal facial wattles, and epigamic display[3,37].
Behavioural profiles
We quantified behaviour in captivity of 19 Independents, six Satellites and two wild-caught Faeders (Fig 1c). The Faeders were housed with 61 females. Two Independents and one Satellite were visible in an adjacent pen. These ornamented males were introduced to the Faeders and females during 1–2-hour morning observation periods. We scanned postures and relative positions of males at 2-minute intervals, and recorded all interactions, in 55 sessions over 50 days. Independents and Satellites were replaced at least every five or ten days, respectively.Four behavioural variables summarized differences among morphs[1,3,13]. Aggressive behaviour included: total chases, bill points, bill thrusts, or fights per minute. Display was defined as the proportion of scans with squats, half-squats, and obliques. Proximity was the proportion of scans when positioned < 2 bird lengths from another displaying male. Alert stance was the proportion of scans standing on the lek with head up. Fig. 1c shows morph-specific means of rates calculated for each behaviour from the means of each male, standardized across the sample population.
Testes volume index
Lengths (L) and widths (W) of both right and left testes were measured by T.P. with callipers to the nearest 0.1 mm from pre-breeding and breeding ruffs that died during capture in the Netherlands during March–June in 1993–2005[38] (Supplementary Fig. 1a). A volume index including both testes, in mm3, was calculated assuming testes were cylindrical (L × W2 × 0.785). A non-breeding baseline index, measured during late winter, was defined as <120 ml3. Residuals from log(index) were calculated for males caught during 10 April–15 May each year showing gonadal recrudescence above baseline, using a quadratic regression controlling for date (F2,36 = 7.62, P = 0.0002, ,
Supplementary Fig. 1).
Steroid hormone measurement
Hormone levels were measured in blood plasma samples collected in 2003 and 2006[39]. In 2003, we sampled 16 displaying males 3–9 years old held with four other males in two groups of ten (five Independents and five Satellites). Blood was sampled between 09:00–14:00 approximately every two weeks prior to, throughout, and after the breeding season (14 March–7 July) (n = 107) and plasma separated and stored at −20°C. In 2006, we sampled two groups of three Independents and two Satellites. Two Faeder males (see Behavioural profiles, above) were also sampled. Males had constant visual access to females, and physical access for 2–3 hours between 06:00–11:00, when lek attendance is highest in the wild[40]. Blood samples (n = 50) were collected between 10:00–12:00, immediately after males had access to females, approximately every two weeks between 14 May–2 July.Plasma samples were analysed at the University of Alaska, Fairbanks in duplicates following established radio-immunoassay (RIA) procedures[41,42]. Thirty 2003 samples were extracted with HPLC-grade dichloromethane. Steroids were separated using diatomaceous earth/glycol chromatography (“column” RIA), such that testosterone (T), dihydrotestosterone (DHT), and androstenedione (A4) could be analysed from a single 100-μl plasma sample. T and DHT titres were strongly correlated (r230 = 0.95, P < 0.0001), thus the remaining 2003 samples were analysed in two assays without separation of steroids prior to RIAs (“direct” RIA), using a 100-μl sample for T+DHT (“total T”), and 50 μl for A4. All 2006 samples were assayed for total T and A4 using 100 μl plasma with steroids separated using diatomaceous earth/glycol chromatography (“short column RIA”). The antibody with the same cross-reactivity for T and DHT (T-3003 Research Diagnostics) was used for T, DHT, and total T RIAs, and the A4-specific antibody (A-1707 Wien Laboratories) was used for A4 RIAs. Mean ± s.d. percentage recoveries were: 55.5 ± 8.7, 45.8 ± 6.9, and 45.6 ± 7.3 for A4, T, and DHT, respectively, in the column assay; 69.1 ± 5.2 for total T and 67.7 ± 6.2 for A4 in direct assays; and 72.7 ± 7.0 for total T and 64.0 ± 10.0 A4 in short columns. Inter-assay CV was 21% for total T and 22% for AE, and intra-assay CVs were < 10%. Minimum detectability was 3.90 pg/sample for A4, and 1.95 pg/sample for T, DHT, and total T.We tested for morph differences in levels and temporal patterns of circulating steroids (Figs. 1e, f) using generalized linear mixed models with log linked Poisson distribution. Date, its quadratic effect, and two-way interactions with morph were included as factors. Bird was a random factor, along with assay type and year (= social situation), which had no detectable effects (Supplementary Table 1).
Pedigree construction
Preliminary pedigree assignments from 1985–2013 were generated using 27 microsatellite markers[43] in 756 ruffs. We assigned parentage including all possible candidate parents using Cervus[44] and Colony[45]. Inconsistencies were resolved by manual inspection, incorporating housing information of the most likely candidates.
Genome sequencing
We sequenced the genome of an Independent male from the captive colony. Illumina HiSeq 2500 v4 150-bp paired reads were generated, using paired-end and mate-pair libraries with various insert sizes, resulting in 137x raw coverage (Supplementary Table 2, Supplementary Fig. 2). We used Pacific Biosciences RS II technology with P5–C3 chemistry to generate 8.8x coverage in long reads (mean length 5,713 bp).
Genome sequence assembly
The genome was de novo assembled using an integrated approach (Supplementary Fig. 2).
Cleaning and trimming of raw reads
The raw Illumina paired-end library reads were quality trimmed (> q30) and adapter sequences removed using fastqc-mcf (ea-utils.1.1.2-537[46]). Short reads (< 50 bases) were discarded. Mate-pair reads were quality trimmed (> q30) and adapter and linker sequences removed using CutAdapt[47] 1.3. We retained only read pairs containing Nextera linker sequence and remaining read length > 50 bases.
Removal of contaminant data
An initial assembly of raw paired-end reads was prepared using CLC Bio assembler (CLC bio 4.2.0, Aarhus, Denmark; see URLs). Contaminant data deriving from bacterial, parasite and viral genomes were identified using BLAST[48] against the NCBI nr database, reporting the best hits with E-value < e−50. Contigs likely to be contaminant were extracted and reads mapping to these removed.
Kmer optimisation
The optimal kmer size for assembly was estimated with sga-preqc v0.10.13[49] and kmergenie[50] v1.5924 on one lane each for each paired-end library and a kmer sweep with ABySS[51] v1.3.7) using all paired-end and mate-pair reads. Sga identified an optimum around k = 35, while kmergenie identified k = 30 and 38 as the optimal sizes and ABySS suggested k = 38. We used k = 38.
Genome assembly
Preliminary assembly of all paired-end and mate-pair reads using ABySS resulted in unitigs that were masked using RepeatMasker[52] (version open-4.0.5), with mate-pair reads mapped using bwa-aln and bwa-sampe (v0.7.7[53]). Reads where both of the pair mapped to the masked unitigs were retained. Filtered paired-end reads were then used to scaffold the unitigs using SSPACE[54] (Basic 2.0). Further scaffolding was done with PacBio reads using PBJelly[55] v14.1.14.
RNA sequencing and annotation
Previous RNA-Seq data in ruffs were available for genes expressed in feather follicles[56]. We generated new RNA-Seq data from egg, chick heart, lung and brain, female heart and brain, and male heart, brain and testes to obtain a wide variety of transcripts for gene annotation (Supplementary Table 2). Raw data were quality and adapter trimmed using CutAdapt 1.3), and assembled using Trinity[57] (version r20140413). The initial transcripts were filtered for abundance (Trinity: align_and_estimate_abundance.pl, filter_fasta_by_rsem_values.pl; RSEM 1.2.7[58]).
Gene prediction
We predicted genes using the Assemblage gene prediction pipeline (see URLs) up until the second round of Maker, using Maker 2.31.7[59], cegma 2.4[60], SNAP version 2006-07-28[61] and GeneMark-ES 2.3e[62]. Highly-expressed, unique transcripts from the Trinity assembly and proteins from Uniprot/Swissprot were used as evidence. Predicted genes with an Annotation Edit Distance (AED) < 1 were selected from the Maker first-pass results and used as hints to train Augustus v3.02[63] (Supplementary Table 4). Genes predicted by Augustus on the inversion-associated contigs were further annotated using blast2go[64]. Annotations were checked manually to identify incorrectly split or fused genes (n = 5 and 4, respectively). In total, we annotated 125 genes (101 with known homologues) within the inversion, with 89 (71%) supported by partial mRNA transcripts.
RAD sequencing
We chose 300 ruffs from the pedigree for low-density RAD sequencing and genetic mapping. For the initial GWAS we used high-density RAD sequencing of 41 unrelated males with established phenotypes originating from Finland (21 Independents) and the Netherlands (10 Faeders, 10 Satellites). For the pedigree analyses, we sampled to maximise pedigree completeness, morph representation and number of generations in our pedigree. Genomic DNA was digested using restriction enzymes SbfI (low-density) or PstI (high-density), following Baird et al.[19]. We pooled samples from different morphs during library preparation.
SNP calling
For calling SNPs in the pedigreed, low-density RAD sequences and in the resequenced individuals we used the GATK 3.2.2 variant-calling pipeline[65]. Reads were aligned to the reference genome with bwa-mem 0.7.10 and realigned using GATK RealignerTargetCreator and IndelRealigner to improve alignment quality before running GATK HaplotypeCaller. Genotypes were then called using GATK GenotypeGVCFs. For the analysis of high-density RAD sequences from unrelated individuals, demultiplexing and filtering were performed using Stacks v1.21. Mapping was performed using bwa-mem 0.7.10. Samtools v0.1.19-44428cd[66] was used for filtering bam files. We only included properly mapped and paired reads, removing reads with non-primary, supplementary or terminal alignments, and with mapping quality > 30. SNP calling was performed using bcftools v0.1.19-44428cd[67].
Confirming linkage to the Satellite- and Faeder-determining loci
We used twopoint linkage mapping to locate the Faeder and Satellite loci. Fourteen individuals lacking pedigree links with the remaining 286 RAD-sampled birds were excluded, resulting in an eight-generation pedigree with 286 paternal and maternal links, 186 maternal grandmaternal/grandpaternal links and 195 paternal grandmaternal/grandpaternal links (Fig. 2a). Phenotypes were known for 189 individuals (117 Independents, 38 Faeders and 34 Satellites); 85 additional individuals were known to not be Faeders. Twelve birds were unknowns.Satellite and Faeder causal loci were both scored as segregating in the pedigree (Independents as 1/1 homozygotes, Satellites and Faeders as 1/2 at the Satellite and Faeder loci, respectively). We followed a two-stage procedure. First, 3,948 informative SNPs (3,901 had ≥ 50 informative meioses) were tested for linkage with Satellite and Faeder using CriMap v2.503[68] (modified by Jill Maddox, Department of Veterinary Science, University of Melbourne, Australia). We split the pedigree into smaller families, using the crigen function of the Linkage Mapping Software (Xuelu Liu, Monsanto). We then performed twopoint linkage analysis. SNPs associated with Faeder or Satellite and LOD score > 3 were associated with Chromosome 11. In the second stage, all low-density RAD SNPs from contigs spanning the inversion (n = 3,810) were tested for linkage with the two morphs. Twopoint mapping usually finds relatively few markers co-segregating with a causal variant. However, numerous SNPs within the chromosome 11 contigs showed highly significant, perfect co-segregation (LOD score > 3 and recombination fraction of zero) with both Faeder and Satellite (Fig. 2b), suggesting an inversion polymorphism.
Confirming the inversion with linkage mapping
An inversion polymorphism produces alternative marker orders segregating within the mapping panel. Thus any marker order tested will contain erroneous recombination events that lead to larger estimates of the map length. We tested whether maps from the complete dataset (286 birds) and from a subset of the data (Independents only, n = 215 birds) differed in map distances. SNPs on contigs spanning the inversion, with at least 70 informative meioses and separated by ≥50 kbp, which caused no parent–offspring mismatches in the pedigree, were retained for analyses, leaving 35 markers. Because of the conserved synteny between the ruff and chicken genomes, linkage map lengths were initially tested using the contig and SNP order inferred from homology to the chicken genome. Alternative orders attempted with the FLIPS option showed lower likelihoods and longer maps than the initial marker order. Sex-averaged map distances, estimated using the Crimap command CHROMPIC, were 74.9 cM for the complete dataset and 20.7 cM (28%) for the reduced dataset (Fig. 2b). The latter measure is consistent with other avian maps of chromosome 11[69,70,71,72], supporting the presence of an inversion polymorphism in this region.
Genome-wide association study
Tests of association between markers and morphs were performed using GenABEL[73]. Correction for population stratification was performed by first calculating identity-by-state from all SNPs in the dataset and adding these as cofactors in the model. Genome-wide significance was assessed by performing 1,000 permutations of the data.
Inversion mapping with paired-end reads
Following Corbett-Detig et al.[74] , we searched for an inversion in the region of interest identified by linkage mapping and genome-wide association study. We identified two breakpoints exhibiting morph-specific clustering of reads mapping in parallel to the (independent) reference genome at position 185,694 on Contig 3357 and position 821,901 on Contig 1270 (Supplementary Fig. 7). Resequenced Faeders and Satellites had reads mapping in parallel orientation at these breakpoints and reduced coverage (~50%) of properly mapped reads (Supplementary Fig. 7). Using allele-specific primers, we confirmed the predicted sequence across one breakpoint (Contig 1270) in three Satellite males. The inverted sequence at one breakpoint showed an inserted repetitive motif in Faeder but absent in independent, mapping to a non-LTR retrotransposon. These two breakpoints coincided with sharp changes in between-morph divergence (Dxy; Supplementary Fig. 7) and increased heterozygosity in Faeder and Satellite individuals, indicating that the inversion is heterozygous in Faeders and Satellites.
Confirming that the inversion haplotype is a lethal recessive
The pedigree enables tests of lethality of the inversion haplotype(s) carrying the Faeder and Satellite alleles. Since Satellite and Faeder morphs are imperfectly identified phenotypically, we first identified SNPs co-segregating with Faeder or Satellite. Eighteen SNPs co-segregated with Faeder with zero recombination fraction and a LOD score > 15. For two of these (Contig3047:314,715; Contig3047:314,697), we identified > 30 progeny where both parents were heterozygous for the inversion haplotype. Under Mendelian segregation, we expected a 1:2:1 ratio of genotypes AA:AB:BB among the progeny, where A is the ancestral-ordered and B is the inversion-associated allele. At both SNPs the ratio of genotypes deviated significantly from expectation (Table 1), suggesting that inversion homozygotes are lethal. Furthermore, the observed ratio of AA:AB also exceeded Mendelian expectations (1:2), suggesting that heterozygotes also carry a viability cost (Table 1).These results were supported by two further SNPs that perfectly co-segregated with Satellite alleles and where > 30 progeny of heterozygous × heterozygous matings were observed (Table 1). For both, the lack of inversion homozygotes and the deficit of heterozygotes were significant. The lethality of the inversion was not sex-specific and is probably not morph-specific, although this is difficult to confirm due to some unknown female phenotypes. In both sexes, the absence of inversion homozygotes was statistically significant and we produced more non-inversion homozygotes than heterozygotes, although the departure from the expected 1:2 ratio was only significant in females.We did not specify family structure in Table 1, as fertilizations in birds are independent events. A more conservative test estimated a G-test statistic for each of 5 paternal half-sibships with 4 or more offspring (27 offspring across 5 families), and compared the summed the G-test statistics against a chi-square distribution. The complete absence of inversion homozygotes remained statistically significant (Chi-square = 12.2, df = 5, P = 0.03).
Population genomics and phylogenomics
Divergence (Dxy) and heterozygosity were obtained using Evolib (see URLs), with sliding windows, nucleotide diversity and Tajima’s D calculated using Slide (see URLs). Maximum-likelihood reconstruction of morph phylogenies was performed using RAxML[75] under the generalised time-reversible substitution model with a gamma model of rate heterogeneity (GTRGAMMA), with maximum-likelihood searches performed on 50 randomized stepwise addition parsimony trees. Orthologous regions of the outgroup species, killdeer (Charadrius vociferus)[76], were obtained through blasting and performing multiple alignment with the phased haplotypes.
Haplotype calling and analysis of morph-specific amino-acid differences
We established haplotypes of alleles differing from the reference for five 80x resequenced wild ruffs for all inversion contigs. We used read-backed phasing, implemented in GATK[65] (version 3.3.0), phasing SNPs co-occurring on the same (or paired) sequence reads into the same haplotype. We refrained from haplotype calling in the region within 3 kbp of the breakpoints. Inversion haplotypes were, on average, longest for Faeders (14.1 kbp), then Satellites (3.4 kbp), then Independents (1.0 kbp). Deletions longer than 100 bp were identified, based on a morph-specific drop in read coverage (read depth reduced to approximately 50% in inversion carriers, Supplementary Fig. 5).After phasing, we used GATK’s FastaAlternateReferenceMaker tool to generate haplotype-specific fasta files for each contig. Using these haplotypic sequences, we predicted genes and established the amino-acid sequences using Augustus (see ‘Annotation’ above). One hundred (94 with known homologues) of the 125 predicted genes in the inversion were identified by the trained Augustus algorithm established for gene prediction in the reference (Supplementary Table 6). We then aligned amino-acid sequences of all six haplotypes for these genes using the CLUSTALW algorithm (MEGA6[77] #6140220) and identified consistent morph-specific amino acid changes in 78 genes. Candidate gene predictions (Supplementary Table 7) were verified by comparing mRNA and BLAST evidence to available gene models suggested by Maker and Augustus. Where predictions conflicted, we chose the best supported model. We resolved the complex copy number variation, rearrangement and deletion in the Faeder haplotype of the PLCG2 locus by combining coverage maps with rearrangement-spanning read pairs.
Authors: Muhammad L Aslam; John W M Bastiaansen; Richard P M A Crooijmans; Addie Vereijken; Hendrik-Jan Megens; Martien A M Groenen Journal: BMC Genomics Date: 2010-11-20 Impact factor: 3.969
Authors: Devon E Pearse; Nicola J Barson; Torfinn Nome; Guangtu Gao; Matthew A Campbell; Alicia Abadía-Cardoso; Eric C Anderson; David E Rundio; Thomas H Williams; Kerry A Naish; Thomas Moen; Sixin Liu; Matthew Kent; Michel Moser; David R Minkley; Eric B Rondeau; Marine S O Brieuc; Simen Rød Sandve; Michael R Miller; Lucydalila Cedillo; Kobi Baruch; Alvaro G Hernandez; Gil Ben-Zvi; Doron Shem-Tov; Omer Barad; Kirill Kuzishchin; John Carlos Garza; Steven T Lindley; Ben F Koop; Gary H Thorgaard; Yniv Palti; Sigbjørn Lien Journal: Nat Ecol Evol Date: 2019-11-25 Impact factor: 15.460
Authors: Erik I Svensson; Stevan J Arnold; Reinhard Bürger; Katalin Csilléry; Jeremy Draghi; Jonathan M Henshaw; Adam G Jones; Stephen De Lisle; David A Marques; Katrina McGuigan; Monique N Simon; Anna Runemark Journal: Nat Ecol Evol Date: 2021-04-15 Impact factor: 15.460
Authors: Cuong Nguyen Huu; Barbara Keller; Elena Conti; Christian Kappel; Michael Lenhard Journal: Proc Natl Acad Sci U S A Date: 2020-08-31 Impact factor: 11.205
Authors: Donna L Maney; Jennifer R Merritt; Mackenzie R Prichard; Brent M Horton; Soojin V Yi Journal: Horm Behav Date: 2020-09-19 Impact factor: 3.587
Authors: Matthew B Toomey; Cristiana I Marques; Pedro Andrade; Pedro M Araújo; Stephen Sabatino; Małgorzata A Gazda; Sandra Afonso; Ricardo J Lopes; Joseph C Corbo; Miguel Carneiro Journal: Proc Biol Sci Date: 2018-10-03 Impact factor: 5.349