Literature DB >> 35100370

Genomic prediction of growth in a commercially, recreationally, and culturally important marine resource, the Australian snapper (Chrysophrys auratus).

Jonathan Sandoval-Castillo¹, Luciano B Beheregaray¹, Maren Wellenreuther^2,3.

Abstract

Growth is one of the most important traits of an organism. For exploited species, this trait has ecological and evolutionary consequences as well as economical and conservation significance. Rapid changes in growth rate associated with anthropogenic stressors have been reported for several marine fishes, but little is known about the genetic basis of growth traits in teleosts. We used reduced genome representation data and genome-wide association approaches to identify growth-related genetic variation in the commercially, recreationally, and culturally important Australian snapper (Chrysophrys auratus, Sparidae). Based on 17,490 high-quality single-nucleotide polymorphisms and 363 individuals representing extreme growth phenotypes from 15,000 fish of the same age and reared under identical conditions in a sea pen, we identified 100 unique candidates that were annotated to 51 proteins. We documented a complex polygenic nature of growth in the species that included several loci with small effects and a few loci with larger effects. Overall heritability was high (75.7%), reflected in the high accuracy of the genomic prediction for the phenotype (small vs large). Although the single-nucleotide polymorphisms were distributed across the genome, most candidates (60%) clustered on chromosome 16, which also explains the largest proportion of heritability (16.4%). This study demonstrates that reduced genome representation single-nucleotide polymorphisms and the right bioinformatic tools provide a cost-efficient approach to identify growth-related loci and to describe genomic architectures of complex quantitative traits. Our results help to inform captive aquaculture breeding programs and are of relevance to monitor growth-related evolutionary shifts in wild populations in response to anthropogenic pressures.

Entities: Chemical

Keywords: GenPred; aquaculture; ddRAD; ecological genomics; fisheries genomics; genome-wide association studies; genomic prediction; reduced genome representation; shared data resource; teleost

Mesh：

Year: 2022 PMID： 35100370 PMCID： PMC8896003 DOI： 10.1093/g3journal/jkac015

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.542

Introduction

Body size is considered one of the most important organismal traits because it influences several biological characteristics, from survivorship to fecundity (Peters 1986). It has significant ecological and evolutionary consequences, not just for the individual but for its population and community (Lorenzen 2016; Dijoux and Boukal 2021). Moreover, the somatic growth patterns of exploited fishes have a considerable role in the sustainability and economic viability of their fisheries stocks and aquaculture programs (Gjedrem et al. 2012; Murua et al. 2017; Barneche et al. 2018). These characteristics make growth and related traits major targets for selective breeding and human-induced evolutionary studies (Sharpe and Hendry 2009; Gjedrem et al. 2012; Denechaud et al. 2020; Huang et al. 2021). Teleost fish typically show indeterminate growth, but both the growth rate and growth-related plasticity are determined by the genetic makeup and the prevalent environmental conditions (De-Santis and Jerry 2007; Boltaña et al. 2017; Tung and Levin 2020). However, despite the general importance of growth, the genetic bases of growth traits in marine species are still not well understood. Anthropogenic activities are rapidly transforming environments, resulting in changes to the selection landscape that organisms to which are exposed, and consequently, phenotypic changes in response to altered selection pressures (Hendry et al. 2017; Laufkötter et al. 2020; Silvy et al. 2020). Fish growth rates can respond rapidly to external selective pressures by phenotypic plasticity or by changes in genotypic frequencies in the population (Bowles et al. 2020; Denechaud et al. 2020; Oke et al. 2020; Pinsky et al. 2021). Fish growth is also a quantitative trait that is easy to measure, and as such, can be used to assess physiological and demographic responses to human-mediated environmental changes (Denechaud et al. 2020). In addition, pressures from capture fisheries have produced one of the fastest rates of phenotypic change observed in wild populations (Oke et al. 2020), with effects not only to targeted species but also to associated communities (Andersen et al. 2016; Dijoux and Boukal 2021). However, size-selective harvesting and climatic effects could be species and system specific (Denechaud et al. 2020), and despite increasing concerns, the evolutionary consequences have so far been investigated in only a limited number of species. Thus, the extent to which changes to growth rate in response to human activities are evolutionary or plastic, and the degree to which are they reversible, remain debated (e.g. Pinsky et al. 2021). The idea of rapid evolutionary responses due to fisheries pressures and aquaculture practices is supported by both theoretical and empirical evidence. In theory, high rates of harvesting of a population will favor earlier sexual maturity and slower growth (Uusi‐Heikkilä et al. 2015; Monk et al. 2021). It is also expected that selective breeding programs increase the frequency of genetic variants underlying improved growth through the successive selective breeding of only the fastest growing individuals in each cohort. Experiments in the laboratory, where either large or small individuals are removed for several generations, have shown drastic changes in allele frequencies, loss of genetic diversity, and increased linkage disequilibrium in specific parts of the genome (Le Rouzic et al. 2020; Valenza‐Troubat et al. 2021a). Some genes and genomic regions associated with growth have been identified in a few marine fish species (Wang et al. 2015; Robledo et al. 2016; Wu et al. 2019; Zhou et al. 2019; Yang et al. 2020; Valenza‐Troubat et al. 2021c). However, because of the apparently polygenic nature and high phenotypic plasticity associated with the trait, the genetic basis of growth is not well understood, and its study is a persistent challenge (Wellenreuther and Hansson 2016; Gong et al. 2021; Valenza‐Troubat et al. 2021a). The combination of genotyping based on reduced genome representation data sets and genome-wide association studies (GWAS) potentially provides a cost-effective and convenient approach for accurate localization and identification of growth-related genomic regions in fishes. The Australian snapper (Chrysophrys auratus) is a large (∼1 m) and long-lived (>50 years) marine teleost (Fig. 1) that supports important recreational and commercial fisheries in Australia, and New Zealand (Parsons et al. 2014; Cartwright et al. 2021). The species is also of major cultural importance, having been caught by New Zealand’s indigenous Māori people for hundreds of years, and is considered taonga (a treasure) by them. The species stocks have large biomass, but since the early 2000s most have shown declines, with some losing over 75% of their biomass (Cartwright et al. 2021). These reductions in snapper productivity have negative economic and ecological consequences that could be amplified by global environmental change (Ogier et al. 2020; Parsons et al. 2020). A strategy for mitigating some of these negative effects, while enhancing wild snapper stock productivity, is the development of captive production. Although there is not yet commercial aquaculture of the species, research programs intending to develop Australian snapper into a profitable and sustainable aquaculture species exist in New Zealand and Australia (Catanach et al. 2019; Ashton et al. 2019a). The goal of our work is to contribute to both the long-term New Zealand efforts to develop this aquaculture activity, and the understanding of the genetic bases of growth-related traits in marine fish more generally. We took advantage of several resources available from the New Zealand snapper aquaculture program, including a chromosome-scale assembly (Catanach et al. 2019) and fish from the New Zealand snapper breeding program representing extreme growth phenotypes following a grow-out phase in a sea pen. Using these resources, we then applied reduced genome representation to conduct GWAS analyses to identify growth-related genetic variation in a cost-efficient way. In addition, we implemented a Bayesian mixture model to describe the genetic architecture of growth and to test the accuracy of size phenotype prediction based on single-nucleotide polymorphism (SNP) genotypes. This information will help to improve the selection of high-quality broodstock and the monitoring of human-induced evolution in wild stocks.

Fig. 1.

Juveniles of Australian snapper (Chrysophrys auratus) reared in captivity at The New Zealand Institute for Plant & Food Research Limited finfish facility.

Methods

Fish rearing and grow-out conditions in the sea pen

In 2016, a snapper breeding program was started in Nelson (New Zealand) with wild-caught animals at The New Zealand Institute for Plant and Food Research Limited (PFR). Uncontrolled mass spawning of a wild-caught broodstock at the PFR finfish facility at the Nelson Research Centre produced 2 batches of eggs that hatched on 2015 January 31 and February 7, respectively. Mass spawning is the typical method for most bream species, where group interactions are often necessary to induce spawning behavior, with all individuals being able to mate freely with other individuals in the broodstock tank population. These F1 snapper probably contained a diverse mix of related individuals including full-siblings, half-siblings, and unrelated individuals. Approximately 80,000 larvae were reared in 5,000-l tanks held at 21°C and supplied with a filtered, UV-sterilized supply of incoming water. First standard grading of animals occurred on 2015 May 1, whereupon approximately 54,000 fish were retained for on-growing. This reduction in fish numbers was also caused by natural mortality events common in larval rearing, particularly during the first few weeks, due to the lack of a fully functional immune system (Bricknell and Dalmo 2005), digestive system (Holt 2011), osmotic system, and compromised host microbe interaction (Vadstein et al. 2013) in young fish larvae. Routine grading was repeated every 2–6 weeks thereafter to ensure that fish have sufficient space and to maintain animal welfare. Smaller animals were pooled separately from larger individuals but were retained at the same ration and water temperatures. Overstocking was prevented by grading fish into additional 5,000-l tanks as required. The water supply of the tanks was retained at temperatures above environmental (ambient) temperature, set to a minimum of 16°C. When fish were around 180 days old, they were graded following standard husbandry procedures to remove fish with deformities and lesions. Following the grading, approximately 17,000 individuals of mixed sizes were moved to a 50,000-l, another 10,000 large individuals were retained in two 5,000-l tanks, and another 12,000 small individuals were separated into 2 additional 5,000-l tanks, totaling to 39,000 fish that were kept for further on growing in the facility. These fish movements into new tank space are routine procedures in fish farming aimed to provide fish with the necessary space for optimal grow out, where the separation of different size classes aims to avoid intercohort cannibalism and food competition. Following the movement into new tanks, all fish were retained on equivalent rations and at temperatures above ambient. Stocking densities were held below 25 kg/cm3 in all instances. Final grading of the cohort to again removed fish with deformities and lesions was performed in the week of 23 November 2015. Fish suitable for transport to the sea pen (i.e. fish in good condition, uninjured, and of normal conformation) were graded to a minimum of 100-mm fork length and hand counted, yielding 21,891 individuals. Transport to PFR’s sea pen in Beatrix Bay (Pelorus, Marlborough) took place on 2015 December 16, when fish were either 319 or 312 days old. On 2017 April 26–27, the snapper sea pen trial was ended following 17.5 months of on-growing. The mean size of snapper at harvest was 238-mm fork length, where sizes ranged between 139- and 301-mm fork lengths (Cook and Black 2017). Of these snappers, 200 of the smallest and 200 of the largest were selected by eye for this study, roughly falling within the 5% extremes of the size distribution (representing 5% of the smallest and 5% of the largest fish). We have deliberately opted for such a sampling design of extreme phenotypes because it improves the power of QTL studies without incrementing their false discovery rate. This also reduces the cost of genome scale variant calling, which could be prohibitively expensive (e.g. Li et al. 2019). This is indeed a standard approach in QTL study design, and is akin to the rationale used for the selection of recombinant inbreed lines for QTL mapping studies (for an exhaustive review of study designs for fish QTL studies in the genomics era see: Ashton et al. 2017). In this study, the extreme growth phenotypes were selected during harvesting of a sea pen snapper cohort that were on-grown in this environment for well over a year. This deliberately capitalizes on the power that comes with the application of an extreme study design, as applied in several investigations seeking to identify the genomic basis of phenotypic traits in plants (e.g. Yang et al. 2015) and fish (Houston et al. 2008; Wang et al. 2008; al. 2015).

DNA extraction and library preparation

Total genomic DNA was extracted from fin tissue samples collected from the 5% biggest (L) and the 5% smallest (S) individuals (200 for each size group), following the protocol described by Ashton et al. (2019). The concentration, purity, and integrity of the extractions were assessed using Qubit (Life Technologies), NanoDrop (Thermo Scientific), and 2% agarose electrophoresis gels, respectively. The 188 samples with the best quality for each size group, plus 8 replicates, were used for the preparation of 364 libraries using the reduced genome representation technique of ddRADseq (double-digest restriction site-associated DNA sequencing). We followed the protocol described by Peterson et al. (2012) with a few modifications as described by Sandoval-Castillo et al. (2018). For each sample, approximately 300 ng of DNA was digested with the restriction enzymes SbfI and MseI, and then ligated with forward and reverse adaptors, with forward adaptors including 1 of 96 individual barcodes designed in-house. Libraries were size selected for 250–800-bp fragments with a Pippin Prep (Sage Science), and then amplified using PCR. Individual libraries were mixed in equimolar concentration in 4 pools of 96 samples, and each pool was sequenced in an Illumina HiSeq 4000 lane (paired-end 150 bp) at Novogene Co.

Sequence quality checking and filtering

Sequence quality was checked using FASTQC 0.11 (Andrews 2011). Raw reads were then demultiplexed into individual samples using “process_radtags” from STACKS 1.5 (Catchen et al. 2013). Barcodes, rag-tags, bad quality base pairs (slide window 5, Q < 25), and adapters were trimmed using TRIMMOMATIC 3.9 (Bolger et al. 2014). Reads with more than 5% “Ns,” average quality lower than 20, and length shorter than 70 bp were also removed. SNPs were called using the GATK pipeline, following recommendations by Van der Auwera et al. (2013). We used BOWTIE 2.3 (Langmead and Salzberg 2012) to align filtered reads to the chromosome-level Australian snapper genome (Ashton et al. 2019b). We improved the alignments by locally realigning indels, then called SNPs using BCFtools 1.19 (Danecek et al. 2021). To reduce the number of false variants produced as inherent artifacts of the ddRAD sequencing approach (e.g. paralogous, sequencing errors, allele drop), several filtering steps were applied to the raw, called SNPs (Supplementary Table 1) using VCFTOOLS 0.1.16 (Danecek et al. 2011). First, we removed SNPs called in <80% of the samples, with a minimum allele frequency <3%, or with allele balance <20% or >80% on heterozygote genotypes. After removing indels, and to avoid sequencing or misalignment errors, we selected SNPs with ≥20% depth/quality ratio and ≥30 mapping quality (Li 2014). Additionally, to reduce paralogous loci and allelic dropout, we eliminated SNPs with too high or too low coverage (>mean depth plus twice the SD, or ≤10× average depth per individual). Loci that did not conform to Hardy Weinberg Equilibrium in samples of both size groups, missing in more than 5% of samples of either of the size groups, or that showed discrepancy between 2 or more replicates, were also removed. For some analyses, we created an unlinked data set by pruning SNPs using a window size of 100 kb, a variance inflation factor of 5 and an R2 of 0.5 in Plink 1.9 (Chang et al. 2015). As recommended for GWAS analyses, we removed individual that deviate ±3SD from the average heterozygosity (Marees et al. 2018). Because inbreeding and relatedness can affect GWAS analyses, we calculated the individual inbreeding coefficient (FIS) in Plink 1.9 and pairwise relatedness in VCFTOOLS using the unlinked SNP data set (11,588). We use the algorithm implemented in Plink that is based on the relative observed and expected homozygosity under Hardy–Weinberg equilibrium. We used the formula F = (O#hom − E#hom)/(1 − E#hom). Where O#hom and E#hom are the observed and expected number of homozygous genotypes in the sample, respectively. We then compared the average FIS and relatedness of each size group using a nonparametric Mann–Whitney U test. Age, sex, and population structure are factors that can potentially influence GWAS analyses, but given all fish used in this study were from the same population and age class and were immature at the time of the terminal sampling, these factors were considered not to bias the GWAS analyses. With the aim of identifying SNPs involved in growth-related traits, we compared the allele frequencies between size groups using 3 different methods, each of which implement a different approach to control for relatedness and possible inbreeding. Firstly, based on identical-by-state pairwise genetic distances, we used complete linkage agglomerative clustering to create multidimensional scaling plots (MDS). Then, to control for family group substructure, we used MDS components as covariables in a logistic regression implemented in Plink and calculated asymptotic P-values to test for significance. This single-locus approach tests for odds of the minor allele frequency of each SNP being associated with either phenotype (in this case large or small fish), when adjusting for potential confounders. Secondly, a pairwise kinship matrix was used as a covariable in a linear polygenic mixed model. The significance of the model was calculated using the FAmily-based Score Test for Association, all implemented in the R package GenABEL (Aulchenko et al. 2007). Thirdly, the same pairwise kinship matrix was used as a covariable in a mixed linear model, but the significance was tested using a combination of a sequence kernel association test and a likelihood-ratio (LR) test, comparing a model with association against a model of nonassociation; these approaches were implemented using the R package rainbowR (Hamazaki and Iwata 2020). Like GenABEL’s model, rainbowR tests for a significant contribution of a polygenic component in the total genetic variance between size groups, when considering a random component and multiple covariance component effects. The main difference is implementation of LR in rainbowR, which is expected to better control for false positives (Hamazaki and Iwata 2020). These analyses were based on single-SNP effects, but considering that gene function could be more commonly influenced by the small effects of several SNPs combined than the large effect of 1 SNP, haplotype-based approaches can improve the detection power of causal variance (Hamazaki and Iwata 2020; Sinclair-Waters et al. 2020). We therefore implemented 2 different methods to calculate haplotype blocks in rainbowR. One method considers pairs of variants within 200 kb of each other that show high coefficients of linkage disequilibrium (DlowerCI90 > 0.7 and DhigherCI90 > 0.98), as implemented in Plink (Chang et al. 2015). The other method uses a sliding window of 15 to cluster the closest 15 SNPs, implemented directly in rainbowR (Hamazaki and Iwata 2020). For each GWAS analysis, we calculated a significant P-values threshold following Brzyski et al. (2017). For this, we clustered the number of tests (SNPs or Haplotypes) by performing a LD clumping with a threshold r2 > 0.1 at maximum distance of 1 Mb. Then, the best P-values per cluster was selected and the maximum P-value significant at 5% false discovery rate was used as threshold. In addition to investigating the genetic architecture of growth in Australian snapper, we examined the proportion of size variance explained by each SNP and chromosome using the package bayesR (Moser et al. 2015) with default priors. This software estimates the genetic variance explained by each SNP and describes the genetic architecture of complex traits using Markov chain Monte Carlo and Bayesian mixture models. We calculated SNP-based heritability () as the proportion of phenotypic variance explained by all SNPs and considered this an approximation of the genome narrow-sense heritability (). Accuracy of genomic prediction was evaluated using 10 runs of bayesR, in each of which 80% of samples were randomly selected as training data and the remaining 20% was used as validation data. Since we used size as a categorical variable, the accuracy was measured in the validation sets as the beta coefficient of a logistic regression (predicted phenotype as a function of real phenotype) divided by the square root of heritability (). To explore the function of the candidate SNPs, we clustered SNPs in sliding windows of 10 kb and extracted the sequence of each cluster from the reference genome. These sequences were then aligned to a Teleostei protein database obtained from UniProt (2021), using blastx 2.2.28 (Camacho et al. 2009) with a minimum alignment length of 30 amino acids, a similarity score of 50%, and an e-value threshold of 1×10−06. Enrichment of genetic functions on the putative candidate genes was explored using the gene ontology (GO) terms provided by UniProt and the R package topGO 2.26 (Alexa et al. 2016). Pathways and protein interaction analyses were performed using the web servers String 11 (Szklarczyk et al. 2021) and reactome 3.7 (Jassal et al. 2020).

Results

Approximately 3.6 billion raw reads were obtained from the 4 Illumina lanes. After demultiplexing, trimming, and filtering, we retained over 3.5 million pairs of reads per sample, from which >83% were successfully aligned to the snapper reference genome. From these, 1,273,017 variable sites were identified. After strict filtering, the data set was reduced to 17,490 high-quality SNPs, which were evenly distributed among the 24 chromosomes of the reference genome (Supplementary Table 2). After removing replicates and 12 samples (5 large and 8 small fish) owing to large amounts of missing data or extreme heterozygosity, 363 individuals (183 large and 180 small) were retained, with an average of 0.8% missing data. The mean inbreeding coefficient was slightly higher in the small fish group (large =−0.1225 ± 0.1990; small=−0.0486 ± 0.1805, P< 0.01). However, all individuals in both groups had inbreeding coefficients lower than 0.15, with the average per group lower than 0. As such, we considered the extent of inbreeding to be low in both groups. Average relatedness was not significantly different within compared with between the 2 size groups (large = 0.0089 ± 0.1320; small = 0.0098 ± 0.1170; large vs small = −0.0071 ± 0.1353; all P> 0.1). The 3 single-SNP-based methods identified 31 candidate loci under selection between large and small fish. Of these 9, 11, and 12 were selected by Plink, GenABEL, and rainbowR, respectively (Fig. 2). None of these was identified by 2 or more tests; however, these 31 candidates were in 10 chromosomes and 1 scaffold, and most of them (18 loci; 58%) were in physical proximity to 1 region of chromosome 16 (Fig. 2). The haplotype-based methods identified 77 SNPs in 7 unique blocks as candidates, with 3 of these blocks identified by the 2 approaches (Fig. 3). The 7 blocks were localized on 5 chromosomes and 1 scaffold, but more (3 blocks; 43%) were found on chromosome 16 than on any other chromosome (Fig. 3). All these results suggest a key role of chromosome 16 on growth in the Australian snapper. This was concordant with the bayesR results, which found that the SNPs on chromosome 16 explained 16.4% of the size variance in our samples (Fig. 4 and Supplementary Table 2). Interestingly, this analysis suggested that a significant amount of growth variance in the Australian snapper can be explained by large effects of a relatively small number of SNPs (60 SNPs; Supplementary Table 3). However, to explain as much as 76% of the phenotypic variation, small effects of many loci need to be considered (2,102 SNPs; Supplementary Table 3). Based on the 10 repetitions of crossvalidation, the phenotype prediction accuracy was considerably high (74.4%), but expected given that the set of SNPs can explain most of the phenotypic variation in our data.

Fig. 2.

Fig. 3.

Manhattan plot of results from the 2 haplotype-based genome-wide association analyses of growth on 363 Australian snapper (Chrysophrys auratus) using 17,490 SNPs in 1,337 blocks and 8,201 SNPs in 2,959 windows. Haplotypes are plotted according to their chromosomal position against their −log 10 (P-value). Significant haplotypes are the diamonds over the horizontal line (−log 10(P)>3).

Fig. 4.

SNP-based heritability () of growth in Australian snapper (Chrysophrys auratus) estimated from Bayesian mixture models using 17,490 SNPs in 24 chromosomes. a) Convergence of MCMC sampling with average heritability ( = 0.757) shown by the red horizontal line. Proportion of heritability or genetic variance of growth explained by each chromosome as a function of b) chromosome size in megabase pairs and c) number of SNPs per chromosome.

Manhattan plot of results from the 3 single-SNP-based genome-wide association analyses of growth for 363 Australian snapper (Chrysophrys auratus) using 17,490 SNPs. SNPs are plotted according to their chromosomal position against their −log 10 (P-value). Significant SNPs are the diamonds over the horizontal line (−log 10(P)>4). Manhattan plot of results from the 2 haplotype-based genome-wide association analyses of growth on 363 Australian snapper (Chrysophrys auratus) using 17,490 SNPs in 1,337 blocks and 8,201 SNPs in 2,959 windows. Haplotypes are plotted according to their chromosomal position against their −log 10 (P-value). Significant haplotypes are the diamonds over the horizontal line (−log 10(P)>3). SNP-based heritability () of growth in Australian snapper (Chrysophrys auratus) estimated from Bayesian mixture models using 17,490 SNPs in 24 chromosomes. a) Convergence of MCMC sampling with average heritability ( = 0.757) shown by the red horizontal line. Proportion of heritability or genetic variance of growth explained by each chromosome as a function of b) chromosome size in megabase pairs and c) number of SNPs per chromosome. From both single-SNP and haplotype block approaches, 100 unique candidates were identified. We extracted 65 contigs of 10 kb that included all unique candidate SNPs. Of these contigs, 57 were annotated to 51 proteins, where 16 SNPs were localized in exon regions. The proteins were assigned to 153 GO terms, but none of the functions or pathways were significantly enriched (P = 0.18; results not shown). Despite this, most of these genes were involved in the metabolism of proteins, cell organization, and tissue development, including 5 directly involved in growth (Supplementary Table 4).

Discussion

In this study, we successfully identified 100 growth-related SNPs, of which 60 were located on chromosome 16. We also provide estimates of parameters related to the genomic architecture of growth, including heritability, and the number and effect sizes of SNPs contributing to size variation. These results demonstrate that the reduced genome representation SNPs, combined with appropriate bioinformatic tools, provide a cost-efficient approach to identify growth-related loci and to broadly describe complex trait genomic architectures. Our results provide crucial information to aquaculture breeding programs about the targets of selection to enhance growth in this and related species. Knowledge about the genomic basis of growth also improves our general understanding of the impacts of anthropogenic stressors, such as climate change and extractive fisheries, on growth trait evolution in wild fisheries stocks.

Genome architecture of a complex growth trait

Several studies have identified genes with large effects underlying simple Mendelian traits, but because of the statistical limitations of single locus GWAS approaches, they have been less successful with polygenetic quantitative traits like growth or size (Wellenreuther and Hansson 2016). Perhaps the most famous example of this is human height, where even with large samples sizes (∼250,000 people), <20% of variation can be explained using genome-wide markers (Yengo et al. 2018; Sohail et al. 2019). Akin to many other nonhuman studies, our study relied on a relatively small sample size, which reduced the power to detect loci of medium or small effect. However, by transforming the quantitative trait to a qualitative trait (big vs small) by comparing extreme phenotypes representing the 5% extremes of the phenotypic variation, and by using both haplotype-based and Bayesian mixture model approaches, we not only identified several growth-related loci, but also confirmed the polygenetic nature of the trait in the Australian snapper. These growth-related genes were detected in F1 obtained from a wild broodstock thus indicates that these growth genes are segregating in wild snapper populations. Wild snapper populations in New Zealand show high genetic diversity (Papa et al. 2021), which is common for many marine teleost species with large population sizes (Nielsen et al. 2009; Koot et al. 2021). The snapper cohort used here was obtained following mass spawning of wild broodstock. While our on-growing of snapper resulted in natural mortality and only retained healthy individuals (e.g. grading removed deformities and fish with lesions) of the starting 80,000 strong cohort, our study population contained high genome-wide diversity (observed heterozygosity for the 363 graded snapper was 0.307), providing sufficient diversity for our GWAS. As such, our study provides a clear example of how switching the focus away from single significant loci, to using information from all genotyped SNPs, can improve the statistical power to detect polygenic components of important phenotypic variants, including some regions with small but significant effects. Our results suggest that a relatively small number of SNPs (<20K) is enough to explain most of the phenotypic variation related to growth. The overall heritability was generally higher than those reported in other studies (but see Valenza‐Troubat et al. 2021a), however, the large heritability estimate is likely confounded upwards due to the well-known inflation of allele substitution effects and heritability estimates when using selective genotyping approaches (Falconer and Mackay 1996). Moreover, the 100 candidate SNPs explain similar phenotypic variation (4.4%) to those in other fish studies (0.1–19.0%; Wu et al. 2019; Gong et al. 2021), including a quantitative trait loci analysis of the same species (Ashton et al. 2019b). As expected in complex traits like growth, individual candidate effects were small and multiple candidates were required to explain most of the phenotypic variation (Wellenreuther and Hansson 2016; Sinclair-Waters et al. 2020). However, only 4 candidates in the last 100 kb of chromosome 16 could explain up to 3% of the total phenotypic variation. In addition, if we consider all SNPs in this chromosome, the explanatory power is elevated to over 16%. The variation explained by the markers on chromosome 16 is very high and could suggest physical linkage of genetic variation associated with growth. This is consistent with empirical examples of evolution of multiple linked variations that together modify the function of a gene or a complex of genes (Koshikawa et al. 2015; Kingman et al. 2021). This type of evolution promotes the formation of tightly linked haplotype blocks, allowing for the selection and inheritance of multiple sites with effect over several aspects of a complex trait, such as growth. Haplotype-based evolution of polygenic regulation of growth could explain the lack of overlapping candidates from different analyses, such lack of concordance is expected when compare different GWAS algorithms (Manunza et al. 2014; Dong et al. 2016). The statistical approach of each method will capture loci with a particular degree of effect, and therefore only a fraction of the complex polygenic architecture. Such architecture would be more likely to include a mixture of loci with large, medium, small, pleiotropic, and synergic effects (Wellenreuther and Hansson 2016; Ashraf et al. 2020; Sinclair-Waters et al. 2020). Although the selected candidate SNPs differed between methods, all methods selected at least 1 candidate SNP within the last 5% of chromosome 16’s length. This chromosome region is broadly consistent with putative growth-related quantitative trait loci peaks identified for 3 growth traits in 1-year-old Australian snapper reared in a land-based finfish facility (Fig. 5 in Ashton et al. 2019b), and several candidate genes on chromosome 16 were also detected in a study investigating domestication selection following selective breeding for improved growth in this species (Baesjou and Wellenreuther 2021). In the study by Ashton et al. (2019b), chromosome-level additive variance was not calculated; however, their results also suggested that chromosome 16 tended to explain most of the phenotypic variation. Here, neither chromosome length nor SNPs per chromosome was significantly correlated with the proportion of variation explained (Fig. 4). This suggests that the high relative contribution of chromosome 16 is not just an artifact of our reduced genome representation approach, but an important part of the genetic architecture of growth in Australian snapper. The results from this study being in part consistent with those by Ashton et al. (2019) also shows that the genomic basis of growth is comparable across different rearing environments (land-based finfish facilities vs a sea pen). Chromosome 6 showed the second largest number of candidates, but the SNPs in this chromosome explained <4% of the phenotypic variation. In contrast, chromosomes 1 and 22 each explain over 8% of the phenotypic variation, despite chromosome 1 having a similar number of candidates and chromosome 22 having only a few candidates compared with chromosome 6. This is additional evidence for the very complex genomic architecture of growth, which is composed by a mixture of several loci with small effects detected by the haplotype approaches on chromosome 6, and a few loci with larger effects detected by single-SNP approaches, especially on chromosome 22. Thus, it is important to combine conceptually different approaches to extract information from all markers, to gain a better understanding of the genetic components of phenotypic variation (de Maturana et al. 2014; Akond et al. 2021; Gong et al. 2021).

Candidate genes

The high-quality genome assembly made it possible to identify 51 genes surrounding the 100 SNP candidates (Supplementary Table 4). Although most of these SNPs were in introns, several were in the proximity of, or inside, exons of the relevant candidate gene. Nevertheless, we are cautious about inferred associations. Previous studies have suggested several candidate genes and pathways for teleost growth-related traits (De-Santis and Jerry 2007; Johnston et al. 2011; Baesjou and Wellenreuther 2021). Of these, the somatotropic axis is perhaps the most important, since it plays a central role in the regulation of metabolic and physiological processes involved in fish growth (De-Santis and Jerry 2007). Although key genes in this axis include the growth hormone (Irving et al. 2021) and insulin-like growth factors, several other factors, carriers, and receptors are implicated in this pathway. Several of our candidate SNPs were near the genes from the somatotropic axis, including the fibroblast growth factor 18 (FGF18) and growth factor 5 (GDF5). Both genes are essential regulators of cell growth, cell differentiation, morphogenesis, tissue growth, and tissue repair, especially during skeleton development (Jovelin et al. 2010; Nagayama et al. 2013), potentially affecting multiple traits. Concordantly, FGF18 has been associated with growth variation in large yellow croaker, Larimichthys crocea (Zhou et al. 2019). Two additional candidate genes detected, nucleophosmin 3 (NPM3) and transmembrane protein 132E (TMEM132E); both associated with the regulation of growth factors. NPM3 is predicted to be a histone chaperone protein modulating replication of DNA and gene expression during development (Wu et al. 2009). Because of proximity, it coactivates with the fibroblast growth factor 8 gene, and both regulate development, cell growth, and proliferation in vertebrates (Kikuta et al. 2007). TMEM132E is a component of the cell membrane and is implicated in the regulation and transport of insulin-like growth factors. This gene is also involved in the posterior lateral line (Li et al. 2015) which, as a mechano-sensory organ, has physiologically demanding growth requirements (Sapède et al. 2002). Biosynthesis and retention of proteins is fundamental for tissue development, and the efficiency of these processes determines growth rate (Fraser and Rogers 2007). Seven of our candidates are directly involved in metabolism of proteins (Supplementary Table 4), 6 of which are on chromosome 16. Of these, the prolyl 3-hydroxylase 1 (P3H1) and the serpin peptidase inhibitor clade H (SERPINH1) are notable in fishes for their implication in biosynthesis of collagen and assembly of collagen fibrils (Oecal et al. 2016; Tonelli et al. 2020). Collagen is essential for the structure and strength of bones, skin, muscle, and cartilage tissues, and therefore integrity during somatic growth (Fraser and Rogers 2007). Moreover, collagen metabolic processes have been associated with changes in growth rates and responses to temperature increments in the Australian snapper (Wellenreuther et al. 2019). Hence, variation in P3H1 and SERPINH1 could have a direct effect on the collagen metabolism efficiency and thus growth rate in this species. Other important processes for tissue growth include cell proliferation, differentiation, migration, and adhesion. From our candidates, 7 genes were associated with these cell processes (Supplementary Table 4), including laminin subunit alpha 3 (LAMA3), related extracellular matrix protein 2 (FREM2), and periostin (OSF2). These 3 genes have a regulatory function in cell migration and adhesion, and are involved in organ morphogenesis and tissue development of integuments, spinal cord, brain, eyes (Sztal et al. 2011), pharynx, fins (Carney et al. 2010), and the skeleton of fishes (Kessels et al. 2014). Finally, small difference in expression levels of specific genes have been shown to impact growth rate and metabolic efficiency (Fraser and Rogers 2007; Wellenreuther et al. 2019). Thus, variation in genes that act as transcript regularity factors can have important biological consequences. We detected several genes involved in gene regulation (see Supplementary Table 4). Faster growth can be achieved by reducing energetic cost of protein metabolism, increasing transcription regulation adeptness, and cell migration and adhesion efficiency, which together can make a larger proportion of energy and cell proliferation available for growth. We consider all the above-described genes as of potential importance in the growth patterns of not only Australian snapper but perhaps of other teleosts.

Applied relevance of genomic prediction of growth

Animal breeding has a long history and the underpinning science has been developed over decades, with significant work coming from the field of quantitative genetics (Falconer and Mackay 1996), as well as population genetics (Fisher 1918, 1930). The application of this to improve selective breeding in aquaculture has been revolutionized with the advent of high-throughput sequencing (Houston et al. 2020), and since then studies into the genomic basis of economically important traits have flourished (Valenza-Troubat et al. 2021a; Houston et al. 2020) and increasing efforts are now underway to disentangle how these genes are differentially expressed (Valenza-Troubat et al. 2021b). The most important contribution of our results related to the genomic basis of growth is the potential to inform individual breeding values in terms of growth efficiency. This information has direct applications to selective breeding programs of commercially important species, such as Australian snapper, but also for the breeding programs of the gilthead sea bream (Sparus aurata) in the Mediterranean Sea, and the red sea bream (Pagrus major) in Japan, which reside in the same family. The increment of growth rate and its energetic efficiency can directly reduce production time and cost, leading to higher economic returns (Ye et al. 2017; Valenza‐Troubat et al. 2021a). However, the complexity and polygenetic nature of commonly involved traits has made it difficult to implement efficient genomic selective breeding programs (Goddard and Hayes 2009; Ashton et al. 2019b). Here, we partially dissected the genetic architecture of growth and, in doing so, we have provided a set of SNPs that can assist genome selection of elite broodstock lines for commercial breeding programs. Our results show that the use of reduced genome representation is sufficient to estimate breeding values (i.e. to predict phenotypes) with relative high accuracy even without pedigree information (Supplementary Table 3 and Supplementary Fig. 1). Owing to the inclusion of most of the causal mutations, and decreased limitation due to linkage between SNPs and causal mutations, the use of whole-genome resequencing (WGS) could increase predictive ability. However, empirical results comparing the use of WGS vs a set of SNPs show nil to marginal increase in genomic prediction accuracy (Lu et al. 2020; Yoshida and Yáñez 2021). This suggests WGS is unnecessary for this purpose and supports our suggestion that reduced genome representation provides accurate genomic prediction estimates to assist in the selection of elite broodstock lines. For systems where hundreds or thousands of samples may need to be evaluated over time, such as for Australian snapper aquaculture, this has significant cost implications. Recent advancements in the development of cost-effective genotyping technologies, such as SNP chips, are being developed for this, and related species (Montanari et al. 2022), and will prove crucial in enabling the cost-effective assessment of commercially relevant SNPs. Our data have the potential to determine individual breeding values, but this can be extended to the population level, with broader evolutionary and ecological implications. Estimating a genotype value and its frequency in a population could make it possible to predict how the population will respond to a future environmental event in respect to the phenotypic trait in question (Hunter et al. 2021; McGaugh et al. 2021). This can be an important tool in the restocking of heavily affected populations, either by climatic change or by overfishing. There are several factors affecting growth in marine organisms, but one of the most important abiotic factors is temperature (Besson et al. 2016; Boltaña et al. 2017; Wellenreuther et al. 2019). In addition, many fish populations are under heavy exploitation, and there is strong evidence of selective harvesting gradually reducing the genetic potential for somatic growth in the population (Enberg et al. 2009; Denechaud et al. 2020), and thus magnifying climatic change impacts (Morrongiello et al. 2021; Wootton et al. 2021). Small changes in growth rates within a population can not only influence individual fitness but can also cause long-lasting shifts in population characteristics and demographic dynamics, including a reduction in fecundity, survival, and recruitment rates (Lorenzen 2016; Denechaud et al. 2020). For most marine fishes, mortality is considerably higher during early-life stages. At that time, individual phenotypes influence the probability of survival, and both field and laboratory research have shown this effect to be size dependent (Johnson et al. 2014). Moreover, reproductive success is often broken into 2 components: reproductive potential and offspring survival, and in many marine fish species both components are strongly related to body size. Because female fish retain their oocytes internally during their development, maximum reproductive output will be subject to body size constraints (Lambert 2008; Ohlberger et al. 2020). Alternatively, reduced body size can result in small eggs that maximize the number produced, but with a significant reduction of offspring survival (Einum and Fleming 2000; Wootton et al. 2021). Since large organisms have relatively high survival probabilities and reproductive success, it is expected that by predicting the size composition of a population, we can determine its average survival and recruitment dynamics (Garrido et al. 2015). In other words, introducing size-enhanced genotypes to wild populations to modify natural size distributions could in turn increase the resilience to both climatic change and harvesting. Thus, size genomic prediction can be a valuable tool for fishery and conservation management, not only for Australian snapper, but also for other marine fishes. Using breeding values to predict population response has potential complications, such as failure to incorporate the complexity of factors and uncertainty involved in trait measurement (Hadfield et al. 2010). A wide range of factors affects growth in marine fishes, such as temperature (Boltaña et al. 2017; Wellenreuther et al. 2019), nutritional state (Escalante-Rojas et al. 2020), and intra- and interspecies interactions (Mallard et al. 2020; Korman et al. 2021). Although these factors can induce phenotypic plastic changes to growth, there are also important genetic components in response to these factors (Wellenreuther et al. 2019; Escalante-Rojas et al. 2020). Adaptive variation in our candidate genes that regulate expression and synthesis of stress-related proteins, such as SERPINH1 and PRKAG2 (Wang et al. 2016; Causey et al. 2019), can potentially constrain these responses in Australian snapper and, therefore, the effects of external factors. In addition, Bayesian approaches, such as the 1 used here, can integrate the effects of unknown factors and uncertainty involved in genomic predictions in an efficient way (Ashraf et al. 2020). This suggests that our genomic predictions provide a reliable way to measure individual breeding values of Australian snapper from both wild and captive populations. This study confirms that, despite the complex polygenetic architecture of growth in Australian snapper, reduced genome representation combined with a mix of bioinformatic approaches can detect candidate genes relevant to a quantitative trait. This opens the possibility of using Bayesian genomic prediction frameworks to measure individual breeding values for growth rates. This information is expected to assist the selective breeding programs for this and related species, and can be used to provide insights into growth changes experienced by wild populations following the exposure of anthrophonic stressors, such as climate change and fishing.

Data availability

As the genomic data of this species are from a taonga and thus culturally important species in Aotearoa New Zealand, the data have been deposited in a managed repository that controls access. Raw and analyzed data are available through the Genomics Aotearoa data repository at https://repo.data.nesi.org.nz/. This was done to recognize Māori as important partners in science and innovation and as intergenerational guardians of significant natural resources and indigenous knowledge. Supplemental material is available at G3 online. Click here for additional data file.

97 in total

Review 1. Harnessing genomics to fast-track genetic improvement in aquaculture.

Authors: Ross D Houston; Tim P Bean; Daniel J Macqueen; Manu Kumar Gundappa; Ye Hwa Jin; Tom L Jenkins; Sarah Louise C Selly; Samuel A M Martin; Jamie R Stevens; Eduarda M Santos; Andrew Davie; Diego Robledo
Journal: Nat Rev Genet Date: 2020-04-16 Impact factor: 53.242

Review 2. Extra-genomic instructive influences in morphogenesis: A review of external signals that regulate growth and form.

Authors: Angela Tung; Michael Levin
Journal: Dev Biol Date: 2020-01-22 Impact factor: 3.582

3. Identification and verification of QTL associated with growth traits in two genetic backgrounds of Barramundi (Lates calcarifer).

Authors: C M Wang; L C Lo; F Feng; Z Y Zhu; G H Yue
Journal: Anim Genet Date: 2007-12-12 Impact factor: 3.169

4. Cell migration in the postembryonic development of the fish lateral line.

Authors: Dora Sapède; Nicolas Gompel; Christine Dambly-Chaudière; Alain Ghysen
Journal: Development Date: 2002-02 Impact factor: 6.868

5. High-impact marine heatwaves attributable to human-induced global warming.

Authors: Charlotte Laufkötter; Jakob Zscheischler; Thomas L Frölicher
Journal: Science Date: 2020-09-25 Impact factor: 63.714

6. The evolutionary legacy of size-selective harvesting extends from genes to populations.

Authors: Silva Uusi-Heikkilä; Andrew R Whiteley; Anna Kuparinen; Shuichi Matsumura; Paul A Venturelli; Christian Wolter; Jon Slate; Craig R Primmer; Thomas Meinelt; Shaun S Killen; David Bierbach; Giovanni Polverino; Arne Ludwig; Robert Arlinghaus
Journal: Evol Appl Date: 2015-05-27 Impact factor: 5.183

7. Proteomics analysis of the zebrafish skeletal extracellular matrix.

Authors: Maurijn Y Kessels; Leonie F A Huitema; Sjef Boeren; Sander Kranenbarg; Stefan Schulte-Merker; Johan L van Leeuwen; Sacco C de Vries
Journal: PLoS One Date: 2014-03-07 Impact factor: 3.240

8. Extreme sampling design in genetic association mapping of quantitative trait loci using balanced and unbalanced case-control samples.

Authors: Yi Li; Orna Levran; JongJoo Kim; Tiejun Zhang; Xingdong Chen; Chen Suo
Journal: Sci Rep Date: 2019-10-29 Impact factor: 4.379

9. Prediction of genomic breeding values based on pre-selected SNPs using ssGBLUP, WssGBLUP and BayesB for Edwardsiellosis resistance in Japanese flounder.

Authors: Sheng Lu; Yang Liu; Xijiang Yu; Yangzhen Li; Yingming Yang; Min Wei; Qian Zhou; Jie Wang; Yingping Zhang; Weiwei Zheng; Songlin Chen
Journal: Genet Sel Evol Date: 2020-08-18 Impact factor: 4.297

10. Unraveling the complex genetic basis of growth in New Zealand silver trevally (Pseudocaranx georgianus).

Authors: Noemie Valenza-Troubat; Sara Montanari; Peter Ritchie; Maren Wellenreuther
Journal: G3 (Bethesda) Date: 2022-03-04 Impact factor: 3.542

1 in total

1. The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus.

Authors: Mike Ruigrok; Bing Xue; Andrew Catanach; Mengjie Zhang; Linley Jesson; Marcus Davy; Maren Wellenreuther
Journal: Genes (Basel) Date: 2022-06-23 Impact factor: 4.141

1 in total