Genome-wide analysis of positively selected genes in seasonal and non-seasonal breeding species.

Yuhuan Meng1, Wenlu Zhang1, Jinghui Zhou1, Mingyu Liu1, Junhui Chen1, Shuai Tian1, Min Zhuo1, Yu Zhang2, Yang Zhong3, Hongli Du1, Xiaoning Wang4.   

Abstract

Some mammals breed throughout the year, while others breed only at certain times of year. These differences in reproductive behavior can be explained by evolution. We identified positively-selected genes in two sets of species with different degrees of relatedness including seasonal and non-seasonal breeding species, using branch-site models. After stringent filtering by sum of pairs scoring, we revealed that more genes underwent positive selection in seasonal compared with non-seasonal breeding species. Positively-selected genes were verified by cDNA mapping of the positive sites with the corresponding cDNA sequences. The design of the evolutionary analysis can effectively lower the false-positive rate and thus identify valid positive genes. Validated, positively-selected genes, including CGA, DNAH1, INVS, and CD151, were related to reproductive behaviors such as spermatogenesis and cell proliferation in non-seasonal breeding species. Genes in seasonal breeding species, including THRAP3, TH1L, and CMTM6, may be related to the evolution of sperm and the circadian rhythm system. Identification of these positively-selected genes might help to identify the molecular mechanisms underlying seasonal and non-seasonal reproductive behaviors.

Year:  2015        PMID: 26000771      PMCID: PMC4441472          DOI: 10.1371/journal.pone.0126736

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The environment can influence gene evolution and thus animal behaviors, including reproduction-related behaviors. Some mammals can breed throughout the year, while others only breed successfully at certain times of year. Such animals are defined as non-seasonal and seasonal breeding species, respectively. Day length, temperature, and food supply can all influence the reproductive behavior of seasonal breeding species and subsequent survival of offspring [1]; if they breed too early, the growing offspring may be exposed to low temperatures and scarce resources, whereas late breeding limits the time available for reproductive behaviors and preparation for the following winter. Accurate timing is therefore an essential component of life-history strategies for organisms living in seasonal environments [2]. The different reproductive behaviors of seasonal and non-seasonal breeding species may result from natural selection pressures [3]. Both strategies help the respective species to survive by adapting their breeding behaviors to the environment over long evolutionary histories. Genome-wide analysis of genes that are positively selected in mammal lineages using the respective breeding strategies may help us to understand the mechanisms responsible for the divergent reproductive behaviors as a result of adaptive evolution. Positive Darwinian selection on protein-coding genes is a major driving force of adaptive evolution and species diversification. The modified version of the branch-site test (Model A) [4, 5] was designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues in particular lineages. This test has been shown to be a reasonably powerful tool, and has been widely used to investigate the adaptive evolution of genes in many species [6-8]. However, alignment errors may influence the results of branch-site gene analysis in mammalian and vertebrate species. 
It is therefore necessary to use reliable alignment methods to reduce the incidence of false-positive results [9]. Although the aligner software PRANK [10, 11] cannot eliminate false-positive results, it is nonetheless more powerful than other aligners [9, 12] such as MUSCLE [13] and ClustalW [14]. In addition to misalignments in multiple sequences, other factors such as sequence errors, misassembly, and annotation mistakes also increase the incidence of falsely-identified positive selection [15, 16]. More stringent filters are needed to ensure that branch-site analysis has a low and acceptable false positive rate. In this genome-wide study, we investigated the evolution of seasonal breeding strategies by identifying positively-selected genes in non-seasonal and seasonal breeding species using modified branch-site models. We established Distant-Species and Close-Species sets, each of which included seasonal and non-seasonal groups. We then identified positively-selected genes in these groups. PRANK (codon) software was used to align all the gene orthologs in the two gene sets. However, because PRANK generates a relatively high false-positive rate with the branch-site model, stringent filtering using sum of pairs (SP) [17, 18] scoring was used to remove potentially unreliable alignments generated by multiple sequence alignments. Sequence errors, misassembly, and annotation mistakes were also detected by cDNA mapping. Functional analysis of genes identified as positively-selected after this stringent filtering process might help us to understand the molecular mechanisms that determine non-seasonal and seasonal breeding.

Materials and Methods

Materials preparation

Five non-seasonal breeding species and five seasonal breeding species were chosen as the Distant-Species set. The five non-seasonal species were human (Homo sapiens, GRCh37), chimpanzee (Pan troglodytes, CHIMP2.1) [19], cynomolgus monkey (or crab-eating macaque, Macaca fascicularis) [20], mouse (Mus musculus, NCBIM37) [21], and rat (Rattus norvegicus, RGSC3.4) [22]. The five seasonal breeding species were Indian rhesus monkey (Macaca mulatta, MMUL_1) [23], Chinese rhesus monkey (M. mulatta lasiota, CR) [24], dog (Canis familiaris, BROADD2) [25], horse (Equus caballus, EquCab2) [26], and rabbit (Oryctolagus cuniculus, oryCun2) [27]. The long lineages between species in the Distant-Species set mean that behaviors may have changed back and forth between seasonal and non-seasonal breeding strategies several times, while the divergent sequences might influence the branch-site model analysis and generate false positives [28]. To address this problem, we also established a Close-Species set that included only closely-related non-seasonal species (human, gorilla (Gorilla gorilla, gorGor3.1) [29], chimpanzee, and cynomolgus monkey) and seasonal breeding species (orangutan (Pongo abelii, PPYG2) [30], Indian rhesus monkey, Chinese rhesus monkey, and marmoset (Callithrix jacchus, C_jacchus3.2.1) [31]). The protein-coding sequences for human, gorilla, chimpanzee, orangutan, Indian rhesus monkey, marmoset, mouse, rat, dog, horse, and rabbit were downloaded from the Ensembl database (version 64, Sep. 2011; http://www.ensembl.org/info/data/ftp/index.html) [32]. The sequences for cynomolgus monkey (http://climb.genomics.cn/10.5524/100003) and Chinese rhesus macaque (http://climb.genomics.cn/10.5524/100002) were provided by BGI [33]. The corresponding cDNA sequences used in the accuracy assessment were downloaded from NCBI. Detailed information on the cDNA sequences used in this study is listed in S1 Table.

Calculating positively-selected sites

To identify 1:1 gene orthologs, human protein sequences were used to conduct BLAST [34] searches against other species sequences (blastp -F T -e 1e-5 -m 8). It is difficult to select a set of transcripts that minimizes alignment gaps and potential errors, and thus false-positive branch-site test results [35]. Following the simple approach of previous studies [6–8, 33, 36–41], the longest transcript for a given gene was chosen. Reciprocal searches were then performed for each species' protein sequences against the human protein sequences. In each search, pairwise sequences with identities <60% were excluded, and the highest hit for each query was retained to determine the pairwise orthologs between humans and other species. Modified branch-site models [5] for adaptive evolution analysis used each species in one breeding series as the foreground species, and all the other species in that breeding series as background species. For example, to test for positive selection in humans in the Distant-Species set, the human branch was designated as the foreground branch, and the other five species in the seasonal breeding group were designated as background branches. Positive selection signals for all species were tested similarly. Protein-coding sequences associated with the corresponding 1:1 gene orthologs were aligned using PRANK (codon). The corresponding gene-based phylogenetic trees were constructed from the aligned protein-coding sequences using the maximum likelihood method in the PHYLIP 3.69 [42] software package. The aligned protein-coding sequences and the corresponding phylogenetic trees were then used to analyze adaptive evolution using the branch-site model in PAML’s codeML program [4]. Branch-site modified model A (model = 2, NSsites = 2) and the corresponding null model (model = 2, NSsites = 2, fix_omega = 1 and omega = 1) [5] were used to identify sequences under positive selection in both test sets of animals. 
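The two codeML settings described above can be written out as control files. The following is a minimal sketch, not the authors' actual configuration: the file names are hypothetical, and every setting other than model, NSsites, fix_omega, and omega (which come from the text) is an assumption chosen for illustration. It also assumes the alternative model leaves omega free (fix_omega = 0) and that the foreground branch is already marked in the tree file (PAML uses a label such as #1).

```python
def codeml_ctl(seqfile, treefile, outfile, null=False):
    """Build codeml control-file text for branch-site model A or its null model."""
    settings = {
        "seqfile": seqfile,    # codon alignment (hypothetical file name)
        "treefile": treefile,  # tree with the foreground branch marked, e.g. '#1'
        "outfile": outfile,
        "noisy": 0,
        "verbose": 0,
        "runmode": 0,          # use the supplied tree (assumption)
        "seqtype": 1,          # codon sequences
        "CodonFreq": 2,        # F3x4 codon frequencies (assumption)
        "model": 2,            # from the text
        "NSsites": 2,          # from the text
        "fix_kappa": 0,        # estimate kappa (assumption)
        "kappa": 2,            # starting value (assumption)
        "fix_omega": 1 if null else 0,  # null model fixes omega at 1 (from the text)
        "omega": 1,
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(codeml_ctl("gene.phy", "gene.tree", "alt.out"))
    print(codeml_ctl("gene.phy", "gene.tree", "null.out", null=True))
```

Running each control file through codeML yields one log-likelihood per model, which is what the significance test in the next paragraph compares.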
Significance was calculated using the χ2 statistic, with one degree of freedom. Genes with p ≤0.01 were considered to be positively selected [5]. The p values were adjusted according to the FDR method (multiple testing correction with the method of Benjamini and Hochberg) [43] to allow for multiple testing, with a strict criterion of FDR <0.05. Positively-selected sites were obtained based on the Bayes Empirical Bayes (BEB) analysis [5], with a posterior probability >95%.
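The significance test and multiple-testing correction described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and it relies on the identity that the upper tail of a χ2 distribution with one degree of freedom equals erfc(sqrt(x/2)), which avoids any external dependency.

```python
import math


def lrt_pvalue(lnl_alt, lnl_null):
    """P value of the likelihood ratio test against chi-square with 1 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square (df = 1): P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_selected(p, fdr):
    """Thresholds stated in the text: p <= 0.01 and FDR < 0.05."""
    return p <= 0.01 and fdr < 0.05
```

A gene would then be reported only if `is_selected` holds for its raw and adjusted p values; BEB site filtering (posterior probability >95%) is a separate step read from the codeML output and is not covered here.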

Screening for valid positive sites by SP penalty scoring

To ensure the accuracy of the positive sites, extended sequences were extracted that included 15 amino acids (45 base pairs) upstream and downstream of each positive site. SP [17, 18] measurements were then used for penalty scoring of the sequences in both streams, as follows. (1) Because some positive sites lay near the beginning or end of the gene, so that the upstream or downstream sequence was shorter than 15 amino acids, the penalty base score S was set separately for each stream (S = 15/n, where n is the number of amino acids in the upstream or downstream sequence). (2) Each perfectly aligned position scored 0, while each mismatched site or gap in the alignment received a penalty of −S or −2S, respectively. (3) Penalty scores for the upstream and downstream sequences were calculated separately, and the total penalty score was the sum of the two. (4) The final score was the average penalty score (total penalty score/N, where N is the number of sequences used in each alignment). Both general and individual penalty scores were used. The general penalty score was the sum of the penalty scores from each comparison of two species. For the individual penalty score, the sequence containing the positive site was compared with each of the other sequences in the alignment in turn, and the penalty scores were summed. Threshold values were set for the general and individual penalty scores to identify sequences with valid positive sites; in this study, these thresholds were −50 and −15, respectively. If both the general and individual penalty scores were greater than their thresholds, the sequence passed the filter and the site was regarded as a valid positive site.
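The scoring rules above leave some details open, so the following is only one possible reading, sketched for illustration. It assumes that sequences are rows of an amino acid alignment with "-" for gaps, that a column counts as a gap if either sequence has a gap there, and that site positions are 0-based alignment columns. All function names are hypothetical, and the averaging step (4) and the general score are omitted for brevity.

```python
WINDOW = 15  # residues on each side of the positive site (from the text)


def stream_penalty(a, b):
    """Penalty for one stream (upstream or downstream) of two aligned sequences."""
    n = len(a)
    if n == 0:
        return 0.0
    s = WINDOW / n  # base score S = 15/n, so S = 1 for a full-length stream
    penalty = 0.0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            penalty -= 2 * s  # gap: -2S
        elif x != y:
            penalty -= s      # mismatch: -S
    return penalty


def pair_penalty(seq_a, seq_b, site):
    """Upstream plus downstream penalty around alignment column `site`."""
    up = slice(max(0, site - WINDOW), site)
    down = slice(site + 1, site + 1 + WINDOW)
    return (stream_penalty(seq_a[up], seq_b[up])
            + stream_penalty(seq_a[down], seq_b[down]))


def individual_penalty(alignment, focal, site):
    """Sum of penalties of the focal sequence against every other sequence."""
    return sum(pair_penalty(alignment[focal], seq, site)
               for name, seq in alignment.items() if name != focal)
```

Under this reading, a site would pass the individual filter when `individual_penalty(...)` is greater than −15; the general score would be checked against −50 in the same way.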

Accuracy of positive sites according to cDNA sequences

Mistakes can occur during genome sequencing, sequence assembly, or gene annotation, and cDNA sequences can be used as references to assess the accuracy of the positive sites. Corresponding cDNA sequences were first matched to the gene sequences using BLAST [34] (blastn -e 1e-10 -a 4 -m 8). cDNA sequences that included the positions corresponding to the positive sites were then retained. Further analysis was conducted using MEGA5 [44]. The gene sequences and their corresponding cDNA sequences were aligned using the MUSCLE [13] function. If the nucleotide sequences of the positive sites were identical to those of the corresponding positions in the cDNA sequences, the positive sites were regarded as valid.
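The step of keeping only cDNA hits that span a positive site can be sketched from the tabular BLAST output (-m 8), whose twelve tab-separated columns are query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, and bit score. The sketch below assumes the coding sequence was the query and the cDNA the subject, and that the positive site is given as a 0-based codon index; both are assumptions for illustration, and the function name is hypothetical.

```python
def hits_covering_site(tabular_lines, gene_id, codon_index):
    """Return subject IDs of hits whose query interval spans the whole codon."""
    start = 3 * codon_index + 1  # 1-based nucleotide coordinate of the codon
    end = start + 2
    covering = []
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        q_lo, q_hi = sorted((int(fields[6]), int(fields[7])))
        if query == gene_id and q_lo <= start and end <= q_hi:
            covering.append(subject)
    return covering
```

The retained cDNA sequences would then be aligned to the gene sequence and the codon at the positive site compared base by base, as described in the text.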

Results

Preliminary filtering of positively-selected genes using PRANK and branch-site model

Totals of 11,031 and 13,171 1:1 gene orthologs with >60% identity were retained from the Distant- and Close-Species sets, respectively, by BLAST [34]. The corresponding protein sequences were used for subsequent alignments. The numbers of pairwise gene orthologs between humans and other species are listed in S2 Table. After alignment using PRANK (codon), 10,918 gene orthologs in the Distant-Species set and 12,485 in the Close-Species set were tested for positive selection signals using the codeML program in the PAML package [4], with the modified branch-site model [5]. Positively-selected genes in each species with a p value <0.01 (comparing the likelihood ratio test [LRT] statistic with the χ2 distribution) and with a false-discovery rate (FDR) <5% are shown in Table 1.
Table 1

Numbers of positively-selected genes under different filtering conditions.

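Class | Distant-Species | χ2 test p<0.01 | Correction FDR<0.05 | SP score filtered genes | Close-Species | χ2 test p<0.01 | Correction FDR<0.05 | SP score filtered genes
--- | --- | --- | --- | --- | --- | --- | --- | ---
Non-seasonal | Human | 88 | 16 | 4 | Human | 116 | 20 | 4
 | Chimpanzee | 207 | 68 | 27 | Gorilla | 274 | 163 | 34
 | Cynomolgus | 113 | 62 | 27 | Chimpanzee | 289 | 117 | 48
 | Mouse | 228 | 15 | 4 | Cynomolgus | 266 | 159 | 69
 | Rat | 274 | 43 | 18 | | | |
 | Mean | 182 | 40.8 | 16 | Mean | 236.25 | 114.75 | 38.75
Seasonal | Indian rhesus | 453 | 361 | 131 | Orangutan | 446 | 303 | 147
 | Chinese rhesus | 203 | 110 | 51 | Indian rhesus | 603 | 464 | 157
 | Dog | 499 | 158 | 54 | Chinese rhesus | 229 | 130 | 57
 | Horse | 463 | 157 | 55 | Marmoset | 688 | 314 | 107
 | Rabbit | 444 | 129 | 58 | | | |
 | Mean | 412.4 | 183 | 69.8 | Mean | 491.5 | 302.75 | 117
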
In the Distant-Species set, the mean number of positively-selected genes (FDR <0.05) in the seasonal species was more than fourfold greater than in the non-seasonal species (183 vs. 40.8) (Fig 1A, Table 1). The equivalent increase in the Close-Species set was about 2.63-fold (Fig 1B, Table 1). These results demonstrate that there were more positively-selected genes in seasonal than in non-seasonal breeders in both species sets.
Fig 1

Numbers of positively-selected genes (FDR <0.05) and sites (after SP-score filtering).

(A) Positively-selected genes corrected by FDR. Sites (BEB >0.95) were filtered by SP scores in the Distant-Species set. (B) Positively-selected genes (FDR <0.05) and positive sites (BEB >0.95) filtered by general SP score >−50 and individual SP score >−15 in the Close-Species set.

However, there were more positively-selected genes in the Close-Species than in the Distant-Species set (mean numbers with FDR <0.05: 208.75 and 111.9, respectively). In addition to the different numbers of orthologs (12,485 vs. 10,918), it is also possible that more gaps were generated by alignment in the Distant-Species gene ortholog set than in the Close-Species set (mean gap length 322 in the Distant-Species set and 244 in the Close-Species set) (S3 Table), because sequence divergence was smaller in the Close-Species set. The number of gaps may influence the results of branch-site analysis, because the branch-site model removes columns with gaps from the alignment and would thus exclude more potential positive sites in the Distant-Species set than in the Close-Species set.

Identification of false-positive sites through sequence misalignment

Putative positively-selected sites in the genome (FDR<0.05) were obtained by Bayes Empirical Bayes (BEB) analysis (posterior probability >95%) [5]. The numbers of putative positively-selected sites in each species are listed in Table 2. The details of all the positive sites with BEB >0.95 are listed in S4 Table.
Table 2

Positive sites after BEB and SP-score filtering.

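Class | Distant-Species | Sites (BEB>0.95) | SP score filtered sites | FPR | Close-Species | Sites (BEB>0.95) | SP score filtered sites | FPR
--- | --- | --- | --- | --- | --- | --- | --- | ---
Non-seasonal | Human | 26 | 16 | 38.46% | Human | 9 | 6 | 33.33%
 | Chimpanzee | 103 | 65 | 36.89% | Gorilla | 158 | 84 | 46.84%
 | Cynomolgus | 92 | 54 | 41.30% | Chimpanzee | 132 | 90 | 31.82%
 | Mouse | 10 | 5 | 50.00% | Cynomolgus | 237 | 131 | 44.73%
 | Rat | 66 | 42 | 36.36% | | | |
Seasonal | Indian rhesus | 532 | 206 | 61.28% | Orangutan | 444 | 246 | 44.59%
 | Chinese rhesus | 153 | 77 | 49.67% | Indian rhesus | 531 | 299 | 43.69%
 | Dog | 261 | 106 | 59.39% | Chinese rhesus | 189 | 89 | 52.91%
 | Horse | 262 | 134 | 48.85% | Marmoset | 364 | 232 | 36.26%
 | Rabbit | 241 | 127 | 47.30% | | | |
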
Alignment problems may influence the performance of the branch-site test, with poor alignment increasing the incidence of false-positive sites. We therefore filtered out sites with obvious signs of unreliable alignment by calculating the SP [17, 18] score for the extended sequence around each positive site (± 15 amino acids/45 base pairs). Most unreliable alignments were characterized by numerous gaps and sequence divergences (S1 Fig and S5 Table). After filtering, a total of 2009/3810 (52.73%) positive sites remained. Sites with extended alignments with low divergence are listed in S6 Table. The results after filtering revealed more sites with positive selection in the seasonal than in the non-seasonal breeding species (Table 2). The false-positive rate due to misalignment was 33.33%–61.28% (Table 2), which was similar to the 50%–55% given in a previous report [12]. After alignment filtering, differences in gene numbers between species in the Distant- and Close-Species sets were consistent with those after FDR-adjusted filtering. However, the false-positive rate (FPR) statistics only considered misalignment and did not take account of other factors such as sequence errors, misassembly, or annotation problems. According to extended-sequence alignments of the positive sites, SP scores <−50 were generally caused by excessive gaps or deficient matches, of which gaps contributed more to the low SP penalty scores (S1 Fig and S6 Table). Gaps and deficient matches may arise as a result of diversity between species or different transcript lengths, because we used the longest human transcripts to BLAST other species’ protein-coding sequences [35]. Columns with gaps in the alignments would be deleted in branch-site models, even though positive sites may be located within deficient sequence alignments surrounded by gaps or mismatched sequences. A threshold SP score of −50 can filter out most false-positive sites caused by divergent sequence alignments. 
SP scoring thus improves the reliability of the results by reducing the false-positive rate caused by unreliable alignments. Details of the positive genes filtered by SP scoring are shown in S7 Table.
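The FPR values in Table 2 are consistent with the fraction of BEB sites removed by SP filtering, that is, (putative − retained)/putative. This definition is inferred from the table values rather than stated explicitly in the text. A small sketch with a few rows copied from Table 2 (the function name is hypothetical):

```python
def false_positive_rate(putative, retained):
    """Percentage of putative positive sites removed by SP-score filtering."""
    return 100.0 * (putative - retained) / putative


# (sites with BEB > 0.95, sites retained after SP filtering), from Table 2
rows = {
    "Human (Distant-Species)": (26, 16),
    "Indian rhesus (Distant-Species)": (532, 206),
    "Human (Close-Species)": (9, 6),
}

if __name__ == "__main__":
    for label, (putative, retained) in rows.items():
        print(f"{label}: {false_positive_rate(putative, retained):.2f}%")
    # Overall retention reported in the text: 2009 of 3810 sites
    print(f"Retained overall: {100.0 * 2009 / 3810:.2f}%")
```

The two extreme rows reproduce the 33.33%–61.28% range quoted above.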

cDNA mapping as a novel method of filtering positive sites

The quality of the genome may limit the accuracy of evolutionary analysis. Low-quality genomes can yield false-positive results associated with sequencing errors, alternative splicing, amino acid repeats, and frameshift mutations, all of which cause mistakes in gene annotation [8, 15]. However, cDNA sequences are much shorter than genome sequences and are thus more reliable. The reliability of positive sites will therefore be increased if sequences with positive sites are mapped to the corresponding cDNA sequences and aligned with most of the bases. We therefore used cDNA mapping as a novel means of testing for sequence errors. cDNA sequences corresponding to the positive sites were analyzed. In this study, a total of 193 positive sites were aligned with at least one cDNA sequence of the corresponding species using the MUSCLE function [13] in MEGA5 [44]. The coverage between positive sites and corresponding cDNA sequences was low (<10%, 193/2009), and the false-positive rate was 61.66% (119/193). Most inconsistent sites were in cynomolgus monkey, horse, and orangutan, which had genome sequences of low quality or with annotation mistakes. In contrast, the human, mouse, and rat genome sequences showed high accuracy. The details of the positive sites mapped with the corresponding cDNA sequences are shown in S1 Table. A total of 74 corresponding cDNA sites were finally identified that were consistent with the positive sites (S1 Table). No corresponding cDNA sequences mapped to the positive sites in gorillas, Chinese rhesus monkeys, and marmosets. After verification by cDNA filtering, 39 genes remained, including 15 genes that were positively selected in non-seasonal species (Table 3), and 24 in seasonal species (Table 4). Although the limited availability of cDNA sequences meant that only a few positive sites remained after mapping, these sites were likely to be more accurate.
Table 3

Positively-selected genes in non-seasonal species filtered by SP scoring and corrected by cDNA mapping.

Species | Gene symbol | Species ID | Set | P (χ2 test) | FDR correction
--- | --- | --- | --- | --- | ---
Human | CGA | ENST00000369582 | Distant-Species | 0.000000 | 0.000779
Human | TOMM6 | ENST00000398884 | Distant-Species | 0.000046 | 0.035968
Human | CD151 | ENST00000397420 | Close-Species | 0.000045 | 0.029687
Human | RRP8 | ENST00000254605 | Distant-Species | 0.000040 | 0.033188
Human | ACCN4 | ENST00000358078 | Distant-Species | 0.000000 | 0.000808
 | | | Close-Species | 0.000000 | 0.000609
Human | CHRNA1 | ENST00000261007 | Close-Species | 0.000000 | 0.000115
CE | SNX5 | CE_ENSP00000366998 | Distant-Species | 0.000000 | 0.000004
 | | | Close-Species | 0.000000 | 0.000002
CE | NCAPG | CE_ENSP00000251496 | Close-Species | 0.000013 | 0.002031
CE | VPS33A | CE_ENSP00000267199 | Distant-Species | 0.000046 | 0.009533
Mouse | SWI5 | ENSMUST00000113400 | Distant-Species | 0.000032 | 0.032039
Mouse | NID2 | ENSMUST00000022340 | Distant-Species | 0.000005 | 0.005636
Mouse | DHDH | ENSMUST00000011526 | Distant-Species | 0.000066 | 0.047987
Mouse | DNAH1 | ENSMUST00000048603 | Distant-Species | 0.000004 | 0.005318
Rat | INVS | ENSRNOT00000011622 | Distant-Species | 0.000001 | 0.001202
Rat | GALK2 | ENSRNOT00000012447 | Distant-Species | 0.000146 | 0.037931

Table 4

Positively-selected genes in seasonal species filtered by SP scoring and corrected by cDNA mapping.

Species | Gene symbol | Species ID | Set | P (χ2 test) | FDR correction
--- | --- | --- | --- | --- | ---
Orangutan | TADA1 | ENSPPYT00000000676 | Close-Species | 0.000001 | 0.000150
Orangutan | LGALS3BP | ENSPPYT00000010154 | Close-Species | 0.000000 | 0.000000
Orangutan | ZFR | ENSPPYT00000017875 | Close-Species | 0.000000 | 0.000001
Orangutan | THRAP3 | ENSPPYT00000001838 | Close-Species | 0.000000 | 0.000001
Orangutan | MTMR12 | ENSPPYT00000017872 | Close-Species | 0.000000 | 0.000000
Orangutan | TMCC2 | ENSPPYT00000000349 | Close-Species | 0.000000 | 0.000018
Orangutan | SLC44A2 | ENSPPYT00000011142 | Close-Species | 0.000016 | 0.001520
Orangutan | MIPEP | ENSPPYT00000006166 | Close-Species | 0.000010 | 0.001011
Orangutan | XRN2 | ENSPPYT00000012494 | Close-Species | 0.000000 | 0.000000
Orangutan | RBM47 | ENSPPYT00000017075 | Close-Species | 0.000000 | 0.000000
Orangutan | MBTPS1 | ENSPPYT00000008921 | Close-Species | 0.000138 | 0.008620
Orangutan | FAM69A | ENSPPYT00000001379 | Close-Species | 0.000199 | 0.011474
Orangutan | SLC43A2 | ENSPPYT00000009117 | Close-Species | 0.000992 | 0.042126
Orangutan | RAB1B | ENSPPYT00000003634 | Close-Species | 0.000026 | 0.002287
Orangutan | CMTM6 | ENSPPYT00000016330 | Close-Species | 0.000000 | 0.000003
Orangutan | DARS2 | ENSPPYT00000000592 | Close-Species | 0.000004 | 0.000458
Orangutan | AARS | ENSPPYT00000008865 | Close-Species | 0.000002 | 0.000217
Orangutan | TH1L | ENSPPYT00000012980 | Close-Species | 0.000000 | 0.000000
Rabbit | PLEK | ENSOCUT00000023428 | Distant-Species | 0.000113 | 0.016744
Rabbit | SNX25 | ENSOCUT00000024747 | Distant-Species | 0.000005 | 0.002487
Dog | ALB | ENSCAFT00000037121 | Distant-Species | 0.000019 | 0.004967
Horse | SMC4 | ENSECAT00000024113 | Distant-Species | 0.000108 | 0.015518
Horse | ANO6 | ENSECAT00000013517 | Distant-Species | 0.000007 | 0.003137
Horse | GLIPR1 | ENSECAT00000016491 | Distant-Species | 0.000035 | 0.007439

Discussion

Influence of alignment and annotation

The results of evolutionary analysis are influenced by the quality of the genome sequence; false-positive sites may be detected and important information may be missed as a result of low-quality sequences [15, 16]. Unfortunately, current genome-sequencing techniques are still unable to provide sequences reliable enough for evolutionary analysis. Stringent filtering functions and parameters are therefore needed to obtain reliable positive sites, and careful analytical design can achieve reliable results, even from low-quality genome sequences. Evolutionary analysis usually starts with sequence alignment using software such as ClustalW, MUSCLE, or PRANK. In this study, we used PRANK (codon), because this software takes evolutionary information into consideration before placing the gaps [11], resulting in fewer mismatches but larger gaps compared with the other programs (S3 Table). Valid positive sites are likely to be located in alignments with low divergence and few gaps or mismatches, and sequence misalignments can thus generate false-positive sites in branch-site models. The branch-site model usually deletes columns with gaps in the alignments when calculating positive sites, so some sites located in deficient alignments may be regarded as positive, whereas some true-positive sites may be missed. SP-score filtering, which focuses on filtering out such false-positive sites, can be used to reduce the false-positive rate and ensure the quality of the filtered positive sites. In addition, cDNA mapping can exclude false-positive sites that originate from mistakes in genome sequence assembly and gene annotation. The combination of these processes can thus filter out many false-positive sites and identify low-quality genome sequences, such as those for cynomolgus monkey, horse, and orangutan in this study. cDNA sequences in previous genome-wide studies have generally been used as references for gene annotation [45-47]. 
In contrast, we used cDNA mapping as a novel method to identify positive sites with high quality. Because cDNA sequences are usually relatively short, current sequencing techniques can provide reliable sequences. Moreover, some sites can be mapped to more than one corresponding cDNA sequence. cDNA mapping can thus ensure the quality of the remaining positive sites. However, there are some limitations. More than 90% of sites cannot be matched with corresponding cDNA sequences, and the validity of these sites therefore cannot be checked using this method. Because cDNA sequences are usually sequenced for a specific purpose, corresponding cDNA sequences may not be available for some putative positive sites, and genes with important evolutionary implications may be missed.

Positively-selected genes in seasonal and non-seasonal breeders

Evolutionary analysis of genome sequences can be used to identify specific, positively-selected genes in various species. The genetic mechanisms and potential environmental adaptations associated with seasonal and non-seasonal breeding can then be inferred by functional analysis of positively-selected genes in the respective species. The functions of positively-selected genes in non-seasonal breeding species reflect reproductive tendencies such as sperm generation and cell proliferation. Two key genes perform these functions in humans: CGA (glycoprotein hormones, alpha polypeptide) is a gonadotropin subunit [48, 49], while CD151 functions in promoting metastasis, and increases the expression of phospho-extracellular signal-regulated kinase (ERK) [50, 51]. Given that ERK is a component of the mitogen-activated protein kinase pathway, positive selection pressure on this gene may influence cell proliferation and differentiation [52, 53]. Mutation of Dnah1 in mice has been reported to cause male infertility [54, 55], suggesting that it may play an important role in influencing mating behavior. Another crucial gene in rats, Invs, is involved in controlling cytoskeletal organization and cell division, which are essential for reproduction [56, 57]. Moreover, this gene can interact with NPHP1 and NPHP3 that influence the Wnt signaling pathway, which may in turn influence kidney function and renal cell formation linked to spermatocyte and spermatid generation in the testis [58-60]. These positively-selected genes may reflect modulation of the reproductive system under environmental pressure in non-seasonal breeding species, enabling them to breed throughout the year. The identification of positive sites focused on sperm generation and cell proliferation suggests that mutations in these genes may influence sperm quantity or reproductive capacity. 
Genes that were positively selected in seasonal breeding species differed from those in non-seasonal species in having less focused functions. However, the orangutan provided the most valid positive genes among these species, and their functional analysis may help to explain some predominant characteristics of seasonal breeding species. The key gene, THRAP3 (thyroid hormone receptor associated protein 3, also known as Thrap150), is a selective coactivator for CLOCK-BMAL1 and promotes CLOCK-BMAL1 binding to target genes [61]. Moreover, THRAP3 can also interact with HELZ2, which regulates adipocyte differentiation [62]. Because Clock and Bmal1 have previously been reported to be closely related to seasonal breeding behaviors [63], mutations in THRAP3 may influence the circadian rhythm of the reproductive system. This is supported by a previous study showing that thyroid hormone catabolism within the mediobasal hypothalamus regulated seasonal gonadotropin-releasing secretion [64]. However, because orangutans live in Indonesia, which has high temperatures throughout the year [30, 65], they may not need to adjust their physical condition, such as lipid storage, to cope with cold weather. THRAP3 may thus influence adipocyte differentiation, while other functionally-related genes such as MTMR12 [66] and ZFR [67] may have been positively selected because of such environmental conditions. In addition to THRAP3, the positively-selected genes TH1L and CMTM6 may also help to explain the seasonal breeding behavior. TH1L may have a similar function to TH1, which attenuates androgen signaling [68], while CMTM6 functions in spermatogenesis [69-71]. Evidence from previous studies suggests that orangutans produce 14 times less sperm than chimpanzees, which are closely related but non-seasonal breeders [72]. 
Seasonal breeding in orangutans may thus be a consequence of circadian rhythm and limited sperm production, which restrict their breeding to the period from December to May, the most productive months in terms of food (fruit) supply, to ensure adequate food and energy for effective reproduction [73]. Diversity in breeding behaviors can generally be attributed to mutations affecting endocrine mechanisms. Such mutations may be related to specific environmental conditions, such as temperature and food supply. In this study, positively-selected genes related to sperm generation were identified in both types of breeding species. Indeed, previous reports have indicated rapid evolution of sperm proteins in mammals [74, 75]. Evolutionary mutations in these genes may not lead to the unique consequences associated with different breeding strategies. However, previous studies have indicated that the reproduction behavior in seasonal breeding species is largely under the regulation of the circadian rhythm system [64]. This is consistent with our results, which showed that THRAP3, which is functionally-related to the CLOCK-BMAL1 system, was under positive selection pressure. The mechanisms determining breeding behaviors can be complicated, but evolution leads to adaptation to the environment, enabling well-adapted lineages to persist for many generations.

Conclusions

In this study, we conducted a precise, genome-wide scan to detect genes that were positively selected between seasonal and non-seasonal breeding species. The evolutionary analysis was designed to reduce the incidence of false-positive sites by SP filtering and cDNA mapping. Although the lack of cDNA sequences means that some positive genes may have been missed, the identification of valid, positively-selected genes with functions relating to spermatogenesis, cell proliferation, and circadian rhythm might indicate possible molecular mechanisms underlying the seasonal and non-seasonal reproductive behaviors. Further developments in genome-sequencing technologies will allow the sequencing and assembly of higher-quality genomes, and more accurate gene annotation, while the availability of more cDNA sequences will increase the value of cDNA mapping for improving the accuracy of evolutionary analysis.

Sites with extended sequence alignments.

(A). Perfect alignment. (B). Acceptable alignment. (C). Unacceptable alignment because of the large number of gaps. (D). Unacceptable alignment because the putative positive sites are located in poorly-aligned sequences. (E). False negative: an acceptable alignment that was mistakenly filtered out by SP scoring. (TIF)

Positive sites mapped with the corresponding cDNA sequences.

(XLSX)

1:1 gene orthologs.

Gene orthologs were identified by reciprocal best-hit BLAST: the best hit of each human gene in each of the other species was searched back against human. All identities were >60%. (XLSX)
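The ortholog caption above describes taking the best BLAST hit from human to another species and then reversing the search. A minimal sketch of that reciprocal best-hit logic is given below. The tuple layout (query, subject, percent identity, bit score), the use of bit score to rank hits, and all gene identifiers are assumptions made for illustration; they do not reproduce the authors' pipeline or data.

```python
from typing import Dict, List, Tuple

# One BLAST hit: (query_id, subject_id, percent_identity, bit_score).
# This layout is an assumption for the example, not a fixed BLAST format.
Hit = Tuple[str, str, float, float]


def best_hits(hits: List[Hit]) -> Dict[str, Tuple[str, float]]:
    """Map each query to its highest-scoring subject and that hit's identity."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for query, subject, identity, bits in hits:
        if query not in best or bits > best[query][2]:
            best[query] = (subject, identity, bits)
    return {q: (s, ident) for q, (s, ident, _) in best.items()}


def reciprocal_best_hits(forward: List[Hit], reverse: List[Hit],
                         min_identity: float = 60.0) -> List[Tuple[str, str]]:
    """Return 1:1 pairs that are each other's best hit and exceed min_identity."""
    fwd = best_hits(forward)   # human -> other species
    rev = best_hits(reverse)   # other species -> human
    pairs = []
    for human_gene, (other_gene, identity) in fwd.items():
        back = rev.get(other_gene)
        if back is not None and back[0] == human_gene and identity > min_identity:
            pairs.append((human_gene, other_gene))
    return sorted(pairs)


# Hypothetical identifiers and scores, for illustration only.
forward = [
    ("human_g1", "other_g1", 85.0, 200.0),
    ("human_g1", "other_g9", 70.0, 120.0),   # weaker hit, ignored
    ("human_g2", "other_g2", 55.0, 150.0),   # reciprocal, but identity too low
    ("human_g3", "other_g3", 90.0, 300.0),   # not reciprocal (see reverse)
]
reverse = [
    ("other_g1", "human_g1", 85.0, 200.0),
    ("other_g2", "human_g2", 55.0, 150.0),
    ("other_g3", "human_g7", 88.0, 310.0),   # other_g3 prefers a different gene
    ("other_g3", "human_g3", 90.0, 300.0),
]
orthologs = reciprocal_best_hits(forward, reverse)
```

Only the first pair survives both the reciprocity check and the identity cutoff in this toy input.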

Lengths of gene sequences before and after alignments with different aligners.

(XLSX)

Positive sites (BEB >0.95).

(XLSX)

SP scores of positive sites after sequence alignment.

(XLSX)

Positive sites after SP-score filtering.

(XLSX)

Positive genes filtered by SP scoring.

(XLSX)
  60 in total

1.  Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques.

Authors:  Guangmei Yan; Guojie Zhang; Xiaodong Fang; Yanfeng Zhang; Cai Li; Fei Ling; David N Cooper; Qiye Li; Yan Li; Alain J van Gool; Hongli Du; Jiesi Chen; Ronghua Chen; Pei Zhang; Zhiyong Huang; John R Thompson; Yuhuan Meng; Yinqi Bai; Jufang Wang; Min Zhuo; Tao Wang; Ying Huang; Liqiong Wei; Jianwen Li; Zhiwen Wang; Haofu Hu; Pengcheng Yang; Liang Le; Peter D Stenson; Bo Li; Xiaoming Liu; Edward V Ball; Na An; Quanfei Huang; Yong Zhang; Wei Fan; Xiuqing Zhang; Yingrui Li; Wen Wang; Michael G Katze; Bing Su; Rasmus Nielsen; Huanming Yang; Jun Wang; Xiaoning Wang; Jian Wang
Journal:  Nat Biotechnol       Date:  2011-10-16       Impact factor: 54.908

2.  Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level.

Authors:  Jianzhi Zhang; Rasmus Nielsen; Ziheng Yang
Journal:  Mol Biol Evol       Date:  2005-08-17       Impact factor: 16.240

3.  Evidence for widespread positive and purifying selection across the European rabbit (Oryctolagus cuniculus) genome.

Authors:  Miguel Carneiro; Frank W Albert; José Melo-Ferreira; Nicolas Galtier; Philippe Gayral; Jose A Blanco-Aguiar; Rafael Villafuerte; Michael W Nachman; Nuno Ferrand
Journal:  Mol Biol Evol       Date:  2012-01-31       Impact factor: 16.240

4.  Multiple sequence alignment using ClustalW and ClustalX.

Authors:  Julie D Thompson; Toby J Gibson; Des G Higgins
Journal:  Curr Protoc Bioinformatics       Date:  2002-08

5.  Gap costs for multiple sequence alignment.

Authors:  S F Altschul
Journal:  J Theor Biol       Date:  1989-06-08       Impact factor: 2.691

6.  Mammalian sperm proteins are rapidly evolving: evidence of positive selection in functionally diverse genes.

Authors:  Dara G Torgerson; Rob J Kulathinal; Rama S Singh
Journal:  Mol Biol Evol       Date:  2002-11       Impact factor: 16.240

7.  Trihydrophobin 1 attenuates androgen signal transduction through promoting androgen receptor degradation.

Authors:  Yanzhong Yang; Weiying Zou; Xiangfei Kong; Hanzhou Wang; Hongliang Zong; Jianhai Jiang; Yanlin Wang; Yi Hong; Yayun Chi; Jianhui Xie; Jianxin Gu
Journal:  J Cell Biochem       Date:  2010-04-01       Impact factor: 4.429

8.  Inversin modulates the cortical actin network during mitosis.

Authors:  Michael E Werner; Heather H Ward; Carrie L Phillips; Caroline Miller; Vincent H Gattone; Robert L Bacallao
Journal:  Am J Physiol Cell Physiol       Date:  2013-03-20       Impact factor: 4.249

9.  Patterns of positive selection in six Mammalian genomes.

Authors:  Carolin Kosiol; Tomás Vinar; Rute R da Fonseca; Melissa J Hubisz; Carlos D Bustamante; Rasmus Nielsen; Adam Siepel
Journal:  PLoS Genet       Date:  2008-08-01       Impact factor: 5.917

Review 10.  Clock genes in calendar cells as the basis of annual timekeeping in mammals--a unifying hypothesis.

Authors:  G A Lincoln; H Andersson; A Loudon
Journal:  J Endocrinol       Date:  2003-10       Impact factor: 4.286
