
The evolution of intron size in amniotes: a role for powered flight?

Abstract

Intronic DNA is a major component of eukaryotic genes and genomes, can be subject to selective constraint, and can have functions in gene regulation. Intron size is of particular interest because it is thought to be the target of a variety of evolutionary forces and has been suggested to be linked ultimately to various phenotypic traits, such as powered flight. Using whole-genome analyses and comparative approaches that account for phylogenetic nonindependence, we examined interspecific variation in intron size in three data sets encompassing from 12 to 30 amniote genomes and allowing for different levels of genome coverage. In addition to confirming that intron size is negatively associated with intron position and correlates with genome size, we found that on average mammals have longer introns than birds and nonavian reptiles, a trend that is correlated with the proliferation of repetitive elements in mammals. Two independent comparisons between flying and nonflying sister groups both showed a reduction of intron size in volant species, supporting an association between powered flight, or possibly the high metabolic rates associated with flight, and reduced intron/genome size. Small intron size in volant lineages is less easily explained as a neutral consequence of large effective population size. In conclusion, we found that the evolution of intron size in amniotes appears to be non-neutral, is correlated with genome size, and is likely influenced by powered flight and associated high metabolic rates.

Year: 2012 PMID: 22930760 PMCID: PMC3490418 DOI: 10.1093/gbe/evs070

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

As one of several types of noncoding DNA, introns are abundant in amniote genomes. In most mammals, there are on average more than eight introns per gene (Roy and Gilbert 2006; Farlow et al. 2011). First discovered in protein-coding genes of viruses (Berget et al. 1977; Chow et al. 1977) and named later (Gilbert 1978), introns were initially considered nonfunctional DNA sequences because they are spliced out of precursor RNAs when the mature messenger RNA is produced. However, it is now well accepted that introns are not simply “junk” DNA: they are the basis of alternative splicing, which can generate multiple proteins from a single gene, and some introns also encode noncoding RNA molecules that regulate transcription. Because of these newly discovered functions and their conservation in the genome, many introns are now believed to evolve under selective constraint. The observation that many introns harbor conserved sites under purifying selection is now commonplace, and several studies have found evidence for adaptive evolution in variation segregating within introns (Parsch et al. 2010; Hayden et al. 2011; Cagliani et al. 2012), suggesting that both size and sequence may be shaped by non-neutral forces. Previous studies have found that within species, intron size varies substantially among genes: tissue- or development-specific genes have longer introns than housekeeping genes, and highly expressed genes have shorter introns than lowly expressed genes (Castillo-Davis et al. 2002; Eisenberg and Levanon 2003; Urrutia and Hurst 2003; Vinogradov 2004). These patterns could be explained by selection for economy (Castillo-Davis et al. 2002; Eisenberg and Levanon 2003; Urrutia and Hurst 2003; Pozzoli et al. 2007), by mutation bias, or by the “genome design” hypothesis (Vinogradov 2004, 2005, 2006), which states that the length of genomic elements is determined by their function.
Even within a single gene, introns differ: first introns are generally longer than other introns (Marais et al. 2005; Gaffney and Keightley 2006; Gazave et al. 2007; Bradnam and Korf 2008), which may reflect their distinct functional properties, such as intron-mediated enhancement (IME) of heterologous gene expression (Mascarenhas et al. 1990), the insertion frequency of SINE elements (Majewski and Ott 2002), or the proportion of conserved elements (Keightley and Gaffney 2003; Chamary and Hurst 2004). Moreover, intron size also varies between species, and it has been proposed that avian introns, like avian genomes, are reduced in comparison with those of mammals partly because of the selection pressure imposed by metabolically demanding behaviors such as flight (Hughes and Hughes 1995), where small introns provide slightly improved transcription efficiency or splicing accuracy (Lynch 2002). Alternatively, small introns may simply mirror reduced genomes and thus reduced cell sizes, which increase the surface-to-volume ratio and permit a greater rate of gas exchange per unit volume (Hughes and Hughes 1995), a benefit for metabolically demanding behaviors. In an early study, Hughes and Hughes (1995) surveyed 111 introns of 31 genes homologous between humans and chickens and found that chicken introns are significantly smaller than those of humans. However, in a later study, Vinogradov (1999) examined 176 introns of 55 chicken–human homologous genes and found no significant difference in intron size between these two species. Because these studies included only one bird species (chicken), the possibility cannot be excluded that random changes occurred in the chicken lineage and that the observed trends were chicken specific rather than bird specific; the role of flight in shaping intron size variation therefore remains controversial.
To overcome this concern, Waltari and Edwards (2002) studied 14 introns from 19 flighted and flightless birds and 1 nonflying relative, the American alligator; their results suggested that the evolution of intron size is consistent with neutral Brownian motion and that there was no significant correlation between intron size and metabolically costly behaviors such as flight. However, the number of introns in that study was quite small, so the influence of random effects cannot be ruled out. Thus, there is no firm conclusion regarding whether introns are smaller in avian species than in mammals or whether flight might impose selection pressures on intron size. Recent whole-genome sequencing of a larger number of species now provides an opportunity to study the evolution of genomic properties in an information-rich phylogenetic context. Here, we exploited recent whole-genome data to revisit the question of intron size variation in amniotes using more introns from more species. Our goal is to better understand intron size variation and the evolutionary forces acting on it, while using appropriate comparative methods (Felsenstein 1985; Harvey and Pagel 1991; Lynch 1991). Our main finding is that mammals have larger introns than birds and reptiles and that this difference is comparable to the difference in genome size between these two clades. Furthermore, flighted species tend to have shorter introns than their nonflying sister groups, suggesting that flight or its related traits may impose selective constraints on the evolution of intron size.

Materials and Methods

Data Sets

We generated three data sets in this study to serve different purposes. All genomes were downloaded from the Ensembl genome browser (http://www.ensembl.org, release 59, last accessed October 3, 2012) (Flicek et al. 2011). (We also investigated a high-quality microbat genome from release 64 and obtained almost identical results; see the Supplementary Material online.) Data set A includes 11 species: 9 species with published complete genomes and two prereleased bat genomes. These species are human (Homo sapiens), mouse (Mus musculus), microbat (Myotis lucifugus), megabat (Pteropus vampyrus), opossum (Monodelphis domestica), platypus (Ornithorhynchus anatinus), chicken (Gallus gallus), turkey (Meleagris gallopavo), zebra finch (Taeniopygia guttata), anole (Anolis carolinensis), and xenopus (Xenopus tropicalis). This data set allows informative comparisons between flying and nonflying species in both mammals and reptiles, and it contains a relatively small number of species to ensure that a large number of orthologous introns can be identified. Data set B includes 20 species with at least 6X genome coverage and represents a high-quality data set; these are human (H. sapiens), chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), rhesus (Macaca mulatta), marmoset (Callithrix jacchus), mouse (M. musculus), rat (Rattus norvegicus), guinea pig (Cavia porcellus), rabbit (Oryctolagus cuniculus), cow (Bos taurus), horse (Equus caballus), dog (Canis familiaris), elephant (Loxodonta africana), opossum (Mon. domestica), chicken (G. gallus), turkey (Mel. gallopavo), zebra finch (T. guttata), anole (A. carolinensis), and xenopus (X. tropicalis). Data set C, which represents a broad phylogenetic range, adds the two bats and eight arbitrarily chosen mammals to data set B.
These additional species are alpaca (Vicugna pacos), pig (Sus scrofa), cat (Felis catus), hedgehog (Erinaceus europaeus), shrew (Sorex araneus), lesser hedgehog tenrec (Echinops telfairi), armadillo (Dasypus novemcinctus), and wallaby (Macropus eugenii).

Genome Size

Data on genome size were retrieved from the Animal Genome Size Database (http://www.genomesize.com, last accessed October 3, 2012).

Identification of Orthologous Introns

Intron size and position information were downloaded from the Ensembl genome browser (release 59) for each species under study. To identify orthologous introns, we first defined orthologous genes. For data set A, we downloaded peptide sets for the 11 species mentioned above, performed blastp searches with the Basic Local Alignment Search Tool (BLAST) suite (Altschul et al. 1990) for each pair of species, and used the “reciprocal best hit” method to define orthologous genes. For data sets B and C, we avoided this method because of limited computing power; instead, we downloaded orthologous genes from Ensembl BioMart, requiring the one-to-one orthology type. If a gene had more than one splice form, only the longest one was used. We then used human (H. sapiens) genes as queries and aligned the corresponding orthologous genes from other species to them with pairwise (1-to-1) BLASTP. Next, intron positions were mapped onto the alignment, and introns were defined as orthologous if their positions fell within three amino acids of each other in the alignment. Finally, only introns larger than 20 bp were considered, to reduce annotation uncertainty for short introns (Brawand et al. 2011).
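The core matching steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' scripts: it assumes best-hit tables have already been parsed from pairwise BLASTP output, that intron positions are given as 0-based residue indices in each ungapped protein, and that introns are paired greedily to the nearest alignment column within the three-amino-acid window (the greedy pairing rule is our assumption; the text specifies only the window). All function names are hypothetical.

```python
from typing import Dict, List, Tuple

def reciprocal_best_hits(best_ab: Dict[str, str], best_ba: Dict[str, str]) -> List[Tuple[str, str]]:
    """Gene pairs (a, b) where b is a's best hit in species B and a is b's best hit in species A."""
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]

def residue_to_column(aligned_protein: str) -> Dict[int, int]:
    """Map each 0-based residue index of an aligned protein to its alignment column."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned_protein):
        if char != "-":                      # gaps occupy columns but not residues
            mapping[residue] = column
            residue += 1
    return mapping

def match_introns(query_cols: List[int], target_cols: List[int],
                  tolerance: int = 3) -> List[Tuple[int, int]]:
    """Greedily pair each query intron with the closest unused target intron within `tolerance` columns."""
    pairs, used = [], set()
    for qi, qc in enumerate(query_cols):
        best = None
        for ti, tc in enumerate(target_cols):
            distance = abs(qc - tc)
            if ti not in used and distance <= tolerance and (best is None or distance < best[1]):
                best = (ti, distance)
        if best is not None:
            used.add(best[0])
            pairs.append((qi, best[0]))
    return pairs

def passes_length_filter(intron_bp: int, min_bp: int = 20) -> bool:
    """Keep only introns longer than 20 bp, as in the text."""
    return intron_bp > min_bp

# Toy usage: introns after residues 0 and 2 of an aligned protein "MK-LV"
human_cols = [residue_to_column("MK-LV")[p] for p in (0, 2)]
print(human_cols)  # [0, 3]
```

Greedy nearest matching keeps the sketch short; a stricter implementation might instead require an unambiguous single candidate within the window.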

Phylogenetic Tree Construction

The phylogenetic tree was downloaded from Ensembl, and unused species were removed manually. To construct species trees and estimate branch lengths, autosomal regions with RefSeq annotations were used to create multiple-species alignments. The program phyloFit was applied to generate the tree and branch lengths, after adjusting the base frequencies of the alignment back to a genome-wide GC content of 0.41.

Ancestral State Reconstruction

To study differences in intron size between mammals and reptiles, we compared the reconstructed intron sizes of the common ancestor of each group. Ancestral states were reconstructed with the R package “Analysis of Phylogenetics and Evolution” (ape) (Paradis et al. 2004), assuming a Brownian motion model for continuous traits such as intron size. Using custom Python scripts, we fit the model with both maximum likelihood (ML) (Schluter 1997) and the phylogenetically independent contrasts (PIC) method (Felsenstein 1985) to yield ancestral values for each intron.
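For a single intron, PIC-style reconstruction can be illustrated with Felsenstein's pruning recursion: each internal node is estimated as the inverse-branch-length-weighted mean of its children, and the branch above it is lengthened to reflect the uncertainty of that estimate. The sketch below is our own minimal Python version (the `Node` class and `pic_ancestral` function are hypothetical), not the authors' scripts, and it assumes strictly positive branch lengths. Under Brownian motion, the root estimate from this recursion coincides with the ML root estimate, whereas ML estimates for other internal nodes also use information from outside each node's subtree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    name: str
    length: float = 0.0                                   # branch length to the parent
    children: List["Node"] = field(default_factory=list)

def pic_ancestral(node: Node, tips: Dict[str, float],
                  est: Dict[str, float]) -> Tuple[float, float]:
    """Felsenstein (1985) pruning; returns (value at node, adjusted branch length above it).

    Estimates for internal nodes are stored in `est`; each uses only that node's descendants.
    Branch lengths must be positive (a zero-length tip branch would divide by zero).
    """
    if not node.children:
        return tips[node.name], node.length
    weighted_sum, total_weight = 0.0, 0.0
    for child in node.children:
        value, v = pic_ancestral(child, tips, est)
        weighted_sum += value / v            # weight = 1 / (adjusted branch length)
        total_weight += 1.0 / v
    est[node.name] = weighted_sum / total_weight
    # Uncertainty in the estimate lengthens the branch above this node by 1 / total_weight
    return est[node.name], node.length + 1.0 / total_weight

# Toy tree ((A:1, B:1)AB:1, C:1)root with sizes for one intron
tree = Node("root", children=[Node("AB", 1.0, [Node("A", 1.0), Node("B", 1.0)]),
                              Node("C", 1.0)])
estimates: Dict[str, float] = {}
pic_ancestral(tree, {"A": 100.0, "B": 200.0, "C": 300.0}, estimates)
print(estimates)  # AB ≈ 150, root ≈ 240
```

Running the recursion once per orthologous intron yields the per-intron ancestral values compared in the Results.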

Phylogenetically Corrected Tests

To account for phylogenetic nonindependence when comparing two groups, we used the phylogenetic generalized least squares (PGLS) method (Martins and Hansen 1997; Cunningham et al. 1998), which estimates the parameters of a linear regression (LR) model when observations are correlated (Butler and King 2004). The R package “Linear and Nonlinear Mixed Effects Models” (nlme) (http://cran.r-project.org/web/packages/nlme/index.html, last accessed October 3, 2012) was used to conduct PGLS-based tests. To compare two groups, we assumed that the trait evolves by Brownian motion, added a binary dummy variable distinguishing the two groups (e.g., 1 for one group and 0 for the other), and constructed a regression model. If the slope coefficient of the dummy variable deviated significantly from 0, the two groups were considered significantly different.
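The dummy-variable PGLS comparison reduces to generalized least squares with the Brownian covariance matrix C, whose entry C[i, j] is the branch length shared by species i and j from the root. The numpy sketch below makes that algebra explicit; it is not the nlme code used in the study, and the helper names and the edge-dictionary tree encoding are hypothetical. In R, a comparable fit is `gls()` from nlme with a Brownian correlation structure such as ape's `corBrownian`. The t value for the dummy coefficient would be compared with a t distribution on n − 2 degrees of freedom.

```python
import numpy as np

def brownian_vcv(paths):
    """paths: tip -> {edge_id: length} for the edges on its root-to-tip path.

    Under Brownian motion, cov(tip_i, tip_j) is the total length of their shared edges.
    """
    tips = list(paths)
    C = np.zeros((len(tips), len(tips)))
    for i, a in enumerate(tips):
        for j, b in enumerate(tips):
            C[i, j] = sum(paths[a][e] for e in paths[a].keys() & paths[b].keys())
    return tips, C

def pgls_two_groups(y, group, C):
    """GLS fit of y = b0 + b1 * group; returns coefficients, standard errors and t values."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(group, dtype=float)])
    Ci = np.linalg.inv(C)
    XtCiX = X.T @ Ci @ X
    beta = np.linalg.solve(XtCiX, X.T @ Ci @ y)
    resid = y - X @ beta
    sigma2 = resid @ Ci @ resid / (len(y) - X.shape[1])   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(XtCiX)))
    return beta, se, beta / se
```

For a toy tree ((A, B), (C, D)) with unit branch lengths, `paths` would be `{"A": {"L": 1.0, "A": 1.0}, "B": {"L": 1.0, "B": 1.0}, ...}`, and `beta[1]` estimates the difference between the groups coded 1 and 0.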

Binomial Test for Phylogenetic Correction

We assumed that after the separation of mammals and reptiles/birds, introns evolve neutrally on each branch. Then for a given orthologous intron, the probability that it is larger in mammals than in reptiles (including birds) should be 0.5, thus the total number of larger orthologous introns in mammals compared with that in reptiles/birds should follow the binomial distribution with P = 0.5. Significant deviations from this distribution will suggest a violation of the null hypothesis and could indicate non-neutral evolution.
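Under this null hypothesis, the number of introns that are larger in the mammalian ancestor follows Binomial(n, 0.5). A self-contained exact two-sided test in Python is sketched below; it relies on the symmetry of the p = 0.5 case (doubling the upper tail), and `two_sided_binom_p` is a hypothetical helper rather than a library function. Applied to the data set A count reported in the Results (8,728 of 12,506 introns), it returns a P value that is effectively zero at double precision.

```python
import math

def log_binom_pmf_half(k: int, n: int) -> float:
    """log P(X = k) for X ~ Binomial(n, 0.5), computed via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + n * math.log(0.5))

def two_sided_binom_p(k: int, n: int) -> float:
    """Exact two-sided P value for k successes out of n under p = 0.5."""
    upper = max(k, n - k)                         # start of the more extreme tail
    tail = sum(math.exp(log_binom_pmf_half(i, n)) for i in range(upper, n + 1))
    return min(1.0, 2.0 * tail)                   # symmetric null: double one tail

print(two_sided_binom_p(10, 10))       # 2 / 1024, about 0.00195
print(two_sided_binom_p(8728, 12506))  # far below 0.001
```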

Permutation Test

To confirm that the intron size contraction we found in volant species is not due to random effects (one could conceive of flying and nonflying species as each having a 50:50 chance of having “small” or “large” introns), we developed a permutation test. Treating mammals and reptiles separately, we first permuted intron sizes across all species within each clade, separately for each intron. We then counted the number of introns that are smaller in flyers than in their nonflying sister group. This process was repeated 1,000 times, and the P value was calculated from the number of permutations yielding counts at least as extreme as the observed number.
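A minimal Python sketch of this permutation scheme follows. For simplicity it compares the mean intron size of flyers with that of the sister group for each intron, whereas the main analysis compared ML-reconstructed ancestors (the Results note that using group means gave similar results). `sizes` is assumed to be an introns × species array for one clade; the column indices and function names are hypothetical.

```python
import numpy as np

def count_smaller(sizes: np.ndarray, flyers, sisters) -> int:
    """Introns (rows) whose mean size in flyers is below the mean in the nonflying sister group."""
    return int(np.sum(sizes[:, flyers].mean(axis=1) < sizes[:, sisters].mean(axis=1)))

def flight_permutation_test(sizes, flyers, sisters, n_perm=1000, seed=0):
    """Shuffle sizes among all species of the clade, independently for each intron."""
    sizes = np.asarray(sizes, dtype=float)
    rng = np.random.default_rng(seed)
    observed = count_smaller(sizes, flyers, sisters)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permuted(sizes, axis=1)    # each row (intron) permuted on its own
        if count_smaller(shuffled, flyers, sisters) >= observed:
            as_extreme += 1
    return observed, as_extreme / n_perm          # 0.0 means P < 1 / n_perm
```

Permuting within each row preserves each intron's overall size distribution while breaking any link between size and flight status, which is exactly the null hypothesis described above.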

Phylogenetically Corrected Correlation

To test the correlation between two traits, such as intron size and genome size, we constructed a simple regression model y = α + βx, where y is the dependent variable and x is the independent variable. To account for the evolutionary nonindependence of trait data, we used the program BayesTraits (http://www.evolution.rdg.ac.uk, last accessed October 3, 2012), which implements PGLS in a Bayesian framework (Pagel 1999). BayesTraits uses a Markov chain Monte Carlo (MCMC) algorithm to produce posterior distributions of regression parameters. Before the MCMC analysis, we used ML to decide whether phylogenetic correction was necessary by estimating the phylogenetic signal λ, which measures the degree to which species are nonindependent for a given phylogenetic tree and trait. If λ = 1, the trait evolves as expected under a random walk (Brownian motion) model, whereas λ = 0 means that the trait evolves among species as if they were independent, so no phylogenetic correction is needed. The MCMC was then run for 5,050,000 iterations with a burn-in of 50,000 and a sample period of 1,000. We manually tuned the rate deviation, which determines the boldness of the MCMC proposal procedure, to obtain acceptance rates (proportion of proposals accepted) between 0.2 and 0.4. To assess the significance of correlations, we calculated the proportion of the posterior distribution of the slope parameter (β) that crossed 0 (the null model), as suggested elsewhere (Organ et al. 2007). We also used BayesTraits to test the hypothesis that small intron size and flight are correlated when treated as binary traits. For these tests, we used an ML framework with 50 iterations. We first ran the data with all parameters and ancestors unconstrained and then with the common ancestor of birds and Anolis and that of bats, horse, cat, and dog constrained to be flightless, forcing the characters to change to flighted and small introns on the appropriate branches.
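Two quantities in this paragraph can be made concrete. Pagel's λ rescales the Brownian covariance matrix by multiplying its off-diagonal (shared-history) entries by λ while leaving the diagonal unchanged, so λ = 0 gives a star phylogeny of independent species. The posterior check summarizes how much of the sampled slope distribution lies across zero. The Python sketch below shows only these two pieces, not BayesTraits itself; the function names are hypothetical, and measuring "crossing 0" as the share of samples on the opposite side of zero from the posterior median is our assumption.

```python
import numpy as np

def pagel_lambda(C, lam):
    """Pagel's lambda: scale shared-history covariances by lam, keep tip variances."""
    C = np.asarray(C, dtype=float)
    scaled = lam * C
    np.fill_diagonal(scaled, np.diag(C))      # root-to-tip variances are unchanged
    return scaled

def fraction_crossing_zero(beta_samples):
    """Share of posterior slope samples lying on the other side of 0 from the posterior median."""
    b = np.asarray(beta_samples, dtype=float)
    return float(np.mean(b <= 0)) if np.median(b) >= 0 else float(np.mean(b >= 0))

# Retained samples: (5,050,000 - 50,000) / 1,000 = 5,000 posterior draws
n_retained = (5_050_000 - 50_000) // 1_000
```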

Repetitive Elements

The repetitive element (RE) data were retrieved from Ensembl. By comparing repeat masked genomic sequences to raw sequences, we obtained the position and length information for REs.
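Obtaining repeat coordinates by comparing a repeat-masked sequence with the raw sequence amounts to finding runs of masked positions. The Python sketch below assumes hard masking with `N` (as in Ensembl's `dna_rm` FASTA files) and ignores positions that are already `N` in the raw sequence; soft-masked (lowercase) input would need a different test. The function names are hypothetical.

```python
def masked_intervals(raw: str, masked: str, mask_char: str = "N"):
    """Half-open (start, end) intervals where masking replaced a non-N base."""
    if len(raw) != len(masked):
        raise ValueError("raw and masked sequences must have the same length")
    intervals, start = [], None
    for i, (r, m) in enumerate(zip(raw, masked)):
        is_repeat = m.upper() == mask_char and r.upper() != mask_char
        if is_repeat and start is None:
            start = i                          # a repeat run begins
        elif not is_repeat and start is not None:
            intervals.append((start, i))       # the run ended at the previous base
            start = None
    if start is not None:
        intervals.append((start, len(raw)))
    return intervals

def repeat_fraction(raw: str, masked: str) -> float:
    """Proportion of the sequence covered by masked repeats."""
    return sum(end - start for start, end in masked_intervals(raw, masked)) / len(raw)

print(masked_intervals("ACGTACGTAC", "ACNNNCGTNN"))  # [(2, 5), (8, 10)]
```

Restricting the same calculation to intron coordinates gives the intronic RE proportion compared with the genome-wide proportion in the Results.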

Results

Data Set Summary

In this analysis, we built three nonexclusive data sets with different numbers of species and thus different phylogenetic depths. Data sets with sparse phylogenetic sampling maximize the number of identified orthologous introns, reducing the risk of drawing conclusions from a small number of introns. Meanwhile, data sets with deeper phylogenetic coverage give a broad picture of intron size evolution and avoid results biased by a focus on a few species. Throughout, we used data from Ensembl release 59; we also performed analyses using a recently released high-quality microbat (Myo. lucifugus) genome but found few differences from our initial analyses (see the Supplementary Material online), so we report results using data from Ensembl 59. Using a reciprocal-best-hit approach, we identified 12,506 homologous introns in 11 selected species, which we designate data set A; we also exploited the protein ortholog annotation from Ensembl to identify 562 and 98 homologous introns in data sets that we designate B and C, respectively. These introns belong to 2,300, 367, and 67 genes, respectively (see Materials and Methods). The small number of introns identified in the latter two data sets was probably due to the stringent filters in our method: to pass them, introns were required to occur within coding regions that had orthologs in every species and to occur at orthologous sites in all species. Therefore, as more species are included, it becomes more likely that exon–intron structure has changed in at least one of them, which excludes the intron from our study. To test this, we relaxed the constraints in data set C by requiring orthologous introns to be present in bats and reptiles while allowing them to be missing in at most one other species, which yielded 1,070 introns.
The pattern was very similar to what we observed for the smaller set of introns (data not shown), so we are confident that even though data sets B and C contain a small number of introns, analyses based on them are representative. Alternatively, including more incompletely annotated genomes, as in data set B, could also lead to a small number of orthologous regions shared by all species. Because we used different methods to identify orthologous introns, it is important to determine whether they yield consistent results. For the eight species represented in all three data sets, median intron sizes in data set A were significantly correlated with those in data sets B and C (P < 0.01), suggesting that the two methods are consistent. Data sets B and C are also closely correlated (P < 0.001), which implies that little bias was introduced by the smaller number of introns that results from considering more species. Similar to previous studies on metazoans, we found that the first intron of the amniote genomes we studied was significantly larger than the other introns (fig. 1), presumably because it harbors more functional sequences than other introns (Marais et al. 2005; Gaffney and Keightley 2006; Gazave et al. 2007; Bradnam and Korf 2008).

Distribution of intron median size in 11 species used in data set A. “Other introns” include all other introns after the fourth intron. (A) Introns identified in data set A. (B) Introns from genes with at least five introns in each species.

Reptiles (Including Birds) Have Smaller Introns Compared with Mammals

Mammals and reptiles/birds differ in many genomic characteristics, such as genome size and the proportion of REs. Here, we compared intron size between these two sister groups and found that, in all three data sets, reptiles (including birds) have smaller introns than mammals (fig. 2). To determine whether these differences in intron size are statistically significant or simply random fluctuation, we performed t-tests on the median intron size of these species within a PGLS framework that accounts for nonindependence among data points introduced by shared evolutionary history. In these analyses, no significant P value was found for introns either categorized by position or as a whole (data not shown), suggesting that this apparent pattern is not strong in a phylogenetic context. However, the small number of reptiles in our data set (only four species) could reduce the power of our test because of the resulting small degrees of freedom. To explore this possibility, we constructed several larger species trees by adding different numbers of birds to our existing trees, based on tree topologies and branch lengths from recent phylogenetic surveys (Hackett et al. 2008). We then randomly assigned intron sizes to these additional bird species from a normal distribution with parameters estimated from the three known birds (chicken, turkey, and zebra finch). Overall, we created four simulated data sets, two derived from data set A (A03, which has 3 newly added birds, and A12, which has 12 newly added birds) and two derived from data set B (B12, which contains 12 newly added birds, and B20, which contains 20 newly added birds). We next repeated the above PGLS analysis 5,000 times, and the results demonstrated that smaller P values were produced as sample size became larger (fig. 3), which suggests that the PGLS-based t-test is heavily affected by the number of species used and has low statistical power when that number is small.
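The simulation step that assigns values to the hypothetical added birds can be sketched in Python as below. It assumes the inputs are the per-species median sizes of one intron class for the three sequenced birds and that the normal distribution's spread is the sample standard deviation (ddof = 1), which is our assumption. Grafting the simulated tips onto the enlarged tree and refitting the PGLS model in each of the 5,000 replicates is omitted, and the function name is hypothetical.

```python
import numpy as np

def simulate_bird_values(known_birds, n_new, seed=None):
    """Draw values for hypothetical added birds from a normal fitted to the known birds."""
    values = np.asarray(known_birds, dtype=float)
    mu = values.mean()
    sd = values.std(ddof=1)              # sample SD from the sequenced birds
    rng = np.random.default_rng(seed)
    return mu, sd, rng.normal(mu, sd, size=n_new)
```

For example, `simulate_bird_values(bird_medians, 12)` would supply the 12 extra tip values for one replicate of an A12- or B12-style tree.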
Therefore, we used a binomial test (see Materials and Methods) to overcome the confounding phylogenetic effect. To apply this test, we reconstructed the intron size of the common ancestor of mammals and that of reptiles by both the ML and PIC methods. In data set A, 8,728 of 12,506 (∼70%) introns are longer in the mammalian ancestor than in the reptile ancestor (P < 0.001) using ML reconstruction, and 8,974 of 12,506 (∼72%, P < 0.001) using PIC reconstruction. Similar results were found in data sets B and C, with all P values < 0.001. These results suggest that reptiles have smaller introns than mammals and that this contraction is consistent in direction across large numbers of introns, implying the action of non-neutral or genome-wide forces.

The influence of greater taxon sampling on the significance of PGLS-based t-tests. We generated four larger phylogenetic trees with more bird species (A03 and A12 derived from data set A and B12 and B23 derived from data set B). Then we used the median size of a specific intron class in each species as node values in a phylogenetic tree and performed PGLS analysis. For newly added bird species, node values were generated by normal distribution (see text for details). To get a hypothetical distribution, this procedure was repeated 5,000 times. In each diagram, the red line denotes the P value from PGLS analysis in the original data set, and the blue and green bars denote the 5,000-time simulation of such P value in two simulated data sets derived from a same original data set. (A) Simulation based on the median size of first introns in data set A. (B) Simulation based on the median size of first introns in data set B.

Intron size distributions in different data sets. Boxplot is used to display the logarithmized size distribution of introns in each data set. Species names in black represent mammals, names in red represent reptiles/birds, and names in dark green represent amphibians. (A) Data set A; (B) data set B; and (C) data set C.

Volant Species Have Smaller Introns Compared with Nonflying Relatives

We used large-scale data sets to study whether there is a relationship between flight and intron size by comparing intron sizes in flying species and their nonflying sister lineages in both mammals and birds. In mammals, we compared bats with their sister clade on our consensus phylogenetic tree: in data set A, bats were compared with humans and mice, whereas in data set C, bats were compared with horses, cats, and dogs. Figure 2 reveals that, in general, flying species have shorter introns than their flightless close relatives. To diminish the influence of phylogenetic correlations, we reconstructed intron lengths in the common ancestor of the two bats and in that of their sister group by the ML method. A total of 7,877 of 12,506 (63%) introns in data set A and 69 of 98 (70%) introns in data set C are smaller in the common ancestor of the two bats than in the common ancestor of their close mammalian relatives (P < 0.001, fig. 4). We also used permutation-based tests to exclude the possibility of random effects. For each intron, we permuted the intron size distribution across mammals, counted the number of introns that are smaller in bats in the same way as described above, and repeated this process 1,000 times. We recorded the number of runs with at least as many smaller introns in bats as observed in our data. We found that the large number of small introns in bats is unlikely to be caused by random effects (P < 0.001 and P = 0.002 for data sets A and C, respectively). In reptiles/birds, we compared the three birds (chicken, turkey, and zebra finch) with the green anole and observed a similar pattern. As in mammals, significantly more avian introns are smaller than their anole orthologs (7,552 of 12,506 [60%] introns in data set A, 361 of 562 [64%] introns in data set B, and 59 of 98 [60%] introns in data set C, P < 0.001).
Again, permutation tests within Reptilia confirmed the nonrandomness of this pattern (P < 0.001 for all three data sets). Similar results were obtained when using PIC to reconstruct ancestral values for intron length or when using mean size for each group in the comparison. Thus, we found a convergent pattern in mammals and reptiles/birds that flying species have smaller introns than flightless species closely related to them.

Correlation between genome size and intron size. Light-blue lines indicate regression lines derived from normal linear regression model; and brown lines indicate regression lines derived from PGLS model, which accounts for nonindependence among data points. (A) Median size of first introns in data set A; (B) median size of other introns (introns except first introns) in data set A; (C) median size of first introns in data set B; and (D) median size of other introns (introns except first introns) in data set B.

Intron Size Variation Is Correlated with Genome Size Variation

We have shown that mammalian introns are longer than their orthologs in Reptilia. Because previous studies showed that genome size is smaller in avian species compared with other amniotes (Hughes and Hughes 1995; Hughes 1999; Organ et al. 2007), it is interesting to determine whether intron size and genome size are correlated. Because first introns are larger and functionally distinct from other introns, we treated them separately, and data set C was excluded due to the small number of first introns in it. We found a significant correlation between genome size and median intron size (fig. 4a–d). Under the normal LR model, genome size explains 62% and 57% of the variation of first introns in data sets A and B (P < 0.005), and for other introns, genome size explains 58% and 60% of the variation in data sets A and B. Because data points are nonindependent due to shared ancestry, we used the statistical package BayesTraits, which incorporates a Bayesian framework, to account for the phylogenetic signal and build a PGLS model. Again, genome size showed strong correlation with both first introns and other introns and explained 52% and 43% of the variation for the first introns and 57% and 32% for other introns in data sets A and B, respectively (P < 0.05 for all correlations). However, we did not find such correlation between genome size and exon size, presumably because exon size is more conserved than intron size (data not shown). These patterns are consistent with the notion that exons are under strong purifying selection with respect to length because indels are generally deleterious, even when preserving the reading frame. 
Because most of the genome size variation among amniotes is due to variation in the abundance of REs (Ohno 1970; Cavalier-Smith 1985; Pagel and Johnstone 1992), we also examined whether intron size variation correlates with the proportion of REs among species or, stated differently, whether the proportion of REs is similar between intronic regions and whole genomes among species. Our result showed a significant correlation between genomic and intronic RE proportion (fig. 5, R2 = 0.88 in data set A, R2 = 0.97 in data set B, P < 0.001 for both correlations). These results confirm that intron size and genome size in amniotes are correlated and suggest that REs may be a common driver of both.

Correlations between the proportion of repetitive elements in introns and genomes. Brown lines indicate regression lines from normal linear regression model. (A) Data from data set A and (B) data from data set B.

Discussion

Although the underlying mechanisms are poorly understood, genome size has been shown to be related to various phenotypic traits (Petrov 2001), such as cellular and nuclear sizes (Cavalier-Smith 1982; Gregory and Hebert 1999), the rate of cell division, transcriptional processes, and cellular respiration (Kozlowski et al. 2003), duration of mitosis and meiosis (Bennett 1987), weediness in plants (Neal Stewart et al. 2009; Lavergne et al. 2010), embryonic development time (Jockusch 1997), morphological complexity of the brain (Roth et al. 1994), and response to CO2 (Jasienski and Bazzaz 1995). It has also been proposed that in warm-blooded amniotes, genome size may be under physiological constraints (Waltari and Edwards 2002) that favor smaller cells and thus larger surface-area-to-volume ratios, with an attendant greater capacity for the gas exchange needed to maintain high metabolic rates (Szarski 1983; Hughes and Hughes 1995; Organ et al. 2007). Similarly, small genomes and thus small introns are thought to be favored in volant lineages because powered flight requires high metabolic rates, which can be facilitated by small cells with more efficient gas exchange (Hughes and Hughes 1995; Hughes 1999). In support of this claim, several studies found smaller genomes in birds and bats compared with other eutherian mammals (Hughes and Hughes 1995; Van den Bussche et al. 1995), and hummingbirds, which engage in very energy-intensive maneuvers such as hovering flight, have the smallest genomes among birds studied thus far (Gregory et al. 2009). However, Organ et al. (2007) studied the origin of avian genome size by reconstructing ancestral genome sizes in extant and extinct amniotes and suggested that the reduction of genome size occurred along the lineage leading to basal and theropod dinosaurs, long before the origin of birds and powered flight.
Consistent with this pattern, our analysis showed that birds and nonavian reptiles together have smaller introns than mammals, but that within both reptiles and mammals, volant lineages have smaller introns than their close nonflying relatives, suggesting a possible correlation between intron/genome size and flight ability. Similar to Organ et al. (2007), we suggest that although genome size reduction in reptiles may have preceded the origin of powered flight in birds and bats, flight nonetheless further reduced genome size in these lineages, leading to further reductions in intron size, likely through biased deletion or, ultimately, through reduction of cell volume (Johnson 2004). Additional paleogenomic studies have inferred small genomes in other volant reptile lineages, such as pterosaurs (Organ and Shedlock 2009). Although we have found some evidence for a role of flight in reducing intron size in amniotes, it is reasonable to ask whether the one or two evolutionary events in which these changes took place (on the one or two branches of the trees in our three data sets leading from flightless ancestors to flight) constitute a statistically significant association, given our tree, branch lengths, and the distribution of character states among taxa. To investigate this, we ran a simple test of the hypothesis that the binary traits of flight and small intron size are significantly associated, using BayesTraits (Pagel 1994; Barker and Pagel 2005). We scored flightless taxa and large introns as “0” and volant taxa and small introns as “1.” Using the maximum-likelihood (ML) mode and leaving all rate parameters between states unconstrained, we found that a model in which flight and small introns were associated explained the data slightly better than a model in which they were independent in two of the three data sets (χ² test: P = 0.09 in data sets A and B; P = 0.29 in data set C).
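The comparisons reported in this section are likelihood-ratio tests between nested models: LR = 2(ln L_dependent − ln L_independent) is referred to a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. In Pagel's (1994) discrete model the dependent model has eight transition rates and the independent model four, giving d.f. = 4. The Python sketch below computes the P value with the standard library only, using the closed form of the χ² survival function that holds for even d.f. It illustrates the test and is not BayesTraits' own implementation; the function names are hypothetical and the log-likelihoods in the example are placeholders, not values from this study.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!, so no special functions are needed.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(lnl_dependent: float, lnl_independent: float, df: int = 4) -> tuple[float, float]:
    """Return (LR, P) for two nested models fitted by maximum likelihood."""
    lr = 2.0 * (lnl_dependent - lnl_independent)
    return lr, chi2_sf_even_df(max(lr, 0.0), df)

# Hypothetical log-likelihoods, for illustration only (not values from this study).
lr, p = likelihood_ratio_test(lnl_dependent=-20.0, lnl_independent=-23.0)
print(f"LR = {lr:.2f}, P = {p:.4f}")
```

The closed form avoids a dependency on a statistics library; for odd d.f. a general incomplete-gamma routine would be needed instead.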
In the dependent model, the estimated probability that the common ancestor of birds and Anolis, or of bats and Zooamata (the horse–dog–cat clade; Waddell et al. 1999; Benton et al. 2009), was flightless with large introns was surprisingly, and perhaps unrealistically, small [P(0,0) = 0.1804 and 0.0735 for the Anolis–bird and bat–Zooamata ancestors, respectively]. Based on the fossil record, for example, we expect the ancestor of birds and lizards to have been flightless. The same was true for the independent model (P(0) = 0.3946 and 0.1498 for the Anolis–bird and bat–Zooamata ancestors, respectively). This result may have arisen because the ML estimates of the transition rates from flightless to volant and from large to small introns (rates q12 and q13 in the model) were very small, presumably because there were few transitions from flightless to volant (0→1). To create a more realistic model, we used the largest data set, data set C, and constrained q12 and q13 to higher values, varying them from 10 to 100. Under these scenarios, the probability in the dependent model that the ancestor at the base of the branch leading to bats or to birds was flightless with large introns was higher [P(0,0) = 0.3287 or 0.3076 for q12 = q13 = 100]. In this more realistic case, the difference in log-likelihood between the dependent and independent models was even greater than when transition rates were unconstrained (P = 2.5 × 10⁻⁵, χ² test, d.f. = 4), supporting the hypothesis that the association between flight and small intron size is statistically significant in a likelihood framework, even though it rests on just two transitions. We also confirmed biological intuition: the likelihood of dependent models in which the ancestor of birds and Anolis, or of bats and Zooamata, was forced to be flightless was significantly higher than that of models in which that ancestor was volant (P = 0.004, χ² test, d.f. = 2).
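To make the rate names concrete, the Python sketch below builds the 4 × 4 instantaneous rate matrix of Pagel's (1994) model for two binary traits. It assumes the state ordering of Pagel's notation, 1 = (0,0), 2 = (0,1), 3 = (1,0), 4 = (1,1), so that q12 and q13 are the two departures from the doubly ancestral state (0,0). Because only one trait may change in an instant, the dependent model has eight free rates; the independent model is the special case in which each trait's gain and loss rates do not depend on the state of the other trait. The function names and all rate values except q12 = q13 = 100 are hypothetical placeholders, not estimates from this study.

```python
from itertools import product

# Joint states of two binary traits (trait 1, trait 2), assumed ordered as in
# Pagel (1994): state 1 = (0,0), 2 = (0,1), 3 = (1,0), 4 = (1,1).
STATES = list(product((0, 1), repeat=2))

def dependent_q_matrix(rates: dict[str, float]) -> list[list[float]]:
    """Build the 4x4 instantaneous rate matrix from eight named rates.

    `rates` maps names such as "q12" (state 1 -> state 2) to nonnegative values.
    Entries are nonzero only when exactly one trait changes; each diagonal
    entry is set so that its row sums to zero.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            if sum(x != y for x, y in zip(a, b)) == 1:
                q[i][j] = rates[f"q{i + 1}{j + 1}"]
        q[i][i] = -sum(q[i])
    return q

def independent_rates(alpha1: float, beta1: float, alpha2: float, beta2: float) -> dict[str, float]:
    """Write the 4-parameter independent model as a constrained dependent model.

    Each trait's gain (alpha) and loss (beta) rate is the same whatever the
    state of the other trait, which imposes four equality constraints.
    """
    return {
        "q13": alpha1, "q24": alpha1,  # trait 1 changes 0 -> 1
        "q31": beta1, "q42": beta1,    # trait 1 changes 1 -> 0
        "q12": alpha2, "q34": alpha2,  # trait 2 changes 0 -> 1
        "q21": beta2, "q43": beta2,    # trait 2 changes 1 -> 0
    }

# Placeholder rates (hypothetical); only q12 = q13 = 100 mirrors the text.
dependent = {"q12": 100.0, "q13": 100.0, "q21": 1.0, "q24": 5.0,
             "q31": 1.0, "q34": 5.0, "q42": 0.5, "q43": 0.5}
for row in dependent_q_matrix(dependent):
    print(" ".join(f"{v:9.2f}" for v in row))

n_dependent = len(dependent)  # eight free rates
n_independent = 4             # alpha1, beta1, alpha2, beta2
print("d.f. for the likelihood-ratio test =", n_dependent - n_independent)
```

Writing the independent model as a constrained dependent model makes the nesting explicit, which is what justifies comparing the two with a χ² likelihood-ratio test on four degrees of freedom.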
Additionally, the dependent model in which the Anolis–bird and bat–Zooamata ancestors were forced to be flightless with large introns explained the character data much better than the independent model (P = 0.0007, χ² test, d.f. = 4). Taken together, these results strongly support a model in which flight and small genomes are correlated, if not causally related, given two origins of powered flight among extant amniotes. This analysis does not include extinct lineages such as pterosaurs, which are now inferred to have had small genomes (Organ and Shedlock 2009) and could constitute a third origin of the genomic syndrome associated with powered flight.
An alternative explanation for genome and intron size variation in amniotes is suggested by theories of neutral processes and their effects on genome architecture (Lynch 2007). For example, Lynch and Conery (2003) studied 43 eukaryotic species and suggested that genome complexity and other genomic characteristics respond passively to long-term changes in population size. Under their hypothesis, the contraction of genomes and introns that we observe in birds and bats would result from larger effective population sizes in these lineages than in their close nonflying relatives, because selection for smaller genome size proceeds more efficiently in large populations than in small ones. However, several lines of evidence suggest that effective population size alone may not explain the pattern we observe in amniotes. First, human and mouse genomes are similar in size (3.5 pg vs. 3.29 pg), yet the estimated effective population size of mice is at least 10-fold larger than that of humans (Eyre-Walker et al. 2002; Halligan et al. 2010). Second, most estimates of avian effective population size are about an order of magnitude smaller than 10⁶ (Jennings and Edwards 2005; Lynch 2007; Lanfear et al. 2010) and thus on par with those of rodents (Eyre-Walker et al. 2002; Halligan et al. 2010), yet avian genomes are markedly smaller than rodent genomes. Third, Lynch and Conery included only two amniotes (H. sapiens and M. musculus) in their regression analysis of intron size; such a small sample could introduce bias, and conclusions based on it cannot easily be extrapolated to amniotes as a whole. Furthermore, in their analysis the product of effective population size (Ne) and per-site mutation rate (μ) was larger in humans than in mice (their fig. 1A), which contradicts the well-established observation that mice harbor much greater genetic diversity than humans: because neutral nucleotide diversity is expected to scale with Neμ (θ = 4Neμ in diploids), higher diversity in mice implies a larger Neμ in mice. Hence, although the effective population size hypothesis may hold across broad phylogenetic groups, it does not seem able to explain phylogenetically local variation in genome characteristics among amniotes such as we observe here. Other neutral processes could certainly explain smaller genomes in birds, such as the fixation of mechanisms that produce a deletion-biased mutation spectrum during replication. Such processes may or may not have fitness effects on the lineages that carry them. If, however, smaller genomes do confer a physiological advantage on those lineages, it seems more plausible to us that genome reduction in birds and bats is not a neutral process.
Overall, our study reveals a complex pattern of intron size evolution, suggesting that the forces of mutation and natural selection vary among introns within a gene and between species. Although our results are consistent with an influence of powered flight on genome and intron size, further studies are needed to clarify the mechanism linking these traits. We expect our understanding of introns to improve as new amniote genomes become available, particularly those of reptiles, which remain underrepresented in genome databases (Castoe et al. 2011; St John et al. 2012).
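The effective-population-size argument discussed above rests on a standard population-genetic principle: whether a slightly deleterious change, such as an insertion that adds noncoding DNA, is purged depends on the product Ne·s rather than on s alone. The Python sketch below illustrates that principle, not any analysis in this study, using Kimura's (1962) diffusion approximation for the fixation probability of a new additive mutation in a diploid population, simplified by assuming that census size equals effective size. The function name, the selection coefficient, and the population sizes are hypothetical.

```python
import math

def fixation_probability(ne: float, s: float) -> float:
    """Kimura's (1962) diffusion approximation for a new additive mutation.

    The mutation starts at frequency p = 1/(2N) in a diploid population; we
    assume N = Ne, so u = (1 - exp(-2s)) / (1 - exp(-4*Ne*s)).
    Negative s means the mutation (e.g. a slightly costly insertion) is disfavoured.
    """
    if s == 0.0:
        return 1.0 / (2.0 * ne)  # neutral: fixation probability equals initial frequency
    x = -4.0 * ne * s
    if x > 700.0:
        # expm1(x) would overflow; for such large x it is effectively exp(x).
        return math.expm1(-2.0 * s) * math.exp(-x)
    # expm1 keeps precision when the exponents are tiny.
    return math.expm1(-2.0 * s) / math.expm1(x)

s = -1e-6  # hypothetical fitness cost of a small insertion
for ne in (1e4, 1e5, 1e6, 1e7):
    neutral = 1.0 / (2.0 * ne)
    ratio = fixation_probability(ne, s) / neutral
    print(f"Ne = {ne:.0e}: 4*Ne*s = {4 * ne * s:+.2f}, fixation relative to neutral = {ratio:.3g}")
```

With these placeholder values the insertion fixes at nearly the neutral rate when |4Ne·s| ≪ 1 (Ne = 10⁴) but almost never when |4Ne·s| ≫ 1 (Ne = 10⁷), which is the mechanism invoked by the population-size hypothesis.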

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

References (10 of 57 shown)

1. Human housekeeping genes are compact.

Authors: Eli Eisenberg; Erez Y Levanon
Journal: Trends Genet Date: 2003-07

2. Variation across species in the size of the nuclear genome supports the junk-DNA explanation for the C-value paradox.

Authors: M Pagel; R A Johnstone
Journal: Proc Biol Sci Date: 1992-08-22

3. Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: evidence for selectively driven codon usage.

Authors: Jean-Vincent Chamary; Laurence D Hurst
Journal: Mol Biol Evol Date: 2004-03-10

4. Ordered disposition of parental genomes and individual chromosomes in reconstructed plant nuclei, and their implications.

Authors: M D Bennett
Journal: Somat Cell Mol Genet Date: 1987-07

5. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila.

Authors: John Parsch; Sergey Novozhilov; Sarah S Saminadin-Peter; Karen M Wong; Peter Andolfatto
Journal: Mol Biol Evol Date: 2010-02-11

6. Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees.

Authors: W Bryan Jennings; Scott V Edwards
Journal: Evolution Date: 2005-09

7. Spliced segments at the 5' terminus of adenovirus 2 late mRNA.

Authors: S M Berget; C Moore; P A Sharp
Journal: Proc Natl Acad Sci U S A Date: 1977-08

8. Origin of avian genome size and structure in non-avian dinosaurs.

Authors: Chris L Organ; Andrew M Shedlock; Andrew Meade; Mark Pagel; Scott V Edwards
Journal: Nature Date: 2007-03-08

9. Ensembl 2011.

Authors: Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Leo Gordon; Maurice Hendrix; Thibaut Hourlier; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Eugene Kulesha; Pontus Larsson; Ian Longden; William McLaren; Bert Overduin; Bethan Pritchard; Harpreet Singh Riat; Daniel Rios; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sobral; Giulietta Spudich; Y Amy Tang; Stephen Trevanion; Jana Vandrovcova; Albert J Vilella; Simon White; Steven P Wilder; Amonida Zadissa; Jorge Zamora; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Richard Durbin; Xosé M Fernández-Suarez; Javier Herrero; Tim J P Hubbard; Anne Parker; Glenn Proctor; Jan Vogel; Stephen M J Searle
Journal: Nucleic Acids Res Date: 2010-11-02

10. DNA double-strand break repair and the evolution of intron density.

Authors: Ashley Farlow; Eshwar Meduri; Christian Schlötterer
Journal: Trends Genet Date: 2010-11-22
