
Nucleotide substitution bias within the genus Drosophila affects the pattern of proteome evolution.

Mihai Albu, Xiang Jia Min, G Brian Golding, Donal Hickey.

Abstract

The availability of complete genome sequences for 12 Drosophila species provides an unprecedented resource for large-scale studies of genome evolution. In this study, we looked for correlated shifts in the patterns of genome and proteome evolution within the genus Drosophila. Specifically, we asked whether the nucleotide composition of the Drosophila willistoni genome, which is significantly less GC rich than the other 11 sequenced Drosophila genomes, is reflected in an altered pattern of amino acid substitutions in the encoded proteins. Our results show that this is indeed the case: there are large and highly significant asymmetries in the patterns of amino acid substitution between D. willistoni and Drosophila melanogaster, and they are in the direction predicted by the nucleotide biases. Combined with previous studies on long-term proteome evolution, this result implies that substitutional biases at the DNA level can be a major factor in determining both the long-term and the short-term directions of proteome evolution.


Keywords: GC content; amino acid composition; nucleotide content

Year: 2009 PMID: 20333198 PMCID: PMC2817423 DOI: 10.1093/gbe/evp028

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

Molecular sequence data have provided major insights into the process of biological evolution. Essentially, the positive correlation between the level of sequence divergence and the time elapsed since the common ancestor allows us to use sequence divergence as a “molecular clock” (King and Jukes 1969; Zuckerkandl 1972). Many studies have demonstrated, however, that this is not a simple clock; many factors in addition to the age of the common ancestor affect the rate of sequence divergence (Roger and Hug 2006). One obvious complicating factor is natural selection, which can act either to constrain sequence change in order to conserve biological function or to accelerate change in cases of positive selection (Yang 1998; Yang and Nielsen 2002). In addition to natural selection, variation in the rate and direction of both mutation and DNA repair can also have a major impact on the patterns of sequence divergence. For example, it has been shown that molecular phylogenies of eukaryotes based on ribosomal RNA sequences are affected by biases in the nucleotide composition of those sequences (Hasegawa and Hashimoto 1993). Subsequently, many studies have shown that nucleotide bias, usually expressed as GC content, is a pervasive phenomenon (Sueoka 1992), and a host of sophisticated statistical techniques have been developed to minimize its effects on phylogenetic reconstruction (Lockhart et al. 1994; Van Den Bussche et al. 1998; Wang et al. 2008). One approach to avoiding the problem of biased nucleotide content has been to construct phylogenies from the encoded protein sequences rather than from the DNA sequences themselves (Hashimoto et al. 1994). This reduces the problem because the most extreme compositional bias is observed at synonymous sites, which do not affect the amino acid sequence.
Nevertheless, the problem persists for protein-based phylogenies because compositionally biased DNA sequences encode biased amino acid sequences (Lobry 1997; Foster et al. 1997; Singer and Hickey 2000; Knight et al. 2001; Wang et al. 2004). The problem is especially troublesome for genome-wide compositional biases because, in these cases, adding more data simply compounds the problem (Foster and Hickey 1999; Leigh et al. 2008; Wang et al. 2008). Previous studies of compositional bias have compared widely diverged lineages, because more closely related organisms tend to have, on average, more similar nucleotide and amino acid compositions. However, the extensive genomic data now available for the genus Drosophila (Ashburner 2007; Drosophila 12 Genomes Consortium 2007) allow us to look at broad-scale patterns of nucleotide and protein evolution over a relatively short evolutionary period. In this case, different species within the same genus show distinctly different nucleotide compositions, which allows us to examine the shorter term evolutionary effects of substitution biases at both the DNA and the protein levels. To do so, we focused particularly on the minority of sites where evolutionary change has occurred between related sequences, that is, the nonconserved sites.

Materials and Methods

Data Collection

We downloaded the complete set of aligned protein-coding DNA sequences for all 12 Drosophila species from the FlyBase genome project (ftp://ftp.flybase.net). From this set, we extracted only the aligned sequences for Drosophila melanogaster and Drosophila willistoni. From the 9,850 files containing paired sequences from the two species, we generated a nonredundant gene set: where two or more gene copies encoded identical (100%) protein sequences, only one copy was retained. We also tried to avoid possible alignment errors by removing gene pairs that showed a large number of multiple consecutive changes. After this filtering, we obtained 7,780 gene pairs. The aligned sequences were then scanned for gaps, and we removed every codon that fell within a gapped region in either species. This resulted in a total gap-free alignment of 11,290,860 bases. The aligned sequences were then compared site by site for both nucleotide and amino acid substitutions.
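The gap filter and the site-by-site comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program (which is available only on request): it assumes the two coding sequences are already aligned in frame, that gaps are written as `-`, and the function names are hypothetical. The redundancy and consecutive-change filters are not reproduced.

```python
def drop_gapped_codons(seq_wil: str, seq_mel: str, gap: str = "-") -> tuple[str, str]:
    """Remove every codon that contains a gap in either aligned sequence.

    Assumes both sequences are aligned, in frame, and of equal length.
    """
    if len(seq_wil) != len(seq_mel) or len(seq_wil) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    kept_wil, kept_mel = [], []
    for i in range(0, len(seq_wil), 3):
        codon_wil, codon_mel = seq_wil[i:i + 3], seq_mel[i:i + 3]
        if gap in codon_wil or gap in codon_mel:
            continue  # codon lies in a gapped region of at least one species
        kept_wil.append(codon_wil)
        kept_mel.append(codon_mel)
    return "".join(kept_wil), "".join(kept_mel)


def count_sites(seq_wil: str, seq_mel: str) -> tuple[int, int]:
    """Return (conserved, nonconserved) nucleotide site counts for gap-free sequences."""
    conserved = sum(1 for a, b in zip(seq_wil, seq_mel) if a == b)
    return conserved, len(seq_wil) - conserved


# Toy sequences for illustration only (not real Drosophila data).
wil, mel = drop_gapped_codons("ATGAAA---GCTTTA", "ATGAAGCCCGCCTTG")
print(wil, mel)               # ATGAAAGCTTTA ATGAAGGCCTTG
print(count_sites(wil, mel))  # (9, 3)
```

Dropping whole codons, rather than single gapped bases, keeps the remaining sequence in frame, which the later codon-level comparison relies on.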

Data Analysis

All statistical tests were performed using the R statistical package (http://www.R-project.org). Kolmogorov–Smirnov tests (KS tests; Marsaglia et al. 2003) were used to detect significant differences between the aligned D. melanogaster and D. willistoni DNA and protein sequences. These tests were applied both to the concatenated genome sequences and to the collection of individual gene sequences. A computer program (available upon request) was written to analyze conserved and variable sites. The software scans the reading frames of all gene sequence files and counts conserved and nonconserved nucleotide sites. It then produces a 64-by-64 codon substitution matrix, in which each row corresponds to a codon in D. willistoni and each column to a codon in D. melanogaster (supplementary table S1, Supplementary Material online). In silico translation of this codon matrix was used to produce a 20-by-20 amino acid substitution matrix (supplementary table S2, Supplementary Material online). These matrices were then used to extract information about the overall patterns of nucleotide and amino acid substitutions (see Results). The nonconserved sites in the DNA alignments were subdivided into synonymous and nonsynonymous substitutions. The nonsynonymous substitutions, which result in amino acid substitutions, were further subdivided according to whether they alter the number of amino acids encoded by GC-rich or AT-rich codons. Codons were classified as GC rich, GC neutral, or GC poor following the classification used previously (Foster et al. 1997; Singer and Hickey 2000).
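The matrix construction and translation steps can be illustrated with a short Python sketch. It is a hedged reconstruction, not the authors' software: it assumes gap-free, in-frame sequences, uses the standard genetic code (with `*` for stop codons), stores counts sparsely in a `Counter` keyed by (D. willistoni, D. melanogaster) pairs rather than as a dense 64-by-64 array, and all names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_matrix(seq_wil: str, seq_mel: str) -> Counter:
    """Count (D. willistoni codon, D. melanogaster codon) pairs, i.e. row x column cells."""
    pairs = Counter()
    for i in range(0, len(seq_wil) - 2, 3):
        pairs[(seq_wil[i:i + 3], seq_mel[i:i + 3])] += 1
    return pairs


def translate_matrix(codon_counts: Counter) -> Counter:
    """Collapse codon-pair counts into amino-acid-pair counts by in silico translation."""
    aa_counts = Counter()
    for (c_wil, c_mel), n in codon_counts.items():
        aa_counts[(CODON_TABLE[c_wil], CODON_TABLE[c_mel])] += n
    return aa_counts


def substitution_type(c_wil: str, c_mel: str) -> str:
    """Label an aligned codon pair as conserved, synonymous, or nonsynonymous."""
    if c_wil == c_mel:
        return "conserved"
    if CODON_TABLE[c_wil] == CODON_TABLE[c_mel]:
        return "synonymous"
    return "nonsynonymous"


codons = codon_matrix("ATGAAATCT", "ATGAAGGCT")
print(translate_matrix(codons))
print(substitution_type("AAA", "AAG"), substitution_type("TCT", "GCT"))  # synonymous nonsynonymous
```

A sparse counter avoids allocating the many empty cells of the full matrix; converting it to a dense 64-by-64 or 20-by-20 table for display is a straightforward final step.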

Results

First, we compared the nucleotide composition of the D. willistoni genome with that of the extensively studied D. melanogaster. Our final DNA sequence data set contains approximately 11.3 million bp from each of these two species. When the sequences are aligned, there are a total of 3,157,787 nonconserved nucleotides (27.9%). It is already known that the genome of D. willistoni has a lower GC content than that of other Drosophila species such as D. melanogaster (Drosophila 12 Genomes Consortium 2007; Vicario et al. 2007). However, because homologous coding sequences from these two species share more than 70% sequence identity, and because identical sites necessarily have identical GC contents, the global difference in GC content underestimates the difference at those sites where nucleotide divergence has occurred. This effect is shown in figure 1. Although there is a marked difference in nucleotide content between the two species when all sites are considered (fig. 1A), this difference becomes much greater when we consider the variable sites only (fig. 1B). The figure also shows that the reduction in GC content in the D. willistoni genome involves both G and C nucleotides, with a concomitant increase in both A and T nucleotides. These differences in GC content at the variable sites are statistically highly significant (KS test; Marsaglia et al. 2003; D = 0.8913, P ≪ 0.00001). The fact that the GC content of the D. willistoni genome is significantly lower than the average of the other 11 species strongly suggests that there has been a reduction in the GC content of the D. willistoni genome rather than an increase in the other 11 genomes. We confirmed the direction of the change by comparing the GC content at 4-fold degenerate sites in D. willistoni with that of both the other eight species within the subgenus Sophophora and the three outgroup species that fall within the subgenus Drosophila (Drosophila virilis, Drosophila grimshawi, and Drosophila mojavensis). The GC content of D. willistoni at these sites (51%) is significantly lower (P < 0.0001) than the average of the other Sophophora species (68%; see Vicario et al. 2007). It is also significantly lower (P < 0.01) than the average value for the three outgroup species (64.4%). Thus, it is reasonable to conclude that GC content has decreased within the D. willistoni genome rather than increased in the other 11 genomes. Because the results reported by Vicario et al. (2007) are based on all coding sequences, and not just on aligned sequences as used here, we double-checked that this did not bias the results. Specifically, we aligned the sequences of the outgroup species D. virilis with D. melanogaster and D. willistoni and then calculated the GC content at the third codon position of the aligned sequences. The result, 64% GC, is entirely consistent with the value of 64.4% reported by Vicario et al. (2007) for the average of the three outgroup species. In addition to the outgroup comparison, there is a more direct method for inferring the GC content of the common ancestor of D. melanogaster and D. willistoni: calculating the GC content of the conserved sites, that is, those sites that have remained unchanged since the species diverged. The GC content of the third codon position at conserved sites is 65%, which is close to the value of 68% GC at the variable sites in D. melanogaster; more important, it is much higher than the value of 34% GC at the variable sites in D. willistoni. This provides further confirmation that the trend has been toward a reduction in GC content in the D. willistoni lineage since its divergence from D. melanogaster.
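The reason a whole-alignment comparison understates the bias follows from writing each species' overall GC content as a weighted average over conserved and variable sites. In this sketch (our notation, not the authors'), f is the fraction of aligned sites that are variable, GC_c is the GC content of the conserved sites (identical in both species by construction), and GC_{v,mel} and GC_{v,wil} are the GC contents of each species at the variable sites:

```latex
\begin{aligned}
\mathrm{GC}_{\mathrm{mel}} &= (1-f)\,\mathrm{GC}_{c} + f\,\mathrm{GC}_{v,\mathrm{mel}} \\
\mathrm{GC}_{\mathrm{wil}} &= (1-f)\,\mathrm{GC}_{c} + f\,\mathrm{GC}_{v,\mathrm{wil}} \\
\mathrm{GC}_{\mathrm{mel}} - \mathrm{GC}_{\mathrm{wil}} &= f\,\bigl(\mathrm{GC}_{v,\mathrm{mel}} - \mathrm{GC}_{v,\mathrm{wil}}\bigr),
\qquad f = \frac{3{,}157{,}787}{11{,}290{,}860} \approx 0.28
\end{aligned}
```

The conserved-site term is the same for both species and cancels, so for the whole alignment the difference measured over all sites is only about 28% of the difference at the variable sites.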

Figure 1. Differences in nucleotide content between the coding sequences of D. melanogaster and D. willistoni. Panel (A) compares the frequency of each nucleotide in the two species based on all aligned nucleotide sites (conserved and nonconserved). The results for D. melanogaster are shown in red and those for D. willistoni in blue. Panel (B) shows the same comparison as Panel (A) but limited to the nonconserved, that is, variable, sites only. These frequency differences between the two species were highly significant (P ≪ 0.00001).

We then investigated the distribution of the interspecific nucleotide changes among the three codon positions (see fig. 2). The differences are again highly significant at each of the three codon positions (first codon position D = 0.7049, second codon position D = 0.2582, third codon position D = 0.9613; P ≪ 0.00001 in all three cases). As expected, the majority of the changes occur at the largely synonymous third codon position (supplementary fig. S1, Supplementary Material online), and the greatest degree of nucleotide bias is also seen at this position (fig. 2; supplementary fig. S1, Supplementary Material online). If we focus on 4-fold degenerate codons only, the trend is highly consistent among the five codon groups (see supplementary fig. S2, Supplementary Material online): A- and T-ending codons are generally more frequent in D. willistoni, whereas G- and C-ending codons are more frequent in D. melanogaster. A less expected finding was a significant difference in GC content at the second codon position (fig. 2). Because changes at the second codon position lead to changes in the amino acid sequence, this led us to predict that the differences in GC content would be reflected in differences in the amino acid contents of the encoded proteins, especially at the nonconserved sites.
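The per-position comparison can be sketched as follows. This hypothetical helper assumes gap-free, in-frame, uppercase sequences. With `variable_only=True` it counts only sites at which the two species differ, and with `variable_only=False` it counts all aligned sites, which makes the masking effect of conserved sites easy to see on toy data.

```python
def gc_by_codon_position(seq_wil: str, seq_mel: str, variable_only: bool = True) -> dict:
    """Return {position: (GC fraction in D. willistoni, GC fraction in D. melanogaster)}.

    Positions are 1, 2, 3 within each codon; positions with no counted sites map to (None, None).
    """
    tallies = {pos: [0, 0, 0] for pos in (1, 2, 3)}  # [wil_gc, mel_gc, n_sites]
    for i, (b_wil, b_mel) in enumerate(zip(seq_wil, seq_mel)):
        if variable_only and b_wil == b_mel:
            continue  # skip conserved sites
        t = tallies[i % 3 + 1]
        t[0] += b_wil in "GC"
        t[1] += b_mel in "GC"
        t[2] += 1
    return {
        pos: (wil_gc / n, mel_gc / n) if n else (None, None)
        for pos, (wil_gc, mel_gc, n) in tallies.items()
    }


wil, mel = "ATGAAAGCTTTA", "ATGAAGGCCTTG"  # toy data, not real sequences
print(gc_by_codon_position(wil, mel)[3])                       # (0.0, 1.0)
print(gc_by_codon_position(wil, mel, variable_only=False)[3])  # (0.25, 1.0)
```

On this toy pair the third-position GC difference is 1.0 at the variable sites but only 0.75 over all sites, because the conserved G at the first codon counts equally for both sequences.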

Figure 2. GC content at each of the three codon positions. The results for D. melanogaster are shown in red and those for D. willistoni in blue. These data are based on the nucleotide frequencies at variable sites (see fig. 1). D. melanogaster has a higher GC content than D. willistoni at each of the three codon positions. The absolute numbers of GC nucleotide pairs at each position are shown in supplementary figure S1 (Supplementary Material online).

To assess the effect of nucleotide bias on amino acid substitutions, we categorized the amino acids into three groups: 1) those encoded by GC-rich codons (Glycine, Alanine, Arginine, and Proline); 2) those encoded by GC-poor codons (Phenylalanine, Tyrosine, Methionine, Isoleucine, Asparagine, and Lysine); and 3) those encoded by GC-neutral codons (Serine, Aspartate, Glutamate, Valine, Threonine, Leucine, Histidine, Cysteine, Tryptophan, and Glutamine). Figure 3 compares the numbers of amino acids in the first category (G, A, R, P) at the nonconserved sites between the two proteomes. The D. willistoni proteome has 31,959 fewer of these amino acids than the homologous sequences from D. melanogaster (see table 1). The difference is apparent not only when these four amino acids are grouped into a single category but also for each of the four amino acids separately (fig. 3). A similar but opposite trend is seen for the amino acids encoded by GC-poor codons (see supplementary fig. S3, Supplementary Material online). The asymmetries in the amino acid substitution matrix are summarized qualitatively in figure 4, which shows a pervasive tendency for the D. willistoni proteome to lose amino acids encoded by GC-rich codons and to gain amino acids encoded by GC-poor codons. Of the approximately 780,000 amino acid substitutions between the aligned proteome sequences, 272,000 are in the direction predicted by the nucleotide bias and 221,000 are in the opposite direction, a difference of more than 50,000 amino acid substitutions (see table 1). This difference is highly significant (P ≪ 0.00001).
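The three-way grouping of amino acids, and the notion of a substitution "in the direction predicted by the nucleotide bias", can be made explicit with a small Python sketch. It assumes amino-acid-pair counts keyed as (D. willistoni residue, D. melanogaster residue), the same orientation as Table 1, and that stop codons have already been excluded; the function names are hypothetical.

```python
from collections import Counter

# Amino acid classes as defined in the text (one-letter codes).
CLASS_MEMBERS = {
    "GARP": set("GARP"),              # encoded by GC-rich codons
    "CDEHLQSTVW": set("CDEHLQSTVW"),  # encoded by GC-neutral codons
    "FYMINK": set("FYMINK"),          # encoded by GC-poor codons
}
GC_RANK = {"GARP": 2, "CDEHLQSTVW": 1, "FYMINK": 0}


def aa_class(aa: str) -> str:
    """Return the GC class of a one-letter amino acid code."""
    for name, members in CLASS_MEMBERS.items():
        if aa in members:
            return name
    raise ValueError(f"unclassified residue: {aa!r}")


def class_table(aa_pairs: Counter) -> Counter:
    """Collapse (wil residue, mel residue) counts into (wil class, mel class) counts."""
    table = Counter()
    for (aa_wil, aa_mel), n in aa_pairs.items():
        table[(aa_class(aa_wil), aa_class(aa_mel))] += n
    return table


def direction(aa_wil: str, aa_mel: str) -> str:
    """Classify a D. melanogaster -> D. willistoni change relative to the GC bias."""
    delta = GC_RANK[aa_class(aa_wil)] - GC_RANK[aa_class(aa_mel)]
    if delta < 0:
        return "toward GC-poor"  # the direction predicted by the D. willistoni AT bias
    if delta > 0:
        return "toward GC-rich"  # opposite to the prediction
    return "same class"


toy = Counter({("S", "G"): 3, ("G", "S"): 1, ("K", "R"): 2})  # toy counts only
print(class_table(toy)[("CDEHLQSTVW", "GARP")])  # 3
print(direction("K", "R"))                         # toward GC-poor
```

Applied to a full 20-by-20 amino acid matrix, `class_table` yields a 3-by-3 summary of the kind shown in Table 1.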

Figure 3. Interspecific differences in amino acid content of homologous protein sequences at nonconserved sites. Panel (A): number of amino acids encoded by GC-rich codons in each of the two species, with all four such amino acids (G, A, R, P) grouped together. The number in D. melanogaster is shown by a red bar and the number in D. willistoni by a blue bar. Panel (B): the data for each of the four amino acids (Gly, Ala, Arg, and Pro) separately. The color coding is the same as in Panel (A).

Table 1

Amino Acid Substitution Matrix between D. melanogaster and D. willistoni Homologous Sequences

Rows: D. willistoni / columns: D. melanogaster	GARP	CDEHLQSTVW	FYMINK	Totals (D. willistoni)
GARP	41,338	96,162	26,376	163,876
CDEHLQSTVW	117,200	207,916	98,625	423,741
FYMINK	37,297	117,613	36,847	191,757
Totals (D. melanogaster)	195,835	421,691	161,848	779,374

NOTE.—This summary table contains the amino acids at variable sites only. The amino acids are grouped into those encoded by GC-rich codons (G, A, R, and P), those encoded by GC-neutral codons (C, D, E, H, L, Q, S, T, V, and W), and those encoded by GC-poor codons (F, Y, M, I, N, and K).
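As a consistency check, the directional totals quoted in the Results can be recovered from Table 1 itself. The short Python sketch below copies the table's counts (rows indexed by the D. willistoni class, columns by the D. melanogaster class, both ordered from GC rich to GC poor) and sums the cells on either side of the diagonal. It reproduces only the bookkeeping; it does not perform the significance test, which the text does not name.

```python
# Counts copied from Table 1; class order GC-rich -> GC-neutral -> GC-poor.
CLASSES = ["GARP", "CDEHLQSTVW", "FYMINK"]
TABLE_1 = [
    [41_338, 96_162, 26_376],    # D. willistoni GARP
    [117_200, 207_916, 98_625],  # D. willistoni CDEHLQSTVW
    [37_297, 117_613, 36_847],   # D. willistoni FYMINK
]


def directional_totals(table):
    """Split off-diagonal counts by direction of change from D. melanogaster to D. willistoni.

    Row index > column index means the D. willistoni residue belongs to a more
    GC-poor class than the D. melanogaster residue (the predicted direction).
    """
    toward_gc_poor = sum(table[r][c] for r in range(3) for c in range(3) if r > c)
    toward_gc_rich = sum(table[r][c] for r in range(3) for c in range(3) if r < c)
    return toward_gc_poor, toward_gc_rich


predicted, opposite = directional_totals(TABLE_1)
print(predicted, opposite, predicted - opposite)  # 272110 221163 50947

row_totals = [sum(row) for row in TABLE_1]        # D. willistoni totals
col_totals = [sum(col) for col in zip(*TABLE_1)]  # D. melanogaster totals
print(col_totals[0] - row_totals[0])              # 31959
```

The exact sums (272,110 and 221,163, a difference of 50,947) correspond to the rounded figures given in the text, and the GARP column and row totals differ by the 31,959 residues mentioned there.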

Figure 4. Biased patterns of amino acid substitution between D. melanogaster and D. willistoni protein sequences. We constructed an amino acid substitution matrix between the two species (see supplementary table S2, Supplementary Material online). The differences between corresponding entries above and below the diagonal were then color coded to illustrate the asymmetry of the matrix: differences of 250 or greater are shown in red, differences between 50 and 250 in orange, and differences smaller than 50 in magnitude are uncolored. Similarly, large negative values are shown in dark blue and intermediate negative values in light blue.
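The color coding used for figure 4 amounts to binning each off-diagonal difference. The Python sketch below assumes that the difference is taken as the upper-triangle entry minus the mirrored lower-triangle entry, that the negative thresholds mirror the positive ones (-250 and -50), and that a difference of exactly 50 in magnitude falls in the intermediate bin. The caption does not settle these details, and the function names are hypothetical.

```python
def asymmetry_color(diff: int) -> str:
    """Bin an upper-minus-lower difference using the thresholds given for figure 4."""
    if diff >= 250:
        return "red"
    if diff >= 50:
        return "orange"
    if diff <= -250:
        return "dark blue"
    if diff <= -50:
        return "light blue"
    return "uncolored"  # |diff| < 50


def asymmetry_map(matrix: list, labels: list) -> dict:
    """Map each upper-triangle cell (row label, column label) to its color bin."""
    colors = {}
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            colors[(labels[i], labels[j])] = asymmetry_color(matrix[i][j] - matrix[j][i])
    return colors


toy = [[0, 400, 10], [20, 0, 30], [300, 100, 0]]  # toy 3x3 counts, not real data
print(asymmetry_map(toy, ["A", "B", "C"]))
# {('A', 'B'): 'red', ('A', 'C'): 'dark blue', ('B', 'C'): 'light blue'}
```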


Discussion

Our results show that the difference in GC content between the D. willistoni and D. melanogaster genomes is reflected in a bias in the amino acid substitution pattern of their proteomes. Of course, one could ask whether this correlation between nucleotide bias and amino acid bias is due to selection for certain amino acids at the protein level rather than to a substitution bias at the DNA level. Nucleotide changes at 4-fold degenerate synonymous sites can resolve this question, because selection at the protein level would not affect these sites. At these sites, the nucleotide difference is even more marked: 68% GC in D. melanogaster and 51% GC in D. willistoni. Moreover, the nucleotide bias affects all five 4-fold synonymous groups (see supplementary fig. S2, Supplementary Material online). This is also true for the 6-fold degenerate codons, for example, Arginine codons (see supplementary fig. S4, Supplementary Material online). The same nucleotide bias is also seen in noncoding regions. For example, the average GC content of introns within the D. willistoni genome (35% GC) is lower than the average for the other eight species within the Sophophora subgenus (42% GC), and this difference is statistically significant (P < 0.0001). Thus, there is an underlying and pervasive DNA substitution bias that affects all nucleotide sites; the effect is dramatic at synonymous sites and weaker, but still highly significant, at nonsynonymous sites.

Our study focused on the evolutionary effects of nucleotide bias rather than on its molecular causes. It is generally agreed that nucleotide bias results from an interplay between AT-biased mutation and GC-biased DNA repair (Brown and Jiricny 1988). Gene conversion, which involves heteroduplex repair, has been shown to increase GC content both in Drosophila (Hickey et al. 1991) and in mammals (Galtier 2003). Over the course of evolution, the balance between mutation and repair shifts, resulting in fluctuating GC content that can be modeled as a Brownian motion process (Haywood-Farmer and Otto 2003). In the case of the D. willistoni genome, the decreased GC content could be explained by increased AT-biased mutation, decreased GC-biased repair, or both.

DNA substitution biases, if they persist for long periods of evolutionary time, can have profound effects on the overall composition of both genomes and proteomes (Lobry 1997; Foster et al. 1997; Singer and Hickey 2000; Knight et al. 2001; Wang et al. 2004). At the early stages of the process, however, the cumulative effect is not so obvious because the majority of sites have not yet undergone a substitution. Thus, a simple calculation of overall GC content and amino acid composition does not reflect the amount of bias actually occurring at the sites undergoing substitution. A more accurate estimate is obtained by calculating the nucleotide contents at the sites that have undergone substitution, as we have done in this study. It then becomes clear that the effect is very pronounced, even in the short term.

Although the D. willistoni genome has been losing GC-rich codons and gaining AT-rich codons, this has not occurred through direct substitution of GC-rich codons by AT-rich codons. Instead, it has occurred by a two-step process whereby GC-rich codons become GC-neutral and GC-neutral codons become GC-poor (i.e., AT-rich). For example, among the amino acid substitutions involving the abundant GC-neutral Serine codons (see supplementary table S2, Supplementary Material online), D. willistoni gains 37,514 Serine codons from the GC-rich codons (encoding amino acids G, A, R, and P) of D. melanogaster, whereas it loses only 31,156 Serine codons to the same class, a difference of more than 6,000 codons.
In other words, GC-neutral codons such as those encoding Serine act as an intermediate, “flow-through” step in the biased transformation of the amino acid composition of the D. willistoni proteome.

Although the nucleotide bias affects all codons equally, the countervailing selective pressure at the protein level varies with the encoded amino acid (see Urbina et al. 2006). Our data also show evidence of these differential selective constraints. For example, there are relatively few substitutions involving the highly conserved amino acids Cysteine and Tryptophan (see supplementary table S1, Supplementary Material online). On the other hand, there are many substitutions between biochemically similar amino acid pairs such as Lysine and Arginine, and these substitutions are asymmetric, consistent with the nucleotide bias. For example, the D. willistoni genome has gained approximately 500 more Lysines (encoded by AT-rich codons) from Arginine codons than it has lost. As expected, there are also many substitutions between the biochemically similar Isoleucine and Valine residues, but there is an excess of approximately 7,000 Valine-to-Isoleucine changes from the D. melanogaster sequences to the D. willistoni sequences (see supplementary table S1, Supplementary Material online). This excess is expected because Isoleucine is encoded by more AT-rich codons than Valine.

All four amino acids encoded by GC-rich codons follow the predicted trend (see fig. 3). Although the AT-rich group as a whole follows the predicted trend, this does not apply to all six of its amino acids when scored individually (see supplementary fig. S3, Supplementary Material online). For example, Methionine (M) is not enriched at the variable sites in D. willistoni, and Phenylalanine (F) appears to counter the prediction.
This counterintuitive result can, however, be explained by a more detailed look at the codon substitution table (supplementary table S1, Supplementary Material online). There is a tendency for the ATG Methionine codons to be converted into the even more AT-rich Isoleucine codon ATA. Likewise, the deficiency of Phenylalanine codons can be explained by their conversion into even more AT-rich codons. For example, there are only 565 substitutions of the TAT codon (encoding Tyrosine) by TTC (encoding Phenylalanine), but there are 2,587 substitutions in the opposite direction. In other words, the deficiency of Phenylalanine codons in D. willistoni arises not because they have mutated to more GC-rich codons (which would be against the prediction) but because TTC codons have been replaced by even more AT-rich codons such as TAT.

In summary, our results show that substitution biases can affect protein evolution and that the direction of such biases can change relatively rapidly over the course of evolution (within the genus Drosophila in this case). An important practical implication of our work is that substitution bias between related sequences may not be evident from a simple comparison of the nucleotide or amino acid composition of entire sequences. This is because the majority of sites, which are largely invariant in closely related sequences, mask the differences at the variant sites. It is necessary to look specifically at the variant sites to obtain an accurate estimate of the amount of bias.
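The gain-versus-loss comparisons in the Discussion (for Serine and for the TAT/TTC codon pair) reduce to subtracting mirrored cells of a substitution matrix. In the minimal Python sketch below, counts are keyed as (D. willistoni state, D. melanogaster state), matching the row and column orientation of Table 1. We read "substitutions of the TAT codon by TTC" as D. melanogaster TAT aligned with D. willistoni TTC, which is our interpretation, and the function name is hypothetical.

```python
def net_flow(counts: dict, into: str, out_of: str) -> int:
    """Net number of `into` states gained by D. willistoni at the expense of `out_of`.

    `counts` maps (D. willistoni state, D. melanogaster state) -> number of aligned sites.
    A positive value means D. willistoni gained more `into` states from `out_of`
    than it lost to `out_of`.
    """
    gained = counts.get((into, out_of), 0)  # D. melanogaster has out_of, D. willistoni has into
    lost = counts.get((out_of, into), 0)    # D. melanogaster has into, D. willistoni has out_of
    return gained - lost


# Counts quoted in the Discussion.
serine = {("S", "GARP"): 37_514, ("GARP", "S"): 31_156}
tyr_phe = {("TTC", "TAT"): 565, ("TAT", "TTC"): 2_587}

print(net_flow(serine, "S", "GARP"))    # 6358
print(net_flow(tyr_phe, "TAT", "TTC"))  # 2022
```

Both values are positive, meaning that in each pair D. willistoni has gained more of the AT-richer state than it has lost, as the two-step picture predicts.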

Funding

This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada [8516 to D.A.H.].

Supplementary Material

Supplementary figures S1–S4 and tables S1 and S2 are available at Genome Biology and Evolution online (http://www.oxfordjournals.org/our_journals/gbe/).

References

1. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions.

Authors: P G Foster; D A Hickey
Journal: J Mol Evol Date: 1999-03 Impact factor: 2.395

2. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages.

Authors: Ziheng Yang; Rasmus Nielsen
Journal: Mol Biol Evol Date: 2002-06 Impact factor: 16.240

3. Base compositional bias and phylogenetic analyses: a test of the "flying DNA" hypothesis.

Authors: R A Van Den Bussche; R J Baker; J P Huelsenbeck; D M Hillis
Journal: Mol Phylogenet Evol Date: 1998-12 Impact factor: 4.286

4. The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation.

Authors: Andrew J Roger; Laura A Hug
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2006-06-29 Impact factor: 6.237

5. The response of amino acid frequencies to directional mutation pressure in mitochondrial genome sequences is related to the physical properties of the amino acids and to the structure of the genetic code.

Authors: Daniel Urbina; Bin Tang; Paul G Higgs
Journal: J Mol Evol Date: 2006-02-13 Impact factor: 2.395

6. Influence of genomic G+C content on average amino-acid composition of proteins from 59 bacterial species.

Authors: J R Lobry
Journal: Gene Date: 1997-12-31 Impact factor: 3.688

7. Nucleotide composition bias affects amino acid content in proteins coded by animal mitochondria.

Authors: P G Foster; L S Jermiin; D A Hickey
Journal: J Mol Evol Date: 1997-03 Impact factor: 2.395

8. Ribosomal RNA trees misleading?

Authors: M Hasegawa; T Hashimoto
Journal: Nature Date: 1993-01-07 Impact factor: 49.962

9. Codon usage in twelve species of Drosophila.

Authors: Saverio Vicario; Etsuko N Moriyama; Jeffrey R Powell
Journal: BMC Evol Biol Date: 2007-11-15 Impact factor: 3.260

10. Evolution of genes and genomes on the Drosophila phylogeny.

Authors: Andrew G Clark; Michael B Eisen; Douglas R Smith; et al. (Drosophila 12 Genomes Consortium)
Journal: Nature Date: 2007-11-08 Impact factor: 49.962
