
New Approaches to Comparative and Animal Stress Biology Research in the Post-genomic Era: A Contextual Overview.

Abstract

Although much is known about the physiological responses of tolerant animals to many environmental stresses, studies evaluating the mechanisms that regulate the transitions to and from the stress-tolerant state are beginning to explore new and fascinating areas of molecular research. Current findings have developed a general, but refined, view of the important molecular pathways contributing to stress survival. However, studies utilizing newly developed technologies that broadly focus on genomic and proteomic screening are beginning to identify many new targets for future study. This minireview provides a contextual overview of the use of DNA/RNA sequencing, microRNA annotation and prediction software, protein structure and function prediction tools, and methods of high-throughput protein expression analysis. We also use select examples to highlight the existing use of these technologies in stress biology research. Such tools can be used in comparative stress biology to characterize animal responses to environmental challenges. Although many areas of study remain to be explored, research in comparative stress biology will continue to advance as new technologies allow further analysis of cell function and as new paradigms in gene regulation and regulatory molecules (such as microRNAs) continue to be discovered. By building upon the findings of past research while using new technologies appropriately, future studies can be carried out in new and exciting areas that remain unexplored. Proper use of rapidly developing technologies will help to create a complete understanding of the animal stress response and the survival mechanisms utilized by many diverse organisms.


Keywords: Microarray; Multiplex; Protein structure; RNA sequencing; Stress biology

Year: 2014 PMID: 25408848 PMCID: PMC4232569 DOI: 10.1016/j.csbj.2014.09.006

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Currently, the field of comparative animal biology is at the beginning of a large expansion of experimental knowledge as studies start to utilize new high-throughput technologies. To date, research has discovered much about the physiological responses of many tolerant animals to environmental stress [1-7]; however, new studies are broadly focused on genomic and proteomic screening and have been identifying many new targets for future study [8-10]. Many of these technologies should be intriguing to the comparative stress biologist, who now has technology available to assess the global expression of nearly all genes and proteins that contribute to survival in stress-tolerant animals [11]. Although many areas of study remain to be explored, research in comparative and animal stress biology will continue to advance as technologies allow new insight into cell function and as new paradigms in gene regulation and regulatory molecules (such as microRNAs) continue to be discovered. Proper knowledge and use of genomic and proteomic-based technology will help to create a complete understanding of the stress response and survival mechanisms that are utilized by many diverse organisms. Within the next few years the entire genomes of many organisms, including those that display tolerance to extreme environmental conditions, will most likely be sequenced. For example, the genome of the anoxia/freeze-tolerant Western painted turtle (Chrysemys picta bellii) was sequenced only in 2013, whereas that of the hibernating thirteen-lined ground squirrel (Ictidomys tridecemlineatus) has been available since 2008 [12]. With the current and future availability of genomic information, the prepared comparative biologist will be provided with a blueprint for protein structure, control domains and sites of post-translational modification that are either conserved or perhaps unique in those organisms.
The overall goal of global modeling of the cell is to better predict the behavior of biological systems. This type of research will have profound implications for the understanding of basic biology and for improving the future stress tolerance of human systems. As previously mentioned, researchers who can successfully utilize newly developed technologies and resources will be able to build upon previously explored areas of study. The end result will most likely be a deeper and more extensive understanding of the biological processes that underlie natural mechanisms of animal stress tolerance. It should also be noted that the ability to analyze the stress response at a global level does not depend on the availability of a genome [13]. One of the goals of this minireview is to provide a contextual overview of technologies and tools that can provide omic-level analysis without the absolute need for an annotated genome. Several technologies have emerged in recent years that allow researchers to quantitatively analyze the cellular response in a relatively short period of time and at low cost. These technologies include (1) the use of microarrays to examine the responses of mRNA [14], protein and microRNAs, (2) the use of RNA sequencing (RNAseq) to evaluate the state of transcription among all expressed genes [15], (3) multiplexed assays that can assess the expression of multiple analytes (mRNA, protein and enzyme activity) [16], and (4) the prediction of protein structure and function [17]. Below are brief overviews of each technology and its application to the field of comparative molecular biology.

Microarray analysis of gene and protein expression

Microarrays are widely available in the research marketplace and function as a solid support for thousands of different sequences that are fixed at specific locations [18]. To date, there are a variety of microarray types and formats. Essentially, microarrays can be viewed as an advancement on end-point RT-PCR or immunoblotting, as they can measure the expression of a very large number of genes (cDNA/oligo-based capture) or proteins (antibody-based capture) at the same time and within a single sample [19]. As a result, they are typically the technology of choice for experiments that require a large number of genes to be measured quickly or when the amount of sample is extremely limited. These arrays are also useful when discovery or initial characterization of a new model organism is necessary, because they allow either the generation of project “leads” (heterologous screening) or a quantitative assessment of gene/protein expression when a homologous array is used (Box 1) [20,21]. As microarrays can be used to examine the expression of hundreds (protein) or thousands (gene) of targets at once, they hold the promise of completing multiple years of target-based RT-PCR or immunoblotting expression research within days. However, it is critical to note that the results obtained from a microarray experiment need to be validated through other methods of expression analysis (i.e., immunoblotting for protein or qRT-PCR for gene expression). One must also realize that this technology is steadily changing and improving as new advances increase both array reproducibility and specificity. Nevertheless, in its current state this technology provides the comparative biologist with an excellent research tool for obtaining either complete expression data or a simple set of project leads, using either homologous or heterologous arrays, that can be used to identify possible areas of future study.
Ultimately, microarray-based studies promise to expand the knowledge of the cellular stress response, revealing patterns of coordinated gene expression and perhaps even uncovering entirely new stress-responsive cellular pathways. When combined with appropriate bioinformatic tools, microarray technology also aids in integrating target expression data with function at the cellular level, revealing hypotheses of how multiple targets may work together to produce a particular stress response to match a particular cellular need (such as metabolic adjustments, cytoskeletal reorganization, etc.) [22]. Outlined below is specific information regarding both DNA and protein microarrays.

Gene expression

Microarrays can be used to compare mRNA expression patterns across different stresses, organisms, tissues or time-points. Previous research from the Storey lab, using heterologous cDNA microarrays, has indicated that there may not be a large variety of genes involved in regulating the typical animal stress response [20]. This makes it critical to be able to detect “all-of-the-few” genes that play important roles, no matter how seemingly obscure. For a comparative biologist, an expression microarray experiment could be designed in which gene expression data are generated over multiple stress points on multiple arrays and referenced to control conditions. Unfortunately, data obtained from cDNA microarray experiments do not yield sequence information and do not provide an indication of organism-specific novel genes or organism-specific “oddities” within a gene (such as mutations or splicing events that alter protein function). It is also important to note that two main types of DNA microarrays exist in today's marketplace: (1) oligomeric microarrays, and (2) cDNA microarrays. Oligomeric microarrays are spotted with synthesized oligos between 30 and 60 bp in length. Typically, these oligos are designed to be complementary to the 3′ UTR (companies differ, so this must be checked with the supplier of interest) and are often used because of their high stringency. By contrast, microarrays spotted with cDNA contain the complete transcript sequence. Classically, heterologous cDNA microarrays allow the highest degree of hybridization for use with new/unsequenced animals, because they are “forgiving” enough that small, poorly conserved regions of sequence do not dictate overall binding to the array. Overall, it is critical to check with each company to determine what type of DNA microarray is being purchased and to what location the probes are designed.
As an example of microarray use in comparative stress biology, one study employed heterologous cDNA microarrays to determine the genes and mechanisms underlying the stress response associated with various confinement exposure lengths in gilthead sea bream (Sparus aurata) [23]. Another study used heterologous mouse cDNA microarrays to determine hibernation-responsive gene expression patterns in the brown adipose tissue of hibernating arctic ground squirrels [24]. This study identified 408 genes overexpressed during hibernation and 217 genes underexpressed during hibernation among the 11,670 annotated genes probed on the arrays. When hibernation-responsive genes were mapped to GO categories, the TCA cycle, electron transport, ATP synthesis, fatty acid metabolism, and protein biosynthesis were identified as processes of importance to the hibernation cycle. Select results from the heterologous cDNA arrays were subsequently validated by qRT-PCR.
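As a minimal sketch of how array signals referenced to control conditions can be sorted into overexpressed and underexpressed genes, the Python example below computes a log2 ratio for each gene and applies a fold-change cutoff. The gene names, signal values and the 2-fold cutoff are hypothetical and are not taken from any of the cited studies; real analyses would also normalize the arrays and apply statistical testing across replicates.

```python
import math


def classify_expression(control, stress, fold_cutoff=2.0):
    """Label each gene 'up', 'down' or 'unchanged' from its stress/control ratio.

    control and stress map gene names to normalized array signals
    (hypothetical values; normalization is assumed to be done already).
    """
    log_cutoff = math.log2(fold_cutoff)
    calls = {}
    for gene, control_signal in control.items():
        log_ratio = math.log2(stress[gene] / control_signal)
        if log_ratio >= log_cutoff:
            calls[gene] = ("up", log_ratio)
        elif log_ratio <= -log_cutoff:
            calls[gene] = ("down", log_ratio)
        else:
            calls[gene] = ("unchanged", log_ratio)
    return calls


# Hypothetical signals for three genes under control and stress conditions.
control = {"gene_a": 100.0, "gene_b": 800.0, "gene_c": 250.0}
stress = {"gene_a": 400.0, "gene_b": 200.0, "gene_c": 250.0}
calls = classify_expression(control, stress)
for gene, (label, log_ratio) in calls.items():
    print(gene, label, log_ratio)
```

Working on the log2 scale treats a 4-fold increase and a 4-fold decrease symmetrically (+2 and -2), which is why the same cutoff can be used in both directions.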

Protein expression

Antibody microarrays were conceived originally as miniaturized dot blots or immunoassays and are now rapidly becoming established as a powerful tool to assess widespread protein expression [25]. These microarrays make possible the parallel screening of thousands of unmodified or post-translationally modified proteins. In the microarray format, these experiments can be carried out with minimal use of materials, while generating large amounts of data from a single sample. When compared to the conventional use of gel electrophoresis and mass spectrometry for proteomic research, antibody microarrays are typically able to detect proteins of lower abundance [26]. As low-abundance proteins are often those of the greatest diagnostic interest (e.g. transcription factors), there is a need for highly selective and sensitive high-throughput technologies for protein detection, quantitation and differential expression analysis. For this reason, antibody-based microarrays are generating interest among comparative biologists [27]. Although antibody microarrays offer the advantage of measuring the expression of multiple proteins within individual samples, there are disadvantages that should be noted. Of particular importance is that proteins are not separated by molecular weight. Many antibodies used to detect a particular protein often cross-react with other proteins within the same protein family or with other proteins that have similar detection epitopes. Studies utilizing antibody microarrays should keep this in mind and confirm all data with selective secondary immunoblotting.

RNA sequencing-based transcriptomics

RNA sequencing (commonly referred to as RNAseq) is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. RNAseq provides more precise measurement of transcript expression levels than DNA microarrays and also provides sequence information for the identified mRNA transcripts. Initially, Sanger sequencing of cDNA or EST libraries was used, but this approach has a relatively low throughput, is expensive and is generally not quantitative [28]. The development of tag-based methods of RNA sequencing allowed multiple samples to be sequenced in parallel, largely overcoming these issues [28]. However, tag-based sequencing methods sequence only a portion of each transcript, which limits their use in the creation of a transcriptome. Recently, the development of high-throughput RNAseq has provided a means of sequencing whole RNA transcripts, allowing the assembly and quantification of transcriptomes. High-throughput RNAseq provides clear advantages over gene microarray studies, as the analysis both sequences RNA and measures the dynamic expression of mRNA transcripts. Currently, RNAseq uses deep-sequencing technologies such as 454, Illumina, SOLiD and HelicosBiotechnology (see Table 1 for comparison). In general, a population of total RNA is converted into a library of cDNA fragments with adaptors attached to one or both ends. Each adaptor-ligated transcript is then sequenced from one end (single-end sequencing) or both ends (paired-end sequencing), producing reads that are typically 30–400 bp in length [29]. The commonly used Illumina sequencing process is similar in principle, but uses a solid-phase bridge amplification method to create clusters of a specific fragment before sequencing (Fig. 1). RNAseq reads are then aligned and mapped to a reference genome for further analysis, or assembled de novo without the genomic sequence (Fig. 2).
Following the release of its genome sequence, RNAseq analysis has been used to determine mRNA expression during anoxia exposure in the Western painted turtle (C. picta bellii) [12]. To explore the transcriptomic basis of its anoxia tolerance, this study assembled an mRNA expression profile by sequencing poly A-enriched RNA isolated from the heart and brain (telencephalon) of normoxic and anoxic turtles. Significantly increased expression was detected for 19 genes in the brain and 23 genes in the heart. Highly differentially expressed genes (> 10-fold; APOLD1, FOS, JUNB, ATF3, PTGS2, BTG1/2, and EGR1) were found to encode proteins that have been implicated in the control of cellular proliferation, cancers, and tumor suppression [12]. If a complete genome sequence is not available, a de novo transcriptome assembly may be constructed and used for mRNA expression analysis. However, researchers must consider all of the statistical concerns for this type of experimental design before undertaking such a study, as more reads are typically needed for a de novo assembly (see Box 2).

Table 1

Overview comparison of next-generation sequencing techniques.

Platform	Method	Read length (bp)	Throughput
Roche 454	Pyrosequencing	400	400 Mb/run
Illumina/Solexa HiSeq	Reversible terminator chemistry	2 × 100	600 Gb/run
ABI SOLiD	Ligation	2 × 60	15 Gb/day
HelicosBiotechnology	Reversible terminator chemistry	25–55	28 Gb/run
Roche 454 — GS Junior	Pyrosequencing	400	50 Mb/run
Illumina/Solexa MiSeq	Reversible terminator chemistry	2 × 150	1.0–1.4 Gb/run
ABI Iontorrent	H + ion selective transistor	–	320 Mb/run

Fig. 1

Overview of Illumina sequencing technology. Samples are initially fragmented and adapter sequences are ligated to both ends of the fragments. Adapted fragments are then randomly bound to the inside surface of the channels of a flow cell. Solid-phase bridge amplification is carried out creating large clusters of double stranded fragments. Sequencing is carried out by adding four labeled reversible terminators, primers, and DNA polymerase. Following laser excitation, the image (fluorophore corresponding to specific-bound nucleotide) is captured and the identity of the base is recorded.

Fig. 2

Overview of RNAseq transcriptome mapping for gene expression experiments.

Unlike microarray-based approaches, RNAseq experiments are not limited to detecting transcripts that correspond to an existing genomic sequence. For example, the detection of novel freeze-responsive genes such as FR10 and Li16 in the wood frog (Rana sylvatica), initially discovered by cDNA array, would not be possible using heterologous cDNA microarrays that have been prepared with cataloged genes from another organism. This makes RNAseq particularly attractive for non-model organisms with genomic sequences that are yet to be determined (de novo assembly). A second advantage of RNAseq is that it does not have an upper limit for quantification. Consequently, it has a large dynamic range of expression levels over which transcripts can be detected: a greater than 9000-fold range (not limited by fluorescence and the “hook” effect that plagues microarray analysis) [30]. By contrast, DNA microarrays lack sensitivity for genes expressed at either very low or very high levels and therefore have a much smaller dynamic range. RNAseq is also highly accurate for quantifying expression levels, as determined using quantitative RT-PCR [28]. Taking all of these advantages into account, RNAseq is the first sequencing-based method that allows the entire transcriptome to be surveyed in a very high-throughput and quantitative manner. However, bioinformatic analysis of RNAseq data is very intensive and typically must be done by the company servicing the project at an additional cost (typically doubling the cost of the experiment). As an alternative, open-source software has recently been developed to expedite and simplify the analysis of RNAseq data and de novo assembly (Trinity; http://trinityrnaseq.sourceforge.net/).
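To make the idea of quantifying transcripts from mapped reads more concrete, the Python sketch below converts read counts into transcripts per million (TPM), a widely used length-aware normalization. TPM is not discussed in this minireview and is swapped in here purely for illustration; the transcript names, read counts and lengths are hypothetical.

```python
def transcripts_per_million(read_counts, lengths_bp):
    """Convert mapped read counts into transcripts per million (TPM).

    Each count is first divided by transcript length, so that long
    transcripts are not favored, and the resulting rates are then
    rescaled so that they sum to one million.
    """
    rates = {name: read_counts[name] / lengths_bp[name] for name in read_counts}
    total_rate = sum(rates.values())
    return {name: rate / total_rate * 1_000_000 for name, rate in rates.items()}


# Hypothetical mapped read counts and transcript lengths (bp).
read_counts = {"tx_a": 100, "tx_b": 300, "tx_c": 200}
lengths_bp = {"tx_a": 1000, "tx_b": 3000, "tx_c": 1000}

tpm = transcripts_per_million(read_counts, lengths_bp)
for name, value in tpm.items():
    print(name, round(value))
```

In this toy data set tx_b has three times as many reads as tx_a but is also three times as long, so both receive the same TPM, whereas tx_c receives twice that value.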

Discovery of microRNA sequence and identification of function

MicroRNAs are short (18–23 nt), non-coding RNAs that are known to have central roles in regulating the post-transcriptional expression of mRNA transcripts and have been shown to play an important role in the stress response [32]. A single microRNA (miRNA) is known to directly target hundreds of mRNAs [33,34]. Many human miRNAs (mature: 2578 & precursors: 1872) are catalogued in the latest release of miRBase (Release v.20), yet far fewer are annotated in non-human species and many still remain to be identified. In the past 10 years several groups have developed algorithms to identify targets for miRNA [35-37]. Most of the algorithms are based mainly on conservation of the seed region and binding energy, but in recent years many algorithms have incorporated expression profiles in their scoring function [38], which allows targets to be predicted more accurately. Before high-throughput identification of miRNA targets, many prediction tools were used, including TargetScan, miRanda, RNAhybrid, DIANA-microT, microInspector, and mirTarget2 [39-45]. These bioinformatics tools are still highly useful in validating microRNA targets in non-model organisms when gene sequences (including UTRs) are known. For example, both miR-15a and miR-16-1 are known to target cyclin D1 and regulate the cell cycle in humans [46]. Interest in the anoxic regulation of the cell cycle in tolerant turtles prompted researchers to explore the possibility that miR-15a and miR-16-1 may regulate the turtle-specific cyclin D1 mRNA [46]. With no genomic information for the turtle at the time of study, researchers used 3′ rapid amplification of cDNA ends (RACE) to sequence the 3′ UTR of turtle cyclin D1. The ability of both miR-15a and miR-16-1 to target turtle cyclin D1 mRNA was then determined through a combined analysis using TargetScan and RNAhybrid (see Fig. 3).
Unfortunately, if no gene sequence information is available for the species of interest, miRNA:target interaction analysis must rely on heterologous analysis from the most closely related species with available genomic information [3,7,47-49].
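The seed-based part of target prediction can be illustrated with a short Python sketch that searches a 3′ UTR for sites complementary to nucleotides 2–8 of a miRNA. This is only a simplified illustration of seed matching and does not reproduce the scoring used by TargetScan, RNAhybrid or any other tool named above; the miRNA and UTR sequences are made up for the example, and the choice of positions 2–8 as the seed is an assumption stated here rather than taken from the text.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(sequence):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def find_seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """Find perfect seed-match sites for a miRNA in a 3' UTR.

    The seed is taken as miRNA nucleotides seed_start..seed_end (1-based,
    inclusive). Returns the seed, the complementary site searched for in
    the UTR, and the 0-based start positions of every match.
    """
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)
    positions = []
    index = utr.find(site)
    while index != -1:
        positions.append(index)
        index = utr.find(site, index + 1)
    return seed, site, positions


# Hypothetical sequences, written 5' to 3'; not real miRNA or UTR sequences.
toy_mirna = "UACGGAUCCAGUUCGAUAGCUA"
toy_utr = "AAAGAUCCGUAAACCCGAUCCGUUU"

seed, site, positions = find_seed_sites(toy_mirna, toy_utr)
print(seed, site, positions)
```

Perfect seed matches are only a first filter; conservation of the site across species and the predicted binding energy of the full duplex, as described above, are what the dedicated tools add on top.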

Fig. 3

Binding of miR-16-1 and miR-15a to a conserved region of the turtle cyclin D1 mRNA. (A) Predicted binding structures from RNAhybrid. (B) Conservation analysis and seed-pairing identification by TargetScan. Figure modified from [46].

Identification from available genomic sequence

The prediction of novel miRNA from non-annotated genomic sequence has received considerable attention in recent years. However, the vast majority of studies have focused on the human genome. Previous studies have shown that specificity (the ability to correctly reject non-miRNA sequences) drops dramatically once human-trained methods are applied to other species [36]. Considering that the expected ratio of true miRNA sequences to pseudo-miRNA hairpins is on the order of 1:1000, cross-species prediction models with low specificity become useless for validation, as the number of false positives overwhelms the number of true positives. However, several newly developed methods and tools can be used to circumvent the issue of prediction specificity. Recognizing the problem of specificity, HeteroMirPred was created for the identification of unannotated microRNA from genome sequences of non-human species [36,50]. This program attempts to address the non-human issue by using training data pooled from multiple species. As this software has been designed to operate across all eukaryotes, it suffers from its generalist prediction approach and can commonly overlook known microRNAs. Currently, the most high-throughput approach to microRNA sequence annotation involves the use of small RNA sequencing to target predictions back to the genome, greatly reducing the number of false positives that initially enter the prediction pipeline. MiRDeep2 was developed to discover active known or novel miRNA from deep sequencing data [51]. Using small RNAseq data, MiRDeep2 maps sequencing reads back to their location in the animal's genome. The program then extracts the surrounding nucleotides from the genome sequence to perform miRNA prediction. This method has been successfully used to identify the developmental response of 212 miRNA from soft-shell turtle embryos [52].
The combination of MiRDeep and RNAseq has also been successfully used for microRNA discovery in response to hibernation in the Arctic ground squirrel (Spermophilus parryii) [53]. Remarkably, this study found 200 ground squirrel miRNAs, including 18 novel miRNAs specific to the ground squirrel.
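The effect of the 1:1000 ratio of true miRNAs to pseudo-miRNA hairpins on prediction quality can be worked through numerically. The Python sketch below computes the fraction of predicted miRNAs expected to be real for a given sensitivity and specificity; the sensitivity and specificity values used are hypothetical and do not describe any particular tool.

```python
def expected_precision(sensitivity, specificity, true_hairpins=1, pseudo_hairpins=1000):
    """Fraction of predicted miRNAs expected to be true miRNAs.

    true_hairpins and pseudo_hairpins encode the ratio of real miRNA
    hairpins to pseudo-hairpins in the genome (1:1000 by default).
    """
    true_positives = sensitivity * true_hairpins
    false_positives = (1.0 - specificity) * pseudo_hairpins
    return true_positives / (true_positives + false_positives)


# Hypothetical classifier performance values, for illustration only.
low_specificity = expected_precision(sensitivity=0.90, specificity=0.95)
high_specificity = expected_precision(sensitivity=0.90, specificity=0.999)

print(round(low_specificity, 4))   # fewer than 2 in 100 predictions are real
print(round(high_specificity, 4))  # roughly 1 in 2 predictions are real
```

Even a specificity of 95%, which sounds high, lets about 50 pseudo-hairpins through for every true miRNA found, which is why a drop in specificity in non-human species is so damaging and why restricting candidates to loci supported by small RNA sequencing reads is helpful.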

Complementary analysis of function

Hibernation research has now begun to highlight various adaptive roles for miRNAs. In particular, studies are beginning to move away from candidate-based miRNA analysis (i.e. whether a particular microRNA is able to regulate a specific target), and are beginning to address the ability of microRNAs to collectively target and regulate cellular processes [3,4]. Typically, the gene and miRNA expression data need to be correlated with the targets identified. This correlation can largely be achieved by using pathway analysis tools, such as DIANA-micropath, Cytoscape and Pathway central [54-57]. For example, one of our own studies found that in response to torpor in little brown bats (Myotis lucifugus), differentially expressed microRNAs in brain tissue converged on the common regulation of focal adhesion and axon guidance pathways [3]. Interestingly, these same processes were also independently shown to be regulated during hibernation in the brain of the greater horseshoe bat (Rhinolophus ferrumequinum) [58].

Multiplex analysis

A multiplex assay is a type of laboratory procedure that simultaneously measures multiple analytes (up to 500) in a single assay. As this technology is under constant growth and change, this minireview will only outline the principles of multiplex assays and highlight key technologies available at the time of publication. Multiplex assays are widely used in functional genomics experiments that assay the state of a type of target (e.g. microRNA, mRNA, protein) within a single biological sample. Multiplex assays work by performing multiple parallel reactions for different targets, greatly reducing the time needed to complete the analysis. Various companies are currently developing and refining multiplex technologies. Luminex xMAP technology is built on a flow cytometry platform that utilizes fluorescently tagged microspheres to detect the target analyte. Each tiny color-coded microsphere can be coated with a reagent specific to a particular bioassay, allowing the capture and detection of specific analytes from a sample. Within the analyzer, lasers excite the internal dyes that identify each microsphere particle, and also any reporter dye captured during the assay (Fig. 4). Currently, xMAP technology allows multiplexing of up to 500 unique assays within a single sample. Luminex xMAP technology was recently used to determine the differential expression profiles of microRNA transcripts in response to dehydration stress in the tissues of Xenopus laevis [16]. In contrast to Luminex-based analysis, Mesoscale Discovery utilizes small antibody arrays spotted onto the bottom of a microplate (available in 24-, 96-, and 384-well formats). In essence, the technology sits at the intersection of Luminex and antibody microarrays, with the capability to analyze up to 100 spots per well. Like Luminex arrays, this technology also requires the use of specialized equipment.
To date, several comparative studies have used Luminex technology to assess protein expression during hibernation [59,60]. For example, Luminex has recently been used to determine the activation of insulin signaling pathways before, during and after hibernation in grizzly bears [59]. The technology has also been used to determine the immunological response to white-nose syndrome, finding that bats showing visible signs of infection had significantly higher IL-4 expression when compared to bats without visible infection [60].

Fig. 4

Luminex xMAP system. Luminex is based on microsphere bead technology that relies on flow cytometry and target-capture ligand binding.

Mass spectrometry

Mass spectrometry (MS) is an important and emerging technology for the characterization and sequencing of proteins. The technology works by ionizing compounds to generate charged peptide fragments and measuring their mass-to-charge ratios to identify amino acid sequences (for review see [61]). For identification, proteins are separated electrophoretically and then enzymatically digested into smaller peptides using proteases (commonly trypsin, which cleaves on the carboxyl side of lysine and arginine residues). The collection of peptide products is then introduced to the mass analyzer. When the characteristic pattern of peptides is used for the identification of the protein, the method is called peptide mass fingerprinting. If the identification is performed using the sequence data determined in MS analysis, it is called de novo sequencing [62]. The use of MS/MS to identify unknown proteins may be of great interest to the comparative biologist. Apart from identifying the sequence of unknown proteins, mass spectrometry is also able to detect post-translational modifications such as phosphorylation, methylation and acetylation, among others [63]. Identification of modified amino acids that are both utilized by and unique to a stress-tolerant animal may be of functional significance to the stress response. One must keep in mind that this technique works best with highly abundant proteins (typically not proteins such as transcription factors) that are easily purified (such as many metabolic enzymes and structural proteins). One growing field in mass spectrometry is the identification of the phospho-proteome and its dynamic changes. Phosphorylated proteins are typically pre-fractionated and enriched prior to MS/MS, increasing the coverage of identification. Pre-fractionation is typically accomplished through the use of one of two types of ion-exchange chromatography: (1) strong anion exchange (SAX), and (2) strong cation exchange (SCX).
Following pre-fractionation, phosphorylated proteins or peptides can be enriched by a variety of methods including (but not limited to): (1) immunoprecipitation by pan-specific antibodies, (2) pull-down by phospho-binding domains, (3) immobilized metal affinity chromatography (IMAC), (4) metal-oxide affinity chromatography (MOAC), and (5) Phos-Tag chromatography. A few examples of studies exploring phospho-proteomics in non-model organisms include the phospho-proteome of chicken (Gallus gallus) embryo fibroblasts and of the mitochondria of hibernating thirteen-lined ground squirrels (I. tridecemlineatus) [64,65]. Recently, studies have begun to explore the methyl-proteome through the enrichment of methylated proteins using pan-methyl-arginine antibodies or methyl-lysine binding domains [66]. These types of enrichment methods, with wide cross-reactivity profiles, not only allow proteins to be identified in a whole-cell complex, but also retain specificity in many non-human animals, as amino acid sequence has little role in binding.
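The digestion step that precedes peptide mass fingerprinting, described earlier in this section, can be sketched in Python as an in silico tryptic digest followed by calculation of monoisotopic peptide masses. The rule that trypsin does not cut when the next residue is proline is a commonly applied convention that is assumed here rather than taken from the text, missed cleavages and modifications are ignored, and the protein sequence is a hypothetical example.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cut after K or R, except when the next residue is P (assumed rule)."""
    peptides = []
    start = 0
    for index, residue in enumerate(protein):
        next_residue = protein[index + 1] if index + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide, in Da."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


# Hypothetical protein fragment used only to demonstrate the functions.
toy_protein = "MKWVTFISLLRPGAK"
peptides = tryptic_digest(toy_protein)
for peptide in peptides:
    print(peptide, round(monoisotopic_mass(peptide), 4))
```

In peptide mass fingerprinting, a list of masses like this one, computed for every protein in a sequence database, is what the observed peptide masses are compared against. For an animal without a sequenced genome such a list cannot be generated for the species itself, which is one reason de novo sequencing by MS/MS is attractive.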

Analysis of protein structure and function

Several of the methods described in this minireview detail analysis that are used to identify new proteins or post-translational modifications. Importantly, when new or novel proteins are identified it is critical to determine, or predict, their functional role. While knockdown and overexpression analysis remains to be the ‘gold-standard’ in determining protein function, animals that do not have cell-lines or the genetic tools available, must resort to bioinformatics analysis to supplement and guide any molecular data (cellular localization, binding partners, etc.). Multiple tools exist to assist in these types of analysis. Once an amino acid sequence is obtained, the protein can initially be scanned for conserved domains with the NCBI Conserved Domain search resource (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). This analysis identifies any possible functional domains that exist in the protein (such as SH2 or ATP-binding domains), giving information of functional interactions and may guide further analysis. Too many resources currently exist for this Review to comprehensively provide an overview, however OpenPredictProtein (http://ppopen.informatik.tu-muenchen.de/) has curated a collection of valuable prediction tools that can be used on primary amino acid sequences [67]. Such tools include those for structural annotation (Solvent accessibility, transmembrane helices, protein disorder and flexibility, as well as disulfide bridges) and functional annotation (Gene ontology terms, subcellular localization and binding sites). Occasionally, obtaining protein structure is necessary to determine more specific function information regarding the protein under study. As such, protein structure can be determined through several methods. If the protein is highly homologous to the existing body of protein crystal structures, SWISS-MODEL can be used to determine 3-dimensional (3D) protein structure [68]. 
Importantly, SWISS-MODEL is currently one of the most commonly used resources and provides information on the quality of prediction. For example, a study on the structural adaptations of the aldolase enzyme, which helps to drive glycolysis in anoxic turtles (Trachemys scripta elegans), used SWISS-MODEL to generate the structures of the aldolase enzymes (ALDOA and ALDOB). These structures were then used to examine the mechanisms involved in substrate interactions compared to rabbit aldolase proteins, and the authors proposed that differences in substrate binding and heterotetramer formation contribute to the higher activity of turtle aldolase (Fig. 5) [5]. When completely novel proteins are discovered, structures cannot be determined by homology-based methods and must instead be predicted de novo. Several programs currently exist to facilitate de novo predictions, the most commonly used being I-TASSER (for proteins < 1500 amino acids) and QUARK (< 200 amino acids) [69]. To highlight the use of de novo structure prediction for completely novel proteins, a recent study used QUARK to predict the structure of two freeze-responsive proteins, FR10 and Li16, from the wood frog (Rana sylvatica) (Fig. 6) [17]. The ability to obtain structures for these novel proteins allowed researchers to model membrane interaction (PPM server; http://opm.phar.umich.edu/server.php) [70], leading to the hypothesis that FR10 is an excreted protein and that Li16 may have a functional role in membrane adaptation in response to freezing stress.
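The choice between homology modelling and de novo prediction described above can be sketched as a small decision helper. This is only an illustrative sketch: the function names and the 30% identity cut-off are assumptions introduced here and do not come from the review, SWISS-MODEL, I-TASSER or QUARK. Only the sequence-length limits are taken from the text.

```python
from typing import Optional

# Size limits quoted in the text above.
QUARK_MAX_LEN = 200      # QUARK: proteins < 200 amino acids
ITASSER_MAX_LEN = 1500   # I-TASSER: proteins < 1500 amino acids

# ASSUMPTION: illustrative cut-off for "highly homologous"; not from the review
# and not a parameter of any of the servers named here.
MIN_TEMPLATE_IDENTITY = 30.0  # percent


def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of two pre-aligned sequences."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip columns containing a gap
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def choose_structure_route(seq_length: int,
                           best_template_identity: Optional[float] = None) -> str:
    """Suggest a modelling route (hypothetical helper, for illustration only).

    best_template_identity is the percent identity to the closest protein of
    known structure, or None if no template was found.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    if (best_template_identity is not None
            and best_template_identity >= MIN_TEMPLATE_IDENTITY):
        return "homology modelling (SWISS-MODEL)"
    if seq_length < QUARK_MAX_LEN:
        return "de novo (QUARK or I-TASSER)"
    if seq_length < ITASSER_MAX_LEN:
        return "de novo (I-TASSER)"
    return "outside the size limits quoted for QUARK and I-TASSER"


if __name__ == "__main__":
    print(choose_structure_route(350, 72.5))  # a close template is available
    print(choose_structure_route(90))         # short protein, no template
```

In practice the identity value would come from a real alignment against proteins of known structure, and the quality estimates reported by the modelling server should take precedence over any fixed cut-off such as the one assumed here.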

Fig. 5

Identification and characterization of predicted turtle aldolase enzyme. (A) Overlaid tertiary structure of ALDOB from both rabbit (light) (1fdjA) and the turtle (dark) ALDOB protein predicted by SWISS-MODEL. (B) Predicted docking of fructose-1,6-bisphosphate on the active sites of both rabbit and turtle ALDOB enzyme. Ligand docking was performed with MOE Dock, employing Triangle Matcher as the placement function and London dG as the first scoring function. Figure modified from [5].

Fig. 6

De novo protein modeling and prediction of function. (A) Predicted de novo protein structures of the novel freeze-responsive proteins, FR10 and Li16, from the freeze-tolerant wood frog (Rana sylvatica). Structures were predicted by the QUARK server and optimized with MOE software. (B) Membrane interactions of FR10 and Li16 based on PPM server prediction. Figure modified from [17].

Summary and outlook

The development of tools capable of de novo assembly and prediction introduces many new possibilities for comparative biologists to take part in “omic” studies, along with the potential to discover novel proteins or genes with biologically relevant functions. Given the rich assortment of techniques and bioinformatic tools (many being open-source with GUIs) that are currently available and that have been refined for use in non-human species, proper introduction and use of these tools in future research will help to discover and characterize the animal stress responses that are utilized by many diverse organisms.

68 in total

1. A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry.

Authors: T Chen; M Y Kao; M Tepel; J Rush; G M Church
Journal: J Comput Biol Date: 2001 Impact factor: 1.479

2. Assembly of large genomes using second-generation sequencing.

Authors: Michael C Schatz; Arthur L Delcher; Steven L Salzberg
Journal: Genome Res Date: 2010-05-27 Impact factor: 9.043

3. Prediction of both conserved and nonconserved microRNA targets in animals.

Authors: Xiaowei Wang; Issam M El Naqa
Journal: Bioinformatics Date: 2007-11-29 Impact factor: 6.937

4. Genomic analysis of miRNAs in an extreme mammalian hibernator, the Arctic ground squirrel.

Authors: Yuting Liu; Wenchao Hu; Haifang Wang; Minghua Lu; Chunxuan Shao; Corinna Menzel; Zheng Yan; Ying Li; Sen Zhao; Philipp Khaitovich; Mofang Liu; Wei Chen; Brian M Barnes; Jun Yan
Journal: Physiol Genomics Date: 2010-05-04 Impact factor: 3.107

5. Regulation of p53 by reversible post-transcriptional and post-translational mechanisms in liver and skeletal muscle of an anoxia tolerant turtle, Trachemys scripta elegans.

Authors: Jing Zhang; Kyle K Biggar; Kenneth B Storey
Journal: Gene Date: 2012-11-01 Impact factor: 3.688

6. Detection of differential gene expression in brown adipose tissue of hibernating arctic ground squirrels with mouse microarrays.

Authors: Jun Yan; Adlai Burman; Calen Nichols; Linda Alila; Louise C Showe; Michael K Showe; Bert B Boyer; Brian M Barnes; Thomas G Marr
Journal: Physiol Genomics Date: 2006-02-07 Impact factor: 3.107

Review 7. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

8. NAViGaTing the micronome--using multiple microRNA prediction databases to identify signalling pathway-associated microRNAs.

Authors: Elize A Shirdel; Wing Xie; Tak W Mak; Igor Jurisica
Journal: PLoS One Date: 2011-02-25 Impact factor: 3.240

9. PredictProtein--an open resource for online prediction of protein structural and functional features.

Authors: Guy Yachdav; Edda Kloppmann; Laszlo Kajan; Maximilian Hecht; Tatyana Goldberg; Tobias Hamp; Peter Hönigschmid; Andrea Schafferhans; Manfred Roos; Michael Bernhofer; Lothar Richter; Haim Ashkenazy; Marco Punta; Avner Schlessinger; Yana Bromberg; Reinhard Schneider; Gerrit Vriend; Chris Sander; Nir Ben-Tal; Burkhard Rost
Journal: Nucleic Acids Res Date: 2014-05-05 Impact factor: 16.971

10. MicroRNA regulation in extreme environments: differential expression of microRNAs in the intertidal snail Littorina littorea during extended periods of freezing and anoxia.

Authors: Kyle K Biggar; Samantha F Kornfeld; Yulia Maistrovski; Kenneth B Storey
Journal: Genomics Proteomics Bioinformatics Date: 2012-10-08 Impact factor: 7.691

2 in total

Review 1. Seasonal and post-trauma remodeling in cone-dominant ground squirrel retina.

Authors: Dana K Merriman; Benjamin S Sajdak; Wei Li; Bryan W Jones
Journal: Exp Eye Res Date: 2016-01-22 Impact factor: 3.467

2. The Gray Mouse Lemur: A Model for Studies of Primate Metabolic Rate Depression. Preface.

Authors: Kenneth B Storey
Journal: Genomics Proteomics Bioinformatics Date: 2015-06-21 Impact factor: 7.691
