Third party data gene data set of eutherian growth hormone genes.

Abstract

Among 146 potential coding sequences, the most comprehensive eutherian growth hormone gene data set annotated 100 complete coding sequences. The eutherian comparative genomic analysis protocol first described 5 major gene clusters of eutherian growth hormone genes. The present updated gene classification and nomenclature of eutherian growth hormone genes integrated gene annotations, phylogenetic analysis and protein molecular evolution analysis into a new framework for future experiments. The curated third party data gene data set of eutherian growth hormone genes was deposited in the European Nucleotide Archive under accession numbers LM644135-LM644234.

Keywords: Comparative genomic analysis; Gene annotations; Molecular evolution; Phylogenetic analysis

Year: 2015 PMID: 26697363 PMCID: PMC4664738 DOI: 10.1016/j.gdata.2015.09.007

Source DB: PubMed Journal: Genom Data ISSN: 2213-5960

Direct link to deposited data

Deposited data could be found here: http://www.ebi.ac.uk/ena/data/view/LM644135–LM644234.

Experimental design, materials and methods

The eutherian comparative genomic analysis protocol included gene annotations, phylogenetic analysis and protein molecular evolution analysis [1].

Gene annotations

The BioEdit 7.0.5.3 program was used in nucleotide and protein sequence analyses (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). NCBI's BLAST programs were used to identify genes in eutherian genomic sequence assemblies downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/blast/ and ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/) [2], [3]. Alternatively, the Ensembl genome browser's BLAST or BLAT web tools were used in gene identification (http://www.ensembl.org). The gene feature analyses included direct evidence of eutherian gene annotations deposited in NCBI's nr, est_human, est_mouse and est_others databases (http://www.ncbi.nlm.nih.gov).

The protocol tested potential growth hormone (GH) coding sequences using tests of reliability of eutherian public genomic sequences. In the first test step, the nucleotide sequence coverage of potential coding sequences was analysed using NCBI's BLAST programs and primary sequence reads deposited in NCBI's Trace Archive (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi). In the second test step, the potential coding sequences were described as complete coding sequences only if consensus trace sequence coverage was available for every nucleotide; otherwise, they were described as putative coding sequences. Only the complete GH coding sequences were deposited in the European Nucleotide Archive (http://www.ebi.ac.uk/ena/about/tpa-policy) and used in phylogenetic and protein molecular evolution analyses. In gene descriptions, the guidelines of human and mouse gene nomenclature were followed (http://www.genenames.org/about/guidelines and http://www.informatics.jax.org/mgihome/nomen/gene.shtml). There were 100 complete eutherian GH coding sequences among 146 potential coding sequences (Fig. 1) (Supplementary data file 1). The most comprehensive third party data gene data set of eutherian GH genes annotated 15 GHA genes, 36 GHB genes, 5 GHC genes, 39 GHD genes and 5 GHE genes. The eutherian GHA genes were described as prolactin PRL orthologues, the eutherian GHB genes were described as growth hormone GH orthologues and paralogues, the domesticated guinea pig GHC genes were first described in the present work, the Ghd genes were described as prolactin paralogues in mouse and brown rat, and the GHE genes were described as prolactin paralogues in domestic cattle [4], [5], [6].

The masking of transposable elements using RepeatMasker version open-4.0.3 was included as a preparatory step in multiple pairwise genomic sequence alignments, using default settings except that simple repeats and low complexity elements were not masked (sensitive mode, cross_match version 1.080812, RepBase Update 20130422, RM database version 20130422) (http://www.repeatmasker.org/). In genomic sequence alignments, the mVISTA web tool was used with the AVID alignment program and default settings (http://genome.lbl.gov/vista/index.shtml). Using ClustalW implemented in BioEdit 7.0.5.3, the common predicted promoter genomic sequence regions were aligned at nucleotide sequence level and then manually corrected. The pairwise nucleotide sequence identities of common predicted promoter genomic sequence regions were calculated using BioEdit 7.0.5.3 and used in statistical analysis (Microsoft Office Excel). The common predicted promoter genomic sequence regions of eutherian GHA and GHB genes were described (Supplementary data file 2, Supplementary data file 3). For example, among primates, the calculated patterns of average pairwise nucleotide sequence identities of common predicted promoter genomic sequence regions exceeded the empirically determined cut-offs of detection of common genomic sequence regions. Whereas the average pairwise nucleotide sequence identity of primate GHA common predicted promoter genomic sequence regions was ā = 0.872 (amax = 0.986, amin = 0.767, āad = 0.074) (Supplementary data file 2, Supplementary data file 3), the average pairwise nucleotide sequence identity of primate GHB common predicted promoter genomic sequence regions was ā = 0.844 (amax = 0.989, amin = 0.252, āad = 0.111) (Supplementary data file 2, Supplementary data file 3).
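The second test step described above can be illustrated with a minimal Python sketch. It assumes that the trace hits against a potential coding sequence have already been reduced to 1-based inclusive (start, end) intervals, for example from BLAST output; the function name and the interval representation are hypothetical and are not part of the published protocol. The sketch checks positional coverage only and does not model agreement between the consensus trace sequence and the coding sequence.

```python
from typing import Iterable, Tuple


def classify_coding_sequence(cds_length: int,
                             trace_hits: Iterable[Tuple[int, int]]) -> str:
    """Label a potential coding sequence as 'complete' or 'putative'.

    cds_length: number of nucleotides in the potential coding sequence.
    trace_hits: 1-based inclusive (start, end) intervals on the coding
                sequence that are covered by trace reads (an assumed,
                simplified representation of the BLAST results).
    """
    if cds_length <= 0:
        raise ValueError("cds_length must be positive")

    covered = [False] * cds_length
    for start, end in trace_hits:
        # Clip each interval to the coding sequence before marking it.
        for pos in range(max(start, 1), min(end, cds_length) + 1):
            covered[pos - 1] = True

    # 'complete' requires coverage of every nucleotide; anything less
    # is reported as 'putative'.
    return "complete" if all(covered) else "putative"
```

For instance, two overlapping hits spanning positions 1-6 and 5-10 of a 10-nucleotide sequence yield "complete", whereas hits spanning 1-4 and 6-10 leave position 5 uncovered and yield "putative".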

Fig. 1

(A) Phylogenetic analysis of eutherian growth hormone genes. The minimum evolution tree was calculated using maximum composite likelihood method. After 1000 bootstrap replicates, the estimates higher than 50% were shown. (B) Distribution of common cysteine amino acid residues in eutherian growth hormone proteins. The common Cys amino acid residues 1–6 were labelled using black rectangles. The numbers indicated numbers of amino acid residues. (C) Reference human GHA1 protein primary structure. The 4 invariant amino acid sites were shown using white letters on black backgrounds and 13 forward amino acid sites were shown using white letters on grey backgrounds. The common Cys amino acid residues 1–6 were labelled below reference protein amino acid sequence, as well as 14 predicted functional amino acid residues (#) [9]. The α-helical regions of human GHA1 tertiary structure 1N9D were labelled by rectangles [9]. The tertiary structure determinant amino acid sites H75 and H88 were indicated by arrows [10]. The predicted signal peptide cleavage site was indicated by black triangle.

Phylogenetic analysis

The translated complete eutherian GH coding sequences were aligned at amino acid level using ClustalW implemented in BioEdit 7.0.5.3. The protein sequence alignments were then manually corrected, as were the nucleotide sequence alignments (Supplementary data file 4). In phylogenetic tree calculations, the MEGA 6.06 program was used (http://www.megasoftware.net), using the neighbour-joining method (default settings, except gaps/missing data treatment = pairwise deletion) (data not shown), the minimum evolution method (default settings, except gaps/missing data treatment = pairwise deletion) and the maximum parsimony method (default settings, except gaps/missing data treatment = use all sites) (data not shown). However, the maximum likelihood methods were not used in the present analysis because their homogeneity and stationarity assumptions were not satisfied (data not shown). The pairwise nucleotide sequence identities of complete eutherian GH coding sequences were calculated using BioEdit 7.0.5.3 and used in statistical analysis (Microsoft Office Excel).

The present work first described 5 eutherian GHA-GHE major gene clusters (Fig. 1). There was evidence of differential gene expansions in all eutherian GH major gene clusters except the GHA major gene cluster, which included orthologues only. For example, the present study confirmed that there were differential gene expansions of primate GHB paralogues [4], [7], mouse and brown rat GHD paralogues [4], [5] and domestic cattle GHE paralogues [4]. Of note, the present phylogenetic analysis was the first to include the completed eutherian GH gene data set. For example, the phylogenies of eutherian GHA and GHB major gene clusters, as well as the phylogenies of domesticated guinea pig GHC and domestic cattle GHE major gene clusters, were first described. The present phylogenetic analysis of primate GHB paralogues was in agreement with previous analyses [6], [8]. In addition, the overall grouping within the Ghd major gene cluster agreed with the analysis of Soares et al. [5].

The calculated average pairwise nucleotide sequence identity of the entire data set of eutherian GH homologues was ā = 0.448 (amax = 0.995, amin = 0.224, āad = 0.141). Indeed, the updated and revised eutherian GH gene classification was confirmed by the calculated patterns of pairwise nucleotide sequence identities of eutherian GH genes (Supplementary data file 5). First, whereas the eutherian GHA major gene cluster showed nucleotide sequence identities typical in comparisons between eutherian orthologues, the eutherian GHB major gene cluster showed nucleotide sequence identities typical in comparisons between eutherian orthologues and paralogues. Next, the nucleotide sequence identities of eutherian GHC and GHE major gene clusters respectively were typical in comparisons between eutherian paralogues. However, the Ghd major gene cluster showed nucleotide sequence identity patterns typical of distant eutherian paralogues. Finally, comparisons between eutherian GHA, GHC, Ghd and GHE major gene clusters showed nucleotide sequence identities of close eutherian homologues. Yet, comparisons between the eutherian GHB major gene cluster and other major gene clusters showed nucleotide sequence identities of typical eutherian homologues.
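The summary statistics reported above (ā, amax, amin, āad) can be reproduced in outline with the following Python sketch. It is an illustration under stated assumptions rather than the procedure used by BioEdit or Excel: identity is taken as the fraction of identical alignment columns, columns in which both sequences carry a gap are skipped, and āad is interpreted as the average absolute deviation from the mean. The function names and the gap convention are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict


def pairwise_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical columns between two aligned sequences.

    Columns where both sequences have a gap ('-') are ignored
    (an assumed convention; other tools may treat gaps differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared


def identity_summary(alignment: Dict[str, str]) -> Dict[str, float]:
    """Mean, maximum, minimum and average absolute deviation of all
    pairwise identities in an alignment given as {name: aligned sequence}."""
    if len(alignment) < 2:
        raise ValueError("at least two sequences are required")
    identities = [pairwise_identity(a, b)
                  for a, b in combinations(alignment.values(), 2)]
    avg = mean(identities)
    return {
        "mean": avg,
        "max": max(identities),
        "min": min(identities),
        "avg_abs_dev": mean(abs(x - avg) for x in identities),
    }
```

Applied to a toy alignment of three sequences, ACGT, ACGA and ACTA, the pairwise identities are 0.75, 0.5 and 0.75, so the sketch returns a mean of about 0.667, a maximum of 0.75, a minimum of 0.5 and an average absolute deviation of about 0.111.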

Protein molecular evolution analysis

The tests of protein molecular evolution integrated patterns of nucleotide sequence similarities with protein tertiary structures. In codon usage statistic calculations, the MEGA 6.06 program was used. The ratios between observed and expected amino acid codon counts determined the relative synonymous codon usage statistics (R). The not preferable amino acid codons with R ≤ 0.7 were TTA (0.28), TTG (0.56), CTA (0.54), ATA (0.62), GTA (0.38), TCG (0.39), CCG (0.34), ACG (0.41), GCG (0.15), TGT (0.55), CGT (0.54), CGA (0.54), AGT (0.63) and GGT (0.56). Accordingly, the reference protein sequence amino acid sites were indicated as invariant amino acid sites (invariant alignment positions), forward amino acid sites (variant alignment positions that did not include amino acid codons with R ≤ 0.7) or compensatory amino acid sites (variant alignment positions that included amino acid codons with R ≤ 0.7). The presence of preferable amino acid codons, as well as the absence of not preferable amino acid codons, indicated that forward amino acid sites could have a major influence on protein tertiary structure and function. DeepView/Swiss-PdbViewer 4.1.0 (http://spdbv.vital-it.ch/) was used in the analysis of human GHA1 tertiary structure 1N9D [9], [10]. In prediction of N-terminal signal peptides, the SignalP 4.1 web tool was used with default settings (http://www.cbs.dtu.dk/services/SignalP/).

The present study first described 5 eutherian GH major protein clusters (Fig. 1). There were 6 common cysteine amino acid residues 1–6 present in eutherian GH proteins (Fig. 1B) (Supplementary data file 4). Whereas the eutherian GHB major protein cluster included 4 common Cys amino acid residues 3–6, there were 6 common Cys amino acid residues 1–6 present in the other eutherian GH major protein clusters. Yet, in the present eutherian GH protein data set, there were substitutions at common Cys residues 1, 2 and 5 (C33, C40 and C220 in human GHA1) but not at invariant common Cys amino acid residues 3, 4 and 6 (C87, C203 and C228 in human GHA1). Whereas the N-terminal signal peptides were predicted in all eutherian GH major protein clusters (data not shown), no invariant common potential N-glycosylation sites were found in eutherian GH major protein clusters.

The present tests of protein molecular evolution included the entire eutherian GH homologue data set (Fig. 1C) (Supplementary data file 4). The human GHA1 protein primary structure was used as the reference protein amino acid sequence in the analysis of human GHA1 tertiary structure 1N9D [9] (Supplementary data file 6). First, there were 4 invariant amino acid sites among 228 reference protein amino acid residues. For example, the invariant common Cys amino acid residues 3 and 4 (C87 and C203 in human GHA1) were implicated in disulfide linkage [10]. Second, there were 13 forward amino acid sites described in the reference protein amino acid sequence. For example, the human GHA1 amino acid sites H75 and H88 were designated as major tertiary structure determinant amino acid residues [10].
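The codon usage statistic and the three site classes described above can be sketched in Python as follows. This is a simplified illustration, not the MEGA 6.06 implementation: R is computed as the observed codon count divided by the count expected if all synonymous codons of that amino acid were used equally, under the standard genetic code. The site classifier assumes that invariance is judged at amino acid level and that each alignment column is supplied as a list of gap-free codons; both are assumptions, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(coding_sequences: Iterable[str]) -> Dict[str, float]:
    """Relative synonymous codon usage (observed / expected) per codon."""
    counts: Counter = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip gaps and ambiguous codons
                counts[codon] += 1

    values: Dict[str, float] = {}
    for aa in set(AMINO_ACIDS):
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in synonymous)
        if total == 0:
            continue  # amino acid absent from the data set
        expected = total / len(synonymous)
        for codon in synonymous:
            values[codon] = counts[codon] / expected
    return values


def classify_site(codon_column: List[str],
                  rscu_values: Dict[str, float],
                  threshold: float = 0.7) -> str:
    """Classify one alignment column as invariant, forward or compensatory."""
    amino_acids = {CODON_TABLE[c] for c in codon_column}
    if len(amino_acids) == 1:
        return "invariant"
    # Variant column: compensatory if any codon has R <= threshold.
    if any(rscu_values[c] <= threshold for c in codon_column):
        return "compensatory"
    return "forward"
```

As a usage example with hypothetical R values, a column containing CTG (R = 2.4) and GTG (R = 1.8) encodes two different amino acids and no codon with R ≤ 0.7, so it is classed as a forward site; replacing CTG with TTA (R = 0.28, the value reported above) makes the column a compensatory site.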

Discussion

The eutherian GH third party data gene data set included genes implicated in major physiological processes [4], [5], [6], [7], [8], [9], [10]. For example, the human GH homologues were recorded in the World Anti-Doping Code's Prohibited List (http://list.wada-ama.org/). The present updated gene classification and nomenclature of eutherian GH genes integrated gene annotations, phylogenetic analysis and protein molecular evolution analysis into a new framework for future experiments. The following are the supplementary data related to this article.

Supplementary data file 1

Third party data gene data set of eutherian growth hormone genes.

Supplementary data file 2

Pairwise genomic sequence alignments of eutherian growth hormone genes. The translated sequence regions were displayed as indigo rectangles and untranslated sequence regions were displayed as cyan rectangles in base sequences (top). The genomic sequence regions that showed conservation levels exceeding empirical cut-offs of detection of common genomic sequence regions were shown accordingly in pairwise alignments. Using rectangles, the common predicted promoter genomic sequence regions (P) were labelled. (A) The cut-offs of detection of common genomic sequence regions in pairwise alignments with Homo sapiens GHA1 were 90% per 100 bp (Pongo abelii, Nomascus leucogenys), 85% per 100 bp (Macaca mulatta), 80% per 100 bp (Callithrix jacchus), 75% per 100 bp (Microcebus murinus, Otolemur garnettii), 65% per 100 bp (Rodentia) or 70% per 100 bp in other pairwise alignments. (B) The cut-offs of detection of common genomic sequence regions in pairwise alignments with Homo sapiens GHB1 were 80% per 100 bp (Haplorrhini), 75% per 100 bp (Otolemur garnettii), 65% per 100 bp (Rodentia) or 70% per 100 bp in other pairwise alignments. The exons in base sequences (top) were annotated using transcripts BC088370.1 (A) and BC035965.1 (B).

Supplementary data file 3

Nucleotide sequence alignments of common predicted promoter genomic sequence regions. The 5′-terminal exons were annotated as in Supplementary data file 2, and labelled by Xs below alignments. The translation start sites were marked by black triangles above alignments. According to conservation levels, the nucleotide positions were labelled by white letters on black background (100% conservation level), white letters on dark grey background (≥ 85% conservation level) or black letters on grey background (≥ 70% conservation level).

Supplementary data file 4

Protein sequence alignments of eutherian growth hormone proteins. In reference human GHA1 protein amino acid sequence (top), the 4 invariant amino acid sites were shown using white letters on violet backgrounds and 13 forward amino acid sites were shown using white letters on red backgrounds. According to conservation levels, the amino acid positions were labelled by white letters on black background (100% conservation level), white letters on dark grey background (≥ 75% conservation level) or black letters on grey background (≥ 50% conservation level). The stop codons were indicated by &s.

Supplementary data file 5

Pairwise nucleotide sequence identities of eutherian growth hormone genes.

Supplementary data file 6

Protein molecular evolution analysis of eutherian growth hormone proteins. (A) Reference human GHA1 protein primary structure. The 4 invariant amino acid sites were shown using white letters on violet backgrounds and 13 forward amino acid sites were shown using white letters on red backgrounds. The common Cys amino acid residues 1–6 were labelled below reference protein amino acid sequence, as well as 14 predicted functional amino acid residues (#) [9]. The α-helical regions of human GHA1 tertiary structure 1N9D were labelled by rectangles [9]. The tertiary structure determinant amino acid sites H75 and H88 were indicated by arrows [10]. The predicted signal peptide cleavage site was indicated by black triangle. (B) Human GHA1 tertiary structure analysis. The ribbon representation of human GHA1 tertiary structure 1N9D [9]. Whereas the invariant amino acid sites were labelled violet, forward amino acid sites were labelled red.

Specifications
Organism/cell line/tissue	35 eutherian species
Sex	N/A
Sequencer or array type	Sanger DNA sequencing method sequencers
Data format	FAS, TXT
Experimental factors	Eutherian comparative genomic analysis protocol
Experimental features	Third party data gene data set
Consent	N/A
Sample source location	N/A

10 in total

1. An intermediate grade of finished genomic sequence suitable for comparative analyses.

Authors: Robert W Blakesley; Nancy F Hansen; James C Mullikin; Pamela J Thomas; Jennifer C McDowell; Baishali Maskeri; Alice C Young; Beatrice Benjamin; Shelise Y Brooks; Bradley I Coleman; Jyoti Gupta; Shi-Ling Ho; Eric M Karlins; Quino L Maduro; Sirintorn Stantripop; Cyrus Tsurgeon; Jennifer L Vogt; Michelle A Walker; Catherine A Masiello; Xiaobin Guan; Gerard G Bouffard; Eric D Green
Journal: Genome Res Date: 2004-10-12 Impact factor: 9.043

2. A standardized nomenclature for the mouse and rat prolactin superfamilies.

Authors: Michael J Soares; S M Khorshed Alam; Mary Lynn Duckworth; Nelson D Horseman; Toshihiro Konno; Daniel I H Linzer; Lois J Maltais; Marit Nilsen-Hamilton; Kunio Shiota; Jennifer R Smith; Michael Wallis
Journal: Mamm Genome Date: 2007-05-03 Impact factor: 2.957

3. Growth hormone locus expands and diverges after the separation of New and Old World Monkeys.

Authors: Rafael González Alvarez; Agnès Revol de Mendoza; Dolores Esquivel Escobedo; Gloria Corrales Félix; Irám Rodríguez Sánchez; Víctor González; Guillermo Dávila; Qing Cao; Pieter de Jong; Yun-Xin Fu; Hugo A Barrera Saldaña
Journal: Gene Date: 2006-07-26 Impact factor: 3.688

4. Gene conversions in the growth hormone gene family of primates: stronger homogenizing effects in the Hominidae lineage.

Authors: Nicholas Petronella; Guy Drouin
Journal: Genomics Date: 2011-06-12 Impact factor: 5.736

5. An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing.

Authors: Elliott H Margulies; Jade P Vinson; Webb Miller; David B Jaffe; Kerstin Lindblad-Toh; Jean L Chang; Eric D Green; Eric S Lander; James C Mullikin; Michele Clamp
Journal: Proc Natl Acad Sci U S A Date: 2005-03-18 Impact factor: 11.205

6. The tertiary structure and backbone dynamics of human prolactin.

Authors: Camille Keeler; Priscilla S Dannies; Michael E Hodsdon
Journal: J Mol Biol Date: 2003-05-16 Impact factor: 5.469

7. Contribution of individual histidines to the global stability of human prolactin.

Authors: Camille Keeler; M Cristina Tettamanzi; Syrus Meshack; Michael E Hodsdon
Journal: Protein Sci Date: 2009-05 Impact factor: 6.725

8. Ancient origin of placental expression in the growth hormone genes of anthropoid primates.

Authors: Zack Papper; Natalie M Jameson; Roberto Romero; Amy L Weckle; Pooja Mittal; Kurt Benirschke; Joaquin Santolaya-Forgas; Monica Uddin; David Haig; Morris Goodman; Derek E Wildman
Journal: Proc Natl Acad Sci U S A Date: 2009-09-18 Impact factor: 11.205

9. Initial description of primate-specific cystine-knot Prometheus genes and differential gene expansions of D-dopachrome tautomerase genes.

Authors: Marko Premzl
Journal: Meta Gene Date: 2015-04-25

10. The prolactin and growth hormone families: pregnancy-specific hormones/cytokines at the maternal-fetal interface.

Authors: Michael J Soares
Journal: Reprod Biol Endocrinol Date: 2004-07-05 Impact factor: 5.211
