Syed Nabeel-Shah, Kanwal Ashraf, Ronald E Pearlman, Jeffrey Fillingham.
Abstract
BACKGROUND: NASP is an essential protein in mammals that functions in histone transport pathways and in maintaining a soluble reservoir of histones H3/H4. NASP has been studied exclusively in Opisthokonta lineages, where some functional diversity has been reported. In humans, growing evidence implicates NASP misregulation in the development of a variety of cancers. Although a comprehensive phylogenetic analysis is lacking, NASP-family proteins that possess four TPR motifs are thought to be widely distributed across eukaryotes.
Year: 2014 PMID: 24951090 PMCID: PMC4082323 DOI: 10.1186/1471-2148-14-139
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Phylogenetic tree of NASP proteins from different eukaryotic lineages, reconstructed from TPR 1–4 amino acid sequences. Tree topology and branch lengths correspond to the Bayesian inference. The average standard deviation of split frequencies between the two runs was 0.009. Posterior probabilities are shown in bold face and underlined, whereas bootstrap values (1000 replicates) for the ML tree are shown in light face and are reported only when ≥ 50%. Taxonomic groups are indicated in the right margin. T. cruzi and T. brucei were used as outgroups to root the tree.
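Average number of amino acid and nucleotide variations, with average synonymous (pS) and non-synonymous (pN) differences per site, among NASP lineages from various taxonomic groups
| Taxonomic group | pAA ± SE | pNT ± SE | pS ± SE | pN ± SE | Rᵃ | Zᵇ |
|---|---|---|---|---|---|---|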
| Vertebrate | 0.430±0.016 | 0.353±0.008 | 0.617±0.010 | 0.263±0.011 | 0.98 | 21.928** |
| Tunicata | 0.602±0.023 | 0.417±0.012 | 0.715±0.032 | 0.323±0.019 | 0.87 | 10.599** |
| Arthropoda | 0.561±0.012 | 0.435±0.008 | 0.611±0.012 | 0.375±0.011 | 1.06 | 11.673** |
| Nematoda | 0.451±0.020 | 0.365±0.010 | 0.676±0.020 | 0.265±0.014 | 1.05 | 16.227** |
| Fungi | 0.726±0.014 | 0.578±0.008 | 0.765±0.010 | 0.522±0.011 | 0.56 | 14.94** |
| Plants | 0.484±0.017 | 0.400±0.013 | 0.620±0.027 | 0.327±0.017 | 0.8 | 8.617** |
| Ciliates | 0.689±0.019 | 0.465±0.011 | 0.631±0.024 | 0.426±0.015 | 0.6 | 7.19** |
| Apicomplexa | 0.531±0.020 | 0.401±0.011 | 0.655±0.021 | 0.315±0.015 | 0.9 | 12.447** |
| Euglenozoa | 0.423±0.024 | 0.367±0.014 | 0.703±0.030 | 0.256±0.016 | 0.87 | 12.856** |
pAA, pNT, pS and pN represent the average numbers of amino acid, nucleotide, synonymous and non-synonymous differences per site, calculated over the entire protein or nucleotide coding sequence; Z is the Z-test of selection. SE indicates standard error based on 1000 bootstrap replicates.
ᵃ Average transition/transversion ratio.
ᵇ H1: pN < pS.
**P < 0.001.
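The footnote defines pAA and pNT as average numbers of differences per site. A minimal sketch of that quantity, assuming it is the uncorrected p-distance averaged over all sequence pairs within a group, with gapped columns skipped pairwise (both are assumptions, and the bootstrap standard errors are omitted). The function names are hypothetical; the same code works for amino acid (pAA) or nucleotide (pNT) alignments.

```python
from itertools import combinations
from typing import List


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_pairwise_p(alignment: List[str]) -> float:
    """Average p-distance over all pairs of sequences in one group."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy nucleotide alignment (hypothetical data, not NASP sequences).
print(mean_pairwise_p(["ACGT", "ACGA", "ACTA"]))  # 0.333...
```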
Average number of amino acid and nucleotide variations among different TPR domains
| Region | pAA ± SE | pNT ± SE | pS ± SE | pN ± SE | Rᵃ | Zᵇ |
|---|---|---|---|---|---|---|
| TPR1 | 0.764±0.026 | 0.599±0.015 | 0.688±0.009 | 0.568±0.022 | 0.9# | 4.602*** |
| TPR2 | 0.617±0.036 | 0.499±0.021 | 0.665±0.016 | 0.437±0.027 | 1# | 7.098*** |
| TPR3 | 0.682±0.034 | 0.530±0.022 | 0.687±0.013 | 0.476±0.031 | 0.88# | 5.646*** |
| TPR4 | 0.762±0.028 | 0.582±0.016 | 0.633±0.015 | 0.564±0.023 | 1.2# | 2.105** |
| Entire Protein | 0.729±0.012 | 0.562±0.008 | 0.725±0.006 | 0.510±0.011 | 0.64 | 16.823*** |
pAA, pNT, pS and pN represent the average numbers of amino acid, nucleotide, synonymous and non-synonymous differences per site; Z is the Z-test of selection. SE indicates standard error based on 1000 bootstrap replicates.
ᵃ Average transition/transversion ratio.
# Calculated using the Maximum Composite Likelihood method for these values.
ᵇ H1: pN < pS.
***P < 0.001; **P < 0.05.
Codon usage bias, measured as the effective number of codons (ENC), in NASP genes from different taxonomic groups
| Taxonomic group | ENC |
|---|---|
| Vertebrate | 50.69±1.41 |
| Tunicate | 49.89±2.58 |
| Arthropod | 54.64±3.43 |
| Nematode | 51.13±2.59 |
| Fungi | 50.98±5.82 |
| Plant | 53.99±2.82 |
| Ciliate | 40.73±2.37 |
| Apicomplexa | 53.18±5.97 |
| Euglenozoa | 53.78±2.63 |
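ENC ranges from 20 (one codon used per amino acid, maximal bias) to 61 (all synonymous codons used equally), so lower values in the table above indicate stronger codon usage bias. A minimal sketch of Wright's (1990) ENC for a single coding sequence, assuming the standard genetic code; it is not DnaSP's code, and details such as the handling of sparse amino acids may differ from what DnaSP v5 does. Names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Optional

# Standard genetic code; codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families, stop codons excluded.
FAMILIES: Dict[str, List[str]] = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)

# Amino acids per degeneracy class: {1: 2, 2: 9, 3: 1, 4: 5, 6: 3}
CLASS_SIZES = Counter(len(fam) for fam in FAMILIES.values())


def homozygosity(counts: List[int]) -> Optional[float]:
    """Wright's F = (n * sum(p_i^2) - 1) / (n - 1); None when n < 2."""
    n = sum(counts)
    if n < 2:
        return None
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def enc(cds: str) -> float:
    """Effective number of codons for one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usage = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))

    # F for every amino acid observed at least twice, grouped by class.
    f_by_class: Dict[int, List[float]] = {}
    for fam in FAMILIES.values():
        if len(fam) == 1:
            continue  # Met and Trp each contribute a fixed 1
        f = homozygosity([usage[c] for c in fam])
        if f is not None:
            f_by_class.setdefault(len(fam), []).append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Wright's rule when Ile (the only 3-fold amino acid) is missing.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2

    nc = float(CLASS_SIZES[1])
    for k in (2, 3, 4, 6):
        if not mean_f.get(k):  # missing or zero F cannot be inverted
            raise ValueError(f"cannot estimate F for the {k}-fold class")
        nc += CLASS_SIZES[k] / mean_f[k]
    return min(nc, 61.0)  # ENC is capped at 61 by definition
```

As a sanity check, a sequence that uses only one codon per amino acid gives 20, and one that uses every sense codon equally often gives the cap of 61.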
Figure 2. Relationship between genomic GC content and GC-rich (GAPW) and GC-poor (FYMINK) residues. The relationship between GC content and the most represented residue in each class (alanine and lysine, respectively) is also shown.
Genomic GC correlation with GC-rich and GC-poor amino acids in NASP family proteins
| Comparison | Correlation coefficient | P |
|---|---|---|
| Genomic GC vs. GAPW | -0.0896 | 0.899 |
| Genomic GC vs. FYMINK | 0.0634 | 0.968 |
| Genomic GC vs. Alanine | -0.188 | 0.749 |
| Genomic GC vs. Lysine | 0.0405 | 0.796 |
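The correlations in the table above relate genomic GC content, approximated by GC content at fourfold degenerate sites (GC4), to the share of GC-rich (GAPW) or GC-poor (FYMINK) residues, using Spearman's rank correlation. A minimal pure-Python sketch of the three quantities, assuming the standard genetic code and one coding sequence and protein per species; the function names and the example values are hypothetical, and the regression-based P values are not reproduced.

```python
from typing import List, Sequence

# Codon prefixes whose third position is fourfold degenerate in the
# standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc4(cds: str) -> float:
    """GC fraction at fourfold degenerate third codon positions."""
    cds = cds.upper().replace("U", "T")
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)
              if cds[i:i + 2] in FOURFOLD_PREFIXES]
    if not thirds:
        raise ValueError("no fourfold degenerate sites")
    return sum(base in "GC" for base in thirds) / len(thirds)


def residue_fraction(protein: str, residues: str = "GAPW") -> float:
    """Fraction of a protein made of one residue class (GAPW, FYMINK, ...)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein")
    return sum(aa in residues for aa in protein) / len(protein)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-species values, for illustration only.
gc = [0.41, 0.55, 0.62, 0.48]
gapw = [0.30, 0.27, 0.33, 0.29]
print(round(spearman(gc, gapw), 3))  # 0.2
```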
Figure 3. Phylogenetic relationships among NASP paralogs inferred by ML, MP and Bayesian methods. The tree topology corresponds to the ML estimate under the Tamura-Nei model; branch lengths do not reflect genetic distance. Duplicated genes are referred to as NASP1 and NASP2 (see text for details). Confidence values for the ML and MP trees are based on 1000 bootstrap replicates each and are shown (when ≥ 50%) in light face and in bold face (underlined), respectively. Bayesian posterior probabilities (≥ 50%) are shown in red.
Figure 4. Comparison of %GC3 and ENC values among NASP paralogs. (A) GC content at the third codon position of NASP paralogs in different lineages; the box marks the %GC3 values for fish lineages. (B) Extent of codon usage bias, measured as ENC, for the two NASP paralogs; the box marks the ENC values for fish lineages.
Figure 5. Multiple sequence alignment of the N-terminal region of NASP paralogs from different fish lineages. The NNR region is boxed, and the outgroup Halocynthia roretzi is indicated in black. Asterisks (*) indicate predicted conserved phosphorylation sites.
GC content at fourfold degenerate sites (GC4) was assumed to represent genomic GC content, since GC4 has previously been shown to be a good approximation of genomic GC content [79,80]. GC4 was also used as an approximation of the neutral expectation. Correlations were computed with Spearman's rank correlation coefficient, and statistical significance was assessed by standard regression analysis. To assess the relative influence of mutational and selective bias, nucleotide frequencies were compared between the first codon position (always non-synonymous for the residues studied here) and fourfold degenerate positions (always synonymous). Statistical significance was also assessed with Student's t test.
Positive selection acting on individual codons was estimated with an ML-based method, using the HyPhy program as implemented in MEGA5. This approach reconstructs ancestral states by ML under the Muse-Gaut model of codon substitution [111]. The GTR nucleotide substitution model, as selected by MEGA, was also used, and a user-defined tree topology was provided. Positively selected codons were detected with the test statistic dN − dS, where dS and dN are the numbers of synonymous and non-synonymous substitutions per site, respectively. Positive selection was considered statistically significant at P < 0.05.
Overall nucleotide variation within TPRs 1–4 (average synonymous and non-synonymous diversity per site) was estimated with a sliding-window approach, using a window length of 50 bp and a step size of 10 bp. Codon usage bias among NASP genes was estimated as the effective number of codons (ENC). Both analyses were carried out with DnaSP v5.
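The sliding-window scan described above (50 bp windows advanced in 10 bp steps) can be sketched as follows. DnaSP reports synonymous and non-synonymous diversity per window, which requires classifying sites as synonymous or non-synonymous (e.g. by a Nei-Gojobori-type method); this sketch omits that step and, as a plainly named substitute, reports the total mean pairwise nucleotide p-distance per window. Function names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def sliding_windows(length: int, window: int = 50,
                    step: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) column ranges, 0-based and end-exclusive."""
    for start in range(0, length - window + 1, step):
        yield start, start + window


def window_diversity(alignment: List[str], window: int = 50,
                     step: int = 10) -> List[Tuple[float, float]]:
    """Mean pairwise p-distance per window, keyed by 1-based window midpoint."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for start, end in sliding_windows(length, window, step):
        diffs = []
        for a, b in combinations(alignment, 2):
            # Skip columns where either sequence has a gap.
            cols = [(x, y) for x, y in zip(a[start:end], b[start:end])
                    if x != "-" and y != "-"]
            if cols:
                diffs.append(sum(x != y for x, y in cols) / len(cols))
        midpoint = (start + 1 + end) / 2
        value = sum(diffs) / len(diffs) if diffs else float("nan")
        out.append((midpoint, value))
    return out
```

Plotting the value of each window against its midpoint gives a diversity profile along the alignment; a window that does not fit entirely inside the alignment is not reported.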
Relative synonymous codon usage (RSCU) [112] was also estimated in MEGA from the coding sequences of the human NASP gene. An RSCU value greater than 1 indicates that a codon is used more frequently than expected under uniform synonymous usage, a value less than 1 indicates the reverse, and a value of 1 indicates no codon bias [106].
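RSCU compares each codon's observed count with the count expected if all synonymous codons for its amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for that amino acid. A minimal sketch assuming the standard genetic code; it is not MEGA's implementation, stop codons are skipped here, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# Standard genetic code; codons ordered TTT, TTC, TTA, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> Dict[str, float]:
    """RSCU for every codon of each amino acid present in an in-frame CDS.

    A value of 1.0 means the codon is used exactly as often as expected
    under uniform synonymous usage.
    """
    cds = cds.upper().replace("U", "T")
    usage = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families: Dict[str, List[str]] = {}
    for codon, aa in CODE.items():
        if aa != "*":  # stop codons skipped in this sketch
            families.setdefault(aa, []).append(codon)
    result: Dict[str, float] = {}
    for fam in families.values():
        total = sum(usage[c] for c in fam)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(fam)
        for c in fam:
            result[c] = usage[c] / expected
    return result


# Toy CDS: Ala codons GCT x3, GCC x1; Lys codons AAA x2, AAG x2.
values = rscu("GCT" * 3 + "GCC" + "AAA" * 2 + "AAG" * 2)
print(values["GCT"], values["GCA"], values["AAA"])  # 3.0 0.0 1.0
```

Within each amino acid the RSCU values sum to the number of synonymous codons, which is a quick consistency check on any RSCU table.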