Megan Yu
Abstract
Rapid advancements in automated genomic technologies have uncovered many unique features of the turtle genome, including olfactory gene expansions and duplications of toll-like receptors. However, despite the advent of large-scale sequencing, assembly, and annotation, about 40-50% of genes in eukaryotic genomes are left without functional annotation, severely limiting our knowledge of what these genes do. Additionally, these automated processes are prone to errors: draft genomes consist of several disconnected scaffolds whose order is unknown, and erroneous draft assemblies may be contaminated with foreign sequences, errors that then propagate into the annotation. Many of these automated annotations are thus incomplete and inaccurate, highlighting the need for functional annotation to link gene sequences to biological identity. In this study, we have functionally annotated two genes of the red-bellied short-neck turtle (Emydura subglobosa), a member of the relatively understudied pleurodire lineage of turtles. We improved upon initial ab initio gene predictions through homology-based evidence and generated refined consensus gene models. Through functional, localization, and structural analyses of the predicted proteins, we discovered conserved putative genes encoding mitochondrial proteins that play a role in C21-steroid hormone biosynthetic processes and fatty acid catabolism, both of which are distantly related by the tricarboxylic acid (TCA) cycle and share similar metabolic pathways. Overall, these findings further our knowledge about the genetic features underlying turtle physiology, morphology, and longevity, which have important implications for the treatment of human diseases and evolutionary studies.
Year: 2022 PMID: 35981005 PMCID: PMC9387794 DOI: 10.1371/journal.pone.0268031
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Summary table of 15 computational biology tools used to study the conservation, structure, function, localization, and phylogeny of genes within the E. subglobosa scaffold.
| Name | Type | Description | URL | Access Date | Citation | Reference |
|---|---|---|---|---|---|---|
| AmiGO | Database | Gene ontology | | April 2021 | (AmiGO, RRID:SCR_002143) | |
| Apollo | Software, algorithm | Environment for genome annotations, editing, and refinements to generate a consensus model | | April 2021 | (Apollo, RRID:SCR_001936) | |
| AUGUSTUS | Software, algorithm | Gene prediction | | April 2021 | (Augustus, RRID:SCR_008417) | |
| BLASTP | Software, algorithm | Compares protein model to database of protein homologs in biologically similar species; phylogenetic tree analysis | | April 2021 | (BLASTP, RRID:SCR_001010) | |
| COBALT | Software, algorithm | Multiple sequence alignment; compares predictive regions of the genome with other homologous sequences; phylogenetic tree analysis | | April 2021 | (Cobalt: Constraint-based Multiple Alignment Tool, RRID:SCR_004152) | |
| InterPro | Software, algorithm | Protein function based on domains and classification into families; gene ontology | | April 2021 | (InterPro, RRID:SCR_006695) | |
| NCBI Assembly Archive Viewer | Database | Provides information on assembled genomes of various organisms | | April 2021 | (NCBI Assembly Archive Viewer, RRID:SCR_012917) | |
| Phobius | Software, algorithm | Predicts whether the protein has transmembrane helices (TMHs) and signal peptides | | May 2021 | (Phobius, RRID:SCR_0156643) | |
| PSIPRED | Software, algorithm | Secondary structure prediction | | May 2021 | (PSIPRED, RRID:SCR_010246) | |
| SignalP | Software, algorithm | Signal peptide cleavage site prediction for organelle targeting | | May 2021 | (SignalP, RRID:SCR_015644) | |
| STRING | Database | Looks at conservation of interacting genes across other species; protein-protein interaction networks; functional enrichment analysis | | April 2021 | (STRING, RRID:SCR_005223) | |
| SWISS-MODEL | Software, algorithm | Secondary structure prediction; homology modeling of three-dimensional protein structure | | April 2021 | (SWISS-MODEL, RRID:SCR_018123) | |
| TargetP | Software, algorithm | Prediction of N-terminal presequences, signal peptides, and transit peptides | | May 2021 | (TargetP, RRID:SCR_019022) | |
| TMHMM Server | Software, algorithm | Predicts whether the protein has transmembrane helices (TMHs) | | April 2021 | (TMHMM Server, RRID:SCR_014935) | |
| WoLF PSORT | Software, algorithm | Prediction of protein localization sites based on primary amino acid composition | | April 2021 | (WoLF PSORT, RRID:SCR_002472) | |
Fig 1. Homology-based genome annotation of the cholesterol side-chain cleavage enzyme.
(A) Apollo gene editor view and AUGUSTUS track of the g19.t1 gene located within the ML679947.1 scaffold. (B) Graphical representation of query coverage across the top 10 BLAST hits on 10 subject sequences. Red indicates high conservation. (C) COBALT multiple sequence alignment demonstrating high conservation (red) across the homologs. Low conservation is colored gray. Exons (thick lines) and introns (thin lines) are shown. The query sequence is at the top, with the subject sequences below.
BLAST output of the g19.t1 sequence.
Top hits predicted a mitochondrial cholesterol side-chain cleavage enzyme with a high query coverage, high percent identity, and low E value.
| Description | Scientific Name | Max Score | Query Cover | E Value | Percent Identity (%) | Accession |
|---|---|---|---|---|---|---|
| cholesterol side-chain cleavage enzyme, mitochondrial | | 935 | 100% | 0 | 87 | |
| cholesterol side-chain cleavage enzyme, mitochondrial | | 931 | 100% | 0 | 86.62 | |
| cholesterol side-chain cleavage enzyme, mitochondrial isoform X1 | | 931 | 100% | 0 | 86.42 | |
| cholesterol side-chain cleavage enzyme, mitochondrial isoform X1 | | 927 | 100% | 0 | 86.23 | |
| cholesterol side-chain cleavage enzyme, mitochondrial | | 927 | 100% | 0 | 85.66 | |
| cholesterol side-chain cleavage enzyme, mitochondrial | | 917 | 100% | 0 | 85.28 | |
| cholesterol side-chain cleavage enzyme, mitochondrial | | 915 | 100% | 0 | 85.09 | |
| cholesterol side-chain cleavage enzyme, mitochondrial isoform X1 | | 905 | 100% | 0 | 83.59 | |
| cholesterol side-chain cleavage enzyme, mitochondrial | | 893 | 100% | 0 | 82.41 | |
| cholesterol side-chain cleavage enzyme, mitochondrial | | 885 | 98% | 0 | 83.24 | |
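The screening criteria named in the table caption (high query coverage, high percent identity, low E value) can be written as an explicit filter. The following is a minimal Python sketch and not part of the original analysis: the numeric values are copied from the table above, while the `Hit` type and the cutoffs `MIN_COVER`, `MIN_IDENTITY`, and `MAX_EVALUE` are illustrative assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    max_score: float
    query_cover: float   # percent of the query covered by the alignment
    e_value: float
    pct_identity: float  # percent identical residues


# Values copied from the g19.t1 BLAST table (description and accession omitted).
G19_HITS = [
    Hit(935, 100, 0.0, 87.00),
    Hit(931, 100, 0.0, 86.62),
    Hit(931, 100, 0.0, 86.42),
    Hit(927, 100, 0.0, 86.23),
    Hit(927, 100, 0.0, 85.66),
    Hit(917, 100, 0.0, 85.28),
    Hit(915, 100, 0.0, 85.09),
    Hit(905, 100, 0.0, 83.59),
    Hit(893, 100, 0.0, 82.41),
    Hit(885, 98, 0.0, 83.24),
]

# Hypothetical cutoffs chosen only for illustration; the source states none.
MIN_COVER = 95.0
MIN_IDENTITY = 80.0
MAX_EVALUE = 1e-50


def passes(hit: Hit) -> bool:
    """Return True if a hit meets all three illustrative cutoffs."""
    return (
        hit.query_cover >= MIN_COVER
        and hit.pct_identity >= MIN_IDENTITY
        and hit.e_value <= MAX_EVALUE
    )


strong_hits = [hit for hit in G19_HITS if passes(hit)]
mean_identity = sum(hit.pct_identity for hit in G19_HITS) / len(G19_HITS)

print(f"{len(strong_hits)} of {len(G19_HITS)} hits pass the cutoffs")
print(f"mean percent identity: {mean_identity:.2f}")
```

With stricter cutoffs the same filter would separate close homologs from distant ones; the thresholds should be chosen for the gene family in question.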
Fig 2. Functional analysis of the cholesterol side-chain cleavage enzyme.
(A) InterPro functional analysis of the enzyme. (B) GO terms for the enzyme reported by InterPro. (C) STRING network of predicted protein-protein interactions in H. sapiens. (D) List of functional partners predicted by STRING corresponding to C. (E) Gene co-occurrence of the protein. (F) BLAST phylogenetic tree built from pairwise alignments.
Fig 3. Subcellular localization of the cholesterol side-chain cleavage enzyme.
(A) Bar chart displaying WoLF PSORT prediction of the protein’s localization sites based on 32 nearest neighbors. Mito, mitochondria; cyto_mito, cytoplasm and mitochondria; cyto, cytoplasm; extr, extracellular. (B) TMHMM prediction of TMHs. X-axis represents the amino acid number, and y-axis represents the probability that the amino acid is located within, outside, or inside the membrane. Probabilities >0.75 are significant. (C) SignalP analysis of signal sequences existing in the amino acid sequence of the polypeptide. (D) Phobius predictions of TMHs and signal peptides. X-axis represents the amino acid number, and y-axis represents the probability that the amino acid is transmembrane, cytoplasmic, non-cytoplasmic, and/or a signal peptide. Probabilities >0.75 are significant. (E) TargetP-2.0 prediction of N-terminal pre-sequences, signal peptides, and transit peptides.
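The 0.75 cutoff used for the TMHMM and Phobius panels can be applied to a per-residue probability track to list the stretches that count as significant. The sketch below assumes the probabilities are already available as a plain list of floats; the function name `segments_above` and the toy input are hypothetical and do not reproduce either tool's output format.

```python
def segments_above(probs, threshold=0.75):
    """Return 1-based, inclusive (start, end) runs where probability > threshold."""
    segments = []
    start = None
    for position, prob in enumerate(probs, start=1):
        if prob > threshold:
            if start is None:
                start = position  # a new significant run begins here
        elif start is not None:
            segments.append((start, position - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(probs)))  # the run extends to the last residue
    return segments


# Toy per-residue probabilities (not taken from the paper).
toy_track = [0.10, 0.80, 0.90, 0.20, 0.76, 0.75]
print(segments_above(toy_track))
```

The comparison is strict, so a residue with a probability of exactly 0.75 is not counted, matching the ">0.75" wording in the caption.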
Fig 4. Homology modeling and structural predictions of the mitochondrial cholesterol side-chain cleavage enzyme.
(A) Three-dimensional homology model built by SWISS-MODEL. Blue regions are highly conserved, while orange regions are less conserved. (B) Oligo-state, ligands, global quality estimates, template, sequence identity, and coverage reported by SWISS-MODEL. (C) Local quality estimate showing per-residue estimates. Similarities >0.6 indicate a high-quality model. (D) Comparison with a non-redundant set of PDB structures, showing QMEAN scores for deposited experimental structures of similar size. The red star is our model. (E) Ramachandran plot showing the probability of a residue having a specific orientation. Dots in the dark green regions represent high probability and a high-quality model. (F) Secondary structure prediction through PSIPRED.
MolProbity results to validate the SWISS-MODEL prediction for CYP11A1.
The MolProbity score is expressed on the scale of crystallographic resolution: it indicates the resolution at which a structure of this quality would be expected, so values closer to 0 are better. The clash score reflects steric overlaps between atoms; a lower value is favored. Outliers are values that fall outside the expected range; low values are also favored. Low values for bad bonds and angles are likewise favored.
| Metric | Value |
|---|---|
| | 1.44 |
| | 2.99 |
| | 95.91% |
| | 0.86% |
| | 6 |
| | 0/3908 |
| | 46/5287 |
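Entries of the form flagged/total, such as 0/3908 and 46/5287 above, are easier to compare across models of different size once converted to a fraction. A minimal Python sketch follows; the helper name `flagged_fraction` is hypothetical, and the code makes no claim about which metric each entry belongs to.

```python
def flagged_fraction(entry: str) -> float:
    """Convert a 'flagged/total' string such as '46/5287' into a fraction."""
    flagged, total = (int(part) for part in entry.split("/"))
    if total <= 0:
        raise ValueError(f"total must be positive, got {total}")
    return flagged / total


# Entries copied from the table above.
for entry in ("0/3908", "46/5287"):
    print(f"{entry}: {100 * flagged_fraction(entry):.2f}%")
```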
Fig 5. Homology-based genome annotation of the methylmalonyl-CoA epimerase (MCEE) enzyme.
(A) Apollo gene editor view and AUGUSTUS track of the g112.t1 gene located within the ML679947.1 scaffold. Bottom: initial ab initio prediction. Top: consensus gene model. (B) Graphical representation of query coverage across the top 7 BLAST hits on 7 subject sequences before (top) and after (bottom) genome editing. Red indicates high conservation, and magenta indicates moderate conservation. (C) COBALT multiple sequence alignment before (top) and after (bottom) genome editing, demonstrating high conservation (red) across the homologs. Low conservation is colored gray. Exons (thick lines) and introns (thin lines) are shown. The query sequence is at the top, with the subject sequences below.
BLAST output of the g112.t1 sequence before genome editing.
Top hits predicted mitochondrial methylmalonyl-CoA epimerase with a high query coverage, moderate percent identity, and low E value.
| Description | Scientific Name | Max Score | Query Cover | E Value | Percent Identity (%) | Accession |
|---|---|---|---|---|---|---|
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 180 | 98% | 3.00E-55 | 58.62 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 178 | 98% | 1.00E-54 | 58.05 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 177 | 98% | 4.00E-54 | 58.05 | |
| methylmalonyl-CoA epimerase, mitochondrial | | 177 | 98% | 6.00E-54 | 58.05 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 176 | 98% | 1.00E-53 | 58.05 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 173 | 98% | 2.00E-52 | 56.32 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 168 | 98% | 1.00E-50 | 54.6 | |
BLAST output of the g112.t1 sequence after genome editing.
Top hits also predicted mitochondrial methylmalonyl-CoA epimerase with a higher query coverage, higher percent identity, and lower E value.
| Description | Scientific Name | Max Score | Query Cover | E Value | Percent Identity (%) | Accession |
|---|---|---|---|---|---|---|
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 340 | 100% | 9.00E-118 | 94.25 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 340 | 100% | 2.00E-117 | 94.25 | |
| methylmalonyl-CoA epimerase, mitochondrial | | 340 | 100% | 2.00E-117 | 94.25 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 337 | 100% | 2.00E-116 | 93.68 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 336 | 100% | 6.00E-116 | 93.68 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 336 | 100% | 6.00E-116 | 93.1 | |
| methylmalonyl-CoA epimerase, mitochondrial isoform X1 | | 329 | 100% | 2.00E-113 | 90.23 | |
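The improvement between the two g112.t1 BLAST tables (before and after genome editing) can be quantified directly from the reported values. The following Python sketch is an illustration and not the authors' procedure: the (max score, percent identity) pairs are copied from the two tables, and the `summarize` helper is a hypothetical name.

```python
from statistics import mean

# (max score, percent identity) pairs copied from the two g112.t1 tables.
BEFORE = [(180, 58.62), (178, 58.05), (177, 58.05), (177, 58.05),
          (176, 58.05), (173, 56.32), (168, 54.60)]
AFTER = [(340, 94.25), (340, 94.25), (340, 94.25), (337, 93.68),
         (336, 93.68), (336, 93.10), (329, 90.23)]


def summarize(hits):
    """Return the best max score and the mean percent identity of a hit list."""
    return {
        "top_score": max(score for score, _ in hits),
        "mean_identity": mean(identity for _, identity in hits),
    }


before = summarize(BEFORE)
after = summarize(AFTER)

score_gain = after["top_score"] - before["top_score"]
identity_gain = after["mean_identity"] - before["mean_identity"]

print(f"top max score: {before['top_score']} -> {after['top_score']} (+{score_gain})")
print(f"mean identity gain: {identity_gain:.2f} percentage points")
```

Comparing summary statistics over the same number of top hits (7 in each table) keeps the before and after figures on an equal footing.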
Fig 6. Functional analysis of the MCEE enzyme.
(A) InterPro functional analysis of the enzyme. (B) STRING network of predicted protein-protein interactions in H. sapiens. (C) List of functional partners predicted by STRING corresponding to B. (D) Gene co-occurrence of the enzyme. (E) BLAST phylogenetic tree built from pairwise alignments.
Fig 7. Subcellular localization of the MCEE enzyme.
(A) Bar chart showing WoLF PSORT prediction of the protein’s localization sites based on 32 nearest neighbors. Mito, mitochondria; pero, peroxisome; cyto, cytoplasm; extr, extracellular; cyto-nucl, cytoplasm and nucleus. (B) TMHMM prediction of TMHs. X-axis represents the amino acid number, and y-axis represents the probability that the amino acid is located within, outside, or inside the membrane. Probabilities >0.75 are significant. (C) SignalP analysis of signal sequences existing in the amino acid sequence of the polypeptide. (D) Phobius predictions of TMHs and signal peptides. X-axis represents the amino acid number, and y-axis represents the probability that the amino acid is transmembrane, cytoplasmic, non-cytoplasmic, and/or a signal peptide. Probabilities >0.75 are significant. (E) TargetP-2.0 prediction of N-terminal pre-sequences, signal peptides, and transit peptides.
Fig 8. Homology modeling and structural predictions of the MCEE enzyme.
(A) Three-dimensional homology model built by SWISS-MODEL. Blue regions are highly conserved, while orange regions are less conserved. (B) Oligo-state, ligands, global quality estimates, template, sequence identity, and coverage reported by SWISS-MODEL. (C) Local quality estimate showing per-residue estimates. Similarities >0.6 indicate a high-quality model. (D) Comparison with a non-redundant set of PDB structures, showing QMEAN scores for deposited experimental structures of similar size. The red star is our model. (E) Ramachandran plot showing the probability of a residue having a specific orientation. Dots in the dark green regions represent high probability and a high-quality model. (F) Secondary structure prediction through PSIPRED.
MolProbity results to validate the SWISS-MODEL prediction for MCEE.
The MolProbity score is expressed on the scale of crystallographic resolution: it indicates the resolution at which a structure of this quality would be expected, so values closer to 0 are better. The clash score reflects steric overlaps between atoms; a lower value is favored. Outliers are values that fall outside the expected range; low values are also favored. Low values for bad bonds and angles are likewise favored.
| Metric | Value |
|---|---|
| | 1.4 |
| | 5.07 |
| | 97.31% |
| | 0.77% |
| | 1 |
| | 0/2070 |
| | 13/2794 |