Bioinformatic analysis and functional predictions of selected regeneration-associated transcripts expressed by zebrafish microglia.

Ousseini Issaka Salia, Diana M Mitchell.

Abstract

BACKGROUND: Unlike mammals, zebrafish have a remarkable capacity to regenerate a variety of tissues, including central nervous system tissue. The function of macrophages in tissue regeneration is of great interest, as macrophages respond and participate in the landscape of events that occur following tissue injury in all vertebrate species examined. Understanding macrophage populations in regenerating tissue (such as in zebrafish) may inform strategies that aim to regenerate tissue in humans. We recently published an RNA-seq experiment that identified genes enriched in microglia/macrophages in regenerating zebrafish retinas. Interestingly, a small number of transcripts differentially expressed by retinal microglia/macrophages during retinal regeneration did not have predicted orthologs in human or mouse. We reasoned that at least some of these genes could be functionally important for tissue regeneration, but most of these genes have not been studied experimentally and their functions are largely unknown. To reveal their possible functions, we performed a variety of bioinformatic analyses aimed at identifying the presence of functional protein domains as well as orthologous relationships to other species.
RESULTS: Our analyses identified putative functional domains in predicted proteins for a number of selected genes. For example, we confidently predict kinase function for one gene, cytokine/chemokine function for another, and carbohydrate enzymatic function for a third. Predicted orthologs were identified for some, but not all, genes in species with described regenerative capacity, and functional domains were consistent with identified orthologs. Comparison to other published gene expression datasets suggests that at least some of these genes could be important in regenerative responses in zebrafish and not necessarily in response to microbial infection.
CONCLUSIONS: This work reveals previously undescribed putative function of several genes implicated in regulating tissue regeneration. This will inform future work to experimentally determine the function of these genes in vivo, and how these genes may be involved in microglia/macrophage roles in tissue regeneration.

Keywords:  Bioinformatic analysis; Functional predictions; Microglia; RNAseq; Regeneration; Retina; Transcripts; Zebrafish

Year:  2020        PMID: 33287696      PMCID: PMC7720500          DOI: 10.1186/s12864-020-07273-8

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Tissue regeneration allows restoration of the function of damaged tissues and organs. Mammals can regenerate a limited number of tissues and organs, such as skin [1, 2], skeletal muscle [3, 4] and liver [5, 6]. Unfortunately, neurons lost to injury or disease of the central nervous system (CNS) are not regenerated in mammals [7–12]. In contrast, zebrafish (Danio rerio) can regenerate numerous tissues, including tissue in the CNS [10, 12–19]. For example, zebrafish can regenerate damaged retinal neurons, which restores visual function [20]. In all species examined, macrophage populations appear to be crucial to tissue regeneration [21–30], though in the mammalian CNS they appear to instead engage in pathological functions [31–35]. In vertebrates, the retina lies at the back of the eye and is a stereotypically organized part of the CNS, composed of neural and glial cell types that are laminated into three distinct nuclear layers. Evidence strongly indicates that Müller glia are the source of regenerated retinal neurons in zebrafish [12, 36–42]. In both zebrafish and mammals, resident microglia respond to retinal injury and degeneration. This may lead to immune-Müller glia crosstalk that shapes the Müller glia reaction to retinal injury [43–45]. The zebrafish is a relatively new, and powerful, vertebrate model in microglial biology [10, 30, 46–51]. In particular, the function of microglia and macrophages in the regeneration of CNS tissue, such as the zebrafish retina, is just beginning to be explored. Our recent work has used the zebrafish to understand microglia and macrophage responses to acute, widespread retinal lesion [30, 51]. In particular, our transcriptome analysis [30] has provided a rich dataset to facilitate an understanding of gene expression in microglia/macrophages in a context of successful CNS regeneration.
In order to translate our transcriptome findings in zebrafish [30] to mammals, we examined predicted orthology of differentially expressed genes (DEGs) enriched in zebrafish microglia/macrophages during retinal regeneration. We found that nearly all of the genes examined had predicted orthologs in mouse and human; however, several did not. Further, the putative functions of these genes are largely unknown. As these “non-orthologous” genes comprise a portion of the microglia/macrophage regeneration-associated transcriptome [30], a better understanding of their predicted gene products will facilitate a greater understanding of the similarities and differences in fish and mammalian responses to retinal injury. We reason that these genes could be functionally important in determining the outcome of tissue regeneration in zebrafish, and so functional predictions for these genes are necessary to inform future experimental work. This knowledge will also help us better understand evolutionary relationships between mammalian and teleost immunity. For twelve selected genes without clear human or mouse orthologs, we performed a variety of bioinformatic analyses aimed at identifying functional protein domains. These analyses included identification of protein domains and Gene Ontology (GO) analysis, sequence similarity comparisons, and predicted protein structure. In addition, we used synteny analysis, which failed to find evidence of orthologous genes in the human and mouse genomes. However, sequence similarity comparisons to find similar genes in other vertebrate species with well-described regenerative capacity (axolotl, Xenopus, salamander) indicated possible orthologs for several of the genes of interest.
We also examined several other published gene expression datasets to determine if these genes showed informative expression patterns in other contexts of tissue regeneration, or if these genes might also be differentially expressed in macrophages responding to microbial infection. The work presented here is informative for several zebrafish genes of previously unknown function, providing a foundation for future experimental work to test gene function in vivo. In addition, only one of these twelve genes was previously described to be differentially expressed in macrophages responding to microbial infection, suggesting that these genes indeed have importance to tissue regeneration and not only macrophage responses in general. These results have provided further insight into the transcriptome of zebrafish macrophages in the context of tissue regeneration.

Results

Selection of genes expressed in zebrafish microglia/macrophages for further bioinformatics analyses

We previously described a set of 970 genes enriched in mpeg1+ cells (representing microglia and macrophage populations) compared to other retinal cell types in regenerating zebrafish retinas [30]. Of these genes, 409 comprised a list that we considered to be “regeneration-associated” transcripts, because they were enriched in microglia/macrophages isolated from regenerating retinal tissue but were not found to be enriched in resting/steady-state zebrafish brain microglia in another published study [30, 52]. Each gene in this list of 409 “regeneration-associated” transcripts was examined for predicted orthology in mouse and human using the DRSC integrative ortholog prediction tool. Most genes returned predicted orthologs in mouse and/or human (Supplemental File 1). However, twelve (12) of these genes did not show predicted orthology to human or mouse genes with this analysis and were therefore selected for further bioinformatic analysis (Table 1, denoted P1-P12 throughout the manuscript). We reasoned that these twelve transcripts could be part of a transcriptional program executed in microglia/macrophages during CNS regeneration, and therefore could be important in understanding similarities and differences in mammalian vs. zebrafish outcomes following tissue damage.
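The selection step described above can be sketched as a simple filter. This is a minimal illustration only: the dictionary layout, the gene names, and the function `genes_without_orthologs` are hypothetical and do not reproduce the ortholog prediction tool's actual output format or interface.

```python
from typing import Dict, Iterable, List

def genes_without_orthologs(
    predictions: Dict[str, List[str]],
    targets: Iterable[str] = ("human", "mouse"),
) -> List[str]:
    """Return genes that have no predicted ortholog in any target species."""
    target_set = set(targets)
    return [
        gene
        for gene, species in predictions.items()
        if not target_set.intersection(species)
    ]

# Toy input invented for illustration; not data from the study.
toy_predictions = {
    "geneA": ["human", "mouse"],
    "geneB": ["mouse"],
    "geneC": [],        # no predicted orthologs at all
    "geneD": ["fly"],   # ortholog predicted, but not in human or mouse
}

selected = genes_without_orthologs(toy_predictions)
print(selected)
```

In this toy case only `geneC` and `geneD` are retained, mirroring how the twelve genes were carried forward because no human or mouse ortholog was predicted.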
Table 1

Transcripts enriched in zebrafish microglia/macrophages during retinal regeneration, without readily predicted human or mouse orthologs

Gene ID^a | Mod Log2FC^b | Zebrafish Symbol^c | ZFIN ID | Ensembl ID | Chromosome | Gene length | Protein length
P1 | 6.03 | si:dkey-181f22.4 | ZDB-GENE-160728-126 | ENSDARG00000105643 | 7 | 9695 bp | 513 aa
P2 | 5.17 | si:ch73-112l6.1 | ZDB-GENE-091204-14 | ENSDARG00000093126 | 21 (or 2) | 17,924 bp (or 117,924 bp) | 1025 aa
P3 | 2.92 | zgc:174863 | ZDB-GENE-080204-87 | ENSDARG00000099476 | 6 | 7668 bp | 290 aa
P4 | 2.14 | si:dkey-56m19.5 | ZDB-GENE-030131-226 | ENSDARG00000068432 | 7 | 4453 bp | 526 aa
P5 | 7.91 | si:ch211-105j21.9 | ZDB-GENE-131127-499 | ENSDARG00000097845 | 6 | 2369 bp | 294 aa
P6 | 4.47 | si:ch73-248e21.7 | ZDB-GENE-120215-231 | ENSDARG00000096331 | 3 | 3403 bp | 480 aa
P7 | 3.56 | si:ch211-191j22.3 | ZDB-GENE-030131-4242 | ENSDARG00000095459 | 21 | 2682 bp | 99 aa
P8 | 7.87 | si:ch73-256j6.2 | ZDB-GENE-070705-223 | ENSDARG00000071653 | 22 | 7566 bp | 210 aa
P9 | 7.74 | urp1 | ZDB-GENE-100922-138 | ENSDARG00000093493 | 14 | 2696 bp | 154 aa
P10 | 5.32 | xcl32a.1 | ZDB-GENE-070912-31 | ENSDARG00000093906 | 2 | 1199 bp | 126 aa
P11 | 6.06 | si:ch211-287n14.3 | ZDB-GENE-131120-146 | ENSDARG00000093650 | 18 | 165,070 bp | 1809 aa
P12 | 2.03 | pho | ZDB-GENE-030131-5935 | ENSDARG00000035133 | 5 | 16,478 bp | 2798 aa

aGene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis

bMod Log2FC = Moderated Log2(Fold-Change), which is the log-ratio of the transcript’s expression values between microglia/macrophages and other retinal cells, corrected for lowly expressed transcripts, as determined in [30]

cZebrafish Symbol corresponds to the symbol attributed to each gene by the ZFIN Zebrafish Nomenclature Conventions (https://wiki.zfin.org, [53]), and Ensembl ID to the identifier attributed by Ensembl (https://www.ensembl.org/, [54]). The prefix “zgc:” indicates that this gene is represented by cDNAs generated by the ZGC project (https://wiki.zfin.org). The prefix “si:” stands for Sanger Institute and indicates that this institution identified the gene. aa: amino acid

†Previously reported as “NA” in [30] with the same Ensembl ID; has been updated here to current zebrafish symbol and ZFIN ID
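To make the Mod Log2FC column easier to interpret, the values can be converted back to linear fold-change. The sketch below uses three values from Table 1 and the identity fold-change = 2^(log2FC). Because the reported values are moderated (corrected for lowly expressed transcripts), the converted numbers should be read as approximate enrichment, not as raw expression ratios; the helper name `fold_change` is an illustrative choice.

```python
# Moderated log2 fold-change values for three genes, taken from Table 1.
mod_log2fc = {"P1": 6.03, "P5": 7.91, "P12": 2.03}

def fold_change(log2fc: float) -> float:
    """Convert a log2 fold-change to a linear fold-change."""
    return 2.0 ** log2fc

fold = {gene: fold_change(value) for gene, value in mod_log2fc.items()}

for gene, value in fold.items():
    print(f"{gene}: about {value:.0f}-fold enriched in microglia/macrophages")
```

On this scale, P1 corresponds to roughly 65-fold enrichment and P12 to roughly 4-fold.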

Summary of results from bioinformatic analyses

A number of bioinformatic analyses were performed for the twelve genes of interest shown in Table 1 (methods summarized in Materials and Methods); the analyses are summarized in Fig. 1. The species included in the results from these analyses are shown in Supplemental Figure 1. Protein domains and GO terms were found for nine genes and largely included terms involved in the immune system (Table 2). Orthologs found by sequence similarity arise from several species, mainly vertebrates (Supplemental Figure 1, Table 3); several are associated with the immune system or soluble signaling (Table 3), and the best-matched proteins are most frequently from species of fish, with occasional hits in mouse or human (Table 4). Overall, the results of the sequence similarity and best-matched ortholog approach are consistent with the results of the protein domain and gene ontology (GO) term approach (Tables 2, 3, 4). The three-dimensional structure of a protein, or lack thereof, is known to determine protein function [56]. Of the genes studied here, two (P4 and P12 (pho)) are predicted to have greater than 50% disordered amino acids, and thus are likely to code for unstructured proteins (Supplemental Figure 2). We predicted three-dimensional (3D) structure using homology modeling (Table 5, Figs. 2, 3, 4, 5 and 6). The results are consistent with sequence similarity and protein domain/GO results for several genes of interest. In addition, structural similarity was informative for genes that did not return results with previous analyses (e.g. P2, P7, and P12). Synteny analysis against the human and mouse genomes returned results for only one gene (P4, with a hit in the human genome, Supplemental Figure 3), though based on sequence comparison this gene did not align with the candidate gene in the identified human chromosomal region.
Comparison to other vertebrate species with described capacity for tissue regeneration (Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis and Cynops pyrrhogaster) returned putative orthologs of several of these genes (Table 6 and Supplemental Table 1) indicating that they may have conserved function across these species. More detailed descriptions of findings regarding P1-P12 are provided next.
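The "greater than 50% disordered amino acids" criterion mentioned above can be illustrated with a short sketch. It assumes that a disorder predictor has already produced one score per residue and that a residue counts as disordered when its score exceeds 0.5; the scores, the per-residue threshold, and both function names are assumptions made for illustration and are not taken from the study.

```python
from typing import Sequence

def fraction_disordered(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of residues whose disorder score exceeds the threshold."""
    if not scores:
        raise ValueError("at least one per-residue score is required")
    return sum(score > threshold for score in scores) / len(scores)

def is_mostly_disordered(scores: Sequence[float], min_fraction: float = 0.5) -> bool:
    """True when more than min_fraction of residues are predicted disordered."""
    return fraction_disordered(scores) > min_fraction

# Toy per-residue scores, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6]
print(fraction_disordered(toy_scores), is_mostly_disordered(toy_scores))
```

Here four of the five toy residues exceed the threshold, so the toy protein would be classed as mostly disordered.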
Fig. 1

Overview of Bioinformatic Analysis for Functional Predictions. The diagram shows an overview of the bioinformatic analyses performed in order to make functional predictions about the genes of interest based on (a) the predicted amino acid sequence, (b) predicted protein structure, and (c) genomic comparisons with selected species. The bioinformatic tool used for each type of analysis is indicated. Multiple approaches were used in order to obtain informative results for each gene of interest and to increase confidence in the overall predictions

Table 2

Protein domain and gene ontology (GO) term

Gene ID^a | Protein domains | Biological process | Molecular function
P1 | Protein kinase and CARD^b domain | Protein phosphorylation, Regulation of apoptotic process, Oligodendrocyte development | Protein kinase, ATP binding
P2 | none | none | none
P3 | Immunoglobulin-like | Cell adhesion, Viral entry into host cell | none
P4 | Ribonuclease E/G | none | none
P5 | MGC-24^c and Mucin15 | none | none
P6 | none | none | none
P7 | none | none | none
P8 | Immunoglobulin-like | none | none
P9 | Urotensin II | Regulation of blood pressure, Regulation of blood vessel diameter | Hormone
P10 | Chemokine interleukin-8-like | Immune response | Chemokine
P11 | P-type trefoil, Galactose mutarotase, Glycoside hydrolase | Carbohydrate metabolic process | Hydrolyzing O-glycosyl compounds, Carbohydrate binding, N-6 Adenine-specific DNA methylases
P12 | Coiled coil | Neuromast regeneration | none

The protein domains and gene ontology (GO) terms found to be associated with the 12 predicted zebrafish proteins of interest

aGene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis

bCARD caspase activation and recruitment domain

cMGC-24 Multi-glycosylated core protein 24

Table 3

Orthologs and their species of origin identified by amino acid sequence similarity using EggNOG

Gene ID^a | Ortholog ID | Function | E-value^b | Species
P1 | ENSLACG00000022667 | Protein tyrosine kinase | 1.23e-200 | Latimeria chalumnae
 | MOS | v-mos Moloney murine sarcoma viral oncogene homolog | 1.86e-27 | Xenopus (Silurana) tropicalis
 | BLK | B lymphoid tyrosine kinase | 1.03e-11 | Takifugu rubripes
 | Mst1r | Macrophage stimulating 1 receptor | 2.07e-7 | Mus musculus
 | CSF1R | Colony stimulating factor 1 receptor | 5.21e-4 | Xenopus (Silurana) tropicalis
P2 | JGI99580 | Unknown | 6.68e-259 | Branchiostoma floridae
P3 | ENSGMOG00000016627 | Unknown | 1.5e-127 | Gadus morhua
 | ENSLACG00000005016 | Immunoglobulin V-set domain | 3.08e-10 | Latimeria chalumnae
 | PDGFRB | Growth factor receptor | 6.45e-7 | Xenopus (Silurana) tropicalis
 | NPHS1 | Nephrosis 1, congenital, Finnish type (nephrin) | 1.97e-5 | Xenopus (Silurana) tropicalis
 | LOC414035 | Lachesin | 9.06e-5 | Apis mellifera
P4 | BASP1 | Unknown | 1.63e-5 | Oryzias latipes
P5 | ENSXMAG00000002763 | Unknown | 7.04e-17 | Xiphophorus maculatus
 | JGI72098 | SH3 | 2.17e-4 | Phanerochaete chrysosporium
 | PTPRA | Protein tyrosine phosphatase, receptor type, A | 8.33e-4 | Xenopus (Silurana) tropicalis
P6 | ARC2 | CD46 molecule, complement regulatory protein | 8.30e-4 | Xenopus (Silurana) tropicalis
P7 | ENSXMAG00000014998 | Unknown | 9.61e-44 | Xiphophorus maculatus
P8 | ENSLACG00000014033 | CD84 molecule | 1.05e-112 | Latimeria chalumnae
 | ENSXMAG00000015872 | Lymphocyte antigen 9 | 2.03e-77 | Xiphophorus maculatus
 | ENSGALG00000007355 | Immunoglobulin V-set domain | 1.22e-09 | Latimeria chalumnae
 | CEACAM6 | Carcinoembryonic antigen-related cell adhesion molecule | 1.41e-09 | Takifugu rubripes
 | HMCN1 | Hemicentin | 3.28e-06 | Xenopus (Silurana) tropicalis
P9 | ENSXMAG00000013611 | Urotensin II | 2.24e-70 | Xiphophorus maculatus
P10 | ENSG00000143185 | Chemokine (C motif) ligand | 3.22e-14 | Gorilla gorilla
 | ENSXMAG00000019244 | Small cytokines (intecrine/chemokine), interleukin-8 like | 3.86e-6 | Xiphophorus maculatus
P11 | GANAB | Glucosidase, alpha | 1.38e-307 | Xenopus (Silurana) tropicalis
P12 | No orthologs found

Orthologs found for the studied genes using the protein sequence similarity approach EggNOG 4.5.1 [55]

aGene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis

bThe Expect value (E-value) or random background noise is the number of hits one can “expect” to see by chance when searching a database of a particular size (https://blast.ncbi.nlm.nih.gov). The lower the E-value, or the closer it is to zero, the more “significant” the match is
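As a small illustration of how E-values order hits, the sketch below ranks the P1 hits listed in Table 3 and keeps those at or below a cutoff. The cutoff of 1e-5 and the function `filter_hits` are assumptions chosen for the example; the table itself does not state a threshold.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (ortholog ID, E-value)

def filter_hits(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the cutoff, most significant (smallest E-value) first."""
    kept = [hit for hit in hits if hit[1] <= max_evalue]
    return sorted(kept, key=lambda hit: hit[1])

# P1 hits and E-values as listed in Table 3, deliberately shuffled.
p1_hits = [
    ("CSF1R", 5.21e-4),
    ("Mst1r", 2.07e-7),
    ("BLK", 1.03e-11),
    ("MOS", 1.86e-27),
    ("ENSLACG00000022667", 1.23e-200),
]

p1_kept = filter_hits(p1_hits)
for ortholog_id, evalue in p1_kept:
    print(f"{ortholog_id}\t{evalue:.2e}")
```

With this illustrative cutoff, the CSF1R hit (5.21e-4) is dropped and the coelacanth protein tyrosine kinase ranks first.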

Table 4

Best-matched orthologs and their species of origin identified using SmartBLAST protein sequence analysis

Gene ID^a | Accession ID | Orthologs | E-value^b | Query cover^c | Identity^d | Species
P1 | NP_003812.1 | Receptor-interacting serine/threonine-protein kinase 2 isoform 1 | 2.00e-39 | 94% | 27.54% | Homo sapiens
 | NP_620402.1 | Receptor-interacting serine/threonine-protein kinase 2 isoform 1 | 6.00e-37 | 89% | 28.74% | Mus musculus
P2 | XP_005164418.2 | Uncharacterized protein LOC101885950 | 0.00 | 95% | 54.14% | Danio rerio
 | XP_017210637.2 | Uncharacterized protein LOC108179149 | 2.00e-164 | 79% | 37.53% | Danio rerio
 | XP_021326567.1 | Uncharacterized protein LOC101885087 | 5.00e-151 | 74% | 37.47% | Danio rerio
P3 | XP_005166230.1 | Uncharacterized protein LOC100136852 isoform X2 | 0.00 | 100% | 54% | Danio rerio
 | XP_016100849.1 | PREDICTED: uncharacterized protein LOC107561032 isoform X3 | 1.00e-113 | 98% | 58.82% | Danio rerio
 | NP_001076332.2 | Junctional adhesion molecule 3b | 2.00e-02 | 33% | 29.41% | Danio rerio
P4 | XP_026123653.1 | Uncharacterized protein LOC113106193 isoform X1 | 4.00e-177 | 100% | 62.04% | Carassius auratus
 | XP_016389660.1 | PREDICTED: cell surface glycoprotein 1-like isoform X4 | 1.00e-173 | 100% | 64.76% | Sinocyclocheilus rhinocerous
 | XP_016333309.1 | PREDICTED: serine-aspartate repeat-containing protein I-like isoform X1 | 2.00e-165 | 100% | 63.72% | Sinocyclocheilus anshuiensis
 | XP_016105136.1 | PREDICTED: calphotin-like | 3.00e-164 | 100% | 62.79% | Sinocyclocheilus grahami
P5 | ROL44899.1 | Hypothetical protein DPX16_9111 | 6.00e-121 | 100% | 63.40% | Anabarilius grahami
 | XP_016143106.1 | PREDICTED: uncharacterized protein LOC107596800 | 9.00e-115 | 100% | 63.19% | Sinocyclocheilus grahami
 | XP_016395950.1 | PREDICTED: uncharacterized protein LOC107729778 isoform X2 | 5.00e-113 | 100% | 62.50% | Sinocyclocheilus rhinocerous
 | XP_018973499.1 | PREDICTED: uncharacterized protein LOC109104670 isoform X2 | 3.00e-110 | 100% | 61.69% | Cyprinus carpio
P6 | XP_016397186.1 | PREDICTED: cell wall protein RTB1-like | 1.00e-122 | 91% | 54.81% | Sinocyclocheilus rhinocerous
 | XP_016343246.1 | PREDICTED: mucin-5AC-like | 2.00e-122 | 91% | 55.03% | Sinocyclocheilus anshuiensis
 | XP_016091956.1 | PREDICTED: mucin-5AC-like | 3.00e-106 | 91% | 51.01% | Sinocyclocheilus grahami
 | XP_016124548.1 | PREDICTED: cell wall protein DAN4-like | 6.00e-105 | 92% | 52.30% | Sinocyclocheilus grahami
P7 | RXN26987.1 | Hypothetical protein ROHU_020440 | 9.00e-65 | 100% | 87.88% | Labeo rohita
 | KTG33652.1 | Hypothetical protein cypCar_00001489 | 2.00e-64 | 100% | 87.88% | Cyprinus carpio
 | XP_026090693.1 | Uncharacterized protein LOC113064245 | 2.00e-63 | 100% | 86.87% | Carassius auratus
 | ROL47558.1 | Hypothetical protein DPX16_13273 | 6.00e-63 | 100% | 86.87% | Anabarilius grahami
 | KAA0720020.1 | Hypothetical protein E1301 | 5.00e-58 | 100% | 78.43% | Triplophysa tibetana
P8 | XP_009294219.1 | Uncharacterized protein si:ch211-239m17.1 isoform X4 | 2.00e-141 | 93% | 98.48% | Danio rerio
P9 | KTG45257.1 | Hypothetical protein cypCar_00011656 | 7.00e-90 | 95% | 85.03% | Cyprinus carpio
 | ROL51783.1 | Hypothetical protein DPX16_19302 | 2.00e-88 | 82% | 94.49% | Anabarilius grahami
 | TRY88805.1 | Hypothetical protein DNTS_015019 | 4.00e-87 | 100% | 77.27% | Danionella translucida
P10 | NP_001108533.1 | Chemokine (C-X-C motif) ligand 32b, duplicate 1 precursor | 5.00e-10 | 71% | 35.16% | Danio rerio
 | NP_003166.1 | Cytokine SCM-1 beta precursor | 5.00e-08 | 68% | 27.91% | Homo sapiens
 | NP_032536.1 | Lymphotactin precursor | 1.00e-05 | 75% | 27.27% | Mus musculus
 | NP_002986.1 | Lymphotactin precursor | 3.00e-07 | 68% | 27.91% | Homo sapiens
 | NP_067418.1 | C-C motif chemokine 8 precursor | 2.00e-05 | 67% | 32.61% | Mus musculus
P11 | XP_016428050.1 | Maltase-glucoamylase, intestinal isoform 2 | 0.00 | 98% | 57.17% | Homo sapiens
 | NP_001074606.1 | Sucrase-isomaltase, intestinal | 0.00 | 99% | 55.67% | Mus musculus
P12 | AAI28789.1 | Zgc:165381 protein | 0.00 | 26% | 100% | Danio rerio

aGene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis

bE value: The Expect value (E-value) is the number of hits one can “expect” to see by chance when searching a database of a particular size (https://blast.ncbi.nlm.nih.gov). The lower the E-value, or the closer it is to zero, the more “significant” the match is

cQuery cover is the percentage of the query’s sequence (zebrafish gene of interest) that overlaps the subject’s sequence (returned orthologs)

dIdentity is calculated as the percentage of characters (amino acid) within the covered part of the query that are identical
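The query cover and identity definitions in the footnotes above can be made concrete with a toy pairwise alignment. This is a simplified sketch for a single aligned segment: the sequences and coordinates are invented, and identity is computed over all alignment columns (gaps included), which is one common convention and may differ in detail from how a given BLAST report computes it.

```python
def query_cover(query_length: int, q_start: int, q_end: int) -> float:
    """Percent of the query spanned by the aligned segment (1-based, inclusive)."""
    if not 1 <= q_start <= q_end <= query_length:
        raise ValueError("alignment coordinates must lie within the query")
    return 100.0 * (q_end - q_start + 1) / query_length

def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_query) != len(aligned_subject) or not aligned_query:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        q == s and q != "-" for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)

# Toy alignment, invented for illustration ("-" marks a gap).
toy_query = "MKT-LLVA"
toy_subject = "MKSALL-A"

cover = query_cover(query_length=20, q_start=3, q_end=12)
identity = percent_identity(toy_query, toy_subject)
print(f"query cover = {cover:.1f}%, identity = {identity:.1f}%")
```

The two measures are independent: P12 in Table 4, for example, shows 100% identity over only 26% of the query, so a high identity alone does not imply a full-length match.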

Table 5

Protein structure analysis

Gene ID^a | Template ID^b | Function | GMQE^c | Coverage^d | Identity^e
P1 | 6fu5.1.B | Receptor-interacting Serine/threonine-protein kinase 2 | 0.34 | 55% | 30.50%
 | 3sd0.1.A | Glycogen synthase kinase-3 beta | 0.35 | 58% | 19.26%
 | 4xlv.1.A | Insulin receptor | 0.32 | 51% | 23.19%
P2 | No templates were found matching the sequence
P3 | 3of6.1.A | T cell receptor beta chain | 0.38 | 70% | 19.31%
 | 5fhx.1.C | Antibody fragment light chain | 0.38 | 72% | 14.35%
 | 6bpc.1.B | Monoclonal antibody 4F7 Fab heavy chain | 0.34 | 69% | 15.50%
P4 | No templates were found matching the sequence
P5 | No templates were found matching the sequence
P6 | No templates were found matching the sequence
P7 | No templates were found matching the sequence
P8 | 6e56.1.B | Antibody pn132p2C05 | 0.49 | 90% | 21.93%
 | 5n4g.1.A | Heavy Chain | 0.49 | 93% | 23.08%
P9 | No templates were found matching the sequence
P10 | 1j8i.1.A | Lymphotactin | 0.42 | 60% | 30.26%
 | 1ncv.1.B | Monocyte chemoattractant protein 3 | 0.41 | 59% | 32.43%
 | 5eki.5.A | C-C motif chemokine 21 | 0.40 | 55% | 27.54%
P11 | 3top.1.A | Maltase-glucoamylase, intestinal | 0.45 | 49% | 59.66%
 | 3lpo.1.A | Sucrase-isomaltase | 0.44 | 48% | 57.04%
 | 5nn3.1.A | Lysosomal alpha-glucosidase | 0.38 | 46% | 41.65%
P12 | No templates were found matching the sequence

Protein structure analysis using SWISS-MODEL [57]

aGene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis

bTemplate ID: 3D structure found that modeled the zebrafish protein of interest

cGMQE: Global Model Quality Estimation [58], which is the quality estimate of the model, taking into account properties from the target–template alignment and the template search method. GMQE is a number between 0 and 1. Higher numbers indicate higher reliability. A cut-off of GMQE > 0.3 was applied

dCoverage: The percentage of the query’s sequence (P1 to P12) that overlaps the Template sequence

eIdentity is the percentage of characters (amino acids) within the covered part of the query that are identical. Template ID corresponds to the name of the template (ortholog) in the Protein Data Bank (https://www.rcsb.org/, [59])

Fig. 2

Homology model of P1 putative kinase domain. The kinase domain of Receptor-interacting serine/threonine-protein kinase 2 (RIPK2, 6fu5.1.B in the RCSB Protein Data Bank) is the template used for the homology modelling of P1. The experimental structure of 6fu5.1 was determined by X-ray diffraction at 3.26 Å [60]. Blue indicates regions of the model where P1 was well modeled, and orange indicates regions where P1 was poorly modeled. The well-modeled regions (blue) are regions where P1 is likely to be similar to the experimental 3D structure of the template. The homology model pertains to the putative kinase domain of P1 and starts at P1 residue N°3 (GLN, Glutamine) and ends at residue N°284 (LYS, Lysine)

Fig. 3

Homology model of P3. T cell receptor beta chain (3of6.1.A in the RCSB Protein Data Bank) is the template used for the homology modelling of P3. The homology model starts at P3 residue N°32 (THR, Threonine) and ends at residue N°245 (THR, Threonine). The experimental structure of 3of6.1.A was determined by X-ray diffraction at 2.80 Å [61]. Blue indicates regions of the model in which P3 was well modeled by the template, and orange indicates regions where P3 was poorly modeled. The blue regions correspond to the T cell receptor beta chain immunoglobulin domains

Fig. 4

Homology model of P10 chemokine interleukin-8-like domain. Lymphotactin (1j8i.1.A in the RCSB Protein Data Bank) is the template used for the homology modelling of P10. The homology model starts at P10 residue N°24 (GLU, Glutamic acid) and ends at residue N°102 (SER, Serine). The experimental structure of 1j8i.1.A was determined by NMR spectroscopy [62]. Blue indicates regions of the model where P10 was well modeled, and orange indicates regions where P10 was poorly modeled. The chemokine interleukin-8-like domain of the model starts at P10 amino acid position N°27 (HIS, Histidine) and ends at position N°86 (LEU, Leucine). This region includes both well-modeled (blue) and poorly modeled (orange) sections

Fig. 5

Homology model of P11. Maltase-glucoamylase, intestinal (3top.1.A in the RCSB Protein Data Bank) is the template used for the homology modelling of P11. The experimental structure of 3top.1.A was determined by X-ray diffraction at 2.9 Å [63]. The homology model starts at P11 residue N°922 (LYS, Lysine) and ends at residue N°1804 (PHE, Phenylalanine). The P-type trefoil domain (amino acid N°51–962), galactose mutarotase domain (amino acid N°114–1085), and glycoside hydrolase domain (amino acid N°225–1152) are not covered in the homology model. Blue indicates regions of the model where P11 was well modeled, and orange indicates regions where P11 was poorly modeled

Fig. 6

Expression level of selected zebrafish genes in other published studies. Expression level of selected zebrafish genes (P1, P9, and P12) in other published RNA-seq datasets of (a) zebrafish heart regeneration [64], and (b) zebrafish brain microglia [52], using the Zf Regeneration Database (www.zfregeneration.org, [65]). The y-axis indicates the normalized transcript level expressed as FPKM (fragments per kilobase of exon per million reads). The x-axis shows the different experimental conditions. (a) dpa = days post injury. (b) Active microglia indicates microglia responding to acute damage; h = hours after acute damage
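For readers unfamiliar with the FPKM unit used on the y-axis, the standard definition can be written as a one-line calculation. The sketch below is generic and uses invented counts; it does not reproduce the processing pipeline of the cited datasets, and the function name `fpkm` is an illustrative choice.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

# Toy numbers, invented for illustration only.
toy_value = fpkm(fragments=500, transcript_length_bp=2000, total_mapped_fragments=20_000_000)
print(toy_value)
```

Because FPKM normalizes for both transcript length and library size, values for the same gene can be compared across the experimental conditions shown on the x-axis.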

Table 6

Orthologs found in the species Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis and Cynops pyrrhogaster

Gene ID^a | Accession ID | Function^b | E Value^c | Query cover^d | Identity^e | Species
A. Ambystoma mexicanum
P1 | AIW46262.1 | Receptor tyrosine kinase-like orphan receptor 2 | 1.00e-09 | 40% | 22.90% | Ambystoma mexicanum
B. Xenopus
P1 | XP_018112660.1 | Threonine-protein kinase 2-like isoform X1 | 3.00e-32 | 39% | 32.24% | Xenopus laevis
P3 | XP_004916146.1 | Cell adhesion molecule 1 isoform X2 | 3.00e-02 | 56% | 23.78% | Xenopus tropicalis
P5 | XP_018101840.1 | Uncharacterized protein | 2.00e-32 | 58% | 40.11% | Xenopus laevis
P8 | XP_004919377.2 | CD48 antigen | 7.00e-08 | 99% | 28.97% | Xenopus tropicalis
P9 | KAE8621564.1 | Hypothetical protein XENTR_v10004882 | 2.00e-06 | 23% | 45.95% | Xenopus tropicalis
P10 | XP_018120302.1 | Cytokine SCM-1 beta-like | 1.00e-07 | 64% | 36.59% | Xenopus laevis
P11 | XP_012818887.1 | Sucrase-isomaltase, intestinal | 0.00 | 99% | 58.89% | Xenopus tropicalis
C. Cynops pyrrhogaster
P1 | BAB44154.1 | Insulin-like growth factor I receptor | 2.00e-10 | 44% | 23.17% | Cynops pyrrhogaster

BLASTP BLOSUM45 was used to find distantly related proteins in the shown species

aGene ID: Corresponds to the symbol used for each predicted zebrafish protein subjected to bioinformatics analysis, the query. Only those with hits are shown

bFunction: Corresponds to the function associated with the ortholog found for each gene

cE Value: The Expect value (E-value) or random background noise is the number of hits one can “expect” to see by chance when searching a database of a particular size (https://blast.ncbi.nlm.nih.gov). The lower the E-value, or the closer it is to zero, the more “significant” the match is

dQuery cover: The percentage of the query’s sequence (zebrafish gene) that overlaps the subject’s sequence (returned ortholog)

eIdentity: The percentage of amino acids within the covered part of the query that are identical between the query and the returned ortholog


P1 (si:dkey-181f22.4)

The gene coding for P1 (si:dkey-181f22.4) is located on zebrafish chromosome 7 and is predicted to have an exon/intron structure coding for a predicted 513 amino acid protein (Table 1). Protein domain and gene ontology (GO) term analysis returned a predicted “protein kinase domain” and a “Caspase Activation and Recruitment (CARD) domain” (Table 2). The CARD domain is known to function in innate immunity, particularly in inflammation and the regulation of apoptotic processes (Table 2, [66-69]). Amino acid sequence similarity analysis returned several kinases associated with immune function, and suggested that this gene may code for a receptor tyrosine kinase (Table 3). The best-matched ortholog analysis returned “Receptor-interacting serine/threonine-protein kinase 2 isoform 1” in both human and mouse (Table 4). Of note, human RIPK2 has been described to contain a C-terminal CARD domain [70-72]. In comparison to other selected species (Table 6), P1 returned receptor tyrosine kinase-like orphan receptor 2 (Axolotl), Threonine-protein kinase 2-like isoform X1 (Xenopus), and insulin-like growth factor receptor as well as receptor tyrosine kinase-like orphan receptor 2 (Salamander). Structure prediction (Table 5, Fig. 2) strongly indicated a kinase domain/function for P1. The results strongly indicate that P1 has a kinase domain that may be activated by interactions with other proteins via the CARD domain, and this function may act in concert with receptor activity. Interestingly, the CARD domain of human RIPK2 facilitates interaction with NOD-like receptors [73, 74]. Collectively, these results indicate that zebrafish P1 may have orthologous function to human RIPK2. However, the amino acid substrate of phosphorylation (tyrosine vs. serine/threonine) by zebrafish P1 is not yet clear, as both classes of kinases were indicated in the hits.

P2 (si:ch73-112 l6.1)

The gene for P2 (si:ch73-112 l6.1) is located on zebrafish chromosome 21 and codes for a predicted 1025 amino acid protein (Table 1). Protein stability analysis (Supplemental Figure 2) indicates P2 is a structured protein, but with a large disordered domain. Such disordered regions often indicate a protein-protein binding interface [56]. However, collective analyses were largely uninformative for P2. For example, neither protein domains nor GO terms were returned (Table 2). A putative ortholog with unknown function from Branchiostoma floridae was returned based on amino acid sequence similarity (Table 3), and three uncharacterized zebrafish genes were returned as best-matched orthologs (Table 4).

P3 (zgc:174863)

The gene for P3 (zgc:174863) is located on zebrafish chromosome 6 and codes for a predicted 290 amino acid protein (Table 1). Protein domain and GO terms indicate an immunoglobulin-like domain, which is present in proteins involved in cell adhesion (Table 2). Consistent with this, sequence similarity analysis revealed 5 proteins from 4 species, several of which contain immunoglobulin folds (Table 3). Protein structure analysis (Table 5, Fig. 3) further indicated that the predicted protein contains immunoglobulin-like domains, as it was reasonably modeled by the T cell receptor beta chain in regions containing immunoglobulin folds (Fig. 3). Collectively, these results suggest that P3 could be a cell membrane receptor possibly involved in cell adhesion. In support of this, comparison to Xenopus tropicalis returned a predicted ortholog with putative cell adhesion function (Table 6). In addition, several hits for P3 were found by amino acid similarity in Xenopus tropicalis, Apis mellifera, Gadus morhua, and Latimeria chalumnae (Table 3), and based on phylogenetic relationships of these species (Supplemental Figure 1), it seems possible that the function of the gene coding for P3 was evolutionarily conserved in these species.

P4 (si:dkey-56 m19.5)

The gene coding for P4 (si:dkey-56 m19.5) is located on zebrafish chromosome 7 and codes for a predicted 526 amino acid protein (Table 1). As noted above, P4 is predicted to be a disordered protein (Supplemental Figure 2). Many intrinsically disordered proteins evolve rapidly [75-78], and therefore, predicting a function for P4 is difficult based on amino acid sequence. Accordingly, analyses based on sequence similarity were overall minimally informative. An associated protein domain (Ribonuclease E/G) was returned for P4 (Table 2) and a possible ortholog (Brain abundant, membrane attached signal protein 1, BASP1) with unknown function in Oryzias latipes was a hit based on amino acid sequence similarity (Table 3). P4 returned four best-matched orthologs from other species, but these genes had widely varying predicted functions (Table 4). Protein structure analysis was uninformative for P4 (Table 5). Synteny analysis indicated that the gene coding for P4 lies in a region syntenic with human chromosome 16 (Supplemental Figure 3). The gene for P4 is flanked by several neighboring genes that have apparent orthologs in human, and based on the orientations and locations of the neighboring genes in the two species, the gene for P4 lies in a relative location similar to human TERB1. However, using NCBI BLASTP to compare sequences of zebrafish P4 and human TERB1 (with any scoring matrix) found no significant similarity between these two genes, and thus failed to provide evidence of orthology. We therefore consider that the gene coding for P4 could have been gained in zebrafish or lost in humans. Interestingly, several possible orthologs in various species of fish were returned for P4 (Table 4).

P5 (si:ch211-105j21.9)

Protein domain and GO term analysis returned MGC-24 and Mucin15 domains (Table 2) for P5 (si:ch211-105j21.9). Amino acid sequence similarity returned three hits from three different species for genes with unknown and varying functions (Table 3), but best-matched orthologs (Table 4), as well as protein structure analysis, were uninformative. Although a hit was found in Xenopus laevis (Table 6), the protein has unknown function.

P6 (si:ch73-248e21.7)

P6 (si:ch73-248e21.7) did not return any hits for GO terms, but a putative complement regulatory protein from Xenopus tropicalis was identified as a hit by sequence similarity analysis (Table 3). Best-matched orthologs were found in four Sinocyclocheilus species of fish, two of which were Mucin 5AC_like proteins and two of which were cell wall-like proteins (Table 4). However, other analyses proved uninformative.

P7 (si:ch211-191j22.3)

Analyses for P7 were largely uninformative. Although some of these analyses returned hits indicating unknown, uncharacterized, or hypothetical proteins in six different fish species (Table 3, Table 4), their meaning was not interpretable.

P8 (LOC100535303)

Protein domain/GO term results suggest P8 contains an immunoglobulin-like domain. This was further indicated by the amino acid sequence similarity results (Table 3), protein structure results (Table 5), and the putative “CD48 antigen” orthologue identified in Xenopus tropicalis (Table 6).

P9 (urp1)

The gene coding for P9 was previously annotated as urp1, suggesting that putative urotensin function is already recognized. Consistent with this, protein domain/GO term and amino acid sequence similarity analyses returned results for P9 indicating urotensin function (Table 2 and Table 3), which is involved in regulation of vasculature diameter. Specifically, Urotensin II is a secreted mediator known to function in vasoconstriction (Table 2, [79-81]). However, similar structures were not identified in our analyses (Table 5).

P10 (xcl32a.1)

The gene for P10 (xcl32a.1) is located on zebrafish chromosome 2 and is predicted to encode a protein of only 126 amino acids (Table 1). The protein domains/GO term search returned chemokine interleukin-8-like, which functions in immune response (Table 2). Other analyses also indicated that P10 is likely a cytokine/chemokine (Table 3, Table 4, Table 5, Table 6). The predicted amino acid length of P10 is consistent with short amino acid chains seen in cytokines/chemokines. Consistent with this function, regions of P10 were well modeled by regions of the chemokine Lymphotactin’s interleukin-8-like domain (Fig. 4).

P11 (si:ch211-287n14.3)

Collectively, results for P11 indicate that it could be an enzyme involved in carbohydrate metabolism (Table 2, Table 3, Table 4, Table 5, and Table 6). P11 could be well modeled by human intestinal maltase-glucoamylase (Table 5, Fig. 5), as well as sucrase-isomaltase and lysosomal alpha-glucosidase (Table 5). However, the predicted functional domains found previously (P-type trefoil, galactose mutarotase, and glycoside hydrolase domains, Table 2) were not covered in the homology model of maltase-glucoamylase. The P-type trefoil domain, found for P11 (Table 2), is found in several secreted proteins associated with mucins [82-84], many of which are involved in the response to gastrointestinal mucosal injury and inflammation [85]. The function of such a secreted protein in the CNS during tissue regeneration is not clear; perhaps it could be involved in extracellular matrix degradation.

P12 (pho)

The gene encoding P12 (pho) is located on zebrafish chromosome 5 and encodes a large predicted protein of 2798 amino acids (Table 1). Interestingly, P12 (pho) has been previously described to be required for the regeneration of zebrafish neuromasts [86], which are sensory patches located along the zebrafish body, but its function has not been studied otherwise. The coiled coil domain found in the protein domain/GO term analysis (Table 2) was described previously [86]. In addition, we find that P12 is predicted to have more than 50% of its amino acids disordered, and is therefore likely an unstructured protein (Supplemental Figure 2). That P12 is a disordered protein is likely the reason that other analyses did not prove informative (Table 3, Table 4, Table 5, Table 6). Many studies have shown that disordered proteins evolve more rapidly than structured proteins [75-78] and the disordered region of the protein drives this rapid evolution [77]. In addition, large proteins with coiled-coil domains appear to have functions in cell structure [56]. In spite of the predicted disordered structure, the previously cited study [86] found evidence for an ATPase and transmembrane domain; however, our analyses did not reveal these features. Given that P12 is reported to be required for neuromast regeneration in zebrafish [86], we considered that a syntenic relationship might be identified in genomes of other species known to have robust regenerative abilities. However, our synteny analyses did not return predicted syntenic regions compared to Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis, and Cynops pyrrhogaster (not shown).

Comparison to other published RNA-seq datasets

We were interested in determining to what extent transcripts mapping to select genes might also be detected in other zebrafish tissues/cells, such as regenerating heart tissue [64], resting microglia [52], and microglia responding to acute damage [52]. We focused this comparison on P1, P9, and P12 because P1 had particularly informative analyses above (indicating kinase function), and P9 and P12 might have novel functions in regeneration. Interestingly, transcripts for both P1 and P9 were increased in regenerating heart tissue samples compared to uninjured (Fig. 6a). Transcripts mapping to P1 appeared slightly more abundant in resting microglia compared to other brain cells, but levels did not change significantly in microglia responding to acute damage (Fig. 6b). Since P1 was enriched in microglia in our study [30], which sampled microglia/macrophages during retinal regeneration, it is possible that expression and function of this putative kinase (P1) are upregulated during tissue regeneration. Transcripts for the P9 gene were present in microglia in the zebrafish brain, both in the resting state and in response to acute brain damage (Fig. 6b), though they did not appear to change significantly in such conditions. Thus, it is possible that P9 is a mediator produced by microglia/macrophages that acts on the local vasculature to control blood pressure locally and perhaps this function is upregulated during tissue regeneration. Examining expression levels of P12 did not demonstrate any apparent upregulation of P12 in regenerating heart compared to the very low transcript levels in uninjured heart tissue (Fig. 6a). However, P12 expression was observed in resting microglia from zebrafish brain, and the expression of P12 appeared to be reduced in the context of the microglial acute damage response [52] (Fig. 6b). 
This expression pattern, in combination with our dataset indicating expression by microglia/macrophages during retinal regeneration, suggests that P12 (pho) may function in restoring and/or maintaining a “resting” microglial/macrophage state. However, such a hypothesis will require experimental testing. We next examined a published RNA-seq dataset representing zebrafish macrophages responding to M. marinum infection [87], to determine if the genes of interest were also differentially expressed in zebrafish macrophages responding to microbial infection. Interestingly, although transcripts were detected in the Rougeot et al. study for ten out of twelve of the genes, only one of these (P6, si:ch73-248e21.7, which may have complement regulatory function based on the results described above) was found to be differentially expressed in macrophages from infected fish compared to uninfected fish based on the authors’ cut-off criteria of Log2FC ≥ 1, p-adj < 0.05 (Table 7). This supports the idea that these genes could comprise part of a unique transcriptome that is expressed in microglia/macrophages during tissue regeneration compared to that in response to microbial infection.
Table 7

Expression of zebrafish genes pertaining to P1-P12 in macrophages responding to microbial infection

Gene ID^a | Zebrafish Symbol^b | Ensembl ID | DE^c in macrophages responding to M. marinum infection?
P1 | si:dkey-181f22.4 | ENSDARG00000105643 | ND
P2 | si:ch73-112 l6.1 | ENSDARG00000093126 | No
P3 | zgc:174863 | ENSDARG00000099476 | ND
P4 | si:dkey-56 m19.5 | ENSDARG00000068432 | No
P5 | si:ch211-105j21.9 | ENSDARG00000097845 | No
P6 | si:ch73-248e21.7 | ENSDARG00000096331 | Yes
P7 | si:ch211-191j22.3 | ENSDARG00000095459 | No
P8 | LOC100535303 | ENSDARG00000071653 | No
P9 | urp1 | ENSDARG00000093493 | No
P10 | xcl32a.1 | ENSDARG00000093906 | No
P11 | si:ch211-287n14.3 | ENSDARG00000093650 | No
P12 | pho | ENSDARG00000035133 | No

The twelve genes of interest were examined in the RNA-seq dataset from Rougeot et al., 2019 (GSE78954 and GSE68920), which examined the transcriptome of zebrafish macrophages responding to M. marinum infection

aGene ID: Corresponds to the zebrafish gene of interest in this study

bZebrafish Symbol: Corresponds to the symbol attributed to each gene by the ZFIN Zebrafish Nomenclature Conventions

cDE: Differential Expression in zebrafish macrophages responding to infection compared to uninfected. Using the RNA-seq datasets from Rougeot et al., 2019, DE was based on the authors’ original criteria of logFC greater than or equal to 1, and p-adj < 0.05. “Yes” or “No” indicates that the gene was differentially expressed or not, respectively. ND indicates that the transcript was not detected in the dataset
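
The Yes/No/ND calls in Table 7 follow a simple decision rule, which can be sketched as below. This is a minimal illustration, not the authors' pipeline: the function name `de_call` is hypothetical, the example values are invented, and the fold-change cut-off is applied exactly as stated (logFC ≥ 1), since the text does not say whether an absolute value was intended.

```python
from typing import Optional


def de_call(log_fc: Optional[float], p_adj: Optional[float],
            min_log_fc: float = 1.0, max_p_adj: float = 0.05) -> str:
    """Return "Yes", "No", or "ND" for one gene.

    "ND" means the transcript was not detected (no values available).
    "Yes" requires logFC >= min_log_fc and p-adj < max_p_adj.
    """
    if log_fc is None or p_adj is None:
        return "ND"
    if log_fc >= min_log_fc and p_adj < max_p_adj:
        return "Yes"
    return "No"


# Hypothetical values, for illustration only (not taken from the dataset).
examples = {
    "gene_detected_and_de": (1.8, 0.01),
    "gene_detected_not_de": (0.4, 0.20),
    "gene_not_detected": (None, None),
}
calls = {name: de_call(fc, p) for name, (fc, p) in examples.items()}
```

Keeping "ND" separate from "No" matters for interpretation: a gene that was never detected provides no evidence either way about regulation during infection.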


Discussion

In this study, we analyzed twelve zebrafish genes with unknown function. These genes were selected from our previous transcriptome analysis of zebrafish microglia/macrophages isolated from regenerating retinal tissue [30]. We applied bioinformatic analyses to the twelve selected transcripts to suggest putative functions. These analyses included protein domain and gene ontology (GO) terms, amino acid similarity, predicted protein structure, and synteny comparisons. For some selected genes, we examined expression level in other published studies of gene expression in zebrafish [52, 64], and examined other published data sets involving macrophages responding to microbial infection [87] to determine if these genes might be regulated in different activation contexts. Results for many of the genes analyzed indicate putative functions related to the immune system. Several of these functions may not be well described in fish compared to mammalian organisms. The predicted genes/predicted proteins yielding the most informative results include P1 (results strongly indicate receptor-associated kinase activity), P9 (previously annotated as urp1, for which results indicate urotensin-like activity), P10 (which may have chemokine activity), and P11 (which could be an enzyme involved in carbohydrate metabolism). Although only an immunoglobulin-like fold domain was revealed for P3 and P8, and a possible mucin domain for P5, these results provide at least some new insight into the structure of the predicted proteins as these domains have not been previously noted for these genes. On the other hand, our analyses did not reveal significant functional information about P2, P4, P6, P7, and P12. Given that P12 (pho) is predicted to be a disordered protein, our analyses do not allow us to make predictions about the function of this particular protein, though it remains of interest due to its previously indicated role in neuromast regeneration [86]. 
It will be interesting to determine, experimentally, if phoenix (pho), or any of the other genes analyzed in this work, are required for retinal regeneration. The lack of syntenic relationships between zebrafish and mouse/human for the majority of the genes analyzed is notable, suggesting that these genes were possibly not evolutionarily retained across these species or, alternatively, that these genes may have appeared in certain species [88]. For the one zebrafish gene that did have a syntenic relationship identified, sequence alignment did not indicate an evolutionary relationship to the candidate gene in the syntenic region. Orthologs were identified for some, but not all, of these zebrafish genes of interest in species that are also known to regenerate damaged tissue (Axolotl, Xenopus and Salamander, Table 6 and Supplemental Table 1). We therefore consider that, in future work, it is important to determine if the genetic program used by microglia/macrophages during zebrafish CNS regeneration is unique on a species level. Whether such a unique genetic program is required for successful regeneration also remains to be determined. To begin to probe this question, we examined other published RNA-seq datasets for expression patterns of the genes examined in this work. For selected genes (P1, P9, and P12), we examined transcript abundance in samples from zebrafish regenerating heart tissue [64] and zebrafish brain microglia [52]. Both P1 and P9 showed upregulation in regenerating zebrafish heart, while P12 transcripts were apparently reduced in microglia responding to acute damage compared to resting microglia. When we examined the transcriptome of zebrafish macrophages responding to infection by the microbe M. marinum [87], only one of the twelve genes discussed in our work here was found to be differentially expressed in this context. 
It is worth considering that the samples sequenced in our study [30] differ from those in these other studies with regard to the developmental age/stage of the animal, location in the body, sample preparation, and sequencing protocols, as well as other factors. However, these comparisons still suggest that these genes may be regulated in a tissue regeneration context rather than in response to microbial infection. It also remains possible that at least some of these genes comprise part of a general transcriptional program active in zebrafish microglia/macrophages responding to tissue damage and/or infection. Further experimental studies involving at least some of these genes (i.e. P1, which bioinformatic predictions suggest could be a kinase, and P12 (pho)) are likely to increase our understanding of mechanisms involved in successful tissue regeneration. Indeed, harnessing such regenerative capacity in mammals must be better informed by a more thorough functional understanding of the genetic program, executed by organisms such as zebrafish, that underlies successful regeneration. Such work will also lead to a better evolutionary understanding of the vertebrate innate immune system.

Conclusions

In this study, we have predicted putative functions for several zebrafish genes with previously unknown function. Transcripts mapping to these genes were enriched in microglia/macrophages during retinal regeneration, suggesting they could have functional importance in tissue regeneration. We identified putative orthologs of several of these genes, mainly based on functional domains, which provide informative insight into possible protein function. In addition, comparison to other RNA-seq datasets suggests that most of these genes could be part of a transcriptional program expressed by microglia/macrophages during tissue regeneration. Our findings provide a foundation for future experimental work to determine the function of these genes in vivo.

Methods

RNAseq dataset and predicted orthology

The 3’mRNA Quant-seq experiment and differential gene expression (DEG) analysis are described in Mitchell et al., 2019 [30]. This dataset is available on the Gene Expression Omnibus (GEO120467). To identify putative mouse and human orthologs of the 986 transcripts found to be enriched in mpeg1+ cells compared to other cell types, the DRSC integrative ortholog prediction tool (DIOPT, v 7.0, www.flyrnai.org) was employed based on the zebrafish ENSEMBL ID.

Protein domains and gene ontology (GO) terms

The protein domains and the gene ontology (GO) terms (Biological Process and Molecular Function) were determined from the universal protein knowledgebase (UniProt, [89]) and the integrative protein signature database (InterPro, [90]). The gene ID from Ensembl (https://www.ensembl.org/, [54]) was used to extract the predicted protein sequence of the gene from the National Center for Biotechnology Information database (NCBI, https://www.ncbi.nlm.nih.gov/). The gene’s amino acid sequence was used to extract protein domains and gene ontology (GO) terms in UniProt [89] and InterPro [90].

Sequence similarity

Two approaches were used to find orthologs for each protein based on sequence similarity, EggNOG and SmartBLAST, because these two approaches use different protein databases. The bioinformatics web-server EggNOG 4.5.1 [55] compares the input protein sequence to the sequences available in several databases and displays the list of orthologs of the protein and the species where those orthologs are found [55]. The “default” settings of the web-server SmartBLAST (https://blast.ncbi.nlm.nih.gov/smartblast/) were used to identify the species of origin of orthologs (and paralogues within zebrafish) which were best-matched by our genes using the non-redundant protein sequence database [91]. To look for orthologs in species with described capacity for regeneration (Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis, Cynops pyrrhogaster), the protein sequences of zebrafish genes were compared to the NCBI database (http://blast.ncbi.nlm.nih.gov) using BLASTP with the BLOSUM45 scoring matrix and Gap Costs “Existence: 10 Extension: 3”. In addition, we used tBLASTn to identify putative unannotated orthologs in these species, and these results are reported in Supplemental Table 1.
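
To make the E-value reported in these searches more concrete, the sketch below uses the standard relationship between a normalized bit score S' and the expected number of chance hits, E = m · n · 2^(−S'), where m and n are the query and database lengths. It is only an illustration: BLAST itself uses effective (edge-corrected) lengths, and the `Hit` record, the function names, and the cut-off values are hypothetical rather than settings used in this study.

```python
from dataclasses import dataclass
from typing import List


def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expect value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


@dataclass
class Hit:
    subject: str        # name of the returned protein
    e_value: float      # expected number of chance hits
    query_cover: float  # percent of the query covered by the alignment
    identity: float     # percent identical residues in the covered part


def keep_hits(hits: List[Hit], max_e: float = 1e-5,
              min_cover: float = 30.0) -> List[Hit]:
    """Keep hits passing the (hypothetical) cut-offs, best E-value first."""
    kept = [h for h in hits if h.e_value <= max_e and h.query_cover >= min_cover]
    return sorted(kept, key=lambda h: h.e_value)


# Invented hits, for illustration only.
hits = [
    Hit("protein_A", 3e-40, 85.0, 41.0),
    Hit("protein_B", 0.2, 60.0, 25.0),
    Hit("protein_C", 1e-12, 15.0, 38.0),
]
best = keep_hits(hits)
```

Because E scales with database size, the same alignment can look more or less "significant" depending on the database searched, which is one reason to compare results from tools that use different databases.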

Structural analysis

We inferred protein disorder using default settings (5% false positive rate) of the server PrDOS (http://prdos.hgc.jp/cgi-bin/top.cgi, [92]), which predicts natively disordered regions of a protein chain from its amino acid sequence. PrDOS returns a disorder probability for each residue. Proteins with more than 30–50% predicted disordered residues are considered disordered proteins [92]. We used the bioinformatics web-server SWISS-MODEL [57] to identify templates or homologs for our list of unknown proteins based on the predicted 3D structure of the proteins of interest (with Global Model Quality Estimation [58] or GMQE > 0.3 as cut-off). Homology modeling, or comparative protein modeling, uses an ortholog’s (template’s) experimentally-determined 3D-structure to estimate a model for the target sequence [57].
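
The step from per-residue disorder probabilities to a whole-protein call can be sketched as follows. This is a hedged illustration under stated assumptions: the per-residue cut-off of 0.5 is an assumed value (the actual threshold is set by the server for the chosen false positive rate), the function names are hypothetical, and treating the 30–50% range as a "borderline" band is one possible reading of the criterion above.

```python
from typing import Sequence


def disordered_fraction(probabilities: Sequence[float],
                        residue_cutoff: float = 0.5) -> float:
    """Fraction of residues whose disorder probability meets the cut-off.

    residue_cutoff is an assumption for illustration, not a PrDOS default.
    """
    if not probabilities:
        raise ValueError("no residues supplied")
    n_disordered = sum(1 for p in probabilities if p >= residue_cutoff)
    return n_disordered / len(probabilities)


def classify_protein(fraction: float, lower: float = 0.30,
                     upper: float = 0.50) -> str:
    """Label a protein from its disordered fraction (30-50% band assumed)."""
    if fraction > upper:
        return "disordered"
    if fraction > lower:
        return "borderline"
    return "structured"


# Invented per-residue probabilities for a 10-residue toy sequence.
toy = [0.9, 0.8, 0.7, 0.9, 0.6, 0.8, 0.2, 0.1, 0.3, 0.1]
label = classify_protein(disordered_fraction(toy))
```

For this toy input, six of ten residues meet the cut-off, so the fraction is 0.6 and the protein falls in the "disordered" class, analogous to the description of P12 as having more than 50% of its amino acids disordered.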

Synteny analysis

Synteny comparisons were performed using www.ensembl.org, because this database uses the most updated genome build for zebrafish (GRCz11). The ENSEMBL ID was used to identify the gene of interest and the chromosomal region containing the gene was selected. In the Comparative Genomics menu option, synteny was selected to compare the chromosomal region of the zebrafish gene to human (GRCh38.p13) and mouse (GRCm38.p6) genomes. Only one gene of interest was found to lie in a syntenic region (P4, Supplemental Figure 3). The amino acid sequence of the zebrafish gene was compared, using BLASTP (http://blast.ncbi.nlm.nih.gov) against the National Center for Biotechnology Information (NCBI) database, to the candidate annotated gene found inside the syntenic region to look for similarity and orthology; the alignment was repeated with each scoring matrix available in the program [93].
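
The idea behind a syntenic region, a gene flanked in two species by neighbors that are orthologs of one another, can be illustrated with a small sketch. It does not reproduce the Ensembl synteny view used here; the function names and all gene names are placeholders, and the ortholog map is assumed to be supplied by the user.

```python
from typing import Dict, List


def shared_neighbours(zebrafish_flank: List[str], human_flank: List[str],
                      orthologs: Dict[str, str]) -> List[str]:
    """Zebrafish flanking genes whose mapped ortholog also flanks the human locus."""
    human = set(human_flank)
    return [g for g in zebrafish_flank if orthologs.get(g) in human]


def conserved_fraction(zebrafish_flank: List[str], human_flank: List[str],
                       orthologs: Dict[str, str]) -> float:
    """Share of zebrafish flanking genes with an ortholog in the human flank."""
    if not zebrafish_flank:
        return 0.0
    shared = shared_neighbours(zebrafish_flank, human_flank, orthologs)
    return len(shared) / len(zebrafish_flank)


# Placeholder gene names, for illustration only.
zf_flank = ["zf_a", "zf_b", "zf_c", "zf_d"]
hs_flank = ["HS_A", "HS_C", "HS_X"]
ortholog_map = {"zf_a": "HS_A", "zf_b": "HS_B", "zf_c": "HS_C"}

shared = shared_neighbours(zf_flank, hs_flank, ortholog_map)
```

A high conserved fraction only locates a candidate position in the other genome; as with P4 and human TERB1, sequence comparison is still needed before inferring orthology for the gene at that position.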

Expression level in other RNA-seq datasets

We determined the expression level of selected zebrafish genes of interest in other published datasets of zebrafish heart regeneration [64] and zebrafish brain microglia [52] using the Zf Regeneration Database (www.zfregeneration.org) [65]. The gene’s symbol or ENSEMBL ID was used to plot the normalized expression level of transcripts of interest. To probe the RNA-seq dataset from Rougeot et al. [87], we searched for the ENSEMBL ID of each gene of interest in the raw datasets (GSE78954 and GSE68920) to determine if transcript counts were detected. To determine if the gene was considered to be differentially expressed in macrophages responding to infection, we examined the authors’ reported results of differential expression analysis comparing transcripts from sorted uninfected vs. M. marinum infected macrophages from zebrafish larvae [87].

Additional file 1: Supplemental File 1, Orthology predictions of differentially expressed genes.

Additional file 2: Contains Supplemental Figures 1-3 and Supplemental Table 1.
  92 in total

1.  UniProt: the Universal Protein knowledgebase.

Authors:  Rolf Apweiler; Amos Bairoch; Cathy H Wu; Winona C Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; Maria J Martin; Darren A Natale; Claire O'Donovan; Nicole Redaschi; Lai-Su L Yeh
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  zfRegeneration: a database for gene expression profiling during regeneration.

Authors:  Rocío Nieto-Arellano; Héctor Sánchez-Iranzo
Journal:  Bioinformatics       Date:  2019-02-15       Impact factor: 6.937

3.  Restoration of Dendritic Complexity, Functional Connectivity, and Diversity of Regenerated Retinal Bipolar Neurons in Adult Zebrafish.

Authors:  Timothy E McGinn; Diana M Mitchell; Peter C Meighan; Natalie Partington; Dylan C Leoni; Christina E Jenkins; Michael D Varnum; Deborah L Stenkamp
Journal:  J Neurosci       Date:  2017-11-13       Impact factor: 6.167

4.  Visual function in regenerating teleost retina following cytotoxic lesioning.

Authors:  A F Mensinger; M K Powers
Journal:  Vis Neurosci       Date:  1999 Mar-Apr       Impact factor: 3.241

5.  Adaptive Müller cell responses to microglial activation mediate neuroprotection and coordinate inflammation in the retina.

Authors:  Minhua Wang; Wenxin Ma; Lian Zhao; Robert N Fariss; Wai T Wong
Journal:  J Neuroinflammation       Date:  2011-12-07       Impact factor: 8.322

6.  Human long intrinsically disordered protein regions are frequent targets of positive selection.

Authors:  Arina Afanasyeva; Mathias Bockwoldt; Christopher R Cooney; Ines Heiland; Toni I Gossmann
Journal:  Genome Res       Date:  2018-06-01       Impact factor: 9.438

7.  Structural basis of RIP2 activation and signaling.

Authors:  Qin Gong; Ziqi Long; Franklin L Zhong; Daniel Eng Thiam Teo; Yibo Jin; Zhan Yin; Zhao Zhi Boo; Yaming Zhang; Jiawen Zhang; Renliang Yang; Shashi Bhushan; Bruno Reversade; Zongli Li; Bin Wu
Journal:  Nat Commun       Date:  2018-11-26       Impact factor: 14.919

8.  RNAseq Profiling of Leukocyte Populations in Zebrafish Larvae Reveals a cxcl11 Chemokine Gene as a Marker of Macrophage Polarization During Mycobacterial Infection.

Authors:  Julien Rougeot; Vincenzo Torraca; Ania Zakrzewska; Zakia Kanwal; Hans J Jansen; Frida Sommer; Herman P Spaink; Annemarie H Meijer
Journal:  Front Immunol       Date:  2019-04-17       Impact factor: 7.561

9.  Evolutionary Approach of Intrinsically Disordered CIP/KIP Proteins.

Authors:  Muhamad Fahmi; Masahiro Ito
Journal:  Sci Rep       Date:  2019-02-07       Impact factor: 4.379

10.  Small molecule inhibitors reveal an indispensable scaffolding role of RIPK2 in NOD2 signaling.

Authors:  Matous Hrdinka; Lisa Schlicher; Bing Dai; Daniel M Pinkas; Joshua C Bufton; Sarah Picaud; Jennifer A Ward; Catherine Rogers; Chalada Suebsuwong; Sameer Nikhar; Gregory D Cuny; Kilian Vm Huber; Panagis Filippakopoulos; Alex N Bullock; Alexei Degterev; Mads Gyrd-Hansen
Journal:  EMBO J       Date:  2018-07-19       Impact factor: 11.598
