Ousseini Issaka Salia¹,²,³, Diana M Mitchell⁴.
Abstract
BACKGROUND: Unlike mammals, zebrafish have a remarkable capacity to regenerate a variety of tissues, including central nervous system tissue. The function of macrophages in tissue regeneration is of great interest, as macrophages respond to and participate in the cascade of events that follows tissue injury in all vertebrate species examined. Understanding macrophage populations in regenerating tissue (such as in zebrafish) may inform strategies that aim to regenerate tissue in humans. We recently published an RNA-seq experiment that identified genes enriched in microglia/macrophages in regenerating zebrafish retinas. Interestingly, a small number of transcripts differentially expressed by retinal microglia/macrophages during retinal regeneration did not have predicted orthologs in human or mouse. We reasoned that at least some of these genes could be functionally important for tissue regeneration, but most of them have not been studied experimentally and their functions are largely unknown. To reveal their possible functions, we performed a variety of bioinformatic analyses aimed at identifying functional protein domains as well as orthologous relationships to other species.
Keywords: Bioinformatic analysis; Functional predictions; Microglia; RNAseq; Regeneration; Retina; Transcripts; Zebrafish
Year: 2020 PMID: 33287696 PMCID: PMC7720500 DOI: 10.1186/s12864-020-07273-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Transcripts enriched in zebrafish microglia/macrophages during retinal regeneration, without readily predicted human or mouse orthologs
| Gene IDᵃ | Mod Log2FCᵇ | Zebrafish Symbolᶜ | ZFIN ID | Ensembl ID | Chromosome | Gene length | Protein length |
|---|---|---|---|---|---|---|---|
| P1 | 6.03 | si:dkey-181f22.4 | ZDB-GENE-160728-126 | ENSDARG00000105643 | 7 | 9695 bp | 513 aa |
| P2 | 5.17 | si:ch73-112l6.1 | ZDB-GENE-091204-14 | ENSDARG00000093126 | 21 | 17,924 bp | 1025 aa |
| P3 | 2.92 | zgc:174863 | ZDB-GENE-080204-87 | ENSDARG00000099476 | 6 | 7668 bp | 290 aa |
| P4 | 2.14 | si:dkey-56m19.5 | ZDB-GENE-030131-226 | ENSDARG00000068432 | 7 | 4453 bp | 526 aa |
| P5 | 7.91 | si:ch211-105j21.9 | ZDB-GENE-131127-499 | ENSDARG00000097845 | 6 | 2369 bp | 294 aa |
| P6 | 4.47 | si:ch73-248e21.7 | ZDB-GENE-120215-231 | ENSDARG00000096331 | 3 | 3403 bp | 480 aa |
| P7 | 3.56 | si:ch211-191j22.3 | ZDB-GENE-030131-4242 | ENSDARG00000095459 | 21 | 2682 bp | 99 aa |
| P8 | 7.87 | si:ch73-256j6.2† | ZDB-GENE-070705-223† | ENSDARG00000071653 | 22 | 7566 bp | 210 aa |
| P9 | 7.74 | urp1 | ZDB-GENE-100922-138 | ENSDARG00000093493 | 14 | 2696 bp | 154 aa |
| P10 | 5.32 | xcl32a.1 | ZDB-GENE-070912-31 | ENSDARG00000093906 | 2 | 1199 bp | 126 aa |
| P11 | 6.06 | si:ch211-287n14.3 | ZDB-GENE-131120-146 | ENSDARG00000093650 | 18 | 165,070 bp | 1809 aa |
| P12 | 2.03 | pho | ZDB-GENE-030131-5935 | ENSDARG00000035133 | 5 | 16,478 bp | 2798 aa |
ᵃ Gene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatic analysis
ᵇ Mod Log2FC: Moderated log2(fold change), the log-ratio of the transcript's expression values between microglia/macrophages and other retinal cells, corrected for lowly expressed transcripts, as determined in [30]
ᶜ Zebrafish Symbol is the symbol attributed to each gene under the ZFIN Zebrafish Nomenclature Conventions (https://wiki.zfin.org, [53]); Ensembl ID is the identifier attributed by Ensembl (https://www.ensembl.org/, [54]). The prefix "zgc:" indicates that the gene is represented by cDNAs generated by the Zebrafish Gene Collection (ZGC) project (https://wiki.zfin.org). The prefix "si:" stands for Sanger Institute and indicates that this institution identified the gene. aa, amino acids
† Previously reported as "NA" in [30] with the same Ensembl ID; updated here to the current zebrafish symbol and ZFIN ID
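The Mod Log2FC footnote describes the values only as log2 ratios "corrected for lowly expressed transcripts"; the exact moderation used in [30] is not restated here. As a hedged illustration, the sketch below uses a pseudocount, which is one simple way to damp the unstable ratios of low values and is not necessarily the method of [30]. The function names and the pseudocount of 1 are assumptions. It also shows how to read a table value on the linear scale: a Mod Log2FC of 6.03 (P1) corresponds to roughly 65-fold enrichment.

```python
import math


def moderated_log2fc(expr_mg, expr_other, pseudocount=1.0):
    """Log2 ratio of expression in microglia/macrophages vs. other retinal cells.

    Hypothetical moderation: adding a pseudocount to both values keeps the
    ratio finite when one side is zero and shrinks ratios of low values toward 0.
    """
    return math.log2((expr_mg + pseudocount) / (expr_other + pseudocount))


def linear_fold_change(log2fc):
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2fc


# Low expression: the raw ratio 3/0 is undefined, the moderated one is log2(4/1)
print(moderated_log2fc(3, 0))
# High expression: moderation changes log2(300/100) only slightly
print(round(moderated_log2fc(300, 100), 3))
# Reading a value from the table (P1, Mod Log2FC = 6.03) on the linear scale
print(round(linear_fold_change(6.03), 1))
```

The pseudocount matters most where counts are small, which is why moderated values rank low-count transcripts more conservatively than raw ratios would.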
Fig. 1 Overview of Bioinformatic Analysis for Functional Predictions. The diagram summarizes the bioinformatic analyses performed to make functional predictions about the genes of interest based on (a) the predicted amino acid sequence, (b) the predicted protein structure, and (c) genomic comparisons with selected species. The bioinformatic tool used for each type of analysis is indicated. Multiple approaches were used to obtain informative results for each gene of interest and to increase confidence in the overall predictions
Protein domains and gene ontology (GO) terms
| Gene IDᵃ | Protein domains | Biological process | Molecular function |
|---|---|---|---|
| P1 | Protein kinase and CARDᵇ domain | Protein phosphorylation, Regulation of apoptotic process, Oligodendrocyte development | Protein kinase, ATP binding |
| P2 | | | |
| P3 | Immunoglobulin-like | Cell adhesion, Viral entry into host cell | |
| P4 | Ribonuclease E/G | | |
| P5 | MGC-24ᶜ and Mucin15 | | |
| P6 | | | |
| P7 | | | |
| P8 | Immunoglobulin-like | | |
| P9 | Urotensin II | Regulation of blood pressure, Regulation of blood vessel diameter | Hormone |
| P10 | Chemokine interleukin-8-like | Immune response | Chemokine |
| P11 | P-type trefoil, Galactose mutarotase, Glycoside hydrolase | Carbohydrate metabolic process | Hydrolyzing O-glycosyl compounds, Carbohydrate binding, N-6 Adenine-specific DNA methylases |
| P12 | Coiled coil | Neuromast regeneration | |
The protein domains and gene ontology (GO) terms found to be associated with the 12 predicted zebrafish proteins of interest
ᵃ Gene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis
ᵇ CARD: caspase activation and recruitment domain
ᶜ MGC-24: multi-glycosylated core protein 24
Orthologs and their species of origin identified by amino acid sequence similarity using eggNOG
| Gene IDᵃ | Ortholog ID | Function | E-valueᵇ | Species |
|---|---|---|---|---|
| P1 | ENSLACG00000022667 | Protein tyrosine kinase | 1.23e-200 | |
| | MOS | v-mos Moloney murine sarcoma viral oncogene homolog | 1.86e-27 | |
| | BLK | B lymphoid tyrosine kinase | 1.03e-11 | |
| | Mst1r | Macrophage stimulating 1 receptor | 2.07e-7 | |
| | CSF1R | Colony stimulating factor 1 receptor | 5.21e-4 | |
| P2 | JGI99580 | Unknown | 6.68e-259 | |
| P3 | ENSGMOG00000016627 | Unknown | 1.5e-127 | |
| | ENSLACG00000005016 | Immunoglobulin V-set domain | 3.08e-10 | |
| | PDGFRB | Growth factor receptor | 6.45e-7 | |
| | NPHS1 | Nephrosis 1, congenital, Finnish type (nephrin) | 1.97e-5 | |
| | LOC414035 | Lachesin | 9.06e-5 | |
| P4 | BASP1 | Unknown | 1.63e-5 | |
| P5 | ENSXMAG00000002763 | Unknown | 7.04e-17 | |
| | JGI72098 | SH3 | 2.17e-4 | |
| | PTPRA | Protein tyrosine phosphatase, receptor type, A | 8.33e-4 | |
| P6 | ARC2 | CD46 molecule, complement regulatory protein | 8.30e-4 | |
| P7 | ENSXMAG00000014998 | Unknown | 9.61e-44 | |
| P8 | ENSLACG00000014033 | CD84 molecule | 1.05e-112 | |
| | ENSXMAG00000015872 | Lymphocyte antigen 9 | 2.03e-77 | |
| | ENSGALG00000007355 | Immunoglobulin V-set domain | 1.22e-09 | |
| | CEACAM6 | Carcinoembryonic antigen-related cell adhesion molecule | 1.41e-09 | |
| | HMCN1 | Hemicentin | 3.28e-06 | |
| P9 | ENSXMAG00000013611 | Urotensin II | 2.24e-70 | |
| P10 | ENSG00000143185 | Chemokine (C motif) ligand | 3.22e-14 | |
| | ENSXMAG00000019244 | Small cytokines (intercrine/chemokine), interleukin-8 like | 3.86e-6 | |
| P11 | GANAB | Glucosidase, alpha | 1.38e-307 | |
| P12 | No orthologs found | | | |
Orthologs found for the studied genes using the protein sequence similarity approach eggNOG 4.5.1 [55]
ᵃ Gene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis
ᵇ E-value: The Expect value (E-value) is the number of hits with an equivalent or better score that one can "expect" to see by chance when searching a database of a particular size (https://blast.ncbi.nlm.nih.gov). The lower the E-value, or the closer it is to zero, the more "significant" the match is
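The E-value footnote describes the statistic informally. For BLAST-style local alignment it follows the Karlin–Altschul relation E = K·m·n·e^(−λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K are constants of the scoring system. The sketch below is a simplification under stated assumptions: it ignores the effective-length corrections real BLAST applies, and λ = 0.267, K = 0.041 are the values commonly tabulated for BLOSUM62 with gap costs 11/1, used here only for illustration (eggNOG and SmartBLAST run their own search tools and settings). It shows why a lower E-value means a stronger match and why the same score is less significant in a larger database.

```python
import math

# Illustrative Karlin-Altschul constants (commonly tabulated for BLOSUM62,
# gap open 11 / extend 1); other matrices and gap costs have other values.
LAMBDA = 0.267
K = 0.041


def evalue(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits scoring >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits, query_len, db_len):
    """Equivalent form of the same statistic: E = m*n*2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


m, n = 500, 1_000_000_000  # hypothetical query length and database size (residues)
for s in (50, 100, 200):
    print(s, f"{evalue(s, m, n):.2e}", f"{evalue_from_bits(bit_score(s), m, n):.2e}")
```

Both forms give the same number; bit scores are convenient because they no longer depend on λ and K, so only the search-space size m·n is needed to turn them into E-values.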
Best-matched orthologs and their species of origin identified using SmartBLAST protein sequence analysis
| Gene IDᵃ | Accession ID | Orthologs | E-valueᵇ | Query coverᶜ | Identityᵈ | Species |
|---|---|---|---|---|---|---|
| P1 | NP_003812.1 | Receptor-interacting serine/threonine-protein kinase 2 isoform 1 | 2.00e-39 | 94% | 27.54% | |
| | NP_620402.1 | Receptor-interacting serine/threonine-protein kinase 2 isoform 1 | 6.00e-37 | 89% | 28.74% | |
| P2 | XP_005164418.2 | Uncharacterized protein LOC101885950 | 0.00 | 95% | 54.14% | |
| | XP_017210637.2 | Uncharacterized protein LOC108179149 | 2.00e-164 | 79% | 37.53% | |
| | XP_021326567.1 | Uncharacterized protein LOC101885087 | 5.00e-151 | 74% | 37.47% | |
| P3 | XP_005166230.1 | Uncharacterized protein LOC100136852 isoform X2 | 0.00 | 100% | 54% | |
| | XP_016100849.1 | PREDICTED: uncharacterized protein LOC107561032 isoform X3 | 1.00e-113 | 98% | 58.82% | |
| | NP_001076332.2 | Junctional adhesion molecule 3b | 2.00e-02 | 33% | 29.41% | |
| P4 | XP_026123653.1 | Uncharacterized protein LOC113106193 isoform X1 | 4.00e-177 | 100% | 62.04% | |
| | XP_016389660.1 | PREDICTED: cell surface glycoprotein 1-like isoform X4 | 1.00e-173 | 100% | 64.76% | |
| | XP_016333309.1 | PREDICTED: serine-aspartate repeat-containing protein I-like isoform X1 | 2.00e-165 | 100% | 63.72% | |
| | XP_016105136.1 | PREDICTED: calphotin-like | 3.00e-164 | 100% | 62.79% | |
| P5 | ROL44899.1 | Hypothetical protein DPX16_9111 | 6.00e-121 | 100% | 63.40% | |
| | XP_016143106.1 | PREDICTED: uncharacterized protein LOC107596800 | 9.00e-115 | 100% | 63.19% | |
| | XP_016395950.1 | PREDICTED: uncharacterized protein LOC107729778 isoform X2 | 5.00e-113 | 100% | 62.50% | |
| | XP_018973499.1 | PREDICTED: uncharacterized protein LOC109104670 isoform X2 | 3.00e-110 | 100% | 61.69% | |
| P6 | XP_016397186.1 | PREDICTED: cell wall protein RTB1-like | 1.00e-122 | 91% | 54.81% | |
| | XP_016343246.1 | PREDICTED: mucin-5AC-like | 2.00e-122 | 91% | 55.03% | |
| | XP_016091956.1 | PREDICTED: mucin-5AC-like | 3.00e-106 | 91% | 51.01% | |
| | XP_016124548.1 | PREDICTED: cell wall protein DAN4-like | 6.00e-105 | 92% | 52.30% | |
| P7 | RXN26987.1 | Hypothetical protein ROHU_020440 | 9.00e-65 | 100% | 87.88% | |
| | KTG33652.1 | Hypothetical protein cypCar_00001489 | 2.00e-64 | 100% | 87.88% | |
| | XP_026090693.1 | Uncharacterized protein LOC113064245 | 2.00e-63 | 100% | 86.87% | |
| | ROL47558.1 | Hypothetical protein DPX16_13273 | 6.00e-63 | 100% | 86.87% | |
| | KAA0720020.1 | Hypothetical protein E1301 | 5.00e-58 | 100% | 78.43% | |
| P8 | XP_009294219.1 | Uncharacterized protein si:ch211-239m17.1 isoform X4 | 2.00e-141 | 93% | 98.48% | |
| P9 | KTG45257.1 | Hypothetical protein cypCar_00011656 | 7.00e-90 | 95% | 85.03% | |
| | ROL51783.1 | Hypothetical protein DPX16_19302 | 2.00e-88 | 82% | 94.49% | |
| | TRY88805.1 | Hypothetical protein DNTS_015019 | 4.00e-87 | 100% | 77.27% | |
| P10 | NP_001108533.1 | Chemokine (C-X-C motif) ligand 32b, duplicate 1 precursor | 5.00e-10 | 71% | 35.16% | |
| | NP_003166.1 | Cytokine SCM-1 beta precursor | 5.00e-08 | 68% | 27.91% | |
| | NP_032536.1 | Lymphotactin precursor | 1.00e-05 | 75% | 27.27% | |
| | NP_002986.1 | Lymphotactin precursor | 3.00e-07 | 68% | 27.91% | |
| | NP_067418.1 | C-C motif chemokine 8 precursor | 2.00e-05 | 67% | 32.61% | |
| P11 | XP_016428050.1 | Maltase-glucoamylase, intestinal isoform 2 | 0.00 | 98% | 57.17% | |
| | NP_001074606.1 | Sucrase-isomaltase, intestinal | 0.00 | 99% | 55.67% | |
| P12 | AAI28789.1 | Zgc:165381 protein | 0.00 | 26% | 100% | |
ᵃ Gene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis
ᵇ E-value: The Expect value (E-value) is the number of hits one can "expect" to see by chance when searching a database of a particular size (https://blast.ncbi.nlm.nih.gov). The lower the E-value, or the closer it is to zero, the more "significant" the match is
ᶜ Query cover: the percentage of the query sequence (zebrafish protein of interest) that overlaps the subject sequence (returned ortholog)
ᵈ Identity: the percentage of amino acids within the covered part of the query that are identical to the subject
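The query cover and identity footnotes can be made concrete with a minimal sketch that computes both from a single pairwise alignment written as two gapped strings ('-' marks a gap). It assumes the BLAST convention of dividing identical positions by the alignment length; with several aligned segments (HSPs), query cover would instead be computed over the union of covered query positions, which this single-alignment sketch does not handle. The function name and the toy sequences are hypothetical and not taken from the study.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Return (query cover %, identity %) for one gapped pairwise alignment."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    # Identical columns: same residue in both rows, gap columns excluded
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    # Identity over all alignment columns, as BLAST reports it
    identity_pct = 100.0 * identical / len(aligned_query)
    # Query residues that take part in the alignment (gaps in the query don't count)
    covered = sum(1 for q in aligned_query if q != "-")
    cover_pct = 100.0 * covered / query_length
    return cover_pct, identity_pct


# Toy example: a 10-residue query of which 6 residues align to the subject
cover, identity = alignment_stats("MKT-LLV", "MKTALIV", query_length=10)
print(f"query cover {cover:.0f}%, identity {identity:.2f}%")  # query cover 60%, identity 71.43%
```

Because the two measures are independent, a hit such as P12's AAI28789.1 can show 100% identity with only 26% query cover: the match is exact but spans only part of the query.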
Protein structure analysis
| Gene IDᵃ | Template IDᵇ | Function | GMQEᶜ | Coverageᵈ | Identityᵉ |
|---|---|---|---|---|---|
| P1 | 6fu5.1.B | Receptor-interacting serine/threonine-protein kinase 2 | 0.34 | 55% | 30.50% |
| | 3sd0.1.A | Glycogen synthase kinase-3 beta | 0.35 | 58% | 19.26% |
| | 4xlv.1.A | Insulin receptor | 0.32 | 51% | 23.19% |
| P2 | No templates were found matching the sequence | | | | |
| P3 | 3of6.1.A | T cell receptor beta chain | 0.38 | 70% | 19.31% |
| | 5fhx.1.C | Antibody fragment light chain | 0.38 | 72% | 14.35% |
| | 6bpc.1.B | Monoclonal antibody 4F7 Fab heavy chain | 0.34 | 69% | 15.50% |
| P4 | No templates were found matching the sequence | | | | |
| P5 | No templates were found matching the sequence | | | | |
| P6 | No templates were found matching the sequence | | | | |
| P7 | No templates were found matching the sequence | | | | |
| P8 | 6e56.1.B | Antibody pn132p2C05 | 0.49 | 90% | 21.93% |
| | 5n4g.1.A | Heavy chain | 0.49 | 93% | 23.08% |
| P9 | No templates were found matching the sequence | | | | |
| P10 | 1j8i.1.A | Lymphotactin | 0.42 | 60% | 30.26% |
| | 1ncv.1.B | Monocyte chemoattractant protein 3 | 0.41 | 59% | 32.43% |
| | 5eki.5.A | C-C motif chemokine 21 | 0.40 | 55% | 27.54% |
| P11 | 3top.1.A | Maltase-glucoamylase, intestinal | 0.45 | 49% | 59.66% |
| | 3lpo.1.A | Sucrase-isomaltase | 0.44 | 48% | 57.04% |
| | 5nn3.1.A | Lysosomal alpha-glucosidase | 0.38 | 46% | 41.65% |
| P12 | No templates were found matching the sequence | | | | |
Protein structure analysis using SWISS-MODEL [57]
ᵃ Gene ID: P1 to P12 correspond to the symbol used for each predicted protein subjected to bioinformatics analysis
ᵇ Template ID: identifier of the experimentally determined 3D structure used as a template to model the zebrafish protein of interest. Template IDs correspond to the names of the templates in the Protein Data Bank (https://www.rcsb.org/, [59])
ᶜ GMQE: Global Model Quality Estimation [58], an estimate of model quality that combines properties of the target–template alignment and the template search method. GMQE ranges from 0 to 1, and higher values indicate higher reliability. A cut-off of GMQE > 0.3 was applied
ᵈ Coverage: the percentage of the query sequence (P1 to P12) that overlaps the template sequence
ᵉ Identity: the percentage of amino acids within the covered part of the query that are identical to the template
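The GMQE footnote states that a cut-off of GMQE > 0.3 was applied. The sketch below applies that rule to the (query, template, GMQE) values listed in the SWISS-MODEL table and groups the surviving templates by query. The data structure and function are hypothetical helpers; SWISS-MODEL itself performs the template search, alignment, and model building, none of which this reproduces.

```python
from collections import defaultdict

GMQE_CUTOFF = 0.3

# (query, template ID, GMQE) rows from the SWISS-MODEL table
MODELS = [
    ("P1", "6fu5.1.B", 0.34), ("P1", "3sd0.1.A", 0.35), ("P1", "4xlv.1.A", 0.32),
    ("P3", "3of6.1.A", 0.38), ("P3", "5fhx.1.C", 0.38), ("P3", "6bpc.1.B", 0.34),
    ("P8", "6e56.1.B", 0.49), ("P8", "5n4g.1.A", 0.49),
    ("P10", "1j8i.1.A", 0.42), ("P10", "1ncv.1.B", 0.41), ("P10", "5eki.5.A", 0.40),
    ("P11", "3top.1.A", 0.45), ("P11", "3lpo.1.A", 0.44), ("P11", "5nn3.1.A", 0.38),
]


def passing_templates(models, cutoff=GMQE_CUTOFF):
    """Keep templates whose GMQE is strictly above the cut-off, grouped by query."""
    kept = defaultdict(list)
    for query, template, gmqe in models:
        if gmqe > cutoff:
            kept[query].append(template)
    return dict(kept)


for query, templates in passing_templates(MODELS).items():
    print(query, templates)
```

Every template listed passes, since the lowest GMQE in the table is 0.32; the cut-off only determines which templates were reported at all.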
Fig. 2 Homology model of the P1 putative kinase domain. The kinase domain of receptor-interacting serine/threonine-protein kinase 2 (RIPK2; 6fu5.1.B in the RCSB Protein Data Bank) is the template used for homology modeling of P1. The experimental structure of 6fu5.1 was determined by X-ray diffraction at 3.26 Å resolution [60]. Blue indicates regions where P1 was well modeled, and orange indicates regions where P1 was poorly modeled. In the well-modeled (blue) regions, P1 is likely to resemble the experimental 3D structure of the template. The homology model covers the putative kinase domain of P1, from residue 3 (Gln, glutamine) to residue 284 (Lys, lysine)
Fig. 3 Homology model of P3. The T cell receptor beta chain (3of6.1.A in the RCSB Protein Data Bank) is the template used for homology modeling of P3. The homology model spans P3 residues 32 (Thr, threonine) to 245 (Thr, threonine). The experimental structure of 3of6.1.A was determined by X-ray diffraction at 2.80 Å resolution [61]. Blue indicates regions of the model in which P3 was well modeled by the template, and orange indicates regions where P3 was poorly modeled. The blue regions correspond to the immunoglobulin domains of the T cell receptor beta chain
Fig. 4 Homology model of the P10 chemokine interleukin-8-like domain. Lymphotactin (1j8i.1.A in the RCSB Protein Data Bank) is the template used for homology modeling of P10. The homology model spans P10 residues 24 (Glu, glutamic acid) to 102 (Ser, serine). The experimental structure of 1j8i.1.A was determined by NMR spectroscopy [62]. Blue indicates regions where P10 was well modeled, and orange indicates regions where P10 was poorly modeled. The chemokine interleukin-8-like domain in the model spans P10 residues 27 (His, histidine) to 86 (Leu, leucine) and includes both well-modeled (blue) and poorly modeled (orange) sections
Fig. 5 Homology model of P11. Intestinal maltase-glucoamylase (3top.1.A in the RCSB Protein Data Bank) is the template used for homology modeling of P11. The experimental structure of 3top.1.A was determined by X-ray diffraction at 2.9 Å resolution [63]. The homology model spans P11 residues 922 (Lys, lysine) to 1804 (Phe, phenylalanine). The P-type trefoil domain (amino acids 51–962), galactose mutarotase domain (amino acids 114–1085), and glycoside hydrolase domain (amino acids 225–1152) are not covered in the homology model. Blue indicates regions where P11 was well modeled, and orange indicates regions where P11 was poorly modeled
Fig. 6 Expression level of selected zebrafish genes in other published studies. Expression levels of selected zebrafish genes (P1, P9, and P12) in other published RNA-seq datasets of (a) zebrafish heart regeneration [64] and (b) zebrafish brain microglia [52], obtained from the Zf Regeneration Database (www.zfregeneration.org, [65]). The y-axis shows the normalized transcript level expressed as FPKM (fragments per kilobase of exon per million mapped reads). The x-axis shows the different experimental conditions. (a) dpa = days post injury. (b) Active microglia are microglia responding to acute damage; h = hours after acute damage
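The y-axis of Fig. 6 is in FPKM. As a reminder of how the unit is built, here is a minimal sketch: a gene's fragment count is divided by its exon length in kilobases and by the total number of mapped fragments in millions, so FPKM = fragments × 10⁹ / (exon length in bp × total mapped fragments). The counts in the example are hypothetical and are not taken from the datasets shown in Fig. 6.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("lengths and library size must be positive")
    per_kb = fragments / (exon_length_bp / 1_000)
    return per_kb / (total_mapped_fragments / 1_000_000)


# Hypothetical gene: 1,000 fragments, 2 kb of exon, 20 million mapped fragments
print(fpkm(1_000, 2_000, 20_000_000))  # 25.0
```

Dividing by exon length corrects for longer transcripts yielding more fragments at equal expression, and dividing by library size corrects for differences in sequencing depth between samples.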
Orthologs found in the species Ambystoma mexicanum, Xenopus laevis, Xenopus tropicalis, and Cynops pyrrhogaster
| Gene IDᵃ | Accession ID | Functionᵇ | E-valueᶜ | Query coverᵈ | Identityᵉ | Species |
|---|---|---|---|---|---|---|
| P1 | AIW46262.1 | Receptor tyrosine kinase-like orphan receptor 2 | 1.00e-09 | 40% | 22.90% | |
| P1 | XP_018112660.1 | Threonine-protein kinase 2-like isoform X1 | 3.00e-32 | 39% | 32.24% | |
| P3 | XP_004916146.1 | Cell adhesion molecule 1 isoform X2 | 3.00e-02 | 56% | 23.78% | |
| P5 | XP_018101840.1 | Uncharacterized protein | 2.00e-32 | 58% | 40.11% | |
| P8 | XP_004919377.2 | CD48 antigen | 7.00e-08 | 99% | 28.97% | |
| P9 | KAE8621564.1 | Hypothetical protein XENTR_v10004882 | 2.00e-06 | 23% | 45.95% | |
| P10 | XP_018120302.1 | Cytokine SCM-1 beta-like | 1.00e-07 | 64% | 36.59% | |
| P11 | XP_012818887.1 | Sucrase-isomaltase, intestinal | 0.00 | 99% | 58.89% | |
| P1 | BAB44154.1 | Insulin-like growth factor I receptor | 2.00e-10 | 44% | 23.17% | |
BLASTP with the BLOSUM45 scoring matrix was used to find distantly related proteins in the species shown
ᵃ Gene ID: the symbol used for each predicted zebrafish protein subjected to bioinformatics analysis (the query). Only queries with hits are shown
ᵇ Function: the function associated with the ortholog found for each gene
ᶜ E-value: The Expect value (E-value) is the number of hits one can "expect" to see by chance when searching a database of a particular size (https://blast.ncbi.nlm.nih.gov). The lower the E-value, or the closer it is to zero, the more "significant" the match is
ᵈ Query cover: the percentage of the query sequence (zebrafish gene) that overlaps the subject sequence (returned ortholog)
ᵉ Identity: the percentage of amino acids within the covered part of the query that are identical between the query and the returned ortholog
Expression of zebrafish genes pertaining to P1-P12 in macrophages responding to microbial infection
| Gene IDᵃ | Zebrafish Symbolᵇ | Ensembl ID | DEᶜ in macrophages responding to M. marinum infection |
|---|---|---|---|
| P1 | si:dkey-181f22.4 | ENSDARG00000105643 | |
| P2 | si:ch73-112l6.1 | ENSDARG00000093126 | No |
| P3 | zgc:174863 | ENSDARG00000099476 | |
| P4 | si:dkey-56m19.5 | ENSDARG00000068432 | No |
| P5 | si:ch211-105j21.9 | ENSDARG00000097845 | No |
| P6 | si:ch73-248e21.7 | ENSDARG00000096331 | Yes |
| P7 | si:ch211-191j22.3 | ENSDARG00000095459 | No |
| P8 | LOC100535303 | ENSDARG00000071653 | No |
| P9 | urp1 | ENSDARG00000093493 | No |
| P10 | xcl32a.1 | ENSDARG00000093906 | No |
| P11 | si:ch211-287n14.3 | ENSDARG00000093650 | No |
| P12 | pho | ENSDARG00000035133 | No |
The twelve genes of interest were examined in the RNA-seq datasets from Rouget et al., 2019 (GSE78954 and GSE68920), which profiled the transcriptome of zebrafish macrophages responding to Mycobacterium marinum infection
ᵃ Gene ID: the zebrafish gene of interest in this study
ᵇ Zebrafish Symbol: the symbol attributed to each gene by the ZFIN Zebrafish Nomenclature Conventions
ᶜ DE: differential expression in zebrafish macrophages responding to infection compared with uninfected controls. Using the RNA-seq datasets from Rouget et al., 2019, DE was called with the authors' original criteria of logFC ≥ 1 and p-adj < 0.05. "Yes" or "No" indicates that the gene was or was not differentially expressed, respectively. ND indicates that the transcript was not detected in the dataset
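The DE footnote gives the calling rule: logFC ≥ 1 and p-adj < 0.05, with ND for undetected transcripts. The sketch below applies that rule as written. Because p-adj is an adjusted p-value, it also includes a Benjamini–Hochberg adjustment as one common way to obtain it; whether Rouget et al. used exactly this adjustment is an assumption here. The gene names and values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone and <= 1
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def de_call(log_fc, p_adj, min_log_fc=1.0, alpha=0.05):
    """'Yes'/'No' per the stated criteria; 'ND' when the transcript was not detected."""
    if log_fc is None or p_adj is None:
        return "ND"
    # Criterion applied literally as stated (logFC >= 1); a two-sided rule would use abs(log_fc)
    return "Yes" if log_fc >= min_log_fc and p_adj < alpha else "No"


# Hypothetical genes: (logFC, raw p-value); None marks an undetected transcript
genes = {"gene_a": (2.3, 0.001), "gene_b": (0.4, 0.002), "gene_c": (1.5, 0.2), "gene_d": None}
detected = {g: v for g, v in genes.items() if v is not None}
p_adj = dict(zip(detected, benjamini_hochberg([p for _, p in detected.values()])))
for gene, values in genes.items():
    log_fc = values[0] if values else None
    print(gene, de_call(log_fc, p_adj.get(gene)))
```

Both thresholds must hold: gene_c has a large enough fold change but fails the adjusted p-value, while gene_b is significant but changes too little.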