
Novel eye genes systematically discovered through an integrated analysis of mouse transcriptomes and phenome.

Chia-Yin Chiang1, Yung-Hao Ching2, Ting-Yan Chang1, Liang-Shuan Hu1, Yee Siang Yong2, Pei Ying Keak2, Ivana Mustika2, Ming-Der Lin2, Ben-Yang Liao1.   

Abstract

In the last few decades, reverse genetic and high throughput approaches have been frequently applied to the mouse (Mus musculus) to understand how genes function in tissues/organs and during development in a mammalian system. Despite these efforts, the associated phenotypes for the majority of mouse genes remain to be fully characterized. Here, we performed an integrated transcriptome-phenome analysis by identifying coexpressed gene modules based on tissue transcriptomes profiled with each of various platforms and functionally interpreting these modules using the mouse phenotypic data. Consequently, >15,000 mouse genes were linked with at least one of the 47 tissue functions that were examined. Specifically, our approach predicted >50 genes previously unknown to be involved in mouse visual functions. Fifteen genes were selected for further analysis based on their potential biomedical relevance and compatibility with further experimental validation. Gene-specific morpholinos were introduced into zebrafish (Danio rerio) to target their corresponding orthologs. Quantitative assessments of phenotypes of developing eyes confirmed predicted eye-related functions of 13 out of the 15 genes examined. These novel eye genes include: Adal, Ankrd33, Car14, Ccdc126, Dhx32, Dkk3, Fam169a, Grifin, Kcnj14, Lrit2, Ppef2, Ppm1n, and Wdr17. The results highlighted the potential for this phenome-based approach to assist the experimental design of mutating and phenotyping mouse genes that aims to fully reveal the functional landscape of mammalian genomes.
© 2019 The Authors.

Keywords:  Expression profile; Functional genomics; Mammals; Modularity; Retina; Systems biology

Year:  2019        PMID: 31934309      PMCID: PMC6951830          DOI: 10.1016/j.csbj.2019.12.009

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   7.271


Introduction

Mus musculus is one of the most important mammalian organisms used for biomedical research since it exhibits extensive genetic, physiological, and behavioral similarities with humans. Consequently, over the last two decades, high throughput technologies have been applied to this animal model as part of ongoing efforts to understand how genes function in tissues/organs and during development [1], [2], [3], [4], [5]. Reverse genetics has provided straightforward phenotypic descriptions of gene functions at the organismal level [4], [6]. However, >50% of genes encoded in the mouse genome still lack such data [7]. Furthermore, among the mouse genes which have been phenotyped for various alleles (i.e., spontaneous, targeted, N-ethyl-N-nitrosourea-induced, endonuclease-mediated, etc.), the majority of these mutant alleles have only been phenotyped in a limited subset of tissues/organs without a standardized and comprehensive examination conducted across developmental stages and organs. To address this issue, the International Mouse Phenotyping Consortium (IMPC) has been working to systematically catalog gene functions for mice generated by the International Knockout Mouse Consortium (IKMC) by performing comprehensive phenotyping with standardized pipelines that measure various phenotypes [8]. However, despite these efforts, prioritization and identification of functions of mammalian genes remain a challenge. Modularity is an essential property in biology because it allows a living system to function [9] and evolve [10] efficiently. Modular structures are observed in various gene networks, including those inferred from coregulation relationships [11], [12], [13]. In our previous study, we provided genomic evidence that a mutated mouse gene tends to present abnormal phenotype(s) in tissues where it is expressed [14]. 
Thus, we postulated that gene modules in a coexpression network are enriched with genes which co-function within the same anatomical structure (i.e., tissue/organ), and that this anticipated tendency, if it exists, may facilitate the prediction of gene function. Therefore, we compiled transcriptome datasets of mouse tissues profiled with various platforms to identify gene modules in coexpression networks inferred from each dataset. Statistically enriched tissue functions for each module were annotated based on mouse phenome data. By using this approach, we linked tissue functions to >15,000 mouse genes. Specifically, we predicted 50 mouse genes with potential roles in the development and/or physiological functions of the eye, which is one of the most conserved organs during vertebrate evolution [15]. Among the 50 genes identified, 35 have not had their visual functions elucidated in any animal model. The results of the subsequent experiments based on a subset of candidate eye genes demonstrated that predicted functions of >85% of the candidate eye genes can be validated with functional analyses performed in zebrafish (Danio rerio) embryos. These results indicated the potential for this phenome-based approach to guide the experimental design of mutating and phenotyping mouse genes. We discuss the relevance of the novel eye genes discovered to the genetics of several congenital eye diseases in humans.

Materials and methods

Mouse gene modules in coexpression networks

Expression signals of mouse genes in 78 and 96 tissues/cell types were obtained from the bioinformatic portal BioGPS (http://biogps.org/; accessed on September 23, 2015) [16] (see Tables S1–S2). These expression signals were derived from mRNA hybridization data collected with Affymetrix microarray chips, GNF1M and MOE430 2.0, respectively. The downloaded microarray-based expression signals were previously processed/normalized with the GCRMA algorithm [17]. In addition, RNA-seq data for 53 mouse tissues were obtained from data sources listed in Table S3 to provide a third transcriptome dataset for this study. For each set of RNA-seq data, raw reads were mapped to the reference mouse genome, NCBIM37 (Ensembl release 67), with TopHat (v2.0.12) [18]. The read count for each gene was then calculated with HTSeq [19]. Both procedures were performed with default parameters. The expression signals were represented as raw counts and then were transformed by using a variance-stabilizing transformation (VST) procedure available in the DESeq2 R software package [20]. The expression signals were measured multiple times in tissues/cells from biological replicates and these values were averaged after normalization. The strengths of coexpression (CoExp) between all possible gene pairs were estimated based on the absolute value of Pearson’s correlation coefficient of expression signals of compared genes across all of the tissues/cells examined. As a result, a weighted gene coexpression network was generated for each of the three transcriptome datasets. The R software package WGCNA and Python program PGCNA were subsequently used to construct a coexpression network and to identify the modular structures within it. WGCNA analyses used the following settings: networkType = “unsigned”; TOMType = “unsigned”; minModuleSize = 30; maxBlockSize = 1000; reassignThreshold = 0. 
The soft-thresholding power values (and the corresponding R² of the scale-free topology model fit) were determined by the “pickSoftThreshold” function of WGCNA, and these values for the GNF1M, MOE430 and RNA-seq datasets were 3 (0.82), 3 (0.65), and 9 (0.71), respectively. As for PGCNA analyses, default settings were used except for using “-f 1” to preserve all the nodes and using “–usePearson” to define CoExp as described above. Consequently, WGCNA (or PGCNA) identified 54 (or 45), 64 (or 43), and 22 (or 28) modules of coexpressed genes from the GNF1M, MOE430, and RNA-seq datasets, respectively (see below). Zsummary module preservation statistics were obtained with WGCNA. These statistics included four statistics related to density and three statistics related to connectivity. They allow a quantitative assessment of whether the density and connectivity patterns of modules defined in a reference dataset are preserved in a test dataset. A Zsummary value between 2 and 10 indicates moderate module preservation, whereas a Zsummary value >10 indicates strong module preservation. The score of the overlap between two modules from different networks was calculated by the function “overlapTable” with default settings.
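As a rough illustration of the coexpression strength defined above, the following sketch computes CoExp as the absolute Pearson correlation between two genes across tissues and raises it to a soft-thresholding power, which is how an unsigned weighted adjacency is usually formed. It is a minimal, dependency-free sketch and not the WGCNA or PGCNA implementation: the function names, the toy expression values, and the gene labels are all hypothetical, and the later steps (topological overlap, clustering into modules) are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def coexpression_adjacency(expr, power):
    """Return {(gene_i, gene_j): |r| ** power} for all unordered gene pairs."""
    genes = sorted(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            coexp = abs(pearson(expr[g], expr[h]))  # CoExp as defined in the text
            adj[(g, h)] = coexp ** power            # unsigned soft thresholding
    return adj

# Toy expression signals across five tissues (hypothetical values).
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with geneA
    "geneD": [1.0, 3.0, 2.0, 5.0, 4.0],
}
adjacency = coexpression_adjacency(toy_expr, power=3)
```

Because the absolute value is taken, the anti-correlated pair (geneA, geneC) receives the same weight as the positively correlated pair (geneA, geneB), which matches the “unsigned” network type listed in the settings.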

Associating coexpressed gene modules with tissue functions based on phenotypic data

Annotations of mouse genes and their associated phenotypes derived from mutagenesis approaches were obtained from MGI (http://www.informatics.jax.org/, version 5.2; accessed on October 8, 2015). Ensembl IDs of the phenotyped mouse protein-coding genes were found in MRK_ENSEMBL.rpt, while information regarding genotype-phenotype associations (presented as Mammalian Phenotype IDs, MP IDs) was found in MGI_GenePhenoMP.rpt. Phenotypes caused by mutations in multiple genes were discarded. Consequently, there were 7791 “phenotyped” mouse protein-coding genes with one or more MP IDs. These data were derived from gene knock out, gene knock down, transgenic insertion, and/or point mutation experiments conducted in mouse models. MP IDs are hierarchically structured, and a parent MP ID represents a phenotype lineage that may include several child MP IDs describing more detailed abnormal phenotypes. Genes with a child MP ID were also assigned to the parent MP IDs. As described in a previous study [14], we transformed the MGI annotated MP IDs into records of phenotypic defects at the tissue level (Table S4). MP ID terms used to define abnormal phenotypes in the 47 tissues examined are listed in Dataset S1. There were 4363 genes associated with at least one of the 47 tissue functions in our dataset. For the remaining 3428 genes, although mutant strains have been generated (and phenotyped), no phenotypic defect has been reported in any of the 47 tissues of interest. In these cases, the absence of a phenotype profile may partially be explained by incomplete phenotyping. To functionally annotate the gene sets represented as modules in our coexpression network, we performed an enrichment analysis on the mutant phenotypes available for the 47 tissues listed in Table S4. 
For each module and each tissue to which a gene’s function could be associated, the number of phenotyped genes found to be associated with (and not associated with) the focal tissue function in the module of interest, as well as the number of genes in other non-focal modules, were counted. Tissues with entries overrepresented with P < 0.001 by Fisher’s exact test were assigned to the coexpressed gene module.
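The enrichment step described above can be sketched as a 2 × 2 contingency test per module and tissue. The snippet below is an illustrative, dependency-free sketch rather than the authors' code (the Statistics section states that the R function fisher.test was used): the two-sided P-value sums the hypergeometric probabilities of all tables no more likely than the observed one, the check that the odds ratio exceeds 1 is an added assumption to keep only over-representation, and all gene and tissue names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

def enriched_tissues(module_genes, all_phenotyped, tissue_to_genes, alpha=0.001):
    """List tissues over-represented among the phenotyped genes of one module."""
    in_module = module_genes & all_phenotyped
    outside = all_phenotyped - in_module
    hits = []
    for tissue, genes in tissue_to_genes.items():
        a = len(in_module & genes)   # in module, phenotype in this tissue
        b = len(in_module) - a       # in module, no phenotype in this tissue
        c = len(outside & genes)     # other modules, phenotype in this tissue
        d = len(outside) - c         # other modules, no phenotype in this tissue
        over_represented = a * d > b * c  # assumption: require odds ratio > 1
        if over_represented and fisher_exact_two_sided(a, b, c, d) < alpha:
            hits.append(tissue)
    return hits

# Toy data: 40 phenotyped genes, a 10-gene module, two tissues (all hypothetical).
phenotyped = {f"g{i}" for i in range(40)}
module = {f"g{i}" for i in range(10)}
tissues = {
    "eye": {f"g{i}" for i in range(9)} | {"g10"},
    "spleen": {f"g{i}" for i in range(20, 25)},
}
hits = enriched_tissues(module, phenotyped, tissues)
```

In this toy case nine of the ten module genes carry an eye phenotype, so only the eye is assigned to the module at the P < 0.001 cutoff used in the text.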

Network properties of module members

To estimate the centrality of genes in a module, only interactions between genes of the same module were considered when approximating the centrality of the focal gene. Connectivity (K) was defined as the averaged connection strength between the focal gene and the rest of the genes in the same module. Betweenness centrality (the number of shortest paths between all pairs of genes that pass through the gene) and closeness centrality (the reciprocal of the average length of the shortest paths between the gene and all other genes) were calculated with the ‘networkx’ Python package.
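To make the intra-modular measures concrete, the sketch below computes connectivity K as the mean connection strength to the other module members, and closeness as the reciprocal of the mean shortest-path length, which is the convention followed by networkx. It deliberately avoids networkx to stay self-contained, so it is not the code used in the study: the unweighted breadth-first search, the 0.5 edge cutoff, and all names and weights are assumptions made for illustration. Betweenness is omitted for brevity.

```python
from collections import deque

def connectivity(weights, gene):
    """Mean connection strength between `gene` and every other module member."""
    strengths = [w for (g, h), w in weights.items() if gene in (g, h)]
    return sum(strengths) / len(strengths)

def closeness(edges, gene):
    """Reciprocal of the mean shortest-path length from `gene` (unweighted BFS)."""
    neighbours = {}
    for g, h in edges:
        neighbours.setdefault(g, set()).add(h)
        neighbours.setdefault(h, set()).add(g)
    dist = {gene: 0}
    queue = deque([gene])
    while queue:
        cur = queue.popleft()
        for nxt in neighbours.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    reached = len(dist) - 1
    return reached / sum(dist.values()) if reached else 0.0

# Toy module of four genes with pairwise connection strengths (hypothetical).
toy_weights = {
    ("hub", "a"): 0.9, ("hub", "b"): 0.8, ("hub", "c"): 0.7,
    ("a", "b"): 0.3, ("a", "c"): 0.2, ("b", "c"): 0.1,
}
# Assumption for illustration only: keep edges with strength >= 0.5.
toy_edges = [pair for pair, w in toy_weights.items() if w >= 0.5]

k_hub = connectivity(toy_weights, "hub")
k_c = connectivity(toy_weights, "c")
closeness_hub = closeness(toy_edges, "hub")
closeness_a = closeness(toy_edges, "a")
```

In the toy module the gene labelled "hub" has both the largest K and the largest closeness, which is the pattern expected of a central node.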

Known and predicted human orthologs associated with eye diseases

To understand if human orthologs of mouse eye candidate genes were enriched with known human genes associated with eye diseases, we obtained the lists of human genes associated with retinal disorders or eye diseases from RetNet (http://www.sph.uth.tmc.edu/RetNet/, accessed on May 22, 2017) or HPO (https://hpo.jax.org/, accessed on March 28, 2019), respectively. The files “hp.obo” and “ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt” were downloaded as the phenotype ontology data and the gene-phenotype association data, respectively. Genes annotated with the HP term “Abnormality of the eye” (HP:0000478) and/or any of its descendant terms were defined as known human genes associated with congenital eye diseases.
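Selecting genes annotated with a term or any term below it in the ontology amounts to a graph traversal followed by a set intersection. The sketch below shows one way this could be done; it does not parse “hp.obo” or the gene-to-phenotype file. HP:0000478 is the root term named in the text, while every other term ID and every gene symbol is a made-up placeholder.

```python
def descendants(children, root):
    """Return `root` plus every term reachable from it via child links."""
    seen, stack = {root}, [root]
    while stack:
        term = stack.pop()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

# HP:0000478 ("Abnormality of the eye") comes from the text; the remaining
# term IDs and all gene symbols are hypothetical placeholders.
toy_children = {
    "HP:0000478": ["HP:TOY_A", "HP:TOY_B"],
    "HP:TOY_A": ["HP:TOY_C"],
    "HP:TOY_X": ["HP:TOY_Y"],  # unrelated branch of the ontology
}
toy_gene_to_terms = {
    "GENE1": {"HP:TOY_C"},                # annotated only with a descendant term
    "GENE2": {"HP:TOY_Y"},                # annotated only outside the eye branch
    "GENE3": {"HP:0000478", "HP:TOY_Y"},  # annotated with the root term itself
}

eye_terms = descendants(toy_children, "HP:0000478")
eye_genes = sorted(g for g, terms in toy_gene_to_terms.items() if terms & eye_terms)
```

The same traversal, run in the opposite direction, would propagate a gene's child MP ID annotations up to the parent MP IDs as described for the mouse phenotype data.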

Zebrafish maintenance and injection of morpholinos

Zebrafish Danio rerio AB strains were maintained at 28 °C with a 14 h/10 h dark/light cycle. Two males and four females were set up for breeding. Embryo collection and microinjections were performed according to a standard protocol [21]. Orthology and sequence information for zebrafish/mouse/human genes were obtained from the portal BioMart of the Ensembl database (v87). Morpholinos (MO) were designed and synthesized by GeneTools, LLC. To minimize possible off-target effects, a BLAST search was performed against the zebrafish genome (GRCz10). MO which were found to target translational start sites or splicing junctions of a secondary gene were redesigned. The criteria for judging off-target binding of MO were as follows: 1) MO needed to be at least 14 bases in length to target regions between the 5′ cap and within 25 bases of the 3′ translational start site of a mRNA; and 2) MO needed to target splice junctions within 10 bases of an exon and 25 bases of an intron. Specific amounts of MO were injected into embryos with a Nanoject II instrument (Drummond Scientific) (see below). The standard control MO (CCTCTTACCTCAGTTACAATTTATA), which targets a human beta-globin intron mutation, was injected into embryos to serve as the negative control group for each experiment. As suggested by GeneTools, the standard control MO is a widely used and frequently reported negative control sequence [22]. Injected embryos were maintained and raised under the same conditions as those for the adult zebrafish. Observations regarding phenotype were made at 72 hpf, and images were captured and measured with a stereomicroscope (Nikon SMZ645) equipped with a CCD camera. The fixation and embedding of zebrafish embryos for paraffin sectioning and H&E staining followed the standard protocol [23]. The Institutional Animal Care and Use Committee of Tzu Chi University approved all of the animal experiments conducted in this study (IACUC approval number: 106050).

Statistics

Fisher’s exact test was used to examine if the proportions of samples with a specified characteristic were the same between two compared groups. The Mann-Whitney U test was used to examine if two compared samples were equivalent in their network properties or in regard to parameters used to measure eye phenotypes. Fisher’s exact test (two-tailed) or the Mann-Whitney U test (two-tailed) were applied by using the R function fisher.test or wilcox.test, respectively, to determine P-values.

Results and discussion

Coexpressed mouse gene modules and the associated tissue functions

The procedures of our analysis are summarized as Fig. S1 (see below for details). To identify modular structures in coexpression gene networks generated from transcription profiles of mouse tissues/cells, we compiled three platform-specific datasets from transcriptomes obtained from three sources: 1) 78 tissues/cells of mouse embryos, 8- to 10-week-old mice, or 10- to 12-week-old mice (Table S1) profiled by the GNF1M microarray platform [1], 2) 96 tissues/cells of mouse embryos or 8- to 10-week-old mice (Table S2) profiled by the MOE430 2.0 microarray platform (referred to as MOE430 hereafter) [24], and 3) 53 embryonic and adult mouse tissues profiled by Illumina RNA-seq in the ENCODE project (30 tissues) [25] or other individual studies (23 tissues) (Table S3), respectively. Both the GNF1M and MOE430 datasets were generated and processed from a single study. To integrate the RNA-seq-based transcriptome data produced from various studies, the raw reads of each transcriptome set of data for 53 tissues were remapped, reprocessed, and normalized (see Section 2). In the GNF1M, MOE430, and RNA-seq datasets, mRNA expression levels of 17,384, 16,688, and 20,968 protein-coding genes were estimated, respectively. As a result, a total of 15,521 common genes whose mRNA levels were estimated across the platforms were identified. To construct coexpression gene networks and identify gene modules within each network, we adopted two methods, including Weighted correlation network analysis (WGCNA) [12], [26] and Parsimonious Gene Correlation Network Analysis (PGCNA) [27]. Although WGCNA and PGCNA employ very different algorithms to define modules of coexpressed genes [WGCNA: hierarchical clustering [28]; PGCNA: fast unfolding [29]], the results generated from the two methods are consistent with each other (see below, and Section 2). Because WGCNA is more widely used, we present the data based on WGCNA as the main results as follows. 
There were 54, 64, and 22 modules of coexpressed gene clusters identified by WGCNA among the GNF1M, MOE430, and RNA-seq datasets, respectively (see Section 2; Tables S5–S7). A Zsummary score, which provides evidence that a module is preserved in another independent network, was calculated for each of the abovementioned modules (see Section 2). Although the GNF1M, MOE430, and RNA-seq datasets were generated by different platforms/approaches and include expression profiles of different sets of tissues, a proportion of the modules analyzed [GNF1M: 31/54 (57.4%), MOE430: 33/64 (51.5%), and RNA-seq: 7/22 (31.8%)] were found to be associated with highly significant preservation scores (Zsummary > 10) in at least one of the two other datasets (Fig. S2). The analysis of overlapping modules between networks indicated that relatively few GNF1M- or MOE430-based modules had a high Zsummary score in the RNA-seq-based network because multiple GNF1M- or MOE430-based modules often corresponded to the same module in the RNA-seq-based network (Fig. S2). In fact, only a small number of modules in the microarray-data-based networks did not overlap with any module in the RNA-seq-based network (Fig. S2). This result suggested the reproducibility of modules detected across the datasets analyzed. To functionally annotate coexpressed gene modules based on the mouse phenome, we focused on genes that had mutagenesis-derived phenotypes manifested in the 47 focal mouse tissues defined according to annotations of Mouse Genome Informatics (MGI) (see Table S4 and Dataset S1, and Section 2), and looked for enriched tissue-level phenotypes for each module according to Fisher’s exact tests (see Section 2) [14]. Because 47 tissues were tested, a P-value < 10⁻³ (0.05/47 = 1.06 × 10⁻³, Bonferroni correction) was considered to be statistically significant to correct for multiple testing. 
The tissue(s) associated with each of the modules in the coexpression network examined derived from the GNF1M, MOE430, and RNA-seq datasets are shown in Figs. S3–S5, respectively (also see Tables S5–S8).

Prediction of mouse genes with eye-related functions

Through the abovementioned analyses, 12,832, 15,453 and 5738 genes were associated with at least one tissue function based on their module membership in the GNF1M-, MOE430-, and RNA-seq-networks, respectively. Some of these genes have been phenotyped experimentally, while others have not (Tables S5–S7). Because modular structures can vary across different coexpression networks, a gene that is associated with a tissue function in a given network (e.g., the GNF1M-based network) is not necessarily associated with the same tissue function in another network (e.g., the RNA-seq-based network). Based on the consistency of assigned tissue functions in the three expression datasets used in the current study, we determined confidence scores for each tissue function for each gene. Thus, low, medium, and high confidence scores for a given tissue function of a gene were considered to indicate that the assigned tissue function for that gene was observed in one, two, or three of the datasets, respectively. To estimate the accuracy of predicting associated tissue function in this manner, we calculated Rdcv/phe (the ratio of “the number of genes previously discovered to have mutant phenotypes in a predicted tissue” divided by “the number of total phenotyped genes”) for each set of genes having a confidence score and a predicted tissue function (see Table S9). The two gene numbers used to compute Rdcv/phe for each gene set were based on gene annotations obtained from MGI [30] (see Section 2). The full gene lists of predicted tissue functions with the corresponding confidence score for each gene in the lists are provided as Dataset S2. 
The gene sets predicted to function in “eye” achieved relatively high Rdcv/phe scores (Rdcv/phe = 96.3% [52/54], 77.3% [34/44], and 34.4% [65/189] for high, medium, and low confidence scores, respectively; Rdcv/phe of genes outside the eye module(s) was 19.3% [784/4072]) compared with the Rdcv/phe values of the gene sets predicted to function in other tissues (Table S9). For example, the Rdcv/phe values for genes with predicted functions in “spleen” were only 60.9% (56/92, high confidence), 37.3% (128/343, medium confidence), 22.7% (152/670, low confidence), and 9.3% (301/3254, others) (Table S9). Subnetworks of the eye module in the GNF1M-, MOE430- and RNA-seq-based coexpression networks, as well as the confidence scores of the module members within each network, are presented in Fig. 1A, B, and C, respectively. Next, we focused on this subnetwork structure by only considering intra-modular interactions. Specifically, we computed connectivity (K), closeness, and betweenness for each gene in the largest component (i.e., the module with the greatest number of nodes) of eye modules (and in this module the number of genes of each confidence score has to be at least 10) for each network (see Section 2). In general, the genes with higher confidence scores were associated with greater K, closeness, and betweenness values (Fig. 1D–F). These results indicate that the genes which were more consistently reported to be associated with eye function across the platforms analyzed were also more likely to represent hub genes in the subnetwork of the eye module, and these genes may be more important for eye function or development. Although the algorithm adopted by PGCNA does not cluster genes into modules by considering all edges in a network, the reproduced Fig. 1 based on PGCNA-defined modules showed consistent trends, except for the lack of statistical significance when the analyses were performed based on the RNA-seq data (Fig. S6).
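The confidence scoring and the Rdcv/phe ratio described in this subsection reduce to simple counting, sketched below. The function names and the toy module assignments are hypothetical; only the gene counts passed to r_dcv_phe in the final lines are taken from the text.

```python
def confidence_scores(assignments, tissue):
    """Label each gene by the number of datasets assigning it to `tissue`."""
    labels = {1: "low", 2: "medium", 3: "high"}  # assumes at most three datasets
    counts = {}
    for gene_to_tissues in assignments.values():
        for gene, tissues in gene_to_tissues.items():
            if tissue in tissues:
                counts[gene] = counts.get(gene, 0) + 1
    return {gene: labels[n] for gene, n in counts.items()}

def r_dcv_phe(discovered, phenotyped):
    """Percentage of phenotyped genes already known to affect the predicted tissue."""
    return round(100 * discovered / phenotyped, 1)

# Toy tissue assignments per dataset (hypothetical genes and tissues).
toy_assignments = {
    "GNF1M":   {"gene1": {"eye"}, "gene2": {"eye", "brain"}, "gene3": {"eye"}},
    "MOE430":  {"gene1": {"eye"}, "gene2": {"eye"}, "gene4": {"spleen"}},
    "RNA-seq": {"gene1": {"eye"}, "gene3": {"brain"}},
}
eye_confidence = confidence_scores(toy_assignments, "eye")

# Gene counts reported in the text for the "eye" gene sets.
eye_ratios = {
    "high": r_dcv_phe(52, 54),
    "medium": r_dcv_phe(34, 44),
    "low": r_dcv_phe(65, 189),
    "outside": r_dcv_phe(784, 4072),
}
```

Running the ratio on the reported counts reproduces the percentages quoted above (96.3%, 77.3%, 34.4%, and 19.3%).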
Fig. 1

Genes associated with the “eye” module. (A) Genes with different levels of consistency in being located in the eye module across the three platforms are shown according to their confidence scores [e.g., high (dark red), medium (red), low (pink)] for predicted retinal function. (B-D) Genes with higher confidence scores tend to have greater values of (B) connectivity, (C) betweenness, and (D) closeness, thereby indicating that they are more likely to be the central node of the module(s). In (B–D), P-values were determined with the Mann-Whitney U test and are associated with the arched lines that indicate the values that were compared. The corresponding weighted coexpression networks used to generate the data are indicated under each panel. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

To further verify the predicted eye functions of the mouse genes that were identified, we examined if the human orthologs of these mouse genes tended to be known genes associated with eye diseases in humans. We found that the human orthologs were enriched with genes cataloged in the Retinal Information Network (RetNet, a database which compiled known retinal disease genes in humans; see Section 2) (Table 2) and enriched with eye disease-related genes annotated by Human Phenotype Ontology (HPO) [31] (see Section 2) (Table 2). More importantly, when we focused on a subgroup of genes which did not have any previously reported retinal phenotype exclusive to mice, the abovementioned enrichment was still observed (Table 2). These results strongly suggest that many mouse genes in the eye modules identified are associated with previously undiscovered visual-related functions.
Table 2

The human orthologs of mouse genes with predicted eye functions are enriched with genes cataloged in RetNet (dubbed “RetNet”) or HPO-defined eye disease genes (dubbed “HPO”). All values are P-values (a).

| Confidence score of predicted eye functions | All orthologs: RetNet | All orthologs: HPO | Orthologs without any previously reported eye phenotype in mice: RetNet | Orthologs without any previously reported eye phenotype in mice: HPO |
|---|---|---|---|---|
| High | <10⁻¹¹ | <10⁻⁶ | <10⁻² | 0.02 |
| High + Medium | <10⁻¹² | <10⁻⁹ | <10⁻² | 0.01 |
| High + Medium + Low | <10⁻⁴ | <10⁻¹⁰ | 0.12 | n.s. |

(a) P-value was obtained under the null hypothesis of no enrichment by Fisher’s exact test (n.s.: not significant).

Highly expressed genes and tissue-specific genes are more easily clustered in modules of a gene coexpression network. To understand whether it was coexpression information, or the expression abundance in the eye tissues alone, that contributed to the predictive power of our approach, we computed Zeye for each gene for each dataset as Zeye = (Eeye − Eavg)/SDE, where Eeye is the expression signal of the focal gene in the eye tissues, Eavg is its mean expression signal across all the tissues, and SDE is the standard deviation of its expression signals across all the tissues. Zeye quantified the relative mRNA abundance of a gene in the eye tissues in comparison with other tissues, and did not incorporate the information of coexpression between genes. We found that genes with a higher confidence score defined in Fig. 1A tend to have a higher Zeye (Fig. S7), suggesting the need to verify the contribution of expression patterns at a single gene level to the predictive power of our approach. We subsequently generated three gene lists by selecting 185, 549 and 306 genes with the highest Zeye values from each of the GNF1M-, MOE430-, and RNA-seq platforms, respectively (the numbers of genes selected were the same as the numbers of candidate genes identified from the eye-related modules for each network shown in Fig. 1A). Similar to Fig. 1A, low, medium, and high confidence scores of the candidate eye genes identified through Zeye were determined based on the presence of a gene in one, two, or three Zeye-defined gene lists, respectively. We found that, when candidate genes were defined based on top-ranked Zeye, the enrichment in known genes associated with eye diseases became insignificant, or not as significant as the P-values shown in Table 2 (Table S10). This result indicated that coexpression information, which emphasizes the relationship between genes, has assisted the gene function prediction at the phenotypic level.
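The Zeye control described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: the text does not say how multiple eye tissues were combined or which standard deviation estimator was used, so the snippet averages the eye tissues and uses the sample standard deviation. All gene names, tissue names, and expression values are hypothetical.

```python
from statistics import mean, stdev

def z_eye(signals, eye_tissues):
    """Zeye = (Eeye - Eavg) / SDE for one gene.

    Assumptions: Eeye is the mean signal over `eye_tissues`, and SDE is the
    sample standard deviation across all tissues.
    """
    e_eye = mean(signals[t] for t in eye_tissues)
    values = list(signals.values())
    return (e_eye - mean(values)) / stdev(values)

def top_ranked(z_scores, n):
    """Return the n genes with the highest Zeye values."""
    return sorted(z_scores, key=z_scores.get, reverse=True)[:n]

# Toy expression signals per gene and tissue (hypothetical values).
toy_signals = {
    "geneEye":   {"retina": 10, "lens": 8, "liver": 2, "heart": 2, "kidney": 3},
    "geneLiver": {"retina": 2, "lens": 2, "liver": 10, "heart": 8, "kidney": 3},
    "geneMid":   {"retina": 6, "lens": 4, "liver": 5, "heart": 5, "kidney": 5},
}
eye_tissues = ["retina", "lens"]

z_scores = {gene: z_eye(sig, eye_tissues) for gene, sig in toy_signals.items()}
candidates = top_ranked(z_scores, n=2)
```

Because each Zeye value depends only on a single gene's own profile, a ranking built this way carries no information about relationships between genes, which is the contrast with the coexpression modules that the comparison above is designed to test.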

External data supporting the prediction of genes with eye-related functions

In the present study, 75 and 73 mouse genes were found to be associated with the eye module with high and medium confidence scores, respectively (Tables S11 and S12). Among these genes, 50 have no previously documented eye phenotype according to the MGI annotations, or they do not have any mutant strains produced for phenotyping (Tables S11–S13, Table 1, and Section 2). When the same approach to identify eye candidate genes with a confidence score was applied to PGCNA-defined coexpressed gene modules, 45 out of these 50 genes were also found to be medium- or high-confidence eye candidate genes by the PGCNA method (genes marked with the superscript “e” in Table 1), indicating that the phenome-based approach gave consistent results when different module-detection methods were used to predict gene functions. It should also be noted that the 50 novel eye genes listed in Table 1 were predicted in December 2016. Subsequently, in 2017 and 2018, eye-related functions for two genes which each received a high-confidence score (i.e., Samd7 and Lrit1, respectively) were confirmed in targeted mutagenesis studies conducted in a mouse model [32], [33] (Table 1). We also found that 17 of the remaining 48 candidate genes have characterized functions related to vision in at least one other vertebrate species, including human, chicken (Gallus gallus), or zebrafish (Danio rerio) (see Table 1) [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55]. For example, while the functions of Cnga1, Pou6f2, and Vit in mouse remain unexplored, nonsense or missense mutations of human CNGA1 are associated with Retinitis pigmentosa [35], [36], a common form of inherited retinal degeneration; knock-down of zebrafish pou6f2 suppresses retina regeneration [52]; and mutations in vit in chicken have been linked to weakened vision during the domestication process [55].
Table 1

List of candidate genes underlying proper eye development and function in mice. Only genes without a previously reported gene deletion phenotype in any of the 47 selected tissues in mice are included in this list. Superscript markers from the original table are shown in parentheses.

High confidence genes

| MGI gene symbol (a) | Human ortholog(s) (b) | Zebrafish ortholog(s) (b) | Characterized vision/eye-related phenotypes due to mutations (or reduced expression, if noted) [gene symbol, species (c)], if available |
|---|---|---|---|
| Ankrd33 (d,e) | 1 | 2 | Unreported |
| Arr3 (e) | 1 | 2 | Knockdown caused a severe delay in photoresponse recovery [arr3, zebrafish] [34] |
| Cnga1 (e) | 1 | 2 | Retinitis pigmentosa 49, RP49 [CNGA1, human] [35], [36] |
| Cryba4 (e) | 1 | 1 | Cataracts and microphthalmia, CTRCT23 [CRYBA4, human] [37], [38] |
| Crybb3 (e) | 1 | 1 | Congenital cataracts, CTRCT22 [CRYBB3, human] [39] |
| Gm4792 (e) | 0 | 0 | Unreported |
| Gm9918 (e) | 1 | 0 | Unreported |
| Grifin (d,e) | 1 | 1 | Unreported |
| Kcnj14 (d,e) | 1 | 1 | Unreported |
| Kcnv2 (e) | 1 | 2 | Retinal cone dystrophy 3B, RCD3B [KCNV2, human] [40] |
| Lrit1 (e) | 1 | 2 | Unreported until 2018; visual acuity impairments in optokinetic response [Lrit1, mouse] [32] |
| Lrit2 (d,e) | 1 | 1 | Unreported |
| Mlana (e) | 1 | 0 | None [ps. no eye abnormality detected in mice [41]] |
| Pde6h (e) | 1 | 1 | Incomplete achromatopsia [43] and Retinal cone dystrophy 3A, RCD3A [42] [PDE6H, human] |
| Ppef2 (d,e) | 1 | 2 | Unreported |
| Ppm1n (d,e) | 1 | 2 | Unreported |
| Samd7 (e) | 1 | 1 | Unreported until 2017; ectopic expression of nonrod genes and rod photoreceptor cell dysfunction [Samd7, mouse] [33] |
| Slc1a7 (e) | 1 | 2 | Early pathological change in the development of age-related macular degeneration [EAAT5, human] [44] |
| Stx3 (e) | 1 | 2 | Congenital cataract and intellectual disability [STX3, human] [45] |
| Tmem215 (e) | 1 | 0 | Unreported |
| Tspan10 (e) | 1 | 1 | Unreported until 2019; concomitant strabismus [TSPAN10, human] [46] |

Medium confidence genes

| MGI gene symbol (a) | Human ortholog(s) (b) | Zebrafish ortholog(s) (b) | Characterized vision/eye-related phenotypes due to mutations (or reduced expression, if noted) [gene symbol, species (c)], if available |
|---|---|---|---|
| 4632404H12Rik (e) | 0 | 0 | Unreported |
| Adal (d,e) | 1 | 1 | Unreported |
| Car14 (d,e) | 1 | 1 | Unreported |
| Ccdc126 (d,e) | 1 | 1 | Unreported |
| Crxos (e) | 0 | 0 | Unreported |
| Crybb1 (e) | 1 | 1 | Pediatric or age-related cataracts, CATARACT 17 [CRYBB1, human] [47], [48] |
| Crygn (e) | 1 | 2 | Unreported |
| Defb9 (e) | 0 | 0 | Unreported |
| Dhx32 (d,e) | 1 | 2 | Unreported |
| Dkk3 (d,e) | 1 | 2 | Unreported |
| Fabp12 (e) | 1 | 3 | Unreported |
| Fam169a (d) | 1 | 2 | Unreported |
| Fam19a3 (e) | 1 | 0 | Unreported |
| Frmpd2 (d,e) | 1 | 1 | Unreported |
| Gzmm (e) | 1 | 0 | Unreported |
| Impg1 (e) | 1 | 2 | Vitelliform macular dystrophies [IMPG1, human] [49] |
| Lyg2 (e) | 1 | 3 | Unreported |
| Mak (e) | 1 | 1 | Retinitis pigmentosa 62, RP62 [MAK, human] [50], [51] |
| Mgarp (e) | 1 | 0 | Unreported |
| Otor (e) | 1 | 0 | Unreported |
| Pdzph1 (e) | 0 | 1 | Unreported |
| Plk5 | 1 | 0 | Unreported |
| Pou6f2 | 1 | 1 | Knockdown suppressed retina regeneration [pou6f2, zebrafish] [52] |
| Rrh (e) | 1 | 1 | Retinitis punctata albescens [RRH, human] [53] |
| Slco4a1 | 1 | 1 | Expression reduction marks the occurrence of retinal detachment [SLCO4A1, human] [54] |
| Tldc1 (d,e) | 1 | 1 | Unreported |
| Vit (e) | 1 | 1 | Evolutionarily reduced expression linked to weakened vision during domestication [vit, chicken] [55] |
| Wdr17 (d) | 1 | 1 | Unreported |
| Zfp563 (e) | 1 | 0 | Unreported |

(a) See Tables S11 and S12 for the corresponding MGI ID and full gene name.

(b) See Table S15 for the corresponding gene IDs of the orthologs.

(c) Human: Homo sapiens; mouse: M. musculus; chicken: Gallus gallus; zebrafish: Danio rerio.

(d) Indicates genes which were functionally validated in zebrafish in the present study.

(e) Eye candidate genes with a medium or a high confidence score when coexpressed gene modules were defined by PGCNA.

The human orthologs of mouse genes with predicted eye functions are enriched with genes cataloged in RetNet (dubbed “RetNet”) or HPO-defined eye disease genes (dubbed “HPO”). P-value was obtained under the null hypothesis of no enrichment by Fisher’s exact test (n.s.: not significant).

However, while a few of the other candidate genes with potential eye-related functions (i.e., Mlana, Pde6h, and Ppef2) have been examined in mammals, eye-related functions have not been discovered for these genes in mice. For example, no visible morphological abnormality was observed in the eyes of Mlana-deleted mice [41]. While mutations in PDE6H in humans cause total color blindness [43] or retinal cone dystrophy [42], Pde6h knockout mice exhibited normal retinal morphology and no visual functional impairment [56]. Meanwhile, loss of rdgC, the ortholog of mouse Ppef2, results in light-induced retinal degeneration in Drosophila [57], yet mouse double knockouts of Ppef2 and Ppef1 revealed no visual defect [58]. Also, IMPC has examined a subset of candidate eye genes, including Kcnj14, Kcnv2, Adal, Fabp12, Plk5, and Wdr17, and was unable to discover vision/eye phenotypes for them (Table S14), yet the ortholog of Kcnv2 in humans is associated with retinal cone dystrophy [40].
For these cases, it remains unclear whether the functions of the abovementioned genes in mice are truly unrelated to vision, or whether vision-related phenotypes can only be observed under previously unconsidered conditions. One such example is Cabp5 (a gene with a high confidence score, see Table S11): knockout mice exhibit no significant retinal abnormalities, yet reduced sensitivity of rod-mediated light responses of retinal ganglion cells is observed [59]. Similarly, IMPC has only examined gross morphological/physiological alterations in mutant strains (see Table S14); more comprehensive phenotypic assays might be needed to definitively conclude whether any of these genes have eye-related functions.

Molecular experiments have been conducted for some of the genes listed in Table 1. The results from these experiments indirectly support the retinal functions predicted for these genes. For example, mRNA of mouse Gzmm localizes exclusively to photoreceptor cells in the retina and is only expressed after the eye opens. These results imply that Gzmm transcripts are related to maintenance of the retinal structure or functions of mature photoreceptor cells [60]. Another example is the protein product of mouse Mgarp, which specifically localizes to mitochondria in retinal cells. When MGARP is overexpressed without its N-terminal region, severe aggregation of mitochondria occurs. Taken together, these results imply that Mgarp may have a role in retinal energy metabolism [61]. Furthermore, mouse Slc1a7 has been identified as a target of the light-regulated microRNA cluster miR-183/96/182 in photoreceptor cells [62]. While all of these molecular data are valuable, we only considered phenotypes as direct evidence of the gene functions predicted in the present study, since gene expression activities or interactions could, in part, be spurious and not all of them produce phenotypic outcomes [7], [63].

Functional validation of candidate mouse eye genes in zebrafish

The anatomy, histology, and biochemistry of the eye in zebrafish (Danio rerio) are comparable to those of mammals. Given their rapid embryonic development, zebrafish can be an excellent in vivo model for efficiently validating eye-related gene functions predicted from mouse -omics data [64]. Accordingly, zebrafish orthologs of candidate mouse genes that met the following criteria were selected and examined for their requirement in eye development. First, we selected non-crystallin genes whose human orthologs are not linked to eye-related diseases according to the RetNet and Online Mendelian Inheritance in Man (OMIM, https://www.omim.org/) databases. Second, we selected genes without any reported visual function in non-human vertebrate models. Third, to reduce the complexity of the zebrafish experiments, the number of orthologs in the zebrafish genome for each of the selected genes was limited to two. Accordingly, 15 out of 50 mouse genes, corresponding to 21 zebrafish orthologs, were selected for further study (Table 1, asterisks, and Table S15). We individually knocked down these zebrafish orthologs using morpholino oligos (MO), which were injected into the yolk of 1-cell stage embryos to block the translation of the targeted mRNAs (Table S16; see Section 2). Embryos of the same brood were also injected with an equal amount of standard control oligo (see Table S16, Section 2) as negative controls. Since the role of pax6b in eye development in zebrafish is well-known [65], an MO targeting pax6b was used as a positive control. According to the gross phenotypic consequences of these experiments, we categorized the eye phenotypes of MO-injected embryos into three categories: global developmental delay, small eye, and abnormal lens/eye ratio. The developmental delay phenotype is defined by the gross morphology of MO-injected embryos (Fig. 2A; Figs. S8–S23).
When zebrafish orthologs of the mouse genes Grifin and Tldc1 were targeted, a significant increase in the proportion of developmentally delayed embryos (embryos arrested at 48 hpf, hours post-fertilization, or earlier) was observed (Fig. 2A). To examine eye development, the gross phenotype of the embryonic eye at 72 hpf was characterized based on the size of the eye, as well as the relative size of the lens expressed as the lens/eye ratio (Fig. 2B) (see Section 2, Figs. S8–S23). The lens/eye ratio describes potential knockdown effects that specifically interfere with lens development. To control for the confounding effect of delayed embryonic development in our measurements of eye phenotype, we excluded the data from developmentally delayed embryos from our analyses. As a positive control, the embryos injected with pax6b MO exhibited a small eye phenotype, yet no change in their lens/eye ratio (Fig. S23), which is consistent with a previous report [65].
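The measurement logic described above (lens/eye ratio, with developmentally delayed embryos excluded before comparison) can be sketched as follows. This is only an illustrative sketch: the `Embryo` record, the group labels, and all pixel values are hypothetical and are not taken from the study's data or analysis scripts.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Embryo:
    group: str      # hypothetical label, e.g. "control" or "gene_x_MO"
    eye_px: float   # eye length along the lens-to-otic-vesicle line (pixels)
    lens_px: float  # lens length along the same line (pixels)
    delayed: bool   # True if scored as developmentally delayed


def lens_eye_ratio(embryo: Embryo) -> float:
    """Lens size divided by eye size."""
    return embryo.lens_px / embryo.eye_px


def summarize(embryos: list[Embryo]) -> dict[str, dict[str, float]]:
    """Drop delayed embryos, then report per-group sample size and medians."""
    groups: dict[str, list[Embryo]] = {}
    for e in embryos:
        if not e.delayed:  # delayed embryos would confound eye measurements
            groups.setdefault(e.group, []).append(e)
    return {
        name: {
            "n": len(members),
            "median_eye_px": median(m.eye_px for m in members),
            "median_lens_eye_ratio": median(lens_eye_ratio(m) for m in members),
        }
        for name, members in groups.items()
    }


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    demo = [
        Embryo("control", 200.0, 80.0, False),
        Embryo("control", 210.0, 84.0, False),
        Embryo("gene_x_MO", 150.0, 60.0, False),
        Embryo("gene_x_MO", 100.0, 30.0, True),  # excluded from the summary
    ]
    for group, stats in summarize(demo).items():
        print(group, stats)
```

Filtering before grouping keeps the excluded embryos out of both the eye-size and the ratio summaries, so a delayed embryo cannot masquerade as a small-eye phenotype.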
Fig. 2

Consequences of knocking down zebrafish orthologs of mouse candidate eye genes with MO. (A) MO groups for two candidate genes (Grifin, Tldc1) exhibited a greater proportion of developmentally delayed embryos than the control groups. The numbers of embryos used to calculate the proportions are indicated for each bar. (B) The left side of each embryo was imaged and a line was drawn across the centroids of the lens and the otic vesicle to measure gross eye morphology. Based on this line, eye (or lens) size was defined as the length (in pixels) of the red (or green) dashed line that starts from the anterior boundary and extends to the posterior boundary of the eye (or lens). The lens/eye ratio was calculated as lens size divided by eye size. (C) MO groups for 11 candidate genes induced a significant reduction in eye size. (D) MO groups for 5 candidate genes induced aberrant lens/eye ratios. In (C–D), violin plots present the values for (C) eye size and (D) lens/eye ratios. In (A, C, D), P-values were determined with Fisher’s exact test (A) or the Mann-Whitney U test (C, D). These values are associated with arched gray lines at the top of each panel, which indicate the values that were compared. Names of the corresponding mouse genes (black font) that were validated with MO are indicated at the bottom of the plots. The names of the MO groups (gray font) are specified only when multiple MO were designed for a focal gene. Only comparisons that differed significantly are shown (Figs. S8–S22 present the complete set of data). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
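For readers unfamiliar with the test named for panel (A), a two-sided Fisher’s exact test on a 2×2 table (delayed vs. not delayed, MO group vs. control) can be sketched in pure Python as below. This is a minimal sketch under stated assumptions: the function name and the embryo counts in the demo are hypothetical, and the authors’ actual statistical software is not specified in this section.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Rows: MO group, control group. Columns: delayed, not delayed.
    The P-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 40 MO-injected embryos delayed
    # versus 2 of 40 control embryos delayed.
    print(fisher_exact_two_sided(12, 28, 2, 38))
```

An exact test is a natural fit here because the delayed counts in individual groups can be small, where a chi-square approximation would be unreliable.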

As a result, knockdown of 14 zebrafish genes, corresponding to the mouse candidate genes Adal, Ankrd33, Car14, Ccdc126, Dkk3, Fam169a, Grifin, Kcnj14, Lrit2, Ppef2, and Ppm1n, resulted in a reduced eye size phenotype (Fig. 2C). 
Knockdown of the other five zebrafish genes, corresponding to the mouse candidate genes Ankrd33, Car14, Dhx32, Dkk3, and Wdr17, resulted in an aberrant lens/eye ratio phenotype (Fig. 2D) (Figs. S8–S22). Therefore, the predicted eye-related functions were validated in zebrafish for the majority (13 out of 15) of mouse candidate genes.

Strikingly, knockdown of zebrafish orthologs of the mouse genes Ankrd33, Car14, and Dkk3 resulted in both of these eye abnormalities. Among them, Ankrd33 has been found to be expressed in the mouse retina, where it acts as a transcriptional repressor that suppresses CRX-regulated photoreceptor genes [66]. However, a phenotypic analysis of Ankrd33 in the retina is lacking. In zebrafish, there are two orthologs of mouse Ankrd33, ankrd33aa and ankrd33ab. Knockdown of zebrafish ankrd33aa produced an aberrant lens/eye ratio (P = 0.004, Fig. 2D), whereas knockdown of zebrafish ankrd33ab caused reduced eye size (P < 10⁻³, Fig. 2C). To investigate whether the gross phenotypes of ankrd33aa and ankrd33ab morphants reflect specific eye abnormalities at the tissue level, we performed a histological analysis. In control MO-injected embryos, the retinal lamination was well structured, and the crescent-shaped inner plexiform layer (IPL) and outer plexiform layer (OPL) could be clearly observed (Fig. 3A) in all the eyes examined. Among the thirteen eyes analyzed from randomly selected ankrd33aa morphants, we observed retinal abnormalities such as disorganized plexiform layers (either IPL or OPL) accompanied by mild cell death (9/13) (Fig. 3C) and massive retinal degeneration in the ganglion cell layer (GCL) and inner nuclear layer (INL) (2/13) (Fig. 3D). In ankrd33ab morphants, other than normal retinal lamination (7/16) (Fig. 3E), we observed poor differentiation of retinal layers and an enucleation defect in lens fiber cells (9/16) (Fig. 3F). These results phenotypically verified the vision-related functions of Ankrd33, and suggested that the gross morphological defects could correlate with specific structural defects in the retina.
Fig. 3

Histological examination of retinal structures in morphants of ankrd33aa and ankrd33ab. (A) The eye section of control MO-injected embryos. The retina of all control embryos (13/13) showed well-structured lamination. (B,C,D) Eye sections of ankrd33aa morphant embryos. The retina of ankrd33aa morphant embryos could be (B) normal (2/13), (C) disorganized in plexiform layers (9/13), or (D) severely degenerated (2/13). (E,F) Eye sections of ankrd33ab morphant embryos. The retina of ankrd33ab morphant embryos could be (E) normal (7/16) or (F) underdeveloped with an enucleation defect in lens fiber cells (yellow arrows) (9/16). In (A–F), coronal sections of the eye were stained with hematoxylin and eosin to reveal retinal lamination. The area within the yellow rectangle is enlarged to show the retinal structure. Anterior is to the top. Scale bars: 20 μm. RPE: retinal pigment epithelium; PCL: photoreceptor cell layer; ONL: outer nuclear layer; OPL: outer plexiform layer; INL: inner nuclear layer; IPL: inner plexiform layer; GCL: ganglion cell layer. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Final remarks

In the present study, an integrated transcriptome-phenome analysis based on mouse data was used to predict tissue functions for mammalian genes. In particular, we focused on a subset of eye candidate genes with potential biomedical relevance to validate the results of gene function prediction. For 86.7% (13/15) of these genes, eye-related functions were experimentally validated with targeted MO in zebrafish. These 13 genes include genes whose previous targeted deletion did not lead to any observable eye abnormality in mice, such as Ppef2 [58] (Fig. 2C and Fig. S19). In addition to the previously mentioned issues regarding the comprehensiveness of phenotyping, it remains to be determined whether the absence of an eye phenotype in knockout mice of some of these genes is due to genetic compensation mechanisms activated by a gene deletion approach [67]. For some of the candidate genes, the human orthologs were found to be located in genomic regions to which inherited eye diseases have been mapped (see Table 3) [68], [69], [70], [71], [72], [73], [74], [75], [76]. However, each of these mapped regions harbors multiple genes. Our study could thus help researchers pin down the causal genes underlying these diseases.
Table 3

Candidate genes whose human orthologs were found to be located in the genomic regions mapped for congenital eye diseases in humans.

| Gene symbol | Mapped congenital eye diseases (OMIM ID) | References |
|---|---|---|
| High confidence genes | | |
| Kcnj14 | Cataract 35 (OMIM: 609376) | [68] |
| Ppm1n | Cataract 35 (OMIM: 609376) | [68] |
| Medium confidence genes | | |
| Adal | Microphthalmia with coloboma 2 (OMIM: 605738) | [69], [70] |
| Ccdc126 | Dominant cystoid macular dystrophy (OMIM: 153880) | [72] |
| Dhx32 | Cone-rod dystrophy 17 (OMIM: 615163) | [73] |
| Frmpd2 | Usher syndrome, type IK (OMIM: 614990) | [74] |
| Lyg2 | Glaucoma 1B, primary open angle, adult onset (OMIM: 606689) | [75] |
| Otor | Glaucoma 1K, primary open angle, juvenile-onset (OMIM: 608696) | [76] |
| Wdr17 | Retinitis pigmentosa 29 (OMIM: 612165) | [71] |
It should be noted that, except for Ankrd33 (Fig. 3), more detailed phenotyping has not been performed for the remaining 12 genes, whose predicted functions were validated by observing the gross phenotype of developing eyes in the zebrafish system (Fig. 2). In the future, a more careful examination of the consequences of gain- or loss-of-function mutations of these genes, especially in mammals, would be desirable. In addition to identifying eye candidate genes, our combined transcriptome-phenome approach could be used to identify genes related to the functions of non-eye tissues. With improvements in, and greater availability of, both expression and phenotype data for mouse tissues in the future, our approach is expected to accelerate the elucidation of functional profiles of a substantial proportion of mammalian coding genes, and potentially noncoding genes when their target information becomes available.

Funding

This work was supported by intramural funding from the National Health Research Institutes and research grants from the Ministry of Science and Technology [grant numbers 104-2311-B-400-002-MY3, 105-2314-B-400-021-MY4] to B.-Y.L.

Author contributions

Y-H.C. and B-Y.L. conceived the study. C-Y.C. and B-Y.L. designed computational experiments. Y-H.C., M.-D.L. and B-Y.L. designed and supervised validation experiments and data analyses. C-Y.C. and T-Y.C. performed bioinformatic analyses. C-Y.C., L-S.H., Y-S.Y., P-Y.K. and I.M. performed or assisted with validation experiments. C-Y.C., Y-H.C., T-Y.C., L-S.H., M.-D.L. and B-Y.L. analyzed data. C-Y.C., Y-H.C., M.-D.L. and B-Y.L. wrote the manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.