Peter Rotwein. Department of Biomedical Sciences, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, Texas.
Abstract
Repulsive guidance molecules, RGMA, RGMB, and RGMC, are related proteins discovered independently through different experimental paradigms. They are encoded by single copy genes in mammalian and other vertebrate genomes, and are ~50% identical in amino acid sequence. The importance of RGM actions in human physiology has not been realized, as most research has focused on non-human models, although mutations in RGMC are the cause of the severe iron storage disorder, juvenile hemochromatosis. Here I show that repositories of human genomic and population genetic data can be used as starting points for discovery and for developing new testable hypotheses about each of these paralogs in human biology and disease susceptibility. Information was extracted, aggregated, and analyzed from the Ensembl and UCSC Genome Browsers, the Exome Aggregation Consortium, the Genotype-Tissue Expression project portal, the cBio portal for Cancer Genomics, and the National Cancer Institute Genomic Data Commons data site. Results identify extensive variation in gene expression patterns, substantial alternative RNA splicing, and possible missense alterations and other modifications in the coding regions of each of the three genes, with many putative mutations being detected in individuals with different types of cancers. Moreover, selected amino acid substitutions are highly prevalent in the world population, with minor allele frequencies of up to 37% for RGMA and up to 8% for RGMB. These results indicate that protein sequence variation is common in the human RGM family, and raise the possibility that individual variants will have a significant population impact on human physiology and/or disease predisposition.
Introduction
The repulsive guidance molecule (RGM) family consists of three members, RGMA, RGMB, and RGMC (also known as HFE2 and HJV) (Monnier et al. 2002; Kuninger et al. 2004; Niederkofler et al. 2004; Papanikolaou et al. 2004; Samad et al. 2004; Schmidtmer and Engelkamp 2004), that are encoded by single‐copy genes in human and other vertebrate genomes (Severyn et al. 2009). The family received its name from a then‐novel axonal guidance molecule termed RGM that was characterized in 2002 (Monnier et al. 2002). Subsequent studies identified two related proteins in mammals, termed RGMB and RGMC (Papanikolaou et al. 2004; Samad et al. 2004; Schmidtmer and Engelkamp 2004), and a fourth member in teleosts, called RGMD (Corradini et al. 2009; Siebold et al. 2017). The original RGM is now named RGMA (Corradini et al. 2009; Severyn et al. 2009; Siebold et al. 2017).

RGMA and RGMB have been shown to be expressed in the central nervous system during development (Schmidtmer and Engelkamp 2004), and the studies that identified them indicated that they were involved in controlling axonal patterning and neuronal survival (Monnier et al. 2002; Matsunaga et al. 2004; Niederkofler et al. 2004; Rajagopalan et al. 2004; Samad et al. 2004). In contrast, RGMC was initially characterized through its gene, which was found within a locus that was linked to a severe form of an iron storage disease that primarily affects children, termed juvenile hemochromatosis (Papanikolaou et al. 2004). The gene was termed HFE2 after HFE (high iron [chemical symbol Fe]), the initial gene whose mutations were found in hemochromatosis (Papanikolaou et al. 2004). The encoded protein, RGMC, is also called hemojuvelin (HJV), because of its relationship with juvenile hemochromatosis (Papanikolaou et al. 2004). Unlike RGMA and RGMB, RGMC/HFE2/HJV is produced in the liver and in cardiac and skeletal muscle, and not within the nervous system (Kuninger et al. 2004; Papanikolaou et al. 2004; Schmidtmer and Engelkamp 2004).

RGMA, RGMB, and RGMC are glycosylphosphatidylinositol (GPI)‐linked cell membrane‐associated glycoproteins (Corradini et al. 2009; Severyn et al. 2009; Siebold et al. 2017), and the paralogs share ~50% amino acid identity and several structural motifs, including 14 cysteine residues in comparable locations within the three proteins (Corradini et al. 2009; Severyn et al. 2009; Siebold et al. 2017). All three RGMs also appear to undergo a series of similar biosynthetic and processing steps leading to both cell‐associated and soluble protein species (Babitt et al. 2005; Samad et al. 2005; Kuninger et al. 2006). All three proteins also interact with members of the bone morphogenetic protein (BMP) family, where they function as co‐receptors (Core et al. 2014). BMPs are members of the transforming growth factor‐β (TGF‐β) super‐family, and play key roles in different developmental and cell fate decisions (Hata and Chen 2016; Morikawa et al. 2016; Siebold et al. 2017). BMPs bind as dimers to specific type I and type II serine/threonine kinase receptors, and initiate a protein kinase cascade that culminates in the activation by serine phosphorylation of Smads 1, 5, and 8, signal transducers and transcription factors that regulate the expression of many BMP‐dependent target genes (Hata and Chen 2016; Morikawa et al. 2016).

RGM proteins also bind to the cell surface trans‐membrane molecule, neogenin (Matsunaga et al. 2004, 2006; Rajagopalan et al. 2004; Kuns‐Hashimoto et al. 2008; Yang et al. 2008), a member of the netrin‐binding, deleted in colon cancer family, which also includes DCC and UNC5 (Keino‐Masu et al. 1996; Leonardo et al. 1997; Mehlen and Mazelin 2003; Bernet and Mehlen 2007). The actions of RGMA in both neuronal guidance and neuronal survival are mediated by neogenin (Matsunaga et al. 2004; Conrad et al. 2007). The other RGM proteins also can bind to neogenin, but it does not appear to play the predominant role in their biological actions (Kuns‐Hashimoto et al. 2008; Xia et al. 2008; Yang et al. 2008; Corradini et al. 2009; Siebold et al. 2017).

Major recent advances in human genetics and genomics now present distinct opportunities for improving our knowledge of human physiology and disease susceptibility, and for gaining new insights into human variation, human origins, and evolution (Acuna‐Hidalgo et al. 2016; Katsanis 2016; Quintana‐Murci 2016; Battle et al. 2017; eGTEx Project, 2017). Here I use the RGM family to show how to understand and integrate this information, by accessing publicly available genomic and gene expression repositories to examine human RGM genes in detail. Results reveal extensive variation in gene expression patterns, substantial alternative RNA splicing, and a range of possible missense alterations and other modifications in the coding regions of each of the three genes. Taken together, these observations will provide new opportunities to define the dynamics and range of RGM actions in different physiological and pathological contexts, and will serve as a template and guide that can be applied to other gene families in humans and other species.
Methods
Databases and analyses
Information on human RGMA, RGMB, and RGMC (HFE2/HJV) loci and genes was obtained from the Ensembl (www.ensembl.org) and UCSC Genome Browsers (https://genome.ucsc.edu), by searching genome assembly GRCh38 with each gene name. The different classes of transcripts for each gene were also derived from the Ensembl and UCSC browsers. Data on levels of RGMA, RGMB, and RGMC mRNAs in human tissues were extracted from the Genotype‐Tissue Expression project (GTEx) portal (Battle et al. 2017) (https://www.gtexportal.org/) by searching the “transcriptome” menu with the name of each gene. Relative levels of specific mRNA isoforms were calculated from primary information within the “exon expression” sub‐menu of GTEx. Human RGMA, RGMB, and RGMC protein sequences were obtained from the National Center for Biotechnology Information (NCBI) Consensus CDS Protein Set (https://www.ncbi.nlm.nih.gov/CCDS/). Information on predicted population variation in these three proteins was obtained from the Exome Aggregation Consortium (ExAC) genome browser (http://exac.broadinstitute.org/), by examining the primary data for each gene after it was downloaded as a series of CSV files. ExAC contains results of sequencing the exons of 60,706 individuals (Karczewski et al. 2017). Data on predicted alterations in RGMA, RGMB, and RGMC proteins in different cancers were extracted from the cBio portal for Cancer Genomics (http://www.cbioportal.org/), which lists gene alterations from 65,690 different individuals from 225 cancer studies (Cerami et al. 2012; Gao et al. 2013), and from the National Cancer Institute Genomic Data Commons data portal (https://portal.gdc.cancer.gov/), which contains analogous information on 32,555 cancer cases.
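As an illustration of how a per‐gene CSV export might be examined, the following Python sketch tallies variant classes and sums variant allele frequencies. It is not the procedure used in this study. The column names (`Annotation`, `Allele Count`, `Allele Number`) are assumptions that should be checked against the header of the actual download, and the rows are placeholders rather than real ExAC records.

```python
import csv
import io
from collections import Counter

# Assumed column names; verify against the header of the real CSV export.
ANNOTATION_COL = "Annotation"
COUNT_COL = "Allele Count"
NUMBER_COL = "Allele Number"


def summarize_variants(csv_text):
    """Count variants per annotation class and sum their allele frequencies."""
    classes = Counter()
    total_freq = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        classes[row[ANNOTATION_COL]] += 1
        # Frequency of one variant = variant alleles / alleles genotyped at that site.
        total_freq += int(row[COUNT_COL]) / int(row[NUMBER_COL])
    return classes, total_freq


# Placeholder rows for illustration only (not real ExAC data).
toy_csv = """Annotation,Allele Count,Allele Number
missense,3,120000
missense,1,120000
frameshift,2,120000
splice donor,1,100000
"""

classes, total_freq = summarize_variants(toy_csv)
print(dict(classes))
print(f"summed variant allele frequency: {total_freq:.6f}")
```

Summing per‐variant frequencies gives the kind of "total variant alleles in population" figure that is reported per gene in the Results, under the assumption that each variant is counted independently.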
Results
Topography of human RGM loci
The three single‐copy human RGM genes reside on different autosomes. Three other genes are found within the 300 kb segment of chromosome 15q26.1 containing RGMA (Fig. 1A), and the locus is conserved with both mouse and chicken genomes (Severyn et al. 2009). RGMB on human chromosome 5q15 is also part of a chromosomal region with conserved synteny with mouse and chicken genomes (Severyn et al. 2009), but only two other genes, CHD1 and DDX18P4, are found within the 300 kb region depicted in Figure 1B. Of note, the paralogous relationship between adjacent genes on both chromosomes, CHD2 and RGMA on chromosome 15 and CHD1 and RGMB on chromosome 5, and their shared convergent transcriptional orientation indicate that these loci were generated by segmental duplication (Severyn et al. 2009). In contrast with the chromosomal regions of RGMA or RGMB, the RGMC/HFE2/HJV locus on human chromosome 1q21.1 is far more gene dense (Fig. 1C), and contains nine other genes that also are present in the orthologous mouse locus (Severyn et al. 2009).
Figure 1
Organization of human RGM loci. (A) Map showing the human RGMA locus on chromosome 15q26.1. Genes include a long intergenic non‐protein coding RNA gene, a clone‐based (Ensembl) gene, chromo‐domain helicase (CHD2), and RGMA. (B) Illustration of the human RGMB locus on chromosome 5q15. Genes include chromo‐domain helicase (CHD1), RGMB, and DDX18P4. (C) Map showing the human RGMC/HFE2/HJV locus on chromosome 1q21.1. Genes include the following: ankyrin repeat domain 35 (ANKRD35), integrin subunit alpha 10 (ITGA10), peroxisomal biogenesis factor 11 beta (PEX11B), RNA binding motif protein 8A (RBM8A), limb and CNS expressed 1 like (LIX1L), ankyrin repeat domain 34A (ANKRD34A), RNA polymerase III subunit G like (POLR3GL), thioredoxin interacting protein (TXNIP), RGMC/HFE2/HJV, and NBPF member 10 (NBPF10). For A–C, the scale bar represents 20 kb, and a horizontal arrow indicates the direction of transcription for each gene.
RGM gene structures and expression patterns in human tissues
The human RGMA gene spans ~45 kb of chromosomal DNA and consists of seven exons that are used in the vast majority of transcripts reported within the human Genotype‐Tissue Expression project (GTEx) (Battle et al. 2017; eGTEx Project, 2017) (Fig. 2). The five predominant RGMA mRNA isoforms described in GTEx consist of either 3 or 4 exons, and encode one of three very similar RGMA proteins of 434, 450, or 458 amino acids, with all differences being located at the NH2‐termini of the proteins (Fig. 2B). RGMA mRNAs are expressed in 48 of the 51 different human organs and tissues found in the GTEx portal. The 10 organs and tissues with the highest abundance of RGMA mRNAs include esophagus, colon, skeletal muscle, uterus, tibial nerve, testes, ovary, several brain regions, and adipose tissue (expression ranging, in that order, from 139 to 32 transcripts per kilobase million reads [TPM]; Fig. 2C). By contrast, glyceraldehyde 3‐phosphate dehydrogenase (GAPDH), a typical “control” transcript in gene expression studies, was 10–70 times more abundant than RGMA in the organs and tissues examined here (Fig. 2C). According to GTEx, the vast majority of RGMA mRNAs found in human tissues were isoforms 1 or 2 (~93–99% of all transcripts; Fig. 2D). In contrast, the major RGMA protein in the Exome Aggregation Consortium (ExAC) gene dataset is predicted to have 458 amino acids, and is encoded by isoform 5 in Figure 2B. This mRNA is expressed minimally in the ten human tissues catalogued by GTEx and presented here (Fig. 2D, and see below).
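The relative isoform abundances quoted above are percentages of the total transcript pool for a gene in a given tissue. The exact calculation from GTEx exon‐level data is not detailed in the text; the Python sketch below shows one plausible reading, in which each isoform's TPM is divided by the summed TPM of all isoforms. The function name and the numbers are hypothetical placeholders, not GTEx measurements.

```python
def isoform_percentages(tpm_by_isoform):
    """Convert per-isoform TPM values into percentages of the gene total."""
    total = sum(tpm_by_isoform.values())
    if total == 0:
        raise ValueError("no expression detected; percentages are undefined")
    return {name: 100.0 * tpm / total for name, tpm in tpm_by_isoform.items()}


# Placeholder TPM values for one tissue (illustrative only).
example_tpm = {"isoform 1": 60.0, "isoform 2": 35.0, "isoform 3": 5.0}
percentages = isoform_percentages(example_tpm)

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```

The same ratio logic applies to the comparison with GAPDH: dividing the GAPDH TPM by the RGMA TPM in a tissue gives the fold difference in abundance.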
Figure 2
Human RGMA gene structure and expression. (A) Schematic of the human RGMA gene, illustrating exons 1–7, and ATG and TAG codons. Exons are represented as boxes, with coding regions in black and non‐coding segments in white, and introns as horizontal lines. A scale bar is shown. (B) Diagrams of the four major classes of human RGMA mRNAs represent the following transcripts from the Ensembl genome browser: isoform 1, ENST00000543599.5; isoform 2, ENST00000329082.11; isoform 3, ENST00000542321.6; and isoform 4, ENST00000425933.6. The protein encoded by each transcript is listed to the right of each diagram. (C) RGMA gene expression in 10 different human tissues and organs. Data were obtained from the GTEx portal, and are graphed as the mean number of transcripts per kilobase million reads (TPM), with the mean transcript abundance of glyceraldehyde 3‐phosphate dehydrogenase (GAPDH) listed to the right of each RGMA RNA level. The number of samples for each organ and tissue is as follows: esophagus (370), sigmoid colon (233), skeletal muscle (564), uterus (111), tibial nerve (414), testes (259), ovary (133), substantia nigra (88), cerebral cortex (158), and subcutaneous adipose tissue (42). (D) Relative expression of RGMA mRNAs in 10 different human organs and tissues (%) from the GTEx database. Transcripts 1–5 are illustrated in part B above, and mRNAs 6–14 are found in GTEx.
The human RGMB gene is slightly more compact than RGMA, and its five major exons extend over ~26 kb of genomic DNA (Fig. 3A). The three predominant transcripts in GTEx also are derived by alternative RNA splicing, but only two of these mRNAs appear to encode RGMB proteins, of either 478 or 437 amino acids (Fig. 3B). RGMB transcripts are expressed in 49 of the 51 different organs and tissues found in GTEx; except for esophagus, mRNA levels are 2–3‐fold lower than for RGMA mRNAs (compare Figs. 3C and 2C). The majority of expressed RGMB mRNAs encode RGMB proteins, primarily the 478 amino acid species (isoform 1, Fig. 3D).
Figure 3
Human RGMB gene structure and expression. (A) Schematic of the human RGMB gene, depicting exons 1–5, and ATG and TAG codons. Exons are represented as boxes, with coding regions in black and non‐coding segments in white, and introns as horizontal lines. A scale bar is shown. (B) Diagrams of the three major classes of human RGMB mRNAs represent the following transcripts from the Ensembl genome browser: isoform 1, ENST00000308234.11; isoform 2, ENST00000513185.1; and isoform 3, ENST00000434027.2. The protein encoded by each transcript is listed to the right of each diagram. (C) RGMB gene expression in 10 different human tissues and organs. Data were obtained from the GTEx portal, and are graphed as TPM. Mean transcript abundance of GAPDH is listed to the right of each RGMB RNA level. (D) Relative expression of RGMB mRNAs in 10 different human organs and tissues (%) in the GTEx database. Transcripts 1–3 are illustrated in part B above, and mRNAs 4–7 are found in GTEx.
Human RGMC/HFE2/HJV, at ~4.5 kb in length, is substantially smaller than either RGMA or RGMB, and is composed of four exons and three introns (Fig. 4A). There are five major transcripts expressed in human organs and tissues, and they encode proteins of variable lengths, from 93 to 426 amino acids (Fig. 4B). Unlike RGMA or RGMB, RGMC/HFE2/HJV mRNAs can be detected only in human skeletal muscle, liver, and heart, and were found at steady‐state levels that were 9–25‐fold less abundant than GAPDH (Fig. 4C). Perhaps surprisingly, the RGMC/HFE2/HJV mRNA that encodes the full‐length 426‐residue RGMC/HJV protein (isoform 2) comprises only 10–20% of transcripts in human organs and tissues according to GTEx (Fig. 4D). The reasons for the low level of gene expression for isoform 2 are unknown, but could reflect differential RNA stability, or the technical conditions under which the tissues were obtained and RNA samples isolated and processed (see Discussion).
Figure 4
Human RGMC/HFE2/HJV gene structure and expression. (A) Schematic of the human RGMC/HFE2/HJV gene, illustrating exons 1–4, and ATG and TAA codons. Exons are depicted as boxes, with coding regions in black and non‐coding segments in white, and introns as horizontal lines. A scale bar is shown. (B) Diagrams of the five major classes of human RGMC/HFE2/HJV transcripts represent the following mRNAs from the Ensembl genome browser, respectively: isoform 1, ENST00000497365.5; isoform 2, ENST00000357836.5; isoform 3, ENST00000336751.10; isoform 4, ENST00000475797.1; and isoform 5, ENST00000421822.2. The protein encoded by each transcript is listed to the right of each diagram. (C) RGMC/HFE2/HJV gene expression in different human tissues and organs. Data were obtained from the GTEx portal, and are graphed as TPM. Mean transcript abundance of GAPDH is listed to the right of each RGMC/HFE2/HJV RNA level. The number of samples for each organ and tissue is as follows: skeletal muscle (564), liver (175), atrial appendage (297), and left ventricle (303). (D) Relative expression of RGMC/HFE2/HJV mRNAs in different human organs and tissues (%) found in the GTEx database.
Predicted variation in RGM proteins in human populations
ExAC contains DNA sequence information from the exons of genes from 60,706 people representing different population groups from around the world (Bahcall 2016; Lek et al. 2016; Ruderfer et al. 2016; Karczewski et al. 2017). The data have revealed substantial variation within the coding regions of genes in this large population, but also showed that most alterations were uncommon, as the majority were detected in a single allele, and over 99% were found in <1% of the study group (Lek et al. 2016). Most of this previously described variation consists of synonymous nucleotide changes and amino acid substitutions (Lek et al. 2016).

Examination of RGM family members in ExAC revealed a wide range of potential alterations in their exons, with most of the predicted changes consisting of missense mutations (92–96% of modified alleles, depending on the gene, Table 1). Second most common were changes in the reading frame, including inserted stop codons (1–7%, Table 1). Overall, the total number of different allelic variants per gene was similar for all RGM family members, and ranged from 143 for RGMB to 185 for RGMA, but their population frequency varied by a factor of 60, from 1.4% for RGMC/HFE2/HJV to 86% for RGMA, with the vast majority of changes being accounted for by just a few modifications (Fig. 5). As 99.1% of missense alleles were detected in ≤1% of the ExAC study population, overall results regarding the frequency of differences in the human RGM family proteins are consistent with the general conclusions from ExAC (Lek et al. 2016), with the exception of the few highly prevalent allelic variants depicted in Figure 5.
Table 1
Human population variation in RGMA, RGMB, and RGMC

| Protein | Number of codons¹ | Missense and in‐frame insertions‐deletions | Frame shifts; stop codons | Splicing site changes | Loss of start codon | Loss of stop codon | Total number of different changes | Variants occurring once | Total variant alleles in population |
|---|---|---|---|---|---|---|---|---|---|
| RGMA | 458 | 178 | 2 | 5 | 0 | 0 | 185 | 96 | 86.0% |
| RGMB | 478 | 134 | 6 | 3 | 0 | 0 | 143 | 78 | 9.1% |
| RGMC | 426 | 157 | 12 | 1 | 1 | 0 | 171 | 109 | 1.4% |

¹Based on transcripts used in the ExAC database. All RGMB and RGMC variants mapped to the 478 or 426 codons, respectively, corresponding to a full‐length protein. For RGMA, 17 variants were not counted as they mapped to a transcript corresponding to a smaller predicted protein of 61 residues that undergoes rapid decay.
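The percentage ranges quoted in the text can be recovered from the counts in Table 1. The short Python sketch below, written for illustration and using key names that are my own labels, checks that the variant classes add up to the reported totals and computes the share of each class per gene.

```python
# Counts transcribed from Table 1 (variant classes per gene).
table1 = {
    "RGMA": {"missense_inframe": 178, "frameshift_stop": 2, "splice": 5,
             "start_loss": 0, "stop_loss": 0, "total": 185},
    "RGMB": {"missense_inframe": 134, "frameshift_stop": 6, "splice": 3,
             "start_loss": 0, "stop_loss": 0, "total": 143},
    "RGMC": {"missense_inframe": 157, "frameshift_stop": 12, "splice": 1,
             "start_loss": 1, "stop_loss": 0, "total": 171},
}

shares = {}
for gene, row in table1.items():
    # The five classes should sum to the reported total for each gene.
    class_sum = sum(v for k, v in row.items() if k != "total")
    assert class_sum == row["total"], gene
    shares[gene] = {
        "missense_pct": round(100 * row["missense_inframe"] / row["total"], 1),
        "frameshift_stop_pct": round(100 * row["frameshift_stop"] / row["total"], 1),
    }

for gene, pct in shares.items():
    print(gene, pct)
# RGMA: 96.2% missense, 1.1% frameshift/stop
# RGMB: 93.7% missense, 4.2% frameshift/stop
# RGMC: 91.8% missense, 7.0% frameshift/stop
```

These values agree with the ranges of 92–96% and 1–7% given in the text, if those ranges are read as shares of the different variants listed for each gene.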
Figure 5
Population variation in human RGM proteins. (A–C) The three human RGM protein precursors are composed of four identifiable regions, termed the signal peptide (SP), N‐terminal RGM (N‐RGM domain), C‐terminal RGM (C‐RGM segment), which includes a partial von Willebrand factor type D domain (vWFD), and the glycosylphosphatidylinositol recognition sequence (GPI), which is cleaved as part of the biosynthetic steps leading to glycosylphosphatidylinositol addition to the maturing protein (Lebreton et al. 2018). The scale bar represents 100 amino acids. The location of the GDPH autocatalytic cleavage site (Siebold et al. 2017) is listed above each diagram. (A) Human RGMA highlighted by ExAC consists of a 458‐residue protein. The overall population prevalence of variant alleles for each segment of the protein is listed below the map, and the most common variants are illustrated in single letter amino acid code. (B) Human RGMB found in ExAC consists of 478 amino acids. The population prevalence of variant alleles for each segment of the protein is listed below the map, and the most common variants are depicted in single letter amino acid code. (C) Human RGMC/HJV consists of a 426‐residue protein in ExAC. The overall population prevalence of variant alleles for each segment of the protein is listed below the map, and the most common variants are shown in single letter amino acid code.
Population variation in RGMA
Alterations in RGMA have not been linked to date with the pathogenesis of any specific human diseases. Thus, the functional consequences in either human physiology or pathology of the three prevalent amino acid substitutions in the RGMA protein are not known (Fig. 5A): Leu4 to Pro in the signal peptide (8.5% of the population), the conservative substitution of Asp423 to Glu in the C‐terminal RGM domain (63.1%), and Ala439 to Val in the GPI‐anchor segment (11.9%).

As with some other proteins, a large number of alterations in RGMA have been found to be associated with a variety of different cancers, according to the analysis of data in the cBio portal for Cancer Genomics (Table 2) and the National Cancer Institute Genomic Data Commons portal, although the functional consequences are unknown. Potential mutations at 78 different locations in RGMA coding exons have been detected in 38 different neoplasms, with the prevalence of these changes ranging from 3.5% in ovarian cancer and 2.8% in esophageal, gastric, and small cell lung cancer, and in soft tissue sarcoma, to <0.3% in prostate and renal carcinoma, various leukemias and lymphomas, and others (see cancer type in: http://www.cbioportal.org/index.do?session_id=5b5f49be498eb8b3d5672991). The vast majority of alterations consisted of amino acid substitutions (76 different modifications at 71 different sites; Table 2), of which 43 at 32 locations were present in the ExAC population, although generally at low frequency (Table 2). However, three of the cancer‐associated amino acid substitutions were among the more common allelic variants in the population (Leu4 to Pro, 8.5% of ExAC alleles; Ala439 to Val, 11.9%; and Arg441 to Trp, 0.6%; Fig. 5A), and thus may have been detected by chance rather than through disease association. In contrast, the most prevalent allele in ExAC, Asp423 to Glu, seen in 63.1% of the population (Fig. 5A), was absent from all of the cancer studies compiled here (Table 2). Other changes associated with different neoplasms included premature stop codons and frame‐shifts, none of which were found in ExAC (seven examples, Table 2).
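The relationship between the allele counts listed in Table 2 and the percentages quoted in the text can be sketched as follows. The calculation assumes that every one of the 60,706 ExAC individuals was genotyped at each site, giving 2 × 60,706 = 121,412 alleles. This is an approximation, since the number of alleles actually called varies by site, so small differences from the quoted values are expected.

```python
EXAC_INDIVIDUALS = 60_706
# Assumption: two called alleles per individual at every site.
ASSUMED_ALLELE_NUMBER = 2 * EXAC_INDIVIDUALS  # 121,412

# Allele counts from Table 2 for the prevalent substitutions named in the text.
allele_counts = {"L4P": 10396, "A439V": 14451, "R441W": 668}


def allele_frequency_percent(count, allele_number=ASSUMED_ALLELE_NUMBER):
    """Variant allele frequency as a percentage of all alleles at the site."""
    return 100.0 * count / allele_number


frequencies = {name: allele_frequency_percent(n) for name, n in allele_counts.items()}
for name, pct in frequencies.items():
    print(f"{name}: {pct:.2f}%")
```

Under this assumption the three variants come out at roughly 8.6%, 11.9%, and 0.55%, close to the 8.5%, 11.9%, and 0.6% reported above.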
Table 2
Cancer‐associated mutations in RGMA¹

| Mutation | Population variant | ExAC prevalence |
|---|---|---|
| L4P | L4P | 10396 alleles |
| L16V | None | – |
| R21P | None | – |
| M27I | None | – |
| G30E | None | – |
| S34stop | None | – |
| F38L | F38S | 1 allele |
| P40S | None | – |
| A43D | A43V | 1 allele |
| F44L | F44C | 3 alleles |
| P55L | P55L | 15 alleles |
| G71C | None | – |
| D79G | D79N, D79Y | 1, 1 allele |
| P81L | None | – |
| R88H | R88H | 1 allele |
| R95L, R95Q, R95W | R95Q | 7 alleles |
| R96W | R96Q, R96W | 3, 6 alleles |
| T97M | None | – |
| D104N | None | – |
| H108N | None | – |
| S119T | None | – |
| R133C | R133G, R133H | 5, 8 alleles |
| R135H | R135C, R135H | 4, 5 alleles |
| P139L | P139S, P139T | 21, 143 alleles |
| E145Q | None | – |
| E151K | E151K | 1 allele |
| E156K | None | – |
| P164S | None | – |
| H170Tfs34stop | None | – |
| G175R | G175R | 1 allele |
| T181I | None | – |
| T189I | None | – |
| P196L | None | – |
| N204S | N204K | 1 allele |
| P211H | None | – |
| A217V | A217V | 3 alleles |
| Q231R | None | – |
| E233A, E233K | None | – |
| P248S | P248L | 1 allele |
| V252L | V252M | 5 alleles |
| K256stop | None | – |
| S266N | None | – |
| E271K, E271stop | None | – |
| A281D | None | – |
| V290M | None | – |
| R292C | None | – |
| V302A, V302I | V302I | 13 alleles |
| M304I | None | – |
| V309A | None | – |
| E313K | None | – |
| W315L | W315G | 1 allele |
| R325W | R325Q, R325W | 1, 4 alleles |
| G326V | None | – |
| G346S | G346S | 1 allele |
| R348H | R348C, R348H | 15, 42 alleles |
| L350M | None | – |
| A353T | A353T | 2 alleles |
| P357S | None | – |
| E3613Rfs361 | None | – |
| V369M | V369L, V369M | 6, 1 alleles |
| C372stop | C372R, C372Y | 1, 1 alleles |
| V378A | V378M | 1 allele |
| E379stop | None | – |
| V387I | None | – |
| D389N | None | – |
| D395N | None | – |
| V396M | None | – |
| A402T | None | – |
| V404M | None | – |
| A405V | None | – |
| L406F | None | – |
| L412I | None | – |
| P429L | None | – |
| A432V | A432V | 4 alleles |
| A439V | A439G, A439V | 1, 14451 alleles |
| R441T | R441Q, R441W | 34, 668 alleles |
| P442A | P442L | 1 allele |
| A446T | A446P, A446T | 1, 7 alleles |

¹Amino acid positions modified to agree with ExAC assignments (see Text).
Cancer‐associated mutations in RGMA1Amino acid positions modified to agree with ExAC assignments (see Text).
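The population percentages quoted in the text for RGMA can be approximated from the allele counts in Table 2. The following is a minimal sketch, not the procedure used in the paper: it assumes a single denominator of 121,412 ExAC alleles (the total quoted in the Discussion) for every site, whereas the number of alleles actually genotyped may differ from site to site. The function name `allele_frequency_percent` is hypothetical.

```python
# Allele counts copied from the "ExAC prevalence" column of Table 2.
# ASSUMPTION: every site was genotyped in all 121,412 ExAC alleles.
EXAC_ALLELES = 121_412

counts = {
    "L4P": 10396,    # Leu4 to Pro
    "A439V": 14451,  # Ala439 to Val
    "R441W": 668,    # Arg441 to Trp
}


def allele_frequency_percent(count, total=EXAC_ALLELES):
    """Return the allele frequency as a percentage of all sampled alleles."""
    return 100.0 * count / total


# Round to one decimal place, as in the text.
frequencies = {name: round(allele_frequency_percent(n), 1) for name, n in counts.items()}

for name, pct in frequencies.items():
    print(f"{name}: {pct}% of ExAC alleles")
```

Under this assumption the computed values fall within about 0.1 percentage point of the figures given in the text; small differences are expected from rounding and from site-specific allele numbers.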
Population variation in RGMB
Changes in RGMB also have not been connected to the pathogenesis of any human disease to date. As with RGMA, the functional consequences for human physiology or pathology of the single predicted amino acid substitution in RGMB that is prevalent in the ExAC population (Ser63 to Arg in the signal peptide (7.8%), Fig. 5B) are unknown.

Changes in RGMB also have been detected in a number of different cancers (Table 3), but as with RGMA, the possible functional impacts are not known. Potential mutations have been identified at 69 different locations in coding portions of the RGMB gene in 38 cancer studies, with the prevalence of these changes ranging from nearly 10% in prostate cancer and 7.5% in adrenocortical carcinoma to 0.3% or less in cervical, thyroid, bone, skin, and brain cancers, in leukemias and lymphomas, and in other neoplasms (see cancer type in: http://www.cbioportal.org/index.do?session_id=5b609381498eb8b3d5672df4). Most of the alterations consisted of amino acid substitutions (73 different modifications; Table 3), of which 26 at 19 sites were identified in ExAC at a frequency of 0.001–0.1% (Table 3), except for the highly prevalent Ser63 to Arg allele at 7.8% (Fig. 5B). The other 18 changes, which included both premature stop codons and frame‐shifts that led to stop codons, were not found in ExAC (Table 3).
Table 3
Cancer‐associated mutations in RGMB¹

| Mutation | Population variant | ExAC prevalence |
| --- | --- | --- |
| R49Kfs51stop | None | – |
| S63R | S63R | 9454 alleles |
| Q90stop | None | – |
| A93Nfs14stop | A93T | 1 allele |
| Q94H | None | – |
| R96Q | R96Q, R96stop | 132, 1 alleles |
| S104R | None | – |
| V105E | None | – |
| H110P | None | – |
| E120stop | None | – |
| E121Vfs34stop | E121A, E121stop | 1, 1 alleles |
| R127C | R127H | 2 alleles |
| R135Q | R135Q | 1 allele |
| C140Y | None | – |
| R141H | R141C, R141H | 2, 11 alleles |
| N143S | None | – |
| V145L, V145_Y105insL | None | – |
| H147D | None | – |
| L151F | None | – |
| L156H | None | – |
| Q159H | None | – |
| R160M | None | – |
| G166stop | None | – |
| H183Qfs35stop | None | – |
| E190Tfs28stop | None | – |
| L206F | None | – |
| L230I | None | – |
| N233D | N233T | 1 allele |
| N234del | None | – |
| V243I | V243E, V243I | 8, 4 alleles |
| P244H | None | – |
| G248E | None | – |
| X257_splice | | |
| A263G | None | – |
| C267Y | None | – |
| T268K | None | – |
| Y273stop | None | – |
| A283T | A283T | 1 allele |
| G287D | None | – |
| G292R, G292V | None | – |
| R300H | R300C, R300H | 4, 4 alleles |
| V302L, V302M | V302M | 8 alleles |
| G307A | None | – |
| A314P | A314T, A314V | 1, 2 alleles |
| R328C, R328H | R328C | 1 allele |
| R335C | None | – |
| A341V | None | – |
| Q351K | None | – |
| E361K | None | – |
| L392Gfs9stop | None | – |
| Q398E | None | – |
| E401Q | None | – |
| P404S | None | – |
| Y409D | Y409C | 1 allele |
| F415V | None | – |
| T420N | None | – |
| F425L | None | – |
| A428T | A428P, A428T | 2, 5 alleles |
| L433Gfs9stop | None | – |
| E3434stop | None | – |
| A438V | None | – |
| K443N | None | – |
| S451N | None | – |
| N454Kfs9stop | N454S | 3 alleles |
| T456I | None | – |
| R458H | R458H | 4 alleles |
| L463stop | L463F | 2 alleles |
| T470Nfs33stop | None | – |
| L478stop | None | – |

¹Amino acid positions modified to agree with ExAC assignments (see Text).
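When reading Table 3, it helps to distinguish a cancer‐associated change that is identical to an ExAC variant from one that merely affects the same codon. The sketch below illustrates that distinction on four rows copied from the table; the helper names `codon` and `classify` and the category labels are hypothetical and are not taken from the paper or from any database interface.

```python
import re

# Four rows copied from Table 3: cancer mutation -> ExAC variants at that codon.
rows = {
    "S63R": ["S63R"],
    "R127C": ["R127H"],
    "R141H": ["R141C", "R141H"],
    "C140Y": [],  # "None" in the table
}


def codon(label):
    """Extract the residue number from a label such as 'R141H'."""
    match = re.match(r"[A-Z](\d+)", label)
    return int(match.group(1)) if match else None


def classify(cancer_mutation, exac_variants):
    """Describe how a cancer mutation overlaps with ExAC variants."""
    if cancer_mutation in exac_variants:
        return "identical change in ExAC"
    if any(codon(v) == codon(cancer_mutation) for v in exac_variants):
        return "same codon, different change"
    return "not in ExAC"


overlap = {mutation: classify(mutation, variants) for mutation, variants in rows.items()}

for mutation, status in overlap.items():
    print(f"{mutation}: {status}")
```

Keeping the two kinds of overlap separate matters when counting, because a tally of sites with any ExAC variant can be larger than a tally of exact amino acid matches.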
Disease links and population variation in RGMC/HFE2/HJV
Unlike other members of the human RGM family, RGMC/HFE2/HJV was first characterized as the gene associated with the severe iron storage disease, juvenile hemochromatosis (Papanikolaou et al. 2004), and identification of mutations in the gene in affected individuals defined causality (Lanzara et al. 2004; Papanikolaou et al. 2004; Gehrke et al. 2005), which was confirmed by mouse gene knockout models (Huang et al. 2005; Niederkofler et al. 2005). The majority of the more than 40 different mutations found in individuals with juvenile hemochromatosis are amino acid substitutions, but more than a third predict truncated proteins because of introduced premature stop codons (Table 4). Almost half of these disease‐associated alleles can be found in the ExAC population, but nearly all are present at very low prevalences of 0.001–0.025% (Table 4). The only exception, Ala310 to Gly, is the most common RGMC/HFE2/HJV variant in ExAC, and has a population frequency of 0.7% (Fig. 5C).
Table 4
Juvenile hemochromatosis‐linked mutations in RGMC/HJV

| Mutation | Population variant | ExAC prevalence |
| --- | --- | --- |
| Q6H | Q6H | 3 alleles |
| L27fs51stop | None | – |
| R54stop | None | – |
| G66stop | None | – |
| V74fs113stop | None | – |
| C80R | C80R | 1 allele |
| S85P | None | – |
| G99R, G99V | None | – |
| L101P | L101P | 1 allele |
| C119F | None | – |
| R131fs245stop | R131W | 1 allele |
| D149fs245stop | None | – |
| L165stop | L165stop | 1 allele |
| A168D | A168V | 1 allele |
| F170S | None | – |
| D172E | D172E | 1 allele |
| R176C | None | – |
| W191C | None | – |
| L194P | L194P | 1 allele |
| N196K | None | – |
| S205R | None | – |
| I222N | I222M, I222N | 1, 1 alleles |
| K234stop | None | – |
| D249H | None | – |
| G250V | None | – |
| N269fs311stop | N269S | 1 allele |
| I281T | None | – |
| R288W, R288Y | R288Q, R288W | 1, 2 alleles |
| E302K | E302D, E302K | 1, 32 alleles |
| A310G | A310G | 846 alleles |
| Q312stop | None | – |
| G319fs341stop | G319A | 3 alleles |
| G320V | G320V, G320W | 21, 2 alleles |
| C321W, C321stop | C321W, C321Y, C321stop | 2, 1, 1 alleles |
| R326stop | R326Q, R326stop | 5, 2 alleles |
| S328fs337stop | S328T | 1 allele |
| R335Q | R335Q, R335W | 9, 1 alleles |
| C361fs366stop | None | – |
| N372D | N372D, N372H | 1, 1 alleles |
| R385stop | R385G, R385Q, R385stop | 1, 2, 1 alleles |
Potential alterations in RGMC/HFE2/HJV also are present in different cancers, but as with RGMA and RGMB, the possible functional consequences have not been determined. Predicted mutations (116, Table 5) have been identified at 102 different codons in 38 different neoplastic diseases, with the prevalence of these alterations ranging from 25% in prostate cancer, 10% in ovarian cancer, and 8.4% in melanoma, to 0.6% or less in colorectal carcinoma, salivary gland and renal cancer, leukemia, lymphomas, and others (see http://www.cbioportal.org/index.do?session_id=5b60fc90498eb8b3d5672fba). Putative amino acid substitutions or deletions predominated (106 different modifications at 92 locations; Table 5). Only 11 of these alterations were present in ExAC, with 9 having allelic frequencies of <0.002% (Table 5), and the other two, a deletion and a duplication of Gly69, at 0.06 and 0.13%, respectively (Table 5). The other 10 changes consisted of premature stop codons and frame‐shifts, and except for Arg385 to stop codon were not found in ExAC (Table 5).
Table 5
Cancer‐associated mutations in RGMC/HJV

| Mutation | Population variant | ExAC prevalence |
| --- | --- | --- |
| G2V | None | – |
| P8L, P8S, P8T | None | – |
| G15D | None | – |
| L20I | None | – |
| T22N | None | – |
| L25I | None | – |
| L27M | None | – |
| L28I | None | – |
| L29I | None | – |
| S35F | None | – |
| I39T | None | – |
| R41C | R41C, R41L | 1, 1 allele |
| V47L | None | – |
| A61E | None | – |
| G67E | None | – |
| G69del | G69del, G69dup | 76, 154 alleles |
| Y86S | None | – |
| A94T | None | – |
| R95H | R95G | 1 allele |
| D100E, D100N | D100H | 1 allele |
| F103L | None | – |
| S105Ffs45stop | None | – |
| I110V | I110M | 1 allele |
| D112Y | None | – |
| M114I | None | – |
| I115L, I115M | None | – |
| Q116K | None | – |
| N118Y | N118S | 1 allele |
| Q122K | None | – |
| P129S | P129L | 9 alleles |
| P133L | None | – |
| P136S | None | – |
| G141D | None | – |
| A144V | A144S, A144T | 1, 1 allele |
| E151K | None | – |
| G159D | G159S | 2 alleles |
| R160C, R160H | None | – |
| F164del | None | – |
| R176C, R176H | None | – |
| N196H, N196S | None | – |
| S206F | None | – |
| M208Wfs38stop | M208V, M208T, M208W | 1, 1, 1 allele |
| A209V | None | – |
| L210S | None | – |
| T215I | T215A | 1 allele |
| R218Q, R218W | None | – |
| T221S | None | – |
| K225N | None | – |
| M227T | None | – |
| I231V | I231T | 2 alleles |
| E239Q | E239G | 4 alleles |
| L243F | None | – |
| D249G | None | – |
| S251Y | None | – |
| G260E, G260R | None | – |
| S261Ifs9stop | None | – |
| S262G | None | – |
| L263F | None | – |
| S264L | S264L | 2 alleles |
| Q266stop | None | – |
| N269K | N269S | 1 allele |
| Y280N, Y280Hfs25stop, Y280Hfs31stop | None | – |
| R288Q | R288Q, R288W | 1, 2 alleles |
| A305V | None | – |
| A307S | None | – |
| D313N | D313N | 2 alleles |
| C317W | None | – |
| C321Vfs21stop | C321Y, C321W, C321stop | 1, 2, 1 alleles |
| P323L | None | – |
| R329Q | R329L, R329P, R329Q, R329stop | 2, 2, 2, 1 alleles |
| S330L | None | – |
| E331D | E331Q | 1 allele |
| R332H | R332C, R332H | 1, 5 alleles |
| N333K | N333S | 1 allele |
| R334H | R334H | 7 alleles |
| T339S | T339N | 1 allele |
| I340T | None | – |
| R345W | R345Q, R345W | 3, 1 alleles |
| K348N, K348R | None | – |
| E349K | None | – |
| S360F | None | – |
| S368Y | None | – |
| P371S | P371L | 1 allele |
| F373C | None | – |
| A376E | None | – |
| A379T | A379E | 1 allele |
| R385stop | R385G, R385Q, R385stop | 1, 2, 1 alleles |
| L396F | None | – |
| P398L, P398S | None | – |
| D400V | None | – |
| A401V | None | – |
| G402E | None | – |
| V403A | V403I | 1 allele |
| S406F | None | – |
| L415F, L415H | None | – |
| S416F, S416Y | S416P | 2 alleles |
| L421M | None | – |
| W422stop | W422C | 1 allele |
| L423I | None | – |
| I425T | None | – |
| Q426stop | None | – |
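The tallies of substitutions, deletions, premature stop codons, and frame‐shifts reported for Table 5 depend on sorting each mutation label into a class. A minimal sketch of such a sorting step is shown below for a handful of labels copied from the table. The regular expressions are assumptions about the shorthand used in these tables (for example `S105Ffs45stop` for a frame‐shift), not a parser for a formal nomenclature, and the function name `mutation_class` is hypothetical.

```python
import re
from collections import Counter

# A few labels copied from the "Mutation" column of Table 5.
labels = ["G2V", "P8L", "G69del", "F164del", "S105Ffs45stop", "Q266stop", "R385stop"]


def mutation_class(label):
    """Assign a label to a broad class based on its shorthand notation."""
    # Frame-shift, e.g. S105Ffs45stop (the new residue letter is optional).
    if re.fullmatch(r"[A-Z]\d+[A-Z]?fs\d*(stop)?", label):
        return "frame-shift"
    # Premature stop codon, e.g. Q266stop.
    if re.fullmatch(r"[A-Z]\d+stop", label):
        return "premature stop"
    # Single-residue deletion, e.g. G69del.
    if re.fullmatch(r"[A-Z]\d+del", label):
        return "deletion"
    # Amino acid substitution, e.g. G2V.
    if re.fullmatch(r"[A-Z]\d+[A-Z]", label):
        return "substitution"
    return "other"


class_counts = Counter(mutation_class(label) for label in labels)

for name, n in class_counts.items():
    print(f"{name}: {n}")
```

Labels that do not fit any pattern fall into "other" and would need to be inspected by hand before being counted.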
Discussion
Information extracted from publicly available databases has been collected and analyzed here to gain insights into the genomics and population genetics of the RGM family in humans. Results identify extensive variation in gene expression patterns, substantial alternative RNA splicing, and a range of possible missense alterations and other modifications in the coding regions of each of the three genes studied, which were not apparent previously, and in many cases are detected in individuals with different types of cancers (Tables 2, 3, 5). In addition, the data show that selected amino acid substitutions are highly prevalent in the world's population, with minor allele frequencies of up to 37% for RGMA and up to 8% for RGMB (Fig. 5). Collectively, these results indicate that protein sequence variation is common in the human RGM family, as has been observed for some other human proteins (Rotwein 2017a,b), and it thus appears likely that these variants could have a significant population impact on human physiology and/or disease predisposition.
RGMA and RGMB: genes, mRNAs, and proteins
By combining information from the Ensembl and UCSC Genome Browsers with data extracted from GTEx, complex patterns of expression have been elucidated here for each human RGM gene, particularly in the distribution of different mRNA isoforms (Figs. 2, 3, 4). For example, these results now demonstrate that both RGMA and RGMB genes are widely expressed in many different adult human organs and tissues, with most of the transcripts encoding one of the several “full‐length” proteins, as differences among these isoforms are found primarily at the NH2‐terminus in the presumptive signal peptides (Figs. 2, 3). Although a few studies have examined possible effects of RGMA or RGMB in humans (Demicheva et al. 2015; Shi et al. 2015; Li et al. 2016; Muller et al. 2016), most publications to date have focused on experimental model systems (Matsunaga et al. 2004, 2006; Niederkofler et al. 2004; Rajagopalan et al. 2004; Samad et al. 2004; Hata et al. 2006; Tanabe and Yamashita 2014). Thus, these new observations will provide opportunities to develop new insights into RGMA and RGMB gene regulation and their protein functions in a variety of human physiological and pathological processes. Of particular note is that, according to GTEx, both RGMA and RGMB are expressed at similarly high transcript levels in the muscularis region of the esophagus and within the gastro‐esophageal junction (Figs. 2C, 3C, and not shown), raising the question of whether either or both proteins might be involved in aspects of smooth muscle function, such as its coordination by the sympathetic and parasympathetic nervous systems or other signals during swallowing or digestion of food (Woodland et al. 2013). As mRNAs encoding neogenin (NEO1) and BMP receptors (BMPR1A, BMPR1B, and BMPR2) also are expressed in these parts of the esophagus, it is conceivable that different RGM‐mediated signaling pathways could be active in different parts of this organ.

Another surprising observation with regard to RGMA and RGMB is their expression in a range of different cancers, with transcripts encoding mutant proteins being detected in up to 10% of cases of prostate cancer (RGMB) and in 3.5% of ovarian carcinomas (RGMA, see Results), again providing evidence for their unexplored roles in human disease. As the majority of these predicted mutations were found to be rare in the general population used in ExAC (although nearly all of the most highly prevalent amino acid substitution alleles were present; see Tables 2 and 3), these data argue for possible pathophysiological actions for RGMA and RGMB in human neoplasms, and represent another illustration in which focused analysis of information extracted from large‐scale databases can help identify new areas of investigation with possible biomedical consequences.
The special case of RGMC/HFE2/HJV
Data collected and assessed from Ensembl, the UCSC Genome Browser, and GTEx also have revealed some unexpected aspects of human RGMC/HFE2/HJV gene expression (Fig. 4). Even though restriction of transcripts to skeletal muscle, liver, and heart had been recognized previously (Kuninger et al. 2004; Papanikolaou et al. 2004; Schmidtmer and Engelkamp 2004), remarkably it now appears that only ~20% of RGMC/HFE2/HJV mRNAs found in human tissues encode the 426‐amino acid full‐length protein (Fig. 4D). The other mRNAs, which comprise the vast majority of transcripts in each tissue type (80 to 90%, Fig. 4D), encode proteins that are truncated at the NH2‐terminus. These latter species lack most of the N‐RGM domain (313‐residue isoform), all of the N‐RGM segment and the entire von Willebrand factor type D domain (200‐amino acid protein), or all but 93 amino acids in the center of the molecule (Figs. 4B, 5C). The observations also raise questions regarding which of these variant RGMC/HJV proteins are biologically active molecules, and what their presumptive activities are. In animal and cell‐based studies, several different‐length versions of RGMC/HJV have been noted, but these have been characterized as being derived from differential protein processing during biosynthesis, and from proteolytic cleavage of the mature GPI‐linked cell surface molecule either by pro‐protein convertases such as furin (Kuninger et al. 2006, 2008; Silvestri et al. 2008a), or by the serine protease, matriptase‐2 (Silvestri et al. 2008b). Thus, these new observations, which have resulted from analyses of information in databases, define a potentially novel and alternative way that different RGMC/HJV protein isoforms are produced in humans.

Unlike what is observed for RGMA and RGMB, presumptive RGMC/HJV protein variants within the ExAC population are very uncommon, collectively occurring in <1.5% of 60,706 genomes versus 86% for RGMA and 9% for RGMB (Fig. 5). Moreover, even though 17 of 43 amino acid substitution, frame‐shift, and stop codon mutations associated with juvenile hemochromatosis have been found in the ExAC study cohort, only a single disease‐associated allele is present in more than 0.025% of the population (Ala310 to Gly, at ~0.7%), and 13 are represented just 1–3 times in the 121,412 ExAC alleles (Table 4). This result suggests that any possible contribution of RGMC/HFE2/HJV heterozygosity toward iron overload in the general population is minimal, in marked contrast to the high prevalence of HFE protein variants, at least in European‐derived groups (Barton et al. 2015; Wallace and Subramaniam 2016).

As seen for RGMA and RGMB, predicted mutations of RGMC/HJV are found in many different cancers, with transcripts encoding mutant proteins being detected in 25% of prostate cancers, 10% of ovarian carcinomas, and 8.4% of melanomas (see Results). Remarkably, prostate and ovarian cancers are also the diseases in which mutant RGMB and RGMA molecules, respectively, have been found at highest prevalence (see Results and above). Moreover, only ~10% of the 106 different mutations in RGMC/HJV detected in cancers are present in ExAC, with all but one of them being rare (found fewer than 5 times) in the 121,412 alleles studied (Table 5).
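The point made above about heterozygosity for juvenile hemochromatosis alleles can be illustrated with a rough calculation for the one common allele, Ala310 to Gly. The sketch below is not part of the original analysis. It takes the A310G allele count from Table 4, divides by the 121,412 ExAC alleles, and then applies Hardy-Weinberg proportions, which is an added assumption (random mating, no selection, and the same allele denominator at this site as for the dataset overall).

```python
# Values from the text and Table 4.
EXAC_ALLELES = 121_412  # total ExAC alleles quoted in the text
A310G_COUNT = 846       # A310G allele count from Table 4

# Allele frequencies: q for the variant allele, p for the reference allele.
q = A310G_COUNT / EXAC_ALLELES
p = 1.0 - q

# ASSUMPTION: Hardy-Weinberg proportions hold for this site.
heterozygotes = 2 * p * q  # expected fraction carrying one copy
homozygotes = q * q        # expected fraction carrying two copies

print(f"A310G allele frequency: {100 * q:.2f}%")
print(f"expected heterozygotes: {100 * heterozygotes:.2f}%")
print(f"expected homozygotes:   {100 * homozygotes:.4f}%")
```

Under these assumptions roughly 1.4% of individuals would carry one copy of A310G, while homozygotes would be expected in fewer than 1 in 10,000 people, which is in line with the low overall frequency of RGMC/HJV variants noted in the text.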
Limitations and strengths of population‐based sequence data for understanding RGM actions
As with any large‐scale DNA or RNA‐based sequencing project, ExAC and GTEx, respectively, contain the potential materials for new biological and biomedical applications, as well as errors and ambiguities. From the perspective of the three RGM family genes, potential problems include the choice of minor transcripts as the reference sequences for proteins. This is especially true for RGMA, in which the mRNA species encoding the 458‐amino acid protein isoform selected by ExAC (see Table 1) appears to comprise ≤2% of transcripts in human organs and tissues in GTEx (isoform 5, Fig. 2D). In contrast, for RGMB, the predominant transcript in 9 of the 10 tissues surveyed in GTEx encodes the major 478‐residue protein species (all but testes, Fig. 3D). Another complication here is the potential variation in RNA quality in GTEx samples, especially since both the time from tissue harvesting to RNA extraction and the methods employed to isolate RNA are unknown. It thus seems possible that transcript degradation may skew the results seen in GTEx RNA‐sequencing libraries derived from at least some of the different organs and tissues. Furthermore, as the population distribution of the GTEx dataset is unknown, there are no data to determine whether or not expression of different mRNA isoforms varies among different groups, perhaps in conjunction with population‐specific DNA polymorphisms (Khera et al. 2018; Yengo et al. 2018). Other limitations that could contribute to problems in data interpretation include the potentially non‐representative nature of the ExAC study population, as over 60% of samples are derived from European individuals, with ~20% from South or East Asians, and only ~8% each from Hispanic or African groups (Lek et al. 2016). Thus, the actual rate and potential extent of variation among RGM proteins have not yet been established fully, and could change once exome sequencing data are obtained from more individuals and are expanded to include larger numbers of people from different human population groups. Moreover, there is an undefined but probable error rate associated with nucleotide changes that appear only once or just a few times in the 121,412 ExAC chromosomes studied.

Despite these challenges and difficulties, the data in ExAC, GTEx, and the various cancer medicine portals examined here provide potentially exciting new opportunities to evaluate the contributions of the RGM family, and RGMA and RGMB in particular, to human physiology and disease. Since RGMA and RGMB are expressed in the vast majority of adult human organs and tissues (48 of 51 for RGMA and 49 of 51 for RGMB), the encoded proteins are likely to be involved in some regulatory processes. Perhaps immune cell function is one of these areas, since RGMA is expressed in dendritic cells and neogenin is found in CD4+ T lymphocytes (Muramatsu et al. 2011).

Modern human populations represent the outcomes of many interactions over long time frames with different ancestral groups. Not only do the DNA marks in our genomes derived from extinct populations such as Neanderthals, Denisovans, and others document these past relationships (Jones et al. 2015; Vattathil and Akey 2015; Clarkson et al. 2017; Hublin et al. 2017), but some of the introgressed DNA continues to influence human physiology or disease susceptibility to the present day (Dannemann and Kelso 2017; Prufer et al. 2017). Opportunities abound to use the data in ExAC, GTEx, and other large‐scale population‐based repositories such as the UK Biobank (Khera et al. 2018; Yengo et al. 2018) as the springboard toward developing novel and medically important research questions with high biological and biomedical significance.
Conflict of Interest
The author has no perceived or potential conflict of interest, financial or otherwise.
Authors: Tarek A Samad; Anuradha Rebbapragada; Esther Bell; Ying Zhang; Yisrael Sidis; Sung-Jin Jeong; Jason A Campagna; Stephen Perusini; David A Fabrizio; Alan L Schneyer; Herbert Y Lin; Ali H Brivanlou; Liliana Attisano; Clifford J Woolf Journal: J Biol Chem Date: 2005-01-25 Impact factor: 5.157
Authors: Jean-Jacques Hublin; Abdelouahed Ben-Ncer; Shara E Bailey; Sarah E Freidline; Simon Neubauer; Matthew M Skinner; Inga Bergmann; Adeline Le Cabec; Stefano Benazzi; Katerina Harvati; Philipp Gunz Journal: Nature Date: 2017-06-07 Impact factor: 49.962
Authors: Franklin W Huang; Jack L Pinkus; Geraldine S Pinkus; Mark D Fleming; Nancy C Andrews Journal: J Clin Invest Date: 2005-08 Impact factor: 14.808
Authors: Douglas M Ruderfer; Tymor Hamamsy; Monkol Lek; Konrad J Karczewski; David Kavanagh; Kaitlin E Samocha; Mark J Daly; Daniel G MacArthur; Menachem Fromer; Shaun M Purcell Journal: Nat Genet Date: 2016-08-17 Impact factor: 38.330
Authors: Amit V Khera; Mark Chaffin; Krishna G Aragam; Mary E Haas; Carolina Roselli; Seung Hoan Choi; Pradeep Natarajan; Eric S Lander; Steven A Lubitz; Patrick T Ellinor; Sekar Kathiresan Journal: Nat Genet Date: 2018-08-13 Impact factor: 38.330