
Genetic Susceptibility to Rhodococcus equi.

C M McQueen¹, S V Dindot², M J Foster³, N D Cohen¹.

Abstract

Rhodococcus equi pneumonia is a major cause of morbidity and mortality in neonatal foals. Much effort has been made to identify preventative measures and new treatments for R. equi with limited success. With a growing focus in the medical community on understanding the genetic basis of disease susceptibility, investigators have begun to evaluate the interaction of the genetics of the foal with R. equi. This review describes past efforts to understand the genetic basis underlying R. equi susceptibility and tolerance. It also highlights the genetic technology available to study horses and describes the use of this technology in investigating R. equi. This review provides readers with a foundational understanding of candidate gene approaches, single nucleotide polymorphism-based, and copy number variant-based genome-wide association studies, and next generation sequencing (both DNA and RNA).


Keywords: Copy number variants; Genome-wide association study; Horses; Pneumonia; Rhodococcus equi


Year: 2015 PMID: 26340305 PMCID: PMC4895676 DOI: 10.1111/jvim.13616

Source DB: PubMed Journal: J Vet Intern Med ISSN: 0891-6640 Impact factor: 3.333

Abbreviations

aCGH: array comparative genomic hybridization
bp: base‐pairs
cDNA: complementary DNA
CFU: colony forming units
CNV: copy number variant
FoSTeS: fork stalling and template switching
GWAS: genome‐wide association study
IL: interleukin
kb: kilobases
LD: linkage disequilibrium
MMEJ: microhomology‐mediated end‐joining
NAHR: non‐allelic homologous recombination
NGS: next‐generation sequencing
NHEJ: non‐homologous end joining
SLC11A1: solute carrier family 11 member 1
SNP: single nucleotide polymorphism
TBA: tracheobronchial aspirate
TRPM2: transient receptor potential cation channel, subfamily M, member 2
Tf: transferrin gene
VapA: virulence‐associated protein A

Rhodococcus equi pneumonia is an important disease of foals most commonly characterized by chronic progression associated with development of large pulmonary abscesses.1 Treatment of R. equi pneumonia is prolonged and expensive, and prevention is limited because transfusion of hyperimmune plasma is incompletely effective,2 chemoprophylaxis is inconsistently effective3, 4 and may promote antimicrobial resistance,5 and no effective vaccine is currently available.6 Isolates of R. equi that are virulent in foals express the virulence‐associated protein A (VapA), which is encoded by a gene, located within a pathogenicity island, on an approximately 85‐ to 90‐kilobase (kb) plasmid. Expression of VapA alone, however, is not sufficient to cause disease.7, 8 Many different strains of virulent (and avirulent) R. equi have been shown to be present in a common environment (ie, the same farm), and multiple genotypic virulent strains may exist even in an individual foal with R. equi pneumonia.5, 9, 10, 11 Although exposure to R. equi is widespread in farms where foals are affected, only a variable proportion of the foals will develop clinical disease at a given farm, whereas other foals at the same location will not develop disease.12, 13 In addition, anecdotal reports by veterinarians indicate that some mares recurrently have affected foals, whereas other mares from the same environment consistently have foals that do not develop R. equi pneumonia. Taken together, these findings support the possibility of an important role for a genetic predisposition (ie, susceptibility, resistance, or tolerance) to the development of R. equi pneumonia and have prompted investigations of the genetic basis for this disease.

Pneumonia caused by R. equi is a complex trait. Thus, it is unlikely that it will have a monogenic basis. Nevertheless, studying the genetic basis of R. equi pneumonia is important because it could reveal information about crucial biological processes and pathways that influence the outcome of infection in foals, and identifying these pathways and processes might consequently lead to novel approaches for diagnostic testing, treatment, control, and prevention.

The purpose of this report is to review what is known about the genetic predisposition to pneumonia caused by R. equi in foals and to describe the genetic techniques currently available to study the genetic determinants for the development of R. equi pneumonia or other diseases in horses. We begin by summarizing the current literature regarding genetic associations with R. equi pneumonia, and then discuss some more advanced genetic tools available for future studies to further investigate the genetic basis of R. equi pneumonia.

Literature Search

A literature search was conducted to identify English language studies from any year that focused on foals, R. equi, and genetics. Databases were searched in April 2014 through Ovid including CAB Abstracts, MEDLINE, Embase, and BIOSIS. Search words included (foals or equus or equine) and (r equi or Rhodococcus equi*) and gene*, where the asterisk indicates truncation. Within each database, appropriate subject headings or index terms also were added. A total of 744 articles were retrieved and deduplicated, with 5 articles selected for inclusion. This search was updated in September 2014.

Candidate Genes

We identified 5 studies that attempted to identify genes associated with R. equi pneumonia. Four of these 5 studies have utilized a candidate gene approach (Table 1). The candidate gene approach involves either use of prior knowledge pertaining to known gene functions that might predispose to the disease of interest (eg, the interferon‐gamma pathway and R. equi pneumonia)14, 15 or use of genes implicated in other but similar diseases (eg, genes important for Mycobacterium tuberculosis which, like R. equi, is a gram‐positive, facultative intracellular organism that replicates primarily within macrophages and causes pneumonia) that could be potential candidates for involvement in R. equi pneumonia.16, 17, 18 To the authors' knowledge, the first candidate gene association study of R. equi pneumonia compared the frequencies of single nucleotide polymorphisms (SNPs) in the transferrin gene (Tf) among Thoroughbred foals from Kentucky that died of R. equi pneumonia with those of control Thoroughbred mares.19 In that study, the Tf gene was selected on the basis of its product's ability to bind iron because iron sequestration is a known host defense mechanism against bacterial replication.20, 21 The authors postulated that polymorphisms in the Tf gene might result in enhanced (or decreased) iron binding, which then could confer a selective advantage (or disadvantage) to survive infections with bacteria such as R. equi.19 The authors used SNP frequencies to infer Tf alleles present within the study population, and allele frequencies were subsequently compared between the case and control groups. The authors documented a significant (P < 0.05) abundance of the Tf F allele and a deficiency of the D allele among the cases (diseased foals) when compared with controls.
Limitations of this study included the fact that the sample size was relatively small, it was restricted to a single breed, a separate population for validation was not included, and no mechanistic studies (ie, documentation that the F allele was associated with decreased iron‐binding) were incorporated or cited. Nonetheless, a significant association of SNPs in the Tf gene with R. equi pneumonia was demonstrated, and this finding represented an important advance in knowledge.
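
The allele-frequency comparison between cases and controls described above can be illustrated with a small sketch. It is not the analysis used in the cited study, whose statistical method is not stated here. It assumes a 2 x 2 table of allele counts (allele of interest versus all other alleles, in cases versus controls) and applies a two-sided Fisher exact test. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are allele classes
    (allele of interest, all other alleles).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x copies of the allele among cases
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical allele counts, for illustration only
cases_f, cases_other = 30, 20
controls_f, controls_other = 40, 78
p_value = fisher_exact_two_sided(cases_f, cases_other, controls_f, controls_other)
print(f"P = {p_value:.4g}")
```

Each horse contributes 2 alleles, so counts in such a table are alleles rather than animals. An exact test is used in the sketch because candidate gene studies of this kind often have small samples.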

Table 1

Genetic studies of Rhodococcus equi pneumonia in foals

Author	Study design	Country	Breed(s)	Number of horses	Markers investigated	Observed outcome	Findings
Mousel et al.19	Candidate gene	United States	Thoroughbred	N = 84	Tf SNPs	Clinical pneumonia or control	Allelic association of Tf with disease
Horin et al.22	Candidate gene	Czech Republic	Thoroughbred	N = 51	SNPs, Microsatellites	Burden of R. equi in TBA fluid	Association of IL1RN and IL1β with burden of R. equi
Halbert et al.29	Candidate gene	United States	Arabian and Thoroughbred	N = 103	SLC11A1 SNPs	Clinical pneumonia or control	Variation in SLC11A1 associated with disease
Horin et al.30	Candidate gene	Czech Republic	Thoroughbred	N = 51	SNPs	Burden of R. equi in TBA fluid	Association of IL7R with burden of R. equi
McQueen et al.52	GWAS	United States	Quarter Horse	N = 72 (SNP array); N = 248 (study population)	Genome‐wide SNPs	Clinical pneumonia, subclinical pneumonia, or control	SNP in TRPM2 associated with disease

A later study seeking to identify a genetic predisposition to R. equi pneumonia utilized the candidate gene approach by comparing frequencies of 22 genetic markers among 51 Thoroughbred foals from the Czech Republic.22 These markers were either SNPs or polymorphic microsatellites in or near immune‐related genes that had been previously identified (except for 5 markers that were first identified in this study). No genetic variants were significantly associated with R. equi pneumonia, but some genetic variants were significantly associated with a higher burden of R. equi in tracheobronchial aspirate (TBA) fluid from foals. Specifically, loci on chromosomes 10 and 15 were associated with R. equi infection when comparing the subset of foals with extreme phenotypes (ie, foals with the highest numbers of R. equi in TBA fluid) to those with no R. equi. The strongest association with TBA fluid phenotype was for the microsatellite HMS01 located on chromosome 15, which harbors the genes for interleukin (IL)‐1β (IL1β) and the IL‐1 receptor antagonist (IL1RN). Although the associations were relatively weak and the phenotype was for burden of R. equi in TBA fluid (rather than for pneumonia caused by R. equi), these results offer further evidence of a genetic basis for host response to infection with R. equi. A third study from our laboratory utilized previous findings indicating association between the solute carrier family 11 member 1 gene (SLC11A1) and susceptibility to intracellular bacterial infections in other species of animals.23, 24, 25, 26 The SLC11A1 gene encodes a protein relevant to innate immune responses to intracellular bacteria.27, 28 Direct sequencing of the beginning of the gene transcript (ie, the 5′ end of the gene) was used to identify SNPs that were compared between cases of R. equi pneumonia and unaffected foals (controls) among Arabian horses at 2 farms (1 farm in Texas and 1 farm in Arizona).
A novel SNP, ‐57T, in the 5′ untranslated region (UTR) was significantly associated with R. equi pneumonia.29 The authors further screened for this polymorphism in 5 domestic horse breeds, donkeys, and zebras, and found that it was represented in 4 of the 5 horse breeds. The observation that this SNP was represented across multiple breeds strengthened the study's findings because if a marker were present in only 1 breed, it would be possible (if not probable) that the identified marker was associated with breed differences rather than with disease. Limitations of this study included the fact that association between the candidate gene and disease was only assessed within a single breed at 2 farms, and no validation of the association in another population was conducted. In addition, inconsistencies in diagnostic practices for R. equi pneumonia among farms in the study existed, which might have impacted the results.29 A fourth candidate gene study investigated the frequency of SNPs in selected immune response genes from DNA samples collected from 31 Thoroughbred foals from the Czech Republic30 that had been used in a previous candidate gene study (described above).22 The candidate markers were used to assess allele frequencies between the groups of foals classified on the basis of a binary outcome using a cut‐point of >5,000 colony forming units (CFU)/mL of vapA‐positive R. equi in TBA fluid. Twenty‐five foals were categorized as below the cut‐point because they had no bacteria cultured from them, and 6 were categorized as above the cut‐point.30 An association was identified between a SNP in the IL‐7 receptor (IL7R) gene and the presence of >5,000 CFU of R. equi in TBA fluid.
Limitations of this study included lack of a validation population in which the association could be replicated, and, similar to the earlier study using these foals, the association was not made between the marker and disease but rather between the marker and the concentrations of bacteria present in TBA fluid. Regardless of these limitations, this study was scientifically important in that it implicated the IL7R gene in particular, and innate immunity in general, as having a role in host response to infection with R. equi. The candidate gene approach is a valid method for genetic investigation and yielded positive associations in the aforementioned studies, strengthening the plausibility of a genetic contribution to susceptibility or resistance to R. equi pneumonia in foals. Moreover, the commonality of identifying innate immune responses as playing a role in host defense against infection with R. equi in these various candidate gene association studies is important to our understanding of the pathogenesis of R. equi pneumonia. Despite these positive results, the candidate gene approach has important limitations for making genetic associations. Bias is introduced into the study design by selecting a small number of genes for evaluation, on the basis of either function of the gene product(s) or prior association of the gene with disease. This selection process effectively eliminates the ability of the investigators to examine either the enormous amount of genetic information in the remainder of the genome or the relationship and interaction of other genes with the genes of interest.17 Other genetic elements present in the genome (eg, sites critical to gene regulation) are missed by restricting analysis to candidate genes, because in most cases probes used to detect variation are not near each other and offer no information about neighboring genetic variation. Assessing variation across the genome circumvents these limitations of the candidate gene approach.
Genome‐wide studies in horses are now feasible because of recent technological developments.

Genome‐Wide Association Studies

Genome‐wide association studies (GWASs) rapidly gained popularity after the sequencing of the genomes of several animal species, including human beings.31, 32, 33, 34 The completion of the sequencing and assembly of these reference genomes (an assembly of the DNA sequence and its chromosomal locations representing the genetic baseline of a species) provided a tool that could be used as a map indicating where elements of the genome reside. Substantial efforts were then made to catalog the locations of the genes and genetic variation identified within these species.35, 36 Single nucleotide polymorphisms proved useful for characterizing the genetic variation among individuals of a given species, and the development of SNP array technology made it possible to perform >1 million association tests simultaneously of markers across the entire genome without the expense or labor of genome sequencing. Single nucleotide polymorphism arrays are glass slides with genomic probes (sequences of DNA) that capture SNPs present within a given species. Through a hybridization process, the probes bind DNA of samples to identify which polymorphisms are present in that sample.37 These SNP arrays enabled clinical researchers to compare clinically affected horses with unaffected controls so as to examine the association of various health disorders with markers on a genome‐wide basis, and the interplay among different genetic variants associated with disease.38 Results from a GWAS are readily identifiable because they typically are visualized by plotting the negative logarithm of the P value for the association of a given SNP with the outcome of interest as the ordinate (vertical axis) and the chromosome number as the abscissa (horizontal axis). The resultant scatter plots are known as Manhattan plots because they resemble the skyline of a major city with some points that tower over the majority of others. 
Determining the genome sequence of the domestic horse led to the development of 2 equine SNP arrays that could be used for GWAS by researchers.32, 39 Currently, a single SNP array has been developed, well characterized, made commercially available, and utilized in numerous GWAS in horses. For example, the EquineSNP70 BeadChip Array1 contains approximately 74,000 SNPs positioned across the equine genome that can be simultaneously tested to identify their associations with a phenotype of interest.39, 40 Recently, a higher density SNP array with approximately 770,000 SNPs across the equine genome has been developed but has not been characterized in peer‐reviewed literature to date. Several GWAS in horses using SNP arrays and yielding positive associations have been reported.41, 42, 43, 44, 45, 46 Genome‐wide association studies rely on observing different frequencies of alleles (identified by SNPs) that segregate with a phenotype of interest. These associations have identified regions of interest (Fig 1A), which are further investigated to understand what elements (eg, genes, promoters, other variants), pathways, or processes are associated with the phenotype.
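
The Manhattan plot described above can be sketched in a few lines. The example converts per-SNP P values into plot coordinates (cumulative genomic position on the horizontal axis, negative base-10 logarithm of P on the vertical axis). The Bonferroni significance line is an assumption added for illustration; it is one common, conservative correction and is not stated in the text. Function names, chromosome lengths, and P values are all hypothetical.

```python
import math


def manhattan_points(results, chrom_lengths):
    """Return (x, y) points: x = cumulative position, y = -log10(P).

    results: list of (chromosome, position, p_value)
    chrom_lengths: list of (chromosome, length) in display order
    """
    offsets, running = {}, 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = running   # where this chromosome starts on the x-axis
        running += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


def bonferroni_line(alpha, n_tests):
    """Height of a Bonferroni significance line on the -log10 scale."""
    return -math.log10(alpha / n_tests)


# Hypothetical chromosome lengths (bp) and association results
chrom_lengths = [("1", 1000), ("2", 800)]
results = [("1", 100, 0.5), ("1", 900, 1e-8), ("2", 50, 0.01)]

points = manhattan_points(results, chrom_lengths)
threshold = bonferroni_line(0.05, 74000)   # about 74,000 SNPs on the array
hits = [pt for pt in points if pt[1] > threshold]
print(points)
print(threshold, hits)
```

Taking the negative logarithm is what makes strongly associated SNPs tower over the rest: the smaller the P value, the taller the point.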

Figure 1

Association studies, CNVs, and SNPs. (A) The colored blocks indicate different alleles or haplotypes present in the horse genome. These have been identified by either a CNV or a SNP but any type of genetic variation can be used to identify alleles. The boxed regions show a greater frequency of the orange allele in the cases compared to the controls. The increased frequency of this allele in the cases suggests that it may harbor a variant(s) causing or contributing to the associated phenotype. (B) CNV – A represents a single copy of a gene; CNV – B represents a duplication of the gene; and, CNV – C represents a deletion of the gene. These examples demonstrate how CNVs can affect a single gene and can be used to identify different alleles in a population. SNPs, represented as red bases, offer the ability to identify alleles because of their polymorphic nature. Either type of genetic variation can be used in a GWAS to identify disease‐associated alleles.

Marker associations require investigation of the surrounding region because of linkage disequilibrium (LD), which is defined as the nonrandom association of genetic information.47 Linkage disequilibrium allows nongenotyped genetic information around a genotyped marker to be predicted, on the assumption that the genetic material surrounding a marker differs according to, and thus can be inferred from, the allele represented by the marker (ie, SNP). The use of LD to make disease associations leverages inheritance patterns, selection, and evolution and is a fundamental concept underlying GWAS. The association of a marker, whether it is located in a gene or in a noncoding region of the genome, should only be treated as an indicator of the need to further investigate the area. An association of an SNP with disease neither indicates that the SNP is causally associated with the disease nor that the specific gene in which the SNP lies is the gene of interest.
An SNP only indicates that there might be genetic variation in the area of the genome where the SNP is located. The size of the area of interest is largely described by the length of the LD (ie, the number of bases for which another gene or genetic element can be expected to be in LD with the marker). Using LD to assist in making associations is a powerful tool that is genome‐wide and efficient because not all markers across a genome must be tested to find an associated region, should one exist. The power of LD allows fewer markers to be present on an array, and hence decreases the number of necessary test corrections. Furthermore, the longer the LD of the species the fewer SNPs are necessary to identify significant associations. Estimates of LD for breeds of horses are markedly longer than those for humans.48, 49 Thus, one might expect to need fewer SNPs on an equine array to have the same discriminatory power as a human array or to have greater power in a GWAS for horses than humans for an array of the same size or density of SNPs. Although SNP arrays are proving to be a powerful tool for investigating the relationship of genetic and phenotypic variation, challenges exist with validating and reproducing results generated by SNP‐based GWAS. There are likely many contributing risk alleles for all complex traits and complex diseases such that no single allele can explain all of the phenotypic variation.50 This becomes problematic during replication using different breeds and populations because the markers identified might merely reflect breed differences, or the markers might represent different alleles conferring different levels of risk across breeds or populations. The number of association studies in equine genetics will only continue to increase and the equine research community will continue to face these challenges. 
Appropriate study designs, accurately defined and categorized phenotypes, and functional follow‐up assays will be essential to maximize the utility of GWAS results in future studies.51 The first report of a GWAS with R. equi pneumonia recently was published by our laboratory.52 The study53 population included 248 foals born in 2011 at a large Quarter Horse breeding farm. For a separate study characterizing the accuracy of screening tests for R. equi pneumonia, foals at the farm were examined by thoracic ultrasonography every 2 weeks beginning at 3 weeks of age and continuing through 19 weeks of age (or until weaned) to identify foals with areas of pulmonary consolidation or abscess formation attributed to R. equi infection. Farm personnel were blinded to the ultrasonographic findings and a separate team of individuals performed thoracic ultrasonography. Foals at the farm were classified as having R. equi pneumonia (N = 43; on the basis of clinical signs of pneumonia, isolation of virulent R. equi from the TBA fluid, cytologic evidence of sepsis in TBA fluid, and ultrasonographic evidence of pulmonary consolidation or abscess formation >1 cm in maximal diameter), no pneumonia (N = 49; on the basis of the absence of clinical signs of pneumonia and no ultrasonographic evidence of pulmonary consolidation or abscess formation >1 cm diameter), and subclinical pneumonia (N = 156; on the basis of absence of clinical signs of pneumonia with ultrasonographic evidence of pulmonary consolidation or abscess formation >1 cm diameter). From each of these 3 subpopulations of foals, a sample of 24 foals was randomly selected for genotyping using the EquineSNP70 BeadChip Array. Comparisons among the 3 groups identified a significant association of a region on chromosome 26 that included the gene for the transient receptor potential cation channel, subfamily M, member 2 (TRPM2). 
These results are notable because the TRPM2 gene is known to play a role in neutrophil function and recruitment. In a study using TRPM2 knock‐out mice and a model of ulcerative colitis, TRPM2‐deficient mice had less tissue damage at sites of inflammation than did wild‐type mice.54 The association with TRPM2 was validated using polymerase chain reaction (PCR)‐based genotyping of the locus in the remaining 176 study foals that were not tested using the SNP array. The principal limitations of this study were that only a single breed at a single farm was studied, and that no association of the genotype with function of the TRPM2 gene product or associated signaling pathways was identified. Nonetheless, this study is interesting in that, consistent with previous candidate gene studies, a gene related to innate immunity was associated with R. equi pneumonia, and the study provides further evidence of the underlying genetic basis for R. equi pneumonia.
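
The concept of linkage disequilibrium discussed in this section can be made concrete with a short calculation. The sketch below computes 2 standard LD measures, the coefficient D and the squared correlation r2, for a pair of biallelic loci from counts of the 4 possible haplotypes. These measures are standard population genetics quantities rather than anything specific to the studies reviewed here, and the function name and haplotype counts are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci from haplotype counts.

    Assumes both loci are polymorphic; a monomorphic locus would make
    the r2 denominator zero.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total         # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total         # frequency of allele B at locus 2
    D = p_AB - p_A * p_B                # deviation from random association
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2


# Hypothetical haplotype counts, for illustration only
print(ld_stats(50, 0, 0, 50))    # alleles always travel together
print(ld_stats(25, 25, 25, 25))  # alleles associate at random
print(ld_stats(40, 10, 10, 40))  # intermediate LD
```

When r2 is close to 1, genotyping 1 of the 2 loci is nearly as informative as genotyping both, which is why longer LD allows fewer markers on an array.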

Copy Number Variants

Although the genetic determinants of phenotypic variation are largely dependent on the gene or genes and the manner in which they exert their effect (eg, altering the biochemical properties of a protein, changing the expression patterns or levels of messenger RNA), recent studies have implicated copy number variants (CNVs) as major determinants of phenotypic variation in humans and animals.55, 56, 57 As the name implies, CNVs are characterized by changes in the number of copies of DNA between at least 2 individuals (Fig 1B).58 Their sizes can range from hundreds to millions of base‐pairs (bps). Although they often are enriched in certain regions of the genome that predispose to their formation, CNVs have been detected throughout the genome, with many CNVs involving multiple genes, individual genes, or components of a single gene. Several mechanisms have been shown to cause the formation of CNVs. During meiosis in the germ cells, homologous chromosomes align with each other to exchange genetic information between the parental genomes. This process, called homologous recombination or crossing over, plays an instrumental role in expanding the genetic diversity of a population. In rare instances, however, the exchange of genetic material can occur between 2 different sites (non‐allelic homologous recombination [NAHR]), resulting in an unequal exchange of genetic material.59 Although NAHR often is the source of many disease‐causing CNVs, this process also plays a key role in the formation of gene families and the birth of new genes. Naturally occurring DNA repair mechanisms also can delete or duplicate DNA. For example, the non‐homologous end joining (NHEJ) and microhomology‐mediated end‐joining (MMEJ) pathways are used to repair double‐stranded DNA breaks that occur in the genome.
During the repair of the breaks, the NHEJ and MMEJ pathways either add or remove DNA to ligate the broken strands back together.60, 61, 62 Fork stalling and template switching (FoSTeS) is a mechanism used to circumvent a stalled replication complex during DNA synthesis. When this happens, the FoSTeS machinery identifies a similar sequence at a nearby replication site to re‐engage the stalled complex, leading either to a deletion or duplication of the circumvented segment of DNA.63 Microhomology‐mediated break‐induced repair is another mechanism believed to give rise to CNVs under stressed cellular conditions in which traditional break‐induced repair does not occur and therefore homologous sequences are identified to continue replication.57, 64 Overall, there are numerous pathways and processes that can lead to the formation of a CNV. Identification of CNVs across the genome has proven to be challenging because of the dependency on probe density to increase resolution and the physical limitation of the number of probes that can be placed on a single array. Array design technology continues to advance and undoubtedly will increase our ability to identify CNVs by enhancing genome resolution via probe density. Two studies in horses have used the EquineSNP70 BeadChip Array to search the equine genome for CNVs.65, 66 The use of the equine SNP array to identify CNVs highlights the usefulness of the SNP array, but there are limitations when SNP arrays are used solely for identifying CNVs. The probes present on SNP arrays are often evenly distributed across the genome, thus spanning large distances and allowing only for the identification of large CNVs. The SNP arrays also are not well suited for identifying CNVs in structurally complex regions (eg, gene families, segmentally duplicated regions).
Probe design often is difficult in these regions, thus they are excluded from the array.67, 68, 69 Several studies have used technologies other than SNP arrays to identify CNVs in horses, such as next generation sequencing (NGS) and comparative genomic hybridization arrays (aCGH).52, 70, 71, 72, 73, 74 Arrays for CGH are designed by tiling oligonucleotide probes across the genome to which DNA of interest then can be hybridized for the identification of CNVs (Fig 2). Use of aCGH also has limitations for the identification of CNVs, principally related to probe placement and density. The currently published equine arrays are a whole genome tiling array (ie, probes tiled across the whole genome) and an exon tiling array (ie, probes tiled across noncoding and coding exons of genes).71, 72 Thus, these arrays only permit evaluation of CNVs within specific regions of the genome. The results of studies identifying CNVs by NGS are limited by variation in read‐depth (ie, the number of copies of sequences aligned to a specific area) across the genome and the size of the CNVs identified. Specifically, CNVs of lengths ranging from 197 bp to 3.5 Mb have been identified and confirmed using a CGH array designed to identify CNVs in genes of the equine genome.71 In a subsequent study using NGS, CNVs ranging in length from 3.74 kb to 4.84 Mb were identified.70 There is, however, a trade‐off when using either approach. Targeted arrays can identify smaller CNVs, but they are only able to identify CNVs within regions targeted on the array. Conversely, NGS can be used to identify CNVs throughout the entire genome, but NGS approaches to identifying CNVs are limited because of their bias toward larger CNV size. Although there are discrepancies among the approaches used to identify CNVs, the studies to date have identified CNVs in genes involved in similar pathways, such as sensory perception, signal transduction, and immune‐related pathways. 
Some CNV studies of horses also have reported concordance between aCGH and whole genome sequencing by NGS, with CNVs enriched in genes relating to sensory perception, signal transduction, and immune‐related functions.71, 72

Figure 2

Comparative genomic hybridization method to identify CNVs in horses. (A) Genomic DNA is isolated from subject horses (cases and controls) and a single reference horse. (B) Genomic DNA from each subject horse is independently labeled with a red dye and genomic DNA from a single reference horse is labeled with a green dye. (C) Labeled DNA from a single subject horse and the reference horse are mixed together at equal ratios and competitively hybridized onto a comparative genomic hybridization array. (D) Fluorescent image of array after hybridization of subject and reference DNA. The spots on the array represent individual oligonucleotides. Yellow spots reflect regions with equal DNA content, and red and green spots reflect unequal ratios of DNA between the subject and reference horse, respectively. (E) Plot of normalized log2 ratios of oligonucleotides on the array. The Y‐axis represents normalized log2 ratios of fluorescent signals for each spot on the array. The X‐axis represents the relative genomic coordinates of each oligonucleotide. For example, a log2 ratio <1 and >‐1 (black dots) indicates equal DNA content between the subject and reference horses. A log2 ratio >1 or <‐1 indicates unequal DNA content between the subject and reference horses.

Our laboratory conducted a CNV‐based GWAS by applying the aforementioned equine exon tiling array71 to the 72 foals studied in our SNP GWAS.48 Although similar lengths and numbers of CNVs were observed in these foals as in the previous report using this array, no CNVs were significantly associated with R. equi pneumonia in these foals. This finding does not preclude the possibility that CNVs contribute to susceptibility to R. equi pneumonia because only CNVs within exons were considered. The CNVs located within other elements such as promoters and silencers that were not detected by the array might influence the odds of foals developing R. equi pneumonia.
Moreover, sample size was small, which might have limited our ability to detect anything less than very strong associations. Future efforts should include the design and implementation of adequately powered studies using tiling arrays focused on gene promoters, gene expression enhancers, and other regulatory elements that are both near and within genes. Much remains to be investigated to characterize CNVs in horses and to study the role of CNVs in susceptibility to R. equi foal pneumonia and other diseases of horses. Because CNVs represent a change in genetic content (ie, deletions and duplications), they may have the potential to greatly impact many phenotypes. A 4.6‐kb duplication in an intron has been associated with graying and melanomas in horses.75 A 16.1‐kb duplication has been shown to cause wrinkling of the skin in Shar‐Pei dogs.76 In humans, CNVs are believed to play critical roles in neurodevelopmental disorders, psychiatric disorders, and cancers.77, 78 A number of studies have described CNVs in cattle. Overall, the CNVs identified to date are enriched in genes related to immune function and sensory perception, which also has been observed in horses.79
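
The log2 ratio interpretation given in the legend of Figure 2E can be sketched as a simple per-probe classification. The example assumes that the signals have already been normalized, and it uses the cut-points of 1 and ‐1 from the figure legend. The labels "gain" and "loss" (relative to the reference horse) are an interpretation added here, and the function name and signal intensities are hypothetical.

```python
import math


def call_probe(subject_signal, reference_signal, threshold=1.0):
    """Return (log2 ratio, state) for one array probe.

    Signals are assumed to be normalized, positive fluorescence intensities.
    """
    ratio = math.log2(subject_signal / reference_signal)
    if ratio > threshold:
        state = "gain"    # more DNA in the subject than in the reference
    elif ratio < -threshold:
        state = "loss"    # less DNA in the subject than in the reference
    else:
        state = "equal"   # within the band treated as equal DNA content
    return ratio, state


# Hypothetical (subject, reference) intensities for three probes
for subject, reference in [(400.0, 100.0), (100.0, 400.0), (110.0, 100.0)]:
    print(call_probe(subject, reference))
```

In practice a CNV call would rest on several adjacent probes shifting together rather than on any single probe, which is why probe density limits the size of CNV that can be detected.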

Next Generation Sequencing Techniques

The invention of next generation sequencing (NGS) technologies has opened a new world of opportunities for understanding genetic variation and its role in disease pathogenesis. The NGS technology has enabled rapid sequencing of the genomes of individuals at a low cost and with maximum genome coverage.40 Before NGS, the gold standard for sequencing technology was automated Sanger sequencing.80 Sanger sequencing technology was developed in the late 1970s and later automated to increase throughput.81 Next generation sequencing technologies differ among companies, but they all share a principal advantage over Sanger sequencing in that they are capable of sequencing multiple DNA fragments (eg, an entire genome) in a single sequencing reaction (versus sequencing small fragments piece‐by‐piece in multiple reactions).82 The opportunities provided by NGS technology are accompanied by the challenge of managing and analyzing datasets of enormous size. The ability of NGS to generate data has out‐paced the ability of scientists to interpret it. Developments in bioinformatic and biostatistical software have facilitated our ability to visualize and make inferences from large datasets. Both DNA and RNA can be sequenced using NGS methods. Sequencing the genome (DNA) and transcriptome (RNA) offers 2 interrelated but distinct biological approaches. Genome sequencing using NGS can characterize all of the variants known to exist in gene sequences, including single base changes (SNPs), insertions and deletions, CNVs, and genetic variation in nongenic regions. The first application of NGS for genome sequencing in horses yielded the genome sequence of a Quarter Horse mare.70 Sequencing of the genome, however, does not reflect which elements of the DNA are transcribed. Moreover, transcription generally should be considered at the level of a specific tissue or cell type because of intercellular variation in gene expression. 
Although the DNA sequence is common to all cells in an individual, the genes expressed vary among cells or organs of the same individual. Sequencing RNA yields a snapshot of the expressed genes of the tissue or cell type that cannot be identified by DNA sequencing. The process of sequencing RNA using NGS methods is termed RNA‐Seq; it may be applied either to total RNA (all forms of RNA) or specific types of RNA. Most commonly, RNA‐Seq is applied to messenger RNA (mRNA) to reflect which portions of the genome are being transcribed in the specimen. Arriving at RNA‐sequencing is a multistep process which first requires deciding from which tissue or body‐fluid RNA should be extracted to best answer the biological questions being asked. Briefly, isolated RNA is converted to complementary DNA (cDNA) in order to construct a library that represents all of the RNA isolated and to be sequenced (Fig 3). The representative libraries are then sequenced, generally in a paired‐end fashion. Paired‐end sequencing reads are generated by sequencing from both ends of a cDNA fragment (ie, from the 5′ end of both strands of the cDNA fragment). Paired‐end reads are extremely valuable because 2 complementary pieces of information have been generated about the same cDNA fragment, and this greatly increases the accuracy of mapping these RNA sequences back to their respective genes of the genome.
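
The idea of paired‐end reads described above can be made concrete with a small Python sketch. The fragment sequence and the read length are invented for illustration, and the function names are hypothetical; real sequencers also introduce base‐calling errors and adapter sequences that are ignored here.

```python
# Translation table for complementing DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Read from the 5' end of each strand of a double-stranded cDNA fragment."""
    read1 = fragment[:read_length]                      # 5' end of the forward strand
    read2 = reverse_complement(fragment)[:read_length]  # 5' end of the reverse strand
    return read1, read2

fragment = "ATGGCCTTAGCAAGTCCGTA"  # hypothetical 20-base cDNA fragment
r1, r2 = paired_end_reads(fragment, 6)
print(r1, r2)
```

Because the 2 reads come from opposite ends of the same fragment, an aligner can require that they map to nearby positions in opposite orientations, which is the source of the improved mapping accuracy.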

Figure 3

RNA‐Seq flowchart. Isolated RNA is converted to cDNA, a stable molecule, which can then be amplified and sequenced. Sequencing reads are then aligned to the genome assembly (sequence only) to identify their locations based on nucleotide matches. Mapping the reads to a gene annotation list yields the number of sequencing reads that have aligned with each gene; these numbers are called counts. The counts for any particular gene are representative of the amount of gene expression in the sample and can be compared across horses to identify differentially expressed genes.

RNA‐Sequencing is an invaluable tool for gaining insight into biologically relevant questions such as differences in gene expression by different alleles and gene expression of a target specimen under different biological or biochemical conditions. Several downstream RNA‐Seq processing and analysis programs can be used to identify differentially expressed genes, novel transcripts, and multiple isoforms of gene transcripts in order to find answers to biological or clinical questions.83, 84, 85 The conclusions inferred from these analyses can lead to identifying potential biological pathways and processes that can be targeted for the development of novel interventions, including treatments and preventative measures. 
Several studies have reported the application of RNA‐Seq in horses in an attempt to identify differentially expressed genes.86, 87, 88, 89, 90, 91, 92, 93, 94, 95 To the authors' knowledge, the first report of RNA‐Seq in horses was an effort to characterize the transcriptome and tissue‐specific expression profiles from 8 equine tissues.94 A subsequent study focused on characterizing gene expression by RNA‐Seq in immunologically active tissues.95 Investigators have used RNA‐Seq to characterize the expression profile of genes critical to the differentiation and regulation of cells during embryogenesis,90 and to characterize the expression and inferred function of RNAs in the equine sperm transcriptome.89 Several studies also have used RNA‐Seq in horses to identify differentially expressed genes when comparing blood, muscle (obtained by biopsy), or both before and after exercise or racing.86, 87, 92 These studies have successfully identified pathways involved in stress during and while recovering from exercise. 
Other studies have sought to answer a more specific question such as identifying expression differences in the cartilage of the metacarpophalangeal joints of young and old horses in an attempt to shed light on genes involved in the development and aging of cartilage.93 RNA‐Seq of hoof lamellar basal epithelial cells has been performed to identify cell‐signaling pathways indicative of the early stages of laminitis.91 Using RNA‐Seq, an association has been demonstrated of a long terminal repeat (a genetic element inserted in the past by conversion of viral RNA to cDNA and subsequently incorporated in the genome of the host) with congenital stationary night blindness and leopard spotting in horses.88 Our laboratory currently is analyzing RNA‐Seq data to identify differentially expressed genes of foals representing the 3 genotypes of the TRPM2 SNP identified in our SNP‐based GWAS to better understand the role of this gene (and possibly others) in susceptibility to R. equi pneumonia. We are also currently applying RNA‐Seq to leukocytes collected from healthy and R. equi‐affected foals to gain insights about gene expression of these immune‐related cells. These studies will further our understanding of R. equi pathogenesis and hopefully identify critical biological pathways and processes involved in disease development. The genetic basis of a common and complex disease such as R. equi pneumonia is likely polygenic. Gene expression profiling by RNA‐Seq, thus, will be an essential step in understanding the relationships and interactions of multiple genes with this disease. The identification of genes that are up‐ or down‐regulated after pathogen exposure can reveal host responses critical for defense against infection. 
When evidence of differential gene expression is identified by RNA‐Seq (or other methods), it then becomes necessary to understand the mechanistic cause driving the change in expression (ie, variation within regulatory elements, changes in epigenetic modifications, structural variation, post‐transcriptional and post‐translational modifications).
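
The comparison of counts between samples described for RNA‐Seq can be sketched in Python as follows. This is a simplified illustration only: the gene names and counts are invented, the library‐size scaling (counts per million) and the pseudocount are assumptions, and no statistical test is performed. Dedicated analysis programs model variability across replicate animals before a gene is declared differentially expressed.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts by library size so samples are comparable."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """Log2 ratio of expression in sample b relative to sample a."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }

# Hypothetical read counts per gene for one healthy and one affected foal
healthy = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 8000}
affected = {"GENE_A": 2000, "GENE_B": 1500, "GENE_C": 6500}

lfc = log2_fold_change(counts_per_million(healthy), counts_per_million(affected))
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

A positive value indicates higher expression in the affected sample and a negative value indicates lower expression; with only 1 animal per group, as in this toy example, such differences cannot be distinguished from individual variation.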

Conclusions

Research findings regarding genetic relationships with disease continue to substantiate that most common and complex diseases are not monogenic. This likely is true for R. equi pneumonia. The evidence to date, as summarized in this review, indicates that susceptibility to R. equi pneumonia is not controlled by a single gene. It is increasingly clear that both innate and adaptive immune responses as well as their interactions are critical for protecting foals against R. equi infection. Genetic association studies have specifically implicated innate immune responses, but innate immune responses are critical for orchestrating adaptive immune responses and it may be an oversimplification to dichotomize these responses. It is likely that there also are epigenetic factors involved in regulating the gene transcription of critical immune‐related genes, which adds further complexity to the pathogenesis of R. equi pneumonia in foals. Future proteomic studies also will be required to follow up on promising genetic findings as protein concentrations, structures, and interactions are critical to disease development.96 Proteomic studies may be able to answer critical questions such as protein concentrations in diseased and nondiseased foals and variable consequences related to protein concentrations and their interactions, which cannot be answered with molecular genetic techniques and sequencing. With more genotypic–phenotypic associations being identified in horses, it will be challenging to investigate the causal implications of genetic variants with functional assays. Mechanistic studies (eg, knock‐out or knock‐in genes) can become very expensive and would not be feasible in horses. Developing rodent models of important equine diseases and use of mechanistic studies in cell culture assays will be required to understand the functional consequences of identified associations with genetic markers. 
Moreover, it will be important to remain mindful of the agent‐related and environmental factors that contribute to disease development. No single genetic tool or technique will identify the factors that render some foals susceptible to R. equi, whereas others in the same environment remain clinically unaffected. The future will require a multifaceted approach to integration and analysis of data from multiple sources to successfully identify the critical pathways and processes. We believe that molecular genetic and epigenetic methods will play an important role in solving the complex riddle of susceptibility to R. equi pneumonia in foals.

95 in total

Review 1. Candidate-gene approaches for studying complex genetic traits: practical considerations.

Authors: Holly K Tabor; Neil J Risch; Richard M Myers
Journal: Nat Rev Genet Date: 2002-05 Impact factor: 53.242

2. Structural annotation of equine protein-coding genes determined by mRNA sequencing.

Authors: S J Coleman; Z Zeng; K Wang; S Luo; I Khrebtukova; M J Mienaltowski; G P Schroth; J Liu; J N MacLeod
Journal: Anim Genet Date: 2010-12 Impact factor: 3.169

Review 3. Emerging technologies in DNA sequencing.

Authors: Michael L Metzker
Journal: Genome Res Date: 2005-12 Impact factor: 9.043

Review 4. Structural variation in the human genome.

Authors: Lars Feuk; Andrew R Carson; Stephen W Scherer
Journal: Nat Rev Genet Date: 2006-02 Impact factor: 53.242

Review 5. Applied equine genetics.

Authors: C J Finno; D L Bannasch
Journal: Equine Vet J Date: 2014-06-25 Impact factor: 2.888

Review 6. Linkage disequilibrium--understanding the evolutionary past and mapping the medical future.

Authors: Montgomery Slatkin
Journal: Nat Rev Genet Date: 2008-06 Impact factor: 53.242

Review 7. Pathogenic or not? Assessing the clinical relevance of copy number variants.

Authors: J Y Hehir-Kwa; R Pfundt; J A Veltman; N de Leeuw
Journal: Clin Genet Date: 2013-08-21 Impact factor: 4.438

8. Capture of linear fragments at a double-strand break in yeast.

Authors: Anat Haviv-Chesner; Yoshifumi Kobayashi; Abram Gabriel; Martin Kupiec
Journal: Nucleic Acids Res Date: 2007-08-01 Impact factor: 16.971

9. Population-genetic nature of copy number variations in the human genome.

Authors: Mamoru Kato; Takahisa Kawaguchi; Shumpei Ishikawa; Takayoshi Umeda; Reiichiro Nakamichi; Michael H Shapero; Keith W Jones; Yusuke Nakamura; Hiroyuki Aburatani; Tatsuhiko Tsunoda
Journal: Hum Mol Genet Date: 2009-12-05 Impact factor: 6.150

10. Genetic diversity in the modern horse illustrated from genome-wide SNP data.

Authors: Jessica L Petersen; James R Mickelson; E Gus Cothran; Lisa S Andersson; Jeanette Axelsson; Ernie Bailey; Danika Bannasch; Matthew M Binns; Alexandre S Borges; Pieter Brama; Artur da Câmara Machado; Ottmar Distl; Michela Felicetti; Laura Fox-Clipsham; Kathryn T Graves; Gérard Guérin; Bianca Haase; Telhisa Hasegawa; Karin Hemmann; Emmeline W Hill; Tosso Leeb; Gabriella Lindgren; Hannes Lohi; Maria Susana Lopes; Beatrice A McGivney; Sofia Mikko; Nicholas Orr; M Cecilia T Penedo; Richard J Piercy; Marja Raekallio; Stefan Rieder; Knut H Røed; Maurizio Silvestrelli; June Swinburne; Teruaki Tozaki; Mark Vaudin; Claire M Wade; Molly E McCue
Journal: PLoS One Date: 2013-01-30 Impact factor: 3.240
