Literature DB >> 29843617

Genome sequencing and protein domain annotations of Korean Hanwoo cattle identify Hanwoo-specific immunity-related and other novel genes.

Kelsey Caetano-Anolles1, Kwondo Kim2, Woori Kwak2,3, Samsun Sung3, Heebal Kim2,1,3, Bong-Hwan Choi4, Dajeong Lim5.   

Abstract

BACKGROUND: Identification of genetic mechanisms and idiosyncrasies at the breed-level can provide valuable information for potential use in evolutionary studies, medical applications, and breeding of selective traits. Here, we analyzed genomic data collected from 136 Korean Native cattle, known as Hanwoo, using advanced statistical methods.
RESULTS: Results revealed Hanwoo-specific protein domains which were largely characterized by immunoglobulin function. Furthermore, domain interactions of novel Hanwoo-specific genes reveal additional links to immunity. Novel Hanwoo-specific genes linked to muscle and other functions were identified, including protein domains with functions related to energy, fat storage, and muscle function that may provide insight into the mechanisms behind Hanwoo cattle's uniquely high percentage of intramuscular fat and fat marbling.
CONCLUSION: The identification of Hanwoo-specific genes linked to immunity are potentially useful for future medical research and selective breeding. The significant genomic variations identified here can crucially identify genetic novelties that are arising from useful adaptations. These results will allow future researchers to compare and classify breeds, identify important genetic markers, and develop breeding strategies to further improve significant traits.

Entities:  

Keywords:  Cattle; DNA-Seq; Genome sequencing; Hanwoo; Protein domain; Unaligned read assembly

Mesh:

Substances:

Year:  2018        PMID: 29843617      PMCID: PMC5975384          DOI: 10.1186/s12863-018-0623-x

Source DB:  PubMed          Journal:  BMC Genet        ISSN: 1471-2156            Impact factor:   2.797


Background

Hanwoo is a Korean native taurine breed of cattle that has been around since 2000 BC. Although their original primary purpose was to serve as farming and transportation cattle, the rapid growth of the Korean economy that occurred in the 1960’s and its associated food demands led to this breed being used as a main source of meat [1]. Since then, the demand for this product in Korea has skyrocketed. This is due to the high percentage of fat marbling in Hanwoo meat, a characteristic that is unique to the breed. Hanwoo loin muscles have approximately 24% intramuscular fat content [2]. The quality and price of meat is often determined by fat marbling. Consequently, one of the main goals of the meat production industry worldwide is to increase the incidence of this trait [2]. Given this focus, several studies have investigated gene expression patterns with the primary goal of determining which genes are responsible for Hanwoo-specific high fat concentration [3-7]. Here we gathered genomic data from 136 Hanwoo cattle that we analyzed using advanced statistical methods. We show that investigation of the genome of this unique set of cattle individuals with the general goal of identifying breed-level idiosyncrasies can provide valuable information for potential use in evolutionary studies, medical applications, and breeding of selective traits. The goal is to enhance our understanding of characteristics of beef cattle breeds with unique adaptations and beneficial traits that have not yet been well elucidated. This would make it possible to selectively breed for these traits in other breeds of cattle worldwide to improve meat quality and revolutionize the field of meat production.

Methods

Alignment of unaligned reads for the detection for novel genes using the Hanwoo whole genome

Blood samples for whole genome sequencing were obtained from 136 Korean beef cattle (Hanwoo) individuals reared at the Hanwoo Improvement Center of the National Agricultural Cooperative Federation (Seosan, Chungnam, Korea). Indexed shotgun paired-end (PE) libraries with 500 bp average length inserts were generated from these samples using the TruSeq Nano DNA Library Prep Kit (Illumina, USA) following the standard Illumina sample-preparation protocol. Briefly, 200 ng of gDNA was fragmented using a Covaris M220 focused-ultrasonicator (Woburn, MA, USA) to produce fragments with a median size of ~ 500 bp. The fragmented DNA was subjected to end repair, A-tailing, and indexed adapter ligation (~ 125 bp adapter). Adapter-ligated DNA of 550 to 650 bp in length was amplified using PCR for 8 cycles. The size-selected libraries were analyzed using the Agilent 2100 Bioanalyzer (Agilent Technologies) to determine the size distribution and to check for adapter contamination. The resulting libraries were sequenced using the Illumina HiSeq 2500 (2x125bp paired-end sequences) and NextSeq500 (2x150bp paired-end sequences) Next-Gen sequencers. The bioinformatics pipeline used in this study is described in Figs. 1 and 2. Quality control for per-base quality of reads and removal of potential adaptor sequences was performed using fastQC v0.11.4 [8] and Trimmomatic v0.36 [9] software (seed mismatches:2, palindrome clip threshold:30, simple clip threshold:10, LEADING:10, TRAILING:10, MINLEN:80), respectively. Then, high-quality sequence reads were mapped to the Bos taurus reference genome (UMD 3.1) using Bowtie2.2.6 [10] with default settings in order to extract unaligned reads. Removal of duplicate reads was performed using Picard (ver 1.06) and indexing, sorting, and unaligned read extraction was performed using Samtools v1.3.1 [11]. GATK v3.4.46 [12-14] was used for local realignment and recalibration of the alignment (blue boxes on the pipeline figure; Fig. 1). A summary of sequencing data is provided in Additional file 1: Table S1.
Fig. 1

Detailed unaligned read assembly pipeline. Green squares represent the first stage of analysis- assembly of scaffold-level genome; blue squares represent the second stage of analysis- extraction of unaligned reads; yellow squares represent the third and final stage of analysis- gene prediction and functional annotation

Fig. 2

Simplified pipeline of unaligned read assembly

Detailed unaligned read assembly pipeline. Green squares represent the first stage of analysis- assembly of scaffold-level genome; blue squares represent the second stage of analysis- extraction of unaligned reads; yellow squares represent the third and final stage of analysis- gene prediction and functional annotation Simplified pipeline of unaligned read assembly Since we are interested in information originating from the sample itself and not detected from the reference sequence, we created an assembled genome at the scaffold level to discover whether unaligned reads actually constitute functional units (genes) on their own genome. This scaffold was created from one randomly selected sample from our pool of samples. The Broad Institute’s stand-alone ALLPATHS-LG fragment read error correction module [15, 16] was used for error correction as a precursor to de novo assembly. De novo assembly was performed using an Iterative De Bruijn Assembler of Uneven Depth (IDBA_UD: [17, 18], an iterative De Bruijn graph de novo assembler for short reads sequencing data that utilizes paired-end reads to assemble highly uneven low-depth regions. This tool is useful for optimizing the length gap problem and iterating different K-mer length (green boxes on the pipeline figure; Fig. 1). For unaligned read alignments, we extracted reads for each sample that was not aligned to the reference genome. Using the extracted unaligned reads (blue boxes on the pipeline figure; Fig. 1) and the assembled scaffold-level genome (green boxes on the pipeline figure; Fig. 1) of each sample, alignment of unaligned reads to the scaffold was carried out using Bowtie2 (remapping). The identified remapped sequences throughout the sequence were assumed to represent Hanwoo-specific sequences. These resulting regions constitute regions that are distinctive from the reference. We performed depth profiling to diminish the possibility of false positives. We identified scaffolds containing locations meeting our depth cutoff of 10× (an arbitrary cutoff selected for result filtering), and used the collected scaffolds for gene prediction using the gene prediction program Augustus 3.1.0. Out of the resulting 614 predicted genes, we extracted protein sequences covered by unaligned reads with at least depth of 10×. The resulting total of 283 protein sequences were cross-referenced against the Pfam database of protein families (pfam.xfam.org; [19]) using the protein domain detection program InterProScan-5.15-54.0 in order to identify protein domains affiliated with those areas of the genome. In order to assign meaning and infer the function of these domains, we searched for these identified domains within DOMINE (http://domine.utdallas.edu/cgi-bin/Domine), a database of known and predicted protein domain interactions [20, 21]. Using Interpro [22], we obtained GO (Gene Ontology) Cellular Component (CC), Molecular Function (MF), and Biological Process (BP) terms for each individual domain [23]. Next, gene ontology results were summarized and visualized with the online tool REVIGO (http://revigo.irb.hr; [24]) to better interpret our results. Next, using REVIGO’s Interactive Graph tool [24] and exporting results into the Cytoscape software package [25], we created a graph-based visualization of the identified terms for each GO category. Using the above described methodologies and annotations we were able to align and map genome sequences as well as predict genes that may be related to Hanwoo-specific characteristics.

Results and discussion

Research objectives and genome build summary

Our main research objectives included: (1) Assembling and mapping unaligned reads in order to identify and predict genes in Hanwoo cattle; (2) Cross-referencing results against a comprehensive protein domain database in order to identify protein domains affiliated with those areas of the genome; and (3) Mining the uncovered genes and associated domains to identify important gene functions and networks involved in positive traits. A summary of representative reference genome builds via short read assembly is presented in Table 1. We mapped unaligned reads against the reference genome and extracted information to a depth of 10× (meaning that each base was sequenced an average of 10 times). We predicted a total of 614 gene regions using scaffolds containing locations higher than depth of 10×. Of the 614 genes, 283 genes were covered by unaligned reads with at least depth of 10×.
Table 1

Summary of the results of representative reference genome build via short read assembly (> = 1 kb)

Base pairsPercent (%)
Number of scaffolds295,265100
Residue countsA701,475,98429.24
C498,799,87920.79
G498,441,37920.78
T692,999,84428.89
N7,063,1420.29
Total2,398,780,228100
Sequence lengthsMinimum1000
Maximum136,625
Average8124.16
N5013,528
Summary of the results of representative reference genome build via short read assembly (> = 1 kb) Cross-referencing of protein sequences from the 283 genes against the Pfam database identified associated protein domains covering a total of 168 scaffolds. Overall, 311 Pfam protein domains were identified when using data filtered for sequences with an average mapped base depth coverage of less than 10×. These numbers suggest that there was more than one affiliated domain identified for some gene regions. Due to space limitations, Table 2 lists significantly identified (E- value <1XE-100) Pfam protein family domain analysis results. An extended list of significantly identified Pfam domains with E-value <1E-40 is presented in Additional file 2: Table S2.
Table 2

Significantly identified (E- value <1XE-100) Pfam protein family domain analysis results

Gene nameLengthSourceAccessionDescriptionStartStopE-value
scaffold_2197.g59.t1581PfamPF00063Myosin head (motor domain)305755.60E-207
scaffold_1285.g30.t1417PfamPF15718Domain of unknown function (DUF4673)1164124.50E-154
scaffold_6851.g129.t1391PfamPF03028Dynein heavy chain and region D6 of dynein motor23901.50E-120
scaffold_13817.g209.t1758PfamPF01403Sema domain594672.30E-117
scaffold_29068.g344.t1348PfamPF16021Programmed cell death protein 7333443.00E-114
scaffold_15941.g224.t1887PfamPF04849HAP1 N-terminal conserved region12492.60E-108
scaffold_5769.g113.t1246PfamPF0024414–3-3 protein52383.60E-107
scaffold_1936.g56.t1564PfamPF08235LNS2 (Lipin/Ned1/Smp2)3005251.30E-104
Significantly identified (E- value <1XE-100) Pfam protein family domain analysis results

Hanwoo-specific genes linked to immunity

A number of domains were largely characterized by immune system function. Selected immune system-related genes are shown in Table 3. Six of the seven domains shown are associated to the immunoglobulin function, while the remaining domain is associated with the interferon group of signaling proteins, which is crucial for the immune system response as well.
Table 3

Selected immune system-related genes and affiliated protein domains

Gene nameLengthSourceAccessionDescriptionStartStopE-value
scaffold_2520.g67.t11075PfamPF13895Immunoglobulin domain4254915.40E-09
scaffold_19370.g263.t1508PfamPF07679Immunoglobulin I-set domain3814548.30E-07
scaffold_8624.g151.t1512PfamPF13895Immunoglobulin domain10733.90E-08
scaffold_14147.g214.t1159PfamPF07679Immunoglobulin I-set domain271123.40E-24
scaffold_13817.g209.t1758PfamPF00047Immunoglobulin domain5506285.10E-09
scaffold_5779.g114.t1142PfamPF07679Immunoglobulin I-set domain35706.20E-07
scaffold_46987.g437.t1460PfamPF09294Interferon-alpha/beta receptor, fibronectin type III441421.20E-17
Selected immune system-related genes and affiliated protein domains The interferon-alpha/beta receptor is a cell surface receptor made up of one chain with two subunits, IFNAR1 and IFNAR2. The interferon receptors have antiviral, antiproliferative, and immunomodulatory functions, as well as being highly involved in pregnancy [26, 27]. Interferon-τ, a type I interferon, has been shown to prevent a return to ovarian cyclicity after conception to ensure the continuation of the pregnancy in ruminant ungulate species; this interferon appears to be the main factor responsible for prevention of degradation of the corpus luteum [28, 29]. In addition to these reproductive roles, this receptor is responsible for binding type 1 interferons interferon–α and –β and activating the JAK-STAT signaling pathway, which is associated with DNA-transcription and the expression of genes related to immunity, proliferation, and differentiation, among others [30]. The JAK-STAT pathway has primary functions related to immunity. In fact, drug therapies that aim to turn down the immune response of the body and modulate host responses to disease and infection target this pathway [31]. The expression of the interferon group of signaling proteins in our Hanwoo cattle samples suggests that Hanwoo may have breed-specific immune system functions that are not yet well understood. Our analysis also identified associated protein domains which are largely characterized by the immunoglobulin function. These results are particularly salient given the significance of these kinds of results for medical research and selective breeding. The bovine immune system has been a topic of interest to researchers for quite some time now, mainly due to two reasons [32]. The first is that an understanding of the evolution and expression of mammalian immune system genes has important implications for human health. Bovine antibodies have been of particular interest, as they exhibit prophylactic and therapeutic properties in response to several human and animal infectious diseases [33-36]. Additionally, researchers have recently developed transgenic calves that produce human immunoglobulin, speaking to the incredible importance of cattle as model organisms for the study of human immunity and disease [37]. Secondly, understanding the molecular and genetic basis of immunity in cattle breeds can not only serve to further our understanding of the breeds, but also to provide genetic information which can be used for selective breeding in order to improve performance and survival of livestock. Immunity in cattle varies vastly by breed. For example, African cattle are known for their incredible resistance to tick and gastrointestinal parasite infestations, traits that have developed in response to thousands of years of evolution in the harsh environments of Africa. A particularly amazing adaptation is the resistance of several African breeds to trypanosomiasis, also known as sleeping sickness [38]. Identification of genes responsible for immunity and introduction of identified immunity-related genes in cattle breeds that are productive but highly susceptible to disease may improve their resistance, survival, and productivity. Understanding genetic features controlling these mechanisms will allow researchers to develop appropriate breeding strategies. More generally, research in immunoglobulin genetics is particularly salient for several reasons. Although research into the genetic aspects of and expression of genes related to immunoglobulin has been widely conducted in humans and mice, research in this field is lacking when it comes to livestock breeds, particularly cattle. Information is still needed to complete previous information, including the number of available gene segments and gene families. This kind of information can be used in the future to study and create synthetic recombinant species-specific antibodies, which could be used to treat and prevent infectious diseases.

Domain interactions of Hanwoo-specific genes reveal additional links to immunity

Additionally, more general consideration of significantly identified protein family domains from the Pfam database provided information needed to further understanding the breed-specific molecular mechanisms of Hanwoo cattle. Table 2 lists highly significantly identified (E- value <1XE-100) Pfam domains. In order to assign meaning and infer the function of these domains, which include several not well understood but highly significant protein domains, we searched for these identified domains within DOMINE (http://domine.utdallas.edu/cgi-bin/Domine), a database of known and predicted protein domain interactions [20, 21]. Among these, several interesting results reveal the genetic intricacies of the Hanwoo genome and its functions. Several of the most significantly identified protein domains appear to be closely linked with immune system function, further supporting our previous findings. For example, the significantly identified Sema domain (E-value = 2.30E-117) appears to be primarily associated with immune system function. The Sema domain not only forms interactions with the Immunoglobulin domain, but also interacts with the Thrombospondin type 1 (TSP-1) domain, which has been shown to control immune regulation. Thrombospondin, an extremely large multi-domain glycoprotein, is crucial to certain mechanisms related to angiogenesis, cell proliferation, and immune responses [39] such as the chemotactic response to tissue damage and the facilitation of phagocytosis of damaged cells [40-42]. Mice deficient in TSP-1 are more susceptible to inflammation and injury, either as a side effect of drugs or as a result of gene activation [43-46]. Given the strong role of this protein domain in immunity, our identification of this pathway here once again confirms that there are unique functions of immunity at play operating specifically in the Hanwoo genome.

Hanwoo-specific genes linked to muscle and other functions

Significantly identified protein domains with functions related to energy, fat storage, and muscle function may provide insight into the mechanisms behind Hanwoo cattle’s uniquely high percentage of intramuscular fat and fat marbling. For example, LNS2 (Lipin/Ned1/Smp2) domain, which includes Lipin, was significantly identified (E-value = 1.30E-104) in our data (Table 2). Lipin, encoded by the Lpin1 gene, is a powerful gene which largely controls how the body produces, stores, and uses fat. Mice deficient in Lipin do not develop either diet-induced or genetic obesity [47]. Additionally, enhanced Lipin expression has been shown to promote adiposity in mice [48]. Additionally, the Myosin head (motor domain) protein domain, which is associated with muscle function, was significantly identified (E-value = 5.60E-207, Table 2). Myosin is a chief component of myofibril filaments, which are responsible for muscle contraction. Myosin also actively participates in the conversion of ATP chemical energy to mechanical energy through its interaction with Actin [49]. Additionally, the Dynein heavy chain and region D6 of the dynein motor domain and 14–3-3 protein domain were significantly identified (E-values = 1.50E-120,3.60E-107 respectively), both of which are also largely responsible for ATP energy conversion [50-52]. These results suggest that these proteins domains are those which are primarily responsible for providing energy to the muscle and possibly causing the breed-specific high percentage of intramuscular fat that is observed in Hanwoo cattle. Several of the other identified domains, such as the HAP1 N-terminal conserved region domain, were found to lack interactions with any other domains and their specific roles in cattle have not been well established. As we learn more about these proteins and their functions in the future, we may be able to better interpret these results.

Interpretation of gene ontology terms associated with the entire set of Pfam domains

As previously discussed, we were able to identify 311 Pfam domains mapping to 168 scaffolds not shared with common cattle. We then filtered that list and kept only the highest hits. Within that short list, we revealed high enrichment for muscle and immunology genes. However, this approach provides a very limited look at our results. Thus, we aimed to further explore Hanwoo-specific domains by analyzing the enrichment of functional categories associated with each individual domain of the entire list. Using Interpro [22], we obtained GO (Gene Ontology) Cellular Component (CC), Molecular Function (MF), and Biological Process (BP) terms for each individual domain [23]. Next, gene ontology results were summarized and visualized with the online tool REVIGO (http://revigo.irb.hr; [24] to better interpret our results. Tables 4, 5, and 6 summarize the BP, CC, and MF GO terms, respectively. REVIGO calculates “frequency” and “uniqueness” values, with frequency representing the proportion of the specified GO term within the entire Bos taurus species-specific Uniprot protein annotation database, and uniqueness determining within the inputted list whether a term is an outlier when compared semantically to the list as a whole [24].
Table 4

Summary of enriched Gene Ontology (GO) biological process (BP) terms among total identified Pfam protein family domains

term_IDdescriptionFrequencyalog10 p-valueUniquenessb
GO:0008152metabolic process62.92%−4.20760.974
GO:0007154cell communication28.75%−5.92080.866
GO:0006139nucleobase-containing compound metabolic process28.16%−31.06550.776
GO:0007165signal transduction26.76%−18.92080.794
GO:0006810transport19.48%−32.63830.765
GO:0006355regulation of transcription, DNA-templated14.27%−8.95860.608
GO:0006464cellular protein modification process14.26%−11.72120.62
GO:0007186G-protein coupled receptor signaling pathway8.87%−25.43180.803
GO:0006508proteolyis7.74%−32.67780.741
GO:0006811ion transport7.05%−12.20760.744
GO:0055114oxidation-reduction process6.85%−11.6990.831
GO:0055085transmembrane transport6.54%−32.63830.714
GO:0016192vesicle-mediated transport4.60%−30.74470.785
GO:0006886intracelular protein transport3.09%−30.74470.792
GO:0016567protein ubiquitination2.40%−14.34680.673
GO:0006820anion transport2.07%−61.21470.769
GO:0006457protein folding1.00%−19.92080.736
GO:0007018microtubule-based movement1.00%−45.60210.843
GO:0035023regulation of Rho protein signal transduction0.92%−11.16750.832
GO:0016573histone acetylation0.60%−51.33720.635
GO:0043401steroid hormone mediated signaling pathway0.46%−28.63830.839
GO:0045454cell redox homeostasis0.40%−27.02690.865
GO:0000413protein peptidyl-prolyl isomerization0.24%−19.92080.709
GO:0006400tRNA modification0.20%−34.74470.692
GO:0018149peptide cross-linking0.17%−13.43180.731
GO:0007099centriole replication0.08%− 153.3470.832
GO:0009396folic acid-containing compound biosynthetic process0.05%−11.6990.786

aFrequency represents the proportion of the specified GO term within the entire Bos Taurus species-specific Uniprot protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific

bUniqueness represents whether a term is an outlier when compared semantically to the list as a whole

Table 5

Summary of enriched Gene Ontology (GO) cellular component (CC) terms among total identified Pfam protein family domains

term_IDdescriptionFrequencyalog10 p-valueUniquenessb
GO:0005622intracellular63.18%−4.20760.875
GO:0016020membrane47.23%−5.79590.872
GO:0016021integral component of membrane29.77%−10.82390.832
GO:0005634nucleus27.70%−10.65760.688
GO:0005739mitochondrion9.22%−12.56860.636
GO:0005740mitochondrial envelope3.09%−32.35650.576
GO:0031012extracellular matrix2.00%−150.77
GO:0016459myosin complex0.38%− 206.2520.589
GO:0030286dynein complex0.20%−45.60210.594

aFrequency represents the proportion of the specified GO term within the entire Bos Taurus species-specific Uniprot protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific

bUniqueness represents whether a term is an outlier when compared semantically to the list as a whole

Table 6

Summary of enriched Gene Ontology (GO) molecular function (MF) terms among total identified Pfam protein family domains

term_IDdescriptionFrequencyalog10 p-valueUniquenessb
GO:0003700sequence-specific DNA binding transcription factor activity5.30%−10.65760.958
GO:0003712transcription cofactor activity1.78%−39.88610.957
GO:0003824catalytic activity37.22%−13.45590.972
GO:0004871signal transducer activity11.94%−25.43180.927
GO:0004930G-protein coupled receptor activity7.87%−27.61980.925
GO:0005089Rho guanyl-nucleotide exchange factor activity0.48%−11.6990.956
GO:0005216ion channel activity2.49%−13.92080.896
GO:0015075ion transmembrane transporter activity5.28%−5.79590.895
GO:0016773phosphotransferase activity, alcohol group as acceptor4.60%−58.74470.771
GO:0004672protein kinase activity3.87%− 14.13080.772
GO:0043015gamma-tubulin binding0.10%−25.85390.878
GO:0008017microtubule binding1.06%−16.48150.848
GO:0019001guanyl nucleotide binding2.48%−25.43180.846
GO:0005509calcium ion binding3.94%−19.06550.863
GO:0005515protein binding26.71%−7.36650.92
GO:0004488methylenetetrahydrofolate dehydrogenase (NADP+) activity0.02%−13.45590.878
GO:0003755peptidyl-prolyl cis-trans isomerase activity0.26%−19.92080.876
GO:0003777microtubule motor activity0.51%−45.60210.814
GO:0005544calcium-dependent phospholipid binding0.17%−19.06550.844
GO:0042802identical protein binding4.77%−25.85390.878
GO:0031683G-protein beta/gamma-subunit complex binding0.16%−25.43180.883
GO:0043565sequence-specific DNA binding4.31%−10.65760.855
GO:0016787hydrolase activity15.05%−7.88610.841
GO:0008484sulfuric ester hydrolase activity0.11%−31.74470.832
GO:0004181metallocarboxypeptidase activity0.15%−32.67780.809
GO:0004222metalloendopeptidase activity0.79%−150.79
GO:0019901protein kinase binding1.80%−7.20070.885
GO:0008479queuine tRNA-ribosyltransferase activity0.02%−34.74470.849
GO:0003676nucleic acid binding21.33%−8.60210.849
GO:0005102receptor binding6.56%−62.18710.875
GO:0004402histone acetyltransferase activity0.24%−51.33720.812
GO:0003677DNA binding10.28%−9.53760.843
GO:00084083′-5′ exonuclease activity0.18%−31.06550.827
GO:0016746transferase activity, transferring acyl groups1.42%−30.92080.805
GO:0003723RNA binding7.68%−5.92080.847
GO:0046872metal ion binding20.96%−4.53760.845
GO:0004550nucleoside diphosphate kinase activity0.14%−55.5850.816
GO:0008236serine-type peptidase activity1.32%−7.17390.793
GO:0008270zinc ion binding6.73%−4.20760.856
GO:0004129cytochrome-c oxidase activity0.43%−12.56860.809
GO:0003924GTPase activity1.03%−25.43180.806
GO:0005524ATP binding8.83%−14.13080.751
GO:0000166nucleotide binding14.40%−7.35650.823
GO:0003810protein-glutamine gamma-glutamyltransferase activity0.07%−14.76960.823
GO:0005543phospholipid binding1.33%−16.60210.829
GO:0035091phosphatidylinositol binding0.84%−170.829

aFrequency represents the proportion of the specified GO term within the entire Bos Taurus species-specific Uniprot protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific

bUniqueness represents whether a term is an outlier when compared semantically to the list as a whole

Summary of enriched Gene Ontology (GO) biological process (BP) terms among total identified Pfam protein family domains aFrequency represents the proportion of the specified GO term within the entire Bos Taurus species-specific Uniprot protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific bUniqueness represents whether a term is an outlier when compared semantically to the list as a whole Summary of enriched Gene Ontology (GO) cellular component (CC) terms among total identified Pfam protein family domains aFrequency represents the proportion of the specified GO term within the entire Bos Taurus species-specific Uniprot protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific bUniqueness represents whether a term is an outlier when compared semantically to the list as a whole Summary of enriched Gene Ontology (GO) molecular function (MF) terms among total identified Pfam protein family domains aFrequency represents the proportion of the specified GO term within the entire Bos Taurus species-specific Uniprot protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific bUniqueness represents whether a term is an outlier when compared semantically to the list as a whole Next, using REVIGO’s Interactive Graph tool [24] and exporting results into the Cytoscape software package [25], we created a graph-based visualization of the identified terms for each GO category. Figures 3, 4, and 5 display visualizations of BP, CC, and MF GO terms, respectively. The radius of the bubbles represents the generality of the specified term; a small bubble implies higher specificity. The p-value of each GO term is represented by the color shading of each bubble, with darker colors representing higher significance. The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms [24].
Fig. 3

Visualization of significantly identified Gene Ontology (GO) Biological Process (BP) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms

Fig. 4

Visualization of significantly identified Gene Ontology (GO) Cellular Component (CC) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms

Fig. 5

Visualization of significantly identified Gene Ontology (GO) Molecular Function (MF) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms

Visualization of significantly identified Gene Ontology (GO) Biological Process (BP) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms Visualization of significantly identified Gene Ontology (GO) Cellular Component (CC) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms Visualization of significantly identified Gene Ontology (GO) Molecular Function (MF) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms The BP GO term visualization (Fig. 3) can be characterized by a large number of un-connected solo terms and shows a large diversity of biological processes being affected, meaning that a large rewiring of functionality is embedded in the new genes acquired by Hanwoo cattle. Note that the most significant term is the most specific, ‘centriole replication’, which is also connected to the general term ‘microtubule-based movement’; dynein (significantly identified from our data) moves along microtubules, so this term may reflect the biological processes responsible for dynein’s role in ATP energy conversion. This is quite unique and unexpected, since it signals an important role of cell division [53]. The second group of more significant terms are less specific but all related to transport, particularly ‘anion transport’, which may be associated with ATP energetics. Another uniqueness is the steroid hormone mediated signaling pathway. Sex steroid hormones play a critical role in the regulation of muscle, muscle strength, and growth and maintenance of muscle mass [54]. While identification of this GO term most likely can be attributed to the aforementioned relationship between steroid hormones and muscle development, as a result of the breed-specific unique high-fat muscle development, it may also be due to the practices under which Hanwoo are reared in order to enhance the natural fat marbling in their meat, such as feeding time and diet. For example, cattle are fed a high-concentration grain diet as opposed to grass-feeding [55]. Diet has been shown to have an effect on steroid hormones [56], which may also in part explain the identification of this GO term here. The CC GO term visualization (Fig. 4) can be characterized by a single connected group consisting of four terms: dynein complex, myosin complex, mitochondrial envelope, and mitochondrion. As previously mentioned, the Myosin Head and Dynein heavy chain protein domains were found significantly identified in our results- both of which participate in the conversion of ATP chemical energy to mechanical energy and serve crucial functions for muscle function. The connectivity of these nodes within our network visualization signifies that these two components work together and are potentially significant in Hanwoo-specific characteristics, such as their high percentage of intramuscular fat. The rest of the terms are generic, independent CC terms that include nucleus and membrane. The MF GO term visualization (Fig. 5) can be characterized by high connectivity, with the most significant values grouped together. Microtubule motor activity, another microtubule function related term, was also identified at the molecular function level, once again suggesting ATP energetics at play. A unique feature of this visualization, compared to the BP and CC visualizations, is the presence of 4 unconnected graphs as opposed to many unconnected terms or a single connected group. The first group features solely terms related to binding. This group contains the following terms: Sequence-specific DNA binding, DNA binding, RNA binding, Nucleic acid binding, ATP binding, Phospholipid binding, Calcium-dependent phospholipid binding, Guanyl nucleotide binding, Metal ion binding, Zinc ion binding, and Calcium ion binding. The second group consists of three connected terms: Ion Channel activity, Methylenetetrahydrofolate dehydrogenase (NADP+) activity, and Cytochrome-c oxidase activity. The third group consists of six connected terms: Sulfuric ester hydrolase activity, 3′-5′ exonuclease activity, Microtubule motor activity, GTPase activity, Serine-type peptidase activity, and Metallocarboxypeptidase activity. The fourth and final group consists of 5 terms related to the activity of transferases: Nucleoside diphosphate kinase activity, Transferase activity, transferring acyl groups, Protein-glutamine gamma-glutamyltransferase activity, Histone acetyltransferase activity, and Phosphotransferase activity, alcohol group as acceptor. Transferases are enzymes which are responsible for catalyzation of the transfer of certain functional groups from one molecule to another. They are essential for countless biochemical processes throughout the body. In cattle specifically, it has been shown that the activity of transferases is critical for embryo development [57]. The expression of genes with transferase activity function varies between abnormal and normal pregnancies [58, 59]. Therefore, the expression of these transferase GO terms may be due to their role in healthy pregnancy and development. However, interestingly, results of previous studies have demonstrated a correlation between certain transferase activity genes, such as GPAT1 and ATGL, and intramuscular fat content in Korean Cattle [60]. These previously identified results, when taken along with the comparatively high expression and connectivity of GO terms related to transferase activity, suggests that there may be unique mechanisms of transferase activity in Hanwoo cattle which influences their development and may perhaps be a factor impacting their species-specific high percentage of intramuscular fat.

Conclusions

The information unearthed from the comparison of breeds and identification of genetic variation in this study will be invaluable for future research on the molecular determinants that have been bred in Hanwoo cattle. Results revealed Hanwoo-specific protein domains which were largely characterized by immunoglobulin function. Furthermore, domain interactions of Hanwoo-specific genes reveal additional links to immunity. Hanwoo-specific genes linked to muscle and other functions were identified, including protein domains with functions related to energy, fat storage, and muscle function that may provide insight into the mechanisms behind Hanwoo cattle’s uniquely high percentage of intramuscular fat and fat marbling. Analyzing the whole Hanwoo genome and reporting significant genomic variations is crucial to identifying genetic novelties that are arising from useful adaptations. Similarly, such analysis will allow future researchers to compare and classify breeds, identify important genetic markers, and develop breeding strategies to further improve traits of economic value and biological significance. Table S1. Summary of sequencing data (DOCX 28 kb) Table S2. Significantly identified (E- value <1XE-40) Pfam protein family domain analysis results. (DOCX 17 kb) FASTA sequences for scaffolds which have locations with depth > 10×. (XLSX 9907 kb) Protein sequences which have locations with depth > 10×. (XLSX 101 kb)
  56 in total

Review 1.  14-3-3 proteins in the nervous system.

Authors:  Daniela Berg; Carsten Holzmann; Olaf Riess
Journal:  Nat Rev Neurosci       Date:  2003-09       Impact factor: 34.870

2.  Expression of fat deposition and fat removal genes is associated with intramuscular fat content in longissimus dorsi muscle of Korean cattle steers.

Authors:  J Jeong; E G Kwon; S K Im; K S Seo; M Baik
Journal:  J Anim Sci       Date:  2012-01-20       Impact factor: 3.159

Review 3.  Mechanisms of type-I- and type-II-interferon-mediated signalling.

Authors:  Leonidas C Platanias
Journal:  Nat Rev Immunol       Date:  2005-05       Impact factor: 53.106

4.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04       Impact factor: 28.547

5.  Meat quality traits as a function of cow maturity.

Authors:  Soo Hyun Cho; Geunho Kang; Pilnam Seong; Sunmoon Kang; Changwan Sun; Sunsik Jang; Jin Hyung Cheong; Beomyoung Park; Inho Hwang
Journal:  Anim Sci J       Date:  2016-09-28       Impact factor: 1.749

6.  Identification and selection of cattle naturally resistant to African trypanosomiasis.

Authors:  G E Roelants; F Fumoux; M Pinder; R Queval; A Bassinga; E Authié
Journal:  Acta Trop       Date:  1987-03       Impact factor: 3.112

7.  Bovine anti-Helicobacter pylori antibodies for oral immunotherapy.

Authors:  T H Casswall; H O Nilsson; L Björck; S Sjöstedt; Li Xu; C K Nord; T Borén; T Wadström; L Hammarström
Journal:  Scand J Gastroenterol       Date:  2002-12       Impact factor: 2.423

8.  Finished bacterial genomes from shotgun sequence data.

Authors:  Filipe J Ribeiro; Dariusz Przybylski; Shuangye Yin; Ted Sharpe; Sante Gnerre; Amr Abouelleil; Aaron M Berlin; Anna Montmayeur; Terrance P Shea; Bruce J Walker; Sarah K Young; Carsten Russ; Chad Nusbaum; Iain MacCallum; David B Jaffe
Journal:  Genome Res       Date:  2012-07-24       Impact factor: 9.043

9.  DOMINE: a comprehensive collection of known and predicted domain-domain interactions.

Authors:  Sailu Yellaboina; Asba Tasneem; Dmitri V Zaykin; Balaji Raghavachari; Raja Jothi
Journal:  Nucleic Acids Res       Date:  2010-11-27       Impact factor: 16.971

10.  Fatty Acid Profiles of Ten Muscles from High and Low Marbled (Quality Grade 1++ and 2) Hanwoo Steers.

Authors:  Young-Hwa Hwang; Seon-Tea Joo
Journal:  Korean J Food Sci Anim Resour       Date:  2016-10-31       Impact factor: 2.622

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.