Kelsey Caetano-Anolles1, Kwondo Kim2, Woori Kwak2,3, Samsun Sung3, Heebal Kim2,1,3, Bong-Hwan Choi4, Dajeong Lim5.
Abstract
BACKGROUND: Identification of genetic mechanisms and idiosyncrasies at the breed-level can provide valuable information for potential use in evolutionary studies, medical applications, and breeding of selective traits. Here, we analyzed genomic data collected from 136 Korean Native cattle, known as Hanwoo, using advanced statistical methods.
Keywords: Cattle; DNA-Seq; Genome sequencing; Hanwoo; Protein domain; Unaligned read assembly
Year: 2018 PMID: 29843617 PMCID: PMC5975384 DOI: 10.1186/s12863-018-0623-x
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Fig. 1 Detailed unaligned read assembly pipeline. Green squares represent the first stage of analysis (assembly of scaffold-level genome); blue squares represent the second stage of analysis (extraction of unaligned reads); yellow squares represent the third and final stage of analysis (gene prediction and functional annotation)
Fig. 2 Simplified pipeline of unaligned read assembly
Summary of the results of representative reference genome build via short read assembly (≥ 1 kb)
| | | Base pairs | Percent (%) |
|---|---|---|---|
| Number of scaffolds | | 295,265 | 100 |
| Residue counts | A | 701,475,984 | 29.24 |
| | C | 498,799,879 | 20.79 |
| | G | 498,441,379 | 20.78 |
| | T | 692,999,844 | 28.89 |
| | N | 7,063,142 | 0.29 |
| | Total | 2,398,780,228 | 100 |
| Sequence lengths | Minimum | 1,000 | |
| | Maximum | 136,625 | |
| | Average | 8,124.16 | |
| | N50 | 13,528 | |
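The summary statistics in this table can be cross-checked with simple arithmetic. The following is a minimal sketch, not the authors' code: it recomputes the total, the per-base percentages, and the average scaffold length from the tabulated counts. The `n50` helper is a generic implementation of the usual N50 definition, shown on hypothetical toy lengths because the per-scaffold lengths are not listed here.

```python
# Residue counts and scaffold count copied from the table above.
residue_counts = {
    "A": 701_475_984,
    "C": 498_799_879,
    "G": 498_441_379,
    "T": 692_999_844,
    "N": 7_063_142,
}
num_scaffolds = 295_265

total_bp = sum(residue_counts.values())
percent = {base: round(100 * n / total_bp, 2) for base, n in residue_counts.items()}
average_length = round(total_bp / num_scaffolds, 2)


def n50(lengths):
    """Return the length L such that scaffolds of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical toy lengths, for illustration only (not from the paper).
toy_lengths = [1000, 2000, 3000, 4000, 5000]

print(total_bp)
print(percent)
print(average_length)
print(n50(toy_lengths))
```

The recomputed total, percentages, and average agree with the values reported in the table; the reported N50 of 13,528 cannot be rechecked without the individual scaffold lengths.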
Significantly identified (E-value < 1E-100) Pfam protein family domain analysis results
| Gene name | Length | Source | Accession | Description | Start | Stop | E-value |
|---|---|---|---|---|---|---|---|
| scaffold_2197.g59.t1 | 581 | Pfam | PF00063 | Myosin head (motor domain) | 30 | 575 | 5.60E-207 |
| scaffold_1285.g30.t1 | 417 | Pfam | PF15718 | Domain of unknown function (DUF4673) | 116 | 412 | 4.50E-154 |
| scaffold_6851.g129.t1 | 391 | Pfam | PF03028 | Dynein heavy chain and region D6 of dynein motor | 2 | 390 | 1.50E-120 |
| scaffold_13817.g209.t1 | 758 | Pfam | PF01403 | Sema domain | 59 | 467 | 2.30E-117 |
| scaffold_29068.g344.t1 | 348 | Pfam | PF16021 | Programmed cell death protein 7 | 33 | 344 | 3.00E-114 |
| scaffold_15941.g224.t1 | 887 | Pfam | PF04849 | HAP1 N-terminal conserved region | 1 | 249 | 2.60E-108 |
| scaffold_5769.g113.t1 | 246 | Pfam | PF00244 | 14-3-3 protein | 5 | 238 | 3.60E-107 |
| scaffold_1936.g56.t1 | 564 | Pfam | PF08235 | LNS2 (Lipin/Ned1/Smp2) | 300 | 525 | 1.30E-104 |
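The E-value cutoff in the caption of this table amounts to a simple filter. As a minimal sketch (the tuple layout and the `significant` function are illustrative assumptions, not the authors' pipeline), the snippet below applies the 1E-100 threshold to three rows from this table plus one row from the immune-related table that follows, which falls above the cutoff.

```python
# (gene name, Pfam accession, E-value string as printed in the tables)
hits = [
    ("scaffold_2197.g59.t1", "PF00063", "5.60E-207"),
    ("scaffold_1285.g30.t1", "PF15718", "4.50E-154"),
    ("scaffold_1936.g56.t1", "PF08235", "1.30E-104"),
    ("scaffold_14147.g214.t1", "PF07679", "3.40E-24"),  # immune-related table
]

E_VALUE_CUTOFF = 1e-100  # threshold stated in the table caption


def significant(rows, cutoff=E_VALUE_CUTOFF):
    """Keep (gene, accession) pairs whose E-value is below the cutoff."""
    return [(gene, acc) for gene, acc, e_value in rows if float(e_value) < cutoff]


for gene, acc in significant(hits):
    print(gene, acc)
```

All values here are within the range of a double-precision float, so `float()` parses them without underflow.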
Selected immune system-related genes and affiliated protein domains
| Gene name | Length | Source | Accession | Description | Start | Stop | E-value |
|---|---|---|---|---|---|---|---|
| scaffold_2520.g67.t1 | 1075 | Pfam | PF13895 | Immunoglobulin domain | 425 | 491 | 5.40E-09 |
| scaffold_19370.g263.t1 | 508 | Pfam | PF07679 | Immunoglobulin I-set domain | 381 | 454 | 8.30E-07 |
| scaffold_8624.g151.t1 | 512 | Pfam | PF13895 | Immunoglobulin domain | 10 | 73 | 3.90E-08 |
| scaffold_14147.g214.t1 | 159 | Pfam | PF07679 | Immunoglobulin I-set domain | 27 | 112 | 3.40E-24 |
| scaffold_13817.g209.t1 | 758 | Pfam | PF00047 | Immunoglobulin domain | 550 | 628 | 5.10E-09 |
| scaffold_5779.g114.t1 | 142 | Pfam | PF07679 | Immunoglobulin I-set domain | 35 | 70 | 6.20E-07 |
| scaffold_46987.g437.t1 | 460 | Pfam | PF09294 | Interferon-alpha/beta receptor, fibronectin type III | 44 | 142 | 1.20E-17 |
Summary of enriched Gene Ontology (GO) biological process (BP) terms among total identified Pfam protein family domains
| term_ID | description | Frequency^a | log10 p-value | Uniqueness^b |
|---|---|---|---|---|
| GO:0008152 | metabolic process | 62.92% | −4.2076 | 0.974 |
| GO:0007154 | cell communication | 28.75% | −5.9208 | 0.866 |
| GO:0006139 | nucleobase-containing compound metabolic process | 28.16% | −31.0655 | 0.776 |
| GO:0007165 | signal transduction | 26.76% | −18.9208 | 0.794 |
| GO:0006810 | transport | 19.48% | −32.6383 | 0.765 |
| GO:0006355 | regulation of transcription, DNA-templated | 14.27% | −8.9586 | 0.608 |
| GO:0006464 | cellular protein modification process | 14.26% | −11.7212 | 0.62 |
| GO:0007186 | G-protein coupled receptor signaling pathway | 8.87% | −25.4318 | 0.803 |
| GO:0006508 | proteolysis | 7.74% | −32.6778 | 0.741 |
| GO:0006811 | ion transport | 7.05% | −12.2076 | 0.744 |
| GO:0055114 | oxidation-reduction process | 6.85% | −11.699 | 0.831 |
| GO:0055085 | transmembrane transport | 6.54% | −32.6383 | 0.714 |
| GO:0016192 | vesicle-mediated transport | 4.60% | −30.7447 | 0.785 |
| GO:0006886 | intracellular protein transport | 3.09% | −30.7447 | 0.792 |
| GO:0016567 | protein ubiquitination | 2.40% | −14.3468 | 0.673 |
| GO:0006820 | anion transport | 2.07% | −61.2147 | 0.769 |
| GO:0006457 | protein folding | 1.00% | −19.9208 | 0.736 |
| GO:0007018 | microtubule-based movement | 1.00% | −45.6021 | 0.843 |
| GO:0035023 | regulation of Rho protein signal transduction | 0.92% | −11.1675 | 0.832 |
| GO:0016573 | histone acetylation | 0.60% | −51.3372 | 0.635 |
| GO:0043401 | steroid hormone mediated signaling pathway | 0.46% | −28.6383 | 0.839 |
| GO:0045454 | cell redox homeostasis | 0.40% | −27.0269 | 0.865 |
| GO:0000413 | protein peptidyl-prolyl isomerization | 0.24% | −19.9208 | 0.709 |
| GO:0006400 | tRNA modification | 0.20% | −34.7447 | 0.692 |
| GO:0018149 | peptide cross-linking | 0.17% | −13.4318 | 0.731 |
| GO:0007099 | centriole replication | 0.08% | −153.347 | 0.832 |
| GO:0009396 | folic acid-containing compound biosynthetic process | 0.05% | −11.699 | 0.786 |
^a Frequency represents the proportion of the specified GO term within the entire Bos taurus species-specific UniProt protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific
^b Uniqueness represents whether a term is an outlier when compared semantically to the list as a whole
Summary of enriched Gene Ontology (GO) cellular component (CC) terms among total identified Pfam protein family domains
| term_ID | description | Frequency^a | log10 p-value | Uniqueness^b |
|---|---|---|---|---|
| GO:0005622 | intracellular | 63.18% | −4.2076 | 0.875 |
| GO:0016020 | membrane | 47.23% | −5.7959 | 0.872 |
| GO:0016021 | integral component of membrane | 29.77% | −10.8239 | 0.832 |
| GO:0005634 | nucleus | 27.70% | −10.6576 | 0.688 |
| GO:0005739 | mitochondrion | 9.22% | −12.5686 | 0.636 |
| GO:0005740 | mitochondrial envelope | 3.09% | −32.3565 | 0.576 |
| GO:0031012 | extracellular matrix | 2.00% | −15 | 0.77 |
| GO:0016459 | myosin complex | 0.38% | −206.252 | 0.589 |
| GO:0030286 | dynein complex | 0.20% | −45.6021 | 0.594 |
^a Frequency represents the proportion of the specified GO term within the entire Bos taurus species-specific UniProt protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific
^b Uniqueness represents whether a term is an outlier when compared semantically to the list as a whole
Summary of enriched Gene Ontology (GO) molecular function (MF) terms among total identified Pfam protein family domains
| term_ID | description | Frequency^a | log10 p-value | Uniqueness^b |
|---|---|---|---|---|
| GO:0003700 | sequence-specific DNA binding transcription factor activity | 5.30% | −10.6576 | 0.958 |
| GO:0003712 | transcription cofactor activity | 1.78% | −39.8861 | 0.957 |
| GO:0003824 | catalytic activity | 37.22% | −13.4559 | 0.972 |
| GO:0004871 | signal transducer activity | 11.94% | −25.4318 | 0.927 |
| GO:0004930 | G-protein coupled receptor activity | 7.87% | −27.6198 | 0.925 |
| GO:0005089 | Rho guanyl-nucleotide exchange factor activity | 0.48% | −11.699 | 0.956 |
| GO:0005216 | ion channel activity | 2.49% | −13.9208 | 0.896 |
| GO:0015075 | ion transmembrane transporter activity | 5.28% | −5.7959 | 0.895 |
| GO:0016773 | phosphotransferase activity, alcohol group as acceptor | 4.60% | −58.7447 | 0.771 |
| GO:0004672 | protein kinase activity | 3.87% | −14.1308 | 0.772 |
| GO:0043015 | gamma-tubulin binding | 0.10% | −25.8539 | 0.878 |
| GO:0008017 | microtubule binding | 1.06% | −16.4815 | 0.848 |
| GO:0019001 | guanyl nucleotide binding | 2.48% | −25.4318 | 0.846 |
| GO:0005509 | calcium ion binding | 3.94% | −19.0655 | 0.863 |
| GO:0005515 | protein binding | 26.71% | −7.3665 | 0.92 |
| GO:0004488 | methylenetetrahydrofolate dehydrogenase (NADP+) activity | 0.02% | −13.4559 | 0.878 |
| GO:0003755 | peptidyl-prolyl cis-trans isomerase activity | 0.26% | −19.9208 | 0.876 |
| GO:0003777 | microtubule motor activity | 0.51% | −45.6021 | 0.814 |
| GO:0005544 | calcium-dependent phospholipid binding | 0.17% | −19.0655 | 0.844 |
| GO:0042802 | identical protein binding | 4.77% | −25.8539 | 0.878 |
| GO:0031683 | G-protein beta/gamma-subunit complex binding | 0.16% | −25.4318 | 0.883 |
| GO:0043565 | sequence-specific DNA binding | 4.31% | −10.6576 | 0.855 |
| GO:0016787 | hydrolase activity | 15.05% | −7.8861 | 0.841 |
| GO:0008484 | sulfuric ester hydrolase activity | 0.11% | −31.7447 | 0.832 |
| GO:0004181 | metallocarboxypeptidase activity | 0.15% | −32.6778 | 0.809 |
| GO:0004222 | metalloendopeptidase activity | 0.79% | −15 | 0.79 |
| GO:0019901 | protein kinase binding | 1.80% | −7.2007 | 0.885 |
| GO:0008479 | queuine tRNA-ribosyltransferase activity | 0.02% | −34.7447 | 0.849 |
| GO:0003676 | nucleic acid binding | 21.33% | −8.6021 | 0.849 |
| GO:0005102 | receptor binding | 6.56% | −62.1871 | 0.875 |
| GO:0004402 | histone acetyltransferase activity | 0.24% | −51.3372 | 0.812 |
| GO:0003677 | DNA binding | 10.28% | −9.5376 | 0.843 |
| GO:0008408 | 3′-5′ exonuclease activity | 0.18% | −31.0655 | 0.827 |
| GO:0016746 | transferase activity, transferring acyl groups | 1.42% | −30.9208 | 0.805 |
| GO:0003723 | RNA binding | 7.68% | −5.9208 | 0.847 |
| GO:0046872 | metal ion binding | 20.96% | −4.5376 | 0.845 |
| GO:0004550 | nucleoside diphosphate kinase activity | 0.14% | −55.585 | 0.816 |
| GO:0008236 | serine-type peptidase activity | 1.32% | −7.1739 | 0.793 |
| GO:0008270 | zinc ion binding | 6.73% | −4.2076 | 0.856 |
| GO:0004129 | cytochrome-c oxidase activity | 0.43% | −12.5686 | 0.809 |
| GO:0003924 | GTPase activity | 1.03% | −25.4318 | 0.806 |
| GO:0005524 | ATP binding | 8.83% | −14.1308 | 0.751 |
| GO:0000166 | nucleotide binding | 14.40% | −7.3565 | 0.823 |
| GO:0003810 | protein-glutamine gamma-glutamyltransferase activity | 0.07% | −14.7696 | 0.823 |
| GO:0005543 | phospholipid binding | 1.33% | −16.6021 | 0.829 |
| GO:0035091 | phosphatidylinositol binding | 0.84% | −17 | 0.829 |
^a Frequency represents the proportion of the specified GO term within the entire Bos taurus species-specific UniProt protein annotation database. Higher frequencies represent more general and common terms, while terms with a lower frequency are rare and specific
^b Uniqueness represents whether a term is an outlier when compared semantically to the list as a whole
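The log10 p-value column can be converted back to scientific notation for easier reading. The following is a minimal sketch; the helper names `parse_log10` and `format_p` are illustrative and not taken from the source. It handles the Unicode minus sign (U+2212) and stray spaces that appear in the tabulated values, and it builds the result as a string so that very small p-values do not depend on floating-point range.

```python
import math


def parse_log10(text: str) -> float:
    """Parse a table cell such as '− 153.347' (Unicode minus, stray space)."""
    return float(text.replace("\u2212", "-").replace(" ", ""))


def format_p(log10_p: float) -> str:
    """Render 10**log10_p as 'm.mmE-xx' without computing the tiny number."""
    exponent = math.floor(log10_p)
    mantissa = 10 ** (log10_p - exponent)  # lies in [1, 10)
    if round(mantissa, 2) >= 10:  # guard against rounding up to 10.00
        mantissa /= 10
        exponent += 1
    return f"{mantissa:.2f}E{exponent:+d}"


# Values taken from the log10 p-value columns of the GO tables.
for cell in ["−58.7447", "− 153.347", "−15"]:
    print(cell, "->", format_p(parse_log10(cell)))
```

For example, a tabulated value of −15 corresponds to a p-value of 1.00E-15.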
Fig. 3 Visualization of significantly identified Gene Ontology (GO) Biological Process (BP) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms
Fig. 4 Visualization of significantly identified Gene Ontology (GO) Cellular Component (CC) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms
Fig. 5 Visualization of significantly identified Gene Ontology (GO) Molecular Function (MF) terms. The radius of the bubbles represents the generality of the specified term (a small bubble implies higher specificity). The p-value of each GO term is represented by the color shading of each bubble (darker colors representing higher significance). The edges between the nodes of our graph (GO terms) represent the top 3% strongest pairwise similarities between terms