Literature DB >> 28323836

Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

Nedenia Bonvino Stafuzza1, Adhemar Zerlotini2, Francisco Pereira Lobo2, Michel Eduardo Beleza Yamagishi2, Tatiane Cristina Seleguim Chud1, Alexandre Rodrigues Caetano3, Danísio Prado Munari1, Dorian J Garrick4, Marco Antonio Machado5, Marta Fonseca Martins5, Maria Raquel Carvalho6, John Bruce Cole7, Marcos Vinicius Gualberto Barbosa da Silva5.   

Abstract

Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated for each animal 10.7 to 16.4-fold genome coverage. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis was performed and revealed that variants in the olfactory transduction pathway was over represented in all four cattle breeds, while the ECM-receptor interaction pathway was over represented in Girolando and Guzerat breeds, the ABC transporters pathway was over represented only in Holstein breed, and the metabolic pathways was over represented only in Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

Entities:  

Mesh:

Year:  2017        PMID: 28323836      PMCID: PMC5360315          DOI: 10.1371/journal.pone.0173954

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Brazil has the world's largest commercial cattle herd with around 215.2 million head in 2015 according to the agricultural census [1], from which about 80% is estimated to be composed of purebred and crossbreed Bos primigenius indicus animals. The vast majority of Brazilian dairy cows are extensively grazed in challenging environments where they must convert low-quality forages into high-quality milk [2]. Due to the naturally occurring temperature, disease and nutritional stresses, indicine breeds are widely used in Brazilian cattle production systems, because of their adaptability towards the tropical climate and extensive production systems. Guzerat and Gyr, which were introduced from India, are important indicine cattle breeds for milk and meat production in Brazil. The Guzerat breed was introduced in Brazil at the end of the 19th century and currently ranks fifth in numbers of animals among indicine cattle breeds in the country [3]. The Guzerat breed is recognized for traits such as resistance to parasites, heat tolerance and willingness to consume forage with low nutritional value, all of which are important to its adaptation to adverse tropical environments. This breed was included in the FAO list of domestic animal genetic resources to be conserved by management [4] due to their potential for production in the tropics and their small effective population size. Considered to be a dual-purpose breed, Guzerat cattle have been widely used in both purebred and crossbreeding schemes to produce efficient dairy cows and beef calves. The National Breeding Program for the Improvement of Guzerat Dairy Cattle was implemented in 1994 to improve milk production, conformation, and management traits through selection based on genetic evaluations through progeny testing and by utilizing multiple ovulation and embryo transfer nucleus schemes [5]. The Gyr breed is the indicine breed with the highest milk yield and it is recognized for its robustness, adaptability to high-temperature conditions and resistance to parasites common in tropical climates. The National Breeding Program of the Dairy Gyr was established in 1985 in Brazil aiming for the genetic improvement of the breed through the identification and selection of genetically superior bulls for milk production (milk, fat, protein and total solids), conformation and management traits [6]. Gyr animals are present in more than 80% of Brazilian dairy herds either as purebreds or in stabilized crosses with Holstein (Bos primigenius taurus) resulting in the Girolando composite breed [7]. Crossbreeding has been used to generate cows that would combine the high milk production capacity of Holsteins and the adaptability to tropical conditions of the Gyr breed. Crossbreds Girolando cattle are noteworthy for excellent productivity, high fertility indexes and good vigor. Due to these advantages, Girolando became the predominant cattle breed on the majority of Brazilian dairy farms in terms of numbers of animals. A considerable number of genetic variants, including single nucleotide variations (SNVs), insertions/deletions (InDels), and structural variations have been identified across the cattle genome, as a result of the Bovine Genome Sequencing [8], HapMap [9], and the 1000 bull genomes projects [10]. These projects have helped to demonstrate the potential of cattle genomic research by increasing our knowledge of mammalian evolution and biology. Recent advances in next-generation sequencing (NGS) technology and sequence analysis tools have allowed whole genome re-sequencing to become a feasible tool to quickly, efficiently and accurately identify genetic variants segregating in a population. The identification of all genetic variants in the genome is a crucial first step for discovery of causal variants associated with complex traits in livestock species [11, 12]. Whole-genome re-sequencing has now been extensively used to identify genomic variants in a number of cattle breeds [13-21]. However, most studies have concentrated on taurine breeds, exacerbating the deficiency of knowledge for indicine genetic resources. About 99.71 million single nucleotide variants (SNV) and 8.36 million InDels across the bovine genome have as of February 2017 been published in the dbSNP database (http://www.ncbi.nlm.nih.gov/snp/). Among these, only 4.38 million SNPs and three InDels have been deposited for indicine breeds. Furthermore, the SNP chips currently used for genotyping are biased toward variants useful for characterizing taurine breeds, imposing limitations for genome-wide studies about genetic structure and diversity of indicine breeds [22]. This bias strongly affects the estimates obtained from the data [23-25]. Nevertheless, commercially available SNP chips of low- and medium-density markers specific for indicine breeds were developed by GeneSeek Inc. (Lincoln, NE), but these still suffer from ascertainment bias. However, the whole genome resequencing of single animals can be used to remove that bias, as well as identifying rare putative functional variants and detecting structural variants [10, 12, 17]. In humans, InDels have received far less attention than other variants because they are one of the least well characterized and understood variants across the genome, although several lines of evidence indicated that such variation is a major determinant of human biological diversity [26, 27]. Similarly in cattle, InDels remain less studied than SNVs, despite the fact they are the second most abundant type of genetic variant following SNVs. Although several studies identified a high number of InDels in cattle by next generation sequencing, it is apparent that most of them were not submitted to the dbSNP database. InDels in coding regions can significantly impact gene expression, particularly through frameshifts resulting in prematurely terminated protein products or changed splice variants. The aim of this study was to detect and make publicly available genome-wide SNVs and InDels present in Gyr, Girolando, Guzerat, and Holstein breeds. These results will contribute to the dbSNP database and will provide an updated genomic resource for genome-wide association studies and genomic selection in indicine cattle, which will help increase knowledge of the underlying genetic architecture of quantitative traits. In addition, our findings could improve comparative genomic studies between Bos primigenius taurus and Bos primigenius indicus breeds, providing new insights into the history of divergence of those groups.

Materials and methods

Ethics statement

Gyr and Guzerat DNA was extracted from semen, while Girolando and Holstein DNA was extracted from the semen or blood samples, bought from an artificial insemination center and therefore, no specific ethical approval is needed (Brazil law number 11794, from October 8th, 2008, Chapter 1, Art. 3, paragraph III). All the samples were obtained with the consent of the artificial insemination center to use for research.

DNA sampling, library construction and sequencing

A total of 12 animals (two Gyr, three Girolando, two Guzerat and five Holstein) were selected for sequencing according to having high numbers of daughters and due to the role of their lineage within each breed. Gyr and Guzerat genomic DNA was extracted from semen samples using a modified phenol/chloroform method described shortly. Samples were washed with lysis buffer to remove semen diluents. After that, samples were incubated with extraction buffer containing dithiothreitol 10% (DTT) and RNase for 2h. Pellets were incubated overnight with saline-proteinase K buffer and protein was removed by subsequent phenol-chloroform treatment. Girolando (⅝ Holstein and ⅜ Gyr) and Holstein DNA was extracted from the semen or blood samples using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA), following manufacturer’s protocol. Quality and concentration of DNA for all samples were determined by NanoDrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). Paired-end and mate-paired libraries with insert sizes of 200 bp (Holstein samples) and 500 bp (Gyr, Guzerat, and Girolando samples) were prepared following Illumina protocols to produces read lengths of 2 x 100 bp and 200 bp. Each library was high-throughput sequenced on a HiSeq 2000 sequencer (Illumina Inc., San Diego, CA, USA), following manufacturer guidelines.

Short read mapping and variant calling

The UMD 3.1 bovine genome assembly (taurine reference) was used as the reference for the 29 bovine autosomes and the X chromosome whereas the Btau4.6.1 assembly was used as the reference genome sequence for the Y chromosome. Sequencing reads were mapped to the reference assembly using the Burrows-Wheeler Aligner tool (BWA, version 0.7.10-r789) with default parameters [28]. Picard tools (version 1.54) (http://broadinstitute.github.io/picard/) were used to eliminate PCR duplicates. Variant calling was conducted with Freebayes (https://github.com/ekg/freebayes). All SNVs and InDels were identified as differences from the reference genome sequences. The resulting variant list obtained for each animal were filtered by vcffilter (https://github.com/vcflib/vcflib) in order to remove SNVs and InDels with quality scores lower than 30 or coverage lower than 7.

Annotation of SNVs and InDels

The SNVs and InDels were classified according to their potential function using the Ensembl Variant Effect Predictor tool (VEP, version 84) [29], except those mapped onto Y chromosome because VEP contained only the bovine genome assembly UMD 3.1. Variants with the non-reference allele present in the dbSNP database [30] were classified as “known” and variants described herein for the first time were classified as “novel”. The Venn diagrams representing SNVs and InDels were generated using the R environment (VennDiagram package, version 1.6.17). The average ratios of homozygous versus heterozygous SNVs and InDels were calculated for Gyr, Girolando and Guzerat breeds, using RTG tools (http://realtimegenomics.com/products/rtg-tools/). The transition-to-transversion ratio (Ti/Tv), which is used as an indicator of potential sequencing errors, was calculated using the vcf-stats tool (http://vcftools.sourceforge.net/perl_module.html#vcf-stats).

Functional enrichment analysis

The Database for Annotation, Visualization, and Integrated Discovery tool (DAVID, version 6.8) [31, 32] was used for functional enrichment analysis using the lists of genes that had variants classified by the VEP tool as high (splice acceptor variant, splice donor variant, stop gained, frameshift variant, stop lost, and start lost) or moderate (inframe insertion, inframe deletion, missense variant, and protein altering variant) severity of consequences in transcripts. All annotated genes in the taurine genome were used as background. Gene Ontology (GO) biological process, GO cellular component and GO molecular function annotation data sets were used for functional enrichment analysis considering a 10% false discovery rate (FDR) threshold for significance.

Validation of SNVs using BovineHD BeadChip array

We evaluated the genotype concordance between the SNVs and SNP panel genotype data of Gyr_1, Gyr_2, Girolando_1, Girolando_2, Girolando_3, and Guzerat_2 samples to verify and validate the quality of the identified SNVs. The genotype quality control (QC) was not carried out. A python script was used to compare the SNV calling to the Bovine HD BeadChip array (Illumina, San Diego, USA). The script stored the final report file output from the Illumina Bead Express software used with the BovineHD BeadChip and compared each called locus to its counterpart in the variant calling format (VCF) file containing the SNV calling from the whole-genome sequencing.

Results and discussion

Sequencing and variant detection

Approximately 4.3 billion reads were generated with an Illumina HiSeq 2000 sequencer, which for each of seven bulls represented a 10.76 to 16.46-fold coverage of the cattle genome (2,713,722,480 bp of size). A total of 98.78% sequence reads could be mapped to the UMD3.1 bovine genome reference assembly (Table 1). Sequencing at 15-fold coverage of the genome has been reported to enable identification of around 75% of heterozygote variants and an increase in sequence depth significantly improves the accuracy and sensitivity of variant identification [33].
Table 1

Summary of sequencing and assembly results for the seven bulls representing Gyr, Girolando, Guzerat and Holstein cattle breeds.

SampleTotal readsMapped readsCoverage
Gyr_1407,432,42698.98%15.01
Gyr_2420,236,44098.75%15.49
Girolando_1297,636,32499.07%10.97
Girolando_2446,757,53698.90%16.46
Girolando_3345,972,52299.05%12.75
Guzerat_1352,056,80898.97%12.97
Guzerat_2445,595,92298.88%16.42
Holstein_1338,028,98699.02%12.46
Holstein_2326,956,58299.71%12.05
Holstein_3380,512,65299.60%14.02
Holstein_4291,946,02699.71%10.76
Holstein_5304,111,80294.35%11.21
A total of 27.441.279 SNVs (58.09% from Gyr, 48.42% from Girolando, 61.01% from Guzerat, and 32.59% from Holstein) were identified in the samples (Table 2), of which 2,557,670 (37.96% Gyr, 24.51% Girolando, 34.96% Guzerat, and 24.48% Holstein) were novel, while the remainder of the SNVs had been previously uploaded to the dbSNP database. Given the large number of novel variants, these results indicate that a high number of genetic variants remain to be identified in the bovine genome. It is important to note that the lower proportion of novel variants discovered in the Holstein breed was expected, because its genome has already been extensively re-sequenced [10, 12, 17, 34].
Table 2

Summary of SNVs and InDels identified in this study for Gyr, Girolando, Guzerat and Holstein breeds.

BreedSNVsInDels
totalnoveltotalnovel
Gyr15,941,804970,8231,833,387342,899
Girolando13,286,669626,9211,413,047227,176
Guzerat16,743,392894,1771,975,563370,095
Holstein8,944,009626,2691,348,564228,835
In recent years, the improvement of high-throughput sequencing platforms and sequence analysis tools has facilitated the identification and characterization of genetic variants in many cattle breeds, increasing the number of variants available in the dbSNP. Liao et al. [20] re-sequenced 14 Gyr animals and identified around ten million SNPs and six hundred thousand InDels, of which 62.34% and 83.62%, respectively, were novel at that time. In this study, we also identified a high number of variants in Gyr breed, of which 6.09% of SNVs and 18.70% of InDels were novel according to the dbSNP. A total of 5,198,306 (32.61%) SNVs were common across the two Gyr samples, 1,268,464 (9.55%) common across three Girolando samples, 7,010,781 (41.87%) common across two Guzerat samples, and 590,785 (6.60%) common across five Holstein DNA samples. A total of 3,828,041 InDels (47.89% Gyr, 36.91% Girolando, 51.61% Guzerat, and 35.23% Holstein) were identified, of which 883,219 (38.82% Gyr, 25.72% Girolando, 41.90% Guzerat, and 25.91% Holstein) were novel (Table 2). A total of 430,501 (23.48%) InDels were common across Gyr, 85,638 (6.06%) common across Girolando, 585,825 (29.65%) common across Guzerat, and 72,103 (5.35%) common across Holstein samples. The high number of unique genetic variants identified for each animal within the same breed shows the importance of re-sequencing to identify novel variants for monitoring genetic diversity in the cattle breeds and for developing strategies to prevent some eventual loss of genetic variability. The length of InDels ranged from -44 bp (deletion) to +30 (insertion) bp in Holstein breed, -43 bp (deletion) to +29 bp (insertion) in Gyr breed, and -43 bp (deletion) to +29 bp (insertion) for Guzerat and Girolando breeds. Moreover, most of the identified InDels have lengths less than 4 bp (87.72% Gyr, 86.10% Girolando, 87.94% Guzerat and 88.89% Holstein InDels) (Fig 1).
Fig 1

Distribution of InDel lengths (bp) for Gyr, Girolando, Guzerat, and Holstein cattle breeds.

The number of deletions discovered was slightly lower than the number of insertions, demonstrated by insertion/deletion ratios (Table 3). The higher average ratio of heterozygous versus homozygous SNVs and InDels observed in Girolando breed (Table 3) can be explained by its composite breed makeup, since it comprises admixed Holstein (taurine) and Gyr (indicine) genomes.
Table 3

Description of insertion-to-deletion, heterozygous-to-homozygous (Het/Hom) and transition-to-transversion (Ti/Tv) ratios for InDels and SNVs.

SampleInsertions/DeletionsSNVs
Insertion/DeletionInsertion Het/HomDeletion Het/HomInDel Het/HomTotal Het/HomTi/TvHet/Hom
Gyr_10.941.051.381.701.262.331.23
Gyr_20.921.081.421.661.372.371.34
Girolando_10.962.473.584.263.742.373.73
Girolando_20.942.493.513.784.012.304.02
Girolando_30.971.912.563.152.652.282.65
Guzerat_10.951.091.461.841.322.341.29
Guzerat_20.941.191.541.921.392.331.35
Holstein_10.981.341.632.061.682.061.68
Holstein_20.991.371.712.111.762.121.76
Holstein_30.981.371.712.051.792.161.79
Holstein_41.001.491.912.331.892.121.88
Holstein_50.960.850.941.190.951.640.93
The transition-to-transversion (Ti/Tv) ratio was calculated for each sample (Table 3) as an indicator of potential random sequencing errors. The Ti/Tv average ratio was 2.20 for all samples, which indicated a high accuracy for most of the variants identified in this study. Our results are supported by previous studies in bovine, which found Ti/Tv ratios of approximately 2.2 in Hanwoo, Yanbian, Jeju Heugu, Goldwyn, Black Angus, and Holstein breeds [17, 18, 35]. All the SNVs and InDels identified in this study were submitted in variant calling format (vcf) to the dbSNP database under the handle name EMBRAPA-CNPGL-LBGA. The density of variants within each chromosome was proportional to chromosome length, except for the X and Y chromosomes which showed a low number of variants identified (S1 Table). These findings were expected because sex chromosomes are haploid in males, resulting in a sequencing depth about half that of the autosomes and, as consequence, a lower rate of variant identification [15, 16]. In addition, some studies have suggested that selection resulted in lower observed retention of mutant variants on genes located on sex chromosomes because of the effects of exposure to deleterious recessive alleles in hemizygous conditions [15, 36, 37]. A high number of SNVs and InDels were identified as being shared among indicine breeds (Fig 2). A total of 2,825,991 (10.30%) SNVs were common to all four breeds while the SNVs with no overlap with any other breed (i.e. breed-specific) represented 11.42% in Gyr, 7.03% in Girolando, 13.47% in Guzerat, and 8.82% in Holstein breeds. The number of InDels shared among all four breeds was 242,690 (6.34%), while the proportion of breed-specific InDels was 14.41% in Gyr, 7.90% in Girolando, 17.14% in Guzerat, and 15.12% in Holstein (Fig 2).
Fig 2

Venn diagram showing the overlap of all identified SNVs and InDels in the Gyr, Girolando, Guzerat, and Holstein genomes.

Breed-specific variants could be useful for further studies about breed characterization, while overlapping variants may reflect closer relations between breeds. Although the Gyr and Holstein breeds were used to develop the Girolando composite breed (⅝ Holstein and ⅜ Gyr), we identified a closer overlap of variants common to Gyr and Guzerat (11,016,988 SNVs and 1,045,348 InDels), both indicine breeds than those common to Girolando and Gyr (8,236,613 SNVs and 733,598 InDels) or to Girolando and Holstein (5,391,848 SNVs and 564,311 InDels) breeds. Also, the number of genetic variants shared among Girolando, Gyr, and Holstein breeds reflect the current genetic state of this Girolando population, which appears to be approximately ⅝ Gyr and ⅜ Holstein. Apparently, the Girolando breed composition is changing since the breed was formed and the selection pressure has keeping the most variants for Gyr breed in this environment, maybe due to its additional advantages in tropical climates when compared to Holsteins breed, such as adaptability to high-temperature conditions and resistance to parasites. Identified SNVs and InDels (Table 4) were annotated into functional categories using the VEP tool, except those that were mapped on the Y chromosome (S2 Table). Variant annotation indicated that variants may be functionally relevant for each breed. The number of variant annotations was higher than the number of SNVs and InDels identified, because the same genetic variant can have more than one annotation (Table 4).
Table 4

Summary of functional classification of SNVs and InDels identified in Gyr, Girolando, Guzerat and Holstein breeds.

Some SNVs and InDels are represented in multiple categories.

 ClassificationGyrGirolandoGuzeratHolstein
Splice acceptor variant385348376276
Splice donor variant368346370225
Stop gained535481598534
Frameshift variant1,4791,3641,491987
Stop lost47354227
Start lost10710010256
Inframe insertion27723523892
Inframe deletion358306302153
Missense variant53,51948,07354,43326,931
Protein altering variant74726650
Splice region variant15,33712,87015,1726,620
Synonymous variant78,03066,43777,52727,820
Stop retained variant52465126
Coding sequence variant396360366221
Mature miRNA variant12011113281
5_prime_UTR_variant8,6437,5187,9452,245
3_prime_UTR_variant42,26636,08242,23319,609
Non-coding transcript exon variant8,2607,0198,6104,523
Intron variant5,648,3534,701,1815,845,3872,956,134
Non-coding transcript variant8,6287,3688,9794,710
Upstream gene variant893,243765,141892,485458,765
Downstream gene variant899,298766,050910,644457,038
Intergenic variant12,442,69510,192,69113,244,4687,556,938

Summary of functional classification of SNVs and InDels identified in Gyr, Girolando, Guzerat and Holstein breeds.

Some SNVs and InDels are represented in multiple categories. Most variants were identified in intergenic or intronic, while a small number of variants were annotated as being in genic regions including exons, untranslated regions (5’UTR and 3’UTR) and splice sites (Table 4). A significant number of variants were predicted to be deleterious in Gyr (7,654), Girolando (7,022), Guzerat (7,743), and Holstein (4,699) cattle breeds. In regard to variants annotated on the Y chromosome (S2 Table), most variants were mapped to intergenic regions (87.31% Gyr, 88.34% Girolando, 88.42% Guzerat, and 92% Holstein), while a small number of variants were annotated in genic regions, mainly in coding regions (0.71% Gyr, 0.63% Girolando, 0.47% Guzerat, and 0.13% Holstein). This lower representation in coding regions for the Y chromosome compared to the corresponding representation in autosomal variants only reflects the fewer number of genes present on the Y when compared to autosomes. The identification of variants in genes from the Y chromosome is important for breeding programs, since these could be responsible for fertility-related traits [38]. Genetic variants classified by the VEP tool as high (disruptive impact in the protein causing protein truncation, loss of function or triggering nonsense mediated decay) or moderate (non-disruptive variant that could change protein effectiveness) severity consequences in transcripts provide a useful resource to be used in genetic analysis of phenotypic differences observed among these four breeds, mainly because these types of variants can influence economically important traits. For this reason, following variant annotation, functional enrichment analysis was performed using the DAVID tool. The analysis set comprised 11,940 Ensembl IDs from Gyr, 11,268 from Girolando, 11,910 from Guzerat and 7,617 from Holstein. Gene Ontology enrichment analysis revealed that three GO biological processes (GO:0007186~G-protein coupled receptor signaling pathway, GO:0050907~detection of chemical stimulus involved in sensory perception, and GO:0007608~sensory perception of smell), five GO molecular functions (GO:0004984~olfactory receptor activity, GO:0004930~G-protein coupled receptor activity, GO:0004888~transmembrane signaling receptor activity, GO:0005549~odorant binding, and GO:0005524~ATP binding) and two GO cellular components (GO:0016021~integral component of membrane and GO:0005886~plasma membrane) were enriched in all four breeds studied, considering a 10% FDR threshold for significance (Table 5).
Table 5

Gene Ontology (GO) terms and KEGG pathways enriched (FDR<0.10) in Gyr, Girolando, Guzerat and Holstein breeds.

TermsGyrGirolandoGuzeratHolstein
GenesFDRGenesFDRGenesFDRGenesFDR
GO Biological Process
GO:0007186 G-protein coupled receptor signaling pathway6125.35E-256156.38E-376324.24E-345317.81E-63
GO:0007608 Sensory perception of smell1731.24E-091736.66E-131821.47E-151531.01E-21
GO:0050907 Detection of chemical stimulus involved in sensory perception1351.84E-091352.88E-121377.68E-111243.30E-22
GO:0050911 Detection of chemical stimulus involved in sensory perception of smell--450.03039460.05485458.99E-09
GO:0006508 Proteolysis1390.02278------
GO:0007165 Signal transduction2970.02108----2070.01465
GO Cellular Component
GO:0016021 Integral component of membrane28782.32E-1127365.33E-1328745.97E-1119293.42E-15
GO:0005886 Plasma membrane16431.17E-1816182.55E-3116805.21E-2711926.07E-34
GO:0005887 Integral component of plasma membrane5890.00130------
GO:0045095 Keratin filament--480.02374----
GO Molecular Function
GO:0004984 Olfactory receptor activity6612.86E-366813.35E-596947.46E-556131.51E-102
GO:0004930 G-protein coupled receptor activity7645.20E-327759.03E-497961.60E-466723.52E-83
GO:0004888 Transmembrane signaling receptor activity1653.78E-111631.19E-121672.98E-121441.48E-20
GO:0005549 Odorant binding1433.36E-071431.68E-091453.69E-081258.74E-16
GO:0005524 ATP binding8716.08E-058120.058908805.89E-066214.21E-09
GO:0042626 ATPase activity, coupled to transmembrane movement of substances------360.00680
KEGG Pathway
bta04740 Olfactory transduction7050.002177237.39E-127367.06E-096561.51E-50
bta04512 ECM-receptor interaction--680.10062710.04028--
bta02010 ABC transporters------330.00254
bta01100 Metabolic pathways8120.02570------
Four KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were identified as being over represented by DAVID. The olfactory transduction pathway (bta04740) was over represented in all four cattle breeds, the ECM-receptor interaction pathway (bta04512) was over represented in Girolando and Guzerat breeds, while the ABC transporters (bta02010) and the metabolic pathways (bta01100) were over represented only in Holstein and Gyr, respectively (Table 5). Olfactory transduction pathways act in the perception of odor through olfactory receptors and biochemical signaling events, which influences food preference and food consumption [39]. Zhan et al. [15] identified enrichment bias for this pathway when re-sequencing a Holstein Friesian bull and performing an enrichment analysis for genes with nonsynonymous SNVs. Do et al. [40] reported that the olfactory transduction pathway is associated with residual feed intake in pigs. Although the novel variants identified in this study were spread throughout the genome, we identified a high number of novel variants clustered on genes from BTA12 and BTA15 in all four cattle breeds, which could be explained by the presence of gene families that creates problems in the mapping, resulting in erroneous identification of genetic variation within these related genes [12]. The olfactory receptor genes comprise the largest multigene family in vertebrates [41], with more than one thousand coding genes organized in clusters on 26 cattle chromosomes, of which BTA15 has the largest number of mapped olfactory receptor functional genes (n = 251) [42]. The genes from BTA12 enriched with novel variants comprised multidrug resistance-associated protein 4 genes (LOC100299180, LOC509854, LOC515333, LOC522174, LOC523126, and LOC53043), while BTA15 showed the olfactory receptor genes (LOC100299084, LOC100336901, LOC104968619, LOC508180, LOC509025, LOC521081, LOC615051, LOC787385, LOC787428, LOC789288, LOC789293, LOC789300, LOC790274, LOC100299808, OR4C15, OR5F1, and OR8K1) contained a large number of novel variants. The extracellular matrix (ECM) consists of a complex of structural and functional macromolecules with an important role in tissue and organ morphogenesis and in the maintenance of cell and tissue structure and function. It can directly or indirectly influence specific cellular activities such as differentiation, adhesion, proliferation, and migration. Lee et al. [43] identified the ECM-receptor interaction pathway enriched in expression analyses from omental, subcutaneous and intramuscular adipose tissues in cattle, acting also as a sub-pathway of other enriched pathways. Gao et al. [44] reported that the ECM-receptor interaction pathway is transcriptionally regulated throughout the onset of lactation in Holstein cows. The ATP-binding cassette (ABC) transporters is one of the largest protein families in prokaryotes and eukaryotes. They couple ATP hydrolysis to active transport of a wide variety of substrates such as sugars, peptides, proteins, lipids, sterols, ions, and drugs. Prokaryotic antibiotic resistance and eukaryotic multi-drug resistance is associated with ABC transporters, which acts as a modulator of drug absorption and distribution [45, 46]. Over-represented KEGG pathways and GO terms and results should be carefully interpreted because of the small sample size used in this study. However, these results provide important genomic information to investigate the genetic mechanisms underlying phenotypic differences and similarities among these breeds.

Validation of SNVs using Bovine HD BeadChip array

The quality of SNVs identified in this study was evaluated by comparison of the concordance between the genotypes obtained from Illumina BovineHD BeadChip array for the Gyr_1, Gyr_2, Girolando_1, Girolando_2, Girolando_3, and Guzerat_2 samples. Around 98.46–99.45% of detectable array SNPs were identified as SNV by sequencing, which indicated that most of the SNVs were correctly called in this study. Table 6 shows the percentage of SNVs from the Bovine HD BeadChip array that are also present in the SNV calling and the concordance between array and whole-genome sequence genotypes.
Table 6

Comparison of BovineHD BeadChip array genotypes to sequencing SNVs.

SampleSequencing calls*ConcordantDiscordant
Gyr_199.42%267,2581,558
Gyr_299.45%196,5371,091
Girolando_198.46%145,3652,274
Girolando_298.92%153,1571,670
Girolando_398.66%185,6382,527
Guzerat_299.39%247,9311,526

*Percentage of detectable array SNPs that were identified as SNV by sequencing

*Percentage of detectable array SNPs that were identified as SNV by sequencing

Conclusion

In this study, we presented an extensive genome analysis of Gyr, Girolando, Guzerat and Holstein breeds following whole-genome re-sequencing using an Illumina HiSeq 2000 sequencing platform, which lead to identification of 27,441,279 SNVs and 3,828,041InDels in the cattle genome. The high number of genetic variants identified for each breed is important to monitoring genetic diversity and for developing strategies to prevent an eventual loss of genetic variability. The submission of the approximately 3.4 million novel variants to the dbSNP has significantly increased the number of variants available, particularly for the Bos primigenius indicus genome. The annotation of SNVs InDels predicted to affect function of transcripts reveal variants that could contribute to phenotypic differences among the Gyr, Girolando, Guzerat and Holstein breeds. Also, we identified changes in the Girolando breed genetic composition since it was formed, which suggests that the selection pressure has preferred to retain the variants for Gyr breed in the tropical climates, maybe due to their particular advantages in this environment when compared to Holstein, such as rusticity, thermotolerance, and resistance to endoparasites and ectoparasites. Therefore, this study provides the basis for further investigations about potential genomic markers and molecular mechanisms associated with economically important traits for breeding programs of these cattle breeds.

Summary of variants identified in each breed grouped by chromosome.

(DOCX) Click here for additional data file.

Summary of SNVs and InDels mapped on the bovine Y chromosome.

(DOCX) Click here for additional data file.
  38 in total

1.  Estimate of the mutation rate per nucleotide in humans.

Authors:  M W Nachman; S L Crowell
Journal:  Genetics       Date:  2000-09       Impact factor: 4.562

Review 2.  Drug delivery systems in domestic animal species.

Authors:  David J Brayden; Emilie J M Oudot; Alan W Baird
Journal:  Handb Exp Pharmacol       Date:  2010

3.  The Y-chromosome point mutation rate in humans.

Authors:  Agnar Helgason; Axel W Einarsson; Valdís B Guðmundsdóttir; Ásgeir Sigurðsson; Ellen D Gunnarsdóttir; Anuradha Jagadeesan; S Sunna Ebenesersdóttir; Augustine Kong; Kári Stefánsson
Journal:  Nat Genet       Date:  2015-03-25       Impact factor: 38.330

4.  Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle.

Authors:  Hans D Daetwyler; Aurélien Capitan; Hubert Pausch; Paul Stothard; Rianne van Binsbergen; Rasmus F Brøndum; Xiaoping Liao; Anis Djari; Sabrina C Rodriguez; Cécile Grohs; Diane Esquerré; Olivier Bouchez; Marie-Noëlle Rossignol; Christophe Klopp; Dominique Rocha; Sébastien Fritz; André Eggen; Phil J Bowman; David Coote; Amanda J Chamberlain; Charlotte Anderson; Curt P VanTassell; Ina Hulsegge; Mike E Goddard; Bernt Guldbrandtsen; Mogens S Lund; Roel F Veerkamp; Didier A Boichard; Ruedi Fries; Ben J Hayes
Journal:  Nat Genet       Date:  2014-07-13       Impact factor: 38.330

5.  Comparative Transcriptome Analysis of Adipose Tissues Reveals that ECM-Receptor Interaction Is Involved in the Depot-Specific Adipogenesis in Cattle.

Authors:  Hyun-Jeong Lee; Mi Jang; Hyeongmin Kim; Woori Kwak; Woncheoul Park; Jae Yeon Hwang; Chang-Kyu Lee; Gul Won Jang; Mi Na Park; Hyeong-Cheol Kim; Jin Young Jeong; Kang Seok Seo; Heebal Kim; Seoae Cho; Bo-Young Lee
Journal:  PLoS One       Date:  2013-06-21       Impact factor: 3.240

6.  Whole-genome resequencing shows numerous genes with nonsynonymous SNPs in the Japanese native cattle Kuchinoshima-Ushi.

Authors:  Ryouka Kawahara-Miki; Kaoru Tsuda; Yuh Shiwa; Yuko Arai-Kichise; Takashi Matsumoto; Yu Kanesaki; Sen-ichi Oda; Shizufumi Ebihara; Shunsuke Yajima; Hirofumi Yoshikawa; Tomohiro Kono
Journal:  BMC Genomics       Date:  2011-02-10       Impact factor: 3.969

7.  Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery.

Authors:  Paul Stothard; Jung-Woo Choi; Urmila Basu; Jennifer M Sumner-Thomson; Yan Meng; Xiaoping Liao; Stephen S Moore
Journal:  BMC Genomics       Date:  2011-11-15       Impact factor: 3.969

8.  Development and characterization of a high density SNP genotyping assay for cattle.

Authors:  Lakshmi K Matukumalli; Cynthia T Lawley; Robert D Schnabel; Jeremy F Taylor; Mark F Allan; Michael P Heaton; Jeff O'Connell; Stephen S Moore; Timothy P L Smith; Tad S Sonstegard; Curtis P Van Tassell
Journal:  PLoS One       Date:  2009-04-24       Impact factor: 3.240

9.  Whole-genome analyses of Korean native and Holstein cattle breeds by massively parallel sequencing.

Authors:  Jung-Woo Choi; Xiaoping Liao; Paul Stothard; Won-Hyong Chung; Heoyn-Jeong Jeon; Stephen P Miller; So-Young Choi; Jeong-Koo Lee; Bokyoung Yang; Kyung-Tai Lee; Kwang-Jin Han; Hyeong-Cheol Kim; Dongkee Jeong; Jae-Don Oh; Namshin Kim; Tae-Hun Kim; Hak-Kyo Lee; Sung-Jin Lee
Journal:  PLoS One       Date:  2014-07-03       Impact factor: 3.240

10.  Deep sequencing of Danish Holstein dairy cattle for variant detection and insight into potential loss-of-function variants in protein coding genes.

Authors:  Ashutosh Das; Frank Panitz; Vivi Raundahl Gregersen; Christian Bendixen; Lars-Erik Holm
Journal:  BMC Genomics       Date:  2015-12-09       Impact factor: 3.969

View more
  11 in total

1.  Deciphering signature of selection affecting beef quality traits in Angus cattle.

Authors:  Mengistie Taye; Joon Yoon; Tadelle Dessie; Seoae Cho; Sung Jong Oh; Hak-Kyo Lee; Heebal Kim
Journal:  Genes Genomics       Date:  2017-09-30       Impact factor: 1.839

2.  Genetic mechanisms underlying spermatic and testicular traits within and among cattle breeds: systematic review and prioritization of GWAS results.

Authors:  Pablo Augusto de Souza Fonseca; Fernanda Caroline Dos Santos; Stephanie Lam; Aroa Suárez-Vega; Filippo Miglior; Flavio S Schenkel; Luiza de Almeida Ferreira Diniz; Samir Id-Lahoucine; Maria Raquel Santos Carvalho; Angela Cánovas
Journal:  J Anim Sci       Date:  2018-12-03       Impact factor: 3.159

3.  Whole-genome sequencing reveals mutational landscape underlying phenotypic differences between two widespread Chinese cattle breeds.

Authors:  Yao Xu; Yu Jiang; Tao Shi; Hanfang Cai; Xianyong Lan; Xin Zhao; Martin Plath; Hong Chen
Journal:  PLoS One       Date:  2017-08-25       Impact factor: 3.240

4.  Whole genome characterization of autochthonous Bos taurus brachyceros and introduced Bos indicus indicus cattle breeds in Cameroon regarding their adaptive phenotypic traits and pathogen resistance.

Authors:  Archile Paguem; Babette Abanda; Mbunkah Daniel Achukwi; Praveen Baskaran; Stefan Czemmel; Alfons Renz; Albert Eisenbarth
Journal:  BMC Genet       Date:  2020-06-22       Impact factor: 2.797

5.  Genomic variants identified from whole-genome resequencing of indicine cattle breeds from Pakistan.

Authors:  Naveed Iqbal; Xin Liu; Ting Yang; Ziheng Huang; Quratulain Hanif; Muhammad Asif; Qaiser Mahmood Khan; Shahid Mansoor
Journal:  PLoS One       Date:  2019-04-11       Impact factor: 3.240

6.  Post-Translational Protein Deimination Signatures in Serum and Serum-Extracellular Vesicles of Bos taurus Reveal Immune, Anti-Pathogenic, Anti-Viral, Metabolic and Cancer-Related Pathways for Deimination.

Authors:  Michael F Criscitiello; Igor Kraev; Sigrun Lange
Journal:  Int J Mol Sci       Date:  2020-04-19       Impact factor: 5.923

7.  Marker discovery and associations with β-carotene content in Indian dairy cattle and buffalo breeds.

Authors:  F Bertolini; J Chinchilla-Vargas; J R Khadse; A Juneja; P D Deshpande; K Bhave; V Potdar; P M Kakramkar; A R Karlekar; A B Pande; Rohan L Fernando; M F Rothschild
Journal:  J Dairy Sci       Date:  2019-08-30       Impact factor: 4.034

Review 8.  Genomic Analysis, Progress and Future Perspectives in Dairy Cattle Selection: A Review.

Authors:  Miguel A Gutierrez-Reinoso; Pedro M Aponte; Manuel Garcia-Herreros
Journal:  Animals (Basel)       Date:  2021-02-25       Impact factor: 3.231

9.  Assessing genetic architecture and signatures of selection of dual purpose Gir cattle populations using genomic information.

Authors:  Amanda Marchi Maiorano; Daniela Lino Lourenco; Shogo Tsuruta; Alejandra Maria Toro Ospina; Nedenia Bonvino Stafuzza; Yutaka Masuda; Anibal Eugenio Vercesi Filho; Joslaine Noely Dos Santos Goncalves Cyrillo; Rogério Abdallah Curi; Josineudson Augusto Ii de Vasconcelos Silva
Journal:  PLoS One       Date:  2018-08-02       Impact factor: 3.240

10.  Whole-Genome Sequencing of Three Native Cattle Breeds Originating From the Northernmost Cattle Farming Regions.

Authors:  Melak Weldenegodguad; Ruslan Popov; Kisun Pokharel; Innokentyi Ammosov; Yao Ming; Zoya Ivanova; Juha Kantanen
Journal:  Front Genet       Date:  2019-01-11       Impact factor: 4.599

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.