Literature DB >> 29856873

Identification and annotation of breed-specific single nucleotide polymorphisms in Bos taurus genomes.

Bartosz Czech1, Magdalena Frąszczak1, Magda Mielczarek1,2, Joanna Szyda1,2.   

Abstract

In Bos taurus the universality of the reference genome is biased towards genetic variation represented by only two related individuals representing the same Hereford breed. Therefore, results of genetic analyses based on this reference may not be reliable. The 1000 Bull Genomes resource allows for identification of breed-specific polymorphisms and for the construction of breed-specific reference genomes. Whole-genome sequences or 936 bulls allowed us to construct seven breed specific reference genomes of Bos taurus for Angus, Brown Swiss, Fleckvieh, Hereford, Jersey, Limousin and Simmental. In order to identify breed-specific variants all detected SNPs were filtered within-breed to satisfy criteria of the number of missing genotypes not higher than 7% and the alternative allele frequency equal to unity. The highest number of breed-specific SNPs was identified for Jersey (130,070) and the lowest-for the Simmental breed (197). Such breed-specific polymorphisms were annotated to coding regions overlapping with 78 genes in Angus, 140 in Brown Swiss, 132 in Fleckvieh, 100 in Hereford, 643 in Jersey, 10 in Limousin and no genes in Simmental. For most of the breeds, the majority of breed-specific variants from coding regions was synonymous. However, most of Fleckvieh-specific and Hereford-specific polymorphisms were missense mutations. Since the identified variants are characteristic for the analysed breeds, they form the basis of phenotypic differences observed between them, which result from different breeding programmes. Breed-specific reference genomes can enhance the accuracy of SNP driven inferences such as Genome-wide Association Studies or SNP genotype imputation.

Entities:  

Mesh:

Year:  2018        PMID: 29856873      PMCID: PMC5983434          DOI: 10.1371/journal.pone.0198419

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

A reference genome is a DNA database, assembled as a representative of species' nucleotide sequence. Typically, it is based on genomic sequences from many individuals and, as a result, it is universal for the species. Nevertheless, we must be aware of the fact that it represents genetic diversity averaged over the polymorphisms represented in DNA of donor(s) genomes. It should also be borne in mind that most species are represented by distinctive breeds with specific phenotypic and thus also genotypic characteristics. By relating individual DNA sequence variation to the one universal template, we encounter a danger to miss population-specific variation, since all polymorphism detections are “biased” towards the sequence structure represented by the reference genome. That is why breed-specific reference genomes are essential for providing more accurate genomic inferences, which account for DNA sequence variation among particular breeds [1, 2]. Such diversity occurs due to breed formation history, differences in selection pressure as well as migration barriers. In Bos taurus the universality of the UMD3.1 reference genome is highly biased towards genetic variation represented by only two related individuals representing the same Hereford breed–a bull “L1 Domino 99375” and his daughter “L1 Dominette 01449” [3]. Therefore, results of genetic analysis of cattle breeds based on the UMD3.1 reference may not be reliable [4]. Decreasing costs of obtaining individual whole genome sequences (WGS) enabled generation of large data sets consisting of DNA WGS of many individuals representing various breeds or populations. Such an initiative has been inaugurated as the 1000 Genomes Project for humans, which now harbours WGS for 3,900 individuals from 30 populations [5]. In 2010, the 1000 Genome initiative has also been developed for cattle [6, 7], for which the current database (run 6 from 2017) comprises WGS of 2,333 individuals from over 60 breeds. This resource allows for identification of breed-specific polymorphisms and furthermore for construction of breed-specific reference genomes. This procedure was a major goal of our study and was conducted for Angus, Brown Swiss, Fleckvieh, Hereford, Jersey, Limousin and Simmental breeds represented by a large number of individual whole genome DNA sequences. However, since the analysed breeds differ in many aspects such as meatiness, milkiness, fertility and so on, genomic differences between breeds were translated into the functional genomic features by functional annotation of breed-specific SNPs.

Materials and methods

Single nucleotide polymorphisms

The material comprised whole genome DNA sequences of 936 bulls representing Angus (287 bulls), Brown Swiss (148), Fleckvieh (53), Hereford (75), Jersey (66), Limousin (82) and Simmental (225) breeds available within the frame of the 1000 Bull Genome project [6]. The seven breeds were selected out of the 72 breeds with WGS available, based on two criteria comprising the relatively large number of sequenced bulls and diverse production types. Note, that despite the largest number of available WGSs, the Holstein-Friesian breed was intentionally excluded from the analysis. Since Holstein and Fleckvieh share a common population history prior and during the domestication process [8], the Holstein-Friesian breed, which represents the most highly selected dairy breed worldwide, was excluded from the comparisons, as we believe selection has had a strong influence on the available genetic variation. Whereas for all the other considered breeds selection seems to be more moderate. The distribution of bulls across breeds was not uniform with nearly one-third (30.66%) of sequenced animals, representing the Angus breed, followed by the Simmental breed (24.04%) and Brown Swiss breed (15.81%). Other breeds were represented by a much lower fraction of bulls of less than 10% each (Fig 1).
Fig 1

The percentages of individuals representing each breed.

Table 1 presents descriptive statistics of genome average coverage of the data. The highest mean of genome average coverage was available for the Angus breed (16.71 ±12.18) and the lowest mean was observed for the Fleckvieh breed (7.61 ± 3.24). The bioinformatic pipeline underlying variant detection implemented by the 1000 Bull Genomes Consortium was as follows. The bioinformatic pipeline underlying variant detection implemented by the 1000 Bull Genomes Consortium was as follows. The alignment to this reference genome was performed using the BWA-MEM software [9]. The resulting alignment files (BAM) were further processed by realignment around indels to minimize mismatches across all reads. For this purpose GATK tools RealignerTargerCreator and IndelRealigner were applied [10]. The detection of SNPs was carried out with SAMtools [11] simultaneously for all bulls representing a given breed. Filters were then implemented in the VCF file using PyVCF.
Table 1

Descriptive statistics of distribution of genome average coverage.

BreedMinimumMaximumMean (SD)
Angus2.5772.4616.71 (±12.18)
Brown Swiss5.4635.7111.78 (±3.9)
Fleckvieh4.3525.357.61 (± 3.24)
Hereford4.3564.5414.67 (± 8.38)
Jersey2.6528.3711.07 (± 4.27)
Limousin3.8537.4110.80 (± 6.56)
Simmental2.7239.3011.70 (± 5,77)

Defining breed-specific SNPs

In our analysis, aiming to identify breed-specific SNPs, the first step was to remove variants with two or more alternative alleles. The first filtering criterion was selection of a minimum number of alternative alleles observed on a forward and a reverse strand reads (threshold used was 1—removes variants never observed on forward and reverse reads). Then variants with overall quality below 20 and mapping quality below 30 were removed. The next criterion for filtering was selecting the minimum and maximum of read depth (respectively min = 10 and maximum = median of read depth + 3 standard deviation of read depth). Then two variants assigned the same base pair position were removed. Subsequently InDels, which revealed another InDel closer than 10 base pairs, SNPs, which revealed another SNP closer than 3 base pairs and SNPs closer than 5 base pairs to InDels were removed.

Genomic and functional annotation

In order to identify breed-specific variants all SNPs were filtered within-breed to satisfy criteria of the number of missing genotypes not higher than 7% and the alternative allele frequency equal to unity using he VCF Tools software [12]. Those filtered polymorphisms that were characteristic to only one breed (and absent in the other breeds) were defined as breed-specific. Those SNPs were genomically annotated using the Variant Effect Predictor [13] and functionally annotated to Gene Ontology and KEGG pathways using the KOBAS software [14]. The last step was to prepare FASTA files representing breed-specific reference genomes obtained by replacing the selected nucleotides in the UMD3.1 template.

Results and discussion

Quantitative characteristic of SNPs

The total number of identified SNPs and the number of SNPs remaining after the filtration comprising the number of missing genotypes not higher than 7% and the alternative allele frequency equal to unity was summarized in Table 2. The highest total number of SNPs was observed in the Simmental breed (61,824,209) and the lowest number in Fleckvieh (61,780,912), with the highest percentage of SNPs after the filtration remaining for Jersey (0.494%) and the lowest percentage for Simmental (0.098%).
Table 2

The total number of SNPs and the number of SNPs after the filtration.

BreedTotal number of SNPsNumber of SNPs after the filtration% of SNPs after the filtration
Angus61,823,312114,7440.186
Brown Swiss61,814,873187,5760.303
Fleckvieh61,780,912189,2380.306
Hereford61,813,02873,7720.119
Jersey61,797,641305,4770.494
Limousin61,820,28761,6230.100
Simmental61,824,20960,5820.098
The post-filtration variants were visualised using the Venn diagram (Fig 2) showing the numbers of SNPs common between all possible breed combinations. 23,113 of filtered SNPs were common to each breed, which indicated a polymorphism characteristic to Domino lineage, but monomorphic in the other breeds. Surprisingly, as many as 3,790 SNPs were alternative homozygous in Hereford, again indicating that the polymorphisms are rather Domino-Dominette specific then a characteristic of Bos taurs as a species. Still a considerable number of variants were specific for only one breed. The highest number of breed-specific SNPs was identified for Jersey (130,070) and the lowest—for the Simmental breed (197). Furthermore, considering pairwise breed comparisons, the highest numbers of SNPs specific only for a combination of two breeds was identified for Jersey-Fleckvieh (19,060) and for Jersey-Brown Swiss (18,006) combinations, the latter may have been expected provided phenotypic similarities between both breeds as well as a similar milk production performance. The lowest numbers of breed-combination specific SNPs were always attributable to constellations involving the Simmental breed, i.e. Simmental-Hereford (16 SNPs), Simmental-Angus (24 SNPs), Simmental-Brown Swiss (171 SNPs) and Simmental-Jersey (231 SNPs). This remains in agreement with results presented by [15] who showed the largest phylogenetic distance between Simmental and the other breeds considered in our study. The only exception was a relatively high number of SNPs characteristic to the Simmental-Fleckvieh combination–which reflects a common origin of both breeds. They are even regarded as the same breed, with only traditional differentiation into Fleckvieh in Germany and Simmental in other countries (see e.g. a common breeding value evaluation for Fleckvieh and Simmental conducted by the Interbull Center www.interbull.org).
Fig 2

A Venn diagram of breed-specific SNPs.

Each number represents the count of breed-specific SNPs, which are common between sets of breeds, defined by colours. ANG–Angus, BSW–Brown Swiss, FLV–Fleckvieh, HER–Hereford, JER–Jersey, LIM–Limousin, SIM–Simmental.

A Venn diagram of breed-specific SNPs.

Each number represents the count of breed-specific SNPs, which are common between sets of breeds, defined by colours. ANG–Angus, BSW–Brown Swiss, FLV–Fleckvieh, HER–Hereford, JER–Jersey, LIM–Limousin, SIM–Simmental.

Genomic and functional annotation of breed-specific SNPs

SNPs identified as breed-specific were genomically annotated to the UMD3.1 reference. Polymorphisms located in coding regions overlapped with 78 genes in Angus, 140 in Brown Swiss, 132 in Fleckvieh, 100 in Hereford, 643 in Jersey and 10 in Limousin. Note, that due to a low number of only 197 Simmental-specific SNPs, no polymorphisms located in coding regions were identified for this breed. Since such polymorphisms, potentially contribute to differences between breeds, we considered their functional consequences expressed by Sequence Ontology terms (Fig 3).
Fig 3

Functional annotation of breed-specific SNPs to coding regions.

ANG–Angus, BSW–Brown Swiss, FLV–Fleckvieh, HER–Hereford, JER–Jersey, LIM–Limousin.

Functional annotation of breed-specific SNPs to coding regions.

ANG–Angus, BSW–Brown Swiss, FLV–Fleckvieh, HER–Hereford, JER–Jersey, LIM–Limousin. As it might have been expected, in most of the breeds over the half of breed-specific variants located within coding regions was synonymous. However, the sets of Fleckvieh-specific and Hereford-specific polymorphisms were exceptions, since for those two breeds there were missense mutations, which composed the most of SNPs (FLV– 55%, i.e. 100 SNPs and HER– 60%, i.e. 141 SNPs). Also for the remaining breeds, the number of missense point mutations was considerable ranging from 41% (485 SNPs) in Jersey to 36% (5 SNPs) in Limousin. Besides missense mutations, albeit less numerous, breed-specific SNPs with other high impact consequences were identified. These comprised SNPs located in splice regions, exonic SNPs resulting in a premature termination or prolongation of a translation process, as well as SNPs which alter the translation start codons. Further on, based on KEGG pathways, we checked whether whole metabolic processes representing interplay of several genes and their products, were specifically altered in the analysed breeds. Our results showed that genes harbouring breed-specific mutations in coding regions do not significantly represent any particular pathway. Phenotypic annotation of breed-specific SNPs to the OMIA data base (omia.org) has been done using the BioMart software [16]. Genes harbouring breed-specific SNPs and related to OMIA phenotypes have been identified for Angus, Brown Swiss, Jersey and Limousin. Two of the genes represented the same gene family (solute carrier), which encodes membrane transport proteins: SLC39A4 on BTA14 related to Acrodermatitis enteropathica, contained an Angus-specific SNP and SLC4A2 on BTA4 related to Osteopetrosis, contained a Brown Swiss-specific SNP. Interestingly, the link between a mutation causing Acrodermatitis enteropathica and SLC39A4 was reported for the Angus breed [17]. Osteopetrosis in calves caused by a 2,750 long deletion in SLC4A2 and was reported for multiple breeds, but not Brown Swiss. For Brown Swiss we also identified a breed-specific SNP within the LAMA3 gene on BTA24. A mutation in this gene was reported to cause an Epidermolysis bullosa, junctionalis disease in Belgian Blue cattle [18]. Jersey-specific mutations were located within four genes harbouring disease mutations. The GON4L gene on BTA3 with a single bp deletion causing dwarfism in Fleckvieh [19], contained three Jersey-specific SNPs, the APOB gene on BTA11 contains a deletion, which causes cholesterol deficiency in German Holstein breed [20], the TG gene on BTA14 is responsible for familial goitre [21], and the COL7A1 gene on BTA22 with a mutation resulting in the an epidermolysis bullosa disorder [22, 23]. In addition, three Limousin-specific SNPs were located within the PFAS gene on BTA19, which harbours a mutation responsible for an abortion (due to haplotype MH1) identified by [24] in Montbéliarde cattle. Since the analysed data set represents phenotypically healthy individuals–none of the abovementioned disease coding mutations is identical with breed-specific SNPs. However, the three Jersey-specific SNPs within the GON4L gene were located in the same exon as the disease causing SNP. Next, it was checked which genes harbouring breed-specific SNPs overlapped with QTL found in the QTLdb (www.animalgenome.org/cgi-bin/QTLdb/BT/index). Genes with mutations characteristic for the Angus breed overlapped with QTL related to iron content in mussels, which is supported by the earlier study [25] demonstrating a different (lower) iron content of iron in Angus and Charolais than in Simmental and Limousin. For the Brown Swiss breed we linked genes with breed-specific SNPs to many QTL related to milk characteristics: milk solids percentage, milk overall proteins percentage, milk alpha-casein percentage, milk alpha-lactalbumin percentage, milk kappa-casein percentage, milk beta-casein percentage, milk lactose content and yield; udder health expressed by somatic cell score; growth traits: body weight, and longissimus muscle area; and gastrointestinal nematode burden. This collection of traits controlled by genes with Brown Swiss-specific variation reflects a production character of the breed with the balance between beef and dairy character with milk composition suitable for chees production. For the remaining breeds, has not been identified QTL associated genes.

Conclusions

In cattle phenotypic differences between breeds are much more pronounced than in humans. They are enhanced by strong artificial selection for divergent production goals, like dairy or beef. Such phenotypic differences are driven by underlying changes in the genome structure, which emphasises an importance of breed-specific genomic inferences, for which breed-specific reference genomes are the prerequisite. Breed-specific reference genomes can enhance the accuracy of SNP based inferences such as Genome-wide Association Studies or SNP genotype imputation. In the future, when specific reference genomes will be available for many breeds, they can form a basis for describing interbreed volatility and dynamics of changes in the genome of Bos taurus caused by breeding programmes [26].
  24 in total

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

2.  Sequencing and de novo assembly of 150 genomes from Denmark as a population reference.

Authors:  Lasse Maretty; Jacob Malte Jensen; Bent Petersen; Jonas Andreas Sibbesen; Siyang Liu; Palle Villesen; Laurits Skov; Kirstine Belling; Christian Theil Have; Jose M G Izarzugaza; Marie Grosjean; Jette Bork-Jensen; Jakob Grove; Thomas D Als; Shujia Huang; Yuqi Chang; Ruiqi Xu; Weijian Ye; Junhua Rao; Xiaosen Guo; Jihua Sun; Hongzhi Cao; Chen Ye; Johan van Beusekom; Thomas Espeseth; Esben Flindt; Rune M Friborg; Anders E Halager; Stephanie Le Hellard; Christina M Hultman; Francesco Lescai; Shengting Li; Ole Lund; Peter Løngren; Thomas Mailund; Maria Luisa Matey-Hernandez; Ole Mors; Christian N S Pedersen; Thomas Sicheritz-Pontén; Patrick Sullivan; Ali Syed; David Westergaard; Rachita Yadav; Ning Li; Xun Xu; Torben Hansen; Anders Krogh; Lars Bolund; Thorkild I A Sørensen; Oluf Pedersen; Ramneek Gupta; Simon Rasmussen; Søren Besenbacher; Anders D Børglum; Jun Wang; Hans Eiberg; Karsten Kristiansen; Søren Brunak; Mikkel Heide Schierup
Journal:  Nature       Date:  2017-07-26       Impact factor: 49.962

3.  A nonsense mutation causes hereditary goitre in the Afrikander cattle and unmasks alternative splicing of thyroglobulin transcripts.

Authors:  M H Ricketts; M J Simons; J Parma; L Mercken; Q Dong; G Vassart
Journal:  Proc Natl Acad Sci U S A       Date:  1987-05       Impact factor: 11.205

4.  Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle.

Authors:  Hans D Daetwyler; Aurélien Capitan; Hubert Pausch; Paul Stothard; Rianne van Binsbergen; Rasmus F Brøndum; Xiaoping Liao; Anis Djari; Sabrina C Rodriguez; Cécile Grohs; Diane Esquerré; Olivier Bouchez; Marie-Noëlle Rossignol; Christophe Klopp; Dominique Rocha; Sébastien Fritz; André Eggen; Phil J Bowman; David Coote; Amanda J Chamberlain; Charlotte Anderson; Curt P VanTassell; Ina Hulsegge; Mike E Goddard; Bernt Guldbrandtsen; Mogens S Lund; Roel F Veerkamp; Didier A Boichard; Ruedi Fries; Ben J Hayes
Journal:  Nat Genet       Date:  2014-07-13       Impact factor: 38.330

5.  A COL7A1 mutation causes dystrophic epidermolysis bullosa in Rotes Höhenvieh cattle.

Authors:  Annie Menoud; Monika Welle; Jens Tetens; Peter Lichtner; Cord Drögemüller
Journal:  PLoS One       Date:  2012-06-08       Impact factor: 3.240

6.  A global reference for human genetic variation.

Authors:  Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal:  Nature       Date:  2015-10-01       Impact factor: 49.962

7.  A whole-genome assembly of the domestic cow, Bos taurus.

Authors:  Aleksey V Zimin; Arthur L Delcher; Liliana Florea; David R Kelley; Michael C Schatz; Daniela Puiu; Finnian Hanrahan; Geo Pertea; Curtis P Van Tassell; Tad S Sonstegard; Guillaume Marçais; Michael Roberts; Poorani Subramanian; James A Yorke; Steven L Salzberg
Journal:  Genome Biol       Date:  2009-04-24       Impact factor: 13.583

8.  A transposable element insertion in APOB causes cholesterol deficiency in Holstein cattle.

Authors:  F Menzi; N Besuchet-Schmutz; M Fragnière; S Hofstetter; V Jagannathan; T Mock; A Raemy; E Studer; K Mehinagic; N Regenscheit; M Meylan; F Schmitz-Hsu; C Drögemüller
Journal:  Anim Genet       Date:  2016-01-13       Impact factor: 3.169

9.  The Ensembl Variant Effect Predictor.

Authors:  William McLaren; Laurent Gil; Sarah E Hunt; Harpreet Singh Riat; Graham R S Ritchie; Anja Thormann; Paul Flicek; Fiona Cunningham
Journal:  Genome Biol       Date:  2016-06-06       Impact factor: 13.583

10.  Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

Authors:  Simon Boitard; Willy Rodríguez; Flora Jay; Stefano Mona; Frédéric Austerlitz
Journal:  PLoS Genet       Date:  2016-03-04       Impact factor: 5.917

View more
  2 in total

1.  Reduced representation approach for identification of genome-wide SNPs and their annotation for economically important traits in Indian Tharparkar cattle.

Authors:  M Joel Devadasan; D Ravi Kumar; M R Vineeth; Anjali Choudhary; T Surya; S K Niranjan; Archana Verma; Jayakumar Sivalingam
Journal:  3 Biotech       Date:  2020-06-16       Impact factor: 2.406

2.  Analyzing the genomic and transcriptomic architecture of milk traits in Murciano-Granadina goats.

Authors:  Dailu Guan; Vincenzo Landi; María Gracia Luigi-Sierra; Juan Vicente Delgado; Xavier Such; Anna Castelló; Betlem Cabrera; Emilio Mármol-Sánchez; Javier Fernández-Alvarez; José Luis Ruiz de la Torre Casañas; Amparo Martínez; Jordi Jordana; Marcel Amills
Journal:  J Anim Sci Biotechnol       Date:  2020-03-11
  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.