Paul Stothard, Jung-Woo Choi, Urmila Basu, Jennifer M Sumner-Thomson, Yan Meng, Xiaoping Liao, Stephen S Moore.
Abstract
BACKGROUND: One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle.
Year: 2011 PMID: 22085807 PMCID: PMC3229636 DOI: 10.1186/1471-2164-12-559
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sequencing libraries and sequencing runs
| Library | Run name | Read length | F3 reads | R3 reads |
|---|---|---|---|---|
| Black Angus FR 50 | solid_BA_FR1 | 50 nt | 172,104,943 | 0 |
| Black Angus FR 50 | solid_BA_FR2 | 50 nt | 225,062,253 | 0 |
| Black Angus MP 25 | solid_BA_MP1 | 25 nt | 73,568,858 | 73,917,320 |
| Black Angus MP 25 | solid_BA_MP2 | 25 nt | 76,123,158 | 76,573,521 |
| Black Angus MP 25 | solid_BA_MP3 | 25 nt | 69,657,261 | 69,108,403 |
| Black Angus MP 25 | solid_BA_MP4 | 25 nt | 356,566,421 | 356,848,504 |
| Black Angus MP 50 | solid_BA_MP5 | 50 nt | 220,164,958 | 220,910,123 |
| Black Angus MP 50 | solid_BA_MP6 | 50 nt | 78,350,806 | 80,494,794 |
| Black Angus MP 50 | solid_BA_MP7 | 50 nt | 343,878,503 | 346,998,757 |
| Holstein MP 50 | solid_HOL_MP1 | 50 nt | 140,503,099 | 142,438,241 |
| Holstein MP 50 | solid_HOL_MP2 | 50 nt | 190,712,998 | 191,191,811 |
| Holstein FR 50 | solid_HOL_FR1 | 50 nt | 172,127,298 | 0 |
| Holstein FR 50 | solid_HOL_FR2 | 50 nt | 321,848,214 | 0 |
| Holstein MP 25 | solid_HOL_MP3 | 25 nt | 160,118,556 | 159,430,437 |
| Holstein MP 25 | solid_HOL_MP4 | 25 nt | 281,714,461 | 281,402,729 |
Three libraries were constructed for each animal and sequenced using two or more instrument runs. The numbers of reads obtained for fragment libraries are given in the F3 reads column. Mate-paired libraries yielded two read types (F3 reads and R3 reads).
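As a cross-check on the table above, the per-run read counts can be tallied per animal. The sketch below is illustrative only: the `RUNS` list and the `total_reads` helper are names introduced here, not part of the study's pipeline. Fragment runs contribute only F3 reads, and mate-paired runs contribute both F3 and R3 reads.

```python
# (animal, F3 reads, R3 reads) for each run, copied from the table above.
RUNS = [
    ("Black Angus", 172_104_943, 0),
    ("Black Angus", 225_062_253, 0),
    ("Black Angus", 73_568_858, 73_917_320),
    ("Black Angus", 76_123_158, 76_573_521),
    ("Black Angus", 69_657_261, 69_108_403),
    ("Black Angus", 356_566_421, 356_848_504),
    ("Black Angus", 220_164_958, 220_910_123),
    ("Black Angus", 78_350_806, 80_494_794),
    ("Black Angus", 343_878_503, 346_998_757),
    ("Holstein", 140_503_099, 142_438_241),
    ("Holstein", 190_712_998, 191_191_811),
    ("Holstein", 172_127_298, 0),
    ("Holstein", 321_848_214, 0),
    ("Holstein", 160_118_556, 159_430_437),
    ("Holstein", 281_714_461, 281_402_729),
]


def total_reads(runs):
    """Sum F3 and R3 reads per animal (hypothetical helper)."""
    totals = {}
    for animal, f3, r3 in runs:
        totals[animal] = totals.get(animal, 0) + f3 + r3
    return totals


totals = total_reads(RUNS)
for animal, count in totals.items():
    print(f"{animal}: {count:,} reads")
```

The two sums agree with the "Total reads" values reported in the coverage table (2,041,487,844 for Holstein and 2,840,328,583 for Black Angus).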
Coverage of the Holstein and Black Angus genomes
| Genome | Total reads | Megabases of coverage | Fold coverage |
|---|---|---|---|
| Holstein | 2,041,487,844 | 49,069.31 | 18.63 |
| Black Angus | 2,840,328,583 | 57,730.10 | 21.91 |
Megabases of coverage was calculated based on the numbers and lengths of reads that were successfully mapped to the reference. Fold coverage was calculated by dividing the megabases of coverage by the combined length of the reference chromosomes used for mapping (2,634,413,324 bp).
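The fold-coverage calculation described above can be reproduced with a short sketch. The function name `fold_coverage` is an assumption made for illustration; the inputs are the megabases of coverage from the table and the combined reference length stated in the note.

```python
# Combined length of the reference chromosomes used for mapping (from the note above).
REFERENCE_LENGTH_BP = 2_634_413_324

# Megabases of mapped coverage, copied from the table above.
MEGABASES_OF_COVERAGE = {
    "Holstein": 49_069.31,
    "Black Angus": 57_730.10,
}


def fold_coverage(megabases, reference_length_bp=REFERENCE_LENGTH_BP):
    """Divide mapped bases by the reference length (hypothetical helper)."""
    return megabases * 1_000_000 / reference_length_bp


for genome, megabases in MEGABASES_OF_COVERAGE.items():
    print(f"{genome}: {fold_coverage(megabases):.2f}x")
```

Rounded to two decimals, the results match the "Fold coverage" column (18.63 and 21.91).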
Comparison of BovineHD array genotypes to sequencing SNPs
| Detectable genotype | BovineHD | Sequencing calls | Concordant |
|---|---|---|---|
| Homozygous variant | 226,854 | 206,480 (91%) | 203,812 (90%) |
| Heterozygous | 189,784 | 152,910 (81%) | 149,550 (79%) |
The sequenced Holstein animal was genotyped so that the ability of the sequencing to identify detectable SNPs (homozygous variant and heterozygote) could be quantified (false negative rate). The "Sequencing Calls" column gives the number of detectable array SNPs that were identified as SNPs through sequencing, regardless of whether the sequencing genotype matched the array genotype. The "Concordant" column gives the number of sequencing calls with a sequencing genotype that matched the array genotype.
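The percentages in the table can be recomputed from the raw counts. In the sketch below, the dictionary layout and the `percent_of_array` helper are illustrative assumptions. Both percentages are taken relative to the number of detectable array SNPs, and the false negative rate is read here as the fraction of detectable array SNPs that sequencing did not call.

```python
# (array SNPs, sequencing calls, concordant calls), copied from the table above.
COUNTS = {
    "Homozygous variant": (226_854, 206_480, 203_812),
    "Heterozygous": (189_784, 152_910, 149_550),
}


def percent_of_array(count, array_total):
    """Express a count as a percentage of detectable array SNPs."""
    return 100.0 * count / array_total


summary = {}
for genotype, (array_total, called, concordant) in COUNTS.items():
    detected = percent_of_array(called, array_total)
    summary[genotype] = {
        "detected_pct": detected,
        "concordant_pct": percent_of_array(concordant, array_total),
        "false_negative_pct": 100.0 - detected,
    }
    print(genotype, {key: round(value, 1) for key, value in summary[genotype].items()})
```

Under this reading, roughly 9% of homozygous variant SNPs and 19% of heterozygous SNPs on the array were missed by sequencing.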
Comparison of genotypes from a custom array to sequencing SNPs
| Source of SNPs | SNPs tested | SNPs validated |
|---|---|---|
| Holstein | 427 | 420 (98%) |
| Black Angus | 422 | 415 (98%) |
To estimate the false positive rate of SNP discovery, a subset of the SNPs discovered by sequencing was genotyped in 1083 animals.
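A minimal sketch of the validation arithmetic follows. The `validation_rate` helper is a name introduced here, and treating the unvalidated fraction as the false positive estimate is one simple reading of the table rather than the authors' stated formula.

```python
# (SNPs tested, SNPs validated), copied from the table above.
VALIDATION = {
    "Holstein": (427, 420),
    "Black Angus": (422, 415),
}


def validation_rate(tested, validated):
    """Percentage of tested SNPs confirmed on the custom array."""
    return 100.0 * validated / tested


rates = {source: validation_rate(*counts) for source, counts in VALIDATION.items()}
for source, rate in rates.items():
    # The remainder is taken here as an estimate of the false positive fraction.
    print(f"{source}: {rate:.1f}% validated, {100.0 - rate:.1f}% not validated")
```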
SNP functional class membership
| Functional class | Holstein | Black Angus | Intersection | Union |
|---|---|---|---|---|
| Intergenic | 2,488,430(66.3) | 2,131,566(65.7) | 1,124,055(65.6) | 3,495,941(66.1) |
| Intronic | 1,003,805(26.7) | 881,566(27.2) | 469,082(27.4) | 1,416,289(26.8) |
| Upstream | 116,529(3.1) | 103,589(3.2) | 53,027(3.1) | 167,091(3.2) |
| Downstream | 104,762(2.8) | 90,788(2.8) | 48,128(2.8) | 147,422(2.8) |
| Synonymous coding | 16,161(0.4) | 15,102(0.5) | 8,051(0.5) | 23,212(0.4) |
| Nonsynonymous coding | 11,598(0.3) | 10,723(0.3) | 5,490(0.3) | 16,831(0.3) |
| 3' UTR | 8,732(0.2) | 7,753(0.2) | 4,200(0.2) | 12,285(0.2) |
| Splice site [a] | 2,921(0.1) | 2,679(0.1) | 1,421(0.1) | 4,179(0.1) |
| 5' UTR | 1,591(0.0) | 1,382(0.0) | 680(0.0) | 2,293(0.0) |
| Within non coding gene | 791(0.0) | 763(0.0) | 344(0.0) | 1,210(0.0) |
| Essential splice site [b] | 197(0.0) | 166(0.0) | 94(0.0) | 269(0.0) |
| Stop gained | 126(0.0) | 124(0.0) | 46(0.0) | 204(0.0) |
| Stop lost | 13(0.0) | 8(0.0) | 5(0.0) | 16(0.0) |
| Within mature miRNA | 7(0.0) | 2(0.0) | 1(0.0) | 8(0.0) |
| Total | 3,755,663(100) | 3,246,211(100) | 1,714,624(100) | 5,287,250(100) |
[a] SNP is located 1-3 bases into an exon or 3-8 bases into an intron.
[b] SNP is located in the first two or the last two bases of an intron.
The predicted functional consequences of SNPs identified by sequencing of the Holstein and Black Angus genomes. Values in parentheses are the percentage of SNPs that are in the functional class, out of the total SNPs in the column.
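The Intersection and Union columns are related by inclusion-exclusion: the union equals the Holstein count plus the Black Angus count minus the intersection. The sketch below checks this on a few rows and recomputes one parenthesised percentage; the `ROWS` structure and helper names are assumptions introduced for illustration.

```python
# (Holstein, Black Angus, Intersection, Union) for selected rows of the table above.
ROWS = {
    "Intergenic": (2_488_430, 2_131_566, 1_124_055, 3_495_941),
    "Intronic": (1_003_805, 881_566, 469_082, 1_416_289),
    "Nonsynonymous coding": (11_598, 10_723, 5_490, 16_831),
    "Stop gained": (126, 124, 46, 204),
    "Total": (3_755_663, 3_246_211, 1_714_624, 5_287_250),
}


def union_size(holstein, black_angus, intersection):
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return holstein + black_angus - intersection


def class_percent(count, column_total):
    """Share of a functional class within its column, in percent."""
    return 100.0 * count / column_total


for name, (hol, ba, both, reported_union) in ROWS.items():
    print(name, union_size(hol, ba, both) == reported_union)

holstein_total = ROWS["Total"][0]
print(round(class_percent(ROWS["Intergenic"][0], holstein_total), 1))
```

For each row listed, the computed union matches the reported Union value.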
Figure 1. Characteristics of nonsynonymous SNPs. (A) Distribution of "alignment score change" for Holstein and Black Angus nonsynonymous SNPs generated using a bin size of 3. Negative scores indicate the presence of a non-reference-sequence allele that makes the protein less similar to its orthologues. Positive scores indicate the presence of a non-reference-sequence allele that makes the protein more similar to its orthologues. (B) Proportion of heterozygous SNPs in each bin. SNPs with negative scores tend to be heterozygous. (C) Proportion of SNPs found in the other animal's SNP list. SNPs with negative scores are less frequently present in both animals.
Summary of CNVs
| BTA | Chromosome length | % length in CNV | Total CNV length | No. CNV | Mean length | Median length | Max length | Min length |
|---|---|---|---|---|---|---|---|---|
| 1 | 161,106,243 | 0.028 | 44,913 | 11 | 4,083 | 4,148 | 7,151 | 2,861 |
| 2 | 140,800,416 | 0.102 | 143,068 | 32 | 4,471 | 4,865 | 10,257 | 2,629 |
| 3 | 127,923,604 | 0.135 | 173,057 | 27 | 6,410 | 5,435 | 28,029 | 2,861 |
| 4 | 124,454,208 | 0.000 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 125,847,759 | 0.246 | 309,452 | 51 | 6,068 | 6,435 | 19,448 | 2,860 |
| 6 | 122,561,022 | 0.292 | 358,280 | 56 | 6,398 | 5,610 | 20,911 | 2,549 |
| 7 | 112,078,216 | 0.243 | 272,175 | 95 | 2,865 | 3,421 | 9,881 | 1,901 |
| 8 | 116,942,821 | 0.163 | 190,373 | 37 | 5,145 | 5,131 | 13,111 | 2,849 |
| 9 | 108,145,351 | 0.057 | 61,772 | 14 | 4,412 | 3,949 | 7,051 | 2,821 |
| 10 | 106,383,598 | 0.090 | 96,123 | 25 | 3,845 | 3,727 | 8,481 | 2,569 |
| 11 | 110,171,769 | 0.075 | 82,123 | 31 | 2,649 | 2,813 | 4,499 | 2,249 |
| 12 | 85,358,539 | 0.255 | 217,445 | 49 | 4,438 | 5,005 | 9,731 | 2,781 |
| 13 | 84,419,198 | 0.180 | 152,086 | 64 | 2,376 | 2,758 | 3,939 | 1,969 |
| 14 | 81,345,643 | 0.113 | 91,924 | 26 | 3,536 | 3,752 | 5,628 | 2,680 |
| 15 | 84,633,453 | 0.125 | 106,076 | 30 | 3,536 | 3,611 | 10,209 | 2,489 |
| 16 | 77,906,053 | 0.039 | 30,688 | 6 | 5,115 | 3,562 | 14,248 | 2,740 |
| 17 | 76,506,943 | 0.048 | 36,720 | 11 | 3,338 | 3,510 | 6,480 | 2,700 |
| 18 | 66,141,439 | 0.851 | 563,154 | 120 | 4,693 | 6,100 | 18,405 | 2,141 |
| 19 | 65,312,493 | 0.035 | 22,660 | 6 | 3,777 | 2,987 | 6,723 | 2,489 |
| 20 | 75,796,353 | 0.036 | 27,332 | 8 | 3,416 | 3,589 | 4,141 | 2,761 |
| 21 | 69,173,390 | 0.064 | 44,293 | 13 | 3,407 | 3,568 | 5,413 | 2,461 |
| 22 | 61,848,140 | 0.031 | 19,408 | 4 | 4,852 | 4,033 | 8,821 | 2,521 |
| 23 | 53,376,148 | 0.095 | 50,750 | 12 | 4,229 | 3,500 | 12,000 | 2,500 |
| 24 | 65,020,233 | 0.053 | 34,298 | 10 | 3,430 | 3,307 | 5,389 | 2,449 |
| 25 | 44,060,403 | 0.000 | 0 | 0 | 0 | 0 | 0 | 0 |
| 26 | 51,750,746 | 0.134 | 69,401 | 33 | 2,103 | 2,301 | 2,761 | 1,841 |
| 27 | 48,749,334 | 0.144 | 70,250 | 14 | 5,018 | 4,126 | 20,071 | 2,231 |
| 28 | 46,084,206 | 0.045 | 20,587 | 5 | 4,117 | 3,943 | 5,913 | 2,409 |
| 29 | 51,998,940 | 0.000 | 0 | 0 | 0 | 0 | 0 | 0 |
| TOTAL | 2,545,896,661 | 0.129 | 3,288,408 | 790 | 4,163 | 3,171 | 28,029 | 1,841 |
The distribution and size characteristics of CNVs detected through comparison of the Holstein and Black Angus read sets mapped to the Btau4.0 reference assembly.
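Two of the derived columns in the CNV summary can be recomputed from the others: "% length in CNV" is the total CNV length divided by the chromosome length, and "Mean length" is the total CNV length divided by the number of CNVs. The sketch below applies this to three rows; the helper names are assumptions introduced for illustration.

```python
# (chromosome length, total CNV length, number of CNVs) for selected rows above.
CNV_ROWS = {
    "1": (161_106_243, 44_913, 11),
    "18": (66_141_439, 563_154, 120),
    "TOTAL": (2_545_896_661, 3_288_408, 790),
}


def percent_in_cnv(chromosome_length, total_cnv_length):
    """Percentage of the chromosome covered by CNVs."""
    return 100.0 * total_cnv_length / chromosome_length


def mean_cnv_length(total_cnv_length, cnv_count):
    """Average CNV length; 0 when no CNVs were detected (as for BTA 4, 25 and 29)."""
    return total_cnv_length / cnv_count if cnv_count else 0.0


for bta, (length, cnv_length, count) in CNV_ROWS.items():
    pct = percent_in_cnv(length, cnv_length)
    mean = mean_cnv_length(cnv_length, count)
    print(f"BTA {bta}: {pct:.3f}% in CNV, mean length {mean:.0f} bp")
```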
Figure 2. Genomic distribution of CNVs. Arrowheads located on the left side of the chromosome ideograms represent CNVs with higher copy number in the Holstein genome (Holstein CNV gains), while arrowheads on the right side represent CNVs with higher copy number in the Black Angus genome (Black Angus CNV gains). Note that multiple CNVs may appear as a single arrowhead due to their proximity in the genome.
CNVs selected for validation by qPCR
| CNV ID | Entrez Gene | Log2 ratio | P-value |
|---|---|---|---|
| Chr2_CNV_29 | PLA2G2D1 | -1.799 | 0 |
| Chr3_CNV_18 | LOC781675 | 0.813 | 3.06E-99 |
| Chr5_CNV_6 | - | 0.871 | 3.44E-112 |
| Chr5_CNV_46 | - | 2.777 | 0 |
| Chr6_CNV_32 | LOC785098 | 0.811 | 3.47E-141 |
| Chr10_CNV_24 | - | -2.954 | 0 |
| Chr13_CNV_50 | ATRN | 0.858 | 4.00E-171 |
| Chr15_CNV_26 | - | 0.806 | 3.79E-109 |
| Chr18_CNV_75 | - | 0.951 | 1.64E-138 |
| Chr24_CNV_6 | SERPINB5 | 0.991 | 8.42E-164 |
Five genic and five non-genic CNVs were selected for validation by qPCR. Entrez Gene names are given for genic CNVs. The log2 ratios and p-values obtained from the CNV-seq software are shown. Positive log2 ratios indicate higher read depth in the Holstein animal.
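To make the sign convention concrete, a log2 ratio can be converted back to a Holstein-to-Black Angus read-depth ratio by raising 2 to that power. The sketch below does this for a few of the CNVs listed above; `depth_ratio` and `gain_in` are hypothetical helper names, and this is not the CNV-seq implementation.

```python
# Log2 ratios copied from the table above for a few of the selected CNVs.
LOG2_RATIOS = {
    "Chr2_CNV_29": -1.799,
    "Chr5_CNV_46": 2.777,
    "Chr10_CNV_24": -2.954,
    "Chr24_CNV_6": 0.991,
}


def depth_ratio(log2_ratio):
    """Holstein / Black Angus read-depth ratio implied by a log2 ratio."""
    return 2.0 ** log2_ratio


def gain_in(log2_ratio):
    """Animal with the higher read depth, following the sign convention above."""
    return "Holstein" if log2_ratio > 0 else "Black Angus"


for cnv_id, value in LOG2_RATIOS.items():
    print(f"{cnv_id}: ratio {depth_ratio(value):.2f}, higher depth in {gain_in(value)}")
```

A log2 ratio of 1 corresponds to twice the read depth in the Holstein animal, and a log2 ratio of -1 to half.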
Figure 3. Validation of CNVs using qPCR. Validation results for non-genic CNVs (panels on the left) and genic CNVs (panels on the right) are shown. Each panel is labelled with the CNV tested and the breed assayed. The name of the overlapping gene is given in parentheses for genic CNVs. Bars represent distinct animals and are labelled with animal identifiers. The right-most bar in each panel depicts the relative copy number in a calibrator animal from the alternate breed. The calibrator is assumed to contain two copies of the DNA segment. Each bar was calculated from four technical replicates. The error bars show the minimum and maximum values encountered among the replicates.
Gene Ontology terms enriched among the CNVs
| Ontology | GO ID | Description | Animal | P-BA | P-HOL |
|---|---|---|---|---|---|
| BP | GO:0032502 | Developmental process | Both | 4.3E-21 | 3.5E-40 |
| BP | GO:0032501 | Multicellular organismal process | Both | 6E-34 | 3.4E-67 |
| BP | GO:0050789 | Regulation of biological process | Both | 0.00042 | 0.00015 |
| BP | GO:0002376 | Immune system process | Both | 2.4E-19 | 1.4E-17 |
| BP | GO:0016043 | Cellular component organization | Both | 1.3E-07 | 9.7E-12 |
| BP | GO:0065007 | Biological regulation | Both | 5.5E-05 | 9.2E-05 |
| BP | GO:0048518 | Positive regulation of biological process | Both | 1.3E-35 | 2.6E-29 |
| BP | GO:0048519 | Negative regulation of biological process | Both | 5.7E-16 | 1E-30 |
| BP | GO:0022610 | Biological adhesion | Both | 8.5E-06 | 0.028 |
| BP | GO:0016265 | Death | Both | 1.5E-13 | 2.3E-07 |
| BP | GO:0009987 | Cellular process | Both | 0.011 | 0.077 |
| BP | GO:0008152 | Metabolic process | Both | 0.0051 | 0.015 |
| BP | GO:0051234 | Establishment of localization | Both | 0.011 | 0.00069 |
| BP | GO:0051179 | Localization | Both | 0.002 | 3.5E-07 |
| BP | GO:0040007 | Growth | Both | 9.7E-07 | 3.8E-16 |
| BP | GO:0050896 | Response to stimulus | Both | 3.2E-30 | 9.1E-41 |
| BP | GO:0044085 | Cellular component biogenesis | Both | 0.00072 | 0.0044 |
| BP | GO:0040011 | Locomotion | BA | 5.4E-12 | - |
| BP | GO:0000003 | Reproduction | HOL | - | 1.6E-25 |
| BP | GO:0022414 | Reproductive process | HOL | - | 3.8E-18 |
| CC | GO:0032991 | Macromolecular complex | Both | 0.007 | 1.6E-05 |
| CC | GO:0005623 | Cell | Both | 0.0016 | 0.00025 |
| CC | GO:0044464 | Cell part | Both | 0.0016 | 0.00025 |
| CC | GO:0044421 | Extracellular region part | Both | 1.2E-14 | 5.8E-13 |
| CC | GO:0005576 | Extracellular region | Both | 2E-08 | 7.8E-07 |
| CC | GO:0043226 | Organelle | Both | 0.0054 | 4.3E-07 |
| CC | GO:0044422 | Organelle part | Both | 0.00024 | 6.8E-10 |
| CC | GO:0031974 | Membrane-enclosed lumen | HOL | - | 4.5E-06 |
| MF | GO:0060089 | Molecular transducer activity | Both | 0.013 | 0.0031 |
| MF | GO:0005215 | Transporter activity | Both | 0.07 | 0.43 |
| MF | GO:0003824 | Catalytic activity | Both | 0.071 | 0.095 |
| MF | GO:0005488 | Binding | Both | 0.0015 | 0.0032 |
| MF | GO:0030528 | Transcription regulator activity | HOL | - | 0.16 |
| MF | GO:0030234 | Enzyme regulator activity | HOL | - | 3.6E-09 |
GO IDs from the three GO ontologies (BP = Biological Process; CC = Cellular Component; MF = Molecular Function) enriched among the CNV gains in Black Angus (BA), Holstein (HOL), or both animals (Both). P-values are provided, when applicable, for the subset of CNVs detected as gains in the Black Angus animal (P-BA) and the subset detected as gains in the Holstein animal (P-HOL).
Figure 4. CNVs overlapping with PLA2G2D. (A) Log2 ratio plot of the PLA2G2D gene region. Each point shows the log2 ratio of the number of Holstein reads mapped to the number of Black Angus reads mapped. Points are coloured based on the log10 p-value calculated by the CNV-seq software. (B) The PLA2G2D gene region as visualized using the UCSC Genome Browser. The precise boundaries of the five CNVs reported by CNV-seq that reside in this region are shown and labelled. The third CNV from the left (Chr2_CNV_29) was tested and validated by qPCR.