Maulana M Naji, Yuri T Utsunomiya, Johann Sölkner, Benjamin D Rosen, Gábor Mészáros.
Abstract
BACKGROUND: Reference genomes are essential in the analysis of genomic data. As the cost of sequencing decreases, multiple reference genomes are being produced within species to alleviate problems such as low mapping accuracy and reference allele bias in variant calling, which can arise when divergent samples are aligned to a single reference individual. The latest reference sequence adopted by the scientific community for the analysis of cattle data is ARS_UCD1.2, built from the DNA of a Hereford cow (Bos taurus taurus, hereafter B. taurus). A complementary genome assembly, UOA_Brahman_1, was recently built from a Brahman cow haplotype to represent the other cattle subspecies (Bos taurus indicus, hereafter B. indicus) and to further support the analysis of B. indicus data. In this study, we aligned the sequence data of 15 B. taurus and B. indicus breeds to each of these references.
Year: 2021 PMID: 34922445 PMCID: PMC8684283 DOI: 10.1186/s12711-021-00688-1
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
List of FASTQ reads dataset
| Breed | N | Read length (bp) | Total reads (million) | Cov (×) | ARS_UCD1.2 Map (%) | ARS_UCD1.2 Filt (%) | ARS_UCD1.2 Ret (%) | UOA_Brahman_1 Map (%) | UOA_Brahman_1 Filt (%) | UOA_Brahman_1 Ret (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Angus^a | 9 | 151 | 172 | 7.26 | 97.65 | 13.74 | 83.91 | 97.77 | 14.99 | 82.77 |
| Hereford^a | 8 | 99 | 242 | 10.17 | 99.34 | 15.83 | 83.51 | 99.47 | 17.26 | 82.21 |
| Holstein^a | 9 | 100 | 124 | 4.58 | 93.37 | 16.82 | 76.55 | 93.45 | 17.91 | 75.54 |
| Jersey^a | 7 | 100 | 186 | 6.86 | 98.79 | 19.79 | 79.00 | 98.84 | 20.84 | 78.00 |
| Shorthorn^a | 5 | 99 | 230 | 8.36 | 99.06 | 19.06 | 80.00 | 99.13 | 20.33 | 78.80 |
| Simmental^a | 16 | 100 | 217 | 8.59 | 92.91 | 17.02 | 75.89 | 92.98 | 18.28 | 74.70 |
| Bohai^b,c | 3 | 150 | 222 | 12.21 | 99.55 | 25.12 | 74.44 | 99.68 | 26.40 | 73.28 |
| Boran^b | 10 | 101 | 297 | 11.03 | 99.54 | 15.49 | 84.04 | 99.52 | 15.94 | 83.58 |
| Brahman^b | 7 | 114 | 283 | 11.64 | 88.74 | 3.98 | 84.76 | 89.84 | 4.58 | 85.26 |
| Gir^b | 9 | 99 | 91 | 3.27 | 97.09 | 12.40 | 84.69 | 95.64 | 12.32 | 83.33 |
| Indian zebu^b | 5 | 100 | 152 | 5.56 | 98.83 | 33.05 | 65.79 | 98.77 | 33.12 | 65.64 |
| Kenana^b | 6 | 101 | 307 | 11.40 | 99.50 | 14.59 | 84.91 | 99.51 | 14.68 | 84.83 |
| Mangshi^b | 7 | 100 | 101 | 3.68 | 98.29 | 17.50 | 80.80 | 98.25 | 17.64 | 80.62 |
| Nelore^b | 6 | 99 | 111 | 4.00 | 98.18 | 13.35 | 84.83 | 98.05 | 13.85 | 84.20 |
| Ogaden^b | 5 | 101 | 291 | 10.80 | 99.51 | 15.65 | 83.86 | 99.49 | 16.11 | 83.38 |
N: number of individuals; Read length: mode of the read lengths in the raw reads of the individuals of each breed; Total reads: total number of reads, in millions; Cov: coverage, estimated from the total number of bases in the raw reads using 2.7 Gb as the genome length; Map: percentage of reads aligned to the respective reference genome; Filt: percentage of reads dropped from the BAM files because they did not pass the parameters set during the base recalibration step in the Genome Analysis Toolkit (GATK); Ret: percentage of reads that passed the GATK base recalibration step and were kept in the final BAM files for variant calling against the respective reference genome. All values for total reads, coverage, Map, Filt, and Ret are averages over the individuals of each breed
^a Bos taurus taurus
^b Bos taurus indicus
^c Highly admixed with B. taurus ancestry
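The coverage estimate described in the footnote can be sketched as below. This is a minimal illustration, not the authors' script: the function names and the example base count are hypothetical, and only the 2.7 Gb genome length comes from the footnote. The second helper reflects an observation about the table rather than a stated rule: the reported Ret values agree with Map minus Filt to within rounding.

```python
GENOME_LENGTH_BP = 2.7e9  # genome length stated in the table footnote


def estimate_coverage(total_bases, genome_length_bp=GENOME_LENGTH_BP):
    """Coverage = total bases in the raw reads / genome length."""
    return total_bases / genome_length_bp


def retained_percent(mapped_pct, filtered_pct):
    """Assumed relation: reads retained = reads mapped - reads filtered out."""
    return mapped_pct - filtered_pct


if __name__ == "__main__":
    # Hypothetical individual with 27 billion raw bases.
    print(round(estimate_coverage(27e9), 2))
    # Angus against ARS_UCD1.2, using the Map and Filt values from the table.
    print(round(retained_percent(97.65, 13.74), 2))
```

Because the table reports breed averages and the mode of the read length, multiplying the tabulated read count by the tabulated read length is not expected to reproduce the Cov column exactly.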
Fig. 1 (a) Pipeline for calling single nucleotide variants (SNVs) using FASTQ reads from the NCBI SRA database, with BWA, Samtools, and GATK as the main tools; (b) calculation of the number of NFAA (nearly fixed alternative allele, alternative allele frequency ≥ 0.95) sites for Bos taurus taurus and Bos taurus indicus sequences aligned to a specific reference genome. We ran both workflows twice, independently, once with ARS_UCD1.2 and once with UOA_Brahman_1 as the reference genome
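The NFAA criterion in panel (b) can be illustrated with a short sketch. It assumes genotypes are encoded as the number of alternative alleles per individual (0, 1, or 2, with `None` for a missing call) and that the frequency is computed over called genotypes only; both the encoding and the function names are assumptions for illustration, and only the 0.95 threshold comes from the caption.

```python
NFAA_THRESHOLD = 0.95  # alternative allele frequency cut-off from the caption


def alt_allele_frequency(alt_counts):
    """Alternative allele frequency at one site for diploid individuals.

    alt_counts holds 0, 1, or 2 per individual; None marks a missing call.
    Returns None when no individual has a called genotype.
    """
    called = [c for c in alt_counts if c is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def is_nfaa(alt_counts, threshold=NFAA_THRESHOLD):
    """True when the site has a nearly fixed alternative allele."""
    freq = alt_allele_frequency(alt_counts)
    return freq is not None and freq >= threshold


if __name__ == "__main__":
    print(is_nfaa([2, 2, 2, 2, None]))  # all called genotypes homozygous alternative
    print(is_nfaa([2, 2, 1, 0, 2]))     # alternative allele frequency 0.7
```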
Single nucleotide variants (SNVs) for each breed
| Breed | Sub-species^a | ARS_UCD1.2^b | UOA_Brahman_1^b |
|---|---|---|---|
| Angus | B. t. taurus | 11,396,537 | 17,143,672 |
| Hereford | B. t. taurus | 9,971,428 | 16,734,834 |
| Holstein | B. t. taurus | 11,133,911 | 11,459,171 |
| Jersey | B. t. taurus | 9,512,074 | 15,423,670 |
| Shorthorn | B. t. taurus | 8,464,487 | 14,393,682 |
| Simmental | B. t. taurus | 14,988,534 | 20,404,489 |
| Bohai^c | B. t. indicus | 16,102,113 | 19,819,636 |
| Boran | B. t. indicus | 27,861,344 | 27,757,944 |
| Brahman | B. t. indicus | 33,177,201 | 30,592,708 |
| Gir | B. t. indicus | 24,288,498 | 22,426,843 |
| Indian zebu | B. t. indicus | 20,210,906 | 18,518,614 |
| Kenana | B. t. indicus | 26,760,622 | 24,953,316 |
| Mangshi | B. t. indicus | 21,993,501 | 20,859,379 |
| Nelore | B. t. indicus | 22,770,177 | 21,279,805 |
| Ogaden | B. t. indicus | 25,330,919 | 24,224,387 |
^a Sub-species were assigned based on the original labels from NCBI-SRA
^b SNVs passing the variant filtration process for the respective reference genome
^c Highly admixed with B. taurus ancestry
Fig. 2 Distribution of the number of SNVs with an alternative allele frequency of 0.95 or higher (i.e., nearly fixed alternative allele, NFAA) using ARS_UCD1.2 as the reference genome (1-Mb scanning windows). The main blue and orange lines are the average numbers of NFAA sites from individuals representing the Bos taurus indicus and Bos taurus taurus groups, respectively; the shaded blue and orange lines are the actual numbers of NFAA sites for each single Bos taurus indicus and Bos taurus taurus breed, respectively
Fig. 3 Distribution of the number of SNVs with an alternative allele frequency of 0.95 or higher (i.e., nearly fixed alternative allele, NFAA) using UOA_Brahman_1 as the reference genome (1-Mb scanning windows). The main blue and orange lines are the average numbers of NFAA sites from individuals representing the Bos taurus indicus and Bos taurus taurus groups, respectively; the shaded blue and orange lines are the actual numbers of NFAA sites for each single Bos taurus indicus and Bos taurus taurus breed, respectively
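Counting NFAA sites in 1-Mb scanning windows, as plotted in the two figures above, could look like the following sketch. It assumes non-overlapping windows and 1-based positions as in a VCF file; the source does not say whether the windows overlap, and the function name and input layout are hypothetical.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1-Mb scanning window, as in the figure captions


def count_sites_per_window(sites, window_bp=WINDOW_BP):
    """Count sites per (chromosome, window index).

    sites is an iterable of (chromosome, position) pairs with 1-based
    positions. Window 0 covers positions 1..window_bp, window 1 the next
    window_bp positions, and so on (non-overlapping windows assumed).
    """
    counts = Counter()
    for chrom, pos in sites:
        counts[(chrom, (pos - 1) // window_bp)] += 1
    return counts


if __name__ == "__main__":
    nfaa_sites = [("1", 1), ("1", 1_000_000), ("1", 1_000_001), ("2", 500)]
    for key, n in sorted(count_sites_per_window(nfaa_sites).items()):
        print(key, n)
```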
Fig. 4 Delta values (Δ), calculated by subtracting the mean number of nearly fixed alternative allele (NFAA) sites across the chromosomes from the actual number of NFAA sites in each respective scanning window of Bos taurus taurus individuals. Taurine-introgressed regions in the UOA_Brahman_1 assembly are defined as regions with Δ values above the arbitrary threshold of 1.5 sd from the mean
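The Δ computation and the 1.5 sd cut-off can be sketched as follows. Several details are assumptions, since the caption does not fix them: the mean is taken per chromosome, the standard deviation is the population sd of the Δ values on that chromosome, and a window is flagged when its Δ exceeds the mean Δ plus 1.5 sd. The function names are hypothetical.

```python
from statistics import mean, pstdev


def delta_values(window_counts):
    """Delta per window: NFAA count minus the chromosome-wide mean count."""
    chrom_mean = mean(window_counts)
    return [count - chrom_mean for count in window_counts]


def flag_windows(window_counts, n_sd=1.5):
    """Flag windows whose delta lies more than n_sd standard deviations
    above the mean delta (assumed reading of the 1.5 sd threshold)."""
    deltas = delta_values(window_counts)
    threshold = mean(deltas) + n_sd * pstdev(deltas)
    return [d > threshold for d in deltas]


if __name__ == "__main__":
    counts = [10, 10, 10, 10, 50]  # hypothetical NFAA counts for five windows
    print(delta_values(counts))
    print(flag_windows(counts))
```

With per-chromosome centring the mean Δ is zero by construction, so the threshold reduces to 1.5 times the sd; a genome-wide sd would be an equally plausible reading of the caption.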
Fig. 5 PCA using SNVs from (a) the whole genome; (b) putative taurine-introgressed regions; (c) non-introgressed regions in the UOA_Brahman_1 assembly. Taurine/indicine labels were assigned based on the original labels from NCBI-SRA. Blue arrows point to the Brahman haplotype (SNVs were set to be all homozygous for the reference alleles of the UOA_Brahman_1 assembly)
Fig. 6 Admixture analysis using SNVs from (a) the whole genome; (b) putative taurine-introgressed regions; (c) non-introgressed regions in the UOA_Brahman_1 assembly. Brahman_hap is an artificial individual with all genotypes homozygous for the reference alleles of the UOA_Brahman_1 reference genome. (d) Admixture inferred using SNVs derived from the alignment against the ARS_UCD1.2 assembly. Green, red, and blue colors depict subpopulations K1, K2, and K3, respectively
B. taurus introgressed segments in the UOA_Brahman_1 reference genome that overlap with positively selected genes in the Brahman population as reported by [11]
| Chr | Start (Mb) | End (Mb) | Total Δ | Size (Mb) | Positive genes |
|---|---|---|---|---|---|
| 3 | 58 | 59 | 1090.88 | 1 | |
| 5 | 43 | 44 | 1020.05 | 1 | |
| 6 | 2 | 4 | 2016.60 | 2 | |
| 7 | 0 | 1 | 1034.72 | 1 | |
| 7 | 29 | 30 | 1052.05 | 1 | |
| 10 | 86 | 87 | 1094.05 | 1 | |
| 11 | 11 | 13 | 2084.10 | 2 | |
| 11 | 14 | 30 | 17237.41 | 16 | |
| 11 | 95 | 96 | 1144.55 | 1 | |
| 12 | 11 | 12 | 1076.72 | 1 | |
| 13 | 0 | 21 | 22459.58 | 21 | |
| 14 | 67 | 68 | 1117.55 | 1 | |
| 14 | 81 | 83 | 2138.93 | 2 | |
| 16 | 59 | 66 | 7608.05 | 7 | |
| 17 | 0 | 5 | 5367.15 | 5 | |
| 23 | 32 | 33 | 1068.35 | 1 | |
| 25 | 12 | 23 | 11676.41 | 11 | |
| 26 | 13 | 14 | 964.55 | 1 | |
| 26 | 24 | 34 | 10574.31 | 10 |
Δ: subtraction of the mean number of nearly fixed alternative allele (NFAA) sites across each chromosome from the total number of NFAA sites in each respective scanning window of the B. taurus dataset; Total Δ: sum of Δ on each continuous putative taurine segment
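The way adjacent flagged windows collapse into the continuous segments listed in the table, together with their summed Δ and size, can be sketched as below. The sketch assumes 1-Mb non-overlapping windows indexed from 0 along one chromosome, with segment coordinates reported in Mb; the function name and the input layout are hypothetical.

```python
def merge_segments(deltas, flags, window_mb=1):
    """Merge runs of consecutive flagged windows on one chromosome.

    deltas and flags are parallel lists, one entry per window.
    Returns (start_mb, end_mb, total_delta, size_mb) per segment.
    """
    segments = []
    start = None   # index of the first window of the open segment
    total = 0.0    # running sum of delta inside the open segment
    for i, (delta, flagged) in enumerate(zip(deltas, flags)):
        if flagged:
            if start is None:
                start, total = i, 0.0
            total += delta
        elif start is not None:
            segments.append((start * window_mb, i * window_mb,
                             total, (i - start) * window_mb))
            start = None
    if start is not None:  # close a segment that runs to the chromosome end
        n = len(flags)
        segments.append((start * window_mb, n * window_mb,
                         total, (n - start) * window_mb))
    return segments


if __name__ == "__main__":
    deltas = [1.0, 5.0, 6.0, -2.0, 7.0]          # hypothetical delta values
    flags = [False, True, True, False, True]     # hypothetical threshold flags
    print(merge_segments(deltas, flags))
```

Under this layout a single flagged window with index 58 yields a segment from 58 to 59 Mb with size 1, which matches the shape of the one-window rows in the table.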