Ted Kalbfleisch, Michael P Heaton.
Abstract
Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High-quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to the number needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure, and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next-generation sequence data from a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site. In this mapping, 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1, with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping, could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene function.
Year: 2013 PMID: 25075278 PMCID: PMC4103496 DOI: 10.12688/f1000research.2-244.v2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
List of the number of variants identified in UMD3.1 for which a corresponding position could be identified in Oar3.1, and of these the number of variants whose genotype was corroborated (80.3%) vs. Oar3.1.
No variants identified on the X-chromosome of either reference were included in these totals.
| Measure | Number of sites |
|---|---|
| Total UMD3.1 Hets not in Repeat Regions (NR) | 3,672,099 |
| Total UMD3.1 NR Hets with Corresponding Oar3.1 position | 1,524,297 |
| Total UMD3.1 NR Hets with GT Corroborated in Oar3.1 | 1,224,642 |
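The percentages quoted for this table follow directly from its three counts. As a minimal sketch (the variable and function names are illustrative and not taken from the source), the arithmetic can be reproduced as:

```python
# Counts copied from the table above (non-repeat heterozygous sites, X excluded).
nr_hets_umd = 3_672_099        # heterozygous sites called against UMD3.1
with_oar_position = 1_524_297  # subset with a corresponding Oar3.1 position
corroborated = 1_224_642       # subset whose genotype was corroborated vs. Oar3.1


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Share of UMD3.1 heterozygous sites that could be placed on Oar3.1.
mapped_pct = percent(with_oar_position, nr_hets_umd)
# Share of those placed sites whose heterozygous genotype was corroborated.
corroborated_pct = percent(corroborated, with_oar_position)

print(f"Sites with an Oar3.1 position: {mapped_pct:.1f}%")
print(f"Genotype corroborated:         {corroborated_pct:.1f}%")
```

The first ratio corresponds to the roughly 41% reported in the abstract, and the second to the 80.3% given in the table caption.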
Genome coverage for datasets mapped to reference assemblies.
| Measure | Oar3.1 | UMD3.1 |
|---|---|---|
| Bases covered | 2,502,381,648 | 2,047,579,163 |
| Fold coverage | 11.89 | 6.86 |
Genome-wide variants identified in reference assemblies.
| Measure | UMD3.1 | Oar3.1 |
|---|---|---|
| Total variants [a] | 83,144,283 | 16,287,956 [b] |
| Homozygous variants | 78,137,488 | 7,122,032 |
| Heterozygous variants | 4,837,702 | 9,128,452 |
| Heterozygous nonRef variants | 169,031 | 37,472 |
| Total heterozygous sites not in repeat regions | 3,672,099 | N/D [c] |
| Heterozygous sites not in repeat regions | 3,542,880 | N/D |
| NonRef Hets not in repeat regions [d] | 129,219 | N/D |
[a] All variants measured on chromosome X were removed from these totals.
[b] The variants identified vs. Oar3.1 that occurred in repeat regions were not filtered out.
[c] Not determined.
[d] NonRef Hets are heterozygous variants where neither detected allele corresponds to the bovine reference allele at that position.
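To make the contrast between the two mappings concrete, the following sketch computes the homozygous share of all variants for each reference and checks that the two non-repeat heterozygous categories add up to the reported subtotal. It uses only the counts in the table above; the dictionary layout and names are assumptions made for illustration.

```python
# Counts copied from the genome-wide variant table above.
variants = {
    "UMD3.1": {"total": 83_144_283, "hom": 78_137_488,
               "het": 4_837_702, "het_nonref": 169_031},
    "Oar3.1": {"total": 16_287_956, "hom": 7_122_032,
               "het": 9_128_452, "het_nonref": 37_472},
}

# Percentage of all variants that are homozygous, per reference genome.
hom_share = {
    ref: 100.0 * counts["hom"] / counts["total"]
    for ref, counts in variants.items()
}

# Non-repeat heterozygous sites vs. UMD3.1: the two categories should
# add up to the reported subtotal of 3,672,099.
nr_het = 3_542_880
nr_het_nonref = 129_219
nr_total = nr_het + nr_het_nonref

for ref, share in hom_share.items():
    print(f"{ref}: {share:.1f}% of variants are homozygous")
print(f"Non-repeat heterozygous sites vs. UMD3.1: {nr_total:,}")
```

The much higher homozygous share against UMD3.1 is consistent with the abstract's reading that most homozygous calls in the cross-species mapping reflect fixed differences between sheep and cattle rather than variation within sheep.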
Links to pages within the Intrepid Bioinformatics data management system.
Figure 1. Screenshot of the web page with links for the Katahdin ram reads mapped to the bovine reference genome UMD3.1 (see the Methods for a detailed description of use).