Benjamin D Rosen, Derek M Bickhart, Robert D Schnabel, Sergey Koren, Christine G Elsik, Elizabeth Tseng, Troy N Rowan, Wai Y Low, Aleksey Zimin, Christine Couldrey, Richard Hall, Wenli Li, Arang Rhie, Jay Ghurye, Stephanie D McKay, Françoise Thibaud-Nissen, Jinna Hoffman, Brenda M Murdoch, Warren M Snelling, Tara G McDaneld, John A Hammond, John C Schwartz, Wilson Nandolo, Darren E Hagen, Christian Dreischer, Sebastian J Schultheiss, Steven G Schroeder, Adam M Phillippy, John B Cole, Curtis P Van Tassell, George Liu, Timothy P L Smith, Juan F Medrano.
Abstract
BACKGROUND: Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10-12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies.
Keywords: Hereford; bovine genome; cattle; reference assembly
Year: 2020; PMID: 32191811; PMCID: PMC7081964; DOI: 10.1093/gigascience/giaa021
Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524
Figure 1: L1 Dominette 01449. The line-bred Hereford cow was selected as the original cattle reference animal for her high level of inbreeding.
Figure 2: Dominette de novo assembly. A: Assembly pipeline. N50 is the scaffold/contig length such that sequences of this length or longer cover 50% of the assembly; L50 is the number of scaffolds/contigs required to reach N50. B: Cattle chromosomes painted with assembled contigs. A color shift indicates the switch from one contig to the next or the end of an alignment block. The left half of each chromosome shows UMD3.1.1 contigs, while the right half shows ARS-UCD1.2. To be conservative, contigs were ordered by UMD3.1.1 assembly positions; where the order conflicts between ARS-UCD1.2 and UMD3.1.1, the plot displays a color switch in ARS-UCD1.2. Asterisk indicates within scaffolds assigned to chromosomes.
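The N50/L50 definitions in the caption can be made concrete with a short sketch. It follows the common convention that N50 is computed against the assembly's own total length; the function name `n50_l50` and the toy lengths are hypothetical, and this is not the assembly pipeline the authors ran.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    Sequences are sorted longest first and accumulated until their summed
    length reaches at least half of the total assembly length. N50 is the
    length of the sequence that crosses that threshold; L50 is how many
    sequences were needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Compare 2 * running with total to avoid fractional halves
        if 2 * running >= total:
            return length, count
```

For example, `n50_l50([100, 80, 60, 40, 20])` returns `(80, 2)`: the two longest sequences (100 + 80 = 180) are the first to cover half of the 300-unit total.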
Figure 3: Assembly assessments computed for ARS-UCD1.2 and UMD3.1.1. A, Feature response curves computed for ARS-UCD1.2 and UMD3.1.1. B, Contig NGx (the minimum contig length needed to cover x% of the genome, calculated using a fixed genome size of 2.8 Gb), showing a 280-fold increase for ARS-UCD1.2 compared with UMD3.1.1. C, The percentage of gaps in gene-flanking regions decreased from 33% in UMD3.1.1 to 0.3% in ARS-UCD1.2.
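Unlike N50, which uses each assembly's own total length as the denominator, NGx measures coverage against a fixed genome size (2.8 Gb in panel B), so assemblies of different total lengths are compared on the same scale. A minimal sketch, assuming that definition; the function name `ngx` is hypothetical. Evaluating it for x from 1 to 100 traces an NGx curve like the one in panel B.

```python
def ngx(lengths, x, genome_size=2_800_000_000):
    """Return NGx for a list of contig lengths.

    NGx is the length of the contig at which the longest-first cumulative
    length first reaches x% of a fixed genome size (default 2.8 Gb).
    Returns 0 if the assembly never covers that fraction of the genome.
    """
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    target = genome_size * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # assembly is too small to reach x% of the genome


# Toy example with a 1,000-unit "genome"
curve = [ngx([500, 300, 200], x, genome_size=1000) for x in (10, 50, 90)]
```

Here `curve` is `[500, 500, 200]`: the longest contig alone covers 50% of the toy genome, but all three are needed to reach 90%. Returning 0 when the target is never reached keeps the curve defined for assemblies smaller than the assumed genome size.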
Assembly quality value (QV) statistics and structural inconsistencies measured for ARS-UCD1.2 and UMD3.1.1 using Dominette whole-genome sequencing reads
| Major category | Subcategory | ARS-UCD1.2 | UMD3.1.1 | Description |
|---|---|---|---|---|
| QV | | 48.67 | 37.98 | Quality value estimate (Phred-scale) |
| FRCbam | | | | |
| | COMPR PE | 37,309 (30,643) | 54,602 (52,606) | Areas with low CE statistics |
| | STRECH PE | 37,255 (22,741) | 35,766 (35,299) | Areas with high CE statistics |
| | HIGH COV PE | 7,166 (1,970) | 7,711 (6,331) | High read coverage areas (all aligned reads) |
| | HIGH NORM COV PE | 5,641 (1,125) | 7,109 (5,778) | High paired-read coverage areas (only properly aligned pairs) |
| | HIGH OUTIE PE | 139 (102) | 2,108 (2,108) | Regions with high numbers of misoriented or distant pairs |
| | HIGH SINGLE PE | 60 (53) | 1,258 (1,256) | Regions with high numbers of unmapped pairs |
| | HIGH SPAN PE | 4,882 (1,687) | 4,172 (3,582) | Regions with high numbers of pairs that map to different scaffolds |
| | LOW COV PE | 43,370 (36,062) | 57,176 (56,648) | Low read coverage areas (all aligned reads) |
| | LOW NORM COV PE | 42,067 (34,592) | 60,560 (59,926) | Low paired-read coverage areas (only properly aligned pairs) |
| | Total features | 177,889 (128,975) | 230,462 (223,534) | All erroneous features |
| Sniffles | | | | |
| | DEL | 188 | 10,504 | Deletions |
| | DUP | 16 | 728 | Duplications |
| | INS | 106 | 4,911 | Insertions |
| | INV | 34 | 2,675 | Inversions |
| | Total SVs | 344 | 18,818 | All structural variants |
Numbers in parentheses count errors in scaffolds placed on chromosomes only.
Sniffles structural variant (SV) calls were generated using long reads aligned to the whole assembly.
CE: compression/expansion; QV: quality value.
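Because QV is on the Phred scale, QV = -10 log10(P_error), each 10-point gain means a 10-fold lower per-base error rate. A minimal sketch of the conversion, assuming the reported QV follows that standard definition; the helper names are hypothetical.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability.

    QV = -10 * log10(P_error), so P_error = 10 ** (-QV / 10).
    """
    return 10 ** (-qv / 10)


def error_rate_to_qv(p_error):
    """Inverse conversion: per-base error probability to Phred-scaled QV."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return -10 * math.log10(p_error)


# QV estimates reported in the table above
for name, qv in [("ARS-UCD1.2", 48.67), ("UMD3.1.1", 37.98)]:
    p = qv_to_error_rate(qv)
    print(f"{name}: QV {qv} -> about 1 error per {1 / p:,.0f} bp")
```

With the table's values this works out to roughly one error per 73,600 bp for ARS-UCD1.2 and one per 6,300 bp for UMD3.1.1, about a 12-fold difference in per-base error rate.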
Splign alignment of RefSeq transcripts to ARS-UCD1.2 and UMD3.1.1
| Parameter | ARS-UCD1.2 | UMD3.1.1 |
|---|---|---|
| Accession | GCF_002263795.1 | GCF_000003055.5 |
| No. of sequences retrieved from Entrez | 14,473 | 14,473 |
| No. of sequences not aligning | 19 (12) | 13 (12) |
| No. of sequences whose best alignments span multiple loci (split genes) | 9 | 219 |
| No. of sequences with CDS coverage <95% | 37 | 734 |
Neither assembly includes a Y chromosome, yet 7 of the non-aligning transcripts are from Y-linked genes: 6 fail to align only to ARS-UCD1.2, and 1 fails to align to both assemblies. Totals excluding Y-linked genes are shown in parentheses.
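The footnote's bookkeeping can be checked with a few lines of arithmetic: subtracting the Y-linked transcripts from the non-aligning counts reproduces the parenthesized values, and dividing by the 14,473 retrieved sequences puts the split-gene and low-coverage counts on a common scale. This sketch only restates the table's counts; the variable and function names are illustrative.

```python
# Counts copied from the Splign table (14,473 RefSeq transcripts per assembly)
TOTAL = 14_473
not_aligning = {"ARS-UCD1.2": 19, "UMD3.1.1": 13}
split_genes = {"ARS-UCD1.2": 9, "UMD3.1.1": 219}
low_cds_cov = {"ARS-UCD1.2": 37, "UMD3.1.1": 734}

# Y-linked transcripts among the non-aligners, as described in the footnote:
# 6 fail to align only to ARS-UCD1.2, and 1 fails to align to both assemblies.
Y_ONLY_ARS = 6
Y_BOTH = 1
y_not_aligning = {"ARS-UCD1.2": Y_ONLY_ARS + Y_BOTH, "UMD3.1.1": Y_BOTH}

# Non-aligning counts with Y-linked transcripts excluded (the parenthesized values)
adjusted = {asm: not_aligning[asm] - y_not_aligning[asm] for asm in not_aligning}


def pct(count, total=TOTAL):
    """Percentage of the retrieved transcripts, rounded to two decimals."""
    return round(100 * count / total, 2)


for asm in not_aligning:
    print(f"{asm}: {adjusted[asm]} non-aligning (Y excluded), "
          f"{pct(split_genes[asm])}% split, "
          f"{pct(low_cds_cov[asm])}% with CDS coverage <95%")
```

For UMD3.1.1, the 219 split transcripts correspond to about 1.5% of those retrieved, versus about 0.06% (9 transcripts) for ARS-UCD1.2.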