Kimberly M Davenport, Derek M Bickhart, Kim Worley, Shwetha C Murali, Mazdak Salavati, Emily L Clark, Noelle E Cockett, Michael P Heaton, Timothy P L Smith, Brenda M Murdoch, Benjamin D Rosen.
Abstract
BACKGROUND: The domestic sheep (Ovis aries) is an important agricultural species raised for meat, wool, and milk across the world. A high-quality reference genome for this species enhances the ability to discover genetic mechanisms influencing biological traits. Furthermore, a high-quality reference genome allows for precise functional annotation of gene regulatory elements. Rapid advances in genome assembly algorithms and the emergence of sequencing technologies with increasingly long reads provide the opportunity for an improved de novo assembly of the sheep reference genome.
Keywords: Ovis aries; Rambouillet; genome assembly; reference genome; sheep
Year: 2022 PMID: 35134925 PMCID: PMC8848310 DOI: 10.1093/gigascience/giab096
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Image of Benz 2616 Rambouillet ewe selected for the ovine reference genome assembly.
Assembly quality statistics comparison
| Assembly statistic | ARS-UI_Ramb_v2.0 | Oar_rambouillet_v1.0 | Oar_v4.0 | Description |
|---|---|---|---|---|
| Total Length (Mb) | 2,628.15 | 2,869.91 | 2,615.52 | Assembly length in Mb |
| Contig No. | 226 | 7,486 | 48,482 | Total number of contigs |
| Contig NG50 (bp) | 43,178,051 | 2,850,956 | 145,655 | Half the length of the genome is in contigs of this size or greater, based on a 2,600 Mb genome |
| Contig LG50 (No. of contigs) | 24 | 263 | 5,206 | The smallest number of contigs whose summed length makes up half of the genome size |
| Scaffold No. | 142 | 2,641 | 5,466 | Total number of scaffolds and unplaced contigs in the assembly |
| merQV | 44.7721 | 32.1705 | 31.9131 | |
| merErrorRate | 0.000033327 | 0.00060662 | 0.000643714 | |
| merCompleteness | 93.0479 | 93.4711 | 92.2182 | Proportion of complete assembly estimated by Merqury based on “reliable” |
| baseQV | 41.84 | 40.69 | 32.40 | SNP and INDEL quality value estimated from short-read data mapped to the assembly |
| Unmap% | 0.96 | 1.00 | 0.73 | Percentage of short reads that are unmapped to each assembly |
| COMPLETESC | 93.9 | 93.0 | 91.2 | Percent of complete, single-copy BUSCOs |
| COMPLETEDUP | 2.1 | 2.6 | 1.6 | Percent of complete, duplicated BUSCOs |
| FRAGMENT | 0.9 | 1.1 | 2.4 | Percent of fragmented BUSCOs |
| MISSING | 3.1 | 3.3 | 4.8 | Percent of missing BUSCOs |
Short-read sequencing from the Rambouillet ewe used to assemble both ARS-UI_Ramb_v2.0 and Oar_rambouillet_v1.0 was used to compute these quality values.
Short-read sequencing from the Texel animal used to assemble Oar_v4.0 was used to compute these quality values.
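The merQV and merErrorRate rows look consistent with the usual Phred relation QV = −10·log10(error rate), and the NG50/LG50 rows follow the definitions given in the Description column. The sketch below illustrates both under those assumptions. The function names and the toy contig lengths are hypothetical, and this is not the pipeline used to produce the table.

```python
import math

def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)

def error_rate_to_qv(rate: float) -> float:
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10 * math.log10(rate)

def ng50_lg50(contig_lengths, genome_size):
    """Return (NG50, LG50) for a list of contig lengths.

    NG50: length of the contig at which the running sum of the sorted
    (longest first) contigs reaches half of the assumed genome size.
    LG50: how many contigs were needed to reach that point.
    """
    half = genome_size / 2
    running = 0
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    # The contigs cover less than half of the assumed genome size.
    return None, None

# merQV values copied from the table above.
for name, qv in [
    ("ARS-UI_Ramb_v2.0", 44.7721),
    ("Oar_rambouillet_v1.0", 32.1705),
    ("Oar_v4.0", 31.9131),
]:
    print(f"{name}: QV {qv} -> error rate {qv_to_error_rate(qv):.3e}")

# Toy contig set (hypothetical) against an assumed genome size of 100.
print(ng50_lg50([40, 30, 20, 10], genome_size=100))
```

For the real assemblies, the table states that NG50 was computed against a 2,600 Mb genome, so `genome_size` would be 2,600,000,000 and the input would be the full list of contig lengths.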
Figure 2: Hi-C contact map comparison of ARS-UI_Ramb_v2.0 (A) directly after scaffolding and before manual curation and (B) after manual curation with scaffold rearrangements and joins.
Figure 3: Assembly error comparison between ARS-UI_Ramb_v2.0, Oar_rambouillet_v1.0, and Oar_v4.0 in a feature response curve displaying sorted lengths of the assemblies with the fewest errors.
Specific feature counts for each genome and descriptions.
| Features | ARS-UI_Ramb_v2.0 | Oar_rambouillet_v1.0 | Oar_v4.0 | Description |
|---|---|---|---|---|
| LOW_COV_PE | 7212 | 95166 | 89103 | Low read coverage areas |
| LOW_NORM_COV_PE | 2990 | 24381 | 26860 | Low coverage of normal paired end reads |
| HIGH_SPAN_PE | 6522 | 22628 | 33232 | Regions with high numbers of inter-contig paired end reads |
| HIGH_COV_PE | 2051 | 3630 | 26276 | Regions with high read coverage |
| HIGH_NORM_COV_PE | 2366 | 2633 | 1875 | Regions with high coverage of normal paired end reads |
| HIGH_OUTIE_PE | 2514 | 28766 | 37495 | Regions with high counts of improperly paired reads |
| HIGH_SINGLE_PE | 0 | 0 | 0 | Regions with high counts of single unmapped reads |
| STRECH_PE | 74 | 84 | 67 | Regions with high Comp/Expansion (CE) statistics |
| COMPR_PE | 87 | 92 | 44 | Regions with low Comp/Expansion (CE) statistics |
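A feature response curve, as we understand the idea, sorts contigs from longest to shortest and asks how much of the genome is covered by the longest contigs whose combined feature (suspected error) count stays within a threshold. The sketch below is a minimal illustration under that assumption. The function name, the `(length, feature_count)` input layout, and the toy values are hypothetical, and the tool behind Figure 3 may handle details such as ties or per-feature curves differently.

```python
def feature_response_curve(contigs, genome_size, thresholds):
    """Compute (threshold, genome coverage) points for a feature response curve.

    contigs: iterable of (length, feature_count) pairs, one per contig.
    genome_size: assumed genome size used to normalise coverage.
    thresholds: feature budgets at which the curve is evaluated.
    """
    # Longest contigs first.
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve = []
    for phi in thresholds:
        features = 0
        covered = 0
        for length, n_features in ordered:
            # Stop at the first contig that would exceed the feature budget.
            if features + n_features > phi:
                break
            features += n_features
            covered += length
        curve.append((phi, covered / genome_size))
    return curve

# Toy example (hypothetical values): three contigs, assumed genome size of 100.
toy_contigs = [(50, 1), (30, 0), (20, 4)]
print(feature_response_curve(toy_contigs, genome_size=100, thresholds=[0, 1, 5]))
```

A curve that rises to high coverage at low thresholds indicates long contigs carrying few flagged features, which matches how the caption of Figure 3 describes the comparison.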
Figure 4: Dot plot comparison of genome assemblies between (A) ARS-UI_Ramb_v2.0 and Oar_rambouillet_v1.0, and (B) ARS-UI_Ramb_v2.0 and Oar_v4.0.
RNA-seq alignment statistics to ARS-UI_Ramb_v2.0 and Oar_rambouillet_v1.0 from 5 different tissues
| Tissue | Genome | No. input reads | Uniquely mapped (No.) | Uniquely mapped (%) | Multi-mapped (No.) | Multi-mapped (%) | Unmapped (No.) | Unmapped (%) | No. indels |
|---|---|---|---|---|---|---|---|---|---|
| Skin | v2.0 | 62,630,134 | 53,990,480 | 86.20 | 6,684,213 | 10.67 | 1,955,441 | 3.12 | 962 |
| Skin | v1.0 | 62,630,134 | 52,523,732 | 83.86 | 8,114,599 | 12.96 | 1,991,803 | 3.18 | 2,512 |
| Skin | Δ | N/A | 1,466,748 | 2.34 | −1,430,386 | −2.29 | −36,362 | −0.06 | −1,550 |
| Thalamus | v2.0 | 54,655,873 | 45,721,452 | 83.65 | 5,414,620 | 9.91 | 3,519,801 | 6.44 | 649 |
| Thalamus | v1.0 | 54,655,873 | 44,904,096 | 82.16 | 6,126,363 | 11.21 | 3,625,414 | 6.63 | 1,054 |
| Thalamus | Δ | N/A | 817,356 | 1.49 | −711,743 | −1.30 | −105,613 | −0.19 | −405 |
| Pituitary | v2.0 | 43,368,663 | 39,710,031 | 91.56 | 2,405,103 | 5.55 | 1,253,529 | 2.89 | 604 |
| Pituitary | v1.0 | 43,368,663 | 34,115,417 | 78.66 | 7,866,251 | 18.14 | 1,386,995 | 3.20 | 960 |
| Pituitary | Δ | N/A | 5,594,614 | 12.90 | −5,461,148 | −12.59 | −133,466 | −0.31 | −356 |
| Lymph node, mesenteric | v2.0 | 43,673,576 | 38,819,419 | 88.88 | 3,562,121 | 8.16 | 1,292,036 | 2.96 | 684 |
| Lymph node, mesenteric | v1.0 | 43,673,576 | 38,296,065 | 87.69 | 4,057,915 | 9.29 | 1,319,596 | 3.02 | 999 |
| Lymph node, mesenteric | Δ | N/A | 523,354 | 1.19 | −495,794 | −1.13 | −27,560 | −0.06 | −315 |
| Abomasum pylorus | v2.0 | 45,977,534 | 41,018,529 | 89.21 | 2,978,042 | 6.48 | 1,980,963 | 4.31 | 512 |
| Abomasum pylorus | v1.0 | 45,977,534 | 40,403,981 | 87.88 | 3,533,015 | 7.68 | 2,040,538 | 4.44 | 846 |
| Abomasum pylorus | Δ | N/A | 614,548 | 1.33 | −554,973 | −1.20 | −59,575 | −0.13 | −334 |
Genomes include v2.0 (ARS-UI_Ramb_v2.0) and v1.0 (Oar_rambouillet_v1.0) and the difference (Δ).
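The Δ rows appear to be computed as the v2.0 value minus the v1.0 value, with percentages taken against the shared input read count for each tissue. The sketch below rechecks the skin rows under that assumption. The dictionary layout and function names are hypothetical, and recomputed percentages can differ from the published ones in the last digit because of rounding.

```python
# Skin rows copied from the alignment table above.
SKIN = {
    "input_reads": 62_630_134,
    "v2.0": {"unique": 53_990_480, "multi": 6_684_213, "unmapped": 1_955_441, "indels": 962},
    "v1.0": {"unique": 52_523_732, "multi": 8_114_599, "unmapped": 1_991_803, "indels": 2_512},
}

def percent(count: int, total: int) -> float:
    """Share of input reads, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)

def delta(tissue: dict) -> dict:
    """Per-category difference, computed as v2.0 minus v1.0."""
    new, old = tissue["v2.0"], tissue["v1.0"]
    return {key: new[key] - old[key] for key in new}

skin_delta = delta(SKIN)
print(skin_delta)
for genome in ("v2.0", "v1.0"):
    shares = {
        key: percent(SKIN[genome][key], SKIN["input_reads"])
        for key in ("unique", "multi", "unmapped")
    }
    print(genome, shares)
```

A positive Δ for uniquely mapped reads together with negative Δ values for multi-mapped reads, unmapped reads, and indels means that reads shifted towards unique placement on ARS-UI_Ramb_v2.0.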
Expressed transcripts (TPM > 0) in Benz2616 tissues (n = 61) based on Oar_rambouillet_v1.0 and ARS-UI_Ramb_v2.0 and lift over (LO) (RefSeq v103 & 104, respectively).
| Gene Biotype | Oar_rambouillet_v1.0 | Oar_rambouillet_v1.0 LO | ARS-UI_Ramb_v2.0 | LO vs. Oar_rambouillet_v1.0 | LO vs. ARS-UI_Ramb_v2.0 | Oar_rambouillet_v1.0 vs. ARS-UI_Ramb_v2.0 |
|---|---|---|---|---|---|---|
| Guide RNA | 30 | 29 | 30 | –1 | –1 | 0 |
| lncRNA | 3929 | 3752 | 6018 | –177 | –2266 | –2089 |
| Protein coding | 42058 | 40910 | 60064 | –1148 | –19154 | –18006 |
| rRNA | 272 | 17 | 22 | –255 | –5 | 250 |
| snoRNA | 644 | 590 | 593 | –54 | –3 | 51 |
| snRNA | 997 | 907 | 879 | –90 | 28 | 118 |
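The three comparison columns appear to follow a fixed sign convention: "A vs. B" is the count for A minus the count for B. The sketch below recomputes them from the first three columns under that assumption. The tuple layout and the function name are hypothetical.

```python
# Expressed transcript counts copied from the table above:
# (Oar_rambouillet_v1.0, Oar_rambouillet_v1.0 LO, ARS-UI_Ramb_v2.0)
COUNTS = {
    "Guide RNA": (30, 29, 30),
    "lncRNA": (3929, 3752, 6018),
    "Protein coding": (42058, 40910, 60064),
    "rRNA": (272, 17, 22),
    "snoRNA": (644, 590, 593),
    "snRNA": (997, 907, 879),
}

def differences(v1: int, lo: int, v2: int) -> tuple:
    """Return (LO - v1.0, LO - v2.0, v1.0 - v2.0) for one gene biotype."""
    return (lo - v1, lo - v2, v1 - v2)

for biotype, (v1, lo, v2) in COUNTS.items():
    print(biotype, differences(v1, lo, v2))
```

Under this convention, a negative value in the last column means that more transcripts of that biotype were expressed in the ARS-UI_Ramb_v2.0 annotation than in the Oar_rambouillet_v1.0 annotation.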
Figure 5: Kallisto comparison of the number of expressed transcripts for the RNA-Seq dataset of 61 tissue samples from Benz2616, across the 3 annotations (Oar_rambouillet_v1.0, Ramb1LO2 [liftover], and ARS-UI_Ramb_v2.0). lncRNA: long non-coding RNA; rRNA: ribosomal RNA; snoRNA: small nucleolar RNA; snRNA: small nuclear RNA.