R Alan Harris, Muthuswamy Raveendran, Dustin T Lyfoung, Fritz J Sedlazeck, Medhat Mahmoud, Trent M Prall, Julie A Karl, Harshavardhan Doddapaneni, Qingchang Meng, Yi Han, Donna Muzny, Roger W Wiseman, David H O'Connor, Jeffrey Rogers.
Abstract
BACKGROUND: The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was generated in 2013 using whole-genome shotgun sequencing with short-read sequence data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and greater continuity.
Keywords: COVID-19; Mesocricetus auratus; Syrian hamster; disease model; genome
Year: 2022 PMID: 35640223 PMCID: PMC9155146 DOI: 10.1093/gigascience/giac039
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 7.658
Assembly statistics for BCM_Maur_2.0 versus the MesAur1.0 Syrian hamster assembly
| Parameter | MesAur1.0 | Flye | Flye + Pilon | Flye + Pilon + Bionano (BCM_Maur_2.0) |
|---|---|---|---|---|
| Assembly length (bp) | 2,504,908,775 | 2,381,258,546 | 2,383,228,608 | 2,457,062,007 |
| Ungapped length (bp) | 2,076,159,990 | 2,381,254,546 | 2,383,226,373 | 2,383,228,883 |
| No. of scaffolds | 21,483 | 6,741 | 6,741 | 6,346 |
| N50 scaffold length (bp) | 12,753,307 | 10,564,357 | 10,573,641 | 85,184,847 |
| No. of contigs | 237,699 | 6,781 | 6,779 | 7,057 |
| N50 contig length (bp) | 22,512 | 10,022,145 | 10,097,207 | 9,471,653 |
BUSCO statistics for BCM_Maur_2.0 versus the MesAur1.0 Syrian hamster assembly
| Statistic | MesAur1.0 (%) | Flye (%) | Flye + Pilon (%) | Flye + Pilon + Bionano (BCM_Maur_2.0) (%) |
|---|---|---|---|---|
| Complete | 86.60 | 90.58 | 95.95 | 95.97 |
| Complete and single-copy | 85.75 | 89.27 | 94.43 | 94.49 |
| Complete and duplicated | 0.85 | 1.31 | 1.52 | 1.47 |
| Fragmented | 4.59 | 3.23 | 0.85 | 0.82 |
| Missing | 8.81 | 6.19 | 3.20 | 3.21 |
A total of 12,692 gene models were included in this analysis.
Figure 1: Cumulative length and continuity comparison of MesAur1.0 and BCM_Maur_2.0. The plot summarizes the lengths of contigs/scaffolds across the two assemblies. With sequences ordered from longest to shortest, the NG50 (mid x-axis) is the length of the shortest contig/scaffold needed to reach 50% of the estimated total genome length. For genome length, the SGA-preqc estimate of 2.57 Gb was used.
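The NG50 described in this caption can be illustrated with a minimal sketch. The function names `ng50` and `n50` and the toy lengths below are hypothetical and chosen only for illustration; this is not the authors' pipeline. The sketch assumes the common definition: sort sequences from longest to shortest, accumulate their lengths, and report the length at which the running total first reaches half of the reference genome size (for N50, half of the assembly's own total length).

```python
from typing import Iterable, Optional


def ng50(lengths: Iterable[int], genome_size: float) -> Optional[int]:
    """Length of the shortest sequence in the smallest set of longest
    sequences whose combined length reaches 50% of genome_size.

    Returns None when the assembly covers less than half of genome_size.
    """
    target = genome_size / 2
    running_total = 0
    # Walk sequences from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


def n50(lengths: Iterable[int]) -> Optional[int]:
    """N50 is NG50 with the assembly's own total length as the reference."""
    lengths = list(lengths)
    return ng50(lengths, sum(lengths))


# Toy contig lengths (hypothetical, not taken from either assembly).
toy_lengths = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)                       # reference = 150
toy_ng50 = ng50(toy_lengths, genome_size=200)    # reference = 200
```

Because NG50 uses a fixed genome-size estimate rather than each assembly's own length, it allows assemblies of different total sizes to be compared on the same scale, which is presumably why the 2.57 Gb estimate is used for the figure.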
Figure 2: Contig length and count comparison between BCM_Maur_2.0 and MesAur1.0. The x-axis shows the log of contig length and the y-axis the normalized contig count for the BCM_Maur_2.0 assembly and the previous assembly. Contigs from BCM_Maur_2.0 are shown in red and contigs from MesAur1.0 in gray.
Figure 3: Comparison of the IFN-Iα gene cluster between the MesAur1.0, BCM_Maur_2.0, and GRCm39 mouse genome assemblies. The genomic intervals illustrated here are defined by the flanking IFN-β1 and IFN-ϵ genes, except for MesAur1.0, which does not include an IFN-ϵ or IFN-β1 gene in a continuous sequence with IFN-Iα genes. White space within each scaffold represents gaps in the MesAur1.0 assembly. Accession numbers for each genomic sequence are indicated on the right, with genomic coordinates for the extracted intervals shown below their respective accession numbers. Predicted IFN-Iα genes are highlighted in blue, while putative pseudogenes are depicted with open symbols and labelled below each assembly.