Amanda Warr, Nabeel Affara, Bronwen Aken, Hamid Beiki, Derek M Bickhart, Konstantinos Billis, William Chow, Lel Eory, Heather A Finlayson, Paul Flicek, Carlos G Girón, Darren K Griffin, Richard Hall, Greg Hannum, Thibaut Hourlier, Kerstin Howe, David A Hume, Osagie Izuogu, Kristi Kim, Sergey Koren, Haibou Liu, Nancy Manchanda, Fergal J Martin, Dan J Nonneman, Rebecca E O'Connor, Adam M Phillippy, Gary A Rohrer, Benjamin D Rosen, Laurie A Rund, Carole A Sargent, Lawrence B Schook, Steven G Schroeder, Ariel S Schwartz, Ben M Skinner, Richard Talbot, Elizabeth Tseng, Christopher K Tuggle, Mick Watson, Timothy P L Smith, Alan L Archibald.
Abstract
BACKGROUND: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility.
Keywords: genome annotation; pig; pig genomes; reference assembly
Year: 2020 PMID: 32543654 PMCID: PMC7448572 DOI: 10.1093/gigascience/giaa051
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Assembly statistics
| Statistic | Sscrofa10.2 | Sscrofa11 | Sscrofa11.1 | USMARCv1.0 | GRCh38.p13 |
|---|---|---|---|---|---|
| Total sequence length | 2,808,525,991 | 2,456,768,445 | 2,501,912,388 | 2,755,438,182 | 3,099,706,404 |
| Total ungapped length | 2,519,152,092 | 2,454,899,091 | 2,472,047,747 | 2,623,130,238 | 2,948,583,725 |
| No. of scaffolds | 9,906 | 626 | 706 | 14,157 | 472 |
| Gaps between scaffolds | 5,323 | 24 | 93 | 0 | 349 |
| No. of unplaced scaffolds | 4,562 | 583 | 583 | 14,136 | 126 |
| Scaffold N50 | 576,008 | 88,231,837 | 88,231,837 | 131,458,098 | 67,794,873 |
| Scaffold L50 | 1,303 | 9 | 9 | 9 | 16 |
| No. of unspanned gaps | 5,323 | 24 | 93 | 0 | 349 |
| No. of spanned gaps | 233,116 | 79 | 413 | 661 | 526 |
| No. of contigs | 243,021 | 705 | 1,118 | 14,818 | 998 |
| Contig N50 | 69,503 | 48,231,277 | 48,231,277 | 6,372,407 | 57,879,411 |
| Contig L50 | 8,632 | 15 | 15 | 104 | 18 |
| No. of chromosomes | 21* | 19 | 21* | 21* | 24 |
Summary statistics for assembled pig genome sequences and comparison with current human reference genome (source: NCBI, https://www.ncbi.nlm.nih.gov/assembly/). *Includes mitochondrial genome.
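The N50 and L50 rows in the table above follow the usual assembly-metric definitions: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length, and L50 is the number of sequences in that set. The sketch below illustrates that definition only; it is not the NCBI pipeline that produced the reported values, and the function name `n50_l50` and the toy lengths are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that brings the running sum to half of the
        # total length defines N50; its rank in the sorted list is L50.
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Hypothetical toy lengths, not taken from any real assembly.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Read this way, a higher N50 paired with a lower L50 indicates a more contiguous assembly, which is the pattern the table shows between Sscrofa10.2 and Sscrofa11.1.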
Figure 1: Assemblies and radiation hybrid (RH) map alignments. Plots illustrating co-linearity between RH map and (a) Sscrofa11.1 and (b) USMARCv1.0 assemblies (autosomes only).
Summary of quality statistics for SSC1–18, SSCX
| Statistic | Bases, Sscrofa11 | % Genome, Sscrofa11 | % Genome, Sscrofa10.2 |
|---|---|---|---|
| High coverage | 119,341,205 | 4.9 | 2.6 |
| LC | 185,385,536 | 7.5 | 26.6 |
| Low proportion properly paired | 95,508,007 | 3.9 | 5.0 |
| High proportion large inserts | 40,835,320 | 1.7 | 1.5 |
| High proportion small inserts | 114,793,298 | 4.7 | 4.0 |
| LQ | 284,838,040 | 11.6 | 13.9 |
| Total LQLC | 399,927,747 | 16.3 | 33.1 |
| LQLC windows that do not intersect RepeatMasker regions | 39,918,551 | 1.6 | |
Quality measures and terms as defined [14]. LC: low coverage; LQ: low quality.
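The Sscrofa11 percentage column can be roughly cross-checked against the base counts. The sketch below assumes the denominator is the Sscrofa11 total sequence length reported in the assembly statistics table (2,456,768,445 bp). That choice is an assumption: this table covers SSC1–18 and SSCX only, so the authors may have used a different denominator. The helper name `percent_of_genome` is hypothetical.

```python
# Assumed denominator: Sscrofa11 total sequence length from the assembly
# statistics table. The authors' actual denominator may differ.
SSCROFA11_TOTAL_LENGTH = 2_456_768_445


def percent_of_genome(bases, genome_length=SSCROFA11_TOTAL_LENGTH):
    """Express a base count as a percentage of genome length, to 1 decimal."""
    return round(100 * bases / genome_length, 1)


# Base counts copied from the Sscrofa11 column of the quality table.
flagged_bases = {
    "High coverage": 119_341_205,
    "LC": 185_385_536,
    "Total LQLC": 399_927_747,
}

for name, bases in flagged_bases.items():
    print(f"{name}: {percent_of_genome(bases)}%")
```

Under this assumption the three rows come out at 4.9%, 7.5%, and 16.3%, in agreement with the tabulated values to one decimal place.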
Annotation statistics for Ensembl annotation of pig (Sscrofa10.2, Sscrofa11.1, USMARCv1.0), human (GRCh38.p13), and mouse (GRCm38.p6) assemblies
| Statistic | Sscrofa10.2 (Release 89) | Sscrofa11.1 (Release 98) | USMARCv1.0 (Release 97) | GRCh38.p13 (Release 98) | GRCm38.p6 (Release 98) |
|---|---|---|---|---|---|
| Coding genes | 21,630 (incl 10 RT) | 21,301 | 21,535 | 20,444 (incl 667 RT) | 22,508 (incl 270 RT) |
| Non-coding genes | 3,124 | 8,971 | 6,113 | 23,949 | 16,078 |
| Small non-coding genes | 2,804 | 2,156 | 2,427 | 4,871 | 5,531 |
| Long non-coding genes | 135 (incl 1 RT) | 6,798 | 3,307 | 16,857 (incl 304 RT) | 9,985 (incl 75 RT) |
| Miscellaneous non-coding genes | 185 | 17 | 379 | 2,221 | 562 |
| Pseudogenes | 568 | 1,626 | 674 | 15,214 (incl 8 RT) | 13,597 (incl 4 RT) |
| Gene transcripts | 30,585 | 63,041 | 58,692 | 227,530 | 142,446 |
| Genscan gene predictions | 52,372 | 46,573 | 152,168 | 51,756 | 57,381 |
| Short variants | 60,389,665 | 64,310,125 | | 665,834,144 | 83,761,978 |
| Structural variants | 224,038 | 224,038 | | 6,013,113 | 791,878 |
Incl: including; RT: read through.
Figure 2: Visualization of improvements in assembly contiguity. Graphical visualization of contigs for Sscrofa11 (top) and Sscrofa10.2 (bottom) as alternating dark and light grey bars.