Rodrigo P Baptista (1,2), Jessica C Kissinger (1,2,3).
Abstract
Advances in genomics have made whole genome studies increasingly feasible across the life sciences. However, new technologies and algorithmic advances do not guarantee flawless genomic sequences or annotation. Bias, errors, and artifacts can enter at any stage of the process, from library preparation to annotation. When planning an experiment that utilizes a genome sequence as the basis for the design, there are a few basic checks that, if performed, may better inform the experimental design and ideally help avoid a failed experiment or an inconclusive result.
Year: 2019 PMID: 31513692 PMCID: PMC6742220 DOI: 10.1371/journal.ppat.1007901
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Fig 1. Common genome assembly problems.
(A) Expected genome organization, with a roughly even distribution of aligned reads across the genome sequence. (B) Illustration of a collapsed repeat region and its detection via an accumulation of mapped reads, which produces a peak in the depth coverage plot. (C) An 85-kb region shown for four strains of Toxoplasma gondii chromosome VI. Contiguous reads are shown as yellow and green horizontal lines. Annotated genes are shown in blue (forward strand) and red (reverse strand). Grey shading indicates orthology. The orange window near the 270-kb mark (top ruler) highlights a gap in the contigs of two strains, likely caused by the repetitive surface antigen genes located in the 238–275-kb region.
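The detection principle in panel B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: it assumes a per-base depth list for a single sequence (for example, the third column of `samtools depth -a` output), and the function name `flag_high_coverage_windows`, the 1-kb window, and the 2-fold threshold are hypothetical choices.

```python
from statistics import median
from typing import List, Sequence, Tuple


def flag_high_coverage_windows(
    depth: Sequence[int], window: int = 1000, fold: float = 2.0
) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean_depth) for non-overlapping windows whose mean
    read depth is at least `fold` times the median window mean.

    `depth[i]` is the number of reads covering 0-based position i.
    Hypothetical helper: window size and fold threshold are illustrative.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    windows = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        windows.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    if not windows:
        return []
    # The median is robust to the very peaks we are looking for; the mean is not.
    baseline = median([mean for _, _, mean in windows])
    if baseline == 0:
        return []  # no usable coverage baseline
    return [w for w in windows if w[2] >= fold * baseline]


# Synthetic example: uniform depth 30, with one 1-kb stretch at depth 90.
depth = [30] * 10_000
depth[4000:5000] = [90] * 1000
print(flag_high_coverage_windows(depth))  # [(4000, 5000, 90.0)]
```

A stretch at roughly three times the background depth is what one would expect if reads from three repeat copies had all mapped to a single collapsed copy. A short trailing window, or a region with genuinely variable coverage, can also cross the threshold, so flagged windows are candidates to inspect rather than confirmed collapses.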
Genome sequence status of common eukaryotic hosts and pathogens.
| Species | K | Scaffolds | Gaps | N’s | Genome Size (Mb) | Initial Release | Most Recent Assembly | Most Recent Annotation |
|---|---|---|---|---|---|---|---|---|
|  | 8 | 8 | 11 | 575,000 | 28.75 | 2005 | 2005 | 2019 |
|  | 8 | 8 | 80 | 6,259 | 14.28 | 2004 | 2016 | 2018 |
|  | 7 | 99 | 660 | 165,810 | 12.49 | 2015 | 2015 | 2017 |
|  | 4 | 7 | 4 | 400 | 29.01 | 2004 | 2015 | 2015 |
|  | 14 | 14 | 8 | 13,078 | 18.37 | 2011 | 2011 | 2011 |
|  | 4 | 31 | 414 | 234,405 | 36.45 | 2003 | 2017 | 2017 |
|  | 15 | 114 | 1,277 | 1,450,151 | 61.38 | 2007 | 2015 | 2015 |
|  | - | 280 | 2,592 | 2,381,013 | 33.03 | 2005 | 2014 | 2014 |
|  | 7 | 53 | 163 | 29,800 | 40.97 | 2003 | 2016 | 2016 |
|  | 5 | 57 | 499 | 483,815 | 29.95 | 2008 | 2008 | 2014 |
|  | - | 4,921 | 13,367 | 38,410,029 | 228.54 | 2006 | 2014 | 2014 |
|  | - | 295 | 1 | 214 | 60.25 | 2006 | 2018 | 2018 |
|  | - | 70 | 0 | 0 | 8.39 | 2015 | 2015 | 2015 |
|  | 16 | 17 | 0 | 0 | 12.16 | 1999 | 2014 | 2018 |
|  | 5 | 7 | 95 | 185,644 | 119.66 | 2001 | 2018 | 2019 |
|  | 12 | 58 | 256 | 117,485 | 374.42 | 2002 | 2015 | 2018 |
|  | 10 | 598 | 2,522 | 30,732,878 | 2,135.08 | 2010 | 2017 | 2017 |
|  | 7 | 22 | 692,976 | 275,682,619 | 14,547.26 | 2017 | 2018 | 2018 |
|  | 23 | 473 | 875 | 151,122,679 | 3,099.73 | 2002 | 2015 | 2019 |
|  | 21 | 162 | 634 | 78,088,216 | 2,730.85 | 2004 | 2017 | 2017 |
|  | 35 | 525 | 946 | 9,784,460 | 1,065.36 | 2004 | 2018 | 2018 |
|  | 3 | 8,145 | 8,735 | 12,572,948 | 265.02 | 2002 | 2014 | 2018 |
|  | 15 | 369,492 | 201,145 | 376,910,010 | 1,765.38 | 2008 | 2012 | 2017 |
|  | - | 384 | 2,808 | 2,576,247 | 42.01 | 2013 | 2013 | 2014 |
|  | 4 | 13 | 0 | 0 | 8.17 | 2007 | 2007 | 2007 |
|  | 8 | 358 | 7 | 119 | 8.91 | 2004 | 2004 | 2013 |
|  | 8 | 8 | 10 | 14,600 | 9.10 | 2004 | 2007 | 2018 |
|  | - | 2,297 | 1,276 | 71,547 | 44.03 | 2016 | 2016 | 2016 |
|  | 14 | 4,665 | 8,063 | 686,045 | 51.89 | 2013 | 2013 | 2015 |
|  | - | 1,529 | 643 | 64,300 | 20.83 | 2005 | 2005 | 2014 |
|  | - | 92 | 214 | 21,400 | 11.21 | 2007 | 2007 | 2014 |
|  | 36 | 36 | 0 | 7 | 32.85 | 2005 | 2005 | 2019 |
|  | 14 | 100 | 123 | 126,523 | 18.56 | 2014 | 2014 | 2019 |
|  | 14 | 15 | 0 | 0 | 23.32 | 1998 | 2016 | 2019 |
|  | 14 | 242 | 340 | 137,629 | 29.04 | 2013 | 2018 | 2018 |
|  | - | 871 | 2,320 | 1,981,126 | 124.40 | 2014 | 2014 | 2015 |
|  | 4 | 8 | 2 | 200 | 8.35 | 2005 | 2005 | 2015 |
|  | 14 | 2,277 | 244 | 203,077 | 65.66 | 2008 | 2013 | 2015 |
|  | - | 64,769 | 8,181 | 818,393 | 176.42 | 2005 | 2005 | 2014 |
|  | 11 | 12 | 39 | 3,590 | 26.07 | 2005 | 2005 | 2019 |
|  | 41 | 29,495 | 3,251 | 325,100 | 89.93 | 2005 | 2005 | 2019 |
Data obtained from NCBI GenBank and EuPathDB.org (Release 41, Dec 2018).
K, karyotype.
N’s, ambiguous bases.
“-”, unknown.
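Several of the table's columns (scaffolds, gaps, ambiguous bases, genome size) can be recomputed from an assembly FASTA file as a first sanity check. The sketch below is a hypothetical helper (`summarize_assembly` and `AssemblySummary` are illustrative names, not part of any database tooling). It assumes that a gap is any run of `N`/`n` and that only `N` counts as ambiguous; GenBank and EuPathDB may define gaps differently (for example, from AGP gap records or with a minimum gap length), so its counts need not match the published figures.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption: a gap is any maximal run of N/n; other IUPAC codes are ignored.
GAP_RUN = re.compile(r"[Nn]+")


@dataclass
class AssemblySummary:
    scaffolds: int
    gaps: int
    ambiguous_bases: int  # total N/n characters
    length_bp: int

    @property
    def size_mb(self) -> float:
        return self.length_bp / 1e6

    @property
    def percent_n(self) -> float:
        if not self.length_bp:
            return 0.0
        return 100.0 * self.ambiguous_bases / self.length_bp


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_assembly(lines: Iterable[str]) -> AssemblySummary:
    """Count scaffolds, N-run gaps, N bases, and total length (hypothetical helper)."""
    scaffolds = gaps = n_count = length = 0
    for _header, seq in read_fasta(lines):
        scaffolds += 1
        length += len(seq)
        # Lines were joined first, so an N run split across lines counts once.
        runs = GAP_RUN.findall(seq)
        gaps += len(runs)
        n_count += sum(len(run) for run in runs)
    return AssemblySummary(scaffolds, gaps, n_count, length)
```

An open file handle can be passed directly as `lines`. Comparing `scaffolds` with the karyotype gives a quick sense of contiguity: in the table, one assembly with K = 16 has 17 scaffolds and no gaps, whereas another with K = 15 has 369,492 scaffolds and over 376 million ambiguous bases, about 21% of its 1,765.38 Mb.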