Nicolae Herndon, Jennifer Shelton, Lizzy Gerischer, Panos Ioannidis, Maria Ninova, Jürgen Dönitz, Robert M Waterhouse, Chun Liang, Carsten Damm, Janna Siemanowski, Peter Kitzmann, Julia Ulrich, Stefan Dippel, Georg Oberhofer, Yonggang Hu, Jonas Schwirz, Magdalena Schacht, Sabrina Lehmann, Alice Montino, Nico Posnien, Daniela Gurska, Thorsten Horn, Jan Seibert, Iris M Vargas Jentzsch, Kristen A Panfilio, Jianwei Li, Ernst A Wimmer, Dominik Stappert, Siegfried Roth, Reinhard Schröder, Yoonseong Park, Michael Schoppmeier, Ho-Ryun Chung, Martin Klingler, Sebastian Kittelmann, Markus Friedrich, Rui Chen, Boran Altincicek, Andreas Vilcinskas, Evgeny Zdobnov, Sam Griffiths-Jones, Matthew Ronshaugen, Mario Stanke, Sue J Brown, Gregor Bucher.
Abstract
BACKGROUND: The red flour beetle Tribolium castaneum has emerged as an important model organism for the study of gene function in development and physiology, for ecological and evolutionary genomics, for pest control and a plethora of other topics. RNA interference (RNAi), transgenesis and genome editing are well established, and resources for genome-wide RNAi screening have become available in this model. All these techniques depend on a high-quality genome assembly and precise gene models. However, the first version of the genome assembly was generated by Sanger sequencing, and only a small set of RNA sequence data was available, which limited annotation quality.
Keywords: Gene annotation; Gene prediction; Gene set OGS3; Genome; Genome assembly Tcas5.2; Reannotation; RefSeq genome; Tribolium castaneum; miRNA; microRNA
Year: 2020 PMID: 31937263 PMCID: PMC6961396 DOI: 10.1186/s12864-019-6394-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Ungapped length and spanned gaps before and after running GapFiller
| Molecule | Ungapped length before | Spanned gaps before | Ungapped length after | Spanned gaps after |
|---|---|---|---|---|
| LG1 = X | 7,071,107 | 301 | 7,096,881 | 201 |
| LG2 | 14,229,660 | 359 | 14,306,202 | 192 |
| LG3 | 28,072,007 | 1451 | 28,315,770 | 929 |
| LG4 | 11,540,046 | 300 | 11,632,658 | 160 |
| LG5 | 14,111,830 | 358 | 14,196,565 | 193 |
| LG6 | 8,262,430 | 555 | 8,332,882 | 407 |
| LG7 | 15,084,119 | 429 | 15,185,902 | 258 |
| LG8 | 12,870,760 | 577 | 12,987,347 | 378 |
| LG9 | 14,900,846 | 634 | 15,007,071 | 384 |
| LG10 | 7,070,154 | 498 | 7,128,489 | 365 |
| Unplaced multi-contig | 14,079,574 | 1111 | 14,205,681 | 874 |
| Unplaced single-contig | 4,020,722 | – | 4,021,060 | – |
| Total | 151,313,255 | 6573 | 152,416,508 | 4341 |
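The net effect of GapFiller can be read off the Total row with a little arithmetic. The following is a minimal sketch, not part of the original analysis; the function name `gapfiller_summary` is hypothetical, and the four input values are copied from the Total row of the table above.

```python
# Totals copied from the "Total" row of the GapFiller table.
UNGAPPED_BEFORE = 151_313_255
UNGAPPED_AFTER = 152_416_508
GAPS_BEFORE = 6573
GAPS_AFTER = 4341


def gapfiller_summary(len_before, len_after, gaps_before, gaps_after):
    """Return (bases gained, gaps closed, fraction of gaps closed)."""
    bases_gained = len_after - len_before
    gaps_closed = gaps_before - gaps_after
    return bases_gained, gaps_closed, gaps_closed / gaps_before


bases_gained, gaps_closed, frac_closed = gapfiller_summary(
    UNGAPPED_BEFORE, UNGAPPED_AFTER, GAPS_BEFORE, GAPS_AFTER
)
print(f"{bases_gained:,} bp gained; {gaps_closed} gaps closed ({frac_closed:.1%})")
# prints: 1,103,253 bp gained; 2232 gaps closed (34.0%)
```

On these totals, roughly a third of the spanned gaps were closed while about 1.1 Mb of ungapped sequence was added.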
Assembly improvement
| Assembly | Length | Scaffolds | Scaffold N50 (kbp) |
|---|---|---|---|
| Tcas 3.0 | 160,445,652 | 2320 | 976.4 |
| After Atlas-Link | 160,667,144 | 2240 | 1175.4 |
| After GapFiller | 160,744,700 | 2240 | 1176.7 |
| After BioNano Genomics / Tcas 5.2 | 165,921,904 | 2148 | 4753.0 |
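For readers unfamiliar with the scaffold N50 column, the sketch below uses the standard definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions; the per-scaffold lengths of the Tribolium assemblies are not given here, so the table values cannot be recomputed from this page.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    summed length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not taken from the assemblies above).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # prints: 8
```

Because N50 depends on how sequence is distributed across scaffolds, joining scaffolds can raise it sharply even when the total length changes little, as in the last row of the table.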
Read alignments to OGS2 and OGS3 transcript sets. The numbers of alignments are shown. Only the best alignment(s) for each read are reported. The last row suggests that OGS2 may have a slight bias towards highly expressed genes
| | OGS2 | OGS3 |
|---|---|---|
| Total number of alignments | 4,634,356,882 | 7,418,675,525 |
| Number of alignments per transcript | 278,926 | 400,317 |
| Number of aligned reads per exon position | 285.77 | 260.45 |
Annotation improvement
| | OGS2 | OGS3 |
|---|---|---|
| Number of genes | 16,561 | 16,593 |
| Average coding length | 1341 bp | 1473 bp |
| Number of coding exons per transcript | 4.32 | 5.02 |
| GC content | 45.97% | 46.25% |
| Fraction of single exon genes | 17.66% | 17.74% |
| Number of introns (excluding UTR) | 54,909 (54,875) | 63,211 (58,837) |
| Fraction of RNA-Seq-supported introns | 76.3% | 86.2% |
| Average intron length | 1167 bp | 1362 bp |
BUSCO analysis
| | Tcas OGS2 | Tcas OGS3 | Dmel r16.19 | Amel 4.5 | Ptep 2.0 |
|---|---|---|---|---|---|
| Complete | 1058 (99.3%) | 1061 (99.6%) | 1063 (99.8%) | 1043 (97.9%) | 1007 (94.4%) |
| Complete single copy | 1054 (98.9%) | 1056 (99.1%) | 1055 (99%) | 1038 (97.4%) | 966 (90.6%) |
| Complete duplicated | 4 (0.4%) | 5 (0.5%) | 8 (0.8%) | 5 (0.5%) | 41 (3.8%) |
| Fragmented | 5 (0.5%) | 2 (0.2%) | 0 (0%) | 15 (1.4%) | 18 (1.7%) |
| Missing | 3 (0.2%) | 3 (0.2%) | 3 (0.2%) | 8 (0.7%) | 41 (3.9%) |
| Genes in BUSCO profile | 1066 | 1066 | 1066 | 1066 | 1066 |
Fig. 1 Protein evolution in selected model organisms. a An alignment-based comparison of the protein sequences of 1263 single-copy orthologs indicates that the proteome of Tribolium is more conserved than that of the main invertebrate models Drosophila melanogaster (DMELA) or Caenorhabditis elegans (CELEG). Sequences of annelids are more conserved. Shown is Capitella teleta; see Raible et al. 2005 for Platynereis dumerilii. The tree was rooted using Mus musculus (Mammalia) as outgroup. The distances are shown as substitutions per site. b An alignment-free comparison shows the same trend but with lower resolution. DMELA: Drosophila melanogaster; TCAST: Tribolium castaneum; CELEG: Caenorhabditis elegans; CTELE: Capitella teleta; MMUSC: Mus musculus
Mate pairs jumping library statistics
| FastQ | Total reads | Total length |
|---|---|---|
| 3kb_1 | 23,677,983 | 2,120,896,823 |
| 3kb_2 | 23,677,983 | 2,123,186,604 |
| 8kb_1 | 23,202,365 | 2,093,651,921 |
| 8kb_2 | 23,202,365 | 2,096,015,114 |
| 20kb_1 | 12,884,671 | 1,151,209,160 |
| 20kb_2 | 12,884,671 | 1,153,515,873 |
Number of scaffolds and ungapped length before and after running Atlas-Link
| Molecule | Scaffolds before | Ungapped length before | Scaffolds after | Ungapped length after | Unplaced scaffolds added | Unplaced ungapped length added |
|---|---|---|---|---|---|---|
| LG1 = X | 13 | 7,011,684 | 13 | 7,071,107 | 2 | 59,423 |
| LG2 | 20 | 14,013,343 | 18 | 14,229,660 | 2 | 216,317 |
| LG3 | 35 | 27,022,651 | 29 | 28,072,007 | 8 | 1,049,356 |
| LG4 | 7 | 11,540,046 | 6 | 11,540,046 | – | – |
| LG5 | 17 | 13,832,902 | 17 | 14,111,830 | 3 | 278,928 |
| LG6 | 15 | 8,229,537 | 12 | 8,262,430 | 2 | 32,893 |
| LG7 | 18 | 14,841,431 | 15 | 15,084,119 | 3 | 242,688 |
| LG8 | 16 | 12,760,817 | 14 | 12,870,760 | 1 | 109,943 |
| LG9 | 21 | 14,567,469 | 21 | 14,900,846 | 2 | 333,377 |
| LG10 | 14 | 7,043,942 | 12 | 7,070,154 | 1 | 26,212 |
| Unplaced multi-contig | 305 | 16,272,476 | 263 | 14,079,574 | | |
| Unplaced single-contig | 1839 | 4,176,957 | 1820 | 4,020,722 | | |
| Total | 2320 | 151,313,255 | 2240 | 151,313,255 | | |
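The Atlas-Link table is internally consistent: for each linkage group, the ungapped length before plus the unplaced length added equals the ungapped length after, and the total length added matches the length lost from the unplaced categories. The sketch below checks this. It is an illustrative consistency check, not part of the original pipeline; the names `placed`, `unplaced` and `inconsistent_rows` are hypothetical, all values are copied from the table, and the "–" entry for LG4 is read as 0.

```python
# (ungapped length before, ungapped length after, unplaced length added),
# copied from the Atlas-Link table; the "–" for LG4 is read as 0.
placed = {
    "LG1=X": (7_011_684, 7_071_107, 59_423),
    "LG2": (14_013_343, 14_229_660, 216_317),
    "LG3": (27_022_651, 28_072_007, 1_049_356),
    "LG4": (11_540_046, 11_540_046, 0),
    "LG5": (13_832_902, 14_111_830, 278_928),
    "LG6": (8_229_537, 8_262_430, 32_893),
    "LG7": (14_841_431, 15_084_119, 242_688),
    "LG8": (12_760_817, 12_870_760, 109_943),
    "LG9": (14_567_469, 14_900_846, 333_377),
    "LG10": (7_043_942, 7_070_154, 26_212),
}

# (ungapped length before, ungapped length after) for the unplaced rows.
unplaced = {
    "multi-contig": (16_272_476, 14_079_574),
    "single-contig": (4_176_957, 4_020_722),
}


def inconsistent_rows(rows):
    """Return names of rows where before + added does not equal after."""
    return [
        name
        for name, (before, after, added) in rows.items()
        if before + added != after
    ]


total_added = sum(added for _, _, added in placed.values())
unplaced_lost = sum(before - after for before, after in unplaced.values())

print(inconsistent_rows(placed))   # prints: []
print(total_added, unplaced_lost)  # prints: 2349137 2349137
```

The unchanged total of 151,313,255 bp follows from the same bookkeeping: Atlas-Link moves sequence from unplaced scaffolds into linkage groups without adding or removing bases.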
Number of scaffolds, scaffolds’ length, and N50 before and after using BNG consensus maps
| Molecule | Scaffolds before | Scaffolds after | Length (Mb) before | Length (Mb) after | N50 (kb) before | N50 (kb) after | Unplaced scaffolds added |
|---|---|---|---|---|---|---|---|
| LG1 = X | 13 | 4 | 7.34 | 8.92 | 1160.70 | 7264.05 | 2 |
| LG2 | 18 | 8 | 14.78 | 15.03 | 1207.76 | 9314.47 | 0 |
| LG3 | 29 | 18 | 29.78 | 31.02 | 1409.81 | 2672.70 | 3 |
| LG4 | 6 | 3 | 12.11 | 12.24 | 2906.70 | 9484.15 | 0 |
| LG5 | 17 | 7 | 14.64 | 15.36 | 1402.64 | 4484.65 | 1 |
| LG6 | 12 | 9 | 9.02 | 9.25 | 956.12 | 2189.88 | 0 |
| LG7 | 15 | 6 | 15.74 | 16.48 | 1333.70 | 8809.74 | 0 |
| LG8 | 14 | 9 | 13.66 | 13.98 | 1312.85 | 4002.45 | 1 |
| LG9 | 21 | 10 | 15.81 | 16.12 | 893.90 | 4920.63 | 0 |
| LG10 | 12 | 11 | 7.54 | 8.84 | 1198.49 | 1224.30 | 3 |
| Unplaced | 2083 | 2072 | 20.33 | 17.35 | 150.43 | 104.32 | 2 |
| Total | 2240 | 2157 | 160.74 | 164.60 | 1160.70 | 4002.45 | 12 |