Xuewei Li1, Ling Kui2, Jing Zhang3, Yinpeng Xie1, Liping Wang1, Yan Yan1, Na Wang1, Jidi Xu1, Cuiying Li1, Wen Wang2, Steve van Nocker4, Yang Dong5,6, Fengwang Ma7, Qingmei Guan8.
Abstract
BACKGROUND: Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from reads at 16.9× genome coverage generated with Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89% of the non-repetitive portion of the genome and has a relatively short contig N50 of 16.7 kb. These shortcomings make it difficult to use this reference in transcriptomic or whole-genome re-sequencing analyses.
Keywords: Apple; Illumina sequencing; Malus x domestica; PacBio sequencing
Year: 2016 PMID: 27503335 PMCID: PMC4976516 DOI: 10.1186/s13742-016-0139-0
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Completeness statistics for the hybrid de novo genome assembly of ‘Golden Delicious’, based on 248 core eukaryotic genes (CEGs) and produced by CEGMA [7] with default parameters
| Group | #Prots | %Completeness | #Total | Average | %Ortho |
|---|---|---|---|---|---|
| Complete | 231 | 93.15 | 545 | 2.36 | 74.46 |
| Group1 | 63 | 95.45 | 127 | 2.02 | 66.67 |
| Group2 | 50 | 89.29 | 120 | 2.40 | 78.00 |
| Group3 | 58 | 95.08 | 136 | 2.34 | 72.41 |
| Group4 | 60 | 92.31 | 162 | 2.70 | 81.67 |
| Partial | 243 | 97.98 | 710 | 2.92 | 86.01 |
| Group1 | 64 | 96.97 | 173 | 2.70 | 82.81 |
| Group2 | 54 | 96.43 | 159 | 2.94 | 87.04 |
| Group3 | 61 | 100.00 | 181 | 2.97 | 88.52 |
| Group4 | 64 | 98.46 | 197 | 3.08 | 85.94 |
#Prots: number of the 248 ultra-conserved CEGs present in the genome
%Completeness: percentage of the 248 ultra-conserved CEGs present
#Total: total number of CEGs present, including putative orthologs
Average: average number of orthologs per CEG
%Ortho: percentage of detected CEGs that have more than one ortholog
‘Complete’: a predicted protein in the set of 248 CEGs is called ‘complete’ if, when aligned to the HMM (hidden Markov model) for the KOG (eukaryotic orthologous group) of that protein family, the alignment length is at least 70% of the protein length
‘Partial’: a protein that is not complete but exceeds a pre-computed minimum alignment score is called ‘partial’. The pre-computed scores are all in the file CEGMA/data/completeness_cutoff.tbl [7]
CEGs: core eukaryotic genes
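The derived columns of the table can be recomputed from the raw counts. The following is a minimal sketch, not CEGMA's own code: the per-group sizes (66, 56, 61, 65) are an assumption inferred from the table by dividing #Prots by %Completeness, and the names `GROUP_SIZES`, `COMPLETE` and `summarize` are hypothetical.

```python
# Assumed group sizes, inferred from the table (#Prots / %Completeness).
# They sum to the 248 CEGs named in the caption.
GROUP_SIZES = {"Group1": 66, "Group2": 56, "Group3": 61, "Group4": 65}

# (#Prots, #Total) for the four 'Complete' group rows of the table.
COMPLETE = {
    "Group1": (63, 127),
    "Group2": (50, 120),
    "Group3": (58, 136),
    "Group4": (60, 162),
}


def completeness(n_found, n_expected):
    """%Completeness: share of the expected CEGs that were found."""
    return round(100.0 * n_found / n_expected, 2)


def average_orthologs(n_total, n_found):
    """Average: total hits (including putative orthologs) per detected CEG."""
    return round(n_total / n_found, 2)


def summarize(rows, sizes):
    """Return {group: (%Completeness, Average)} plus an 'All' summary row."""
    result = {}
    for group, (prots, total) in rows.items():
        result[group] = (
            completeness(prots, sizes[group]),
            average_orthologs(total, prots),
        )
    all_prots = sum(prots for prots, _ in rows.values())
    all_total = sum(total for _, total in rows.values())
    result["All"] = (
        completeness(all_prots, sum(sizes.values())),
        average_orthologs(all_total, all_prots),
    )
    return result


if __name__ == "__main__":
    for group, (pct, avg) in summarize(COMPLETE, GROUP_SIZES).items():
        print(f"{group}: completeness={pct}%, average={avg}")
```

Under these assumptions the 'All' row matches the 'Complete' summary line of the table (231 of 248 CEGs, 545 total hits).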
Annotation statistics for protein-coding sequences in the ‘Golden Delicious’ genome
| Method | Dataset | Gene_number | Avg_mRNA_length (bp) | Total_exon_number | Avg_exon_length (bp) | Avg_cds_length (bp) | Avg_exon_number | Total_intron_length (bp) |
|---|---|---|---|---|---|---|---|---|
| augustus | | 37693 | 2233.785106 | 203848 | 166.933235 | 902.793781 | 5.408113 | 50169056 |
| genscan | | 33206 | 8849.329489 | 210077 | 158.970511 | 1005.723303 | 6.326477 | 260454787 |
| glimmerHMM | | 48129 | 1404.407447 | 151751 | 182.492643 | 575.400299 | 3.153005 | 39899285 |
| snap | | 73555 | 936.269975 | 219634 | 162.207063 | 484.347577 | 2.985983 | 33241152 |
| Homolog | | 7000 | 2320.829429 | 46309 | 139.074802 | 920.059286 | 6.615571 | 9805391 |
| Homolog | | 8578 | 2427.167172 | 60008 | 137.457522 | 961.593728 | 6.995570 | 12571689 |
| Homolog | | 11000 | 1887.083182 | 61308 | 137.971668 | 768.978818 | 5.573455 | 12299148 |
| Homolog | | 9000 | 2623.029667 | 67760 | 135.473332 | 1019.963667 | 7.528889 | 14427594 |
| Homolog | | 30585 | 2321.131764 | 207830 | 138.869210 | 943.638646 | 6.795161 | 42130627 |
| Homolog | | 12733 | 2431.885573 | 93666 | 134.420665 | 988.820074 | 7.356161 | 18374553 |
| Homolog | | 34642 | 2833.118267 | 256347 | 129.467222 | 958.043242 | 7.399890 | 64956349 |
| Homolog | | 17175 | 2460.852402 | 118296 | 138.772773 | 955.823231 | 6.887686 | 25848876 |
| Homolog | | 22341 | 2004.558569 | 130795 | 138.548645 | 811.130657 | 5.854483 | 26662373 |
| RNA-seq | GDflorwer1 | 48423 | 2234.847387 | 212811 | 300.027231 | 1318.569585 | 4.394833 | 49998557 |
| RNA-seq | GDflorwer2 | 49952 | 2231.126001 | 220057 | 304.286867 | 1340.495976 | 4.405369 | 50837822 |
| RNA-seq | GDflorwer3 | 49848 | 2242.481785 | 223056 | 305.307031 | 1366.164440 | 4.474723 | 49515976 |
| RNA-seq | GDleaf1 | 45034 | 2258.958920 | 203894 | 296.653634 | 1343.116223 | 4.527557 | 46765622 |
| RNA-seq | GDleaf2 | 44669 | 2300.217086 | 204106 | 298.250576 | 1362.795943 | 4.569299 | 47700782 |
| RNA-seq | GDleaf3 | 45220 | 2292.436975 | 206566 | 301.208723 | 1375.928372 | 4.568023 | 47304519 |
| RNA-seq | GDstem1 | 46908 | 2299.298840 | 212019 | 308.944807 | 1396.396542 | 4.519890 | 48015182 |
| RNA-seq | GDstem2 | 46271 | 2308.347604 | 209286 | 307.787090 | 1392.136090 | 4.523049 | 48368862 |
| RNA-seq | GDstem3 | 46657 | 2296.511542 | 209284 | 310.624348 | 1393.332319 | 4.485586 | 48454706 |
| EVM | | 53922 | 1793.161066 | 221394 | 167.775983 | 688.857906 | 4.105820 | 59546235 |
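One column of the annotation table can be cross-checked against two others: Avg_exon_number should equal Total_exon_number divided by Gene_number. The sketch below assumes that relationship and tests it on five rows copied from the table. The names `ROWS`, `avg_exon_number` and `check_rows` are hypothetical, and the tolerance is an arbitrary choice that allows for rounding to six decimals.

```python
# (Gene_number, Total_exon_number, reported Avg_exon_number), copied from
# the augustus, genscan, glimmerHMM, snap and EVM rows of the table.
ROWS = {
    "augustus": (37693, 203848, 5.408113),
    "genscan": (33206, 210077, 6.326477),
    "glimmerHMM": (48129, 151751, 3.153005),
    "snap": (73555, 219634, 2.985983),
    "EVM": (53922, 221394, 4.105820),
}

# Assumed tolerance: reported values are rounded to six decimal places.
TOLERANCE = 1e-5


def avg_exon_number(gene_number, total_exon_number):
    """Mean number of exons per predicted gene."""
    return total_exon_number / gene_number


def check_rows(rows, tolerance=TOLERANCE):
    """Return {method: True/False} for whether the reported average matches."""
    return {
        method: abs(avg_exon_number(genes, exons) - reported) <= tolerance
        for method, (genes, exons, reported) in rows.items()
    }


if __name__ == "__main__":
    for method, ok in check_rows(ROWS).items():
        print(f"{method}: {'consistent' if ok else 'MISMATCH'}")
```

The same check can be extended to the Homolog and RNA-seq rows by adding their counts to `ROWS`.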