Romain Blanc-Mathieu, Bram Verhelst, Evelyne Derelle, Stephane Rombauts, François-Yves Bouget, Isabelle Carré, Annie Château, Adam Eyre-Walker, Nigel Grimsley, Hervé Moreau, Benoit Piégu, Eric Rivals, Wendy Schackwitz, Yves Van de Peer, Gwenaël Piganeau.
Abstract
BACKGROUND: Cost-effective next-generation sequencing technologies now enable the production of genomic datasets for many novel planktonic eukaryotes, which represent an understudied reservoir of genetic diversity. Ostreococcus tauri is the smallest free-living photosynthetic eukaryote known to date, a coccoid green alga first isolated in 1995 in a lagoon by the Mediterranean Sea. Its simple features, ease of culture and the sequencing of its 13 Mb haploid nuclear genome have promoted this microalga as a new model organism for cell biology. Here, we investigated the quality of genome assemblies of Illumina GAIIx 75 bp paired-end reads from O. tauri, thereby also improving the existing assembly and showing the genome to be stably maintained in culture.
Year: 2014 PMID: 25494611 PMCID: PMC4378021 DOI: 10.1186/1471-2164-15-1103
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Quality assessment statistics of assemblies of high throughput sequencing studies
| | ABySS | Velvet | |
|---|---|---|---|
| Less than 5 consecutive unaligned bases. >95% identity | 10 (8) | 9 (3) | [ |
| | 6 (0.6) | 5 (0.5) | |
| | 8 (2.8) | 2 (0.7) | |
| Translocation, relocation and inversion | 1 (0.4) | 17 (37) | [ |
| | 3 (2) | 6 (3) | |
| | 9 (0) | 9156 (250)¹ | |

¹ The number of scaffolds was greatly reduced compared to the number of contigs.
Assembly Statistics of assemblers in
| Assembler | | | | | | Aligned bases vs. reference genome (%)¹ | Aligned bases vs. CDS (%)² | Complete CDS within a single scaffold (%)³ |
|---|---|---|---|---|---|---|---|---|
| Velvet (41) | 2080 | 9539 | 12.3 | 2066 | 11.68 | 94 | 96 | 42 |
| ABySS (31) | 1490 | 14550 | 12.8 | 1474 | 11.87 | 95 | 98 | 43 |
| CLCbio (28) | 1402 | 14519 | 12.6 | 1394 | 11.96 | 96 | 98 | 42 |

¹,² Percentage of aligned bases against the reference genome sequence and against the CDS sequences, respectively. ³ Percentage of complete CDS within a single scaffold.
Figure 1. Illumina DNAseq and RNAseq aligned against the Ostreococcus tauri reference genome sequence. Colored numbered lines represent the 20 chromosomes of Ostreococcus tauri. The contiguity of the de novo assembly along the chromosomes ranges from 0 (white) to 28 scaffolds per 30 kb window (red). The inner blue track is the DNAseq coverage (from 0 to 582 reads per bp). The inner purple track is the RNAseq coverage averaged across 10 kb windows (from 0 to 1947 reads per bp). Figure generated with the RCircos software [51].
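The windowed averaging mentioned in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `windowed_mean_coverage`, the toy depth values and the 4 bp window (standing in for 10 kb) are assumptions made only for illustration.

```python
from typing import List


def windowed_mean_coverage(depth: List[int], window: int) -> List[float]:
    """Average per-base read depth over consecutive non-overlapping windows.

    The last window may be shorter than `window`; it is averaged over
    the positions it actually contains.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical per-base depths; a 4 bp window stands in for 10 kb.
toy_depth = [0, 2, 4, 6, 10, 10, 10, 10, 3, 5]
toy_means = windowed_mean_coverage(toy_depth, window=4)
print(toy_means)
```

On real data the per-base depths would come from an alignment tool, and the window size would be set to 10 000.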
Correctness Statistics of each assembly assessed with
|
|
|
|
| ||
|---|---|---|---|---|---|
| Translocation | Relocation | Inversion | |||
| Velvet | 5 | 3 | 0 | 0.4 | 0.004 |
| ABySS | 8 | 4 | 0 | 0.9 | 0.009 |
| CLCbio | 6 | 7 | 0 | 0.9 | 0.009 |
Figure 2. Saturation curve of coverage along the GenBank reference genome sequence. Coverage is shown after BWA alignment of subsets of the 41 M Illumina paired-end reads representing different sequencing depths (black line), and after NUCmer alignment of the de novo scaffolds produced by Velvet assembly of the same read subsets (grey line).
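One way to read the y-axis of such a saturation curve is as the fraction of reference positions covered by at least one alignment. The sketch below computes that fraction from alignment intervals; it is an assumption about how the quantity could be derived, not the authors' method, and the names `merge_intervals` and `covered_fraction` and the toy coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals: Iterable[Interval], reference_length: int) -> float:
    """Fraction of reference positions covered by at least one interval.

    Assumes every interval lies within [0, reference_length).
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / reference_length


# Hypothetical alignments on a 100 bp reference.
toy_alignments = [(0, 40), (30, 60), (80, 100)]
toy_fraction = covered_fraction(toy_alignments, reference_length=100)
print(toy_fraction)
```

Repeating the calculation for read subsets of increasing size, and for the scaffolds assembled from each subset, would give the two curves described in the caption.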
Evolution of the Genome sequence between 2001 and 2009
| Chrom | Position | 2001 | 2009 | Type | CDS | Annotation |
|---|---|---|---|---|---|---|
| Ch3 | 333101 | T | C | Non-Syn | Ot03g02090 | Unknown |
| Ch3 | 829938 | T | A | Non-Syn | Ot03g05020 | Metal-dependent hydrolase |
| Ch5 | 180669 | C | T | Syn | Ot05g01240 | Transcription factor NF-X1 |
| Ch5 | 224089 | A | T | Non-Syn | Ot05g01550 | Dehydrogenase |
| Ch6 | 28989 | G | C | Nonsense | Ot06g00160 | Unknown |
| Ch6 | 772097 | G | A | Non-Syn | Ot06g04800 | Dynein 1-alpha heavy chain |
| Ch12 | 137126 | C | A | Non-Syn | Ot12g00990 | Glutamate receptor-related |
| Ch12 | 137173 | C | T | Non-Syn | Ot12g00990 | Glutamate receptor-related |
| Ch12 | 137177 | T | G del | Frameshift | Ot12g00990 | Glutamate receptor-related |
| Ch17 | 13580 | C | GTCCAT del | Deletion | Ot17g00070 | Heat shock protein 90 |
| Ch9 | 145 | A | C ins | Insertion | non coding | Telomeric region |
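
The rows of this table can be tallied by type and by chromosome with a short Python sketch. The tuples below are transcribed from the table; the variable names are illustrative only.

```python
from collections import Counter

# (chromosome, position, type) transcribed from the table above.
changes = [
    ("Ch3", 333101, "Non-Syn"),
    ("Ch3", 829938, "Non-Syn"),
    ("Ch5", 180669, "Syn"),
    ("Ch5", 224089, "Non-Syn"),
    ("Ch6", 28989, "Nonsense"),
    ("Ch6", 772097, "Non-Syn"),
    ("Ch12", 137126, "Non-Syn"),
    ("Ch12", 137173, "Non-Syn"),
    ("Ch12", 137177, "Frameshift"),
    ("Ch17", 13580, "Deletion"),
    ("Ch9", 145, "Insertion"),
]

# Count changes per type and per chromosome.
type_counts = Counter(kind for _, _, kind in changes)
chrom_counts = Counter(chrom for chrom, _, _ in changes)

for kind, n in type_counts.most_common():
    print(f"{kind}: {n}")
```

The tally makes the clustering visible: three of the eleven changes fall within a single gene on chromosome 12 (Ot12g00990).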
Figure 3. Localization of the substitutions between 2001 and 2009 within two genes. A: ostta06g00130 (Ot06g00160). B: gene organization of ostta12g00065 (Ot12g00160). C: transmembrane organization of the two encoded proteins; left: Arabidopsis glutamate-like receptors homologous to Ot12g00160 from Lam et al. [58]; right: TMHMM prediction for Ot06g00160.
Genome annotation update of
| Version | Total size (Mbp) | Nb CDS | Average gene length (bp) | Nb of genes with introns | Average intron size | Nb of TE |
|---|---|---|---|---|---|---|
| 2006 | 12.5 | 7 890 | 1 290 | 3 186 (39%) | 103 | 417 |
| 2013 | 12.9 | 7 699 | 1 387 | 1 440 (19%) | 140 | 319 |