Shaun D Jackman1, René L Warren1, Ewan A Gibb1, Benjamin P Vandervalk1, Hamid Mohamadi1, Justin Chu1, Anthony Raymond1, Stephen Pleasance1, Robin Coope1, Mark R Wildung2, Carol E Ritland3, Jean Bousquet4, Steven J M Jones5, Joerg Bohlmann6, Inanç Birol7.
Abstract
The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) were assembled from whole-genome shotgun sequencing data using ABySS. The sequencing data contained reads from both the nuclear and organellar genomes, and reads of the organellar genomes were abundant in the data because each cell harbors hundreds of mitochondria and plastids. Hence, assembly of the 123-kb plastid and 5.9-Mb mitochondrial genomes was accomplished by analyzing data sets primarily representing low coverage of the nuclear genome. The assembled organellar genomes were annotated for their coding genes, ribosomal RNA, and transfer RNA. Transcript abundances of the mitochondrial genes were quantified in three developmental tissues and five mature tissues using data from RNA-seq experiments. C-to-U RNA editing was observed in the majority of mitochondrial genes, and in four genes, editing events were noted to modify ACG codons to create cryptic AUG start codons. The informatics methodology presented in this study should prove useful to assemble organellar genomes of other plant species using whole-genome shotgun sequencing data.
Keywords: ABySS; genome assembly; gymnosperms; organelle; sequencing; white spruce
Year: 2015 PMID: 26645680 PMCID: PMC4758241 DOI: 10.1093/gbe/evv244
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Methods of cpDNA Separation, Sequencing, and Assembly of Complete Plastid Genomes of Gymnosperms Published
| Species | cpDNA Separation | Sequencing | Sequence Assembler Software Tool |
|---|---|---|---|
| | BLAST in silico | 454 GS FLX Titaniumᵃ | Newbler |
| | Saline Percoll gradient | Illumina MiSeq | Newbler |
| | Longer-range PCR | Illumina GAIIᵃ | Geneious |
| Other | Unspecified | Illumina MiSeq | Velvet |
| | BLAT in silico | Illumina HiSeq 2000 | SOAPdenovo |

ᵃ Finished with PCR and Sanger sequencing.
Sequencing, Assembly, and Annotation Metrics of the White Spruce Organellar Genomes
| Metric | Plastid | Mitochondrion |
|---|---|---|
| Number of lanes | 1 MiSeq lane | 1 HiSeq lane |
| Number of read pairs | 4.9 million | 133 million |
| Read length | 2 × 300 bp | 2 × 150 bp |
| Number of merged reads | 3.0 million | 1.4 million |
| Median merged read length | 492 bp | 465 bp |
| Number of assembled reads | 21,000 | 377,000 |
| Proportion of organellar reads | 1/140 or 0.7% | 1/350 or 0.3% |
| Depth of coverage | 80× | 30× |
| Assembled genome size | 123,266 bp | 5.94 Mb |
| Number of contigs | 1 contig | 130 contigs |
| Contig N50 | 123 kb | 102 kb |
| Number of scaffolds | 1 scaffold | 36 scaffolds |
| Scaffold N50 | 123 kb | 369 kb |
| Largest scaffold | 123 kb | 1,222 kb |
| GC content | 38.8% | 44.7% |
| Number of genes (excluding ORFs) | 114 (108) | 143 (74) |
| Protein-coding genes (mRNA) | 74 (72) | 106 (51) |
| rRNA genes | 4 (4) | 8 (3) |
| tRNA genes | 36 (32) | 29 (20) |
| ORFs | Not available | 1,065 |
| Coding genes containing introns | 8 | 5 |
| Introns in coding genes | 9 | 7 |
| tRNA genes containing introns | 6 | 0 |
Note: The number of distinct genes is shown in parentheses.
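The proportion and depth-of-coverage rows can be roughly reproduced from the other rows of the table. The sketch below assumes that depth is approximately the number of assembled reads times the median merged read length, divided by the assembled genome size, and that the proportion is the number of assembled reads over the total reads (merged reads for the plastid, read pairs for the mitochondrion). Those denominators are inferred from the reported figures rather than stated in the table, and the function names are hypothetical.

```python
def approx_depth(assembled_reads, median_read_len_bp, genome_size_bp):
    """Rough depth of coverage: assembled bases divided by genome size.

    Assumes every assembled read has the median merged read length.
    """
    return assembled_reads * median_read_len_bp / genome_size_bp


def one_in(assembled_reads, total_reads):
    """Express the organellar fraction as '1 in N' total reads."""
    return total_reads / assembled_reads


# Values taken from the metrics table.
plastid_depth = approx_depth(21_000, 492, 123_266)
mito_depth = approx_depth(377_000, 465, 5_940_000)

# Assumed denominators: merged reads (plastid), read pairs (mitochondrion).
plastid_one_in = one_in(21_000, 3_000_000)
mito_one_in = one_in(377_000, 133_000_000)

print(f"plastid: ~{plastid_depth:.0f}x, 1 in {plastid_one_in:.0f} reads")
print(f"mitochondrion: ~{mito_depth:.0f}x, 1 in {mito_one_in:.0f} reads")
```

Under these assumptions the estimates come out near 84× and 30× depth, and about 1 in 143 and 1 in 353 reads. These are close to the reported 80× and 30×, and 1/140 and 1/350.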
Figure: The complete plastid genome of white spruce. The PG29 white spruce chloroplast genome was annotated using MAKER and plotted using OrganellarGenomeDRAW (Lohse et al. 2007). The inner gray track depicts the G+C content of the genome.
Figure: Relative order and size of genes on the scaffolds of the white spruce mitochondrial genome. Each box is proportional to the size of the gene including introns, except that genes smaller than 200 bp are shown as 200 bp. The space between genes is not to scale. An asterisk indicates that the gene name is truncated. Only scaffolds that harbor annotated genes are shown.
Figure: Gene content of the white spruce mitochondrial genome, grouped by gene family. Each box is proportional to the size of the gene including introns. The color of each gene is unique within its gene family.
Figure: Repetitive sequence content of the white spruce mitochondrial genome, annotated using RepeatMasker and RepeatModeler.
Figure: Heatmap of the transcript abundance of mitochondrial protein-coding genes of white spruce. Each column is a tissue sample. Each row is a gene. Each cell represents the transcript abundance of one gene in one sample. The color scale is log10(TPM+1), where TPM is transcripts per million as measured by Salmon (Patro et al. 2014).
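The color scale described in the caption can be illustrated with a short sketch. The pseudocount of 1 keeps genes with zero abundance defined on the log scale, mapping them to 0. The TPM values below are toy inputs chosen for illustration, not data from the study, and `log_abundance` is a hypothetical helper name.

```python
import math


def log_abundance(tpm):
    """Map a TPM value onto the heatmap scale log10(TPM + 1)."""
    return math.log10(tpm + 1)


# Toy TPM values (not from the paper) spanning several orders of magnitude.
toy_tpm = [0, 9, 99, 999]
scaled = [log_abundance(t) for t in toy_tpm]

for tpm, value in zip(toy_tpm, scaled):
    print(f"TPM={tpm:>4} -> log10(TPM+1)={value:.2f}")
```

Each tenfold increase in TPM+1 adds one unit on this scale, so highly and weakly expressed genes can share a single color range.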
Number of Expressed Protein-Coding Genes and ORFs of the White Spruce Mitochondrial Transcriptome Tabulated by Developmental Stage
| | Both | Mature Only | Developing Only | Neither | Sum |
|---|---|---|---|---|---|
| CDS | 60 | 0 | 29 | 17 | 106 |
| ORF | 411 | 16 | 2,809 | 3,029 | 6,265 |
| Sum | 471 | 16 | 2,838 | 3,046 | 6,371 |
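A tabulation of this kind can be sketched as follows. The expression threshold is not given in this record, so `THRESHOLD_TPM` is a hypothetical cutoff. The gene names and TPM values are toy data, and calling a gene "expressed" in a stage when any tissue of that stage reaches the cutoff is an assumption. The three developing and five mature tissues per gene mirror the sample counts in the abstract.

```python
from collections import Counter

THRESHOLD_TPM = 1.0  # hypothetical cutoff; the source does not state one


def classify(developing_tpms, mature_tpms, threshold=THRESHOLD_TPM):
    """Assign a gene to one of the four columns of the table."""
    in_developing = any(t >= threshold for t in developing_tpms)
    in_mature = any(t >= threshold for t in mature_tpms)
    if in_developing and in_mature:
        return "Both"
    if in_mature:
        return "Mature Only"
    if in_developing:
        return "Developing Only"
    return "Neither"


# Toy data: gene -> (TPM in 3 developing tissues, TPM in 5 mature tissues).
toy_genes = {
    "geneA": ([5.0, 2.0, 0.0], [3.0, 0.0, 0.0, 0.0, 1.5]),
    "geneB": ([4.0, 0.0, 0.0], [0.0, 0.0, 0.2, 0.0, 0.0]),
    "geneC": ([0.0, 0.1, 0.0], [0.0, 2.5, 0.0, 0.0, 0.0]),
    "geneD": ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.3, 0.0]),
}

counts = Counter(classify(dev, mat) for dev, mat in toy_genes.values())
print(dict(counts))
```

Each gene falls into exactly one category, so the category counts sum to the number of genes, which is the property the Sum column of the table reflects.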
Figure: Heatmap of the transcript abundance of mitochondrial protein-coding genes of white spruce, including ORFs. Each column is a tissue sample. Each row is a gene. Each cell represents the transcript abundance of one gene in one sample. The color scale is log10(TPM+1), where TPM is transcripts per million as measured by Salmon (Patro et al. 2014).