Lauren Coombe1, René L Warren1, Shaun D Jackman1, Chen Yang1, Benjamin P Vandervalk1, Richard A Moore1, Stephen Pleasance1, Robin J Coope1, Joerg Bohlmann2, Robert A Holt1, Steven J M Jones1, Inanc Birol1.
Abstract
The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from those of the chloroplast, which simplified the de Bruijn graphs used at the various stages of assembly.
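The idea of keeping only reads from high-frequency indices can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input format (pairs of index sequence and read sequence), the function name `select_high_frequency_reads`, and the cutoff `min_reads_per_index` are all assumptions made for illustration, and the abstract does not state which threshold was used.

```python
from collections import Counter


def select_high_frequency_reads(reads, min_reads_per_index):
    """Keep reads whose index sequence is shared by many reads.

    reads: iterable of (index_sequence, read_sequence) pairs (assumed format).
    min_reads_per_index: hypothetical cutoff; not taken from the paper.
    """
    reads = list(reads)
    # Count how many reads carry each index sequence.
    counts = Counter(index for index, _ in reads)
    # Retain only reads whose index meets the cutoff.
    return [(index, seq) for index, seq in reads
            if counts[index] >= min_reads_per_index]


# Toy data: index "AAAC" tags three reads, index "GGGT" tags one.
example_reads = [
    ("AAAC", "ACGT"),
    ("AAAC", "CGTA"),
    ("AAAC", "GTAC"),
    ("GGGT", "TTTT"),
]
selected = select_high_frequency_reads(example_reads, min_reads_per_index=3)
```

With the toy data, only the three reads tagged "AAAC" survive the filter. In a real data set the retained reads would then be passed to an assembler.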
Year: 2016 PMID: 27632164 PMCID: PMC5025161 DOI: 10.1371/journal.pone.0163059
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Alignments of the Sitka spruce chloroplast genome to the white spruce and Norway spruce chloroplast genomes.
The cross_match alignments were visualized using XMatchView. Histograms at the top and bottom show the sequence identity (S.I.) over the length of the alignments, including those from repeated sequences. The dark blue represents sequences repeated only once, while the light blue represents sequences repeated twice. The middle section represents co-linear and inverted sequence alignment blocks in blue and pink, respectively.
Fig 2. Molecular phylogenetic analysis of five conifer chloroplast genomes by the Maximum Likelihood method.
The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura-Nei model [20]. The tree with the highest log likelihood is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 5 chloroplast genome nucleotide sequences: white spruce (Picea glauca genotype PG29, KT634228.1 [4]), Norway spruce (P. abies NC_021456.1 [5]), Sitka spruce (P. sitchensis [2] from *our study KU215903.1 and from **previous public genome sequence EU998739.3 [19]), and Japanese black pine (Pinus thunbergii NC_001631.1 [21]). Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There were a total of 106,346 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 [22].
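The step "all positions containing gaps and missing data were eliminated" can be sketched in a few lines of Python. This is only an illustration of the idea, not MEGA7's implementation: the function name `drop_incomplete_columns`, the dictionary input format, and the choice of "-" and "?" as gap and missing-data symbols are assumptions.

```python
def drop_incomplete_columns(aligned, missing=("-", "?")):
    """Remove every alignment column in which any sequence has a gap
    or missing-data symbol.

    aligned: dict mapping sequence name -> aligned sequence (equal lengths).
    missing: symbols treated as gap or missing data (assumed here).
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must have equal length")
    # Transpose the alignment into columns and keep only complete ones.
    columns = zip(*aligned.values())
    kept = [col for col in columns if not any(base in missing for base in col)]
    # Rebuild each sequence from the retained columns.
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(aligned)}


# Toy alignment: column 2 holds a "?" and column 3 holds a "-".
toy = {"seq_a": "AC-GT", "seq_b": "ACTGT", "seq_c": "A?TGT"}
filtered = drop_incomplete_columns(toy)
```

In the toy alignment two of the five columns are removed, leaving three positions per sequence. The same kind of filtering is what reduces the five chloroplast genomes to the positions reported in the final dataset.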
Fig 3. The complete plastid genome of Sitka spruce.
The Sitka spruce chloroplast genome was annotated using MAKER and plotted using OrganellarGenomeDRAW [25]. The inner grey track depicts the G+C content of the genome. The genome comprises 74 coding genes, 4 ribosomal RNA (rRNA) genes, 36 transfer RNA (tRNA) genes, and 14 ORFs.
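A G+C content track of the kind shown in the figure can be approximated with a sliding window. The sketch below is an assumption-laden illustration: the function names, window size, and step are hypothetical, and it does not reproduce how OrganellarGenomeDRAW computes its track.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """G+C fraction for each full window of length `window`,
    advancing `step` bases at a time (both values are illustrative)."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Toy sequence: GC-rich at the start, AT-rich at the end.
track = windowed_gc("GGCCAATT", window=4, step=2)
```

For the toy sequence the three windows are "GGCC", "CCAA", and "AATT", so the track falls from fully G+C to no G+C. Plotting such values around a circular map gives a track comparable in spirit to the inner grey ring.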