
Initial genome sequencing of the sugarcane CP 96-1252 complex hybrid.

Jason R Miller¹, Kari A Dilley¹, Derek M Harkins¹, Manolito G Torralba², Kelvin J Moncera², Karen Beeri², Karrie Goglin², Timothy B Stockwell¹, Granger G Sutton¹, Reed S Shabman¹.

Abstract

The CP 96-1252 cultivar of sugarcane is a complex hybrid of commercial importance. DNA was extracted from lab-grown leaf tissue and sequenced. The raw Illumina DNA sequencing results provide 101 Gbp of genome sequence reads. The dataset is available from https://www.ncbi.nlm.nih.gov/bioproject/PRJNA345486/.


Keywords: DNA sequencing; Sugarcane genome; sequencing reads

Year: 2017 PMID: 28721204 PMCID: PMC5497815 DOI: 10.12688/f1000research.11629.1

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

Sugarcane is an important crop for food and energy production. The genomes of modern cultivars are hybrids of species that are themselves polyploid; see for example (Vilela et al.). Selected genomic BAC sequences have been sequenced and assembled (de Setta et al.) (Okura et al.). Chloroplast and mitochondrial genomes have been published (Asano et al.) (Shearman et al.), as have several transcriptomes (Cardoso-Silva et al.). Whole genome sequence assemblies have not been published. CP 96-1252 is the top commercial sugarcane cultivar in Florida, USA (Sandhu & Davidson, 2016). CP 96-1252 was developed by USDA-ARS, the University of Florida, and the Florida Sugar Cane League and released to growers in 2003. CP 96-1252 is a complex hybrid of Saccharum officinarum L., S. barberi Jeswiet, S. spontaneum L., and S. sinense Roxb. amend. Jeswiet (Edmé et al.). Toward better understanding of this cultivar through its genome sequence, DNA reads were generated and made public.

Methods

Using lab-grown plantlets, kindly provided by USDA, 14 g of tissue was harvested from the leaves of Saccharum hybrid cultivar CP 96-1252 (Reg. no CV-120, PI 634935, NCBI taxon ID 1983727). DNA was extracted from purified plant nuclei at Amplicon Express (Pullman, WA, USA). Separately, DNA was extracted from whole cells at JCVI (Rockville, MD, USA) using a Qiagen Plant DNA isolation kit.

Extracted DNA was fragmented and size selected on the Blue Pippin (Sage Scientific) prior to library construction to ensure a 260 bp insert size. Standard Illumina PE libraries were generated using the NEBNext kit (NEB). Libraries were size selected, quality checked, and quantified by qPCR prior to sequencing. Barcode BS78 AGCCATGC was used for the nuclei prep library and barcode BS79 AGGCTAAC was used for the cell prep library. The libraries were generated and sequenced at the JCVI sequencing core in La Jolla, CA, USA.

To test for bacterial contamination, both DNA samples plus negative controls were used to generate amplicon libraries targeting the V4 16S region, followed by Illumina MiSeq sequencing. These reads were processed by a pipeline using usearch version 8.1.1.1861 for clustering (Edgar, 2017), mothur version 1.36.1 for taxonomic classification (Schloss et al.), and the SILVA SSURef NR99 123 database for reference (Quast et al.). Hits to chloroplast and mitochondria were observed as expected, but bacterial hits were virtually absent, at levels similar to the negative controls.

An Illumina NextSeq 500 instrument was used to generate paired 150 bp shotgun reads. Run #1 applied the Illumina High Output kit to libraries BS78 and BS79. Run #1 instrument metrics were: 1.8 pM pool loaded, 1% PhiX spike-in with 1.8% aligned, cluster density 138 K/mm², 96% pass filter, and 106 Gbp in 345 M PE reads. Barcode analysis indicated 46% BS78 and 49% BS79. Run #2 applied the Illumina High Output kit to library BS78 only. Run #2 metrics were: 1.8 pM pool loaded, 1% PhiX spike-in with 1% aligned, and 110 Gbp in 360 M PE reads. The resulting FASTQ files contained 101 Gbp in 161 M pairs from BS78 run #1, 169 M pairs from BS79 run #1, and 341 M pairs from BS78 run #2.
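
The run #1 figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is an illustration only, not part of the authors' pipeline. It assumes that "345 M PE reads" counts read pairs (each contributing two 150 bp reads); the function and constant names are hypothetical. Under that assumption the nominal yield is within about 3% of the reported 106 Gbp, and the barcode fractions predict per-library pair counts close to the 161 M and 169 M pairs reported for the FASTQ files.

```python
# Cross-check of the run #1 metrics quoted in the Methods.
# Assumption: "345 M PE reads" means 345 million read pairs (2 x 150 bp each).

READ_LENGTH_BP = 150
RUN1_PAIRS_M = 345                               # million pairs, run #1
RUN1_REPORTED_GBP = 106                          # instrument-reported yield
BARCODE_FRACTION = {"BS78": 0.46, "BS79": 0.49}  # from barcode analysis
REPORTED_PAIRS_M = {"BS78": 161, "BS79": 169}    # pairs in the FASTQ files


def nominal_yield_gbp(pairs_m, read_len=READ_LENGTH_BP):
    """Nominal yield in Gbp for `pairs_m` million pairs of 2 x `read_len` bp."""
    return pairs_m * 1e6 * 2 * read_len / 1e9


def expected_pairs_m(total_pairs_m, fraction):
    """Million pairs expected for a library given its barcode fraction."""
    return total_pairs_m * fraction


if __name__ == "__main__":
    nominal = nominal_yield_gbp(RUN1_PAIRS_M)
    print(f"run #1 nominal yield: {nominal:.1f} Gbp "
          f"(reported {RUN1_REPORTED_GBP} Gbp)")
    for library, fraction in BARCODE_FRACTION.items():
        expected = expected_pairs_m(RUN1_PAIRS_M, fraction)
        print(f"{library}: expected {expected:.1f} M pairs, "
              f"reported {REPORTED_PAIRS_M[library]} M pairs")
```

The remaining share of run #1 reads (about 5%) is not assigned to either barcode in the figures above; the sketch makes no assumption about its origin.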

Dataset validation

To confirm the sugarcane origin of the DNA, the run #1 reads were mapped to available BACs, namely the 608 Kbp of R570 BACs (GenBank accessions KF184657.1 to KF184973.1 (de Setta et al.)). Reads were mapped with bowtie2 (Langmead & Salzberg, 2012) version 2.2.5 with options “-p 4 --no-unal --no-mixed --no-discordant --end-to-end --fast”. For both sequencing libraries, the concordant-pair mapping rates were 4.1% unique, 27% repeat, and 69% unmapped. Genome coverage analysis was inconclusive: the K-mer frequency distribution computed by Jellyfish (Marçais & Kingsford, 2011) version 2.2.4 with K=17 showed no peak above 1X coverage.
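
To illustrate what a K-mer frequency distribution is, the following pure-Python sketch counts K-mers in a set of reads and tabulates how many distinct K-mers occur once, twice, and so on. It is not Jellyfish and would be far too slow for a dataset of this size. The function names are hypothetical, and merging each K-mer with its reverse complement (canonical counting) is an assumption; the source does not state how Jellyfish was configured. In such a histogram, a distinct peak at multiplicity c would suggest roughly c-fold coverage of single-copy sequence; the analysis above reports no such peak.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k=17):
    """Count canonical k-mers over an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= _VALID_BASES:
                counts[canonical(kmer)] += 1
    return counts


def frequency_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    # Toy reads, for illustration only; real input would be FASTQ sequences.
    toy_reads = ["ACGTACGTACGTACGTACGTA", "TTGCAACGTACGTACGTACGTACG"]
    histogram = frequency_histogram(count_kmers(toy_reads, k=17))
    for multiplicity, n_distinct in histogram.items():
        print(multiplicity, n_distinct)
```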

Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2017 Miller JR et al. Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

The data are available at NCBI SRA under BioProject PRJNA345486, Study SRP091668. Amplified reads from BS78 and BS79 have respective accessions SRR5500242 and SRR5500243. Genomic reads from BS78 have accessions SRR5500246 and SRR5500247. Genomic reads from BS79 have accession SRR5500249.

Referee reports

The manuscript describes the generation of whole genome shotgun sequence data from two separate DNA preparation methods. The methods for data generation are clearly described and the sample that was used has ample information about its origins publicly available and referenced. This dataset will be useful for SNP discovery and comparative genomics of sugarcane cultivars. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The data reported in this note were produced by 150 bp paired-end Illumina sequencing of genomic DNA prepared from the sugarcane variety CP 96-1252. A raw data set of 101 Gbp was generated and made publicly available. The authors did not present assembly data, which would be useful for the research community interested in sugarcane genomics. Sequence coverage was not estimated but it seems to be under 1X. Sugarcane commercial varieties are hybrids between Saccharum officinarum and Saccharum spontaneum. These two parents are highly complex polyploids with ploidy varying from 8 to 12. In general, the hybrids conserve ~75% of the S. officinarum and 15% of the S. spontaneum genome intact. Around 10% of the hybrid genome consists of chromosomal recombinants between the two species. This complex situation makes it very difficult to assemble large non-chimeric contigs, especially using short-insert shotgun sequencing. The high-quality data set presented in this data note is of value for those interested in recovering short gene regions of interest. Because sugarcane genome sequencing datasets are very scarce, I recommend publication of the note presented here as a source of genome data for the sugarcane community. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References

1. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.

Authors: Guillaume Marçais; Carl Kingsford
Journal: Bioinformatics Date: 2011-01-07 Impact factor: 6.937

2. Fast gapped-read alignment with Bowtie 2.

Authors: Ben Langmead; Steven L Salzberg
Journal: Nat Methods Date: 2012-03-04 Impact factor: 28.547

3. Complete nucleotide sequence of the sugarcane (Saccharum officinarum) chloroplast genome: a comparative analysis of four monocot chloroplast genomes.

Authors: Takayuki Asano; Takahiko Tsudzuki; Sakiko Takahashi; Hiroaki Shimada; Koh-ichi Kadowaki
Journal: DNA Res Date: 2004-04-30 Impact factor: 4.458

4. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies.

Authors: Patrick D Schloss; Dirk Gevers; Sarah L Westcott
Journal: PLoS One Date: 2011-12-14 Impact factor: 3.240

5. Building the sugarcane genome for biotechnology and identifying evolutionary trends.

Authors: Nathalia de Setta; Cláudia Barros Monteiro-Vitorello; Cushla Jane Metcalfe; Guilherme Marcelo Queiroga Cruz; Luiz Eduardo Del Bem; Renato Vicentini; Fábio Tebaldi Silveira Nogueira; Roberta Alvares Campos; Sideny Lima Nunes; Paula Cristina Gasperazzo Turrini; Andreia Prata Vieira; Edgar Andrés Ochoa Cruz; Tatiana Caroline Silveira Corrêa; Carlos Takeshi Hotta; Alessandro de Mello Varani; Sonia Vautrin; Adilson Silva da Trindade; Mariane de Mendonça Vilela; Carolina Gimiliani Lembke; Paloma Mieko Sato; Rodrigo Fandino de Andrade; Milton Yutaka Nishiyama; Claudio Benicio Cardoso-Silva; Katia Castanho Scortecci; Antônio Augusto Franco Garcia; Monalisa Sampaio Carneiro; Changsoo Kim; Andrew H Paterson; Hélène Bergès; Angélique D'Hont; Anete Pereira de Souza; Glaucia Mendes Souza; Michel Vincentz; João Paulo Kitajima; Marie-Anne Van Sluys
Journal: BMC Genomics Date: 2014-06-30 Impact factor: 3.969

6. The two chromosomes of the mitochondrial genome of a sugarcane cultivar: assembly and recombination analysis using long PacBio reads.

Authors: Jeremy R Shearman; Chutima Sonthirod; Chaiwat Naktang; Wirulda Pootakham; Thippawan Yoocha; Duangjai Sangsrakru; Nukoon Jomchai; Somvong Tragoonrung; Sithichoke Tangphatsornruang
Journal: Sci Rep Date: 2016-08-17 Impact factor: 4.379

7. Analysis of Three Sugarcane Homo/Homeologous Regions Suggests Independent Polyploidization Events of Saccharum officinarum and Saccharum spontaneum.

Authors: Mariane de Mendonça Vilela; Luiz Eduardo Del Bem; Marie-Anne Van Sluys; Nathalia de Setta; João Paulo Kitajima; Guilherme Marcelo Queiroga Cruz; Danilo Augusto Sforça; Anete Pereira de Souza; Paulo Cavalcanti Gomes Ferreira; Clícia Grativol; Claudio Benicio Cardoso-Silva; Renato Vicentini; Michel Vincentz
Journal: Genome Biol Evol Date: 2017-02-01 Impact factor: 3.416

8. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.

Authors: Christian Quast; Elmar Pruesse; Pelin Yilmaz; Jan Gerken; Timmy Schweer; Pablo Yarza; Jörg Peplies; Frank Oliver Glöckner
Journal: Nucleic Acids Res Date: 2012-11-28 Impact factor: 16.971

9. De novo assembly and transcriptome analysis of contrasting sugarcane varieties.

Authors: Claudio Benicio Cardoso-Silva; Estela Araujo Costa; Melina Cristina Mancini; Thiago Willian Almeida Balsalobre; Lucas Eduardo Costa Canesin; Luciana Rossini Pinto; Monalisa Sampaio Carneiro; Antonio Augusto Franco Garcia; Anete Pereira de Souza; Renato Vicentini
Journal: PLoS One Date: 2014-02-11 Impact factor: 3.240

10. BAC-Pool Sequencing and Assembly of 19 Mb of the Complex Sugarcane Genome.

Authors: Vagner Katsumi Okura; Rafael S C de Souza; Susely F de Siqueira Tada; Paulo Arruda
Journal: Front Plant Sci Date: 2016-03-23 Impact factor: 5.753