Shallow Whole Genome Sequencing for the Assembly of Complete Chloroplast Genome Sequence of Arachis hypogaea L.

Sudheesh K Prabhudas¹, Sowjanya Prayaga¹, Parani Madasamy¹, Purushothaman Natarajan¹.

Abstract

Keywords: Arachis hypogaea; complete chloroplast genome; de novo assembly; groundnut; illumina; peanut

Year: 2016 PMID: 27512398 PMCID: PMC4961713 DOI: 10.3389/fpls.2016.01106

Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X

Introduction

The chloroplast (CP) is a plant organelle that originated from cyanobacteria through endosymbiosis and has become an essential component of the plant cell. It is the site of photosynthesis and of several steps in the biosynthesis of fatty acids, vitamins, pigments, and amino acids. The CP genome is highly conserved in land plants (Raubeson and Jansen, 2005). It is circular and has a quadripartite structure consisting of a large single-copy (LSC) region and a small single-copy (SSC) region separated by a pair of inverted repeats (IRs), with a few exceptions in which one IR or the SSC has been lost. The CP genome ranges from 19 to 217 kb in land plants, and the IRs are usually 20–26 kb long (http://www.ncbi.nlm.nih.gov/genome/organelle/). Its lack of recombination makes the CP genome an ideal target for phylogenetic studies (Ravi et al., 2008; Wu and Ge, 2012).

Arachis hypogaea L., also known as groundnut or peanut, is a herbaceous plant of the family Fabaceae. It has an allotetraploid genome (AABB; 2n = 4x = 40) of about 2.8 Gb. The ancestry of its A and B subgenomes was long debated; A. hypogaea has since been shown to have originated through a hybridization event between Arachis duranensis (A subgenome) and Arachis ipaensis (B subgenome) (Kochert et al., 1996; Bertioli et al., 2016). It is one of the world's major edible oilseed crops, and India is the second-largest producer, accounting for about 15% of world production (FAOSTAT, 2015). The kernels contain 43–50% oil and 23–26% protein. The oil consists mainly of palmitic (16:0), stearic (18:0), oleic (18:1), linoleic (18:2), arachidic (20:0), eicosenoic (20:1), behenic (22:0), and lignoceric (24:0) acids, with trace amounts of palmitoleic acid (16:1). The monounsaturated oleic acid and the polyunsaturated linoleic acid together make up about 75% of the total oil (Shiv, 1982).

Yield, drought resistance, disease resistance, and other traits of A. hypogaea have been successfully improved through classical breeding as well as through genetic engineering by nuclear transformation. Transgenic plants can also be produced by chloroplast transformation, in which foreign genes are integrated into candidate loci of the CP genome by homologous recombination. Compared with nuclear transformation, chloroplast genetic engineering is more environmentally friendly: it minimizes pleiotropic effects and provides containment of the foreign genes (Daniell et al., 2005). Hence, the complete chloroplast genome of A. hypogaea will be an invaluable resource for designing and evaluating efficient chloroplast transformation experiments.
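The quadripartite organization described above can be made concrete with a short sketch. It assumes the genome is stored as a linear string in the order LSC, IRb, SSC, IRa (a common convention for plastome records, assumed here rather than taken from this paper), cuts it into the four regions from given lengths, and checks the defining property of the inverted repeats: IRa is the reverse complement of IRb. The function names and the toy sequence are hypothetical.

```python
# Minimal sketch of the quadripartite plastome layout (LSC-IRb-SSC-IRa).
# Assumption: the genome is a linear string that starts at the LSC.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_quadripartite(genome: str, lsc_len: int, ir_len: int, ssc_len: int) -> dict:
    """Cut a linearised plastome into LSC, IRb, SSC and IRa.

    Raises ValueError if the region lengths do not add up to the genome length.
    """
    expected = lsc_len + 2 * ir_len + ssc_len
    if expected != len(genome):
        raise ValueError(f"regions sum to {expected} bp, genome is {len(genome)} bp")
    irb_end = lsc_len + ir_len
    ssc_end = irb_end + ssc_len
    return {
        "LSC": genome[:lsc_len],
        "IRb": genome[lsc_len:irb_end],
        "SSC": genome[irb_end:ssc_end],
        "IRa": genome[ssc_end:],
    }


def irs_are_inverted(regions: dict) -> bool:
    """True if IRa is exactly the reverse complement of IRb."""
    return regions["IRa"] == reverse_complement(regions["IRb"])


if __name__ == "__main__":
    # Hypothetical toy plastome: 8 bp LSC, 5 bp IRs, 4 bp SSC.
    ir = "ACCGT"
    toy = "TTTTAAAA" + ir + "GGGC" + reverse_complement(ir)
    parts = split_quadripartite(toy, lsc_len=8, ir_len=5, ssc_len=4)
    print(parts)
    print(irs_are_inverted(parts))  # True
```

The length check is deliberately strict: a set of region boundaries that does not account for every base usually signals a mis-placed junction, which is why such junctions are worth validating independently.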

Materials and methods

Plant material and genome sequencing

Seeds of A. hypogaea L. variety Co7 were obtained from Tamil Nadu Agricultural University, Coimbatore, India, and the plants were grown in the greenhouse facility at SRM University, Kattankulathur, India. Total genomic DNA was isolated from the leaves of 1-month-old plants using the DNeasy Plant Mini Kit (Qiagen, Germany). A paired-end library with an average insert size of about 400 bp was constructed according to the manufacturer's protocol (Illumina Inc., USA), and its quality was assessed on a Caliper LabChip GX using the High Sensitivity Assay Kit (Caliper, USA). The library was then hybridized to a flow cell, and clonal clusters were generated on a cBot using the TruSeq PE Cluster Kit v3-cBot-HS (Illumina Inc., USA). Sequencing by synthesis was performed on an Illumina HiSeq 2500 with the TruSeq v3-HS kit (Illumina Inc., USA) to generate 100 bp paired-end reads.

Genome assembly and validation

The per-base quality of the 51,650,486 raw 100 bp paired-end reads was assessed with FastQC v0.11.2 (Andrews, 2010). Adapters were trimmed with Cutadapt v1.7.1 (Martin, 2011), and reads were quality-filtered with Sickle v1.33 (Joshi and Fass, 2011) using a Phred quality threshold of 20. The 49,299,308 quality-filtered paired-end reads were assembled de novo with three assemblers: Velvet v1.2.10 (Zerbino and Birney, 2008), SOAPdenovo v2.04 (Luo et al., 2012), and Edena v3.131028 (Hernandez et al., 2008). The resulting contigs were pooled and ordered with Mauve v2.3.1 against the complete CP genome of the closest relative, Acacia ligulata, as the reference (Darling et al., 2010; Williams et al., 2015). Gaps were filled by manual alignment of paired-end reads using an overlapping method (Natarajan and Parani, 2015) and by primer walking with Sanger sequencing. The junctions between the single-copy regions and the inverted repeats were validated by Sanger sequencing with specific primers. To calculate genome coverage, the filtered reads were mapped to the assembled CP genome of A. hypogaea. The complete CP genome was annotated with DOGMA (Wyman et al., 2004).
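Velvet and SOAPdenovo2 are both de Bruijn graph assemblers (Edena instead builds an overlap graph from exact read overlaps). The sketch below illustrates only the basic de Bruijn idea: every k-mer in the reads becomes an edge between its (k−1)-mer prefix and suffix, and maximal non-branching paths are spelled out as contigs (unitigs). It is not how any of the three tools is implemented: it ignores reverse complements, sequencing errors, coverage cut-offs, and paired-end scaffolding, and the function names and the choice of k are hypothetical.

```python
from collections import defaultdict


def kmers(read: str, k: int):
    """Yield every k-mer of a read; reads shorter than k yield nothing."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_graph(reads, k: int) -> dict:
    """De Bruijn graph: nodes are (k-1)-mers, each distinct k-mer is one edge."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for km in kmers(read, k):
            if km not in seen:  # repeated k-mers (i.e. coverage) collapse to one edge
                seen.add(km)
                graph[km[:-1]].append(km[1:])
    return dict(graph)


def unitigs(graph: dict) -> list:
    """Spell all maximal non-branching paths, sorted for deterministic output."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def simple(n) -> bool:  # exactly one edge in and one edge out
        return indeg[n] == 1 and len(graph.get(n, [])) == 1

    paths, used = [], set()
    # 1) Paths starting at a source, a sink, or a branching node.
    for start in nodes:
        if simple(start):
            continue
        for nxt in graph.get(start, []):
            path = [start, nxt]
            used.add((start, nxt))
            while simple(path[-1]):
                succ = graph[path[-1]][0]
                used.add((path[-1], succ))
                path.append(succ)
            paths.append(path)
    # 2) Isolated cycles, e.g. a circular sequence whose (k-1)-mers are all unique.
    #    The spelled cycle repeats its first k-1 bases at the end.
    for start in nodes:
        if not simple(start) or (start, graph[start][0]) in used:
            continue
        path, node = [start], start
        while True:
            succ = graph[node][0]
            used.add((node, succ))
            path.append(succ)
            node = succ
            if node == start:
                break
        paths.append(path)
    # Spell each path: first node plus the last base of every following node.
    return sorted(p[0] + "".join(n[-1] for n in p[1:]) for p in paths)


if __name__ == "__main__":
    seq = "GATTACACCGTTAGCAT"
    reads = [seq[i:i + 8] for i in (0, 3, 6, 9)]  # overlapping toy reads
    print(unitigs(build_graph(reads, k=5)))
```

Real assemblers also merge each k-mer with its reverse complement; with that, the two copies of a plastome's inverted repeat collapse into a single path, which is one reason short-read plastome contigs are typically ordered against a reference and the IR junctions checked separately.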

Results and discussion

The complete CP genome of A. hypogaea is 156,391 bp long. Mapping 3,863,475 quality-filtered reads to the assembled CP genome gave a coverage of 2122×. The genome has a quadripartite structure consisting of an LSC region of 85,946 bp and an SSC region of 18,797 bp, separated by a pair of inverted repeats (IRa and IRb) of 25,824 bp each. The overall GC content is 36.4%, and the GC contents of the LSC, SSC, and IR regions are 33.8%, 30.2%, and 42.8%, respectively. A total of 110 unique genes were annotated: 76 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Six protein-coding genes and the 3' exon of rps12 are duplicated in the IR regions, as are six tRNA genes and all four rRNA genes. Thirteen genes contain one or two introns: eight protein-coding genes and five tRNA genes (Table 1). The complete CP genome sequence of A. hypogaea, reported here for the first time, will be an invaluable resource for designing and evaluating efficient chloroplast transformation experiments aimed at improving desired traits.
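The reported assembly statistics can be cross-checked with a few lines of arithmetic. The sketch assumes coverage was computed in the usual way, as total mapped bases divided by genome length (C = N × L / G, with N mapped reads of mean length L on a genome of length G). The mean length of the mapped reads is not reported, so the value derived below is an inference, as are the plastid read fraction and the length-weighted GC content.

```python
# Arithmetic cross-checks of the assembly statistics in the Results.
# Inputs are the reported values; anything derived is an inference.

LSC, SSC, IR = 85_946, 18_797, 25_824   # region lengths (bp)
GENOME = 156_391                         # assembled genome length (bp)
MAPPED_READS = 3_863_475                 # filtered reads mapped to the plastome
FILTERED_READS = 49_299_308              # all quality-filtered reads
COVERAGE = 2122                          # reported fold coverage
REGION_GC = {"LSC": 33.8, "SSC": 30.2, "IR": 42.8}  # % GC per region
GENOME_GC = 36.4                         # reported overall % GC


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Genome length = LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


def implied_mean_mapped_length(coverage: float, genome: int, reads: int) -> float:
    """Solve C = N * L / G for L, the mean length of the mapped reads."""
    return coverage * genome / reads


def weighted_gc(region_gc: dict, lengths: dict) -> float:
    """Length-weighted genome GC% from per-region GC%."""
    total = sum(lengths.values())
    return sum(region_gc[name] * lengths[name] for name in lengths) / total


if __name__ == "__main__":
    print(quadripartite_length(LSC, SSC, IR) == GENOME)  # True
    print(round(implied_mean_mapped_length(COVERAGE, GENOME, MAPPED_READS), 1))  # 85.9
    print(round(100 * MAPPED_READS / FILTERED_READS, 1))  # 7.8 (% of reads from the plastome)
    lengths = {"LSC": LSC, "SSC": SSC, "IR": 2 * IR}
    print(round(weighted_gc(REGION_GC, lengths), 2))  # 36.34
```

The region lengths add up exactly to 156,391 bp, and the length-weighted GC content (about 36.34%) agrees with the reported 36.4% within the rounding of the per-region values. An implied mean of about 86 bp per mapped read is plausible for 100 bp reads after adapter and quality trimming, and the roughly 7.8% of filtered reads that map to the plastome illustrates why shallow whole-genome sequencing can give such deep plastome coverage.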

Table 1

List of genes found in the complete chloroplast genome of A. hypogaea.

S.No	Group of genes	Gene names
1	ATP synthase	atpA, atpB, atpE, atpF^*, atpH, atpI
2	Cytochrome b/f complex	petA, petB, petD, petG, petL, petN
3	NADH dehydrogenase	ndhA^*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ
4	Photosystem I	psaA, psaB, psaC, psaI, psaJ
5	Photosystem II	psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
6	Proteins of unknown function	ycf1, ycf2, ycf3^*, ycf4, orf42, ycf68^
7	Ribosomal proteins (SSU)	rps2, rps3, rps4, rps7, rps8, rps11, rps12^#, rps14, rps15, rps18, rps19
8	Ribosomal proteins (LSU)	rpl2^*, rpl14, rpl16, rpl20, rpl23, rpl32, rpl33, rpl36
9	Ribosomal RNAs	rrn4.5, rrn5, rrn16, rrn23
10	RNA polymerase	rpoA, rpoB, rpoC1^*, rpoC2
11	Other genes	accD, ccsA, cemA, clpP^**, matK, rbcL
12	Transfer RNAs	trnA-UGC^, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-UCC, trnH-GUG, trnI-CAU, trnI-GAU^, trnK-UUU^, trnL-CAA, trnL-UAA^, trnL-UAG, trnM-CAU, trnN-GUU, trnP-GGG, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC^*, trnW-CCA, trnY-GUA

Symbols: * contains one intron; ** contains two introns; # exhibits trans-splicing.
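Table 1 can be cross-checked against the gene counts given in the Results. The sketch below stores the table rows as strings copied from the table, assumes that "^" introduces a superscript footnote symbol (intron or trans-splicing marker), and counts genes per class and genes carrying such a marker. The data layout and function names are hypothetical.

```python
# Tally Table 1 by gene class and by footnote marker.
# Assumption: "^" introduces a superscript footnote symbol.

TABLE_1 = {
    "ATP synthase": "atpA, atpB, atpE, atpF^*, atpH, atpI",
    "Cytochrome b/f complex": "petA, petB, petD, petG, petL, petN",
    "NADH dehydrogenase": "ndhA^*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ",
    "Photosystem I": "psaA, psaB, psaC, psaI, psaJ",
    "Photosystem II": "psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, "
                      "psbK, psbL, psbM, psbN, psbT, psbZ",
    "Proteins of unknown function": "ycf1, ycf2, ycf3^*, ycf4, orf42, ycf68^",
    "Ribosomal proteins (SSU)": "rps2, rps3, rps4, rps7, rps8, rps11, rps12^#, "
                                "rps14, rps15, rps18, rps19",
    "Ribosomal proteins (LSU)": "rpl2^*, rpl14, rpl16, rpl20, rpl23, rpl32, rpl33, rpl36",
    "Ribosomal RNAs": "rrn4.5, rrn5, rrn16, rrn23",
    "RNA polymerase": "rpoA, rpoB, rpoC1^*, rpoC2",
    "Other genes": "accD, ccsA, cemA, clpP^**, matK, rbcL",
    "Transfer RNAs": (
        "trnA-UGC^, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, "
        "trnG-UCC, trnH-GUG, trnI-CAU, trnI-GAU^, trnK-UUU^, trnL-CAA, "
        "trnL-UAA^, trnL-UAG, trnM-CAU, trnN-GUU, trnP-GGG, trnP-UGG, "
        "trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, "
        "trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC^*, trnW-CCA, trnY-GUA"
    ),
}


def gene_class(group: str) -> str:
    """Map a Table 1 group to 'protein', 'tRNA', or 'rRNA'."""
    if group == "Transfer RNAs":
        return "tRNA"
    if group == "Ribosomal RNAs":
        return "rRNA"
    return "protein"


def tally(table: dict) -> tuple:
    """Count genes per class, and genes carrying a footnote marker."""
    counts = {"protein": 0, "tRNA": 0, "rRNA": 0}
    marked = {"protein": 0, "tRNA": 0, "rRNA": 0}
    for group, row in table.items():
        cls = gene_class(group)
        for gene in (g.strip() for g in row.split(",")):
            counts[cls] += 1
            if "^" in gene:
                marked[cls] += 1
    return counts, marked


if __name__ == "__main__":
    counts, marked = tally(TABLE_1)
    print(counts)  # {'protein': 76, 'tRNA': 30, 'rRNA': 4}
    print(marked)  # {'protein': 8, 'tRNA': 5, 'rRNA': 0}
```

The tally reproduces the 76 protein-coding, 30 tRNA, and 4 rRNA genes and the eight protein-coding plus five tRNA genes with footnote markers stated in the Results.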

Deposited data and information to the user

All data from the current study have been deposited at NCBI under BioProject PRJNA314013 and BioSample SAMN04527043. The assembled complete chloroplast genome sequence was deposited in NCBI GenBank under accession number KX257487 (http://www.ncbi.nlm.nih.gov/nuccore/KX257487). The raw reads, in compressed FASTQ format, were deposited in the NCBI Sequence Read Archive (SRA) under accession number SRP076091 (http://www.ncbi.nlm.nih.gov/sra/SRP076091). Users may download and reuse the data for research purposes only, with acknowledgement of the authors and citation of this paper as the source of the data.
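As one possible way to retrieve the assembly for reuse, the sketch below requests a nucleotide record as FASTA through NCBI's E-utilities efetch endpoint (db=nuccore, rettype=fasta, retmode=text) using only the Python standard library, then parses it and computes its length and GC content. NCBI usage policies (rate limits, optional API key) apply and are not handled here, and the raw SRA reads are usually downloaded with NCBI's SRA Toolkit instead, which this sketch does not cover. The function names are hypothetical.

```python
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_fasta(accession: str, timeout: float = 60.0) -> str:
    """Download one nucleotide record as FASTA text via NCBI E-utilities efetch."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH}?{query}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_single_fasta(text: str) -> tuple:
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input is not a FASTA record")
    return lines[0][1:], "".join(lines[1:]).upper()


def gc_percent(seq: str) -> float:
    """GC content in percent of unambiguous bases (N and other IUPAC codes ignored)."""
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0
```

With network access, `header, seq = parse_single_fasta(fetch_fasta("KX257487"))` followed by `len(seq)` and `gc_percent(seq)` gives a quick comparison with the genome length and GC content reported in the Results. Parsing and GC calculation are kept separate from the download so they can be reused on a locally saved FASTA file.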

Author contributions

PN conceived the study and acquired the funding; SKP and SP performed the genome assembly and analysis; SKP, PN, and PM drafted the manuscript. All authors approved the final manuscript.

Funding

The project was funded by the Department of Biotechnology (DBT), Government of India, under the Rapid Grant for Young Investigator (RGYI) scheme (BT/PR6394/GBD/27/422/2012).

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Automatic annotation of organellar genomes with DOGMA.

Authors: Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal: Bioinformatics Date: 2004-06-04

2. The phylogeny of the BEP clade in grasses revisited: evidence from the whole-genome sequences of chloroplasts.

Authors: Zhi-Qiang Wu; Song Ge
Journal: Mol Phylogenet Evol Date: 2011-11-10

3. Breakthrough in chloroplast genetic engineering of agronomically important crops.

Authors: Henry Daniell; Shashi Kumar; Nathalie Dufourmantel
Journal: Trends Biotechnol Date: 2005-05

4. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors: Daniel R Zerbino; Ewan Birney
Journal: Genome Res Date: 2008-03-18

5. The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut.

Authors: David John Bertioli; Steven B Cannon; Lutz Froenicke; Guodong Huang; Andrew D Farmer; Ethalinda K S Cannon; Xin Liu; Dongying Gao; Josh Clevenger; Sudhansu Dash; Longhui Ren; Márcio C Moretzsohn; Kenta Shirasawa; Wei Huang; Bruna Vidigal; Brian Abernathy; Ye Chu; Chad E Niederhuth; Pooja Umale; Ana Cláudia G Araújo; Alexander Kozik; Kyung Do Kim; Mark D Burow; Rajeev K Varshney; Xingjun Wang; Xinyou Zhang; Noelle Barkley; Patrícia M Guimarães; Sachiko Isobe; Baozhu Guo; Boshou Liao; H Thomas Stalker; Robert J Schmitz; Brian E Scheffler; Soraya C M Leal-Bertioli; Xu Xun; Scott A Jackson; Richard Michelmore; Peggy Ozias-Akins
Journal: Nat Genet Date: 2016-02-22

6. First complete genome sequence of a probiotic Enterococcus faecium strain T-110 and its comparative genome analysis with pathogenic and non-pathogenic Enterococcus faecium genomes.

Authors: Purushothaman Natarajan; Madasamy Parani
Journal: J Genet Genomics Date: 2014-08-01

7. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement.

Authors: Aaron E Darling; Bob Mau; Nicole T Perna
Journal: PLoS One Date: 2010-06-25

8. De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer.

Authors: David Hernandez; Patrice François; Laurent Farinelli; Magne Osterås; Jacques Schrenzel
Journal: Genome Res Date: 2008-03-10

9. The Complete Sequence of the Acacia ligulata Chloroplast Genome Reveals a Highly Divergent clpP1 Gene.

Authors: Anna V Williams; Laura M Boykin; Katharine A Howell; Paul G Nevill; Ian Small
Journal: PLoS One Date: 2015-05-08

10. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.

Authors: Ruibang Luo; Binghang Liu; Yinlong Xie; Zhenyu Li; Weihua Huang; Jianying Yuan; Guangzhu He; Yanxiang Chen; Qi Pan; Yunjie Liu; Jingbo Tang; Gengxiong Wu; Hao Zhang; Yujian Shi; Yong Liu; Chang Yu; Bo Wang; Yao Lu; Changlei Han; David W Cheung; Siu-Ming Yiu; Shaoliang Peng; Zhu Xiaoqian; Guangming Liu; Xiangke Liao; Yingrui Li; Huanming Yang; Jian Wang; Tak-Wah Lam; Jun Wang
Journal: Gigascience Date: 2012-12-27
