Development of genetic markers in Eucalyptus species by target enrichment and exome sequencing.

Modhumita Ghosh Dasgupta, Veeramuthu Dharanishanthi, Ishangi Agarwal, Konstantin V Krutovsky.

Abstract

The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high-density genotyping. The present study was undertaken to identify markers in genes putatively related to wood property traits in three Eucalyptus species. Ninety-four genes involved in xylogenesis were selected for hybridization probe-based target enrichment of nuclear genomic DNA and exome sequencing. Genomic DNA was isolated from leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high-throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus.

Year:  2015        PMID: 25602379      PMCID: PMC4300219          DOI: 10.1371/journal.pone.0116528

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The genus Eucalyptus belongs to the family Myrtaceae and consists of over 700 species [1] that occupy a broad range of environmental conditions. Most of the species are native to Australia and were introduced to India, France, Chile, Brazil, South Africa and Portugal in the first quarter of the 1800s [2]. It is one of the most widely planted hardwood crops in the world because of its superior growth, adaptability and wood properties, and occupies 20.07 M hectares globally. India ranks second in area under Eucalyptus plantation (3.943 M ha) after Brazil (4.259 M ha) [3]. In tropical and subtropical regions, E. grandis, E. urophylla and their hybrids are highly preferred for pulp production and solid wood, while E. globulus is favored in the temperate regions [4]. Six species including E. camaldulensis, E. grandis, E. globulus, E. pellita, E. tereticornis and E. urophylla are reported to be suitable for Indian agro-climatic conditions and are widely planted in the subcontinent [5-6]. Eucalyptus is a potential out-crosser and, owing to unrestricted natural hybridization, its populations are highly heterozygous. Hence, extensive studies were conducted to determine genetic diversity at species and population levels using different marker systems [7-16]. Linkage maps in different species of eucalypts have been widely reported [17-21]. QTL mapping in this genus has targeted important traits such as wood properties, vegetative propagation, response to biotic and abiotic stress, juvenile traits, stem growth, water stress tolerance and frost tolerance [22-27]. QTL studies in Eucalyptus species were recently reviewed in detail by Grattapaglia et al. [28]. Population-based association studies were reported for E. nitens and E. globulus targeting wood property traits [29-31]. Recently, the first experimental study of genomic selection was reported by Resende and co-workers [32] in two Eucalyptus populations for growth and wood property traits.
The genomic data in Eucalyptus species are well-documented and available in public databases, private collections and consortia as EST resources [33-34] and transcriptome resources [16, 35–42]. Several dedicated databases are available for Eucalyptus genome research, such as EUCANEXT, EucalyptusDB, Eucspresso [38], EUCATOUL, EUCAWOOD [33], EucaCold [34], EucGenIE [43] and Phytozome10. Subsequently, Eucalyptus genome sequencing projects were initiated independently for E. grandis at the US Department of Energy Joint Genome Institute, USA and for E. camaldulensis at the Kazusa DNA Research Institute in Japan. Recently, the complete genome sequence of E. grandis (‘BRASUZ1’) was published [44] and the assembled non-redundant chromosome-scale reference (v1.0) was released with 640 Mb (94%) genome coverage organized into 11 pseudomolecules. It was also reported that 34% of the protein-coding genes occur in tandem duplications and 84% share similarity to rosid lineages. The draft genome sequence of E. camaldulensis sequenced in Japan had a total length of 655,922,307 bp of non-redundant genomic sequences consisting of 81,246 scaffolds and 121,194 singlets. These sequences accounted for approximately 92% of the gene-containing regions. A total of 77,121 complete and partial structures of protein-encoding genes were annotated [45]. The database containing the draft sequence can be accessed at http://www.kazusa.or.jp/eucaly. In recent decades, several generic DNA markers have been employed for molecular breeding. These markers are usually effective but their development is labor-intensive and time-consuming. However, with the advent of ‘next generation’ sequencing technologies, a paradigm shift has occurred in the DNA sequencing approach, resulting in high-throughput and cost-effective sequencing methods [46-47]. Nevertheless, sequencing a large number of genomes is still not feasible because of the substantial cost and time involved and the burden of managing and storing the enormous volume of data.
Hence, considerable effort has been directed towards sequencing of genome sub-regions by ‘target enrichment’ methods. Re-sequencing of these enriched genomic regions is time- and cost-effective and the data analysis is less complex [48]. In the present study, we conducted target enrichment of exomes for 94 genes involved in xylogenesis and re-sequenced them in three Eucalyptus species, which were used in developing mapping pedigrees. The presence of SNVs and InDels across the species, both in pair-wise comparisons and in comparison to the E. grandis reference genome, was documented. This study presents an efficient and high-throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus.

Materials and Methods

Plant Material and DNA Isolation

Three genotypes, one each from Eucalyptus camaldulensis, E. tereticornis and E. grandis, were selected for target enrichment. E. camaldulensis (Ec111), belonging to the Kennedy River provenance from Queensland, Australia, is a selection from the Provenance Resource Stand, Pudukkotai, Tamil Nadu, India, while E. tereticornis (Et86) is a selection from the Seed Production Area, Pudukkotai, Tamil Nadu, India. E. grandis (Eg9) is a selection from the Lorne provenance trial at Hossammund, Ootacamund, Tamil Nadu, India. These genotypes were used as parents for development of mapping populations targeting wood property traits. The leaf tissues from the three genotypes were harvested and immediately frozen at −80°C. Genomic DNA was isolated from the leaf tissues using the GenElute Plant Genomic DNA isolation kit (Sigma Aldrich, USA) and quantified using a NanoDrop ND1000 spectrophotometer (Thermo Scientific, USA).

Selection of Genes and Probe Design for Sequence Capture Array

Genes involved in different steps of secondary xylem formation, including cell division, cell expansion, cell wall thickening, cell wall proteins, lignin biosynthesis and programmed cell death in Arabidopsis, Populus, Zinnia and Eucalyptus spp., were short-listed from the literature and 94 genes were selected for target enrichment and re-sequencing. Their respective gene orthologs were downloaded from the E. grandis genome database hosted by the Phytozome portal (http://www.phytozome.net/cgi-bin/gbrowse/ Eucalyptus). The sequences were functionally annotated and their chromosomal positions, protein domains, biological pathways and gene ontology were defined based on the recent assembly of E. grandis using Phytozome v10 [44]. Hybridization probes (“baits”) 120 bp long were designed with 1 bp tiling using SureSelect eArray software (Agilent Technologies, Santa Clara, California, USA), targeting exons and UTRs in the 94 genes. A total of 169,700 baits were designed to capture the exons and UTRs in the three species. Using this design, a customized array was synthesized at Agilent Technologies.
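The 1 bp tiling of 120 bp baits described above can be illustrated with a minimal sketch. It is not the SureSelect eArray algorithm, which is not described here; the function name `tile_baits` and the toy target sequence are assumptions made for illustration only.

```python
def tile_baits(target: str, bait_len: int = 120, step: int = 1) -> list[str]:
    """Return every bait_len-long window of target, starting every `step` bases."""
    if len(target) < bait_len:
        return []  # target too short to hold a single bait
    last_start = len(target) - bait_len
    return [target[i:i + bait_len] for i in range(0, last_start + 1, step)]


# Hypothetical 500 bp target (not a real Eucalyptus sequence).
target = "ACGT" * 125
baits = tile_baits(target)

# With 1 bp tiling, a target of length L yields L - 120 + 1 baits.
print(len(baits))
```

Under this scheme every base in the interior of a target is covered by up to 120 overlapping baits, which is why the bait count approaches the total target length.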

Library Preparation, Target Enrichment and Validation

Ten micrograms of DNA from each sample in 100 μl of nuclease-free water were sonicated to fragment the DNA to a size range of 100 to 500 bp. The size distribution was checked on the Agilent 2100 Bioanalyzer, and the DNA was cleaned using Agencourt AMPure XP SPRI beads (Beckman Coulter, Australia). The libraries for each sample were prepared using the Illumina TruSeq DNA Sample Preparation Kit (Illumina Inc., San Diego, CA, USA). The sheared DNA was subjected to a series of enzymatic reactions that repaired frayed ends, phosphorylated the fragments, added a single nucleotide overhang to code the libraries and ligated adaptors, following the manufacturer’s protocol for the Illumina TruSeq DNA sample preparation kit. Subsequently, PCR enrichment (10 cycles) was performed to amplify the library. The three barcoded libraries were pooled in equimolar amounts and approximately 20 mg of DNA was hybridized on the Agilent 244K microarray (AMADID: EA560-037734) following the manufacturer’s protocol. The hybridization was carried out at 65°C for 65 hrs as described by Hodges et al. [49]. After standard washing procedures, DNA was eluted in nuclease-free water by incubating the array at 95°C for 10 min. The captured library was PCR amplified for 18 cycles and purified using Agencourt AMPure XP SPRI beads (Beckman Coulter, Australia). The enriched library was quantified using a NanoDrop spectrophotometer and the quality was checked on the Agilent High Sensitivity Bioanalyzer Chip. RT-qPCR was conducted on the pre- and post-capture libraries using primer pairs designed for the target (EtCesA1, EtCesA2 and EtCesA5) and non-target (EteIF4 and EtH2B) genes (S1 Table) to confirm enrichment of the targeted regions. The RT-qPCR data were analyzed using the ΔΔCT method described by Livak and Schmittgen [50].
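As an illustration of how a fold enrichment can be derived with the ΔΔCT method, the sketch below applies the standard 2^(−ΔΔCT) formula. It assumes that a non-target gene serves as the reference and that amplification efficiency is close to 100%; the CT values are invented for the example and are not data from this study.

```python
def fold_enrichment(ct_target_pre: float, ct_target_post: float,
                    ct_ref_pre: float, ct_ref_post: float) -> float:
    """Fold change of a target between pre- and post-capture libraries (2^-ddCT)."""
    delta_pre = ct_target_pre - ct_ref_pre      # target normalised to reference, pre-capture
    delta_post = ct_target_post - ct_ref_post   # target normalised to reference, post-capture
    ddct = delta_post - delta_pre
    return 2.0 ** (-ddct)


# Hypothetical CT values: the target amplifies 5 cycles earlier after capture,
# while the reference (non-target) gene is unchanged.
fold = fold_enrichment(ct_target_pre=25.0, ct_target_post=20.0,
                       ct_ref_pre=22.0, ct_ref_post=22.0)
print(fold)  # 32.0
```

A value near 1 would indicate no enrichment, which is the expected outcome for a non-target gene.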

Sequencing and Analysis

The three pooled barcoded libraries were subjected to cluster generation and 2 × 100 bp paired-end sequencing was conducted using the Illumina GAII Analyzer. High-quality (HQ) reads were filtered from the raw data using SeqQC_V2.2 (a proprietary QC tool of Genotypic Technologies Ltd., Bangalore, India) with a cutoff Phred quality score (Q) of 20 (corresponding to a 1 in 100 probability that a base call is incorrect). Further, the quality-passed sequencing reads were trimmed for adapter, B block and low-quality end sequences with a 50 bp cutoff using the Raw Data Processing Script. The trimmed reads were aligned (gapped alignment) to the E. grandis reference sequence using Bowtie 2 version 2.0.0-beta5 [51] with affine read gap and reference gap penalties of 5 for gap open and 3 for gap extension. The un-gapped alignment was done using Bowtie version 0.12.7 [52]. Variations across the aligned sequences were taken from both gapped and un-gapped alignments to reduce the possibility of false variations induced by allowing gaps. Variations reported in both alignments are expected to be of higher confidence. SNV calling and InDel detection were done using SAMtools version 0.1.7a (http://samtools.sourceforge.net) with default parameters [53]. Cutoff thresholds of 3 and 10 were set for the minimum number of reads showing variation and for the minimum RMS mapping quality for SNVs, respectively. The same tool was used to generate the consensus sequence of the aligned reads, while multiple alignments were done using ClustalW version 2.0.12. Pair-wise comparison of the sequence data for the three species was conducted to identify SNVs and InDels based on their positions using R Bioconductor code. The ambiguous SNVs generated due to genetic divergence of the three species were not considered for analysis.
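The quality cutoff, the SNV thresholds and the comparison of gapped and un-gapped call sets can be sketched as follows. This is an illustrative outline and not the SeqQC, SAMtools or R code used in the study; the function names and the example variant tuples are hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability of an incorrect base call."""
    return 10 ** (-q / 10)


def passes_filter(n_variant_reads: int, rms_mapq: float,
                  min_reads: int = 3, min_mapq: float = 10) -> bool:
    """Apply the two SNV cutoffs stated in the text (3 reads, RMS mapping quality 10)."""
    return n_variant_reads >= min_reads and rms_mapq >= min_mapq


def classify_calls(gapped: set, ungapped: set) -> dict:
    """Split calls into those shared by both alignments and those unique to one."""
    return {
        "common": gapped & ungapped,        # higher-confidence calls
        "gapped_only": gapped - ungapped,
        "ungapped_only": ungapped - gapped,
    }


print(phred_to_error_prob(20))  # Q20 corresponds to a 1 in 100 error probability

# Hypothetical calls keyed by (sequence name, position, reference base, alternate base).
gapped_calls = {("geneA", 101, "A", "G"), ("geneA", 250, "C", "T")}
ungapped_calls = {("geneA", 101, "A", "G"), ("geneB", 77, "G", "A")}
groups = classify_calls(gapped_calls, ungapped_calls)
print({name: len(calls) for name, calls in groups.items()})
```

Keying each call by position and alleles makes the intersection a plain set operation, and the same idea extends to the pair-wise comparison between species.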

Results and Discussion

Selection of Candidate Genes

Ninety four xylogenesis-related genes involved in different stages of wood formation including biosynthesis of lignin, cellulose, pectin, monoterpene, xyloglucan, cell wall related genes, genes involved in carbohydrate metabolism, programmed cell death, phyto-hormone signaling, transcription factors and regulatory proteins were selected for the present study (Table 1). The position of the genes in chromosomes and their biological functions in respect to the E. grandis reference genome are presented in S2 Table. As many as 14 genes were localized on chromosome 7, while only 4 genes localized on chromosome 8. Two genes, monoterpene glucosyl transferase and IAA binding domain were not assigned to any chromosome.
Table 1

Genes selected for target enrichment.

S.No. | Gene ID | Gene Product | CDS length (bp) | Transcript length (bp) | Biological Function | Xylogenesis-related function
1 | 4CL | 4-Coumarate CoA Ligase | 1635 | 2068 | Provides activated thioester substrates for phenylpropanoid natural product biosynthesis | Lignin biosynthesis
2 | ACO1 | Aminocyclopropane-1-carboxylate oxidase | 963 | 1296 | Conversion of ACC to the gaseous hormone ethylene | Ethylene signaling
3 | ADH | Alcohol dehydrogenase | 1896 | 2173 | Catalyzes the NAD+-dependent oxidation of alcohols | Alcohol fermentation
4 | AP2L | APETALA TF | 738 | 1052 | Key regulators of several developmental processes like floral development | Regulate secondary wall biosynthesis
5 | ARF1 | Auxin response factor | 2364 | 3228 | Transcription factors that bind to TGTCTC auxin response elements in promoters of early auxin response genes | Auxin signaling. Key regulator of cambium activity and wood formation
6 | ARF2 | Auxin response factor | 2520 | 3949 | TFs that bind to auxin response elements in promoters of early auxin response genes | Auxin signaling
7 | ASP | Aspartyl protease | 1218 | 1218 | Proteolytic enzyme | Programmed cell death
8 | BFN1 | S1/P1 Nuclease induced during senescence | 912 | 1205 | Degradation of RNA and single-stranded DNA | Programmed cell death
9 | BP | KNAT Knotted like Homeobox TF | 1164 | 1820 | Regulates secondary cell wall biosynthesis | Secondary cell wall biogenesis
10 | bZIP | Basic region/leucine zipper motif TF | 858 | 858 | Regulate pathogen defence, light and stress signalling | Stem development and xylem fibre differentiation
11 | C3H | Coumarate 3-hydroxylase | 1530 | 2048 | Hydroxylation of p-coumarate to form caffeate | Lignin biosynthesis
12 | C4H | Cinnamate 4-hydroxylase | 1518 | 1928 | Catalyzes the conversion of cinnamate into 4-hydroxy-cinnamate | Lignin biosynthesis
13 | CAD1 | Cinnamyl alcohol dehydrogenase | 1158 | 1583 | Conversion of coniferaldehyde to coniferyl alcohol | Lignin biosynthesis
14 | CAld5H | Coniferaldehyde 5-hydroxylase | 1590 | 2026 | Conversion of coniferaldehyde to hydroxyconiferaldehyde | Lignin biosynthesis
15 | CCAAT | CBF TF | 453 | 453 | Cis acting element with diverse functions |
16 | CCoAOMT1 | Caffeoyl-CoA-O-methyl transferase 1 | 741 | 1266 | Methylation of the 3-hydroxyl group of caffeoyl CoA | Lignin biosynthesis
17 | CesA1 | Cellulose synthase 1 | 2937 | 2937 | Cellulose deposition in developing secondary xylem | Cellulose biosynthesis
18 | CesA2 | Cellulose synthase 2 | 2775 | 3136 | Cellulose deposition in developing secondary xylem | Cellulose biosynthesis
19 | CesA3 | Cellulose synthase 3 | 3126 | 3126 | Cellulose deposition in developing secondary xylem | Cellulose biosynthesis
20 | CesA4 | Cellulose synthase 4 | 3243 | 3951 | Cellulose deposition in primary cell wall | Primary cell wall formation
21 | CesA5 | Cellulose synthase 5 | 1710 | 2136 | Cellulose deposition in primary cell wall | Primary cell wall formation
22 | CesA6 | Cellulose synthase 6 | 2691 | 3208 | Cellulose deposition in primary cell wall | Primary cell wall formation
23 | CKX | Cytokinin Oxidase | 1386 | 1386 | Cytokinin signaling | Hormonal regulators of cambial development
24 | COMT1 | Caffeic acid O-methyl transferase | 1101 | 1966 | Catalyzes the conversion of caffeic acid to ferulic acid and of 5-hydroxyferulic acid to sinapic acid | Lignin biosynthesis
25 | CRE | Cytokinin receptor 1 | 2994 | 3392 | Cytokinin regulation | Cytokinin signal transduction
26 | DHN | Dehydrin | 513 | 918 | Hydrophilic LEA proteins that accumulate during cellular dehydration | Expressed during dehydration
27 | DIR1 | Dirigent-like protein | 498 | 498 | Lignan biosynthesis process | Template for lignin polymerization
28 | DOF1 | Plant specific DNA-binding with one finger domain proteins | 1014 | 1864 | Transcriptional Regulators in plant growth and development | Regulates Interfascicular Cambium Formation and Vascular Tissue Development
29 | DREB1 | Dehydration responsive element binding protein | 762 | 1355 | Belong to AP2 TFs and induced during abiotic stress | Stress tolerance
30 | DUF1 | Domain with unknown function | 372 | 372 | Unknown | Predicted function in fibre cell wall development
31 | ERF | Ethylene responsive Transcription factor | 681 | 824 | Ethylene signaling | Ethylene signaling
32 | EXPA | Alpha Expansin | 753 | 1295 | Cell wall proteins involved in plant cell growth and developmental processes where cell wall loosening occurs | Plasticize the cellulose-hemicellulose network of primary walls
33 | EXPB | Beta Expansin | 825 | 1122 | Cell wall proteins involved in plant cell growth and developmental processes where cell wall loosening occurs | Cell wall related
34 | F5H | Ferulate 5-hydroxylase | 1590 | 2026 | Hydroxylation of ferulate to 5-hydroxyferulate | Lignin biosynthesis
35 | FLA1 | Fasciclin like Arabinogalactan protein | 945 | 1281 | Diverse developmental roles like differentiation, cell-cell recognition, somatic embryogenesis and programmed cell death | Expressed during xylem differentiation
36 | GA20 | Gibberellin 20-oxidase | 1158 | 1716 | Key oxidase enzyme in the biosynthesis of gibberellin | GA signaling
37 | GATA1 | GATA1 transcription factor | 1002 | 1002 | Nitrogen metabolism, blue-light-regulated morphogenesis and circadian rhythm | Unknown
38 | GLU | Endo glucanase | 1506 | 1938 | Catalyzes the hydrolysis of cellulose | Cellulose biosynthesis
39 | GRAS1 | GRAS family TF | 1485 | 1972 | Play diverse roles in root and shoot development, gibberellic acid (GA) signaling and phytochrome A signal transduction | Vascular differentiation
40 | GT | Monoterpene glucosyltransferase | 1383 | 1467 | Monoterpene biosynthesis | Monoterpene biosynthesis
41 | GT 1 | Beta 1–4 xylosyltransferase/Glycosyltransferase | 1008 | 1262 | Involved in the synthesis of the hemicellulose glucuronoxylan | Secondary cell wall biogenesis
42 | HB | Homeodomain TF | 951 | 1774 | Plant development, including maintenance of the biosynthesis and signaling pathways of different hormones | Xylogenesis
43 | HB1 Class III | Homeodomain Transcription factor Class III | 2535 | 4141 | Regulates meristem function | Regulates vascular development
44 | HBI class II | Homeodomain TF | 759 | 1205 | Phototropism and auxin response | Auxin Signaling
45 | HCT | Hydroxycinnamoyl CoA shikimate | 1494 | 2316 | Insertion of the 3-hydroxyl group into monolignol precursors | Lignin biosynthesis
46 | HDKNOX1 | Homeobox Knotted 1-like 7-like TF | 930 | 1434 | Repression of progression into specific differentiation steps | Formation and maintenance of shoot apical meristem
47 | HYD | Predicted alpha/beta hydrolase fold protein | 960 | 1603 | Common to several hydrolytic enzymes with diverse functions | Regulate Xylem Cell differentiation
48 | IAA | IAA binding domain | 258 | 736 | Mediators of the auxin signal transduction pathway | IAA signaling
49 | KNOX2 | Class-I KNOTTED1-like homeobox (KNOX) TF | 1305 | 2164 | Growth of shoot meristem | Promote meristem function
50 | KOR | Korrigan/Endo glucanase | 1872 | 2910 | Catalyzes the hydrolysis of cellulose | Cellulose biosynthesis
51 | LAC | Carbohydrate binding module 48/Dextrinase | 2451 | 2804 | Hydrolysis of starch | Cellulose biosynthesis
52 | LAC2 | Laccase | 1659 | 2255 | Oxidative coupling of lignols | Lignin biosynthesis
53 | LBD | LATERAL ORGAN BOUNDARY domain TF in inflorescences | 735 | 1029 | Involved in position of axillary meristem formation | Regulated by Vascular related NAC Domain TFs
54 | LEAFY | Floricaula/Leafy protein | 1080 | 1080 | Floral meristem identity proteins | Expressed in floral and vegetative meristems
55 | LIM1 | Homeodomain TF | 567 | 1421 | Developmental regulators in basic cellular processes such as organizing of cytoskeleton | Lignin biosynthesis
56 | MAN | 1,4 beta Mannan endohydrolase | 1308 | 2052 | Depolymerization of cell wall mannan polysaccharides | Cell wall components/carbohydrate metabolic pathway
57 | MAX | MORE AXILLARY GROWTH Gene | 2094 | 2659 | Regulate auxin transport | Strigolactone related auxin-dependent stimulation of secondary growth
58 | MIBP1 | Metal (copper) ion binding protein | 1425 | 1823 | Unknown | Predicted function in xylogenesis
59 | MTS | Monoterpene synthase | 1749 | 2328 | Monoterpene biosynthesis | Monoterpene biosynthesis
60 | MUR3 | Xyloglucan galactosyltransferase, Exostosin family | 1854 | 2347 | Xyloglucan biosynthesis | Xyloglucan biosynthesis
61 | MYB1 | Transcription Factor | 768 | 1629 | Second-level master regulators in secondary cell wall biosynthesis | Lignification
62 | MYB2 | Transcription Factor | 666 | 807 | Second-level master regulators in secondary cell wall biosynthesis | Lignification
63 | NAM1 | No apical meristem protein | 1971 | 2982 | NAC TF involved in development of shoot apical meristem | Vascular differentiation and Signaling
64 | PAAPA | Hydroxyproline-rich glycoprotein (HRGP) and ‘PAAPA’ motif | 519 | 1189 | Probable role in cell wall development | No function assigned
65 | PAE | Pectin acetyl esterase | 1272 | 2136 | Deacetylation of pectin, a major compound of primary cell walls | Pectin biosynthesis
66 | PAL | Phenyl alanine ammonia lyase | 2148 | 3044 | Participates in phenylpropanoid biosynthesis | Lignin biosynthesis
67 | PG | Polygalacturonase | 1530 | 2288 | Degrades polygalacturonan | Cell wall degradation
68 | PIP1 | Aquaporin | 864 | 1268 | Membrane intrinsic protein for water channelling | Transport of water and/or small neutral solutes
69 | PL | Pectate lyase | 1023 | 1023 | Cleavage of pectate | Pectin biosynthesis
70 | POX1 | Peroxidase | 951 | 1343 | Hemoprotein catalyzing the oxidation by hydrogen peroxide | Lignin biosynthesis
71 | PTM5 | MADS box TF | 279 | 279 | Flower development | Vascular development
72 | RAB | Ras-related protein | 624 | 1122 | Protein transport. Regulator of membrane traffic from the Golgi apparatus towards the endoplasmic reticulum | Activation of autophagy during wood formation
73 | RNS | Ribonuclease T2 family | 687 | 856 | Hydrolyse RNA | Programmed cell death
74 | ROP1 | RAC-like small GTPase | 594 | 1290 | Regulate cellular processes ranging from vesicle trafficking to hormone signaling | Signaling protein during secondary xylem formation
75 | SAMS | S-adenosylmethionine synthase | 1182 | 1769 | Catalyzes the formation of S-adenosylmethionine | Methylation of lignin precursors
76 | SBP1 | Squamosa promoter binding protein TF | 1656 | 2407 | Involved in the vegetative to reproductive phase transition; expression is regulated by MIR156b | Meristem activation
77 | SCD | Short chain dehydrogenase | 975 | 1449 | Participates in secondary metabolism, stress responses and phytosteroid biosynthesis | Hormone biosynthesis
78 | SND1 | Wood-associated NAC domain transcription factor 1A (WND1A) | 1200 | 1684 | Plant developmental process | Key Regulator of secondary wall synthesis in fibres
79 | STM | Shoot meristemless TF | 1128 | 2348 | Meristem formation and maintenance | Regulator of vascular cambium
80 | SuSy1 | Sucrose synthase | 2418 | 2869 | Starch and sucrose metabolism | Cellulose biosynthesis
81 | TUA1 | Alpha tubulin | 1356 | 2011 | Globular cytoskeleton proteins | Component of microtubules
82 | UBI LIG | Ubiquitin Ligase | 969 | 1595 | Protein ubiquitinization. Targets specific protein substrates for degradation by the proteasome | Programmed cell death
83 | UGDH | UDP-glucose dehydrogenase | 1443 | 2141 | Oxidizes UDP-Glc (UDP-D-glucose) to UDP-GlcA (UDP-D-glucuronate) | Carbohydrate metabolism
84 | UGT | UDP glucose glucosyltransferase | 1410 | 1687 | Catalyze the conjugation of glucose from sugar nucleotides to various substrates | Carbohydrate biosynthesis
85 | UXS1 | UDP-D-glucuronate carboxy-lyase | 615 | 689 | Catalyzes the conversion of UDP-d-glucuronate to UDP-d-xylose | Carbohydrate metabolism
86 | VND6 | Vascular-related NAC-domain TF | 1047 | 3267 | Master regulator of xylem vessel differentiation | Master regulator of xylem vessel differentiation
87 | VND7 | Vascular-related NAC-domain TF | 963 | 1531 | Master regulator of xylem vessel differentiation | Master regulator of xylem vessel differentiation
88 | WND1 | Wood associated NAC TF | 1152 | 1527 | Activating the entire secondary wall biosynthetic program | Regulation of secondary wall biosynthesis pathway
89 | WRKY1 | TFs involved in biotic and abiotic stress responses | 2256 | 2870 | Plant responses to biotic and abiotic stresses | Cell wall lignification
90 | WUS1 | Homeodomain TF | 685 | 726 | A typical Homeodomain TF involved in lateral organ formation and meristem function | Shoot apical meristem formation and maintenance
91 | XCP | Xylem specific cysteine protease | 1131 | 1564 | Cellular autolysis | Programmed cell death
92 | XTH | Xyloglucan endo-transglycosylase/hydrolase | 894 | 1223 | Cell wall extensibility | Regulates cell growth by strengthening or weakening xyloglucan-cellulose microfibril network
93 | XYL | Endo 1–4 beta Xylanase | 2796 | 3444 | Degrades the linear polysaccharide beta-1,4-xylan into xylose | Carbohydrate active enzymes in secondary cell wall biogenesis. Decreases cellulose crystallinity in cell walls
94 | Znf1 | Zinc finger C3HC4 type (RING) | 1113 | 1542 | Cysteine rich domain involved in mediating protein-protein interactions | Ubiquitinization
The formation of the secondary cell wall is driven by the coordinated expression of numerous genes involved in the biosynthesis of cellulose and hemicellulose, lignin, pectin, cell wall proteins and minor soluble and insoluble compounds [54-59], [33, 38–39]. Expressed wood-formation genes show high functional conservation across plant genera and up to 90% of genes expressed in loblolly pine have homologs in Arabidopsis [60]. Similarly, a high proportion of poplar ESTs appear to have homologs in the Arabidopsis genome [61-62]. The role of transcription factors as master switches in vascular and xylem development has been investigated in detail in poplar, eucalypts, pine and Arabidopsis. Highly expressed transcription factors like MYB and NAC families are implicated as critical regulators of vascular differentiation, phenylpropanoid metabolism, xylem differentiation and secondary wall formation. The other important regulators include the homeodomain superfamily of transcription factors (HD-Zip, WOX, KNOX, and ZF-HD), ethylene responsive elements (AP2/ERF domain), bZIP, WRKY and LIM [63-70]. Hormonal regulation of wood formation is well documented and major phyto-hormones playing pivotal role in cambial activity and wood formation include auxin, cytokinin, gibberellic acid, brassinosteroids and ethylene. The receptors of hormone responsive genes and transcription factors are reported to be expressed during cambial development and wood formation [71-74]. The selection of genes in the present study was based on the literature survey as described above and major functional and regulatory genes presumably involved in cambial development and wood formation were selected.

Validation of Target Enrichment

Array-based hybridization enrichment was conducted to capture the 94 xylogenesis-related genes in three species of Eucalyptus. Enrichment of the targeted regions after hybridization was validated using RT-qPCR on pre- and post-capture libraries for the target genes EtCesA1, EtCesA2 and EtCesA5 and the non-target genes EteIF4 and EtH2B. Comparison of pre- and post-hybridization data demonstrated 64-fold, 165-fold and 59-fold enrichment of the target genes EtCesA1, EtCesA2 and EtCesA5, respectively, while no enrichment was observed for the non-target genes EteIF4 and EtH2B.

Read and Alignment Statistics

The 2 × 100 bp paired-end raw reads were subjected to quality checking using SeqQC_V2.2. In E. camaldulensis (Ec111), a total of 15.75 million reads were generated, of which 13.86 million (88.02%) were HQ reads, while in E. tereticornis (Et86), 17.07 million reads were generated, of which 15.14 million (88.69%) were HQ reads. In E. grandis (Eg9), the total number of reads was 11.41 million with 10.22 million HQ reads (89.59%). The HQ reads from all three species were aligned with the E. grandis reference sequence using both gapped and un-gapped alignment tools. In E. camaldulensis, 170,866 bp (98.43% read coverage) were aligned with the reference sequence, which had a total length of 173,593 bp, while in E. tereticornis, 170,825 bp were aligned with the reference (98.41% coverage). Similarly, in E. grandis, 170,671 bp were aligned with the reference sequence (98.32% coverage). The percentage of the reference covered with at least 5X depth was 97.71%, 97.86% and 97.12% in E. camaldulensis, E. tereticornis and E. grandis, respectively, while the reference covered with at least 10X read depth was 96.99%, 97.36% and 95.67%, respectively. Similarly, the reference covered with 20X depth was 95.9%, 96.34% and 93.53% in E. camaldulensis, E. tereticornis and E. grandis, respectively. The optimized average read depth was ∼223X in E. camaldulensis, ∼227X in E. tereticornis and ∼199X in E. grandis. The aligned sequence data were deposited in the NCBI Short Read Archive under accession number SRP045253 for E. tereticornis (SRX747331), E. camaldulensis (SRX669390) and E. grandis (SRX747330). Next-generation sequencing platforms produce robust sequence output, making high-throughput DNA marker discovery feasible and cost-effective [75-76].
It was reported that, among all available NGS platforms, Illumina was preferred for de novo sequencing, re-sequencing and high-throughput SNP discovery because it generates high read depth, leading to reference-based contig assembly with high confidence [75-77]. The efficiency of this platform in SNP discovery has been well documented in E. camaldulensis [78]; Arabidopsis [79]; wheat [80-82]; olive [83]; Solanum spp. [84]; Douglas-fir [85]; soybean [86-87]; apple [88] and pine [89]. Another important consideration while conducting target enrichment and re-sequencing is the read depth needed to reliably detect SNPs. It was reported that a minimum of 8X coverage [90] and up to 200X [91] was optimal for SNP calling. In the present study, the read depth was high at ∼223X in E. camaldulensis, ∼227X in E. tereticornis and ∼199X in E. grandis. Similar studies in Fragaria vesca documented an average depth of 120X [92], while in E. camaldulensis, the average read depth for all the bases was 6124X [78]. Specificity (the number of reads that map to the targeted sequence) is an important aspect of target enrichment experiments. The present study documented high read coverage of the reference sequence, with 98.43% in E. camaldulensis, 98.41% in E. tereticornis and 98.32% in E. grandis, suggesting high specificity of the hybridized probes to the target sequences. Similarly, in an earlier study in E. camaldulensis, 94.2% coverage was reported with the reference genome of E. grandis [78]. In wheat, a NimbleGen array with genomic DNA derived from eight wheat varieties was used for target enrichment and exome sequencing and an average of 38.1% (22%–44.5%) was aligned to the reference sequence [80], while Saintenac and co-workers [82] reported an increase in specificity of reads on target to 60%, with 92% of target bases covered.
In Populus trichocarpa, an average of 86.8% of base pairs in the bait regions was mapped on the reference sequence [93]. Hence, the high read depth and coverage achieved in the present investigation can be considered optimal for identification of variation with high confidence.
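The coverage percentages and the overall average depth quoted in this section follow directly from the reported aligned lengths and per-species depths. The short sketch below recomputes them; the variable and function names are chosen for illustration, and the depth values are the approximate figures given in the text.

```python
REFERENCE_LENGTH_BP = 173_593  # total length of the targeted reference sequence

# Aligned length (bp) and approximate average read depth per species, as reported.
aligned_bp = {"E. camaldulensis": 170_866, "E. tereticornis": 170_825, "E. grandis": 170_671}
avg_depth = {"E. camaldulensis": 223, "E. tereticornis": 227, "E. grandis": 199}


def percent_covered(aligned: int, reference: int) -> float:
    """Percentage of the reference covered by aligned bases, rounded to 2 decimals."""
    return round(100 * aligned / reference, 2)


coverage = {sp: percent_covered(bp, REFERENCE_LENGTH_BP) for sp, bp in aligned_bp.items()}
mean_depth = sum(avg_depth.values()) / len(avg_depth)

for species, pct in coverage.items():
    print(f"{species}: {pct}% of reference covered")
print(f"Mean depth across species: {mean_depth:.0f}X")
```

The mean of the three per-species depths reproduces the 216X average read coverage stated in the abstract.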

Identification of Variants (SNVs and InDels) in Three Eucalyptus Species across the E. grandis Reference Genome

The SNVs and InDels present in the sequences aligned with the reference were individually determined for each species. A total of 5905 SNVs were discovered in all three species, which included 2294 SNVs in E. camaldulensis (604 and 299 SNVs from gapped and un-gapped alignments, respectively and 1391 SNVs common for both gapped and un-gapped alignments), 2383 SNVs in E. tereticornis (636 and 303 SNVs from gapped and un-gapped alignments, respectively and 1444 SNVs common for both alignments), and 1228 SNVs in E. grandis (460 and 122 SNVs from gapped and un-gapped alignments, respectively and 646 SNVs common for both alignments) (Table 2).
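The per-species SNV totals above are the sum of the calls unique to the gapped alignment, the calls unique to the un-gapped alignment, and the calls common to both. A small consistency check is sketched below, using only the counts reported in the text; the dictionary layout is an illustrative choice.

```python
# SNV counts per species, as reported: unique to each alignment mode or shared by both.
snv_counts = {
    "E. camaldulensis": {"gapped_only": 604, "ungapped_only": 299, "common": 1391},
    "E. tereticornis": {"gapped_only": 636, "ungapped_only": 303, "common": 1444},
    "E. grandis": {"gapped_only": 460, "ungapped_only": 122, "common": 646},
}

# Total SNVs per species and across all three species.
totals = {species: sum(parts.values()) for species, parts in snv_counts.items()}
grand_total = sum(totals.values())

# Share of each species' SNVs supported by both alignments (the higher-confidence set).
common_share = {
    species: parts["common"] / totals[species] for species, parts in snv_counts.items()
}

print(totals)
print(grand_total)
```

The `common_share` values indicate what fraction of each species' SNVs falls in the set that the Methods describe as being of higher confidence.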
Table 2

SNVs and InDels across 94 genes in three Eucalyptus species.

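S. No. | Gene Name | Gene ID | E. tereticornis SNVs | E. tereticornis InDels | E. camaldulensis SNVs | E. camaldulensis InDels | E. grandis SNVs | E. grandis InDels
1 | 4-coumarate-CoA ligase | 4CL | 41 | 5 | 43 | 2 | 5 | 5
2 | Aminocyclopropane-1-carboxylate oxidase | ACO1 | 36 | 6 | 34 | 3 | 30 | 5
3 | Alcohol dehydrogenase | ADH | 32 | 7 | 31 | 7 | 26 | 8
4 | APETALA TF | AP2L | 7 | 4 | 6 | 4 | 1 | 4
5 | Auxin response factor | ARF | 26 | 11 | 30 | 12 | 1 | 7
6 | Auxin response factor | ARF2 | 29 | 14 | 22 | 15 | 1 | 12
7 | Aspartyl protease | ASP | 36 | 0 | 32 | 0 | 33 | 0
8 | S1/P1 nuclease induced during senescence | BFN1 | 25 | 3 | 21 | 3 | 4 | 3
9 | KNAT knotted like homeobox TF | BP | 11 | 7 | 12 | 7 | 2 | 7
10 | Basic region / leucine zipper motif TF | bZIP | 15 | 2 | 13 | 2 | 11 | 2
11 | P-coumarate 3-hydroxylation | C3H | 68 | 6 | 68 | 5 | 60 | 5
12 | Cinnamate 4-hydroxylase | C4H | 11 | 4 | 12 | 4 | 3 | 1
13 | Cinnamyl alcohol dehydrogenase | CAD1 | 14 | 6 | 18 | 5 | 6 | 6
14 | Coniferaldehyde 5-hydroxylase | CAld5H | 19 | 0 | 19 | 0 | 9 | 0
15 | CBF TF | CCAAT | 2 | 1 | 2 | 2 | 0 | 1
16 | Caffeoyl-CoA-O-methyltransferase | CCoAOMT1 | 14 | 9 | 7 | 4 | 4 | 5
17 | Cellulose synthase 1 | CesA1 | 22 | 11 | 16 | 10 | 28 | 9
18 | Cellulose synthase 2 | CesA2 | 19 | 8 | 29 | 8 | 17 | 6
19 | Cellulose synthase 3 | CesA3 | 35 | 11 | 36 | 11 | 27 | 10
20 | Cellulose synthase 4 | CesA4 | 51 | 15 | 59 | 15 | 30 | 11
21 | Cellulose synthase 5 | CesA5 | 55 | 8 | 62 | 10 | 57 | 8
22 | Cellulose synthase 6 | CesA6 | 30 | 13 | 29 | 13 | 25 | 12
23 | Cytokinin oxidase | CKX | 30 | 5 | 27 | 4 | 26 | 5
24 | Caffeic acid-O-methyltransferase | COMT1 | 35 | 7 | 37 | 9 | 26 | 3
25 | Cytokinin receptor 1 | CRE | 31 | 13 | 37 | 14 | 7 | 12
26 | Dehydrin | DHN | 20 | 4 | 20 | 3 | 4 | 1
27 | Dirigent like protein | DIR1 | 2 | 0 | 2 | 1 | 3 | 0
28 | DNA binding with one finger | DOF1 | 17 | 0 | 19 | 0 | 4 | 0
29 | Dehydration-Responsive Element-Binding protein | DREB1 | 11 | 3 | 16 | 3 | 2 | 2
30 | Domain of Unknown function 1 | DUF1 | 7 | 1 | 13 | 1 | 7 | 1
31 | Ethylene responsive transcription factor | ERF | 19 | 1 | 21 | 1 | 20 | 3
32 | Alpha expansin | EXPA | 13 | 3 | 17 | 7 | 1 | 3
33 | Beta expansin | EXPB | 29 | 3 | 30 | 2 | 29 | 2
34 | Ferulate-5-hydroxylase | F5H | 19 | 0 | 17 | 0 | 8 | 0
35 | Fasciclin like arabinogalactan | FLA1 | 22 | 0 | 17 | 0 | 0 | 1
36 | Gibberellin 20-oxidase | GA20 | 12 | 3 | 10 | 4 | 12 | 3
37 | GATA1 transcription factor | GATA1 | 7 | 2 | 9 | 0 | 1 | 0
38 | Endo glucanase | GLU | 21 | 4 | 16 | 5 | 16 | 4
39 | GRAS family TF | GRAS1 | 16 | 3 | 14 | 1 | 4 | 2
40 | Monoterpene glycosyl transferases | GT | 33 | 2 | 35 | 1 | 14 | 2
41 | Beta 1–4 xylosyltransferase/glycosyl transferase | GT_1 | 15 | 3 | 13 | 3 | 10 | 4
42 | Homeodomain TF | HB | 23 | 10 | 23 | 8 | 11 | 4
43 | Homeodomain TF | HB1ClassIII | 32 | 20 | 22 | 18 | 16 | 17
44 | Homeodomain TF | HBIclassII | 12 | 3 | 13 | 4 | 3 | 4
45 | Hydroxycinnamoyl CoA shikimate | HCT | 34 | 5 | 37 | 5 | 20 | 1
46 | Homeobox knotted 1-like 7-like TF | HDKNOX1 | 9 | 6 | 11 | 5 | 5 | 4
47 | Predicted alpha/beta hydrolase fold protein | HYD | 17 | 6 | 15 | 4 | 7 | 4
48 | IAA binding domain | IAA | 13 | 3 | 9 | 1 | 8 | 1
49 | Class-I KNOTTED 1 like homeobox (KNOX) TF | KNOX2 | 22 | 8 | 20 | 8 | 10 | 13
50 | KORRIGAN / endo glucanase | KOR | 60 | 9 | 54 | 11 | 28 | 11
51 | Carbohydrate binding module 48 / dextrinase | LAC | 26 | 10 | 25 | 13 | 6 | 8
52 | Laccase | LAC2 | 29 | 5 | 23 | 7 | 1 | 5
53 | Lateral organ boundary domain TF in inflorescences | LBD | 8 | 5 | 9 | 2 | 0 | 1
54 | Floricaula / leafy protein | LEAFY | 31 | 4 | 34 | 3 | 22 | 2
55 | Homeodomain TF | LIM1 | 17 | 7 | 13 | 6 | 3 | 9
56 | 1,4 beta mannan endohydrolase | MAN | 15 | 7 | 19 | 6 | 7 | 6
57 | More axillary growth gene | MAX | 48 | 8 | 56 | 4 | 39 | 2
58 | Metal (copper) ion binding | MIBP1 | 55 | 4 | 44 | 4 | 33 | 3
59 | Monoterpene synthase | MTS | 49 | 7 | 44 | 8 | 24 | 4
60 | Xyloglucan galactosyl transferase exostosin family | MUR3 | 30 | 1 | 28 | 1 | 4 | 1
61 | Myeloblastosis TF | MYB1 | 21 | 5 | 18 | 5 | 2 | 3
62 | Myeloblastosis TF | MYB2 | 10 | 1 | 9 | 3 | 11 | 1
63 | No apical meristem family protein | NAM1 | 71 | 7 | 62 | 8 | 8 | 6
64 | Hydroxy proline rich glycoprotein (HRGP) / PAAPA motif | PAAPA | 37 | 3 | 21 | 1 | 6 | 2
65 | Pectin acetyl esterase | PAE | 26 | 9 | 32 | 10 | 11 | 6
66 | Phenylalanine ammonia-lyase | PAL | 75 | 6 | 71 | 4 | 27 | 3
67 | Polygalacturonase | PG | 38 | 7 | 44 | 5 | 8 | 5
68 | Plasma membrane intrinsic protein | PIP1 | 21 | 3 | 24 | 5 | 12 | 5
69 | Pectate lyase | PL | 32 | 1 | 22 | 0 | 20 | 0
70 | Peroxidase | POX1 | 27 | 3 | 24 | 3 | 12 | 3
71 | MADS box TF | PTM5 | 1 | 1 | 1 | 1 | 1 | 1
72 | RAS related protein | RAB | 12 | 6 | 11 | 5 | 12 | 6
73 | Ribonuclease T2 family | RNS | 8 | 3 | 9 | 3 | 13 | 3
74 | RAC like small GTPase | ROP1 | 11 | 5 | 8 | 6 | 4 | 4
75 | S-Adenosyl methionine synthetase | SAMS | 37 | 3 | 31 | 3 | 21 | 2
76 | Squamosa promoter binding protein TF | SBP1 | 33 | 6 | 30 | 6 | 22 | 5
77 | Sitosterol cello dextrin | SCD | 17 | 5 | 8 | 7 | 11 | 5
78 | Wood associated NAC domain TF 1 (WND1) | SND1 | 6 | 7 | 5 | 7 | 4 | 6
79 | Shoot meristemless TF | STM | 18 | 20 | 17 | 12 | 3 | 7
80 | Sucrose synthase | SuSy1 | 85 | 9 | 72 | 9 | 54 | 8
81 | α-Tubulin | TUA1 | 33 | 7 | 21 | 7 | 5 | 4
82 | Ubiquitin Ligase | UBILIG | 9 | 5 | 9 | 4 | 2 | 5
83 | UDP glucose glucosyl dehydrogenase | UGDH | 21 | 6 | 18 | 3 | 12 | 3
84 | UDP glucose glucosyl transferase | UGT | 40 | 2 | 63 | 1 | 44 | 2
85 | UDP-D-glucuronate carboxylyase | UXS1 | 9 | 1 | 12 | 0 | 9 | 0
86 | Vascular related NAC domain TF | VND6 | 22 | 8 | 21 | 5 | 1 | 4
87 | Vascular related NAC domain TF | VND7 | 9 | 5 | 9 | 4 | 4 | 4
88 | Wood associated NAC TF | WND1 | 19 | 2 | 16 | 4 | 5 | 2
89 | TF involved in biotic and abiotic stress response | WRKY1 | 32 | 10 | 25 | 9 | 15 | 7
90 | Homeodomain TF | WUS1 | 3 | 1 | 9 | 1 | 5 | 0
91 | Xylem-specific papain-like Cysteine Peptidase | XCP | 21 | 4 | 20 | 3 | 6 | 2
92 | Xyloglucan transglycosylase | XTH | 22 | 2 | 28 | 4 | 12 | 2
93 | Endo 1,4 beta xylanase | XYL | 48 | 9 | 41 | 8 | 16 | 10
94 | Zinc finger (C3HC4-type ring finger) TF protein | Znf1 | 22 | 10 | 13 | 6 | 9 | 8
Total | | | 2383 | 518 | 2294 | 479 | 1228 | 409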
SNVs were also located within the UTRs and exons. The maximum number of SNVs was recorded in the exon region (4187), while 1226 SNVs were documented in the 3′UTR and 492 SNVs in the 5′UTR across the three species (Tables 3, 4 and 5). In E. tereticornis, the maximum number of SNVs was recorded in SuSy1 (85), while only one SNV was observed in PTM5 (S3a Table). In E. camaldulensis, a similar trend was observed, with a maximum of 72 SNVs identified in SuSy1 and only one SNV recorded in PTM5 (S4a Table). However, when the E. grandis sequences were compared with the reference genome, a maximum of 60 SNVs was observed in C3H, while a single SNV was documented in several genes, including AP2L, ARF, ARF2, EXPA, GATA1, LAC2, PTM5 and VND6. No SNVs were detected in CCAAT, FLA1 and LBD (S5a Table).
Table 3

SNV frequency in three Eucalyptus species in 5′UTR region.

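Gene ID | 5′UTR length (bp) | E. tereticornis No. of SNVs | E. tereticornis SNV frequency (bp/SNV) | E. camaldulensis No. of SNVs | E. camaldulensis SNV frequency (bp/SNV) | E. grandis No. of SNVs | E. grandis SNV frequency (bp/SNV)
4CL | 126 | 3 | 42.0 | 3 | 42.0 | 1 | 126.0
ACO1 | 107 | 2 | 53.5 | 1 | 107.0 | 1 | 107.0
ADH | 0 | na | na | na | na | na | na
AP2L | 59 | 1 | 59.0 | 1 | 59.0 | - | -
ARF | 413 | 4 | 103.3 | 3 | 137.7 | 1 |
ARF2 | 1004 | 8 | 125.5 | 9 | 111.6 | - | -
ASP | 0 | na | na | na | na | na | na
BFN1 | 0 | na | na | na | na | na | na
BP | 267 | - | - | - | - | - | -
bZIP | 0 | na | na | na | na | na | na
C3H | 86 | 2 | 43.0 | 1 | 86.0 | 2 | 43.0
C4H | 125 | 3 | 41.7 | 1 | 125.0 | - | -
CAD1 | 127 | 2 | 63.5 | 5 | 25.4 | - | -
CAld5H | 39 | - | - | - | - | - | -
CCAAT | 0 | na | na | na | na | na | na
CCoAOMT1 | 97 | 1 | 97.0 | - | - | - | -
CesA1 | 0 | na | na | na | na | na | na
CesA2 | 361 | - | - | - | - | - | -
CesA3 | 0 | na | na | na | na | na | na
CesA4 | 340 | 8 | 42.5 | 2 | 170.0 | 4 | 85.0
CesA5 | 118 | - | - | 1 | 118.0 | 1 | 118.0
CesA6 | 16 | - | - | - | - | - | -
CKX | 0 | na | na | na | na | na | na
COMT1 | 98 | 1 | 98.0 | - | - | 2 | 49.0
CRE | 0 | na | na | na | na | na | na
DHN | 112 | 4 | 28.0 | 1 | 112.0 | 1 | 112.0
DIR1 | 0 | na | na | na | na | na | na
DOF1 | 227 | 1 | 227.0 | 1 | 227.0 | - | -
DREB1 | 298 | 5 | 59.6 | 8 | 37.3 | - | -
DUF1 | 0 | na | na | na | na | na | na
ERF | 0 | na | na | na | na | na | na
EXPA | 43 | - | - | 1 | 43.0 | - | -
EXPB | 44 | 1 | 44.0 | 1 | 44.0 | 1 | 44.0
F5H | 39 | - | - | - | - | - | -
FLA1 | 1 | - | - | - | - | - | -
GA20 | 105 | - | - | - | - | - | -
GATA1 | 0 | na | na | na | na | na | na
GLU | 41 | 1 | 41.0 | 1 | 41.0 | - | -
GRAS1 | 213 | 3 | 71.0 | 3 | 71.0 | - | -
GT | 54 | 3 | 18.0 | 3 | 18.0 | - | -
GT_1 | 187 | - | - | - | - | - | -
HB | 305 | 1 | 305.0 | 1 | 305.0 | 1 | 305.0
HB1 ClassIII | 1090 | 8 | 136.3 | 8 | 136.3 | 5 | 218.0
HBI class II | 102 | - | - | - | - | - | -
HCT | 298 | 4 | 74.5 | 9 | 33.1 | 3 | 99.3
HDKNOX1 | 103 | - | - | - | - | - | -
HYD | 262 | 2 | 131.0 | 1 | 262.0 | 2 | 131.0
IAA | 91 | - | - | - | - | - | -
KNOX2 | 182 | 3 | 60.7 | 7 | 26.0 | 3 | 60.7
KOR | 465 | 9 | 51.7 | 9 | 51.7 | 7 | 66.4
LAC | 0 | na | na | na | na | na | na
LAC2 | 52 | - | - | - | - | - | -
LBD | 35 | - | - | - | - | - | -
LEAFY | 0 | na | na | na | na | na | na
LIM1 | 408 | 12 | 34.0 | 10 | 40.8 | 2 | 204.0
MAN | 218 | - | - | 1 | 218.0 | - | -
MAX | 35 | - | - | - | - | - | -
MIBP1 | 0 | na | na | na | na | na | na
MTS | 230 | 6 | 38.3 | 5 | 46.0 | 5 | 46.0
MUR3 | 493 | 8 | 61.6 | 9 | 54.8 | 1 | 493.0
MYB1 | 429 | 8 | 53.6 | 6 | 71.5 | 1 | 429.0
MYB2 | 79 | 1 | 79.0 | 1 | 79.0 | 3 | 26.3
NAM1 | 307 | 10 | 30.7 | 5 | 61.4 | 1 | 307.0
PAAPA | 131 | - | - | 2 | 65.5 | - | -
PAE | 536 | 12 | 44.7 | 15 | 35.7 | 7 | 76.6
PAL | 194 | 6 | 32.3 | 5 | 38.8 | - | -
PG | 242 | 6 | 40.3 | 6 | 40.3 | 4 | 60.5
PIP1 | 67 | - | - | - | - | - | -
PL | 0 | na | na | na | na | na | na
POX1 | 42 | 1 | 42.0 | - | - | - | -
PTM5 | 0 | na | na | na | na | na | na
RAB | 158 | 2 | 79.0 | 2 | 79.0 | 1 | 158.0
RNS | 15 | - | - | - | - | - | -
ROP1 | 241 | - | - | - | - | - | -
SAMS | 271 | 13 | 20.8 | 9 | 30.1 | 2 | 135.5
SBP1 | 451 | 4 | 112.8 | 2 | 225.5 | 1 | 451.0
SCD | 181 | 6 | 30.2 | 1 | 181.0 | 1 | 181.0
SND1 | 218 | 2 | 109.0 | 1 | 218.0 | 3 | 72.7
STM | 669 | 5 | 133.8 | 4 | 167.3 | 2 | 334.5
SuSy1 | 140 | 2 | 70.0 | 1 | 140.0 | - | -
TUA1 | 318 | 11 | 28.9 | 9 | 35.3 | - | -
UBILIG | 434 | 6 | 72.3 | 4 | 108.5 | 1 | 434.0
UGDH | 258 | 2 | 129.0 | 2 | 129.0 | - | -
UGT | 11 | - | - | - | - | - | -
UXS1 | 0 | na | na | na | na | na | na
VND6 | 422 | 3 | 140.7 | 4 | 105.5 | - | -
VND7 | 117 | 1 | 117.0 | 1 | 117.0 | - | -
WND1 | 77 | - | - | - | - | - | -
WRKY1 | 390 | 8 | 48.8 | 5 | 78.0 | - | -
WUS1 | 41 | - | - | - | - | - | -
XCP | 30 | - | - | - | - | - | -
XTH | 119 | 1 | 119.0 | - | - | 1 | 119.0
XYL | 362 | 6 | 60.3 | 3 | 120.7 | 2 | 181.0
Znf1 | 180 | 1 | 180.0 | 1 | 180.0 | 1 | 180.0
Total | 16246 | 223 | 78.49* | 195 | 101.11* | 74 | 170.42*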

na: Not applicable

* denotes average SNV frequency
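The per-gene values in the bp/SNV column appear to follow from dividing the region length by the number of SNVs found in it. The sketch below reproduces a few rows of Table 3 under that assumption. The function name and the use of `None` for undefined cases ("na" for a zero-length region, "-" for no SNVs) are illustrative choices, not the authors' code.

```python
from typing import Optional


def snv_frequency(region_length_bp: int, n_snvs: int) -> Optional[float]:
    """Return base pairs per SNV, rounded to one decimal place.

    Returns None when the value is undefined: a zero-length region
    ("na" in the table) or a region with no SNVs ("-" in the table).
    """
    if region_length_bp == 0 or n_snvs == 0:
        return None
    return round(region_length_bp / n_snvs, 1)


# (gene, 5'UTR length in bp, number of SNVs in E. tereticornis), from Table 3.
rows = [
    ("4CL", 126, 3),
    ("DHN", 112, 4),
    ("HCT", 298, 4),
    ("KNOX2", 182, 3),
    ("ADH", 0, 0),   # no annotated 5'UTR: "na"
    ("BP", 267, 0),  # 5'UTR present but no SNVs: "-"
]

for gene, length_bp, n_snvs in rows:
    print(gene, snv_frequency(length_bp, n_snvs))
```

Python's built-in `round` may differ from the published table in the last digit when a quotient falls exactly on a rounding boundary, so this is a consistency check and not an exact reimplementation.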

Table 4

SNV frequency in three Eucalyptus species in Exon region.

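Gene ID | Exon length (bp) | E. tereticornis No. of SNVs | E. tereticornis SNV frequency (bp/SNV) | E. camaldulensis No. of SNVs | E. camaldulensis SNV frequency (bp/SNV) | E. grandis No. of SNVs | E. grandis SNV frequency (bp/SNV)
4CL | 1635 | 37 | 44.2 | 38 | 43.0 | 4 | 408.8
ACO1 | 963 | 28 | 34.4 | 29 | 33.2 | 26 | 37.0
ADH | 1896 | 18 | 105.3 | 23 | 82.4 | 20 | 94.8
AP2L | 738 | 2 | 369.0 | 2 | 369.0 | - | -
ARF | 2364 | 19 | 124.4 | 22 | 107.5 | - | -
ARF2 | 2520 | 17 | 148.2 | 12 | 210.0 | 1 | 2520.0
ASP | 1218 | 36 | 33.8 | 32 | 38.1 | 33 | 36.9
BFN1 | 912 | 16 | 57.0 | 10 | 91.2 | 3 | 304.0
BP | 1164 | 7 | 166.3 | 10 | 116.4 | 2 | 582.0
bZIP | 858 | 15 | 57.2 | 13 | 66.0 | 11 | 78.0
C3H | 1530 | 55 | 27.8 | 50 | 30.6 | 51 | 30.0
C4H | 1518 | 6 | 253.0 | 8 | 189.8 | 3 | 506.0
CAD1 | 1158 | 12 | 96.5 | 13 | 89.1 | 6 | 193.0
CAld5H | 1590 | 17 | 93.5 | 17 | 93.5 | 9 | 176.7
CCAAT | 453 | 2 | 226.5 | 2 | 226.5 | - | -
CCoAOMT1 | 741 | 5 | 148.2 | 4 | 185.3 | - | -
CesA1 | 2937 | 22 | 133.5 | 16 | 183.6 | 28 | 104.9
CesA2 | 2775 | 19 | 146.1 | 29 | 95.7 | 17 | 163.2
CesA3 | 3126 | 35 | 89.3 | 36 | 86.8 | 27 | 115.8
CesA4 | 3243 | 38 | 85.3 | 50 | 64.9 | 21 | 154.4
CesA5 | 1710 | 45 | 38.0 | 48 | 35.6 | 47 | 36.4
CesA6 | 2691 | 29 | 92.8 | 28 | 96.1 | 23 | 117.0
CKX | 1386 | 30 | 46.2 | 27 | 51.3 | 26 | 53.3
COMT1 | 1101 | 28 | 39.3 | 28 | 39.3 | 23 | 47.9
CRE | 2994 | 27 | 110.9 | 33 | 90.7 | 5 | 598.8
DHN | 513 | 10 | 51.3 | 11 | 46.6 | 2 | 256.5
DIR1 | 498 | 2 | 249.0 | 2 | 249.0 | 3 | 166.0
DOF1 | 1014 | 10 | 101.4 | 11 | 92.2 | 4 | 253.5
DREB1 | 762 | 4 | 190.5 | 7 | 108.9 | - | -
DUF1 | 372 | 7 | 53.1 | 13 | 28.6 | 7 | 53.1
ERF | 681 | 18 | 37.8 | 19 | 35.8 | 19 | 35.8
EXPA | 753 | 7 | 107.6 | 8 | 94.1 | - | -
EXPB | 825 | 19 | 43.4 | 19 | 43.4 | 23 | 35.9
F5H | 1590 | 17 | 93.5 | 15 | 106.0 | 8 | 198.8
FLA1 | 945 | 20 | 47.3 | 14 | 67.5 | - | -
GA20 | 1158 | 7 | 165.4 | 6 | 193.0 | 7 | 165.4
GATA1 | 1002 | 7 | 143.1 | 9 | 111.3 | 1 | 1002.0
GLU | 1506 | 16 | 94.1 | 11 | 136.9 | 14 | 107.6
GRAS1 | 1485 | 11 | 135.0 | 6 | 247.5 | 4 | 371.3
GT | 1383 | 29 | 47.7 | 31 | 44.6 | 14 | 98.8
GT_1 | 1008 | 14 | 72.0 | 12 | 84.0 | 9 | 112.0
HB | 951 | 13 | 73.2 | 12 | 79.3 | 4 | 237.8
HB1 ClassIII | 2535 | 15 | 169.0 | 9 | 281.7 | 7 | 362.1
HBI class II | 759 | 7 | 108.4 | 7 | 108.4 | 3 | 253.0
HCT | 1494 | 21 | 71.1 | 25 | 59.8 | 14 | 106.7
HDKNOX1 | 930 | 5 | 186.0 | 4 | 232.5 | 3 | 310.0
HYD | 960 | 6 | 160.0 | 6 | 160.0 | 3 | 320.0
IAA | 258 | 4 | 64.5 | 4 | 64.5 | 5 | 51.6
KNOX2 | 1305 | 5 | 261.0 | 3 | 435.0 | 1 | 1305.0
KOR | 1872 | 40 | 46.8 | 31 | 60.4 | 6 | 312.0
LAC | 2451 | 14 | 175.1 | 15 | 163.4 | 6 | 408.5
LAC2 | 1659 | 19 | 87.3 | 16 | 103.7 | - | -
LBD | 735 | 6 | 122.5 | 7 | 105.0 | - | -
LEAFY | 1080 | 31 | 34.8 | 34 | 31.8 | 22 | 49.1
LIM1 | 567 | 1 | 567.0 | 2 | 283.5 | 1 | 567.0
MAN | 1308 | 6 | 218.0 | 9 | 145.3 | 5 | 261.6
MAX | 2094 | 40 | 52.4 | 45 | 46.5 | 34 | 61.6
MIBP1 | 1425 | 40 | 35.6 | 31 | 46.0 | 28 | 50.9
MTS | 1749 | 34 | 51.4 | 30 | 58.3 | 13 | 134.5
MUR3 | 1854 | 22 | 84.3 | 19 | 97.6 | 3 | 618.0
MYB1 | 768 | 7 | 109.7 | 5 | 153.6 | 1 | 768.0
MYB2 | 666 | 9 | 74.0 | 8 | 83.3 | 8 | 83.3
NAM1 | 1971 | 46 | 42.8 | 43 | 45.8 | 6 | 328.5
PAAPA | 519 | 17 | 30.5 | 12 | 43.3 | 6 | 86.5
PAE | 1272 | 9 | 141.3 | 10 | 127.2 | 4 | 318.0
PAL | 2148 | 43 | 50.0 | 47 | 45.7 | 19 | 113.1
PG | 1530 | 14 | 109.3 | 17 | 90.0 | 3 | 510.0
PIP1 | 864 | 14 | 61.7 | 15 | 57.6 | 7 | 123.4
PL | 1023 | 32 | 32.0 | 22 | 46.5 | 20 | 51.2
POX1 | 951 | 16 | 59.4 | 16 | 59.4 | 9 | 105.7
PTM5 | 279 | 1 | 279.0 | 1 | 279.0 | 1 | 279.0
RAB | 624 | 4 | 156.0 | 3 | 208.0 | 4 | 156.0
RNS | 687 | 5 | 137.4 | 5 | 137.4 | 7 | 98.1
ROP1 | 594 | 3 | 198.0 | 3 | 198.0 | 1 | 594.0
SAMS | 1182 | 17 | 69.5 | 16 | 73.9 | 13 | 90.9
SBP1 | 1656 | 23 | 72.0 | 23 | 72.0 | 12 | 138.0
SCD | 975 | 6 | 162.5 | 4 | 243.8 | 8 | 121.9
SND1 | 1200 | 3 | 400.0 | 3 | 400.0 | 1 | 1200.0
STM | 1128 | 6 | 188.0 | 2 | 564.0 | - | -
SuSy1 | 2418 | 72 | 33.6 | 68 | 35.6 | 49 | 49.3
TUA1 | 1356 | 18 | 75.3 | 8 | 169.5 | 1 | 1356.0
UBILIG | 969 | 1 | 969.0 | 3 | 323.0 | 1 | 969.0
UGDH | 1443 | 12 | 120.3 | 11 | 131.2 | 2 | 721.5
UGT | 1410 | 35 | 40.3 | 57 | 24.7 | 40 | 35.3
UXS1 | 615 | 7 | 87.9 | 10 | 61.5 | 7 | 87.9
VND6 | 1047 | 6 | 174.5 | 3 | 349.0 | 1 | 1047.0
VND7 | 963 | 6 | 160.5 | 7 | 137.6 | 4 | 240.8
WND1 | 1152 | 15 | 76.8 | 11 | 104.7 | 3 | 384.0
WRKY1 | 2256 | 23 | 98.1 | 19 | 118.7 | 14 | 161.1
WUS1 | 685 | 3 | 228.3 | 9 | 76.1 | 5 | 137.0
XCP | 1131 | 6 | 188.5 | 7 | 161.6 | 4 | 282.8
XTH | 894 | 16 | 55.9 | 22 | 40.6 | 10 | 89.4
XYL | 2796 | 13 | 215.1 | 33 | 84.7 | 11 | 254.2
Znf1 | 1113 | 21 | 53.0 | 12 | 92.8 | 7 | 159.0
Total | 124987 | 1621 | 126.78* | 1618 | 125.61* | 948 | 306.72*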

na: Not applicable

* denotes average SNV frequency

Table 5

SNV frequency in three Eucalyptus species in 3′ UTR region.

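Gene ID | 3′UTR length (bp) | E. tereticornis No. of SNVs | E. tereticornis SNV frequency (bp/SNV) | E. camaldulensis No. of SNVs | E. camaldulensis SNV frequency (bp/SNV) | E. grandis No. of SNVs | E. grandis SNV frequency (bp/SNV)
4CL | 307 | 1 | 307.0 | 2 | 153.5 | - | -
ACO1 | 226 | 6 | 37.7 | 4 | 56.5 | 3 | 75.3
ADH | 277 | 14 | 19.8 | 8 | 34.6 | 6 | 46.2
AP2L | 255 | 4 | 63.8 | 3 | 85.0 | 1 | 255.0
ARF | 451 | 3 | 150.3 | 5 | 90.2 | - | -
ARF2 | 425 | 4 | 106.3 | 1 | 425.0 | - | -
ASP | 0 | na | na | na | na | na | na
BFN1 | 293 | 9 | 32.6 | 11 | 26.6 | 1 | 293.0
BP | 389 | 4 | 97.3 | 2 | 194.5 | - | -
bZIP | 0 | na | na | na | na | na | na
C3H | 432 | 11 | 39.3 | 17 | 25.4 | 7 | 61.7
C4H | 285 | 2 | 142.5 | 3 | 95.0 | - | -
CAD1 | 298 | - | - | - | - | - | -
CAld5H | 397 | 2 | 198.5 | 2 | 198.5 | - | -
CCAAT | 0 | na | na | na | na | na | na
CCoAOMT1 | 428 | 8 | 53.5 | 3 | 142.7 | 4 | 107.0
CesA1 | 0 | na | na | na | na | na | na
CesA2 | 0 | na | na | na | na | na | na
CesA3 | 0 | na | na | na | na | na | na
CesA4 | 368 | 5 | 73.6 | 7 | 52.6 | 5 | 73.6
CesA5 | 308 | 10 | 30.8 | 13 | 23.7 | 9 | 34.2
CesA6 | 501 | 1 | 501.0 | 1 | 501.0 | 2 | 250.5
CKX | 0 | na | na | na | na | na | na
COMT1 | 767 | 6 | 127.8 | 9 | 85.2 | 1 | 767.0
CRE | 398 | 4 | 99.5 | 4 | 99.5 | 2 | 199.0
DHN | 293 | 6 | 48.8 | 8 | 36.6 | 1 | 293.0
DIR1 | 0 | na | na | na | na | na | na
DOF1 | 623 | 6 | 103.8 | 7 | 89.0 | - | -
DREB1 | 295 | 2 | 147.5 | 1 | 295.0 | 2 | 147.5
DUF1 | 0 | na | na | na | na | na | na
ERF | 143 | 1 | 143.0 | 2 | 71.5 | 1 | 143.0
EXPA | 499 | 6 | 83.2 | 8 | 62.4 | 1 | 499.0
EXPB | 253 | 9 | 28.1 | 10 | 25.3 | 5 | 50.6
F5H | 397 | 2 | 198.5 | 2 | 198.5 | - | -
FLA1 | 335 | 2 | 167.5 | 3 | 111.7 | - | -
GA20 | 453 | 5 | 90.6 | 4 | 113.3 | 5 | 90.6
GATA1 | 0 | na | na | na | na | na | na
GLU | 391 | 4 | 97.8 | 4 | 97.8 | 2 | 195.5
GRAS1 | 274 | 2 | 137.0 | 5 | 54.8 | - | -
GT | 30 | 1 | 30.0 | 1 | 30.0 | - | -
GT_1 | 67 | 1 | 67.0 | 1 | 67.0 | 1 | 67.0
HB | 518 | 9 | 57.6 | 10 | 51.8 | 6 | 86.3
HB1 ClassIII | 516 | 9 | 57.3 | 5 | 103.2 | 4 | 129.0
HBI class II | 344 | 5 | 68.8 | 6 | 57.3 | - | -
HCT | 524 | 9 | 58.2 | 3 | 174.7 | 3 | 174.7
HDKNOX1 | 401 | 4 | 100.3 | 7 | 57.3 | 2 | 200.5
HYD | 381 | 9 | 42.3 | 8 | 47.6 | 2 | 190.5
IAA | 387 | 9 | 43.0 | 5 | 77.4 | 3 | 129.0
KNOX2 | 677 | 14 | 48.4 | 10 | 67.7 | 6 | 112.8
KOR | 573 | 11 | 52.1 | 14 | 40.9 | 15 | 38.2
LAC | 353 | 12 | 29.4 | 10 | 35.3 | - | -
LAC2 | 544 | 10 | 54.4 | 7 | 77.7 | 1 | 544.0
LBD | 259 | 2 | 129.5 | 2 | 129.5 | - | -
LEAFY | 0 | na | na | na | na | na | na
LIM1 | 446 | 4 | 111.5 | 1 | 446.0 | - | -
MAN | 526 | 9 | 58.4 | 9 | 58.4 | 2 | 263.0
MAX | 530 | 8 | 66.3 | 11 | 48.2 | 5 | 106.0
MIBP1 | 398 | 15 | 26.5 | 13 | 30.6 | 5 | 79.6
MTS | 349 | 9 | 38.8 | 9 | 38.8 | 6 | 58.2
MUR3 | 0 | na | na | na | na | na | na
MYB1 | 432 | 6 | 72.0 | 7 | 61.7 | - | -
MYB2 | 62 | - | - | - | - | - | -
NAM1 | 704 | 15 | 46.9 | 14 | 50.3 | 1 | 704.0
PAAPA | 539 | 20 | 27.0 | 7 | 77.0 | - | -
PAE | 328 | 5 | 65.6 | 7 | 46.9 | - | -
PAL | 702 | 26 | 27.0 | 19 | 36.9 | 8 | 87.8
PG | 516 | 18 | 28.7 | 21 | 24.6 | 1 | 516.0
PIP1 | 337 | 7 | 48.1 | 9 | 37.4 | 5 | 67.4
PL | 0 | na | na | na | na | na | na
POX1 | 350 | 10 | 35.0 | 8 | 43.8 | 3 | 116.7
PTM5 | 0 | na | na | na | na | na | na
RAB | 340 | 6 | 56.7 | 6 | 56.7 | 7 | 48.6
RNS | 154 | 3 | 51.3 | 4 | 38.5 | 6 | 25.7
ROP1 | 455 | 8 | 56.9 | 5 | 91.0 | 3 | 151.7
SAMS | 316 | 7 | 45.1 | 6 | 52.7 | 6 | 52.7
SBP1 | 300 | 6 | 50.0 | 5 | 60.0 | 9 | 33.3
SCD | 293 | 5 | 58.6 | 3 | 97.7 | 2 | 146.5
SND1 | 266 | 1 | 266.0 | 1 | 266.0 | - | -
STM | 551 | 7 | 78.7 | 11 | 50.1 | 1 | 551.0
SuSy1 | 311 | 11 | 28.3 | 3 | 103.7 | 5 | 62.2
TUA1 | 337 | 4 | 84.3 | 4 | 84.3 | 4 | 84.3
UBILIG | 192 | 2 | 96.0 | 2 | 96.0 | - | -
UGDH | 440 | 7 | 62.9 | 5 | 88.0 | 10 | 44.0
UGT | 266 | 5 | 53.2 | 6 | 44.3 | 4 | 66.5
UXS1 | 74 | 2 | 37.0 | 2 | 37.0 | 2 | 37.0
VND6 | 1798 | 13 | 138.3 | 14 | 128.4 | - | -
VND7 | 451 | 2 | 225.5 | 1 | 451.0 | - | -
WND1 | 298 | 4 | 74.5 | 5 | 59.6 | 2 | 149.0
WRKY1 | 224 | 1 | 224.0 | 1 | 224.0 | 1 | 224.0
WUS1 | 0 | na | na | na | na | na | na
XCP | 403 | 15 | 26.9 | 13 | 31.0 | 2 | 201.5
XTH | 210 | 5 | 42.0 | 6 | 35.0 | 1 | 210.0
XYL | 286 | 29 | 9.9 | 5 | 57.2 | 3 | 95.3
Znf1 | 249 | - | - | - | - | 1 | 249.0
Total | 30768 | 539 | 86.61* | 481 | 100.23* | 206 | 176.08*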

na: Not applicable

* denotes average SNV frequency

The SNV frequency was calculated for the exon and UTR regions individually in each species. The SNV frequency in the 5′UTR of E. tereticornis, E. camaldulensis and E. grandis was 1/78.49 bp, 1/101.11 bp and 1/170.42 bp, respectively, while in the exon region it was 1/126.78 bp, 1/125.61 bp and 1/306.72 bp, respectively. In the 3′UTR, the SNV frequency was 1/86.61 bp, 1/100.23 bp and 1/176.08 bp for E. tereticornis, E. camaldulensis and E. grandis, respectively (Tables 3, 4 and 5).

SNVs were also identified in pair-wise comparisons between the three Eucalyptus species. Ambiguous nucleotides were not considered, and only SNVs with no ambiguity were mapped on the candidate genes (S6 Table). When E. camaldulensis and E. tereticornis were compared, a total of 317 SNVs were documented, with a minimum of one SNV in 4CL, bZIP, CCoAOMT1, CesA3, EXPA, GRAS1, NAM1, PIP1, PTM5, SBP1, SND1, STM, SuSy1, TUA1 and VND7 and a maximum of 25 SNVs in LAC. A larger number of SNVs was recorded when E. grandis was compared with E. tereticornis and E. camaldulensis, with 875 and 1014 SNVs, respectively. In both of these pair-wise combinations, the maximum number of SNVs was observed in LAC, with 53 SNVs in the comparison with E. camaldulensis and 46 SNVs in the comparison with E. tereticornis.

InDels were also detected when the sequences of the 94 genes were compared individually with the reference, and a total of 1406 InDels were discovered in the size range of 1–24 nucleotides (Table 2). The position of InDels in exons and UTRs was also determined, and the total number documented was 843, 309 and 254 in exons, 3′UTR and 5′UTR, respectively (Table 6). In E. tereticornis, a total of 518 InDels were detected and a maximum of 20 InDels was recorded in the transcription factor HB1ClassIII, while a single InDel was documented in several genes including CCAAT, DUF1, ERF, MUR3, MYB2, PL, PTM5, UXS1 and WUS1. No InDels were recorded in ASP, CAld5H, DOF1, F5H, DIR1 and FLA1 (S3b Table). In E. camaldulensis, a total of 479 InDels were recorded and the maximum number of InDels was discovered in HB1ClassIII (18), while only a single InDel was identified in DIR1, DUF1, ERF, GRAS1, GT, IAA, MUR3, PAAPA, PTM5, UGT and WUS1. InDels were not detected in ASP, CAld5H, DOF1, F5H, FLA1, GATA1, PL and UXS1 (S4b Table). In E. grandis, a total of 409 InDels were discovered and a maximum of 17 InDels was documented in HB1ClassIII, while only a single InDel was identified in FLA1, DUF1, IAA, MUR3, PTM5, CCAAT, LBD, DHN, MYB2, C4H and HCT. InDels were not found in ASP, CAld5H, DOF1, F5H, GATA1, PL, UXS1, DIR1 and WUS1 (S5b Table). The InDel frequency was calculated for each species (Table 6). The bp/InDel value was highest in the exon region for all three species, at 411.14, 446.38 and 482.58 in E. tereticornis, E. camaldulensis and E. grandis, respectively, meaning that InDels were least dense in exons. The total InDel frequency was 332.05, 359.08 and 420.54 bp per InDel in E. tereticornis, E. camaldulensis and E. grandis, respectively, across all the selected genes (Table 6).
Table 6

InDel frequency in three Eucalyptus species.

Region | No. of InDels | Length (bp) | InDel frequency (bp/InDel)
E. tereticornis
5′UTR | 99 | 16246 | 164.10
Exon | 304 | 124987 | 411.14
3′UTR | 115 | 30768 | 267.55
Total | 518 | 172001 | 332.05*
E. camaldulensis
5′UTR | 89 | 16246 | 182.54
Exon | 280 | 124987 | 446.38
3′UTR | 110 | 30768 | 279.71
Total | 479 | 172001 | 359.08*
E. grandis
5′UTR | 66 | 16246 | 246.15
Exon | 259 | 124987 | 482.58
3′UTR | 84 | 30768 | 366.29
Total | 409 | 172001 | 420.54*

* denotes average InDel frequency
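The values in Table 6 are consistent with dividing the total length of each region by the number of InDels found in it. A minimal sketch of that calculation, using only the counts and lengths from the table (the data layout and function name are illustrative assumptions):

```python
# Total targeted length per region across the 94 genes (bp), from Table 6.
region_length_bp = {"5'UTR": 16246, "exon": 124987, "3'UTR": 30768}

# Number of InDels per region and species, from Table 6.
indel_counts = {
    "E. tereticornis": {"5'UTR": 99, "exon": 304, "3'UTR": 115},
    "E. camaldulensis": {"5'UTR": 89, "exon": 280, "3'UTR": 110},
    "E. grandis": {"5'UTR": 66, "exon": 259, "3'UTR": 84},
}


def bp_per_indel(length_bp: int, n_indels: int) -> float:
    """Base pairs per InDel, rounded to two decimals as in Table 6."""
    return round(length_bp / n_indels, 2)


total_length_bp = sum(region_length_bp.values())

# frequency[species][region] holds bp/InDel; "total" pools all three regions.
frequency = {}
for species, counts in indel_counts.items():
    frequency[species] = {
        region: bp_per_indel(region_length_bp[region], n)
        for region, n in counts.items()
    }
    frequency[species]["total"] = bp_per_indel(total_length_bp, sum(counts.values()))

for species, values in frequency.items():
    print(species, values)
```

A larger bp/InDel value means fewer InDels per unit length, which is why the exon rows, with the largest values, correspond to the lowest InDel density.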

Similarly, InDels were documented in pair-wise combinations, and a total of 731 and 699 InDels were detected across E. grandis & E. tereticornis and E. grandis & E. camaldulensis, respectively. A total of 702 InDels were detected between E. camaldulensis and E. tereticornis. The maximum number of InDels in all combinations was observed in the HB1ClassIII transcription factor, with 26 InDels between E. grandis and E. tereticornis, 27 InDels between E. grandis and E. camaldulensis and 27 InDels between E. camaldulensis and E. tereticornis. A minimum of one InDel was documented in FLA1 for E. grandis & E. tereticornis; in DIR1, EXPB, FLA1 and WUS1 for E. grandis & E. camaldulensis; and in DIR1, DUF1, PL and UXS1 for E. camaldulensis & E. tereticornis (S7a,b,c Table).

The abundance of SNPs/SNVs in plant genomes and the availability of cost-effective genotyping technologies have made high-throughput SNP genotyping pivotal for genetic mapping, gene discovery, germplasm characterization and population genomics [94]. NGS-based SNP discovery is reported in several crops such as wheat [80-82]; Eucalyptus [95]; rice [96]; barley [97]; cotton [98]; soybean [86]; potato [99]; Arabidopsis [100]; maize [101] and several other species. The use of SNP marker panels for genetic analysis has been widely explored in less domesticated crops [102] and trees [103-105]. SNP genotyping in Eucalyptus species is reported from E. grandis [35], E. globulus, E. nitens, E. camaldulensis and E. loxophleba [16], inter-specific hybrids of Eucalyptus [106], E. pilularis [107], E. globulus [108] and E. camaldulensis [41,78]. The SNP frequency in Eucalyptus species is considered to be one of the highest among woody species due to their recent domestication, large population size and outbred mating system [94]. Kulheim and co-workers [16] reported an SNP density of 1/33 bp in E. nitens and 1/31 bp in E. globulus, while in E. camaldulensis and E. loxophleba it was considerably higher at 1/16 bp and 1/17 bp, respectively. However, a later study showed that the SNP frequency was 1/83.9 bp in E. camaldulensis [78]. In the present study, the SNV frequency ranged from 1/78.49 bp to 1/306.72 bp across different genic regions of E. camaldulensis, E. tereticornis and E. grandis. Recently, the SNP frequency in inter-specific hybrids of Eucalyptus was documented as 1/133 bp [109], suggesting that the SNP frequency depends on the target region. In heterozygous species, the SNP frequency is generally high, as documented in pine with 1/102.6 bp [110], grapevine with 1/64 bp [111], maize with 1/60 bp [112] and rye, which registered one SNP per 52 bp [113].

Insertion and deletion polymorphisms (InDels) are an important source of genomic variation in plant and animal genomes. Mechanisms such as insertion and excision of transposable elements, slippage in simple sequence replication, errors in DNA synthesis and repair, recombination and unequal crossover can result in the formation of InDels [114-115]. However, accurate genotyping from low-coverage sequence data can be challenging [116]. Further, polymorphism in short InDels is increasingly being used as an important marker in humans [117], Drosophila melanogaster [118] and G. gallus [119]. Reports on InDel genotyping in plants are limited to rice [120], Arabidopsis thaliana [121], Citrus clementina [122] and Phaseolus vulgaris [123]. In tree species, InDel discovery is reported from Salix spp. [124] and Populus spp. [125-126]. InDel markers for species discrimination have been reported in E. grandis and E. gunnii [39] and Populus spp. [125,127]. In the present study, a high number of InDels in the size range of 1–24 nucleotides were documented in the three Eucalyptus species at a frequency of 332.05, 359.08 and 420.54 bp per InDel in E. tereticornis, E. camaldulensis and E. grandis, respectively. This is higher than the earlier reported InDel frequency of 1.5 InDel/1000 bp [115] in the Eucalyptus genome and 1/2756 bp in an inter-specific hybrid population [109]. Similarly, in Pinus taeda, Kong et al. [128] reported that InDels were infrequent, with only 0.67% frequency in targeted regions. The probable reason for this variance could be the highly divergent genotypes selected in the present study, indicating that InDels could be a useful marker for genetic analysis in Eucalyptus species.
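The InDel frequencies in this comparison are quoted in different units (bp per InDel, InDels per 1000 bp, and 1 InDel per N bp). The short sketch below converts them all to InDels per 1000 bp so they can be compared directly. It uses only the figures quoted in the text, and the variable names are illustrative assumptions.

```python
# bp per InDel reported in the present study (Table 6, totals).
bp_per_indel = {
    "E. tereticornis": 332.05,
    "E. camaldulensis": 359.08,
    "E. grandis": 420.54,
}

# Earlier reports quoted in the text, expressed as InDels per 1000 bp.
earlier_genome_per_kb = 1.5            # 1.5 InDel/1000 bp [115]
earlier_hybrid_per_kb = 1000 / 2756    # 1 InDel per 2756 bp [109]

# Convert the present study's values to InDels per 1000 bp.
indels_per_kb = {sp: 1000 / bp for sp, bp in bp_per_indel.items()}

for species, density in indels_per_kb.items():
    print(f"{species}: {density:.2f} InDels per 1000 bp")
print(f"earlier genome-wide report: {earlier_genome_per_kb:.2f} InDels per 1000 bp")
print(f"earlier hybrid population: {earlier_hybrid_per_kb:.2f} InDels per 1000 bp")
```

On this common scale the three species fall between roughly 2.4 and 3.0 InDels per 1000 bp, above both earlier figures.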

Conclusion

NGS platforms have brought a paradigm shift in understanding different aspects of plant biology, especially in model species and plants with small genomes. Their downstream usefulness in linkage map construction, genetic diversity analyses, association mapping and marker-assisted selection has been demonstrated in several plants [129]. However, sequencing of complete genomes cannot be regularly employed due to high cost and computational limitations in handling large informatics datasets. With the availability of complexity reduction strategies, sequencing of sub-genomic regions by on-array/in-solution target enrichment technology has provided an efficient alternative to amplicon re-sequencing for SNP/SNV discovery [130]. In the present study, this strategy was implemented in re-sequencing ninety-four genes across three Eucalyptus species. This study has also revealed that the target enrichment strategy can be successfully used for identification of markers (SNVs and InDels) for future use in QTL and association mapping studies in Eucalyptus species.

Primer pairs used for RT-qPCR to confirm enrichment of targeted genes. (DOC)

Functional Annotation of selected genes across E. grandis genome sequence using Phytozome v10. (XLSX)

A, Details of SNVs documented in E. tereticornis across reference sequence. B, Details of InDels documented in E. tereticornis across reference sequence. (XLS)

A, Details of SNVs documented in E. camaldulensis across reference sequence. B, Details of InDels documented in E. camaldulensis across reference sequence. (XLS)

A, Details of SNVs documented in E. grandis across reference sequence. B, Details of InDels documented in E. grandis across reference sequence. (XLS)

Presence of SNVs in Pair-wise comparison across three Eucalyptus species. (XLS)

A, Presence of InDels in Pair-wise comparison across E. grandis and E. tereticornis. B, Presence of InDels in Pair-wise comparison across E. grandis and E. camaldulensis. C, Presence of InDels in Pair-wise comparison across E. camaldulensis and E. tereticornis. (XLSX)