Open access to tree genomes: the path to a better forest.

David B Neale, Charles H Langley, Steven L Salzberg, Jill L Wegrzyn.   

Abstract

An open-access culture and a well-developed comparative-genomics infrastructure must be developed in forest trees to derive the full potential of genome sequencing in this diverse group of plants that are the dominant species in much of the earth's terrestrial ecosystems.

Year:  2013        PMID: 23796049      PMCID: PMC3706761          DOI: 10.1186/gb-2013-14-6-120

Source DB:  PubMed          Journal:  Genome Biol        ISSN: 1474-7596            Impact factor:   13.583


Opportunities and challenges in forest tree genomics are seemingly as diverse and as large as the trees themselves. Here, we focus on the significant impact that an open-access culture and a comparative-genomics infrastructure could have on all of tree biology research. In earlier articles [1,2], we argued that the great diversity of forest trees found in both the undomesticated and domesticated state provides an excellent opportunity to understand the molecular basis of adaptation in plants, and furthermore that comparative-genomic approaches will greatly facilitate discovery and understanding. We identified several priority research areas towards realizing these goals (Box 1), such as establishing reference genome sequences for important tree species, determining how to apply sequencing technologies to understand adaptation, and developing resources for storing and accessing forestry data. Significant progress has been made on many of these priorities, with two exceptions: investment in database resources and understanding of ecological function. Here, we briefly summarize the rapid progress in developing genomic resources in a small number of species and then offer our view on what it will take to realize these two remaining priorities.

The great diversity found in forest trees

There are an estimated 60,000 tree species on earth, and approximately 30 of the 49 plant orders contain tree species. Clearly, the tree phenotype has evolved many times in plants. The diversity of plant structures, development, life history, environments occupied and so on in trees is nearly as broad as in higher plants in general, but trees share the common characteristic that all are perennial and many are very long lived. Because of the sessile nature of plants, each tree must survive and reproduce in a specific environment over the seasonal cycles of its lifetime. This tight association between individual genotypes and their environment provides a powerful research setting, just as it has driven the evolution of a plethora of uniquely arboreal adaptations. Understanding these evolutionary strategies is a long-standing area of study for tree biologists, with many broader biological implications. Completed and current genome-sequencing projects in forest trees are limited to about 25 species from just 4 of more than 100 families: Pinaceae (pines, spruces and firs), Salicaceae (poplars and willows), Myrtaceae (eucalyptus) and Fagaceae (oaks, chestnuts and beeches). Large-scale sequencing projects such as the 1000 Human Genomes [3], 1000 Plant Genomes (1KP) [4] or the 5000 Insect Genome (i5k) [5] projects have not yet been proposed for forest trees.

Rapidly developing genomic resources in forest trees

Genome resources are developing rapidly in forest trees in spite of the challenges associated with working with large, long-lived organisms and sometimes very large genomes [2]. Complete genome sequencing, however, has been slow to advance in forest trees owing to funding limitations and the large size of conifer genomes. Black cottonwood (Populus trichocarpa Torr. & Gray) was the first forest tree to have its genome sequenced, by the US Department of Energy Joint Genome Institute (DOE/JGI) [6] (Table 1). Black cottonwood has a relatively small genome (450 Mb) and is a target feedstock species for cellulosic ethanol production, and thus fits the DOE/JGI priority of sequencing bioenergy feedstock species. The genus Populus has 30+ species (aspens and cottonwoods) with genome sizes of approximately 500 Mb. Several species are being sequenced by DOE/JGI and by other groups around the world, and it seems likely that all members of the genus will soon have a genome sequence (Table 1). The next forest tree to be sequenced was the flooded gum (Eucalyptus grandis BRASUZ1, a member of the Myrtaceae family), again by DOE/JGI. Eucalyptus species and their hybrids are important commercial species grown in their native Australia and in many regions throughout the southern hemisphere. Several more eucalyptus species are being sequenced (Table 1), each with a relatively small genome (500 Mb), but it will probably take many years before all 700+ members of this genus are completed. Several members of the Fagaceae family are now being sequenced (Table 1). Members of this group include the oaks, beeches and chestnuts, with genome sizes of less than 1 Gb.
Table 1

Genome resources in forest trees

Family | Genus and species | Sequence access | Genome | Related publications
Pinaceae | Pinus taeda (loblolly pine) | [14,19] | Resequenced amplicons |
 |  | [20] | BACs | [21,22]
 |  | [14,19] | Fosmids |
 |  | [23] | Draft genome complete |
 | Pinus lambertiana (sugar pine) | [14,19] | Resequenced amplicons | [24]
 |  | [23] | Draft genome (in progress) |
 | Pseudotsuga menziesii (Douglas-fir) | [14,19] | Resequenced amplicons | [25]
 |  | [23] | Draft genome (in progress) |
 | Pinus sylvestris (Scots pine) | [26] | Draft genome (in progress) |
 | Pinus pinaster (maritime pine) | [26] | Draft genome (in progress) |
 |  | [14,19] | Resequenced amplicons | [27]
 | Pinus sibirica (Siberian pine) | [28] | Draft genome (in progress) |
 | Pinus radiata (Monterey pine) |  | Draft genome (in progress) |
 |  | [14,19] | Resequenced amplicons | [29]
 | Picea abies (Norway spruce) | [14,19] | Resequenced amplicons | [30]
 |  | [31] | Draft genome complete (restricted) |
 | Picea glauca (white spruce) | [32] | Draft genome complete |
 |  | [14,19] | BACs | [33]
 | Larix sibirica (Siberian larch) | [28] | Draft genome (in progress) |
Salicaceae | Populus trichocarpa (black cottonwood) | [34] | Genome complete | [6]
 |  | [19] | BACs | [35]
 |  |  | Genome resequencing (restricted) | [36]
 | Populus tremula (European aspen) | [37] | Draft genome complete |
 | Populus tremula x tremuloides | [37] | Draft genome complete |
 | Populus tremuloides (quaking aspen) | [37] | Draft genome complete |
 |  | [19] | BACs | [38]
 | Populus grandidentata (bigtooth aspen) | [37] | Draft genome complete |
 | Populus nigra (black poplar) | [39] | Draft genome complete (restricted) |
 | Salix purpurea (purpleosier willow) | [40] | Draft genome complete (restricted) |
Myrtaceae | Eucalyptus grandis (rose gum) | [19] | BACs | [41]
 |  | [34] | Draft genome complete |
 | Eucalyptus globulus (blue gum) | [42] | Draft genome (in progress) |
 | Eucalyptus camaldulensis (river red gum) | [43] | Draft genome complete | [44]
 | Corymbia citriodora (lemon-scented gum) | [45] | Draft genome complete (restricted) |
Fagaceae | Quercus robur (English oak) | [19] | BACs | [46,47]
 |  | [48] | Draft genome (in progress) |
 | Castanea mollissima (Chinese chestnut) | [49] | Draft genome (in progress) |
 |  | [50] | BACs | [51]
Betulaceae | Betula nana (dwarf birch) | [52] | Draft genome complete | [53]
Oleaceae | Fraxinus excelsior (European ash) | [54] | Draft genome complete |

Details current genome sequencing projects in forest trees with sequence access information and relevant publications.

The gymnosperm forest trees (such as the conifers) were the last to enter the world of genome sequencing. The delay was due entirely to their very large genomes (10 Gb and greater), not to any lack of importance: they are extremely important economically and ecologically, and phylogenetically they represent the ancient sister lineage to that of angiosperm species. The genome resources needed to support a sequencing project were reasonably well developed, but it was not until the introduction of next-generation sequencing (NGS) technologies that sequencing conifer genomes became tractable. Currently, there are at least ten conifer (Pinaceae) genome-sequencing projects under way (Table 1). Aside from reference genome sequencing in forest trees, there is significant activity in transcriptome sequencing and resequencing for polymorphism discovery (Tables 2 and 3). In Tables 2 and 3, we have listed only the transcriptome and resequencing projects associated with a species that has an active genome-sequencing project (Table 1).
Table 2

Transcriptome resources in forest trees

Family | Genus and species | Sequence access | Transcriptome | Related publications
Pinaceae | Pinus taeda (loblolly pine) | [14,19,55] | EST sequencing (Sanger) | [56-59]
 |  | [14,19,60] | EST sequencing (454) | [60]
 |  | [19] | Exome resequencing | [61]
 | Pinus lambertiana (sugar pine) | [14,19,60] | EST sequencing (454) | [60]
 | Pseudotsuga menziesii (Douglas-fir) | [14,19] | EST sequencing (Sanger) |
 |  | [14,19,62] | EST sequencing (454) | [60,63,64]
 | Pinus sylvestris (Scots pine) | [14,19,55] | EST sequencing (Sanger) |
 | Pinus pinaster (maritime pine) | [14,19,65] | EST sequencing (Sanger/454) | [65,66]
 | Pinus radiata (Monterey pine) | [14,19,55] | EST sequencing (Sanger) | [67-72]
 | Picea abies (Norway spruce) | [14,19,60] | EST sequencing (454) | [60]
 |  | [19] | EST sequencing (Next-Gen) | [73]
 | Picea glauca (white spruce) | [19] | UniGenes (Sanger/454) | [74]
Salicaceae | Populus trichocarpa (black cottonwood) | [14,19,75] | EST sequencing (Sanger) | [75-78]
 |  | [19] | Exon capture | [79]
 |  | [14,19] | UniGenes (Sanger) | [80]
 | Populus tremula (European aspen) | [14,19,75] | EST sequencing (Sanger) | [75,81]
 | Populus tremula x tremuloides | [14,19,75] | EST sequencing (Sanger) | [75,76]
 | Populus tremuloides (quaking aspen) | [14,19] | EST sequencing (Next-Gen) | [82]
 | Populus nigra (black poplar) | [14,19] | UniGenes (Sanger) | [83]
Myrtaceae | Eucalyptus grandis (rose gum) | [14,19] | EST sequencing (Sanger) | [84-86]
 |  | [14,19] | EST sequencing (Next-Gen) | [87]
 | Eucalyptus globulus (blue gum) | [14,19] | EST sequencing (Next-Gen) | [41,88]
 | Eucalyptus camaldulensis (river red gum) | [14,19] | EST sequencing (RNA-Seq) | [89]
Fagaceae | Quercus robur (English oak) | [19,90] | EST sequencing (454) | [91]
 | Castanea mollissima (Chinese chestnut) | [50] | EST sequencing (454) | [92,93]
Oleaceae | Fraxinus excelsior (European ash) | [19] | EST sequencing (454) | [94]

Details current transcriptome sequencing projects in forest trees with sequence access information and relevant publications.

Table 3

Polymorphism resources in forest trees

Family | Genus and species | SNP access | Polymorphism | Related publications
Pinaceae | Pinus taeda (loblolly pine) | [14,19] | GoldenGate & Infinium iSelect | [95,96]
 | Pinus lambertiana (sugar pine) | [14] | GoldenGate array | [24]
 | Pseudotsuga menziesii (Douglas-fir) | [14] | GoldenGate array | [25]
 |  | [19] | Infinium iSelect | [64]
 | Pinus sylvestris (Scots pine) | [14] | GoldenGate array |
 | Pinus pinaster (maritime pine) | [14] | GoldenGate array | [27,90]
 | Pinus radiata (Monterey pine) | [14] | GoldenGate array | [29]
 | Picea abies (Norway spruce) | [14] | GoldenGate array | [30]
 | Picea glauca (white spruce) | [19] | GoldenGate & Infinium iSelect | [97,98]
Salicaceae | Populus trichocarpa (black cottonwood) | [19,99] | Infinium iSelect | [100]
 |  |  | Infinium iSelect (restricted) | [36]
 |  | [14] | GoldenGate array | [101]
 |  | [19,99] | SNP assay | [102]
 | Populus nigra (black poplar) | [14] | GoldenGate array | [103]
Myrtaceae | Eucalyptus grandis (rose gum) | [104] | DArT high-density array | [105]
 |  | [106] | GoldenGate array | [106]
 | Eucalyptus camaldulensis (river red gum) |  | RNA-Seq SNP discovery (restricted) | [107]

Details current genotyping projects in forest trees with data access information and relevant publications.

The opportunity for comparative-genomic approaches in forest trees

The power of comparative-genomic approaches for understanding function in an evolutionary framework is well established [7-13]. Comparative genomics can be applied to sequence data (nucleotide and protein) at the level of individual genes or genome-wide. Genome-wide approaches provide insight into both chromosome evolution and the diversification of biological functions and interactions. Understanding of gene function in forest tree species is challenged by the lack of standard reverse-genetic tools routinely used in other systems - for example, standard marker stocks, facile transformation and regeneration - and by the long generation times. Thus, comparative genomics becomes the more powerful approach to understanding gene function in trees. Comparative genomics requires not only data availability but also cyber-infrastructure to support exchange and analysis. The TreeGenes database is the most comprehensive resource for comparative-genomic analyses in forest trees [14]. Several smaller databases have been created to facilitate collaborations, including: Fagaceae genomics web, hardwoodgenomics.org, Quercus portal, PineDB, ConiferGDB, EuroPineDB, PopulusDB, PoplarDB, EucalyptusDB and Eucanext (Tables 1, 2, and 3). These resources vary greatly in their scope, relevance and integration. Some are static and archival, whereas others focus on current sequence content for a specific species or a small number of related species. The result is overlapping and conflicting data among repositories. In addition, each database uses its own custom interface and back-end database technology to serve sequence data to the user. US National Science Foundation funding for large-scale infrastructure projects, such as iPlant, is supporting efforts to centralize resources for research communities [15].
Without centralized resources, researchers are forced to employ inefficient data-mining methods through queries of independently maintained databases or inconsistently formatted supplemental files on journal websites. Specific areas of interest for the forest tree genomic community include the ability to connect sequence, genotype and phenotype to individual, geo-referenced trees. This type of integration can only be achieved through web services that allow disparate resources to communicate in ways that are transparent to the user [16]. With the recent increase of genome sequences available for many of these species, there is a need to facilitate community-level annotation and research support.
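To make the integration goal concrete, the following is a minimal sketch, not a description of TreeGenes or any existing service, of how sequence, genotype and phenotype records held by separate sources could be joined on a shared tree identifier. All class names, field names, identifiers and values are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Tree:
    """One geo-referenced individual tree (hypothetical schema)."""
    tree_id: str
    species: str
    latitude: float
    longitude: float


@dataclass
class TreeRecord:
    """Everything known about one tree, merged across sources."""
    tree: Tree
    sequences: List[str] = field(default_factory=list)          # accession IDs
    genotypes: Dict[str, str] = field(default_factory=dict)     # marker -> call
    phenotypes: Dict[str, float] = field(default_factory=dict)  # trait -> value


def integrate(
    trees: Iterable[Tree],
    sequence_rows: Iterable[Tuple[str, str]],
    genotype_rows: Iterable[Tuple[str, str, str]],
    phenotype_rows: Iterable[Tuple[str, str, float]],
) -> Tuple[Dict[str, TreeRecord], List[str]]:
    """Join rows from independent sources on tree_id.

    Returns the merged records plus the tree IDs that appeared in a
    source but matched no known tree, so conflicts are visible.
    """
    records = {t.tree_id: TreeRecord(tree=t) for t in trees}
    unmatched: List[str] = []

    for tree_id, accession in sequence_rows:
        if tree_id in records:
            records[tree_id].sequences.append(accession)
        else:
            unmatched.append(tree_id)

    for tree_id, marker, call in genotype_rows:
        if tree_id in records:
            records[tree_id].genotypes[marker] = call
        else:
            unmatched.append(tree_id)

    for tree_id, trait, value in phenotype_rows:
        if tree_id in records:
            records[tree_id].phenotypes[trait] = value
        else:
            unmatched.append(tree_id)

    return records, unmatched


if __name__ == "__main__":
    # Placeholder data only; none of these values come from a real database.
    demo_trees = [Tree("tree-001", "Pinus taeda", 0.0, 0.0)]
    merged, missing = integrate(
        demo_trees,
        sequence_rows=[("tree-001", "ACC-PLACEHOLDER-1")],
        genotype_rows=[("tree-001", "snp-placeholder-1", "A/G")],
        phenotype_rows=[("tree-999", "height_m", 12.0)],
    )
    print(merged["tree-001"])
    print("unmatched tree IDs:", missing)
```

The sketch joins everything in memory for brevity; in practice each source would sit behind its own web service, and the shared tree identifier is the piece that would need community agreement.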

The need for a better-developed open-access culture in forest tree genomics research

The Human Genome Project established a culture of open access and data sharing in genomics research for both humans and animal models that has been extended to many other species, including Arabidopsis, rat, cow, dog, rice, maize and more than 500 other eukaryotes. Beginning in the late 1990s, these large-scale projects released data very rapidly to the scientific community, often years before publication. This rapid release of data with few restrictions has allowed thousands of scientists to begin work on specific genes and gene families, and on functional studies, long before the genome papers have appeared. One of the driving motivations for this culture, and the reason that many scientists support it, is that large-scale sequencing can be done most efficiently when centers that have expertise in sequencing technology take the lead. With all the sequencing concentrated, the body of data needs to be shared freely in order to get it in the hands of the widely distributed experts. This open-access culture has dramatically accelerated scientific progress in biological research.

The path to success avoids delays

Careful inspection of Table 1 reveals that forest tree genome projects are very slow to release sequence data into the public domain. Once a project is finished and submitted for publication, a draft genome becomes available - for example, the poplar genome was released and published in 2006. However, pre-publication releases are infrequent, the exceptions being the PineRefSeq project, which has made three releases, and the SMarTForest project, which has made one (Table 1). This is unfortunate because good-quality sequence contigs and scaffolds could be made available years before publication, delivering an extremely important resource to the community. Such delay would be understandable for privately financed projects seeking commercial advantage, but nearly all the projects listed in Table 1 are financed by public funds whose stated mission is advancing science and developing community resources. Publication rights are easily protected by data-use policy statements such as the Fort Lauderdale [17] and Toronto [18] agreements, but unfortunately these conventions are not often used, and data access is restricted by password-protected websites (Tables 1, 2, and 3). We hope the opinion offered here will lead to a discussion in the forest tree community, to a more open-access culture and thus to a more vibrant and rapidly advancing research area.

Box 1

Research priorities in forest tree genomics identified in earlier Opinion papers.

From Neale and Ingvarsson [1]:
• Deep expressed-sequence tag (EST) sequencing in many species
• Comparative resequencing in many species
• Reference genome sequence for pine

From Neale and Kremer [2]:
• Reference genome sequences for several important species
• Greater investment in diverse species towards understanding ecological function
• Application of next-generation sequencing technologies to understand adaptation using landscape genomic approaches
• Greater investment in database resources and cyber-infrastructure development
• Development of new and high-throughput phenotyping technologies

Abbreviations

EST: expressed-sequence tag; Mb: mega-base; NGS: next-generation sequencing.

Competing interests

The authors declare that they have no competing interests.
  74 in total

1.  Back to nature: ecological genomics of loblolly pine (Pinus taeda, Pinaceae).

Authors:  Andrew J Eckert; Andrew D Bower; Santiago C González-Martínez; Jill L Wegrzyn; Graham Coop; David B Neale
Journal:  Mol Ecol       Date:  2010-08-13       Impact factor: 6.185

2.  Human and mouse gene structure: comparative analysis and application to exon prediction.

Authors:  S Batzoglou; L Pachter; J P Mesirov; B Berger; E S Lander
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

3.  In vitro vs in silico detected SNPs for the development of a genotyping array: what can we learn from a non-model species?

Authors:  Camille Lepoittevin; Jean-Marc Frigerio; Pauline Garnier-Géré; Franck Salin; María-Teresa Cervera; Barbara Vornam; Luc Harvengt; Christophe Plomion
Journal:  PLoS One       Date:  2010-06-09       Impact factor: 3.240

4.  Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome.

Authors:  Patricia Faivre Rampant; Isabelle Lesur; Clément Boussardon; Frédérique Bitton; Marie-Laure Martin-Magniette; Catherine Bodénès; Grégoire Le Provost; Hélène Bergès; Sylvia Fluch; Antoine Kremer; Christophe Plomion
Journal:  BMC Genomics       Date:  2011-06-06       Impact factor: 3.969

5.  Transcriptome profiling of Pinus radiata juvenile wood with contrasting stiffness identifies putative candidate genes involved in microfibril orientation and cell wall mechanics.

Authors:  Xinguo Li; Harry X Wu; Simon G Southerton
Journal:  BMC Genomics       Date:  2011-10-01       Impact factor: 3.969

6.  Chestnut resistance to the blight disease: insights from transcriptome analysis.

Authors:  Abdelali Barakat; Meg Staton; Chun-Huai Cheng; Joseph Park; Norzawani Buang M Yassin; Stephen Ficklin; Chia-Chun Yeh; Fred Hebard; Kathleen Baier; William Powell; Stephan C Schuster; Nicholas Wheeler; Albert Abbott; John E Carlson; Ronald Sederoff
Journal:  BMC Plant Biol       Date:  2012-03-19       Impact factor: 4.215

7.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

8.  Xylem transcription profiles indicate potential metabolic responses for economically relevant characteristics of Eucalyptus species.

Authors:  Marcela Mendes Salazar; Leandro Costa Nascimento; Eduardo Leal Oliveira Camargo; Danieli Cristina Gonçalves; Jorge Lepikson Neto; Wesley Leoricy Marques; Paulo José Pereira Lima Teixeira; Piotr Mieczkowski; Jorge Maurício Costa Mondego; Marcelo Falsarella Carazzolle; Ana Carolina Deckmann; Gonçalo Amarante Guimarães Pereira
Journal:  BMC Genomics       Date:  2013-03-22       Impact factor: 3.969

9.  Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms.

Authors:  Jun Chen; Severin Uebbing; Niclas Gyllenstrand; Ulf Lagercrantz; Martin Lascoux; Thomas Källman
Journal:  BMC Genomics       Date:  2012-11-02       Impact factor: 3.969

10.  Comparison of the transcriptomes of American chestnut (Castanea dentata) and Chinese chestnut (Castanea mollissima) in response to the chestnut blight infection.

Authors:  Abdelali Barakat; Denis S DiLoreto; Yi Zhang; Chris Smith; Kathleen Baier; William A Powell; Nicholas Wheeler; Ron Sederoff; John E Carlson
Journal:  BMC Plant Biol       Date:  2009-05-09       Impact factor: 4.215

