
Maize genomes to fields (G2F): 2014-2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets.

Bridget A McFarland1, Naser AlKhalifah1, Martin Bohn2, Jessica Bubert2, Edward S Buckler3,4, Ignacio Ciampitti5, Jode Edwards4,6, David Ertl7, Joseph L Gage3, Celeste M Falcon1, Sherry Flint-Garcia4,8, Michael A Gore3, Christopher Graham9, Candice N Hirsch10, James B Holland4,11, Elizabeth Hood12, David Hooker13, Diego Jarquin14, Shawn M Kaeppler1, Joseph Knoll4, Greg Kruger14, Nick Lauter4,6, Elizabeth C Lee13, Dayane C Lima1, Aaron Lorenz10, Jonathan P Lynch15, John McKay16, Nathan D Miller1, Stephen P Moose2, Seth C Murray17, Rebecca Nelson3, Christina Poudyal10, Torbert Rocheford18, Oscar Rodriguez14, Maria Cinta Romay3, James C Schnable14, Patrick S Schnable6, Brian Scully4,19, Rajandeep Sekhon20, Kevin Silverstein10, Maninder Singh21, Margaret Smith3, Edgar P Spalding1, Nathan Springer10, Kurt Thelen21, Peter Thomison22, Mitchell Tuinstra18, Jason Wallace23, Ramona Walls24, David Wills8, Randall J Wisser25, Wenwei Xu17, Cheng-Ting Yeh6, Natalia de Leon26.   

Abstract

OBJECTIVES: Advanced tools and resources are needed to efficiently and sustainably produce food for an increasing world population in the context of variable environmental conditions. The maize genomes to fields (G2F) initiative is a multi-institutional effort that seeks to approach this challenge by developing a flexible and distributed infrastructure addressing emerging problems. G2F has generated large-scale phenotypic, genotypic, and environmental datasets using publicly available inbred lines and hybrids evaluated through a network of collaborators that are part of the G2F genotype-by-environment (G × E) project. This report covers the public release of datasets for 2014-2017. DATA DESCRIPTION: Datasets include inbred genotypic information; phenotypic, climatic, and soil measurements; and metadata for each testing location across years. For a subset of inbreds in 2014 and 2015, yield component phenotypes were quantified by image analysis. Released data are accompanied by README descriptions. For genotypic and phenotypic data, both raw data and a version without outliers are reported. For climatic data, a version calibrated to the nearest airport weather station and a version without outliers are reported. The 2014 and 2015 datasets are updated versions of the previously released files [1], while the 2016 and 2017 datasets are newly available to the public.

Keywords:  Environment; Field metadata; GBS; Genome; Genotype; G × E; Hybrid; Inbred; Maize; Phenotype

Year:  2020        PMID: 32051026      PMCID: PMC7017475          DOI: 10.1186/s13104-020-4922-8

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


Objective

Genomes to fields (G2F) is a multi-institutional, public collaborative to develop information and tools that support the translation of maize (Zea mays L.) genomic information into relevant phenotypes for the benefit of growers, consumers, and society. Building on existing maize genome sequence resources, the project focuses on developing approaches to improve phenomic predictability and facilitate the development and deployment of tools and resources that help address fundamental problems of sustainable agricultural productivity. Specific projects within G2F involve collaboration from research fields such as genetics, genomics, plant physiology, agronomy, climatology and crop modeling, computational sciences, statistics, and engineering. As part of this effort, the G2F G × E project has collected, utilized, and shared multi-year, large-scale genotypic, phenotypic, environmental, and metadata datasets. The datasets described here were generated using standard formats between 2014 and 2017. For each of the testing locations, metadata and soil characterization are also included. During these four growing seasons, over 55,000 plots across 68 unique locations were used to evaluate inbred and hybrid plants. The resulting datasets are unique as they represent, to our knowledge, the most extensive publicly available datasets of their kind in maize, reporting a consistent set of traits across common sets of fully genotyped germplasm across many locations, along with relevant information reported down to the level of specific plots. Making these datasets publicly available is expected to enable researchers to conduct novel data analyses and develop tools using the curated and organized data described here. The 2014 and 2015 datasets are recently updated versions from previously released files (AlKhalifah et al. in BMC Res Notes 11:452, 2018) while 2016 and 2017 datasets are newly available to the public.

Data description

Online forms were developed for logging field site coordinates, field management metadata, and other site-specific information. Datasets include:

Genotypic information for inbreds (with and without imputation): This includes single nucleotide polymorphism (SNP) information generated using a genotyping-by-sequencing (GBS) method [2] for the inbreds used to produce the hybrids tested across all locations. Data are formatted to be readily analyzed using the TASSEL software [3].

Phenotypic measurements for inbreds and hybrids: A handbook of instructions for making traditional phenotypic measurements (reviewed in [4]) is available via the G2F website [5]. Standard traits include stand count, stalk lodging, root lodging, days to anthesis, days to silking, ear height, plant height, plot weight, grain moisture, test weight, and estimated grain yield. Datatypes reported as both raw files and files with outliers removed are described in README files. Additionally, a set of ear, cob, and kernel measurements was made using flatbed scanners and a machine vision platform to quantify components of yield [6]. These data are reported in millimeters, with shape descriptors reported as principal components of contour data points. Cob color is reported as RGB (red/green/blue) pixel values. Kernel row number, counted manually, is reported as an integer.

Environmental data: Data were collected using WatchDog 2700 weather stations (Spectrum Technologies) measuring at 30-min intervals from planting through harvest at each location. Collected information includes wind speed, direction, and gust; air temperature, dewpoint, and relative humidity; rainfall; and photoperiod. Data are reported based on calibration derived from nearby National Weather Service (NWS) Automated Surface Observing Systems (ASOS) airport weather stations and cleaned by removing obvious artifacts from the calibrated dataset.

Soil characterizations: Information was first collected in 2015. Measurements include plow depth, pH, buffered pH, organic matter, texture, and nitrogen, phosphorus, potassium, sulfur, and sodium levels (in parts per million).

The previously released 2014 and 2015 datasets have been updated through additional quality control of the phenotypic and environmental datasets, the addition of missing site-specific field information, and an update of the genotypic data to version 4 of the B73 reference genome. The 2014–2017 datasets are publicly available via CyVerse/iPlant [7] with files and access links as shown in Table 1.
Table 1

Overview of data file/data set

| Label | Name of data file/data set | File types (Extension) | Data repository and identifier |
|---|---|---|---|
| 2014 Planting season | _readme.txt | .txt | CyVerse [8] (10.25739/9wjm-eq41) |
| | /a._2014_hybrid_phenotypic_data | directory | |
| | g2f_2014_hybrid_data_clean.csv | .csv | |
| | g2f_2014_hybrid_raw.csv | .csv | |
| | /b._2014_weather_data | directory | |
| | g2f_2014_weather.csv | .csv | |
| | /c._2014_inbred_phenotypic_data | directory | |
| | g2f_2014_hybrid_data_clean.csv | .csv | |
| | g2f_2014_hybrid_raw.csv | .csv | |
| | /z._2014_supplemental_info | directory | |
| | g2f_2014_field_characteristics.csv | .csv | |
| 2015 Planting season | _readme.txt | .txt | CyVerse [9] (10.25739/kjsn-dz84) |
| | /a._2015_hybrid_phenotypic_data | directory | |
| | g2f_2015_hybrid_data_clean.csv | .csv | |
| | g2f_2015_hybrid_raw.csv | .csv | |
| | /b._2015_weather_data | directory | |
| | g2f_2015_weather.csv | .csv | |
| | /c._2015_inbred_phenotypic_data | directory | |
| | g2f_2015_hybrid_data_clean.csv | .csv | |
| | g2f_2015_hybrid_raw.csv | .csv | |
| | /d._2015_soil_data | directory | |
| | g2f_2016_soil_data.txt | .txt | |
| | g2f_2016_soil_data.csv | .csv | |
| | z._2015_supplemental_info | directory | |
| | g2f_2015_cooperator_list.csv | .csv | |
| | g2f_2015_field_irrigation.csv | .csv | |
| | g2f_2015_field_metadata.csv | .csv | |
| | g2f_2015_supplemental_information.txt | .csv | |
| 2016 Planting season | _readme.txt | .txt | CyVerse [10] (10.25739/yjnh-kt21) |
| | /a._2016_hybrid_phenotypic_data | directory | |
| | g2f_2016_hybrid_data_clean.csv | .csv | |
| | g2f_2016_hybrid_raw.csv | .csv | |
| | /c._2016_weather_data | directory | |
| | g2f_2016_weather.csv | .csv | |
| | /c._2016_soil_data | directory | |
| | g2f_2016_soil_data.txt | .txt | |
| | g2f_2016_soil_data_clean.csv | .csv | |
| | g2f_2016_soil_data_raw.csv | .csv | |
| | /z._2016_supplemental_info | directory | |
| | g2f_2016_supplemental_information.txt | .txt | |
| | g2f_2016_agronomic_information.csv | .csv | |
| | g2f_2016_cooperators_list.csv | .csv | |
| | g2f_2016_field_metadata.csv | .csv | |
| 2017 Planting season | _readme.txt | txt | CyVerse [11] (10.25739/w560-2114) |
| | /a._2017_hybrid_phenotypic_data | directory | |
| | g2f_2017_hybrid_data_clean.csv | .csv | |
| | g2f_2017_hybrid_data_raw.csv | .csv | |
| | /b._2017_weather_data | directory | |
| | g2f_2017_weather_data.csv | .txt | |
| | /c._2017_soil_data | directory | |
| | g2f_2017_soil_data.txt | .txt | |
| | g2f_2017_soil_data_clean.csv | .csv | |
| | g2f_2017_soil_data_raw.csv | .csv | |
| | /d._2017_genotypic_data | directory | |
| | g2f_2017_gbs_hybrid_codes.xlsx | .xlsx | |
| | g2f_2017_ZeaGBSv27_Imputed_ABPv4.h5 | .h5 | |
| | g2f_2017_ZeaGBSv27_Imputed_ABPv4.h5.zip | .zip | |
| | g2f_2017_ZeaGBSv27_Raw_ABPv4.h5 | .h5 | |
| | g2f_2017_ZeaGBSv27_Raw_ABPv4.h5.zip | .zip | |
| | /z._2017_supplemental_info | directory | |
| | g2f_2017_supplemental_information.txt | .txt | |
| | g2f_2017_agronomic_information.csv | .csv | |
| | g2f_2017_cooperators_list.csv | .csv | |
| | g2f_2017_field_metadata.csv | .csv | |
| 2014 and 2015 Inbred ear imaging | _readme.txt | .txt | CyVerse [12] (10.7946/P2C34P) |
| | 2014_2015_compiledData.tar.gz | .tar.gz | |
| | 2014_gxe_compiledDataAndFileNames.csv | .csv | |
| | 2014_gxe_compiledDataAndFileNames_Raw.csv | .csv | |
| | 2015_gxe_compiledDataAndFileNames.csv | .csv | |
| | 2015_gxe_compiledDataAndFileNames_Raw.csv | .csv | |
| | CEK_Data_Files.tar.gz | .tar.gz | |
| | /cob | directory | |
| | _cob.txt | txt | |
| | cob.tar.gz | .tar.gz | |
| | cob_01of05.tar.gz | .tar.gz | |
| | cob_02of05.tar.gz | .tar.gz | |
| | cob_03of05.tar.gz | .tar.gz | |
| | cob_04of05.tar.gz | .tar.gz | |
| | cob_05of05.tar.gz | .tar.gz | |
| | /ear | directory | |
| | _ear.txt | .txt | |
| | ear.tar.gz | tar.gz | |
| | ear_01of08.tar.gz | tar.gz | |
| | ear_02of08.tar.gz | tar.gz | |
| | ear_03of08.tar.gz | tar.gz | |
| | ear_04of08.tar.gz | tar.gz | |
| | ear_05of08.tar.gz | tar.gz | |
| | ear_06of08.tar.gz | tar.gz | |
| | ear_07of08.tar.gz | tar.gz | |
| | ear_08of08.tar.gz | tar.gz | |
| | /kernel | directory | |
| | _kernel.txt | .txt | |
| | kernel.tar.gz | tar.gz | |
| | kernel_01of05.tar.gz | tar.gz | |
| | kernel_02of05.tar.gz | tar.gz | |
| | kernel_03of05.tar.gz | tar.gz | |
| | kernel_04of05.tar.gz | tar.gz | |
| | kernel_05of05.tar.gz | tar.gz | |

As the number of collaborators, plots evaluated, and research questions across this project grows, the variety and depth of data collected are also expected to increase. Several projects have utilized aspects of these datasets [13-16], and more are in preparation. The potential scope of application for these data is broad, and they are anticipated to impact the field simply by being the first public dataset of this scale collected and reported in the crop sciences using standardized protocols and formats, thus defining standards for data collection, formatting, and access for maize and other species.
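The weather files hold records taken at 30-min intervals, so a common first step is to collapse them to one row per station and day. The following is a minimal sketch of that aggregation using only the Python standard library. The column names (`station_id`, `date`, `rainfall_mm`, `temperature_c`) and the sample rows are hypothetical and are not taken from the released files; the README that accompanies each weather file should be consulted for the actual headers and units. Treating blank cells as missing observations is likewise an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical header and rows for illustration only; the real G2F weather
# files may use different column names and units (see the dataset README).
SAMPLE = """station_id,date,time,rainfall_mm,temperature_c
S1,2016-05-01,00:00,0.0,12.5
S1,2016-05-01,00:30,1.2,12.1
S1,2016-05-01,01:00,,11.8
S1,2016-05-02,00:00,0.4,14.0
"""


def daily_summary(handle):
    """Return {(station, date): (total rainfall, mean temperature)}.

    A value is None when no observation was recorded for that day.
    """
    rain = defaultdict(list)
    temp = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["station_id"], row["date"])
        for column, store in (("rainfall_mm", rain), ("temperature_c", temp)):
            value = (row[column] or "").strip()
            if value:  # assumption: a blank cell is a missing reading, not zero
                store[key].append(float(value))

    summary = {}
    for key in sorted(set(rain) | set(temp)):
        total_rain = sum(rain[key]) if rain[key] else None
        mean_temp = sum(temp[key]) / len(temp[key]) if temp[key] else None
        summary[key] = (total_rain, mean_temp)
    return summary


result = daily_summary(io.StringIO(SAMPLE))
for (station, date), (total_rain, mean_temp) in result.items():
    print(station, date, total_rain, mean_temp)
```

Rainfall is summed and temperature is averaged because the former is an accumulation and the latter is a state variable; other columns (for example wind direction) would need their own aggregation rule.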

Limitations

These datasets contain missing data. In the phenotypic and genotypic datasets, missing values are left blank rather than coded as ‘null’ or zero, to avoid interfering with software compatibility and interpretation. The only exception is for traits extracted from the 2014 and 2015 ear imaging data, where missing values are marked ‘NA’. For weather datasets, raw files reported by sensors are not provided because machine data were calibrated based on information from nearby weather stations to ensure accuracy (e.g., if the wind vane was set improperly, a calibration correction was required). Instead, only the cleaned version of the file is reported to reduce misinterpretation. Field sites are not in identical geographic positions across years because of crop rotation management practices; GPS coordinates are therefore reported along with the field location code. While the germplasm used in the experiments is publicly accessible, it was not generated directly by national public genebanks. Seed access and availability are handled by the G2F collaborators directly.
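Because missing values appear as blank cells in most files but as ‘NA’ in the ear imaging files, a reader that accepts both conventions avoids silently converting gaps into zeros. The sketch below shows one way to do this with the Python standard library. The column names (`plot_id`, `plant_height_cm`) and sample rows are hypothetical and chosen only for illustration.

```python
import csv
import io

# Blank cells (phenotypic/genotypic files) and 'NA' (ear imaging files)
# are both treated as missing.
MISSING_TOKENS = {"", "NA"}

# Hypothetical header and rows; real files will have different columns.
SAMPLE = """plot_id,plant_height_cm,grain_yield
P1,210,
P2,,9.8
P3,NA,10.4
"""


def read_numeric_column(handle, column, id_column="plot_id"):
    """Return {id: float or None}, mapping missing cells to None."""
    values = {}
    for row in csv.DictReader(handle):
        cell = (row[column] or "").strip()
        values[row[id_column]] = None if cell in MISSING_TOKENS else float(cell)
    return values


def mean_observed(values):
    """Mean over non-missing entries, or None if every entry is missing."""
    observed = [v for v in values.values() if v is not None]
    return sum(observed) / len(observed) if observed else None


heights = read_numeric_column(io.StringIO(SAMPLE), "plant_height_cm")
print(heights)
print(mean_observed(heights))
```

Mapping missing cells to `None` rather than `0.0` keeps them out of means and totals, which matches the stated intent of leaving those cells blank.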
  7 in total

1.  TASSEL: software for association mapping of complex traits in diverse samples.

Authors:  Peter J Bradbury; Zhiwu Zhang; Dallas E Kroon; Terry M Casstevens; Yogesh Ramdoss; Edward S Buckler
Journal:  Bioinformatics       Date:  2007-06-22       Impact factor: 6.937

Review 2.  The Quest for Understanding Phenotypic Variation via Integrated Approaches in the Field Environment.

Authors:  Duke Pauli; Scott C Chapman; Rebecca Bart; Christopher N Topp; Carolyn J Lawrence-Dill; Jesse Poland; Michael A Gore
Journal:  Plant Physiol       Date:  2016-08-01       Impact factor: 8.340

3.  A robust, high-throughput method for computing maize ear, cob, and kernel attributes automatically from images.

Authors:  Nathan D Miller; Nicholas J Haase; Jonghyun Lee; Shawn M Kaeppler; Natalia de Leon; Edgar P Spalding
Journal:  Plant J       Date:  2016-11-19       Impact factor: 6.417

4.  A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species.

Authors:  Robert J Elshire; Jeffrey C Glaubitz; Qi Sun; Jesse A Poland; Ken Kawamoto; Edward S Buckler; Sharon E Mitchell
Journal:  PLoS One       Date:  2011-05-04       Impact factor: 3.240

5.  The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.

Authors:  Nirav Merchant; Eric Lyons; Stephen Goff; Matthew Vaughn; Doreen Ware; David Micklos; Parker Antin
Journal:  PLoS Biol       Date:  2016-01-11       Impact factor: 8.029

6.  Maize Genomes to Fields: 2014 and 2015 field season genotype, phenotype, environment, and inbred ear image datasets.

Authors:  Naser AlKhalifah; Darwin A Campbell; Celeste M Falcon; Jack M Gardiner; Nathan D Miller; Maria Cinta Romay; Ramona Walls; Renee Walton; Cheng-Ting Yeh; Martin Bohn; Jessica Bubert; Edward S Buckler; Ignacio Ciampitti; Sherry Flint-Garcia; Michael A Gore; Christopher Graham; Candice Hirsch; James B Holland; David Hooker; Shawn Kaeppler; Joseph Knoll; Nick Lauter; Elizabeth C Lee; Aaron Lorenz; Jonathan P Lynch; Stephen P Moose; Seth C Murray; Rebecca Nelson; Torbert Rocheford; Oscar Rodriguez; James C Schnable; Brian Scully; Margaret Smith; Nathan Springer; Peter Thomison; Mitchell Tuinstra; Randall J Wisser; Wenwei Xu; David Ertl; Patrick S Schnable; Natalia De Leon; Edgar P Spalding; Jode Edwards; Carolyn J Lawrence-Dill
Journal:  BMC Res Notes       Date:  2018-07-09

7.  The effect of artificial selection on phenotypic plasticity in maize.

Authors:  Joseph L Gage; Diego Jarquin; Cinta Romay; Aaron Lorenz; Edward S Buckler; Shawn Kaeppler; Naser Alkhalifah; Martin Bohn; Darwin A Campbell; Jode Edwards; David Ertl; Sherry Flint-Garcia; Jack Gardiner; Byron Good; Candice N Hirsch; Jim Holland; David C Hooker; Joseph Knoll; Judith Kolkman; Greg Kruger; Nick Lauter; Carolyn J Lawrence-Dill; Elizabeth Lee; Jonathan Lynch; Seth C Murray; Rebecca Nelson; Jane Petzoldt; Torbert Rocheford; James Schnable; Patrick S Schnable; Brian Scully; Margaret Smith; Nathan M Springer; Srikant Srinivasan; Renee Walton; Teclemariam Weldekidan; Randall J Wisser; Wenwei Xu; Jianming Yu; Natalia de Leon
Journal:  Nat Commun       Date:  2017-11-07       Impact factor: 14.919

