Nicolas Morales1,2, Alex C Ogbonna1,2, Bryan J Ellerbrock1, Guillaume J Bauchet1, Titima Tantikanjana1, Isaak Y Tecle1, Adrian F Powell1, David Lyon1, Naama Menda1, Christiano C Simoes1, Surya Saha1, Prashant Hosmani1, Mirella Flores1, Naftali Panitz1, Ryan S Preble1, Afolabi Agbona3, Ismail Rabbi3, Peter Kulakow3, Prasad Peteti3, Robert Kawuki4, Williams Esuma4, Micheal Kanaabi4, Doreen M Chelangat4, Ezenwanyi Uba5, Adeyemi Olojede5, Joseph Onyeka5, Trushar Shah6, Margaret Karanja6, Chiedozie Egesi1,3,5, Hale Tufan2, Agre Paterne3, Asrat Asfaw7, Jean-Luc Jannink2,8, Marnin Wolfe2, Clay L Birkett2,8, David J Waring2,8, Jenna M Hershberger2, Michael A Gore2, Kelly R Robbins2, Trevor Rife9, Chaney Courtney9, Jesse Poland9, Elizabeth Arnaud10, Marie-Angélique Laporte10, Heneriko Kulembeka11, Kasele Salum11, Emmanuel Mrema11, Allan Brown3, Stanley Bayo3, Brigitte Uwimana3, Violet Akech3, Craig Yencho12, Bert de Boeck13, Hugo Campos13, Rony Swennen14, Jeremy D Edwards15, Lukas A Mueller1.
Abstract
Modern breeding methods integrate next-generation sequencing and phenomics to identify plants with the best characteristics and greatest genetic merit for use as parents in subsequent breeding cycles to ultimately create improved cultivars able to sustain high adoption rates by farmers. This data-driven approach hinges on strong foundations in data management, quality control, and analytics. Of crucial importance is a central database able to (1) track breeding materials, (2) store experimental evaluations, (3) record phenotypic measurements using consistent ontologies, (4) store genotypic information, and (5) implement algorithms for analysis, prediction, and selection decisions. Because of the complexity of the breeding process, breeding databases also tend to be complex, difficult, and expensive to implement and maintain. Here, we present a breeding database system, Breedbase (https://breedbase.org/, last accessed 4/18/2022). Originally initiated as Cassavabase (https://cassavabase.org/, last accessed 4/18/2022) with the NextGen Cassava project (https://www.nextgencassava.org/, last accessed 4/18/2022), and later developed into a crop-agnostic system, it is presently used for dozens of different crops and projects. The system is web-based and is available as open-source software. It is available on GitHub (https://github.com/solgenomics/, last accessed 4/18/2022) and packaged in a Docker image for deployment (https://hub.docker.com/u/breedbase, last accessed 4/18/2022). The Breedbase system enables breeding programs to better manage and leverage their data for decision making within a fully integrated digital ecosystem.
Keywords: breeding; database; digital agriculture; digital ecosystem; genome-based breeding; genomic selection; genotyping; open source breeding software; phenotyping; predictive breeding; web-based software
Year: 2022 PMID: 35385099 PMCID: PMC9258556 DOI: 10.1093/g3journal/jkac078
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Fig. 1. a) Breedbase platform architecture. User interface: To offer a dynamic, highly interactive user interface, several JavaScript libraries are used, including D3, jQuery, and Bootstrap. RESTful APIs, including a full BrAPI 2.0 implementation, handle communication between the front end and the back end, allowing fast calculations without reloading the page. HTML5 is used for interactive graphical display, allowing instant reorganization of visual elements. The Bootstrap framework is used for modern and dynamic page templating. Middleware layer: A Perl software stack comprising Mason components to connect to the user interface; Catalyst, a web application framework; Moose, an object-oriented Perl library; and DBIx::Class, an object-relational mapper to connect to the SQL database. In addition, BrAPI libraries are used. Finally, a job cluster scheduler, Slurm, is used to allocate server resources and ensure scalability. Data source layer: Breedbase operates on a relational database using Postgres. Postgres 12.0 offers “Big data” solutions including parallel query execution and optimized binary JSON data type handling. Binary JSON (JSONB) is a simple data structure designed to be efficient in storage space and scan speed. In Breedbase, JSONB is used for various data types, including genotypic (marker) information. In addition to the relational database, a standard file system space is available for flat files. Finally, other databases can communicate with a Breedbase instance, for example to provide an additional back end for marker data [e.g. the Genomic Open-source Breeding informatics initiative (GOBii)] or to exchange germplasm information.
b) Breedbase codevelopment process. User–developer interactions are promoted through various media. Users have access to online documentation (https://solgenomics.github.io/sgn/, last accessed 4/18/2022), video tutorials, and onsite training. Software development goals are discussed extensively among developers, data managers, breeders, and other appropriate stakeholders. Agile development allows frequent, short-cycle product releases. Suggested improvements, issues, and bugs discovered in Breedbase are submitted and tracked on the public GitHub issue tracker (https://github.com/, last accessed 4/18/2022). Software development progress is tracked using a version control system and Docker releases.
c) Cassavabase, a Breedbase instance: data content overview. Cassavabase involves national and international breeding programs (22) from various African and South American countries (15) and currently has 1,131 registered users. Cassavabase hosts various data types including high-density and low-density genotyping assays (35,000), plot-based phenotypic data points (nearly 15 million), images of plants and plots from trials (5,107), and locations (435).
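As a rough illustration of the RESTful BrAPI layer described in Fig. 1a, the following minimal Python sketch builds a paged BrAPI v2 request URL and unpacks a BrAPI-style JSON response envelope. It is not taken from the Breedbase code base: the host `https://example.org`, the helper names `brapi_url` and `unpack`, and the sample payload are assumptions made for illustration only, and no network request is made.

```python
from urllib.parse import urlencode


def brapi_url(base_url, endpoint, page=0, page_size=100, **filters):
    """Build a GET URL for a BrAPI v2 endpoint with paging parameters."""
    query = {"page": page, "pageSize": page_size, **filters}
    return f"{base_url.rstrip('/')}/brapi/v2/{endpoint.strip('/')}?{urlencode(query)}"


def unpack(response):
    """Return (records, has_more) from a BrAPI-style response envelope."""
    pagination = response["metadata"]["pagination"]
    records = response["result"]["data"]
    # Pages are zero-indexed, so more pages remain while currentPage + 1 < totalPages.
    has_more = pagination["currentPage"] + 1 < pagination["totalPages"]
    return records, has_more


# Hypothetical payload shaped like a BrAPI v2 germplasm response (values are made up).
sample = {
    "metadata": {
        "pagination": {"currentPage": 0, "pageSize": 2, "totalCount": 3, "totalPages": 2},
        "status": [],
        "datafiles": [],
    },
    "result": {
        "data": [
            {"germplasmDbId": "1", "germplasmName": "accession_A"},
            {"germplasmDbId": "2", "germplasmName": "accession_B"},
        ]
    },
}

url = brapi_url("https://example.org", "germplasm", page_size=2)
records, has_more = unpack(sample)
print(url)
print([r["germplasmName"] for r in records], has_more)
```

A client would typically keep requesting the next page while `has_more` is true; the actual endpoints and filters available depend on the BrAPI version a given instance exposes.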
Fig. 2. Screenshot of the “Search Wizard” interface, a central query function on Breedbase. With the Search Wizard, the data in the database can be intersected by dimensions, such as locations, years, breeding programs, and traits. For each dimension, a number of elements can be selected. The individual selected dimensions can be stored in lists, and the combined selections can be saved as a dataset. Both lists and datasets can be used to feed data into various tools on Breedbase.
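The intersection logic described for the Search Wizard can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Breedbase implementation: the records, the function name `intersect`, and the rule that several elements within one dimension are combined with OR (while dimensions are combined with AND) are all hypothetical.

```python
def intersect(records, selections):
    """Keep records that match at least one selected element in every selected dimension."""
    def matches(record):
        for dimension, wanted in selections.items():
            value = record[dimension]
            # A dimension may hold one value (e.g. a year) or several (e.g. traits).
            values = set(value) if isinstance(value, (set, frozenset, list, tuple)) else {value}
            if not values & set(wanted):
                return False
        return True

    return [r for r in records if matches(r)]


# Hypothetical trial records (names and values are made up for illustration).
records = [
    {"trial": "T1", "location": "Site A", "year": 2019, "traits": {"dry matter content"}},
    {"trial": "T2", "location": "Site A", "year": 2020, "traits": {"dry matter content", "plant height"}},
    {"trial": "T3", "location": "Site B", "year": 2020, "traits": {"plant height"}},
    {"trial": "T4", "location": "Site C", "year": 2020, "traits": {"dry matter content"}},
]

# Each selected dimension could be stored as a "list" ...
selections = {
    "location": {"Site A", "Site B"},
    "year": {2020},
    "traits": {"dry matter content"},
}

hits = intersect(records, selections)

# ... and the combined selection, with its matching trials, as a "dataset".
dataset = {"selections": selections, "trials": [r["trial"] for r in hits]}
print(dataset["trials"])
```

Representing the selection as a plain mapping from dimension to chosen elements keeps lists and datasets easy to save and reuse as inputs to downstream tools.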