Jaehee Jung1, Jong Im Kim2, Gangman Yi3. 1. Department of Information and Communication Engineering, Myongji University, Yongin, Gyeonggi-do, Korea. 2. Department of Biology, Chungnam National University, Daejeon, Korea. 3. Department of Multimedia Engineering, Dongguk University, Seoul, Korea.
Abstract
SUMMARY: In comparative and evolutionary genomics, a detailed comparison of common features between organisms is essential to evaluate genetic distance. However, identifying differences in matched and mismatched genes among multiple genomes is difficult using current comparative genomic approaches due to complicated methodologies or the generation of meager information from obtained results. This study describes a visualization software tool, geneCo (gene Comparison), for comparing genome structure and gene arrangements between various organisms. User data are aligned, gene information is recognized, and genome structures are compared based on user-defined GenBank files. Information regarding inversion, gain, loss, duplication and gene rearrangement among multiple organisms being compared is provided by geneCo, which uses a web-based interface that users can easily access without any need to consider the computational environment. AVAILABILITY AND IMPLEMENTATION: Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/geneCo. The main module of geneCo is implemented in Python and the web-based user interface is built with PHP, HTML and CSS to support all browsers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
1 Introduction
Comparative genomics is mainly focused on creating highly detailed visualizations of the common features between organisms. The major principle of comparative genomics is to compare basic biological similarities or differences in genomic features resulting from DNA sequences between organisms at the genetic level. Genomic features include DNA sequences, gene contents, gene order, regulatory sequences and other genomic structures. Therefore, comparative genomic approaches are needed to specifically align genome sequences and to compare genomic features among organisms. Comparative genomics also provides powerful tools to study evolutionary relationships among organisms and to identify genes that are either conserved or represent unique genomic features.

Several comparative genomic tools, such as Artemis comparison tool (ACT) (Barrell ), Mauve (Darling ), BLAST ring image generator (BRIG) (Alikhan ) and Circos (Schnable ), have been developed for multiple genome assemblies. ACT visualizes comparisons of two or more genomes and is most useful for comparing a few DNA sequences, making it easy to spot and zoom in on regions of difference. This tool displays sequence similarities from various allowed input formats, such as GenBank entries or FASTA sequences. Mauve aligns whole genomes and shows output in the form of SNPs, regions of difference and homologous blocks, among others. Mauve can also be used to assess assembly quality against a reference using Mauve Contig Metrics. BRIG gives a global view of whole genome comparisons by visualizing BLAST comparisons with elaborate circular figures. BRIG is suitable for comparing multiple genomes; however, it is difficult to compare more than a dozen or so because each genome must be entered through the GUI. Circos uses plain text files for both input data and configuration, with the latter controlling the placement and format of each data track.
The ability to generate both data and configuration files automatically makes Circos highly amenable to incorporation in web-based database mining and visualization.

Despite the many bioinformatic approaches to compare genomes, the development of tools for comparative DNA analysis remains a challenge, in part because identifying mismatched genes, which may correspond to non-conserved sequences, is also meaningful in terms of evolution. A visualization and comparative genomic tool, geneCo, is proposed to align and compare multiple genome structures resulting from user-defined data in the GenBank file format. Information regarding inversion, gain, loss, duplication and gene rearrangement among the multiple organisms being compared is provided by geneCo. Another purpose of geneCo is to provide a web-based user interface that can be comfortably used by biologists, offering easy-to-use options and displaying the results in a web browser.
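As an illustration of the kind of input geneCo works from, the following minimal sketch pulls gene names, coordinates and strand out of the feature table of a GenBank flat file. It is not geneCo's own parser: the function name `parse_genes`, the regular expressions and the sample record are assumptions made for this example, and only the common `gene` feature with a `/gene="..."` qualifier is handled.

```python
import re

# A feature-key line starts with exactly five spaces, e.g.
#      gene            complement(1200..1800)
FEATURE = re.compile(r"^ {5}(\S+)\s+(\S+)")
# A /gene qualifier line, e.g.  /gene="cox1"
GENE_NAME = re.compile(r'^\s+/gene="([^"]+)"')

def parse_genes(genbank_text):
    """Return (name, start, end, strand) tuples in file order (hypothetical helper)."""
    genes = []
    pending = None  # location string of a gene feature still waiting for its name
    for line in genbank_text.splitlines():
        feature = FEATURE.match(line)
        if feature:
            # Only "gene" features are tracked; any other key clears the state.
            pending = feature.group(2) if feature.group(1) == "gene" else None
            continue
        name = GENE_NAME.match(line)
        if name and pending is not None:
            positions = [int(n) for n in re.findall(r"\d+", pending)]
            strand = "-" if "complement" in pending else "+"
            genes.append((name.group(1), min(positions), max(positions), strand))
            pending = None
    return genes

# Made-up record fragment, for illustration only.
SAMPLE = """\
FEATURES             Location/Qualifiers
     source          1..3000
                     /organism="Example organism"
     gene            100..900
                     /gene="cox1"
     CDS             100..900
                     /gene="cox1"
     gene            complement(1200..1800)
                     /gene="nad5"
ORIGIN
"""

for gene in parse_genes(SAMPLE):
    print(gene)
```

For real data, an established parser such as Biopython's `SeqIO` module would be more robust than this regular-expression sketch, which ignores multi-line locations and genes identified only by a locus tag.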
2 Application
Figure 1 shows an overview of geneCo, including the web-based interface, user options and output results. Figure 1a is a screenshot of the web page. The main engine of geneCo is implemented in Python, and the web-based interface is implemented in PHP, Python, JavaScript and Bootstrap (Spurlock, 2013). Figure 1b explains the option values. User configuration and usage are described in detail in the Supplementary Data. Figure 1c shows representative results for both the construction of a genome map and map comparisons. The output generated by geneCo varies in accordance with user options.
Fig. 1.
Overview of the geneCo system (geneCo web page (a), option values (b) and results of two different types in geneCo (c))
The main functions of geneCo can be divided into two categories. The first function is ‘the construction of a genome map.’ GenBank files can be used as inputs to generate single and multiple genome maps. OrganellarGenomeDRAW (Lohse ) functions in a similar manner, but it generates only one gene map based on the plastid or mitochondrial sequence of the organelle. When comparing several genes, OrganellarGenomeDRAW has to generate several sets of individual data. In contrast, geneCo permits a comparison of genome maps in the order in which they appear in GenBank and the generation of outputs designed by the user that can be customized by adjusting genome lengths, intervals, output file formats and a user-defined functional category of configuration files.

The second function is a ‘genome map comparison’ between genes from two different genomes based on genomic input sequences in the GenBank flat file format, which compares the matched genes of both genomes with genes that exist only in a single genome. Existing tools either require each genome structure to be constructed and compared manually or do not include gene annotation, making it difficult to distinguish between matched and mismatched genes in genome structures at a glance. In contrast, geneCo takes GenBank input files and arranges them as left-to-right genome pairs according to the order of the input files. Thus, each step of the n loops compares one pair of GenBank files (a left and a right genome) and draws them in accordance with the input settings.

The comparative genomic method is a modified local sequence alignment based on the dynamic programming algorithm to align each matched gene name. This method compares the genes from two genomes in the order defined by the user and analyzes missing genes that are not conserved and the genomic features of different genomes in terms of their biological similarities and genetic levels. Matched and mismatched genes are distinguished and identified by geneCo. In addition, geneCo enables biologists to repeatedly change input parameters in order to return desired outputs, especially when there is a large amount of genome data to be analyzed. The geneCo method is described in detail in Supplementary Section S2.

Multiple genome maps can be created by geneCo using the title and input files with additional gene alignment options. The final output is immediately generated in a web browser and supports various vector types as outputs. In addition, different display options are supported for users who want increased precision for adjustments and greater customization based on their preferences. For example, users can either set the color of the map to the default option or change it manually. Furthermore, users can define keywords and set functional categories to match the keywords using various colors to improve visualization. The legend in the output shows the specified functional categories. For analysis within a specific range, users can specify the start and end base pair in the output using the zoom-in option. geneCo also supports various output options for almost all objects.
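To make the idea of a local alignment over gene names concrete, the sketch below applies the standard Smith-Waterman dynamic programming recurrence to two ordered lists of gene names instead of nucleotides. This is plain Smith-Waterman, not the modified variant used by geneCo (which is described in Supplementary Section S2 and not reproduced here); the function name `local_align` and the scoring values are assumptions chosen for illustration.

```python
def local_align(left, right, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two lists of gene names.

    Returns a list of (left_gene, right_gene) pairs for the best-scoring
    local alignment; None marks a gene with no partner (a gap).
    The scores are illustrative assumptions, not geneCo's settings.
    """
    rows, cols = len(left) + 1, len(right) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def substitution(i, j):
        return match if left[i - 1] == right[j - 1] else mismatch

    # Fill the score matrix; cells are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            pairs.append((left[i - 1], right[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            pairs.append((left[i - 1], None))
            i -= 1
        else:
            pairs.append((None, right[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

# Made-up gene orders for two genomes.
left_genome = ["atp6", "cox1", "nad5", "cob"]
right_genome = ["cox1", "cob", "rps3"]
print(local_align(left_genome, right_genome))
# → [('cox1', 'cox1'), ('nad5', None), ('cob', 'cob')]
```

In this toy example the aligned block pairs the two shared genes and leaves `nad5` without a partner, which is the kind of mismatched gene the comparison is meant to expose. Genes outside the best local block (`atp6`, `rps3`) are simply not part of the returned alignment.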
3 Evaluation
Tools such as BRIG, Mauve, ACT and Circos can be used for comparative genomics. BRIG accepts several query genomes in FASTA, GenBank or EMBL format, takes only one of them as a reference, and compares the other genomes against that reference. In contrast, geneCo can set multiple references, determined by the order of the uploaded GenBank files, so that mismatched genes between two compared genomes are easily found. To evaluate the performance of geneCo, several mitochondrial, nucleomorph chromosome 1 and plastid genomes from multiple species were used as the test dataset (Supplementary Tables S2 and S3). Supplementary Figures S2–S9 show geneCo outputs with different input options. Supplementary Tables S4 and S5 show the results obtained with other applications. The key feature of geneCo is the identification of mismatched genes found by comparing two related genomes. Further details are described in the Supplementary Data.
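The contrast with a single fixed reference can be sketched as follows: each genome is compared with the next one in upload order, so every genome except the last acts as the reference for its right-hand neighbour. The function `compare_in_order`, the report keys and the input data are hypothetical, and treating a strand change as a hint of inversion is a simplifying assumption of this sketch, not geneCo's documented method.

```python
def compare_in_order(genomes):
    """Compare each genome with the next one in upload order.

    `genomes` is a list of (label, {gene_name: strand}) tuples.
    Returns one report per consecutive (left, right) pair.
    """
    reports = []
    for (left_label, left), (right_label, right) in zip(genomes, genomes[1:]):
        shared = sorted(left.keys() & right.keys())
        reports.append({
            "pair": (left_label, right_label),
            "matched": shared,
            # Genes present in only one genome of the pair (candidate loss/gain).
            "left_only": sorted(left.keys() - right.keys()),
            "right_only": sorted(right.keys() - left.keys()),
            # Shared genes whose strand differs (crude proxy for inversion).
            "strand_flipped": [g for g in shared if left[g] != right[g]],
        })
    return reports

# Made-up gene content for three uploaded genomes.
uploaded = [
    ("A", {"cox1": "+", "nad5": "-", "cob": "+"}),
    ("B", {"cox1": "+", "cob": "-", "rps3": "+"}),
    ("C", {"cox1": "+", "rps3": "+"}),
]

for report in compare_in_order(uploaded):
    print(report)
```

Because a dictionary keeps one entry per gene name, this sketch cannot represent duplicated genes; a list of features per genome would be needed to report duplications.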
4 Conclusion
Comparative genomics aims to find common functions between genomes in order to study genome evolution. Such studies require tools for comparing and visualizing genomes. The proposed geneCo method is implemented as Python-based software that can compare and analyze various genome maps. In the past, users have had to construct individual gene maps manually to compare genome structures. With geneCo, users can easily compare and analyze the position of genes, find genes shared with other genomes, and find genes that exist only in one genome, using GenBank files as input data with user-defined settings. Various options are available for visualizing elaborate genome structures and generating results specific to the objective of the user.
Funding
This research was supported by the National Research Foundation (NRF) of Korea funded by the Ministry of Science, ICT & Future Planning, Basic Science Research Program [MSIP; NRF-2016R1C1B1007929] to J.J.; the Ministry of Education [2018R1D1A1B07050727] to J.I.K.; [NRF-2016R1D1A1A09919318, NRF-2019R1F1A1064019] to G.Y.

Conflict of Interest: none declared.
Authors: Tim J Carver; Kim M Rutherford; Matthew Berriman; Marie-Adele Rajandream; Barclay G Barrell; Julian Parkhill Journal: Bioinformatics Date: 2005-06-23 Impact factor: 6.937
Authors: Patrick S Schnable; Doreen Ware; Robert S Fulton; Joshua C Stein; Fusheng Wei; Shiran Pasternak; Chengzhi Liang; Jianwei Zhang; Lucinda Fulton; Tina A Graves; Patrick Minx; Amy Denise Reily; Laura Courtney; Scott S Kruchowski; Chad Tomlinson; Cindy Strong; Kim Delehaunty; Catrina Fronick; Bill Courtney; Susan M Rock; Eddie Belter; Feiyu Du; Kyung Kim; Rachel M Abbott; Marc Cotton; Andy Levy; Pamela Marchetto; Kerri Ochoa; Stephanie M Jackson; Barbara Gillam; Weizu Chen; Le Yan; Jamey Higginbotham; Marco Cardenas; Jason Waligorski; Elizabeth Applebaum; Lindsey Phelps; Jason Falcone; Krishna Kanchi; Thynn Thane; Adam Scimone; Nay Thane; Jessica Henke; Tom Wang; Jessica Ruppert; Neha Shah; Kelsi Rotter; Jennifer Hodges; Elizabeth Ingenthron; Matt Cordes; Sara Kohlberg; Jennifer Sgro; Brandon Delgado; Kelly Mead; Asif Chinwalla; Shawn Leonard; Kevin Crouse; Kristi Collura; Dave Kudrna; Jennifer Currie; Ruifeng He; Angelina Angelova; Shanmugam Rajasekar; Teri Mueller; Rene Lomeli; Gabriel Scara; Ara Ko; Krista Delaney; Marina Wissotski; Georgina Lopez; David Campos; Michele Braidotti; Elizabeth Ashley; Wolfgang Golser; HyeRan Kim; Seunghee Lee; Jinke Lin; Zeljko Dujmic; Woojin Kim; Jayson Talag; Andrea Zuccolo; Chuanzhu Fan; Aswathy Sebastian; Melissa Kramer; Lori Spiegel; Lidia Nascimento; Theresa Zutavern; Beth Miller; Claude Ambroise; Stephanie Muller; Will Spooner; Apurva Narechania; Liya Ren; Sharon Wei; Sunita Kumari; Ben Faga; Michael J Levy; Linda McMahan; Peter Van Buren; Matthew W Vaughn; Kai Ying; Cheng-Ting Yeh; Scott J Emrich; Yi Jia; Ananth Kalyanaraman; An-Ping Hsia; W Brad Barbazuk; Regina S Baucom; Thomas P Brutnell; Nicholas C Carpita; Cristian Chaparro; Jer-Ming Chia; Jean-Marc Deragon; James C Estill; Yan Fu; Jeffrey A Jeddeloh; Yujun Han; Hyeran Lee; Pinghua Li; Damon R Lisch; Sanzhen Liu; Zhijie Liu; Dawn Holligan Nagel; Maureen C McCann; Phillip SanMiguel; Alan M Myers; Dan Nettleton; John Nguyen; Bryan W Penning; Lalit Ponnala; Kevin L 
Schneider; David C Schwartz; Anupma Sharma; Carol Soderlund; Nathan M Springer; Qi Sun; Hao Wang; Michael Waterman; Richard Westerman; Thomas K Wolfgruber; Lixing Yang; Yeisoo Yu; Lifang Zhang; Shiguo Zhou; Qihui Zhu; Jeffrey L Bennetzen; R Kelly Dawe; Jiming Jiang; Ning Jiang; Gernot G Presting; Susan R Wessler; Srinivas Aluru; Robert A Martienssen; Sandra W Clifton; W Richard McCombie; Rod A Wing; Richard K Wilson Journal: Science Date: 2009-11-20 Impact factor: 47.728