
genodive version 3.0: Easy-to-use software for the analysis of genetic data of diploids and polyploids.

Abstract

genodive version 3.0 is a user-friendly program for the analysis of population genetic data. This version is a major update from the previous version and now offers a wide spectrum of analyses. genodive has an intuitive graphical user interface that allows direct manipulation of the data through transformation, imputation of missing data, and exclusion and inclusion of individuals, populations and/or loci. Furthermore, genodive seamlessly supports 15 different file formats for importing data from, or exporting data to, other programs. One major feature of genodive is that it supports both diploid and polyploid data, up to octaploidy (2n = 8x) for some analyses and up to hexadecaploidy (2n = 16x) for others. The analyses offered by genodive include multiple statistics for estimating population differentiation (φST, FST, F'ST, GST, G'ST, G''ST, Dest, RST, ρ), K-means clustering based on analysis of molecular variance, tests for Hardy-Weinberg equilibrium, hybrid index, population assignment, clone assignment, Mantel tests, spatial autocorrelation, 23 ways of calculating genetic distances, and both principal components and principal coordinates analyses. A unique feature of genodive is that it can also open data sets with nongenetic variables, for example environmental data or geographical coordinates, which can be included in the analysis. In addition, genodive makes it possible to run several external programs (lfmm, structure, instruct and vegan) directly from its own user interface, avoiding the need for data reformatting and use of the command line. genodive is available for computers running Mac OS X 10.7 or higher and can be downloaded freely from: http://www.patrickmeirmans.com/software.


Keywords: AMOVA; K-means; genetic distances; genetic diversity; polyploidy; population differentiation


Year: 2020 PMID: 32061017 PMCID: PMC7496249 DOI: 10.1111/1755-0998.13145

Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090

INTRODUCING genodive VERSION 3.0

Here, I present genodive version 3.0, a program for the analysis of population genetic and population genomic data. genodive version 3.0 is a major update from the previously published version 1.0, which was a command line tool that performed only a single task, the estimation of clonal diversity (Meirmans & Van Tienderen, 2004). Over the years, genodive has been under continuous development; the latest version has an intuitive user interface and implements a wide range of analyses, allowing easy and straightforward testing of evolutionary and ecological hypotheses with genetic data. The philosophy behind genodive is to enable powerful analyses of modern genetic data sets without the need for any command line tools. The program is available for computers running Apple Mac OS X 10.7 (“Lion”) and higher.

AN INTUITIVE USER INTERFACE

genodive version 3.0 has a fully developed mouse‐driven graphical user interface (GUI). This GUI provides a flexible and intuitive way of approaching data and its analysis (see Figure 1), for both students and advanced researchers. The program has a so‐called document‐based user interface, which means that multiple documents of different types can be open and analysed simultaneously.

Figure 1

The genodive user interface, with its most important features explained [Colour figure can be viewed at wileyonlinelibrary.com]

genodive can seamlessly import and export data in a wide variety of formats. This avoids tedious recoding of text files from one format to another, or the use of third‐party tools for such reformatting. genodive automatically recognizes the format of a text file and reads the data without requiring any dialogue. File formats recognized by genodive are bayesass (Wilson & Rannala, 2003), convert (Glaubitz, 2004), fstat (Goudet, 1995), genalex (Peakall & Smouse, 2006), genepop (Raymond & Rousset, 1995), genetix (Belkhir, Borsa, Chikhi, Raufaste, & Bonhomme, 2004), migrate (Beerli & Felsenstein, 1999), spagedi (Hardy & Vekemans, 2002) and structure (Pritchard, Stephens, & Donnelly, 2000); in addition, files can be exported in the formats of aflp‐surv (Vekemans, 2002), arlequin (Excoffier & Lischer, 2010), bayescan (Foll & Gaggiotti, 2008), lfmm (Frichot, Schoville, Bouchard, & Francois, 2013) and tess3 (Caye, Deist, Martins, Michel, & Francois, 2015). genodive also has its own file format that supports some features unique to genodive, such as mixed ploidy data, clone assignments and multiple series of population groups.

When data have been loaded successfully, the GUI of genodive provides extensive tools for manipulating the data (see Figure 1 for some examples). Most importantly, genodive makes it very easy to create a subset of the data: the GUI has checkboxes for every individual, population and locus that can be clicked to include or exclude them from the analysis.
Furthermore, a “Special Include” dialogue allows the user to include or exclude observations based on a specific set of criteria, such as including loci or individuals based on their percentage of missing data, including populations based on their number of samples, or including only a single individual per clone (one ramet per genet). Including and excluding data in this way allows testing hypotheses that pertain to only a subset of the data, for example only populations from a certain region or only individuals with a certain ploidy level. Subsetting the data also makes it easy to assess the influence of possibly aberrant individuals or loci on the outcome of the analysis. Besides subsetting, genodive provides other ways of manipulating data: transformation, recoding of alleles, subsampling to a single ploidy level, and smart imputation of missing data. Although genodive does not produce any graphical output, great care has been taken to ensure that all output is tab‐delimited, so that the results can easily be plotted using spreadsheet software such as excel or using r.

Behind the user‐friendly GUI of genodive is a computational core that makes full use of the power of both laptop and desktop computers. genodive is fully multithreaded, which means that all calculations are performed in the background, keeping the user interface responsive and available for other tasks. Almost all computers sold nowadays have processors with multiple independent cores, and genodive automatically distributes the calculations over all available processor cores. These parallelized computations, combined with the fact that genodive is written in a low‐level programming language (C, with Objective C used for the interface), often make the program several orders of magnitude faster than other programs, especially scripts written in python or r.
In addition, the user‐friendly GUI means that those who do not have programming skills (or do not have the time to write and debug custom‐made scripts) still have access to powerful statistical tools. genodive can handle a wide range of data sets, from modest microsatellite data sets with only a few loci to large RAD data sets with tens of thousands of SNPs, because the maximum size of a data set is limited only by the amount of working memory (RAM) of the computer: genodive has been used successfully to analyse data sets with tens of thousands of loci and hundreds of individuals (e.g. Benestan et al., 2015; Miller et al., 2018). This is also where the multithreaded calculations really make a difference, as the calculations for different loci can easily be distributed over the computer's processors. Importantly, all such technicalities remain hidden behind the GUI, enabling the researcher to focus on the analysis itself.
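The “Special Include” filtering by missing data can be illustrated with a small sketch. This is not genodive's code: the nested-list data layout (one row per individual, one entry per locus, `None` for a missing genotype), the function names `missing_fraction` and `special_include`, the default thresholds and the choice to filter loci before individuals are all hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical layout: data[i][j] is the genotype of individual i at locus j,
# stored as a tuple of allele labels, or None when the genotype is missing.
Genotype = Optional[tuple]


def missing_fraction(values: Sequence[Genotype]) -> float:
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def special_include(data, max_missing_locus=0.2, max_missing_ind=0.2):
    """Return (kept individual indices, kept locus indices).

    Loci are filtered first; individuals are then judged only on the kept loci.
    """
    n_loci = len(data[0]) if data else 0
    loci = [j for j in range(n_loci)
            if missing_fraction([row[j] for row in data]) <= max_missing_locus]
    inds = [i for i, row in enumerate(data)
            if missing_fraction([row[j] for j in loci]) <= max_missing_ind]
    return inds, loci
```

Filtering in the opposite order (individuals first) can give a different subset, which is one reason why interactive feedback on the retained data is useful.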

DATA ANALYSIS FEATURES UNIQUE TO genodive

Polyploids

genodive supports a wide range of ploidy levels, from haploid up to hexadecaploid (2n = 16x), and almost all analyses implemented in genodive can be used on any of these ploidy levels or a mixture thereof. The analysis of polyploid data presents some challenges that are absent in diploids (Dufresne, Stift, Vergilino, & Mable, 2014; Meirmans, Liu, & Van Tienderen, 2018): information on allele dosage (the number of copies of each allele present within an individual) is often missing, and there can be double reduction, which leads to nonrandom segregation of alleles into gametes. genodive has dedicated algorithms to deal with these issues for autopolyploids, including the estimation of allele frequencies when dosage information is missing (De Silva, Hall, Rikkerink, McNeilage, & Fraser, 2005; for data up to octaploids) and the calculation of the ρ‐statistic, a ploidy‐independent estimator of population differentiation that is also independent of the rate of double reduction (Meirmans & Liu, 2018; Ronfort, Jenczewski, Bataillon, & Rousset, 1998). However, genodive is not limited to polyploid data; it is equally well suited for the analysis of diploid data.
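To make the missing-dosage problem concrete, the sketch below enumerates the fully dosed genotypes compatible with an observed set of alleles and weights them by their expected frequencies under random mating and random chromosomal segregation (no double reduction), using the multinomial probability k!/∏nᵢ! · ∏pᵢ^nᵢ for a k-ploid genotype with nᵢ copies of allele i. It is an illustrative sketch under those assumptions, not the iterative De Silva et al. (2005) estimator; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import factorial, prod


def compatible_genotypes(phenotype, ploidy):
    """All fully dosed genotypes (sorted tuples) that show exactly the observed alleles."""
    alleles = sorted(set(phenotype))
    extra = ploidy - len(alleles)  # allele copies whose identity is unknown
    if extra < 0:
        return []
    return [tuple(sorted(alleles + list(rest)))
            for rest in combinations_with_replacement(alleles, extra)]


def hwe_probability(genotype, freqs):
    """Multinomial genotype probability under random mating and segregation."""
    counts = {a: genotype.count(a) for a in set(genotype)}
    coef = factorial(len(genotype)) // prod(factorial(c) for c in counts.values())
    return coef * prod(freqs[a] ** c for a, c in counts.items())


def dosage_posterior(phenotype, ploidy, freqs):
    """Relative probability of each compatible dosage, given allele frequencies."""
    genos = compatible_genotypes(phenotype, ploidy)
    weights = [hwe_probability(g, freqs) for g in genos]
    total = sum(weights)
    return {g: w / total for g, w in zip(genos, weights)}
```

For a tetraploid showing alleles A and B at equal frequencies, the three candidate dosages AAAB, AABB and ABBB receive relative weights 2/7, 3/7 and 2/7.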

Combining genetic and ecological data

Besides allelic genetic data sets, genodive can also read data sets with descriptive variables for individuals or populations. Such data sets are especially useful for integrating ecological data on the samples into the analysis. For example, spatial coordinates can be used for spatial autocorrelation analysis and for testing isolation by distance, and climatic data for the sampling sites can be used to test for isolation by environment or for genotype–environment associations.

AMOVA‐based clustering

genodive can cluster genetic data based on analysis of molecular variance (AMOVA; Excoffier, Smouse, & Quattro, 1992), using the F‐statistics from AMOVA as the optimality criterion to find the clustering that maximizes genetic differentiation among clusters (Meirmans, 2012b). Clustering can take place at the individual level or at the population level (the latter option is absent from other genetic clustering analyses), using either a standard K‐means algorithm or a specially modified simulated annealing algorithm. genodive can also perform a standard AMOVA with a predefined hierarchical population structure, for both diploid and polyploid data (Meirmans & Liu, 2018).
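The link between K‐means and AMOVA can be illustrated with squared Euclidean distances, for which the total sum of squared deviations (SSD) splits into within‐cluster and among‐cluster parts; minimizing the within‐cluster SSD therefore maximizes the among‐cluster share. The sketch below is a plain Lloyd‐style K‐means on hypothetical per‐individual allele‐frequency vectors. It reports the ratio SSD_among / SSD_total rather than a proper AMOVA Φ‐statistic (which uses variance components with degrees‐of‐freedom corrections), does not reproduce genodive's simulated annealing variant, and uses hypothetical function names.

```python
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two allele-frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ssd(points):
    """Sum of squared deviations of points from their centroid."""
    if not points:
        return 0.0
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd-style K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = centroid(members)
    return labels


def among_cluster_fraction(points, labels):
    """SSD_among / SSD_total, the share that K-means implicitly maximizes."""
    total = ssd(points)
    within = sum(ssd([p for p, lab in zip(points, labels) if lab == c])
                 for c in set(labels))
    return (total - within) / total
```

Lloyd's algorithm only finds a local optimum that depends on the starting centres, which is one motivation for alternatives such as simulated annealing.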

Clone assignment

genodive can assign individuals to clones (genets) based on their multilocus genotypes, while allowing for somaclonal mutations and genotyping errors (Meirmans & Van Tienderen, 2004). This functionality was formerly provided by a separate program, genotype, but has now been fully integrated into genodive. Clone assignment is done by selecting a threshold in a histogram of among‐individual genetic distances. In genodive version 3.0, this threshold can be selected visually in the GUI, with direct feedback on the resulting number of clones. Furthermore, genodive now implements a permutation test to verify whether repeated multilocus genotypes are indeed due to clonal reproduction and not simply due to insufficient resolution of the genetic markers (Gómez & Carvalho, 2000).
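Threshold‐based clone assignment can be sketched with a union‐find structure: any two individuals whose distance is at or below the threshold are joined, and groups are merged transitively. Whether genodive merges groups transitively in exactly this way is an assumption here; the distance matrix is taken as given (any pairwise distance, for example a clonal distance), and `assign_clones` is a hypothetical name.

```python
def assign_clones(dist, threshold):
    """Label individuals by clone: pairs with dist <= threshold are joined,
    and groups are merged transitively (single linkage)."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    # Number clones 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Raising the threshold can only merge clones, never split them, which is why the number of clones changes monotonically as the threshold is moved along the distance histogram.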

Add‐in analyses

genodive can run several separate programs as add‐ins to perform additional analyses. This means that genodive does not implement these algorithms itself; instead, it runs the program in the background, parses the results and displays them within genodive. The advantage is that all analyses can be run from the genodive GUI, which exposes the most important settings of each program. Currently, four add‐ins are available: instruct (Gao, Williamson, & Bustamante, 2007), lfmm (Frichot et al., 2013), Redundancy Analysis using vegan in r (Oksanen et al., 2015), and structure (Pritchard et al., 2000). For an add‐in to work, the user must first install the corresponding program, downloaded from its respective website.

ADDITIONAL TYPES OF DATA ANALYSIS

Genetic diversity and HWE

genodive can calculate multiple indices of genetic diversity within populations, including gene diversity (HS; for diploids equal to the expected heterozygosity), observed heterozygosity (HO), number of alleles and effective number of alleles. When full dosage information is available for polyploids, HO is calculated as the gametic heterozygosity (Moody, Mueller, & Soltis, 1993), that is, the heterozygosity of diploid gametes drawn from the polyploid genotype, as this is known to provide better comparisons among ploidy levels (Meirmans et al., 2018). Conformance to Hardy–Weinberg equilibrium (HWE) expectations can be tested using the FIS statistic calculated from HO and HS, or using the within‐individual and among‐individual variance components of AMOVA. Significance testing is performed using permutations of alleles among individuals.
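For a fully dosed genotype of ploidy k with nᵢ copies of allele i, the gametic heterozygosity is the probability that two allele copies drawn without replacement differ, 1 − Σ nᵢ(nᵢ − 1) / (k(k − 1)), which reduces to ordinary 0/1 heterozygosity for diploids. The sketch below computes it together with a simple FIS = 1 − HO/HS. For brevity it omits the sample‐size corrections that a real estimator of HS would apply, assumes random chromosomal segregation, and uses hypothetical function names.

```python
from collections import Counter


def gametic_heterozygosity(genotype):
    """Probability that two allele copies drawn without replacement differ
    (assumes full dosage information and random chromosomal segregation)."""
    k = len(genotype)
    same = sum(n * (n - 1) for n in Counter(genotype).values())
    return 1 - same / (k * (k - 1))


def gene_diversity(genotypes):
    """HS as 1 - sum(p_i^2) over pooled allele copies; no sample-size correction."""
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())
    return 1 - sum((n / total) ** 2 for n in counts.values())


def f_is(genotypes):
    """FIS = 1 - HO/HS for one population at one locus."""
    ho = sum(gametic_heterozygosity(g) for g in genotypes) / len(genotypes)
    return 1 - ho / gene_diversity(genotypes)
```

For example, the tetraploid genotype AABB has a gametic heterozygosity of 2/3, since only the AA and BB gametes (two of the six possible pairs of copies) are homozygous.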

Genetic differentiation

The merits of different approaches for estimating the degree of differentiation among populations have been debated extensively (Jost, 2008; Meirmans & Hedrick, 2011; Whitlock, 2011). genodive therefore provides a wide range of summary statistics for such estimation, implementing the two most widely used approaches: one based on heterozygosities and one based on variance components. The heterozygosity‐based estimators included are GST (Nei, 1987), the standardized G'ST (Hedrick, 2005), its unbiased estimator G''ST (Meirmans & Hedrick, 2011), Dest (Jost, 2008) and the ploidy‐independent ρ‐statistic (Ronfort et al., 1998). The variance‐component‐based estimators are all calculated using AMOVA (Excoffier et al., 1992; Michalakis & Excoffier, 1996); the returned summary statistic therefore depends on the distance metric chosen and can be φST, FST, F'ST, RST or ρ. The AMOVA‐based statistics can naturally be calculated at multiple hierarchical levels (clusters of populations), resulting in hierarchical series of F‐statistics. The interface for defining this hierarchy is very flexible and can include or exclude the individual or population level, which makes it straightforward to compare multiple ways of clustering populations. Significance testing can be done using permutations, either overall or between pairs of populations.
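The heterozygosity‐based statistics can be written in terms of the mean within‐population gene diversity HS, the total gene diversity HT and the number of populations k: GST = (HT − HS)/HT, G'ST = GST(k − 1 + HS)/((k − 1)(1 − HS)), G''ST = k(HT − HS)/((kHT − HS)(1 − HS)) and Dest = (HT − HS)/(1 − HS) · k/(k − 1). The sketch below plugs simple, uncorrected HS and HT values computed from equally weighted population allele frequencies into these formulas. The sample‐size corrections a full estimator would apply are not reproduced, and the function names and data layout are hypothetical.

```python
def hs_ht(pop_freqs):
    """Mean within-population (HS) and total (HT) gene diversity.

    pop_freqs: one dict {allele: frequency} per population; populations are
    weighted equally and no sample-size correction is applied.
    """
    k = len(pop_freqs)
    hs = sum(1 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    alleles = set().union(*pop_freqs)
    ht = 1 - sum((sum(f.get(a, 0.0) for f in pop_freqs) / k) ** 2 for a in alleles)
    return hs, ht


def differentiation(hs, ht, k):
    """Heterozygosity-based statistics; assumes k > 1, ht > 0 and hs < 1."""
    gst = (ht - hs) / ht
    return {
        "GST": gst,
        "G'ST": gst * (k - 1 + hs) / ((k - 1) * (1 - hs)),
        "G''ST": k * (ht - hs) / ((k * ht - hs) * (1 - hs)),
        "Dest": (ht - hs) / (1 - hs) * k / (k - 1),
    }
```

When two populations are fixed for different alleles (HS = 0, HT = 0.5), all four statistics equal 1; with highly polymorphic markers, GST stays well below 1 even for strongly differentiated populations, which is the motivation for the standardized measures.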

Distance metrics

All of the above‐mentioned differentiation estimators can be used in genodive to calculate a matrix of pairwise genetic distances among populations. Several other among‐population distance metrics are also included: Nei's D (Nei, 1987), Rogers’ (1972), Euclidean, Manhattan, Chord (Cavalli‐Sforza & Bodmer, 1971) and chi‐square distances. For polyploid data with missing dosage information, GST, G''ST and Dest can additionally be calculated with dosage correction (De Silva et al., 2005). Besides these among‐population distances, genodive can estimate the following distances between pairs of individuals: clonal (Meirmans & Van Tienderen, 2004), Bruvo (Bruvo, Michiels, D’Souza, & Schulenburg, 2004), Smouse and Peakall (1999), Euclidean (Meirmans & Liu, 2018), Manhattan, Chord (Cavalli‐Sforza & Bodmer, 1971) and chi‐square distances. The program can also calculate the kinship coefficient between pairs of individuals (Loiselle, Sork, Nason, & Graham, 1995), which is technically not a distance metric but an estimate of the genealogical relationship between individuals. Finally, genodive can calculate distances based on ecological variables; the most important of these is the geographical distance between longitude–latitude coordinates, which takes into account both the curvature of the Earth and its flattening at the poles. The genodive GUI also makes it easy to import and export distance matrices from and to other programs. When importing triangular matrices, genodive uses a smart algorithm, based on the triangle inequality, to detect whether a matrix is lower or upper triangular.
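One standard way of including the Earth's flattening in a geographical distance is Lambert's formula for long lines on an ellipsoid, sketched below. The text does not state which geodesic method genodive uses, so this particular formula, the WGS84 constants and the function name are assumptions. Lambert's approximation is typically accurate to the order of 10 m over thousands of kilometres, but it breaks down for nearly antipodal points.

```python
from math import asin, atan, cos, radians, sin, sqrt, tan

# WGS84 ellipsoid parameters (an assumption; not stated in the text)
WGS84_A = 6378137.0              # semi-major axis in metres
WGS84_F = 1 / 298.257223563      # flattening


def lambert_distance(lat1, lon1, lat2, lon2, a=WGS84_A, f=WGS84_F):
    """Approximate ellipsoidal distance in metres between two points given in
    decimal degrees, using Lambert's formula for long lines."""
    # Reduced (parametric) latitudes account for the flattening.
    b1 = atan((1 - f) * tan(radians(lat1)))
    b2 = atan((1 - f) * tan(radians(lat2)))
    dlon = radians(lon2 - lon1)
    # Central angle between the points on the auxiliary sphere (haversine form).
    h = sin((b2 - b1) / 2) ** 2 + cos(b1) * cos(b2) * sin(dlon / 2) ** 2
    sigma = 2 * asin(min(1.0, sqrt(h)))
    if sigma == 0.0:
        return 0.0
    p = (b1 + b2) / 2
    q = (b2 - b1) / 2
    x = (sigma - sin(sigma)) * sin(p) ** 2 * cos(q) ** 2 / cos(sigma / 2) ** 2
    y = (sigma + sin(sigma)) * cos(p) ** 2 * sin(q) ** 2 / sin(sigma / 2) ** 2
    return a * (sigma - f / 2 * (x + y))
```

For example, `lambert_distance(0.0, 0.0, 0.0, 1.0)` gives about 111,319 m for one degree of longitude along the equator, whereas one degree of latitude northwards from the equator comes out at about 110,574 m, reflecting the flattening.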

Spatial autocorrelation

Processes such as isolation by distance can lead to striking patterns of spatial autocorrelation in genetic data (Meirmans, 2012a). genodive implements the standard method for testing this, the Mantel test (Mantel, 1967), which tests for a correlation between genetic distances (either among individuals or among populations) and geographical distances. Other options are to correct for the influence of a third distance matrix (partial Mantel test; Smouse, Long, & Sokal, 1986), or to constrain the permutations based on specified strata, for example population clusters (see Meirmans, 2015). A more fine‐grained analysis of spatial autocorrelation is provided by a correlogram, which genodive can calculate for either population or individual data. For a correlogram, the matrix of geographical distances needs to be converted into a set of distance classes; in the genodive GUI, this is done in a dialogue that lets the user choose between equidistant classes (breaks between classes are equally spaced) and equifrequent classes (breaks are placed so that each class contains approximately the same number of distances). The dialogue shows a histogram of the distribution of distances over the classes and permits editing the class boundaries by hand.
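A basic Mantel test can be sketched as follows: the Pearson correlation between the upper triangles of two distance matrices is compared with the correlations obtained after jointly permuting the rows and columns of one matrix. This is a minimal illustration with hypothetical function names; it omits the partial Mantel test and the stratified permutations mentioned above, and reports a one‐sided p‐value for a positive association.

```python
import random


def _upper(m, order):
    """Upper-triangle entries of matrix m after reordering rows/columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, n_perm=999, seed=1):
    """Return (r, one-sided p) for a positive correlation between two matrices."""
    n = len(gen)
    ident = list(range(n))
    y = _upper(geo, ident)
    r_obs = pearson(_upper(gen, ident), y)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one of the permutations
    for _ in range(n_perm):
        perm = ident[:]
        rng.shuffle(perm)  # permute individuals, i.e. rows and columns together
        if pearson(_upper(gen, perm), y) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

Permuting whole rows and columns, rather than individual matrix cells, keeps the dependence among distances that involve the same sample intact; this is what makes the Mantel test valid for distance matrices.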

Miscellaneous

In addition to all the above analyses, genodive provides a number of other analyses that are frequently used for population genetic data. Because some of these analyses do not allow for missing data, genodive has three ways of imputing missing data: (a) randomly picking alleles based on the overall allele frequencies; (b) randomly picking alleles based on the population allele frequencies; and (c) restoring the dosage of polyploid individuals based on the incomplete genotype and the estimated population allele frequencies. As with any imputation of missing data, users should be aware that imputation may introduce bias, the direction and extent of which may depend on the imputation method used. genodive implements both a principal components analysis (PCA), based on a matrix of within‐individual or within‐population allele frequencies, and a principal coordinates analysis (PCoA), based on a user‐specified distance matrix. The latter offers the option to correct for negative eigenvalues that may result from using a non‐Euclidean distance metric. genodive also implements population assignment (Paetkau, Slade, Burden, & Estoup, 2004), including the Monte Carlo test of Cornuet, Piry, Luikart, Estoup, and Solignac (1999), which generates a null distribution of likelihood values against which the values for the sampled individuals are compared. Assignments can be done for polyploids, but the algorithm does not yet include the recent method of Field, Broadhurst, Elliott, and Young (2017) that allows for missing dosage and double reduction. genodive is also the only software that can calculate Buerkle’s (2005) hybrid index for polyploid data. Finally, genodive allows testing whether groups of populations differ in their level of genetic diversity or in the strength of population differentiation.
This latter option is useful, for example, to test whether invading populations differ in these respects from populations in the native range of the species.
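Imputation strategies (a) and (b) described above can be sketched for a single locus as follows. This is a minimal illustration, not genodive's implementation. The data layout (a dictionary mapping population names to lists of genotype tuples, with `None` for missing), the function names, the fixed ploidy and the fallback to overall frequencies for a population without any observed genotypes are all assumptions. Method (c), restoring polyploid dosage, is not shown.

```python
import random
from collections import Counter


def allele_freqs(genotypes):
    """Allele frequencies from the observed (non-missing) genotypes."""
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}


def impute(pops, ploidy=2, by_population=True, seed=0):
    """Replace missing genotypes (None) with alleles drawn at random.

    by_population=True draws from the population's own frequencies (b);
    False draws from the overall frequencies (a).
    """
    rng = random.Random(seed)
    overall = allele_freqs([g for gs in pops.values() for g in gs])
    out = {}
    for name, gs in pops.items():
        # Fall back to overall frequencies if the population has no observed data.
        freqs = (allele_freqs(gs) if by_population else {}) or overall
        alleles, weights = zip(*sorted(freqs.items()))
        out[name] = [g if g is not None
                     else tuple(sorted(rng.choices(alleles, weights, k=ploidy)))
                     for g in gs]
    return out
```

Drawing from population frequencies preserves differences among populations better than drawing from overall frequencies, which tends to pull imputed individuals towards the global mean and can thereby deflate differentiation estimates.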

CONCLUSIONS

Overall, genodive provides a comprehensive tool for the analysis of population genetic data: it is very easy to use thanks to its intuitive user interface, while supporting a wide range of analyses, several of which are unique to the program. In particular, genodive has extensive support for polyploids; however, it is equally well suited for the analysis of diploid data. More information about the program is available both in a PDF manual and in the on‐screen Help system, accessible via the Help menu in the GUI. genodive is available only for Mac OS and can be downloaded free of charge from: http://www.patrickmeirmans.com/software.

AUTHOR CONTRIBUTION

P.M. taught himself programming, wrote the software and wrote the manuscript.

REFERENCES (33 in total; 10 listed)

1. New methods employing multilocus genotypes to select or exclude populations as origins of individuals.

Authors: J M Cornuet; S Piry; G Luikart; A Estoup; M Solignac
Journal: Genetics Date: 1999-12

2. Spatial autocorrelation analysis of individual multiallele and multilocus genetic structure.

Authors: P E Smouse; R Peakall
Journal: Heredity (Edinb) Date: 1999-05

3. Sex, parthenogenesis and genetic structure of rotifers: microsatellite analysis of contemporary and resting egg bank populations.

Authors: A Gómez; G R Carvalho
Journal: Mol Ecol Date: 2000-02

4. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.

Authors: Laurent Excoffier; Heidi E L Lischer
Journal: Mol Ecol Resour Date: 2010-03-01

5. Seven common mistakes in population genetics and how to avoid them.

Authors: Patrick G Meirmans
Journal: Mol Ecol Date: 2015-06-19

6. Genome-Wide Assessment of Diversity and Divergence Among Extant Galapagos Giant Tortoise Species.

Authors: Joshua M Miller; Maud C Quinzin; Danielle L Edwards; Deren A R Eaton; Evelyn L Jensen; Michael A Russello; James P Gibbs; Washington Tapia; Danny Rueda; Adalgisa Caccone
Journal: J Hered Date: 2018-08-24

7. The Analysis of Polyploid Genetic Data.

Authors: Patrick G Meirmans; Shenglin Liu; Peter H van Tienderen
Journal: J Hered Date: 2018-03-16

8. Genetic variation and random drift in autotetraploid populations.

Authors: M E Moody; L D Mueller; D E Soltis
Journal: Genetics Date: 1993-06

9. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update.

Authors: Rod Peakall; Peter E Smouse
Journal: Bioinformatics Date: 2012-07-20

10. genodive version 3.0: Easy-to-use software for the analysis of genetic data of diploids and polyploids.

Authors: Patrick G Meirmans
Journal: Mol Ecol Resour Date: 2020-03-11
