Fiona Jane Whelan1, Martin Rusilowicz2, James Oscar McInerney2,1.
Abstract
The accessory genes of prokaryote and eukaryote pangenomes accumulate by horizontal gene transfer, differential gene loss, and the effects of selection and drift. We have developed Coinfinder, a software program that assesses whether sets of homologous genes (gene families) in pangenomes associate or dissociate with each other (i.e. are 'coincident') more often than would be expected by chance. Coinfinder employs a user-supplied phylogenetic tree to assess the lineage-dependence (i.e. the phylogenetic distribution) of each accessory gene. This allows Coinfinder to focus on coincident gene pairs whose joint presence is not simply a consequence of appearing in the same clade, but which tend to appear together more often than expected across the phylogeny. Coinfinder is implemented in C++, Python3 and R and is freely available under the GNU license from https://github.com/fwhelan/coinfinder.
Keywords: gene association networks; gene co-occurrence; pangenome
Year: 2020 PMID: 32100706 PMCID: PMC7200068 DOI: 10.1099/mgen.0.000338
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
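The abstract's notion of two genes co-occurring "more often than would be expected by chance" can be illustrated with a small sketch. It assumes a one-sided hypergeometric tail test on presence/absence vectors; this is an illustrative stand-in, not necessarily the statistic Coinfinder itself applies, and the function name `cooccurrence_p_value` is hypothetical.

```python
from math import comb


def cooccurrence_p_value(presence_a, presence_b):
    """Probability of seeing at least the observed overlap if the two genes
    were distributed independently across genomes (hypergeometric tail).

    Assumption: a simple stand-in test for illustration only.
    """
    if len(presence_a) != len(presence_b):
        raise ValueError("presence vectors must cover the same genomes")

    n_genomes = len(presence_a)
    n_a = sum(1 for a in presence_a if a)          # genomes carrying gene A
    n_b = sum(1 for b in presence_b if b)          # genomes carrying gene B
    observed = sum(1 for a, b in zip(presence_a, presence_b) if a and b)

    # Sum P(overlap = k) for every k at least as large as the observed overlap.
    favourable = sum(
        comb(n_a, k) * comb(n_genomes - n_a, n_b - k)
        for k in range(observed, min(n_a, n_b) + 1)
    )
    return favourable / comb(n_genomes, n_b)


# Two hypothetical genes present in exactly the same 4 of 8 genomes.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0, 0, 0]
print(cooccurrence_p_value(gene_a, gene_b))
```

A test of this kind treats genomes as independent samples. Two genes confined to the same clade would score as strongly associated here, which is the confounding effect that the lineage-dependence step described in the abstract is intended to address.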
Description of Coinfinder output files

| Suffix | File description |
|---|---|
| _pairs.tsv | Tab-delimited list of significant coincident gene pairs |
| _nodes.tsv | Node list of all unique coincident genes and their D value |
| _edges.tsv | Edge list of significant gene–gene pairs and the associated |
| _network.gexf | GEXF (Graph Exchange XML Format) v1.2 formatted network file. Nodes are coloured by connected component (i.e. coincident gene set) and sized by D value; edge thickness is proportional to the |
| _components.tsv | Tab-delimited list of all connected components within the gene–gene coincident network |
| _heatmap[0-X].pdf | Heatmap images (R, ggplot2 [ |
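The table above describes connected components of the gene–gene network, each of which corresponds to one coincident gene set. A minimal sketch of how such components can be derived from an edge list follows. It assumes the edges have already been loaded as pairs of gene identifiers (the column layout of `_edges.tsv` is not given here, so no file parsing is shown), and the gene names and the function `connected_components` are hypothetical.

```python
def connected_components(edges):
    """Group nodes into connected components using union-find.

    `edges` is an iterable of (gene, gene) pairs. Returns a list of sorted
    gene lists, largest component first.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in edges:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left       # merge the two components

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)

    return sorted(
        (sorted(members) for members in groups.values()),
        key=lambda members: (-len(members), members),
    )


# Hypothetical edge list: two separate coincident gene sets.
example_edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneD", "geneE")]
print(connected_components(example_edges))
# [['geneA', 'geneB', 'geneC'], ['geneD', 'geneE']]
```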
Fig. 1. Example of Coinfinder output. The network (a,c) and heatmap (b,d) outputs from Coinfinder executed on 534 genomes. (a,c) The resultant gene association (a) and dissociation (c) networks. Each gene (node) is connected to (edge) another gene if they statistically associate/dissociate with each other in the pangenome. Nodes are coloured by connected component (i.e. coincident gene sets) and the colours correspond to those used in the heatmap outputs. The network file Coinfinder generates includes all node and edge colouring; Gephi [37] was used to apply the Fruchterman-Reingold layout. (b,d) A portion of the heatmaps of the presence/absence patterns of the associating (b) and dissociating (d) gene sets. As in the network, each set of coincident genes is co-coloured. Genes are displayed in relation to the input core gene phylogeny. Here the phylogeny tip and gene cluster labels have been removed from the output for clarity. Additionally, the largest connected component in the network (wine colour) has been omitted from the heatmap for ease of display.
Real computational time for Coinfinder executed on a 534 genome dataset consisting of 2,813 accessory genes using different numbers of CPUs (GenuineIntel; Intel Xeon Gold 6142 CPU @ 2.60 GHz)

| No. of CPUs | Real computer clock time |
|---|---|
| 2 | 31m16.265s |
| 4 | 17m56.973s |
| 8 | 11m15.469s |
| 16 | 7m44.942s |
| 32 | 6m16.218s |
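The timings above can be turned into speedup and parallel efficiency figures with a short calculation. The sketch below takes the 2-CPU run as the baseline, since no single-CPU timing is reported; the helper names `to_seconds` and `scaling_table` are illustrative, and the efficiency definition (speedup divided by the increase in CPU count) is one common convention rather than something stated in the source.

```python
import re

# Wall-clock times copied from the table, keyed by number of CPUs.
TIMINGS = {
    2: "31m16.265s",
    4: "17m56.973s",
    8: "11m15.469s",
    16: "7m44.942s",
    32: "6m16.218s",
}


def to_seconds(clock):
    """Convert a '31m16.265s' style string into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", clock)
    if match is None:
        raise ValueError(f"unrecognised time format: {clock!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def scaling_table(timings):
    """Return (cpus, seconds, speedup, efficiency) rows relative to the
    smallest CPU count present in `timings`."""
    base_cpus = min(timings)
    base_seconds = to_seconds(timings[base_cpus])
    rows = []
    for cpus in sorted(timings):
        seconds = to_seconds(timings[cpus])
        speedup = base_seconds / seconds
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, seconds, speedup, efficiency))
    return rows


for cpus, seconds, speedup, efficiency in scaling_table(TIMINGS):
    print(f"{cpus:>2} CPUs  {seconds:8.1f} s  "
          f"speedup {speedup:4.2f}x  efficiency {efficiency:4.0%}")
```

On these figures, moving from 2 to 32 CPUs gives roughly a fivefold reduction in wall-clock time for a sixteenfold increase in CPUs, so the returns from adding cores diminish noticeably beyond 8 to 16 CPUs.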
Number of gene–gene associations identified with different sized subsets of the original 534 genome dataset

| Iteration | | | | | |
|---|---|---|---|---|---|
| | 75 586 | 52 038 | 24 196 | 1137 | 0 |
| | 71 977 | 50 420 | 21 167 | 1389 | 0 |
| | 75 190 | 51 459 | 25 545 | 1382 | 0 |
Fig. 2. Example of the association relationships Coinfinder can identify. (a) A clique of genes in the ntp operon which was identified within the association network (Fig. 1a). Six of these genes were correctly labelled with their gene names via the Prokka/Roary pipeline; one gene was given an alternative gene name often used as a synonym in the literature; a further two genes were listed as ‘hypothetical proteins’. Collectively, the nine genes that compose the V-ATPase/ntp operon form cliques with an additional 51 genes. These cliques are shown as a network (b) and as a presence–absence heatmap (c). In the heatmap, unlabelled gene columns represent unnamed hypotheticals.