Nicole E Wheeler, Lars Barquist, Robert A Kingsley, Paul P Gardner.
Abstract
MOTIVATION: Next-generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large-scale changes, relatively little attention has been paid to quantifying the effects of single nucleotide polymorphisms and indels on protein function, particularly in bacterial genomics.
Year: 2016 PMID: 27503221 PMCID: PMC5181535 DOI: 10.1093/bioinformatics/btw518
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 2. The results of a delta-bitscore (DBS) comparison of the proteomes of S. Enteritidis (a generalist infectious agent) and S. Gallinarum (a host-restricted infectious agent). (A) The functional changes in orthologous protein-coding genes of S. Enteritidis and S. Gallinarum have been grouped into functional categories using the KEGG pathways database. Genes included in the pathway that have no ortholog in the other serovar are indicated in the darkest colour, followed by genes previously identified as hypothetically disrupted coding sequences (HDCs), then genes with significant DBS values that had not already been identified as HDCs. We refer to the latter as hypothetically attenuated coding sequences (HACs); (B) The distribution of delta-bitscores for orthologous genes showing non-synonymous changes in S. Enteritidis and S. Gallinarum. A symmetrical empirical distribution of scores generated by mirroring the least dispersed side is shown in red, and the cutoff values we use to establish significance are shown as dashed lines. A skewed distribution implies excess functional changes in one lineage; (C) A plot of the dN/dS score and corresponding DBS is shown for each orthologous S. Enteritidis and S. Gallinarum gene.
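The mirroring step described for panel (B) of Fig. 2 can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mirrored_cutoffs`, the centre of 0, the use of mean absolute distance as the measure of dispersion, and the 2.5% tail fraction are all assumptions made for illustration, and the caption does not state the values actually used.

```python
def mirrored_cutoffs(dbs, center=0.0, tail=0.025):
    """Return (lower, upper) cutoffs from a mirrored empirical distribution.

    Assumptions (not from the source): scores are centred on `center`,
    dispersion is the mean absolute distance from the centre, and `tail`
    is the fraction cut from each end of the mirrored distribution.
    """
    # Distances from the centre on each side of the distribution.
    left = [center - x for x in dbs if x < center]
    right = [x - center for x in dbs if x > center]
    if not left and not right:
        raise ValueError("no scores differ from the centre")

    def spread(distances):
        # An empty side is never chosen as the "least dispersed" one.
        return sum(distances) / len(distances) if distances else float("inf")

    # Keep the less dispersed side and reflect it around the centre.
    side = left if spread(left) <= spread(right) else right
    mirrored = sorted([center - d for d in side] + [center + d for d in side])

    # Empirical cutoffs: drop a `tail` fraction from each end.
    k = int(tail * len(mirrored))
    return mirrored[k], mirrored[-1 - k]


# Hypothetical scores with a long tail on the positive side.
scores = [-1, -2, -1, 1, 2, 30, 40, 3]
lower, upper = mirrored_cutoffs(scores)
significant = [x for x in scores if x < lower or x > upper]
```

Scores falling outside the two cutoffs would be the candidates for significance in this sketch. A skew shows up as many more such scores on one side than the other.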
Fig. 1. (A) Area above the relative cost curve (AAC) values for different analysis methods using three mutagenesis benchmarking datasets: phage lysozyme, LacI and HIV protease. The performance for each model across a range of sequence identity cutoffs is shown, as well as AAC values for the individual methods; (B) Distribution of delta-bitscores for E. coli LacI using custom model analysis; (C) Distribution of delta-bitscores for E. coli LacI using Pfam model analysis.
Fig. 3. Counts of hypothetically attenuated coding sequences (HACs) for each serovar, identified as the most extreme deviations from the median bitscore across strains for each gene. The generalist serovars are coloured in dark grey and the host-restricted serovars in light grey.
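The counting described in Fig. 3 can be sketched as follows. This is a rough illustration only: the function name `count_extreme_deviations`, the nested-dictionary input layout, the use of absolute deviation, and the fixed `cutoff` are hypothetical choices, since the caption does not say how "most extreme" was defined.

```python
from statistics import median


def count_extreme_deviations(gene_table, cutoff):
    """Count, per strain, genes whose bitscore deviates strongly from the median.

    `gene_table` maps gene -> {strain: bitscore}. A gene is counted for a
    strain when |bitscore - median across strains| exceeds `cutoff`
    (both the layout and the threshold rule are assumptions).
    """
    counts = {}
    for scores in gene_table.values():
        m = median(scores.values())
        for strain, score in scores.items():
            counts.setdefault(strain, 0)
            if abs(score - m) > cutoff:
                counts[strain] += 1
    return counts


# Hypothetical bitscores for two genes across three strains.
table = {
    "geneA": {"s1": 100, "s2": 100, "s3": 60},
    "geneB": {"s1": 50, "s2": 52, "s3": 51},
}
counts = count_extreme_deviations(table, cutoff=10)
```

Using the median rather than a single reference strain means no one genome is treated as the baseline, which matches the per-gene, across-strain comparison the caption describes.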