
Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

Thomas D Otto, Mandy Sanders, Matthew Berriman, Chris Newbold.

Abstract

MOTIVATION: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy.
RESULTS: Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications.
AVAILABILITY: The software is available at http://icorn.sourceforge.net

Year: 2010 PMID: 20562415 PMCID: PMC2894513 DOI: 10.1093/bioinformatics/btq269

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Although there are now over 5000 whole genome sequences in the public databases, their level of accuracy varies considerably. The aspiration set by the Human Genome Project was for a maximum of one error per 10 kb of finished sequence (International Human Genome Sequencing Consortium, 2001). However, the true error rate varies significantly from this figure depending on the nature of the sequence (base composition, repeats, etc.) both in human and in other organisms. Even to achieve this error rate, expensive manual finishing is required to ensure that each base is covered by at least 2 clones and has a cumulative Phred score of at least 40 (Ewing and Green, 1998). The ‘Gold Standard’ for quality involves the manual inspection of each base by an experienced finisher. This is a major expense within a genome project. For example, nine chromosomes of Plasmodium falciparum were completed at the Wellcome Trust Sanger Institute (Hall et al., 2002), by the equivalent of approximately seven finishers working for up to 5 years. Despite this and subsequent efforts since publication, as we show here, many errors are still present. Even in genomes described as completed or finished, the underlying quality at each base is unknown and the error rate can be variable genome-wide. Therefore, rapidly fixing errors, highlighting regions that are error-prone and quantifying accuracy genome-wide is a priority that will significantly benefit the end user. So far, few methods exist to correct genomic errors automatically. There are algorithms to improve base calling (Gajer et al., 2004) or to detect frameshifts by protein homology or by sequence analysis. New assembly software like Mira (http://www.chevreux.org/projects_mira.html) has also been developed that allows hybrid assemblies with different sequencing technologies. This can both assemble mixed Sanger/454 data and improve the homopolymer length errors in 454 technologies using high Illumina read coverage. 
To date, however, no methods exist that can accurately detect and correct base errors and small indels in genome sequences. We have developed an algorithm that uses deep coverage of sequence reads produced using Illumina's Genome Analyser platform, mapped iteratively to a reference genome, in a way that allows confident sequence correction.
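
The two quality targets quoted above are linked by the standard Phred definition, Q = -10 log10(P): a cumulative Phred score of 40 corresponds to an error probability of 1 in 10 000, that is, one error per 10 kb. The short sketch below only illustrates that arithmetic; the function names are ours and are not part of iCORN.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a Phred score Q, using P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score Q for an error probability P, using Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    p = phred_to_error_prob(40)
    # Expected number of bases per error at this quality level.
    print(f"Q40 -> P = {p:g}, about one error per {round(1 / p)} bases")
```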

2 METHODS

Because reads from second generation sequencing platforms are short, their mapping is highly susceptible to single base errors or small indels. Small corrections made to a reference can, therefore, improve the mappability of short reads and, conversely, introducing small errors in a reference will markedly reduce mappability. We have made use of this fact in developing a new methodology to automatically correct base errors and short insertions or deletions (indels) of up to 3 bp. In an iterative process, short reads are mapped against the genome and high-quality discrepancies and indels are identified and corrected. In each iteration, we compare the coverage of perfectly mapping reads at each corrected base before and after correction. Corrections that reduce the read coverage at that position are rejected. In this way, we evaluate whether each potential correction is accurate or not. We repeat the iterations until no new corrections are called.
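
The control flow of the iterative process described above can be sketched as follows. This is a minimal illustration and not the iCORN source code: it handles substitutions only, and the two callables `call_corrections` (standing in for read mapping plus variant calling) and `perfect_coverage` (standing in for the perfect-match coverage check) are hypothetical hooks supplied by the caller. The `max_iterations` cap is also our assumption, added only to guarantee termination.

```python
from typing import Callable, Dict, Set

Corrections = Dict[int, str]  # 0-based position -> replacement base


def apply_corrections(reference: str, corrections: Corrections) -> str:
    """Return a copy of the reference with the given substitutions applied."""
    seq = list(reference)
    for pos, base in corrections.items():
        seq[pos] = base
    return "".join(seq)


def iterate_corrections(
    reference: str,
    call_corrections: Callable[[str], Corrections],
    perfect_coverage: Callable[[str, int], int],
    max_iterations: int = 20,
) -> str:
    """Propose, evaluate and apply corrections until none are left."""
    rejected: Set[int] = set()
    for _ in range(max_iterations):
        # Ignore positions already rejected and calls that change nothing.
        proposed = {
            pos: base
            for pos, base in call_corrections(reference).items()
            if pos not in rejected and reference[pos] != base
        }
        if not proposed:
            break  # no new corrections were called
        candidate = apply_corrections(reference, proposed)
        kept: Corrections = {}
        for pos, base in proposed.items():
            # Reject a change if perfect-match coverage drops at that base.
            if perfect_coverage(candidate, pos) < perfect_coverage(reference, pos):
                rejected.add(pos)
            else:
                kept[pos] = base
        reference = apply_corrections(reference, kept)
    return reference
```

In a real run both hooks would wrap external tools, and the reads would be remapped against the updated reference in every iteration, which is what allows later iterations to find errors that were initially hidden.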

2.1 Data

For the P.falciparum reference sequence, we used 3D7 version 2.1.4 (ftp://ftp.sanger.ac.uk/pub/pathogens/Plasmodium/falciparum/3D7/3D7.version2.1.4/). All Illumina data were produced within the Sanger sequencing facility. The protocol to obtain PCR-free data is described in Kozarewa et al. (2009). Preparation of other samples can be seen in Quail et al. (2008).

2.2 Iterative Correction of Reference Nucleotides Implementation

An overview of Iterative Correction of Reference Nucleotides (iCORN) is given in Figure 1. The program itself is hosted at http://icorn.sourceforge.net/. Short reads are first mapped with SSAHA2 (Ning et al., 2001) against the genome sequence that is to be corrected (although another mapping algorithm could be used, e.g. Li et al., 2008). Standard Illumina mapping values are used with the ‘paired’ option when reads are paired. Read pairs that do not map within the correct insert size constraint, map to different chromosomes or are in the wrong orientation, are ignored. Using the SSAHA pileup pipeline (ftp://ftp.sanger.ac.uk/pub/zn1/ssaha_pileup/), single nucleotide polymorphisms (SNPs) and short indels (1–3 bp) are called from the remaining read pairs. Note that each ‘SNP’ or ‘indel’ called by the software refers to a potential sequencing error or to sample heterogeneity. A SNP is accepted if it has a SSAHA SNP quality of at least 60. Short indels are called if they occur in at least 30% of the reads with a minimum read coverage of at least 5. These are the default values and can be changed. The called SNP and indel errors are corrected in the genome sequence and saved as a new version.
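
The filtering rules in this paragraph can be written down as simple predicates. The sketch below is illustrative only: the thresholds (SNP quality of at least 60, indels in at least 30% of reads at a coverage of at least 5, indel length 1–3 bp) are taken from the text, whereas the class, function and parameter names, including the insert size bounds, are assumptions made for the example and do not reflect the SSAHA pileup interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Default calling thresholds as stated in the text."""
    min_snp_quality: int = 60
    min_indel_fraction: float = 0.30
    min_indel_coverage: int = 5
    max_indel_length: int = 3


DEFAULTS = Thresholds()


def keep_read_pair(chrom1: str, chrom2: str, insert_size: int,
                   correct_orientation: bool,
                   min_insert: int, max_insert: int) -> bool:
    """Keep a pair only if both mates map to the same chromosome, in the
    expected orientation and within the insert size range (hypothetical
    bounds supplied by the caller)."""
    return (chrom1 == chrom2 and correct_orientation
            and min_insert <= insert_size <= max_insert)


def accept_snp(quality: int, t: Thresholds = DEFAULTS) -> bool:
    """Accept a SNP call whose quality reaches the threshold."""
    return quality >= t.min_snp_quality


def accept_indel(supporting_reads: int, coverage: int, length: int,
                 t: Thresholds = DEFAULTS) -> bool:
    """Accept a 1-3 bp indel seen in enough reads at sufficient coverage."""
    if coverage < t.min_indel_coverage:
        return False
    if not 1 <= length <= t.max_indel_length:
        return False
    return supporting_reads / coverage >= t.min_indel_fraction
```

Keeping the thresholds in one immutable object mirrors the statement that the defaults can be changed: a caller would pass a different `Thresholds` instance rather than edit the predicates.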

Fig. 1.

Flow chart of iCORN.

To evaluate the corrections, the coverage of each base before correction is compared to that after correction using SNP-o-matic (Manske and Kwiatkowski, 2009), which maps only reads that match perfectly over their whole length. If read coverage of a corrected base goes down, the change is rejected and the original sequence is restored. If there is no change in coverage, we assume that this region may have additional errors and accept the correction. The procedure is repeated, using the newly corrected genome sequence as the reference, and continues to iterate until no new errors can be found. The algorithm returns all changes, including coverage statistics, in GFF format [visible with Artemis (Carver et al., 2008)] or as a Gap4 feature file.
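
As a toy illustration of this evaluation step, the sketch below counts perfect-match coverage by exact substring search and applies the accept/reject rule stated above. The exact string search is only a stand-in for SNP-o-matic, which it does not attempt to reproduce; the function names and the example sequences are invented for the illustration.

```python
from typing import List


def perfect_coverage(reference: str, reads: List[str]) -> List[int]:
    """Per-base coverage from reads that match the reference exactly over
    their whole length (forward strand only, every occurrence counted)."""
    coverage = [0] * len(reference)
    for read in reads:
        if not read:
            continue  # skip empty reads
        start = reference.find(read)
        while start != -1:
            for i in range(start, start + len(read)):
                coverage[i] += 1
            start = reference.find(read, start + 1)
    return coverage


def judge(before: int, after: int) -> str:
    """Reject only if coverage went down; unchanged coverage is accepted."""
    return "reject" if after < before else "accept"


if __name__ == "__main__":
    reads = ["ACGAT", "CGATG", "GATGC", "ATGCA"]
    uncorrected = "ACGTTGCA"  # toy reference with an error at position 3
    corrected = "ACGATGCA"    # same reference after the proposed change
    before = perfect_coverage(uncorrected, reads)[3]
    after = perfect_coverage(corrected, reads)[3]
    print(before, after, judge(before, after))
```

In the toy example none of the reads matches the uncorrected sequence perfectly, so the coverage at the corrected base rises from zero and the change is accepted.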

3 RESULTS

To calibrate the SSAHA2 alignment score threshold to detect real base errors but minimize false positives, all calls on a single chromosome at different calling thresholds were manually inspected by an experienced professional finisher. This involved interrogating the capillary reads and their quality scores in a GAP4 database. Using a SSAHA SNP score of 60 (reflecting a base coverage of ≥20) resulted in all of the corrections being confirmed. This score was subsequently adopted for future analyses. We first applied our algorithm to the genome sequence of P.falciparum: a reference sequence whose low complexity results from an extremely biased base composition (19% G+C content) and presents a challenge to short read alignment algorithms due to its exceptionally low information content. For the analysis, we used 28 million 36 bp paired-end and 20 million 76 bp paired-end Illumina reads. The mean coverage obtained by mapping read pairs with the correct fragment size was 82.9× and we used a minimum coverage of 20-fold to call changes (see Section 2). An example of a corrected region can be seen in Figure 2. The coverage plots show that the numbers of mapped reads and of perfectly mapped reads increase with each iteration.

Fig. 2.

Example of correction of a region of chromosome one of P.falciparum 3D7. The upper plot shows the coverage per iteration of the SSAHA mapping. The lower plot represents the coverage of the perfectly mapping reads from SNP-o-matic (http://snpomatic.sourceforge.net/). The vertical bars show the positions of the corrections. The actual corrections made at each iteration are shown in the multiple sequence alignment below.

After 6 iterations, no new corrections were called. We found a total of 1906 base errors and 368 indels. In the first iteration, 81% of the base errors were corrected. After the corrections, the coverage of 84 827 more base pairs increased to at least 5 and 87 952 additional reads were perfectly mapped. For most chromosomes, the single base error detection drops to zero after the fifth iteration (Supplementary Table S1).

Initially, we found 208 sites (SNP score ≥60) that appear to be heterozygous in this haploid organism, using a cutoff of 15% of calls of an alternative base. Visual inspection of the areas in which these heterozygous calls occurred, however, revealed that the majority (75%) were roughly symmetrically distributed around homopolymeric tracts. The remainder appear to be strand specific, as they occur in only one read direction and are generally clustered in sequences that are rich in T and G bases. These calls were not present in the original capillary sequence data and we believe them to be hitherto unreported systematic errors occurring during Illumina sequencing (Supplementary Fig. S1). Ninety-six percent of the genome is covered by a read depth of ≥20, so that we were unable at this level of confidence to correct the remaining 4% of the genome. These regions are mostly telomeric and non-unique.

To test the accuracy of our algorithm, we randomly introduced approximately one error per 50 kb into the 3D7 sequence 2.1.4, inserting a total of 457 errors, and used the Illumina reads to correct this altered genome using iCORN. In the first iteration, 435 (96%) of the errors were found (Supplementary Table S2). As the errors were generated randomly, they were not clustered and could be found quickly. The random distribution also explains why 4% of the introduced errors cannot be found: 4% of the genome is not covered sufficiently to be corrected (Supplementary Table S1).

We further evaluated the performance of iCORN by manually inspecting the capillary chromatograms of called errors in chromosomes 5, 9 and 14. Of 174 corrected errors, 1 was rejected. This region comprised a string of 45 As with a G in the middle and was re-sequenced following polymerase chain reaction (PCR) amplification. This confirmed the presence of the G that had been erroneously corrected by iCORN to an A. We suspect that this may be due to the fact that polyA sequences are over-represented in Illumina data because of occasional edge effects on the slide. Finally, we designed an additional 96 PCR products over regions with corrections. Eighty-eight out of 96 PCR reactions were successful and in no case did the PCR product sequence disagree with the changes called by iCORN.

We next went on to assess the utility of this approach to correct the homopolymer errors that can occur using 454 technology (Droege and Hill, 2008). We applied it to a 454 assembly of 310 242 reads (fragment size 3 kb) from P.berghei. The contigs from the assembly were corrected with ∼50 million 76 bp PCR-free paired-end Illumina reads. After 6 iterations, 25 976 SNPs and 33 860 indels were called (Table 1). Figure 3A shows a typical example where multiple frameshifts due to homopolymer errors are corrected after just two iterations. Figure 3B shows similar data from the correction of a 454 assembly of Clostridium difficile using deep Illumina coverage. In both cases more indels than SNPs are called due to homopolymer errors.
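
The simulation test described above, in which roughly one random error per 50 kb is introduced and then recovered, can be mimicked with a small helper. The sketch below is an assumption-laden illustration rather than the procedure used in the study: it introduces substitutions only, the function names and the fixed random seed are ours, and the recovery measure simply counts how many mutated positions carry the original base again.

```python
import random
from typing import Dict, Tuple


def inject_errors(sequence: str, rate: float = 1 / 50_000,
                  seed: int = 0) -> Tuple[str, Dict[int, str]]:
    """Introduce random substitutions at about `rate` errors per base.

    Returns the mutated sequence and a map from each mutated position
    to the original base at that position.
    """
    rng = random.Random(seed)
    n_errors = min(len(sequence), max(1, round(len(sequence) * rate)))
    positions = rng.sample(range(len(sequence)), n_errors)
    seq = list(sequence)
    originals: Dict[int, str] = {}
    for pos in positions:
        originals[pos] = seq[pos]
        # Pick a different base so that every chosen site really changes.
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return "".join(seq), originals


def recovery_rate(originals: Dict[int, str], corrected: str) -> float:
    """Fraction of injected errors that were restored to the original base."""
    restored = sum(1 for pos, base in originals.items()
                   if corrected[pos] == base)
    return restored / len(originals)
```

A corrected genome would then be scored with `recovery_rate(originals, corrected_sequence)`; positions in poorly covered regions would be expected to remain unrecovered.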

Table 1.

Application of iCORN to prokaryotic and eukaryotic genome projects in various stages of completion

Organism	Sequence quality	Sequencing method	Genome size (Mb)	SNPs	Indels	Number rejected	Genome covered before (%)	Genome covered after (%)	New mappable reads	Iterations
Plasmodium falciparum 3D7	A	Capillary	23	1906	368	30	97.20	97.56	24 698	6
Echinococcus multilocularis	B	Capillary	110	5508	2520	2140	48.89	49.11	1 023 315	5
Leishmania major	B	Capillary	33	594	1061	122	98.52	98.62	313	6
Leishmania infantum	B	Capillary	32	2770	1878	320	89.26	89.72	5629	8
Plasmodium ovale	B	Capillary	21	1431	238	1081	91.27	91.42	6368	4
Plasmodium berghei	B	454	18	25 976	33 860	5639	88.65	95.38	140 788	7
Plasmodium berghei	B	Capillary	22	1901	3818	538	97.18	97.48	23 805	7
Chlamydia trachomatis	B	Capillary	1.0	487	16	18	99.86	99.997	9734	4
Clostridium difficile	B	454	4.1	61	1652	32	99.30	99.43	1708	6
Streptococcus pneumoniae	B	RNAseq	2.0	13	5	1	64.23	64.23	6	3
Streptococcus suis BM402	A	Capillary	2.1	2	1	0	98.84	98.85	15	2
Streptococcus suis P1_7	A	Capillary	2.0	0	0	0	99.7626		0	1
Salmonella Dublin Strain	B	454	5.0	13	45	18	96.84	96.85	207	7
Yersinia enterocolitica	B	Capillary	5.0	25	235	6	99.96	99.97	131 796	3

Sequence quality: ‘A’ indicates manually finished and published genomes and ‘B’ indicates a draft assembly. SNPs and Indels show the total numbers called between the first and last iteration. Number rejected indicates the total number of changes that were rejected because they decreased the number of perfectly mapping reads at that location. Genome covered indicates the percentage of bases covered at least five times by perfectly mapping reads, before and after the correction. New mappable reads indicates the additional number of reads that could be mapped by SSAHA between the first and last iteration. Further information can be found in Supplementary Table S3.

Fig. 3.

Examples of corrections of homopolymer length errors in assemblies from 454 sequencing. Details of the reads used can be found in Table 1. Figures are Artemis screen shots that show the three different reading frames in the direction of the gene. Black vertical lines are stop codons. Filled coloured boxes denote open reading frames. (A) Correction of a region of an assembly of P.berghei 454 reads. (B) Correction of a region of a 454 assembly of C.difficile.

Finally, we applied iCORN to a series of other eukaryotic and prokaryotic genome projects in various stages of completion (Table 1). For finished bacterial genomes, very few or no corrections were made. For those in draft assembly, it was possible to call a number of errors in relatively few iterations. This is presumably because bacteria generally have higher coverage, are shorter and have a less complex genome structure than eukaryotes. All errors in Yersinia enterocolitica and Streptococcus suis were confirmed by manual inspection of the trace files.

4 DISCUSSION

Here, we have shown that iterative mapping of short reads can correct errors remaining in a reference genome with great accuracy. Critical to the success of this approach is the use of two different mapping strategies during the iterations. High-quality discrepancies called using SSAHA2 are introduced into the genome and only confirmed if a separate mapping of perfectly aligning reads along their whole length using SNP-o-matic does not decrease coverage at the altered sites. Iterative mapping approaches have been used before to derive a consensus genome sequence from metagenomic sequencing data (Dutilh et al., 2009) but since this derives from aggregated sequences from an unknown number of starting genotypes, the resulting consensus represents no single genome and hides much of the diversity present in the original sequence pool. We have also shown that, after very few iterations, iCORN is efficient at correcting homopolymer errors that are often present in 454 data, thus potentially improving the ability to combine assemblies constructed using different sequencing technologies. We have explored the use of iCORN to ‘morph’ a reference genome into a closely related genotype using deep short read coverage. Although this approach may produce erroneous sequence changes, we have found that it has been very successful in improving the mapping of assembled contigs from a new genotype onto a reference genome. Finally, with third generation sequencing technology on the horizon, bringing gigabase coverage from much longer read lengths but with an increase in error rates, the use of additional Illumina reads and algorithms such as iCORN may be of considerable use in first-pass error correction.

REFERENCES

1. Initial sequencing and analysis of the human genome.

Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H 
Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal: Nature Date: 2001-02-15 Impact factor: 49.962

2. SSAHA: a fast search method for large DNA databases.

Authors: Z Ning; A J Cox; J C Mullikin
Journal: Genome Res Date: 2001-10 Impact factor: 9.043

3. Automated correction of genome sequence errors.

Authors: Pawel Gajer; Michael Schatz; Steven L Salzberg
Journal: Nucleic Acids Res Date: 2004-01-26 Impact factor: 16.971

4. Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Authors: Heng Li; Jue Ruan; Richard Durbin
Journal: Genome Res Date: 2008-08-19 Impact factor: 9.043

5. Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors: B Ewing; P Green
Journal: Genome Res Date: 1998-03 Impact factor: 9.043

6. Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13.

Authors: N Hall; A Pain; M Berriman; C Churcher; B Harris; D Harris; K Mungall; S Bowman; R Atkin; S Baker; A Barron; K Brooks; C O Buckee; C Burrows; I Cherevach; C Chillingworth; T Chillingworth; Z Christodoulou; L Clark; R Clark; C Corton; A Cronin; R Davies; P Davis; P Dear; F Dearden; J Doggett; T Feltwell; A Goble; I Goodhead; R Gwilliam; N Hamlin; Z Hance; D Harper; H Hauser; T Hornsby; S Holroyd; P Horrocks; S Humphray; K Jagels; K D James; D Johnson; A Kerhornou; A Knights; B Konfortov; S Kyes; N Larke; D Lawson; N Lennard; A Line; M Maddison; J McLean; P Mooney; S Moule; L Murphy; K Oliver; D Ormond; C Price; M A Quail; E Rabbinowitsch; M-A Rajandream; S Rutter; K M Rutherford; M Sanders; M Simmonds; K Seeger; S Sharp; R Smith; R Squares; S Squares; K Stevens; K Taylor; A Tivey; L Unwin; S Whitehead; J Woodward; J E Sulston; A Craig; C Newbold; B G Barrell
Journal: Nature Date: 2002-10-03 Impact factor: 49.962

7. Increasing the coverage of a metapopulation consensus genome by iterative read mapping and assembly.

Authors: Bas E Dutilh; Martijn A Huynen; Marc Strous
Journal: Bioinformatics Date: 2009-06-19 Impact factor: 6.937

8. A large genome center's improvements to the Illumina sequencing system.

Authors: Michael A Quail; Iwanka Kozarewa; Frances Smith; Aylwyn Scally; Philip J Stephens; Richard Durbin; Harold Swerdlow; Daniel J Turner
Journal: Nat Methods Date: 2008-12 Impact factor: 28.547

9. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database.

Authors: Tim Carver; Matthew Berriman; Adrian Tivey; Chinmay Patel; Ulrike Böhme; Barclay G Barrell; Julian Parkhill; Marie-Adèle Rajandream
Journal: Bioinformatics Date: 2008-10-09 Impact factor: 6.937

10. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes.

Authors: Iwanka Kozarewa; Zemin Ning; Michael A Quail; Mandy J Sanders; Matthew Berriman; Daniel J Turner
Journal: Nat Methods Date: 2009-03-15 Impact factor: 28.547
