A scaffold analysis tool using mate-pair information in genome sequencing.

Pan-Gyu Kim¹, Hwan-Gue Cho, Kiejung Park.

Abstract

We have developed a Windows-based program, ConPath, as a scaffold analyzer. ConPath constructs scaffolds by ordering and orienting separate sequence contigs, exploiting the mate-pair information between contig pairs. Our algorithm builds directed graphs from link information and traverses them to find the longest acyclic graphs. Using end read pairs of fixed-sized mate-pair libraries, ConPath determines the relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports wrong assembly information by validating orientations and gap sizes. We have utilized ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath. ConPath also provides convenient features and viewers that permit investigation of each contig in detail; these include a contig viewer, a scaffold viewer, an edge information list, a mate-pair list, and the printing of complex scaffold structures.

Year: 2008 PMID: 18414585 PMCID: PMC2291285 DOI: 10.1155/2008/675741

Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243

1. INTRODUCTION

In 2001, the Human Genome Project (HGP) Consortium and Celera Genomics reported the first drafts of sequences of the human genome [1, 2]. The HGP Consortium used the hierarchical sequencing or “clone-by-clone” approach, whereas Celera Genomics used the whole genome shotgun (WGS) approach, which had been successfully used in 1995 to sequence the H. influenzae genome [3]. In the hierarchical sequencing approach, a tiling of large DNA sequences, such as bacterial artificial chromosomes (BACs) or yeast artificial chromosomes (YACs), is constructed for a genome, and each of the sequences is determined. The HGP Consortium used BACs as the large sequences, followed by shotgun sequencing of each BAC. Owing to physical limitations of shotgun sequencing methods, the genome must be broken down into smaller portions, giving shotgun reads of 600 to 800 bp (base pairs). As the sequence of each shotgun read is produced, it must be connected with the adjacent and overlapping reads that have already been sequenced, so that the smaller sequences are assembled into larger contiguous regions, or “contigs.” In most cases, the sequences of shotgun reads are obtained by sequencing both ends of a DNA fragment whose approximate size is known. Such pair information, referred to as mate-pair information, constrains the placement of the reads within an assembly. In an ideal assembly, all read pairs are placed so as to satisfy the orientation and distance constraints imposed by the pairing. Mate-pair information can be used to determine the quality of an assembly, because most types of misassemblies lead to violations of these constraints. In contrast to hierarchical sequencing, WGS breaks a whole genome into small pieces randomly, without first dividing it into large DNA pieces of intermediate size. WGS is faster and cheaper than hierarchical sequencing because its processing steps are simpler.
The success of WGS [4, 5] has increased its use, and the size of the genomes being sequenced has grown. Although contig assembly programs are well established, less is known about scaffold analysis. While some of its features have been implemented to sequence specific genomes [6–8], the features needed for general scaffold analysis and visualization have not been provided. Consed [9], a graphical tool for contig assembly, provides good visualization and helps to finish sequencing by connecting with Autofinish [10]; however, it does not have many features related to scaffold analysis. It has been suggested that the contig scaffolding problem can be solved by a greedy path-merging algorithm [8]. Moreover, GigAssembler can orient the contigs based on mRNA, paired plasmid ends, EST, and BAC end pairs [7]. This paper introduces a novel scaffold analysis tool, ConPath, which calculates the longest scaffolds. Because repeats are abundant in genomic DNA sequences, a purely overlap-based approach to WGS assembly is not feasible, and the use of mate-pair information is crucial. The ConPath program uses end read pairs of fixed-sized DNA libraries as mate-pairs to calculate orientations, orders, and gap sizes. It reads a Phrap [11] output file (∗.out) and an ACE format file, which contain contig structures and mate-pair information.

2. MATERIALS AND METHODS

2.1. Mate-pair information

The most important characteristic of ConPath is its ability to exploit the mate-pair information of large DNA fragments such as fosmids or cosmids, which are about 40 kbp (kilobase pairs) in size, or BACs, which are about 100–300 kbp in size, rather than plasmids, which are about 2–10 kbp in size. Figure 1 shows an example of mate-pair end reads. A mate-pair is composed of two end reads that always face each other. Each end read, b or g, has an orientation relative to the contig containing it. If the direction of an end read is the same as the direction of the contig, the read has direction U; otherwise, it has direction C. In Figure 1, b has direction U because the C1 contig and the b read are in the same direction, whereas g has direction C because the C2 contig and the g read are in opposite directions. The size of the mate-pair helps to estimate the gap size between contigs C1 and C2. When one contig contains one end of a mate-pair and a second contig contains the other end of the mate-pair, the two contigs are said to be linked by the mate-pair. A scaffold is a series of contigs that can be linked by mate-pairs. The connection relationship of all the contigs can be represented as a graph in which each contig is represented as a vertex. An edge is created between two contig vertices when they are linked by at least one mate-pair, and the number of linking mate-pairs between two contigs is defined as the edge weight.

Figure 1

An example of mate-pair information. Mate-pair reads are indicated as read “b” and “g” and the relative directions to encompassing contigs are denoted as “U (same direction)” and “C (complementary direction).”
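The edge-weight definition above can be illustrated with a minimal Python sketch. The function name `build_edge_weights`, the read-to-contig dictionary, and the read names are hypothetical and are not taken from ConPath; the sketch only counts, for each pair of distinct contigs, how many mate-pairs link them.

```python
from collections import Counter

def build_edge_weights(read_to_contig, mate_pairs):
    """Count linking mate-pairs for every pair of distinct contigs.

    read_to_contig: {read name: contig name} (hypothetical input layout)
    mate_pairs: iterable of (read_b, read_g) tuples
    """
    weights = Counter()
    for read_b, read_g in mate_pairs:
        contig_b = read_to_contig.get(read_b)
        contig_g = read_to_contig.get(read_g)
        # Skip pairs with an unplaced read or with both reads in one contig:
        # such pairs do not link two contigs.
        if contig_b is None or contig_g is None or contig_b == contig_g:
            continue
        # Store the edge under a sorted key so (C1, C2) and (C2, C1) coincide.
        weights[tuple(sorted((contig_b, contig_g)))] += 1
    return weights

read_to_contig = {
    "b1": "C1", "g1": "C2",
    "b2": "C1", "g2": "C2",
    "b3": "C2", "g3": "C3",
    "b4": "C3", "g4": "C3",  # both ends in the same contig, so no edge
}
mate_pairs = [("b1", "g1"), ("b2", "g2"), ("b3", "g3"), ("b4", "g4")]
weights = build_edge_weights(read_to_contig, mate_pairs)
print(dict(weights))
```

In this toy input, C1 and C2 are linked by two mate-pairs and C2 and C3 by one, so the two edges receive weights 2 and 1.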

2.2. Construction of scaffolds

To construct scaffolds using mate-pair information, a scaffold graph can be defined as follows. Given a set of contigs $C = \{c_1, c_2, c_3, \ldots, c_n\}$, a mate-pair set $M = \{m_1, m_2, m_3, \ldots, m_k\}$, and a set of reads $R = \{r_1, r_2, r_3, \ldots, r_p\}$, let $G = (C, E)$ denote the scaffold graph using $C$ and $M$. When a mate-pair $m = (r_i, r_j)$ exists, in which contig $c_a$ contains $r_i$ and contig $c_b$ contains $r_j$, there is an edge between contigs $c_a$ and $c_b$. Edge set $E$ is expressed as

$$E = \{(c_a, c_b) \mid m = (r_i, r_j) \in M,\ r_i \in c_a,\ r_j \in c_b,\ a \neq b\}.$$

In constructing a scaffold graph, the linking level (l), the threshold value for the edge weights, was used as a filtering value in constructing scaffolds and in showing them on output. When an edge has a weight smaller than the linking level (l), the edge is discarded from the graph. Considering the errors that occur in base calling and contig assembly, the optimal construction of a scaffold graph is an NP-complete problem [8]. To solve this problem in practice, ConPath uses a simple greedy algorithm: whenever a new edge is added to the graph, graph G is modified incrementally for that edge. This provides a feasible heuristic solution for scaffold construction in linear time. Algorithm 1 shows the algorithm ConPath uses to construct scaffolds.

Algorithm 1

The algorithm for scaffold construction. ConPath uses a simple greedy algorithm to obtain a feasible heuristic solution for an NP-complete problem.
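The body of Algorithm 1 is not reproduced here, so the following Python sketch is only one plausible greedy heuristic of the kind described, not ConPath's actual procedure. It assumes that edges are considered from heaviest to lightest, that edges below the linking level are discarded, and that an edge is accepted only if it keeps every scaffold a simple path (no branching, no cycle). The name `greedy_scaffolds` and the union-find bookkeeping are illustrative choices, and the sort makes this sketch O(E log E) rather than linear.

```python
def greedy_scaffolds(weights, linking_level=2):
    """Greedily chain contigs into linear scaffolds (illustrative heuristic).

    weights: {(contig_a, contig_b): number of linking mate-pairs}
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving, used to reject cycles.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree, neighbours = {}, {}
    # Heaviest edges first; ties are broken by contig names for determinism.
    for (a, b), w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if w < linking_level:
            continue  # edge discarded by the linking-level filter
        if degree.get(a, 0) >= 2 or degree.get(b, 0) >= 2:
            continue  # accepting the edge would create a branch
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # accepting the edge would close a cycle
        parent[root_a] = root_b
        for x, y in ((a, b), (b, a)):
            degree[x] = degree.get(x, 0) + 1
            neighbours.setdefault(x, []).append(y)

    # Every component is now a simple path; walk each one from an endpoint.
    scaffolds, seen = [], set()
    for start in sorted(neighbours):
        if start in seen or degree[start] != 1:
            continue
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            onward = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (onward[0] if onward else None)
        scaffolds.append(path)
    return scaffolds

example_weights = {
    ("C1", "C2"): 3, ("C2", "C3"): 2, ("C1", "C3"): 1,
    ("C2", "C4"): 2, ("C4", "C5"): 2,
}
print(greedy_scaffolds(example_weights, linking_level=2))
# [['C1', 'C2', 'C3'], ['C4', 'C5']]
```

In the example, the edge C1–C3 is dropped by the linking level, and C2–C4 is rejected because C2 already has two neighbours, which leaves two linear scaffolds.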

2.3. Determination of the orders and orientations of contigs

ConPath determines the relative orientations of all contigs using the orientations of the end reads. Figure 2 shows the determination of the order and orientations of three contigs using two mate-pairs. In Figure 2(a), the b1 and g1 reads determine the relative orientation of contigs C1 and C2, and, in the same way, the b2 and g2 reads determine the relative orientations of contigs C2 and C3 (see Figure 2(b)). The relative orientations of contigs C1, C2, and C3 are then determined by rotating the scaffold in Figure 2(b), as shown in Figure 2(c).

Figure 2

Determining the relative orientations of contigs using mate-pair information. (a): b1 and g1 reads determine the relative orientation of contigs C1 and C2; (b): b2 and g2 reads determine the relative orientations of contigs C2 and C3; and (c): the relative orientations of contigs C1, C2, and C3 are determined by rotating the scaffold in Figure 2(b).
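The orientation step can be sketched as a propagation over the scaffold graph. The sketch below is an illustration under a stated assumption, not ConPath's code: because the two end reads of a mate-pair face each other, a link whose reads have different directions (U and C) is taken to mean the two contigs share an orientation, and a link whose reads have the same direction (U and U, or C and C) is taken to mean one contig must be flipped. The function name `orient_contigs` and the tuple layout are hypothetical.

```python
from collections import deque

def orient_contigs(links):
    """Propagate relative orientations (+1 or -1) across linked contigs.

    links: iterable of (contig_1, dir_b, contig_2, dir_g), dirs 'U' or 'C'.
    Returns ({contig: orientation}, sorted list of conflicting contig pairs).
    """
    adjacency = {}
    for c1, dir_b, c2, dir_g in links:
        # Assumption: equal read directions imply one contig is reversed.
        flip = -1 if dir_b == dir_g else 1
        adjacency.setdefault(c1, []).append((c2, flip))
        adjacency.setdefault(c2, []).append((c1, flip))

    orientation, conflicts = {}, set()
    for root in adjacency:
        if root in orientation:
            continue
        orientation[root] = 1  # arbitrary reference orientation per scaffold
        queue = deque([root])
        while queue:
            cur = queue.popleft()
            for nxt, flip in adjacency[cur]:
                expected = orientation[cur] * flip
                if nxt not in orientation:
                    orientation[nxt] = expected
                    queue.append(nxt)
                elif orientation[nxt] != expected:
                    # Two links disagree about the same pair of contigs.
                    conflicts.add(tuple(sorted((cur, nxt))))
    return orientation, sorted(conflicts)

links = [("C1", "U", "C2", "C"), ("C2", "U", "C3", "U")]
orientation, conflicts = orient_contigs(links)
print(orientation, conflicts)
```

Links that disagree about the same contig pair are returned as conflicts, which corresponds to the kind of orientation inconsistency that Section 2.5 treats as an error.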

2.4. Estimation of the gap size between contigs

Assuming all mate-pairs have a fixed size, the size of the gap between two adjacent contigs is determined by the sizes of the two contigs and the positions of the end reads within the contigs. Suppose that contig C1 contains read b and contig C2 contains read g. Let $\mathrm{Gap}(C_1, C_2)$ be the gap size between C1 and C2. Let $P_s(b)$ and $P_e(b)$ be the start and end positions of read b in C1, respectively, and let $P_s(g)$ and $P_e(g)$ be the start and end positions of read g in C2, respectively. Considering all the possible directions of a mate-pair of two end reads, ConPath estimates the gap size as

$$
\begin{aligned}
\mathrm{Gap}(C_1, C_2) &= \text{mate-pair size} - \{(C_1 \cdot \mathrm{length} - P_s(b)) + (C_2 \cdot \mathrm{length} - P_s(g))\} && (b{:}\ \mathrm{U},\ g{:}\ \mathrm{U}) \\
\mathrm{Gap}(C_1, C_2) &= \text{mate-pair size} - \{(C_1 \cdot \mathrm{length} - P_s(b)) + P_e(g)\} && (b{:}\ \mathrm{U},\ g{:}\ \mathrm{C}) \\
\mathrm{Gap}(C_1, C_2) &= \text{mate-pair size} - \{P_e(b) + (C_2 \cdot \mathrm{length} - P_s(g))\} && (b{:}\ \mathrm{C},\ g{:}\ \mathrm{U}) \\
\mathrm{Gap}(C_1, C_2) &= \text{mate-pair size} - \{P_e(b) + P_e(g)\} && (b{:}\ \mathrm{C},\ g{:}\ \mathrm{C})
\end{aligned}
$$

Figure 3 shows the procedure for estimating the gap size between contigs when b and g have U and C directions, respectively. First, the orientations of contigs C1 and C2 are set in the same direction. Next, the length of the part of the mate-pair that lies in contig C1, $C_1 \cdot \mathrm{length} - P_s(b)$, and the length of the part that lies in contig C2, $P_e(g)$, are calculated. Finally, the gap size is calculated as

$$\mathrm{Gap}(C_1, C_2) = \text{mate-pair size} - \{(C_1 \cdot \mathrm{length} - P_s(b)) + P_e(g)\}.$$

Figure 3

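Estimation of the gap size between contigs when b has direction U and g has direction C. The gap size between C1 and C2 can be calculated as mate-pair size − {(C1 ⋅ length − P_s(b)) + P_e(g)}, where P_s(b) is the start position of read b in C1 and P_e(g) is the end position of read g in C2.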
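The four direction cases of the gap estimate can be folded into one small function. This is a sketch under an assumption about which read coordinate enters each case: for a read in direction U, the part of the mate-pair inside its contig is taken as contig length minus the read's start position, and for a read in direction C it is taken as the read's end position. The function name `estimate_gap` and the numbers in the example are hypothetical.

```python
def estimate_gap(mate_pair_size, c1_length, c2_length, read_b, read_g):
    """Estimate Gap(C1, C2) from one linking mate-pair.

    read_b, read_g: (direction, start, end) of the end reads in C1 and C2,
    with direction 'U' or 'C' and positions in contig coordinates.
    """
    dir_b, b_start, b_end = read_b
    dir_g, g_start, g_end = read_g
    # Part of the mate-pair lying inside each contig (assumed convention:
    # U reads use the start position, C reads use the end position).
    inside_c1 = c1_length - b_start if dir_b == "U" else b_end
    inside_c2 = c2_length - g_start if dir_g == "U" else g_end
    return mate_pair_size - (inside_c1 + inside_c2)

# Worked example for the Figure 3 case (b: U, g: C) with a 40 kbp fosmid:
# 20,000 bp of the fosmid lie in C1 and 15,000 bp lie in C2.
gap = estimate_gap(
    mate_pair_size=40_000,
    c1_length=50_000,
    c2_length=60_000,
    read_b=("U", 30_000, 30_700),
    read_g=("C", 14_300, 15_000),
)
print(gap)  # 5000
```

A negative return value means the two contigs would overlap under this estimate, which is the condition that Section 2.5 reports as a gap-size error.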

2.5. Detection of erroneous contigs

One important feature of ConPath is the verification of a contig assembly by identifying erroneous contigs. We have defined 4 types of contig assembly errors to check the quality of a contig assembly.

Self-collision error

When the number of mate-pairs connecting two adjacent contigs is more than 2, and there is an inconsistency in determining the orientation of contigs with mate-pairs, the error is defined as a self-collision error, the most serious error type. If this error occurs, the contigs should be inspected manually one by one.

Mate-pair size error

When both end reads of a mate-pair are contained in the same contig, the real size of this mate-pair can be calculated. If the difference between the calculated and predefined sizes is larger than a threshold value, the error is defined as a mate-pair size error. This type of error is critical to the contig assembly process.

Gap-size error

If the gap size between two contigs is a negative value, it indicates that the two contigs should be merged in the contig assembly process; this is defined as a gap size error.

Overlap error

After the distances between all adjacent contigs have been calculated, two nonadjacent contigs may overlap because of the accumulation of errors in gap size estimations. This type of error is defined as an overlap error, which happens rarely and is not critical.

Identifying error types is useful in verifying and correcting the final result of a contig assembly. If a contig has more than two types of errors, it is highly probable that a misassembled contig is present. ConPath assigns different colors to contigs by the number of error types, with nonerroneous contigs colored blue. When one contig has more than one error, ConPath assigns this contig a reddish color, with the intensity proportional to the number of error types. Therefore, we can check the quality of the final result of a contig assembly by simply inspecting the color information in the scaffold visualization window of ConPath.
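Two of the checks above reduce to simple comparisons, sketched below in Python. The function names, the read layout `(start, end)`, and the tolerance value are hypothetical; the source does not state the threshold ConPath uses, so the value in the example is only a placeholder.

```python
def observed_mate_pair_size(read_b, read_g):
    """Span covered by two end reads placed in the same contig.

    Each read is a (start, end) pair in contig coordinates.
    """
    return max(read_b[1], read_g[1]) - min(read_b[0], read_g[0])

def has_mate_pair_size_error(read_b, read_g, expected_size, tolerance):
    """True if the observed size deviates from the library size by more
    than the tolerance (threshold value is an assumption of this sketch)."""
    observed = observed_mate_pair_size(read_b, read_g)
    return abs(observed - expected_size) > tolerance

def has_gap_size_error(gap):
    """True if the estimated gap between two adjacent contigs is negative."""
    return gap < 0

# A fosmid expected to span about 40 kbp, checked with a 5 kbp tolerance.
ok_pair = has_mate_pair_size_error((1_000, 1_700), (38_300, 39_000), 40_000, 5_000)
bad_pair = has_mate_pair_size_error((1_000, 1_700), (20_300, 21_000), 40_000, 5_000)
print(ok_pair, bad_pair, has_gap_size_error(-1_200))
```

In the example, the first pair spans 38,000 bp and passes, whereas the second spans only 20,000 bp and is flagged.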

2.6. Implementation

ConPath was implemented on a Windows XP system using Visual C++. It provides a user-friendly interface and shows visual and color-informative outputs, which can help analyze scaffolds both intuitively and informatively. ConPath provides dialogue windows for “mate-pair information”, “edge information”, “contig path”, and “invalid contigs” by automatically checking for the 4 types of errors. Scaffolds are displayed graphically in proportion to the real sizes of vertices and edges after aligning vertices and edges to avoid graphical collision, and the detailed information for each vertex and edge is shown on a pop-up window. ConPath can produce a large picture for all scaffolds by assembling separately printed module pictures. Figure 4 shows various viewers and dialogues of ConPath.

Figure 4

A set of snapshots of ConPath. ConPath provides a set of useful information, “mate-pair information”, “edge information”, “contig path”, and “invalid contigs” by checking for the 4 types of error.

3. EXPERIMENTS AND DISCUSSION

We tested ConPath using both artificial and real data. Artificial data were generated in two different versions: R (randomly) and U (uniformly). The R version consisted of contigs of random sizes, whereas the U version consisted of contigs of uniform size. In these artificial data experiments, ConPath constructed scaffolds successfully from the mate-pair information and produced a reasonable scaffold construction in linear time. ConPath also worked successfully and efficiently on real data sets from the sequencing of the Mannheimia succiniciproducens and Vibrio vulnificus genomes, where it verified the results of contig assembly by detecting misassembled contigs. Table 1 shows the mate-pair information in these real datasets. Four datasets were tested in sequencing the M. succiniciproducens genome, whereas one dataset was tested in sequencing the V. vulnificus genome, to verify the results of contig assembly. Table 2 shows these results. MH1, MH2, MH3, and MH4 are the contig assembly results for the M. succiniciproducens genome, and VV is the contig assembly result for the V. vulnificus genome. For the M. succiniciproducens genome, the reliability of the contig assembly results increased from MH1 to MH4.

Table 1

Mate-pair information in real test datasets. The proportion of mate-pair reads for V. vulnificus is about double that for M. succiniciproducens.

Genome	Genome length	Fold	Number of reads	Number of mate-pairs	Proportion of mate-pair reads relative to number of reads
M. succiniciproducens	2.3 Mbp	13.2	about 25,000*	275	2.2%
V. vulnificus	5.1 Mbp	11.7	76,971	1,781	4.5%

* The numbers of reads for 4 versions of M. succiniciproducens show slight variation.

Table 2

Real test datasets. Four datasets for the M. succiniciproducens genome and one for the V. vulnificus genome were tested with ConPath. MP: mate-pair, MPIC: mate-pair in the same contigs.

Data name	Number of contigs	Number of MPs	Number of MPICs	Average size of MPs (fosmids)
MH1	98	238	72	37,673 bp
MH2	86	240	115	38,102 bp
MH3	85	240	120	38,157 bp
MH4	112	240	108	37,917 bp
VV	334	1,220	454	33,024 bp

We examined the edge number according to linking level (see Figure 5). ConPath was most successful at linking level 2 by minimizing the loss of edges.

Figure 5

Distribution of the number of edges according to linking level (l). ConPath constructed the best scaffolds at linking level 2 while minimizing edge loss.
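The edge counts plotted against linking level can be computed directly from the edge weights. The sketch below uses made-up weights, not the values behind Figure 5, and the function name `edges_per_linking_level` is hypothetical; it counts the edges whose weight is at least l for each linking level l.

```python
def edges_per_linking_level(weights, levels=(1, 2, 3, 4)):
    """Number of edges that survive the filter at each linking level.

    weights: {(contig_a, contig_b): number of linking mate-pairs}
    An edge survives level l when its weight is not smaller than l.
    """
    return {
        level: sum(1 for w in weights.values() if w >= level)
        for level in levels
    }

toy_weights = {
    ("C1", "C2"): 3, ("C2", "C3"): 2, ("C1", "C3"): 1,
    ("C2", "C4"): 2, ("C4", "C5"): 2,
}
print(edges_per_linking_level(toy_weights))
# {1: 5, 2: 4, 3: 1, 4: 0}
```

In this toy case, moving from level 1 to level 2 removes only the single-link edge, while level 3 removes most of the graph, which mirrors the trade-off between filtering spurious links and losing edges.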

Table 3 shows the detected errors in scaffold construction for the 5 datasets. Among the M. succiniciproducens datasets, MH1 had the most errors, whereas MH4 had no erroneous contigs. These results show that identifying the 4 types of errors for contigs is effective in verifying the result of contig assembly.

Table 3

Number of reported errors in scaffold construction for the 5 datasets at each linking level (l).

Data name	Errors*	l = 1	l = 2	l = 3	l = 4
MH1	Self collision	0	0	0	0
	Gap size	3	3	3	0
	Overlap	22	2	0	2
MH2	Self collision	2	2	2	2
	Gap size	2	2	2	2
	Overlap	20	2	2	0
MH3	Self collision	0	0	0	0
	Gap size	5	0	0	0
	Overlap	18	0	0	0
MH4	Self collision	0	0	0	0
	Gap size	0	0	0	0
	Overlap	0	0	0	0
VV	Self collision	16	16	10	7
	Gap size	65	7	3	2
	Overlap	85	24	0	4

* Mate-pair size errors were excluded because these errors do not depend on l.

Figure 6 shows the constructed scaffolds at linking levels 2 and 3 for the MH1 dataset. Contig 93 is suspected of being erroneous because it has several erroneous contigs on both sides. ConPath showed that contig 93 was misassembled. The contig information dialogue box for contig 93 is shown in Figure 6(c).

Figure 6

An example of the detection of mis-assembled contigs. (a): Scaffolds for MH1 at linking level 2; (b): scaffolds for MH1 at linking level 3; (c): information on contig 93.

Table 4 shows a comparison of features of several scaffold analysis tools, including ConPath, Consed [9], Autofinish [10], and Bambus [12]. ConPath is rated Strong on four of the 5 criteria (construction time, visualization, error detection, and additional information) and Medium on scaffold accuracy, for which Bambus is rated Strong. Most importantly, ConPath helps users to intuitively verify the contig assembly by providing many visualization features and additional information to detect erroneous contigs.

Table 4

Comparison of ConPath with other scaffold tools.

Comparison item	ConPath	Consed	Autofinish	Bambus
Accuracy of scaffold	Medium	Medium	Medium	Strong
Construction time	Strong	Strong	Strong	Strong
Visualization	Strong	Medium	Weak	Weak
Error detection	Strong	Medium	Medium	Medium
Additional information	Strong	Strong	Medium	Medium

4. CONCLUSION

A scaffold analyzer is an important tool in genome sequencing, in that it can verify the results of contig assembly and identify misassembled contigs. We have developed ConPath, a scaffold analyzer that exploits mate-pair information to construct scaffolds by ordering and orienting separate sequence contigs. ConPath provides various useful viewers and dialogue boxes for intuitive understanding. Using end read pairs of a fixed-sized mate-pair library, ConPath can determine the relative orientations of all contigs and estimate the gap size of each adjacent contig pair. We defined 4 types of errors to detect misassembly. ConPath was used successfully in sequencing several microbial genomes, including the M. succiniciproducens genome [13]. ConPath is, therefore, a useful scaffold analyzer for verifying contig assembly by detecting erroneous contigs. ConPath can be improved further by making its algorithm more accurate and efficient and by developing additional features, such as primer design for the finishing step and a sequence read viewer.

10 in total

1. Initial sequencing and analysis of the human genome.

Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H 
Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal: Nature Date: 2001-02-15 Impact factor: 49.962

2. Assembly of the working draft of the human genome with GigAssembler.

Authors: W J Kent; D Haussler
Journal: Genome Res Date: 2001-09 Impact factor: 9.043

3. Hierarchical scaffolding with Bambus.

Authors: Mihai Pop; Daniel S Kosack; Steven L Salzberg
Journal: Genome Res Date: 2004-01 Impact factor: 9.043

4. Fosmid-based physical mapping of the Histoplasma capsulatum genome.

Authors: Vincent Magrini; Wesley C Warren; John Wallis; William E Goldman; Jian Xu; Elaine R Mardis; John D McPherson
Journal: Genome Res Date: 2004-08 Impact factor: 9.043

5. Consed: a graphical tool for sequence finishing.

Authors: D Gordon; C Abajian; P Green
Journal: Genome Res Date: 1998-03 Impact factor: 9.043

6. Human whole-genome shotgun sequencing.

Authors: J L Weber; E W Myers
Journal: Genome Res Date: 1997-05 Impact factor: 9.043

7. Automated finishing with autofinish.

Authors: D Gordon; C Desmarais; P Green
Journal: Genome Res Date: 2001-04 Impact factor: 9.043

8. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.

Authors: R D Fleischmann; M D Adams; O White; R A Clayton; E F Kirkness; A R Kerlavage; C J Bult; J F Tomb; B A Dougherty; J M Merrick
Journal: Science Date: 1995-07-28 Impact factor: 47.728

9. The genome sequence of the capnophilic rumen bacterium Mannheimia succiniciproducens.

Authors: Soon Ho Hong; Jin Sik Kim; Sang Yup Lee; Yong Ho In; Sun Shim Choi; Jeong-Keun Rih; Chang Hoon Kim; Haeyoung Jeong; Cheol Goo Hur; Jae Jong Kim
Journal: Nat Biotechnol Date: 2004-09-19 Impact factor: 54.908

10. The sequence of the human genome.

Authors: J C Venter; M D Adams; E W Myers; P W Li; R J Mural; G G Sutton; H O Smith; M Yandell; C A Evans; R A Holt; J D Gocayne; P Amanatides; R M Ballew; D H Huson; J R Wortman; Q Zhang; C D Kodira; X H Zheng; L Chen; M Skupski; G Subramanian; P D Thomas; J Zhang; G L Gabor Miklos; C Nelson; S Broder; A G Clark; J Nadeau; V A McKusick; N Zinder; A J Levine; R J Roberts; M Simon; C Slayman; M Hunkapiller; R Bolanos; A Delcher; I Dew; D Fasulo; M Flanigan; L Florea; A Halpern; S Hannenhalli; S Kravitz; S Levy; C Mobarry; K Reinert; K Remington; J Abu-Threideh; E Beasley; K Biddick; V Bonazzi; R Brandon; M Cargill; I Chandramouliswaran; R Charlab; K Chaturvedi; Z Deng; V Di Francesco; P Dunn; K Eilbeck; C Evangelista; A E Gabrielian; W Gan; W Ge; F Gong; Z Gu; P Guan; T J Heiman; M E Higgins; R R Ji; Z Ke; K A Ketchum; Z Lai; Y Lei; Z Li; J Li; Y Liang; X Lin; F Lu; G V Merkulov; N Milshina; H M Moore; A K Naik; V A Narayan; B Neelam; D Nusskern; D B Rusch; S Salzberg; W Shao; B Shue; J Sun; Z Wang; A Wang; X Wang; J Wang; M Wei; R Wides; C Xiao; C Yan; A Yao; J Ye; M Zhan; W Zhang; H Zhang; Q Zhao; L Zheng; F Zhong; W Zhong; S Zhu; S Zhao; D Gilbert; S Baumhueter; G Spier; C Carter; A Cravchik; T Woodage; F Ali; H An; A Awe; D Baldwin; H Baden; M Barnstead; I Barrow; K Beeson; D Busam; A Carver; A Center; M L Cheng; L Curry; S Danaher; L Davenport; R Desilets; S Dietz; K Dodson; L Doup; S Ferriera; N Garg; A Gluecksmann; B Hart; J Haynes; C Haynes; C Heiner; S Hladun; D Hostin; J Houck; T Howland; C Ibegwam; J Johnson; F Kalush; L Kline; S Koduru; A Love; F Mann; D May; S McCawley; T McIntosh; I McMullen; M Moy; L Moy; B Murphy; K Nelson; C Pfannkoch; E Pratts; V Puri; H Qureshi; M Reardon; R Rodriguez; Y H Rogers; D Romblad; B Ruhfel; R Scott; C Sitter; M Smallwood; E Stewart; R Strong; E Suh; R Thomas; N N Tint; S Tse; C Vech; G Wang; J Wetter; S Williams; M Williams; S Windsor; E Winn-Deen; K Wolfe; J Zaveri; K Zaveri; J F Abril; R Guigó; M J Campbell; K V 
Sjolander; B Karlak; A Kejariwal; H Mi; B Lazareva; T Hatton; A Narechania; K Diemer; A Muruganujan; N Guo; S Sato; V Bafna; S Istrail; R Lippert; R Schwartz; B Walenz; S Yooseph; D Allen; A Basu; J Baxendale; L Blick; M Caminha; J Carnes-Stine; P Caulk; Y H Chiang; M Coyne; C Dahlke; A Deslattes Mays; M Dombroski; M Donnelly; D Ely; S Esparham; C Fosler; H Gire; S Glanowski; K Glasser; A Glodek; M Gorokhov; K Graham; B Gropman; M Harris; J Heil; S Henderson; J Hoover; D Jennings; C Jordan; J Jordan; J Kasha; L Kagan; C Kraft; A Levitsky; M Lewis; X Liu; J Lopez; D Ma; W Majoros; J McDaniel; S Murphy; M Newman; T Nguyen; N Nguyen; M Nodell; S Pan; J Peck; M Peterson; W Rowe; R Sanders; J Scott; M Simpson; T Smith; A Sprague; T Stockwell; R Turner; E Venter; M Wang; M Wen; D Wu; M Wu; A Xia; A Zandieh; X Zhu
Journal: Science Date: 2001-02-16 Impact factor: 47.728
