Shaolin Wang, Eric Peatman, Jason Abernathy, Geoff Waldbieser, Erika Lindquist, Paul Richardson, Susan Lucas, Mei Wang, Ping Li, Jyothi Thimmapuram, Lei Liu, Deepika Vullaganti, Huseyin Kucuktas, Christopher Murdock, Brian C Small, Melanie Wilson, Hong Liu, Yanliang Jiang, Yoona Lee, Fei Chen, Jianguo Lu, Wenqi Wang, Peng Xu, Benjaporn Somridhivej, Puttharat Baoprasertkul, Jonas Quilang, Zhenxia Sha, Baolong Bao, Yaping Wang, Qun Wang, Tomokazu Takano, Samiran Nandi, Shikai Liu, Lilian Wong, Ludmilla Kaltenboeck, Sylvie Quiniou, Eva Bengten, Norman Miller, John Trant, Daniel Rokhsar, Zhanjiang Liu.
Abstract
BACKGROUND: Through the Community Sequencing Program, the catfish research community and the Department of Energy's Joint Genome Institute collaborated on a catfish EST sequencing project. Prior to this project, only a limited catfish EST resource was available for SNP identification.
Year: 2010 PMID: 20096101 PMCID: PMC2847720 DOI: 10.1186/gb-2010-11-1-r8
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
cDNA library information and sequencing summary
| Library | Species | Nature of library | Organ, tissue, or cell line | Total sequences |
|---|---|---|---|---|
| CBFH | Blue catfish | Normalized | Stomach, muscle, olfactory tissue and trunk kidney | 37,314 |
| CBZC | Blue catfish | Normalized | Stomach, muscle, olfactory tissue and trunk kidney | 30,902 |
| CBNH | Blue catfish | Normalized | Head kidney, gill, intestine, spleen, skin and liver | 9,323 |
| CBZF | Blue catfish | Normalized | Head kidney, gill, intestine, spleen, skin and liver | 51,172 |
| Subtotal | Blue catfish | | | 128,711 |
| CBCZ | Channel catfish | Non-normalized | Mixed peripheral blood leukocytes | 16,168 |
| CBFA | Channel catfish | Normalized | Catfish whole fry library | 63,602 |
| CBNG | Channel catfish | Normalized | Kidney, gill, intestine, spleen, skin and liver | 2,982 |
| CBZB | Channel catfish | Normalized | Kidney, gill, intestine, spleen, skin and liver | 57,772 |
| CBNI | Channel catfish | Normalized | Stomach, muscle, olfactory tissue and trunk kidney | 17,023 |
| CBZA | Channel catfish | Normalized | Stomach, muscle, olfactory tissue and trunk kidney | 61,320 |
| CBPN | Channel catfish | Subtracted | Liver, pituitary, ovary and testis | 62,058 |
| CBPO | Channel catfish | Normalized | Peripheral blood leukocytes stimulated with LPS | 28,685 |
| Subtotal | Channel catfish | | | 309,610 |
| NCBI | Blue catfish | | | 10,764 |
| NCBI | Channel catfish | | | 44,767 |
| Total | | | | 493,852 |
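The per-library counts add up to the subtotals and the grand total. A quick way to re-check this arithmetic is sketched below; the dictionary names are illustrative, and the numbers are copied from the table.

```python
# ESTs per Joint Genome Institute library, copied from the table above
BLUE_JGI = {"CBFH": 37_314, "CBZC": 30_902, "CBNH": 9_323, "CBZF": 51_172}
CHANNEL_JGI = {
    "CBCZ": 16_168, "CBFA": 63_602, "CBNG": 2_982, "CBZB": 57_772,
    "CBNI": 17_023, "CBZA": 61_320, "CBPN": 62_058, "CBPO": 28_685,
}
# ESTs already in NCBI before the project
NCBI = {"blue": 10_764, "channel": 44_767}

blue_subtotal = sum(BLUE_JGI.values())
channel_subtotal = sum(CHANNEL_JGI.values())

# Species totals combine the new JGI libraries with the existing NCBI ESTs
blue_total = blue_subtotal + NCBI["blue"]
channel_total = channel_subtotal + NCBI["channel"]
grand_total = blue_total + channel_total

print(blue_subtotal, channel_subtotal, grand_total)
```

The species totals (139,475 blue and 354,377 channel catfish ESTs) are the starting counts in the assembly statistics table.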
Library names were designated by the Joint Genome Institute. LPS, lipopolysaccharide.
Figure 1. Length distribution of Joint Genome Institute EST sequences.
EST assembly statistics
| | Blue catfish | Channel catfish | All catfish |
|---|---|---|---|
| Total ESTs | 139,475 | 354,377 | 493,852 |
| Short and simple sequences removed | 2,735 | 6,230 | 8,965 |
| Sequences for assembly | 136,740 | 348,147 | 484,887 |
| Contigs | 22,009 | 28,941 | 45,306 |
| Singletons | 32,806 | 41,776 | 66,272 |
| Average number of sequences per contig | 4.72 | 10.6 | 9.2 |
| Total unique sequences | 54,815 | 70,717 | 111,578 |
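The rows of this table are linked: sequences for assembly are the total ESTs minus the removed sequences, and the unique sequences are contigs plus singletons. The sketch below reproduces the table under one stated assumption: the average number of sequences per contig is taken as (sequences for assembly minus singletons) divided by contigs. The source does not give this formula, but it matches all three reported averages.

```python
from dataclasses import dataclass


@dataclass
class Assembly:
    raw: int         # total ESTs before cleaning
    removed: int     # short and simple sequences removed
    contigs: int
    singletons: int

    @property
    def for_assembly(self) -> int:
        return self.raw - self.removed

    @property
    def unique(self) -> int:
        # every contig and every singleton counts as one unique sequence
        return self.contigs + self.singletons

    @property
    def seqs_per_contig(self) -> float:
        # assumption: singletons are excluded before averaging over contigs
        return (self.for_assembly - self.singletons) / self.contigs


ASSEMBLIES = {
    "blue": Assembly(139_475, 2_735, 22_009, 32_806),
    "channel": Assembly(354_377, 6_230, 28_941, 41_776),
    "all": Assembly(493_852, 8_965, 45_306, 66_272),
}

for name, a in ASSEMBLIES.items():
    print(name, a.for_assembly, a.unique, round(a.seqs_per_contig, 2))
```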
Figure 2. Distribution of contig sizes.
Figure 3. Distribution of sequence similarity between blue catfish and channel catfish sequences.
Figure 4. Open reading frame (ORF) length distribution from unique sequences of the all catfish assembly.
Figure 5. Analysis of open reading frames (ORFs). (a) Percentage of ORFs among unique sequences from the all catfish EST assembly; (b) percentage of ORFs greater than 100 bp among unique sequences from the all catfish EST assembly; (c) percentage of ORFs equal to or greater than 100 bp with significant BLASTX hits; (d) percentage of ORFs smaller than 100 bp with significant BLASTX hits.
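The ORF analysis in Figures 4 and 5 depends on how an ORF is called, and that procedure is not described here. As a minimal sketch, assuming an ORF runs from an ATG to the next in-frame stop codon on either strand, the longest ORF of a sequence and the 100 bp cut-off could be computed as follows. The function names are hypothetical. Partial ORFs that run off the end of an EST are ignored in this sketch, although a real pipeline might count them.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> int:
    """Length (bp, stop codon included) of the longest ATG-to-stop ORF in six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens an ORF in this frame
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None  # look for the next ATG downstream
    return best


def has_long_orf(seq: str, min_bp: int = 100) -> bool:
    # hypothetical helper mirroring the 100 bp threshold used in Figure 5
    return longest_orf(seq) >= min_bp
```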
Summary of BLASTX search analysis of catfish ESTs
| Database | Catfish hits* | Unique protein | % of total unique proteins | Unique gene |
|---|---|---|---|---|
| NR | 41,311 | 22,642 | ||
| Uniprot | 34,860 | 17,948 | ||
| Refseq/Ensembl | ||||
| Zebrafish | 39,546 | 14,988 | 54% of 27,996 | 12,470 |
| Medaka | 36,641 | 13,588 | 56% of 24,461 | 12,920 |
| | 34,418 | 13,132 | 57% of 23,118 | 10,322 |
| Human | 33,847 | 12,621 | 33% of 38,342 | 9,668 |
| Mouse | 33,594 | 12,267 | 35% of 35,236 | 11,518 |
| Chicken | 31,646 | 11,059 | 50% of 22,194 | 8,717 |
| Cumulative unique (E-10)† | 42,668 | 16,439 | | 14,776 |
*Number of significant (E-10) alignments using all catfish unique sequences as queries to search the listed databases. †Cumulative unique totals were derived from the sum of unique gene/protein identities across all listed species.
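The "% of total unique proteins" column is the number of unique proteins hit by catfish sequences divided by the number of proteins in that species' reference set. The sketch below recomputes it from the table. The species name of the third fish row is missing in the source, so it is labeled only as an unnamed fish.

```python
# (unique proteins hit by catfish sequences, proteins in the reference set)
HITS = {
    "Zebrafish": (14_988, 27_996),
    "Medaka": (13_588, 24_461),
    "(unnamed fish)": (13_132, 23_118),  # species name not recoverable from the source
    "Human": (12_621, 38_342),
    "Mouse": (12_267, 35_236),
    "Chicken": (11_059, 22_194),
}


def coverage_percent(hits: int, proteome: int) -> int:
    """Percentage of a reference protein set matched by catfish sequences."""
    return round(100 * hits / proteome)


for species, (hits, total) in HITS.items():
    print(f"{species}: {coverage_percent(hits, total)}% of {total:,}")
```

Every recomputed value matches the rounded percentage in the table.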
Figure 6. Comparison of shared and unique gene identities of channel catfish and blue catfish from a total of 14,776 unique genes.
Figure 7. Conservation of catfish gene identities with other species. Number of catfish homologous genes identified from other species using BLASTX searches.
Summary of microsatellite marker identification from catfish ESTs
| Category | Count |
|---|---|
| Total number of unique sequences | 111,578 |
| Microsatellites identified | 20,757 |
| Di-nucleotide repeats | 12,367 |
| Tri-nucleotide repeats | 5,506 |
| Tetra-nucleotide repeats | 2,664 |
| Penta-nucleotide repeats | 182 |
| Hexa-nucleotide repeats | 38 |
| Number of unique sequences containing microsatellites | 15,082 |
| Number of unique sequences containing microsatellites with sufficient flanking sequences for PCR primer design | 13,375 |
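The table does not state which tool or repeat thresholds were used to call microsatellites. As a generic illustration only, a regular expression with a back-reference can find tandem repeats of 2 to 6 bp motifs. The minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for the example, not the study's settings.

```python
import re

# Hypothetical minimum number of tandem copies per motif length (not from the paper)
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str):
    """Return (start, motif, copies) for simple sequence repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # a k-bp motif followed by at least n - 1 further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip homopolymers and motifs built from a shorter unit (e.g. ACAC)
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits
```

Classifying each hit by `len(motif)` gives di- through hexa-nucleotide counts like those tabulated above, though the counts depend heavily on the chosen thresholds.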
Summary of SNP identification from the catfish ESTs
| Number of SNPs | Blue catfish | Channel catfish | All catfish |
|---|---|---|---|
| Putative | | | |
| Transitions | 29,305 | 61,184 | 172,746 |
| Transversions | 19,397 | 41,068 | 130,254 |
| Total SNPs | 48,702 | 102,252 | 303,000 |
| Indels | 14,803 | 41,660 | 100,636 |
| SNP rate (per kb) | 3.2 | 4.1 | 7.7 |
| Filtered putative | |||
| Transitions | 2,886 | 11,012 | 32,235 |
| Transversions | 1,005 | 4,815 | 16,359 |
| Total SNPs | 3,891 | 15,827 | 48,594 |
| Indels | 1,070 | 6,707 | 19,398 |
| Filtered/Non-filtered rate | 7.8% | 15.7% | 16.2% |
| SNP rate* (per kb) | 0.25 | 0.64 | 1.6 |
*SNP rate was calculated by dividing the total number of SNPs excluding indels by the total length (bp) of the consensus sequences of the contigs.
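Two calculations underlie this table: each substitution is classed as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion, and the SNP rate follows the footnote, with SNPs excluding indels divided by total consensus length. The sketch below implements both. The consensus lengths themselves are not given in the source, so the rate function is shown only with toy numbers.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a: str, b: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    a, b = a.upper(), b.upper()
    if a == b or {a, b} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {a}>{b}")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def snp_rate_per_kb(n_snps: int, consensus_bp: int) -> float:
    """SNPs (indels excluded) per kilobase of contig consensus sequence."""
    return n_snps / (consensus_bp / 1_000)


# Ts/Tv ratio for all putative SNPs in the all catfish assembly (values from the table)
ts_tv_all = 172_746 / 130_254
print(f"Ts/Tv = {ts_tv_all:.2f}")
```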
Quality assessment of the filtered putative SNPs identified from the catfish ESTs based on the number of sequences per contig and the sequence frequencies of the minor alleles
| Number of sequences in the contig (allele ratio, or minimum number of minor-allele sequences) | Number of contigs with SNPs | Number of SNPs | SNP rate (per kb) |
|---|---|---|---|
| 2 (1:1) | 16,567 | 96,565 | 5.2 |
| 3 (2:1) | 8,374 | 86,686 | 10.8 |
| 4 (3:1) | 5,136 | 71,155 | 13.0 |
| Subtotal | 30,077 | 254,406 | 8.0 |
| 4 (2:2) | 1,528 | 5,008 | 0.9 |
| 5-6 (2) | 3,099 | 13,725 | 2.0 |
| 7-8 (3) | 805 | 2,659 | 0.7 |
| 9-12 (4) | 730 | 2,376 | 0.5 |
| 13-20 (5) | 629 | 2,307 | 0.6 |
| 21-30 (5) | 628 | 2,864 | 1.3 |
| 31-50 (6) | 730 | 5,052 | 3.0 |
| 51-100 (6) | 542 | 6,379 | 6.0 |
| 101-500 (6) | 316 | 6,580 | 13.4 |
| >500 | 31 | 1,644 | 15.0 |
| Subtotal | 9,038 | 48,594 | 1.6 |
| Total | 39,115 | 303,000 | 7.7 |
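One reading of the parenthetical values in the lower half of the table (the filtered set) is that each contig-depth range requires a minimum number of sequences carrying the minor allele. Under that interpretation, which is an assumption and not a stated rule, a filter could look like the sketch below. The table lists no threshold for contigs with more than 500 sequences, so the sketch assumes the last value, 6, also applies there.

```python
# Assumed reading of the table: (min depth, max depth, min minor-allele sequences)
THRESHOLDS = [
    (4, 4, 2),
    (5, 6, 2),
    (7, 8, 3),
    (9, 12, 4),
    (13, 30, 5),
    (31, 500, 6),
]


def passes_filter(depth: int, minor_count: int) -> bool:
    """Hypothetical filter: keep a SNP if its minor allele is seen often enough."""
    for lo, hi, need in THRESHOLDS:
        if lo <= depth <= hi:
            return minor_count >= need
    if depth > 500:
        return minor_count >= 6  # assumption: no threshold is given for >500
    return False  # contigs with fewer than four sequences are not in the filtered set
```

Under this reading, the 1:1, 2:1 and 3:1 rows (one minor-allele sequence) fall outside the filtered set, consistent with the first subtotal.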
Figure 8. Categorization of four different types of SNPs from the all catfish EST assembly and examples of SNPs whose categories could not be determined. (a-d) Types of SNPs that can be identified from the all catfish EST assembly. (e) Examples of SNPs whose categories could not be determined because fewer than two minor-allele sequences come from a given species.
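The SNP categories in Figure 8 and in the footnote of the table below can be expressed as a small decision procedure over the alleles observed in each species at one SNP position. The sketch below is one interpretation: a species counts as polymorphic only when each of its alleles is seen in at least two sequences, and a singleton minor allele in either species makes the site undetermined. The function names and the precedence of the checks are assumptions.

```python
from collections import Counter


def species_state(alleles) -> str:
    """Summarize one species' sequences at a SNP position (interpretation)."""
    counts = Counter(alleles)
    if not counts:
        return "absent"
    if len(counts) == 1:
        return "mono"   # a single allele in this species
    if all(c >= 2 for c in counts.values()):
        return "poly"   # every allele supported by at least two sequences
    return "weak"       # a second allele seen in only one sequence


def categorize(blue, channel) -> str:
    b, c = species_state(blue), species_state(channel)
    if "absent" in (b, c):
        return "only one species represented"
    if "weak" in (b, c):
        return "undetermined"
    if b == "poly" and c == "poly":
        return "intra-specific, both species"
    if b == "poly":
        return "intra-specific, blue catfish"
    if c == "poly":
        return "intra-specific, channel catfish"
    # both species are monomorphic here
    if set(blue) != set(channel):
        return "inter-specific"
    return "not a SNP"
```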
Estimation of proportions of inter-specific and intra-specific SNPs from the set of filtered SNPs identified from the interspecific all catfish EST assembly
| SNP type* | From 1,000 random contigs | Estimated from the all catfish assembly | Estimated % of total filtered SNP |
|---|---|---|---|
| Inter-specific SNP¹ | 430 | 18,731 | 39 |
| Intra-specific SNP, blue catfish² | 12 | 523 | 1 |
| Intra-specific SNP, channel catfish³ | 54 | 2,352 | 5 |
| Intra-specific SNP, blue catfish and channel catfish⁴ | 87 | 3,790 | 8 |
| Undetermined⁵ | 383 | 16,683 | 34 |
| Subtotal | 966 | 42,080 | 87 |
| SNP from only blue catfish ESTs⁶ | NA | 1,118 | 2 |
| SNP from only channel catfish ESTs⁶ | NA | 5,396 | 11 |
| Subtotal | NA | 6,514 | 13 |
| Total SNP | NA | 48,594 | 100 |
*SNPs were identified from contigs in the all catfish EST assembly that contained at least four sequences, with at least two sequences from either channel catfish or blue catfish. ¹No intra-specific SNP within blue catfish or within channel catfish, but the two species differed at the inter-specific SNP position. ²SNPs within blue catfish but not within channel catfish. ³SNPs within channel catfish but not within blue catfish. ⁴SNPs within both blue catfish and channel catfish. ⁵Undetermined: the site qualified as a SNP overall, with at least two minor-allele sequences, but only one minor-allele sequence came from one of the two species. ⁶SNPs identified from ESTs that, to date, have been sequenced from only one of the two species, blue catfish or channel catfish.
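The estimated counts appear to scale the category proportions observed in the 1,000 random contigs (966 SNPs) up to the 42,080 filtered SNPs that come from contigs with ESTs from both species (48,594 minus the 6,514 single-species SNPs). The sketch below assumes this proportional extrapolation, which the source does not spell out. It reproduces the reported estimates to within one SNP of rounding and the reported percentages of all 48,594 filtered SNPs.

```python
# SNP counts per category from the 1,000 randomly sampled contigs
SAMPLE = {
    "inter-specific": 430,
    "intra-specific, blue catfish": 12,
    "intra-specific, channel catfish": 54,
    "intra-specific, both species": 87,
    "undetermined": 383,
}
TOTAL_FILTERED = 48_594                               # all filtered SNPs
CATEGORIZABLE = TOTAL_FILTERED - (1_118 + 5_396)      # excludes single-species SNPs


def extrapolate(sample: dict, population: int) -> dict:
    """Scale sampled category counts to a population by simple proportion."""
    n = sum(sample.values())
    return {k: round(v / n * population) for k, v in sample.items()}


estimates = extrapolate(SAMPLE, CATEGORIZABLE)
for category, count in estimates.items():
    print(f"{category}: {count:,} ({round(100 * count / TOTAL_FILTERED)}%)")
```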