Hong Liu, Yanliang Jiang, Shaolin Wang, Parichart Ninwichian, Benjaporn Somridhivej, Peng Xu, Jason Abernathy, Huseyin Kucuktas, Zhanjiang Liu.
Abstract
BACKGROUND: Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish.
Year: 2009 PMID: 20003258 PMCID: PMC2796685 DOI: 10.1186/1471-2164-10-592
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
A summary of BAC end sequences
| Category | Numbers |
|---|---|
| BAC sequence reactions | 84,480 |
| Total clean sequences | 63,387 (75% success) |
| T7 sequences | 32,074 |
| SP6 sequences | 31,313 |
| Paired BAC end sequences | 25,676 |
| Total length sequenced | 37,784,877 bp |
| Average length | 596 bp |
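The derived figures in the summary table (the 75% success rate and the 596 bp average length) follow directly from the raw counts. The short check below is only a sketch for readers who want to reproduce the arithmetic. The variable names are illustrative and are not taken from the paper.

```python
# Values copied from the summary table of BAC end sequences.
reactions = 84_480        # BAC sequence reactions
clean_reads = 63_387      # total clean sequences
t7_reads = 32_074         # T7 sequences
sp6_reads = 31_313        # SP6 sequences
total_bp = 37_784_877     # total length sequenced, in bp

# The T7 and SP6 reads should add up to the clean total.
assert t7_reads + sp6_reads == clean_reads

success_rate = clean_reads / reactions
average_length = total_bp / clean_reads

print(f"success rate: {success_rate:.1%}")
print(f"average length: {average_length:.0f} bp")
```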
Distribution of comparatively anchored BAC clones using protein encoding gene sequences only.
| Zebrafish chromosome | Chromosome size (Mb) | No. of protein encoding genes* | No. of tBLASTx hits | Hits to unique genes | Unique gene hits per Mb | No. of contigs with single gene hits | No. of contigs with multiple gene hits | No. of putative microsyntenies |
|---|---|---|---|---|---|---|---|---|
| 1 | 56.2 | 818 | 205 | 123 | 1.83 | 75 | 17 | 13 |
| 2 | 54.4 | 875 | 194 | 133 | 2.13 | 85 | 15 | 13 |
| 3 | 62.9 | 975 | 196 | 127 | 1.75 | 72 | 18 | 13 |
| 4 | 42.6 | 743 | 221 | 130 | 2.77 | 78 | 16 | 9 |
| 5 | 70.4 | 1,173 | 340 | 224 | 2.74 | 103 | 35 | 21 |
| 6 | 59.2 | 818 | 232 | 151 | 2.31 | 85 | 22 | 18 |
| 7 | 70.3 | 990 | 283 | 191 | 2.33 | 80 | 34 | 25 |
| 8 | 56.5 | 864 | 196 | 128 | 1.93 | 65 | 22 | 14 |
| 9 | 51.5 | 700 | 212 | 133 | 2.17 | 61 | 21 | 16 |
| 10 | 42.4 | 670 | 150 | 87 | 1.72 | 51 | 12 | 9 |
| 11 | 44.6 | 627 | 161 | 121 | 2.26 | 67 | 16 | 10 |
| 12 | 47.5 | 636 | 177 | 114 | 2.21 | 72 | 14 | 7 |
| 13 | 53.5 | 744 | 200 | 113 | 1.96 | 67 | 21 | 14 |
| 14 | 56.5 | 701 | 197 | 113 | 1.77 | 84 | 11 | 7 |
| 15 | 46.6 | 688 | 177 | 125 | 2.25 | 68 | 14 | 8 |
| 16 | 53.1 | 773 | 181 | 124 | 2.02 | 77 | 14 | 10 |
| 17 | 52.3 | 715 | 180 | 115 | 1.99 | 62 | 19 | 13 |
| 18 | 49.3 | 749 | 193 | 121 | 2.23 | 47 | 22 | 21 |
| 19 | 46.2 | 780 | 233 | 134 | 2.58 | 86 | 21 | 16 |
| 20 | 56.5 | 1,053 | 277 | 171 | 2.48 | 76 | 30 | 19 |
| 21 | 46.1 | 721 | 163 | 117 | 2.28 | 63 | 14 | 10 |
| 22 | 39.0 | 959 | 178 | 113 | 2.59 | 50 | 19 | 16 |
| 23 | 46.4 | 669 | 204 | 121 | 2.24 | 68 | 18 | 13 |
| 24 | 40.3 | 513 | 117 | 78 | 1.71 | 47 | 8 | 7 |
| 25 | 32.9 | 597 | 199 | 114 | 3.04 | 65 | 19 | 14 |
*: Annotated genes only from ENSEMBL.
Distribution of genes with hits from multiple BAC end sequences, with details provided for genes with 10 or more hits from BAC end sequences
| No. of genes | Putative identities | No. of BAC end sequence hits | Presence in zebrafish genome | Potential explanation |
|---|---|---|---|---|
| 1 | Novel protein similar to DNA polymerases | 31 | 28 | Repetitive elements related to retroelements |
| 1 | Methionine aminopeptidase 1 | 22 | 2 | Repetitive elements or multigene family |
| 1 | NOD3 protein-like | 18 | 63 | Common domains shared by many related proteins |
| 1 | Similar to tudor domain containing 7, hypothetical protein LOC393661 | 17 | 89 | Repetitive elements or repetitive genes |
| 1 | Similar to porf2 | 16 | 81 | Repetitive elements or multigene family |
| 1 | Similar to general transcription factor II-I repeat domain-containing protein 2A | 16 | 82 | Repetitive elements or multigene family |
| 1 | Similar to novel G protein-coupled receptor | 13 | 92 | Repetitive elements or multigene family |
| 1 | Similar to serine/threonine-protein kinase pim-3; | 11 | 85 | Repetitive elements or multigene family |
| 1 | Similar to novel protein from | 11 | 85 | Repetitive elements or multigene family |
| 1 | Similar to Dynein heavy chain 6 | 11 | 20 | Repetitive elements or multigene family |
| 1 | ORF2 [ | 10 | 91 | Repetitive elements or multigene family |
| 1 | PREDICTED: tubulin, alpha, ubiquitous isoform 8 [ | 10 | 16 | Repetitive elements or multigene family |
| 1 | PREDICTED: similar to vacuolar protein sorting 52 [ | 10 | 69 | Repetitive elements or multigene family |
| 1 | GF20795 [ | 10 | 4 | Repetitive elements or multigene family |
| 68 | | 5-9 | | Repetitive elements or multigene family |
| 71 | | 4 | | Repetitive elements or multigene family |
| 230 | | 3 | | Potentially duplicated gene candidates |
| 641 | | 2 | | Potentially duplicated gene candidates |
Summary of 50 conserved syntenies identified by comparison of 95 mate-paired genes of channel catfish with genomic locations of those on the zebrafish draft genome sequence.
| Catfish | SP6 hits | T7 hits | Zebrafish Chr | Chr location (Mb) | Distance (bp) |
|---|---|---|---|---|---|
| 035I16 | [GenBank: | [GenBank: | 2 | 38.61 | 18,048 |
| 063L16 | [GenBank: | [GenBank: | 2 | 19.25 | 48,679 |
| 026B05 | [GenBank: | [GenBank: | 3 | 31.82 | 4,418 |
| 098D05 | [GenBank: | [GenBank: | 4 | 1.82 | 301,470 |
| 007M06 | [GenBank: | [GenBank: | 5 | 5.46 | 379,579 |
| 028M04 | [GenBank: | [GenBank: | 5 | 22.23 | 133,063 |
| 074L07 | [GenBank: | [GenBank: | 5 | 15.69 | 290,598 |
| 035N11 | [GenBank: | [GenBank: | 6 | 18.25 | 231,524 |
| 040J24 | [GenBank: | [GenBank: | 6 | 7.37 | 443,524 |
| 077F04 | [GenBank: | [GenBank: | 6 | 11.42 | 484,877 |
| 032F20 | [GenBank: | [GenBank: | 7 | 28.40 | 244,425 |
| 062B16 | [GenBank: | [GenBank: | 7 | 15.05 | 222,416 |
| 075K10 | [GenBank: | [GenBank: | 7 | 13.34 | 268,455 |
| 103I21 | [GenBank: | [GenBank: | 7 | 22.34 | 293,845 |
| 047N13 | [GenBank: | [GenBank: | 8 | 23.84 | 203,260 |
| 057N22 | [GenBank: | [GenBank: | 9 | 36.69 | 163,677 |
| 076H22 | [GenBank: | [GenBank: | 9 | 13.84 | 313,352 |
| 105N24 | [GenBank: | [GenBank: | 9 | 27.37 | 268,271 |
| 056A23 | [GenBank: | [GenBank: | 10 | 7.50 | 486,745 |
| 068C21 | [GenBank: | [GenBank: | 10 | 17.72 | 104,067 |
| 093K02 | [GenBank: | [GenBank: | 10 | 41.67 | 152,479 |
| 096A14 | [GenBank: | [GenBank: | 11 | 10.02 | 980,276 |
| 010B15 | [GenBank: | [GenBank: | 12 | 25.17 | 111,625 |
| 041B24 | [GenBank: | [GenBank: | 12 | 13.59 | 398,886 |
| 109K22 | [GenBank: | [GenBank: | 12 | 25.55 | 449,096 |
| 018H11 | [GenBank: | [GenBank: | 13 | 23.23 | 293,181 |
| 026C08 | [GenBank: | [GenBank: | 13 | 23.50 | 201,663 |
| 027E09 | [GenBank: | [GenBank: | 14 | 0.22 | 574,761 |
| 103F19 | [GenBank: | [GenBank: | 14 | 22.90 | 256,143 |
| 022D09 | [GenBank: | [GenBank: | 15 | 29.14 | 418,219 |
| 059B20 | [GenBank: | [GenBank: | 15 | 27.80 | 94,396 |
| 077H08 | [GenBank: | [GenBank: | 15 | 6.05 | 339,215 |
| 004O03 | [GenBank: | [GenBank: | 16 | 10.40 | 149,618 |
| 075M09 | [GenBank: | [GenBank: | 16 | 8.30 | 120,355 |
| 104I08 | [GenBank: | [GenBank: | 16 | 27.17 | 214,517 |
| 013P16 | [GenBank: | [GenBank: | 18 | 18.65 | 366,849 |
| 042J20 | [GenBank: | [GenBank: | 18 | 32.85 | 223,781 |
| 021L13 | [GenBank: | [GenBank: | 19 | 17.40 | 286,019 |
| 052N18 | [GenBank: | [GenBank: | 19 | 7.59 | 345,553 |
| 056A09 | [GenBank: | [GenBank: | 19 | 13.46 | 122,217 |
| 065M02 | [GenBank: | [GenBank: | 19 | 15.37 | 430,325 |
| 080O21 | [GenBank: | [GenBank: | 19 | 17.06 | 389,643 |
| 086C16 | [GenBank: | [GenBank: | 19 | 6.00 | 364,590 |
| 010H22 | [GenBank: | [GenBank: | 21 | 2.24 | 240,019 |
| 034I14 | [GenBank: | [GenBank: | 21 | 20.16 | 306,102 |
| 081J10 | [GenBank: | [GenBank: | 21 | 22.12 | 277,818 |
| 099H22 | [GenBank: | [GenBank: | 22 | 7.03 | 115,754 |
| 053P11 | [GenBank: | [GenBank: | 23 | 30.73 | 197,703 |
| 068O10 | [GenBank: | [GenBank: | 23 | 18.01 | 332,973 |
| 105H10 | [GenBank: | [GenBank: | 23 | 33.36 | 274,318 |
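The Distance column above can be read as the separation, on one zebrafish chromosome, between the hits of the two ends (SP6 and T7) of a single catfish BAC clone. A minimal sketch of that check follows. The `Hit` type, the function name, the 1 Mb cutoff, and the example coordinates are all hypothetical: the excerpt states neither the authors' threshold nor the per-end positions.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Best zebrafish hit for one BAC end (hypothetical structure)."""
    chromosome: int
    position: int  # bp along the zebrafish chromosome


def mate_pair_distance(sp6: Hit, t7: Hit,
                       max_distance: int = 1_000_000) -> Optional[int]:
    """Return the distance in bp between the two end hits, or None.

    None means the pair does not support a conserved synteny under the
    assumed rule: both ends on the same chromosome, no further apart
    than max_distance (an illustrative cutoff, not the paper's).
    """
    if sp6.chromosome != t7.chromosome:
        return None
    distance = abs(sp6.position - t7.position)
    return distance if distance <= max_distance else None


# Made-up coordinates chosen only to illustrate the calculation.
same_region = mate_pair_distance(Hit(2, 38_610_000), Hit(2, 38_628_048))
other_chromosome = mate_pair_distance(Hit(2, 38_610_000), Hit(5, 38_628_048))
```

With a rule of this kind, clones whose two ends land on different chromosomes, or far apart on the same chromosome, drop out, which would be consistent with only 50 of the 95 mate pairs being reported as conserved syntenies.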
Figure 1. Identification of microsyntenies through comparative sequence analysis (chr 1 through chr 5). tBLASTX searches were conducted using BAC contig-associated BAC end sequences as queries against the zebrafish genome sequence. The putative conserved microsyntenies are presented along the 25 zebrafish chromosomes (chr 1 through chr 25). The position on the zebrafish sequence is shown to the left of each chromosome bar in million base pairs. The conserved microsyntenies are indicated on the right side of the chromosome bars, with the numbers representing the contig numbers of the BAC assembly of the catfish physical map [41]. Circles represent short syntenic regions, and short vertical lines represent relatively longer conserved syntenic regions, drawn in proportion to their length, with the number in parentheses giving the number of conserved sequences within the microsynteny. Microsyntenies marked with asterisks (*) are duplicated in the zebrafish genome and are color-coded to make the duplicated syntenic regions easier to see along the chromosome. A duplicated syntenic region is a genomic segment conserved between the catfish and zebrafish genomes that is duplicated in the zebrafish genome, such that a single catfish genome segment (for example, a contig or a scaffold) used as the query generates identical or nearly identical significant hits from two chromosomal regions of the zebrafish genome. In a few cases, the term is extended to include segments that are triplicated in the zebrafish genome.
Figure 2. Identification of microsyntenies through comparative sequence analysis (chr 6 through chr 10). Details of the methods and presentation are the same as described under Figure 1.
Figure 3. Identification of microsyntenies through comparative sequence analysis (chr 11 through chr 15). Details of the methods and presentation are the same as described under Figure 1.
Figure 4. Identification of microsyntenies through comparative sequence analysis (chr 16 through chr 20). Details of the methods and presentation are the same as described under Figure 1.
Figure 5. Identification of microsyntenies through comparative sequence analysis (chr 21 through chr 25). Details of the methods and presentation are the same as described under Figure 1.
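The Figure 1 legend defines a duplicated syntenic region as one where a single catfish segment yields significant hits from two (occasionally three) zebrafish chromosomal regions. One way to make that definition concrete is sketched below. The function names, the input format, and the 1 Mb gap used to decide when two hits belong to separate regions are assumptions for illustration, not the authors' procedure.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def count_regions(hits: Iterable[Tuple[int, int]],
                  max_gap: int = 1_000_000) -> int:
    """Count distinct zebrafish regions hit by one catfish contig.

    hits: (chromosome, position_bp) pairs for one contig's significant hits.
    Hits on the same chromosome are merged into one region unless
    consecutive positions are more than max_gap bp apart (assumed cutoff).
    """
    by_chromosome = defaultdict(list)
    for chromosome, position in hits:
        by_chromosome[chromosome].append(position)

    regions = 0
    for positions in by_chromosome.values():
        positions.sort()
        regions += 1  # first region on this chromosome
        for previous, current in zip(positions, positions[1:]):
            if current - previous > max_gap:
                regions += 1  # a large gap starts a new region
    return regions


def is_duplicated(hits: Iterable[Tuple[int, int]],
                  max_gap: int = 1_000_000) -> bool:
    """True when a contig's hits fall in two or more distinct regions."""
    return count_regions(hits, max_gap) >= 2
```

Hits from one contig that cluster within the gap count as a single region, while hits on two chromosomes, or far apart on one chromosome, flag the contig as a candidate for the asterisk-marked duplicated regions.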
Figure 6. Scaffolds of conserved syntenic regions between the catfish and zebrafish genomes. Scaffolds of conserved syntenic regions were established by genetic linkage mapping of BAC contig-associated microsatellites. Zebrafish chromosome 13 (chr 13) is presented with its base positions on the far left in million base pairs; the second column of numbers gives the catfish BAC contig numbers [41], with the identified syntenic regions shown immediately to the right of the chromosome bar. The numbers in parentheses are the numbers of conserved sequences; the circles and bars represent relatively short and long conserved syntenic regions; the asterisks represent duplicated syntenic regions, color-coded to make the duplicated regions easier to see, as described in the Figure 1 legend, except that open circles represent conserved sequences from non-gene sequences while solid circles represent conserved gene sequences. Microsatellites from the BAC contigs were genetically mapped to linkage groups as shown on the right, with the microsatellite names (e.g., AUBES1884) labeled in the second column from the right. The positional relationship of the conserved syntenies on the zebrafish genome sequence and within the catfish linkage group is indicated by thin lines linking the zebrafish chromosome and the catfish linkage group positions. The positions of markers within the linkage group are shown on the far right in centimorgans.
Figure 7. Scaffolds of conserved syntenic regions between the catfish linkage groups and zebrafish chromosome 7. Details of the methods used for the identification and presentation of the syntenic regions are the same as described under Figure 6.