Yoshitaka Suetsugu, Hiroshi Minami, Michihiko Shimomura, Shun-ichi Sasanuma, Junko Narukawa, Kazuei Mita, Kimiko Yamamoto.
Abstract
BACKGROUND: We performed large-scale bacterial artificial chromosome (BAC) end-sequencing of two BAC libraries (an EcoRI- and a BamHI-digested library) and conducted an in silico analysis to characterize the obtained sequence data, to make them a useful resource for genomic research on the silkworm (Bombyx mori).
Year: 2007 PMID: 17822570 PMCID: PMC2014780 DOI: 10.1186/1471-2164-8-314
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of two bacterial artificial chromosome (BAC) libraries
| | EcoRI-digested library | BamHI-digested library |
| Vector | pBACe3.6 [52] | pBeloBAC11 [53] |
| Cloning site | EcoRI | BamHI |
| Number of clones | 36000 (96 × 384 wells) | 21120 (55 × 384 wells) |
| Mean insert size (kbp) | 168 | 165 |
| Clone coverage | × 11.4 | × 6.6 |
| Strain | p50T (mixed insects) | p50T (mixed insects) |
To calculate the clone coverage (the number of genome equivalents represented by the clones) of the EcoRI- and BamHI-digested libraries, we assumed a silkworm genome size of 530 Mb [34]. Detailed information on the EcoRI-digested library, such as the size distribution of BAC inserts, is available in the paper cited [49] and at the RPCI-96 BAC Library website [32]. Detailed information on the BamHI-digested library can be obtained from the website of Texas A&M BAC Libraries [33].
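The clone coverage values in the table can be reproduced from the number of clones, the mean insert size, and the assumed genome size. The following is a minimal sketch; the function name `clone_coverage` is illustrative and not from the paper, and the first table column is assumed to be the EcoRI-digested library.

```python
def clone_coverage(num_clones: int, mean_insert_kbp: float, genome_size_mbp: float) -> float:
    """Fold coverage = total cloned DNA / genome size."""
    total_insert_mbp = num_clones * mean_insert_kbp / 1000.0  # kbp -> Mbp
    return total_insert_mbp / genome_size_mbp


GENOME_SIZE_MBP = 530.0  # genome size assumed in the text

# Clone counts and mean insert sizes as listed in the table.
ecori_coverage = clone_coverage(36000, 168, GENOME_SIZE_MBP)
bamhi_coverage = clone_coverage(21120, 165, GENOME_SIZE_MBP)

print(f"EcoRI: {ecori_coverage:.1f}x, BamHI: {bamhi_coverage:.1f}x")
# prints "EcoRI: 11.4x, BamHI: 6.6x"
```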
Characteristics of the two groups of BAC end sequences (BESs)
| | EcoRI | BamHI | Total |
| Number of sequences | 61696 | 33208 | 94904 |
| Average read length (bp) | 571.6 | 598.1 | 580.9 |
| Minimum read length (bp) | 50 | 50 | 50 |
| Maximum read length (bp) | 955 | 920 | 955 |
| Total bases (bp) | 35266874 | 19860186 | 55127060 |
| GC content (%) | 37.45 | 40.30 | 38.47 |
| Clones | 34240 | 18251 | 52491 |
| Paired-end clones | 27456 | 14957 | 42413 |
| Percentage of paired-end clones (%) | 80.2 | 82.0 | 80.8 |
A paired-end clone is a clone for which both end sequences were obtained. The percentage of paired-end clones is the ratio of the number of paired-end clones to the total number of clones.
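As a worked check of the definition above, the percentages can be recomputed from the clone counts in the table. This is a minimal sketch: the helper names are illustrative, and `expected_sequences` rests on the assumption that every counted clone contributes either one or two end sequences.

```python
def paired_end_percentage(paired_end_clones: int, total_clones: int) -> float:
    """Percentage of clones for which both end sequences are available."""
    return 100.0 * paired_end_clones / total_clones


def expected_sequences(paired_end_clones: int, total_clones: int) -> int:
    """Number of end sequences, assuming each clone has one or two reads."""
    single_end_clones = total_clones - paired_end_clones
    return 2 * paired_end_clones + single_end_clones


# (paired-end clones, total clones) as listed in the table.
groups = {
    "EcoRI": (27456, 34240),
    "BamHI": (14957, 18251),
    "Total": (42413, 52491),
}

for name, (paired, total) in groups.items():
    pct = paired_end_percentage(paired, total)
    print(f"{name}: {pct:.1f}% paired-end, {expected_sequences(paired, total)} sequences")
```

Under that assumption, the sequence counts derived this way agree with the "Number of sequences" row.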
Distribution of interspersed repeat DNA sequences within the two groups of BAC end sequences (BESs) in different repeat classes
| Repeat class | EcoRI: GC% | EcoRI: Elements | EcoRI: Percentage | BamHI: GC% | BamHI: Elements | BamHI: Percentage |
| SINE | 45.57 | 4088 | 1.63 | 45.92 | 2120 | 1.37 |
| LINE | 53.85 | 6865 | 5.14 | 53.22 | 11105 | 16.40 |
| LTR | 47.04 | 4140 | 2.32 | 47.26 | 2264 | 2.22 |
| DNA | 41.49 | 3711 | 3.77 | 40.77 | 744 | 0.70 |
| Unclassified | 40.91 | 1469 | 0.69 | 41.24 | 963 | 0.62 |
"Elements" denotes the number of repeat elements detected. "Percentage" denotes the ratio of the length occupied by interspersed repeats to the total length. The GC contents of the unmasked regions of the EcoRI and BamHI BESs were 35.49% and 37.05%, respectively. The overall GC contents of the EcoRI and BamHI BESs were 37.45% and 40.30%, respectively.
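The relationship between the per-class figures and the overall GC content can be illustrated with a length-weighted average. This is a rough sketch under stated assumptions, not the authors' calculation: it assumes the first three data columns of the table belong to the EcoRI BESs and the last three to the BamHI BESs, and it treats everything outside the five listed classes as unmasked sequence. Because of that second simplification, the estimate should only approximate the reported overall values.

```python
# Per-class (GC%, percentage of total BES length), copied from the table.
ECORI_REPEATS = {
    "SINE": (45.57, 1.63),
    "LINE": (53.85, 5.14),
    "LTR": (47.04, 2.32),
    "DNA": (41.49, 3.77),
    "Unclassified": (40.91, 0.69),
}
BAMHI_REPEATS = {
    "SINE": (45.92, 1.37),
    "LINE": (53.22, 16.40),
    "LTR": (47.26, 2.22),
    "DNA": (40.77, 0.70),
    "Unclassified": (41.24, 0.62),
}


def estimate_overall_gc(repeats: dict, unmasked_gc: float) -> float:
    """Length-weighted mean GC% of repeat classes plus the unmasked remainder."""
    repeat_pct = sum(pct for _, pct in repeats.values())
    repeat_gc_weighted = sum(gc * pct for gc, pct in repeats.values())
    unmasked_pct = 100.0 - repeat_pct  # assumption: all remaining sequence is unmasked
    return (repeat_gc_weighted + unmasked_gc * unmasked_pct) / 100.0


ecori_estimate = estimate_overall_gc(ECORI_REPEATS, 35.49)
bamhi_estimate = estimate_overall_gc(BAMHI_REPEATS, 37.05)

print(f"EcoRI estimate: {ecori_estimate:.2f}% (reported 37.45%)")
print(f"BamHI estimate: {bamhi_estimate:.2f}% (reported 40.30%)")
```

In this sketch the larger LINE fraction in the second group, combined with the high GC content of LINEs, accounts for most of the difference between the two estimates.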
Figure 1. Summary of BLAST searches with each group of BAC end sequences (BESs) versus the silkworm whole-genome shotgun sequencing (WGS) data sets. BLAST searches were performed with each group of BESs against the two available silkworm WGS data sets. Each bin consists of two types of hits (red indicates a hit with, and blue a hit without, a repetitive region). The method for detecting a repetitive region was given in a previous section (Repeat analysis of BESs).
Figure 2. BAC end sequence (BES) categorization results based on the BLAST search. We defined a match as a BLAST hit with ≥ 99% identity and > 0.8 alignment coverage, where alignment coverage is the ratio of alignment length to BES length. BES+ denotes a BES with a single match, and BES++ a BES with multiple matches. BES- denotes a BES without a match, and BES-- a BES without a "raw BLAST hit."
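The categorization rule in this caption can be sketched as a small function. This is a minimal illustration, not the authors' pipeline: the `Hit` type and the function names are hypothetical, and each hit is assumed to carry a percent identity and an alignment length as produced by a BLAST search.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One raw BLAST hit for a BES (hypothetical minimal representation)."""
    identity_pct: float    # percent identity of the alignment
    alignment_length: int  # alignment length in bp


def is_match(hit: Hit, bes_length: int) -> bool:
    """A match has >= 99% identity and > 0.8 alignment coverage."""
    coverage = hit.alignment_length / bes_length
    return hit.identity_pct >= 99.0 and coverage > 0.8


def categorize(hits: List[Hit], bes_length: int) -> str:
    """Assign a BES to BES--, BES-, BES+ or BES++ from its raw hits."""
    if not hits:
        return "BES--"  # no raw BLAST hit at all
    n_matches = sum(is_match(h, bes_length) for h in hits)
    if n_matches == 0:
        return "BES-"   # raw hits exist, but none qualifies as a match
    if n_matches == 1:
        return "BES+"   # exactly one match
    return "BES++"      # multiple matches


# Example with a 600 bp BES and invented hits.
print(categorize([Hit(99.5, 590), Hit(97.0, 600)], 600))
```

In the example, only the first hit satisfies both thresholds, so the BES falls into the single-match category.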
Summary of clustering results
| Cluster size | EcoRI BESs | BamHI BESs |
| | 28595 (71.69) | 19731 (79.02) |
| 4 > | 9606 (24.08) | 4306 (17.57) |
| 8 > | 1494 (3.75) | 373 (1.52) |
| 16 > | 136 (0.34) | 64 (0.26) |
| 32 > | 43 (0.11) | 32 (0.13) |
| 64 > | 7 (0.0176) | 9 (0.04) |
| 128 > | 5 (0.0125) | 0 (0) |
| | 1 (0.0025) | 0 (0) |
| Total | 39887 | 24515 |
Figure 3. Relationship between the GC% of genome and the GC% of long interspersed nuclear elements (LINEs) in different species. We used the following LINE elements: A. gambiae; T1(M93689), RT1(M93690), RT2(M93691), Q(U03849), R6Ag3(AB090819), RTAg4(AB090813). D. melanogaster; BS(X77571), Doc(X17551), F(M17214), G(X06950), Helena(AF012030), HeT-A(U06920), I(M14954), Jockey(M22874), Pilger(AF278684), R1Dm(X51968), R2Dm(X15707), Tart(U02279), X(AF237761), You(AJ302712). H. sapiens; L1(U93574), HSLINE1O(X52235), L1.24(U93571), L1.21(U93570). M. musculus; L1Md-A2(M13002), MMU15647(U15647), L1Md4(X14061), L1Md-Tf14(AF081108), L1Md-Tf23(AF081110), L1Md-Tf26(AF081112), L1Md-Tf9(AF081107), L1orl(D84391), L1spa(AF016099), L1Md-Tf18(AF0181111), L1Md-Tf30(AF081112), L1Md-Tf8(AF081106), L1Md-Tf29(AF081113), L1Md-Tf17(AF081109), L1Md-Tf5(AF081104), L1Md-Tf6(AF081105). B. mori; BMC1(AB018558), R1Bm(M19755), R2Bm(AB076841), TRAS(AB04668), SART1(D85594). C. elegans; Rte-1(AF054983), Frodo-1(Z70755), Frodo-2(Z48009), Sam1(U13643), Sam2(U57054), Sam3(U46668), Sam4(Z92972), Sam5(Z81092), Sam6(Z82275), Sam7(Z82090), Sam8(AF016663), Sam9(Z81064).