Construction, complete sequence, and annotation of a BAC contig covering the silkworm chorion locus.

Zhiwei Chen, Junko Nohata, Huizhen Guo, Shenglong Li, Jianqiu Liu, Youbing Guo, Kimiko Yamamoto, Keiko Kadono-Okuda, Chun Liu, Kallare P Arunkumar, Javaregowda Nagaraju, Yan Zhang, Shiping Liu, Vassiliki Labropoulou, Luc Swevers, Panagiota Tsitoura, Kostas Iatrou, Karumathil P Gopinathan, Marian R Goldsmith, Qingyou Xia, Kazuei Mita.

Abstract

The silkmoth chorion was studied extensively by F.C. Kafatos' group for almost 40 years. However, the complete structure of the chorion locus was not obtained in the genome sequence of Bombyx mori published in 2008 due to repetitive sequences, resulting in gaps and an incomplete view of the locus. To obtain the complete sequence of the chorion locus, expressed sequence tags (ESTs) derived from follicular epithelium cells were used as probes to screen a bacterial artificial chromosome (BAC) library. Seven BACs were selected to construct a contig which covered the whole chorion locus. By Sanger sequencing, we successfully obtained complete sequences of the chorion locus spanning 871,711 base pairs on chromosome 2, where we annotated 127 chorion genes. The dataset reported here will recruit more researchers to revisit one of the oldest model systems which has been used to study developmentally regulated gene expression. It also provides insights into egg development and fertilization mechanisms and is relevant to applications related to improvements in breeding procedures and transgenesis.

Keywords:  DNA sequencing; Molecular biology

Year:  2015        PMID: 26594380      PMCID: PMC4640134          DOI: 10.1038/sdata.2015.62

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

Silkmoth chorion proteins, the main components of the eggshell, are sequentially synthesized and secreted by follicular epithelium cells with a high degree of developmental programming[1]. The structural genes for chorion proteins comprise a multigene family whose members are grouped under α and β branches based on their evolutionarily conserved central domains[2]. Chorion proteins are further classified into six subgroups, early A, early B, middle A, middle B, late high-cysteine A (HcA) and late high-cysteine B (HcB), according to their timing of developmental expression and amino acid composition[3]. Based on genetic linkage mapping, the chorion genes are located between the larval marker p at the proximal end of chromosome 2 and the cocoon color marker Y[4-6]. The recent silkworm genome assembly[7] indicates that the chorion locus is localized at [1,780,900-3,840,078] on chromosome 2, although it is largely interrupted by gaps due to highly repetitive sequences. A high quality BAC library was constructed from genomic DNA of silkworm fifth instar day 3 posterior silk glands partially digested with EcoRI[8], designated RPCI-96 (RP96), and is available from BACPAC Resources of the Children’s Hospital Oakland Research Institute (BACPAC Resources Center [bacpac.chori.org/]). Here we undertook the following strategy to obtain complete sequences of the chorion locus (Fig. 1): ESTs of chorion genes were used as probes to screen the BAC library, and selected clones were used to construct a BAC contig which covered the complete chorion locus (Fig. 2b). By Sanger sequencing of the BAC contig, we successfully obtained the complete sequence of the chorion locus spanning 871,711 base pairs on chromosome 2, where we annotated 127 chorion genes (Fig. 2c).
Figure 1

Schematic overview of the study.

Figure 2

Distribution of genome assembly, BAC contig and annotated chorion genes in the chorion locus.

Probes are marked by stars: early chorion genes (black stars); middle chorion genes (green stars); late chorion genes (red stars). The probes used here are presented in Table 1. (a) Diagram of the chorion locus in the B. mori genome assembly. Arrows and dotted lines represent scaffolds and gap regions, respectively (edited from KAIKObase). (b) BAC contig that covers the chorion locus. Each black line represents a complete BAC region. Six of the seven BACs were sequenced; 544H24 was not, because its sequence was already known. (c) Early, middle, late and non-chorion genes are highlighted in black, green, red and yellow, respectively.

In this paper, we describe in detail the methods, data and quality measurements for the construction and sequencing of the silkmoth chorion BAC contig, together with its sequence data and annotation, which are presented only briefly in the ‘Materials and Methods’ section of a related research paper[9]. That paper provides additional information for a comprehensive understanding of the structure, transcription, and proteomics of genes in the chorion locus[9]. Our strategy can serve as a model for sequencing selected loci in the genomes of other species that contain highly repetitive sequences.

Methods

EST analysis of follicular cell and ovary cDNA libraries

To identify chorion gene transcripts, we analyzed ESTs from two newly constructed cDNA libraries: fcP8, derived from day 8 pupal follicular cells, and bmov, derived from day 4 pupal ovaries. All ESTs from both libraries are accessible at the DNA Data Bank of Japan (acc # FY000001-FY021573 for bmov; BY918786-BY920388 and BY927072-BY928825 for fcP8). We identified ESTs of chorion genes by BLASTx searches against public protein databases, including the NCBI nr database.
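The accession ranges quoted above can be turned into a simple lookup. The following is a minimal sketch, assuming that each range is inclusive and that every accession inside a range belongs to the named library; the function names are illustrative and are not part of any DDBJ tool.

```python
import re

# Inclusive accession ranges quoted in the text, keyed by cDNA library.
LIBRARY_RANGES = {
    "bmov": [("FY000001", "FY021573")],
    "fcP8": [("BY918786", "BY920388"), ("BY927072", "BY928825")],
}


def split_accession(acc):
    """Split an accession such as 'BY919605' into ('BY', 919605)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)", acc)
    if match is None:
        raise ValueError(f"unexpected accession format: {acc!r}")
    return match.group(1), int(match.group(2))


def range_size(first, last):
    """Count the accessions in an inclusive range that shares one prefix."""
    (prefix1, n1), (prefix2, n2) = split_accession(first), split_accession(last)
    if prefix1 != prefix2 or n2 < n1:
        raise ValueError(f"malformed range: {first}-{last}")
    return n2 - n1 + 1


def library_size(library):
    """Total number of accessions spanned by all ranges of a library."""
    return sum(range_size(a, b) for a, b in LIBRARY_RANGES[library])


def in_library(acc, library):
    """Return True if the accession falls inside any range of the library."""
    prefix, number = split_accession(acc)
    for first, last in LIBRARY_RANGES[library]:
        (range_prefix, n1), (_, n2) = split_accession(first), split_accession(last)
        if prefix == range_prefix and n1 <= number <= n2:
            return True
    return False


print(library_size("bmov"), library_size("fcP8"))
print(in_library("BY919605", "fcP8"))
```

Under these assumptions the ranges span 21,573 accessions for bmov and 3,357 for fcP8, and a probe accession such as BY919605 (Table 1) falls inside the fcP8 ranges.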

BAC screening

The silkworm BAC library (RPCI-96) used in this paper was obtained from BACPAC Resources Center, Children’s Hospital Oakland Research Institute, and has been described previously[8,10]. BAC clones derived from the chorion locus were identified by hybridization to BAC high density replica (HDR) filters, on which RPCI-96 BAC clones are arrayed in duplicate (BACPAC Resources Center [bacpac.chori.org/]). The probes were ESTs of 10 chorion genes selected as representatives of the three chorion families; they gave strong hybridization signals with multiple BACs, some of which cross-hybridized with different chorion families. A list of ESTs used for BAC screening is presented in Table 1. Labeling, hybridization and detection were performed using the ECL Direct Nucleic Acid Labeling and Detection System (GE Healthcare UK Ltd., Little Chalfont, Buckinghamshire, UK), in accordance with the manufacturer’s instructions[8].
Table 1

ESTs used as probes for screening BAC clones.

Gene ID      cDNA clone   Accession #   Type of chorion   BAC clone #
BmCho-9      fcP812C07    BY919605      middle class A    081P21
BmCho-11     fcP809B08    BY919370      middle class B    081P21
BmCho-28     fcP802B12    BY927183      late HcA          081P21
BmCho-33     fcP815F06    BY919864      late HcB          018E13
BmCho-64     fcP816D07    BY919923      late HcA          018E13, 503L05
BmCho-65     fcP806G06    BY919206      late HcB          018E13, 503L05
BmCho-76     fcP814C12    BY919762      middle class A    503L05, 076K18
BmCho-82     fcP818B03    BY920045      middle class B    503L05, 076K18
BmCho-109    fcP807G07    BY927624      early B           077P06, 049B01
BmCho-110    fcP809F08    BY919409      early A           077P06, 049B01

Construction of a BAC contig covering the chorion locus

Two hundred and two BAC clones from early, middle and late chorion gene regions were screened with EST probes of representative chorion genes from the fcP8 cDNA library by hybridization of an HDR filter of the RPCI-96 silkworm BAC library. Among the positive BAC clones, we chose the highly positive clones 077P06 and 094B01 for early chorion genes, 081P21 and 076K18 for middle chorion genes, and 018E13 for late chorion genes. We also selected clone 503L05, which gave a strong positive signal and was known to cover a non-chorion domain of the locus based on its BAC-end sequence, BES_503_L05 (acc # DE379518), in KAIKObase (http://sgp.dna.affrc.go.jp/KAIKObase/), and BAC 544H24, because we already knew that its full sequence aligned with the 3′ part of the chorion locus and the neighboring region[7]. We constructed contigs from these BAC clones with the fingerprinting method described previously[10]. This resulted in two contigs: one was composed of four BACs covering the 5′ half of the chorion locus, while the other was composed of three BACs aligning with the 3′ half of the chorion locus (Fig. 2a). One of the 076K18 BAC-end sequences, BES_076_K18 (acc # DE307437), aligned to Bm_scaf166 at [chr2: 2,636,193-2,636,430], and the 5′ end of the other BAC contig, the 077P06 BAC-end sequence BET_077_P06 (acc # DE354956), was located on the same scaffold, Bm_scaf166, at [chr2: 2,647,297-2,647,961]. Thus, the two BAC contigs, which were connected on Bm_scaf166, covered the whole chorion locus (Fig. 2a).

Genomic sequencing

Six BAC clones from 384-well plates[11] were streaked separately on chloramphenicol-containing LB plates. Three single colonies from each plate were checked with primers designed from the end sequences of each BAC (Table 2) to confirm that the correct BAC clone was present. The BAC clones were then cultured in LB medium for isolation of BAC DNA. BAC DNA was extracted using a Large-Construct Kit (QIAGEN) in accordance with the manufacturer’s instructions. For each BAC, 2 kb and 5 kb shotgun libraries were constructed using a pUC118 vector[12]. For each library, approximately 590 clones were picked for bidirectional sequencing performed with an ABI3730 DNA Analyzer (Applied Biosystems).
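The read yield implied by this design can be checked with simple arithmetic. The sketch below assumes that bidirectional sequencing gives exactly two reads per clone and that every picked clone yields both reads; the function name is illustrative.

```python
def expected_reads_per_bac(clones_per_library, libraries=2, reads_per_clone=2):
    """Reads expected per BAC when every clone is sequenced from both ends.

    libraries=2 corresponds to the 2 kb and 5 kb shotgun libraries;
    reads_per_clone=2 is the assumption that bidirectional sequencing
    always returns a forward and a reverse read.
    """
    return clones_per_library * libraries * reads_per_clone


print(expected_reads_per_bac(590))
```

With about 590 clones per library this gives 2,360 reads per BAC, in line with the figure of about 2,400 reads per BAC reported under Technical Validation.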
Table 2

The list of primers for detecting the BAC clones

Note: BET is a BAC-end sequence obtained with a T7 primer; BES is obtained with an SP6 primer. Both are vector primers that anneal adjacent to the insert: BET primer > insert < BES primer.
BAC End    Forward Primer    Reverse Primer
BET_081P21AGCATTCTTCCCCCACTGAGATTTAGATAGGCGGACGAA
BET_018E13CATCCACTGTAACCTCCATATACAGAGCAAGTGGATTTTC
BES_018E13AGCCACGTTTCTTCCAATCATGAGGATGTGGTGTCAAACG
BET_503L05TTTTCCGAATTTAAGCGATAGTGGAGTCAAAAAGTAGATGT
BES_503L05GCACAGTAATTCGCCAGTAGGCTGCCATTGACCTGATAGA
BES_076K18TAGTTATTCTACGCAGTTCAGGGGAGGTCTATGTCCAGCGG
BET_077P06ATTTTTATCCGACACCCTTATTCCCGCCAAAAAGTCATAC
BES_077P06GCGCATTTACGATGTAGATGCAATGTATGTTCCGCTGTGT
BET_094B01CTTAACGCAATTCGTCGGTAGGAAAGGTCACCTACGAATG
BES_094B01AAGCAACTCTTTTACGGGTCATTAGATAAATGAAGGCCGG

Sequence assembly and annotation of chorion genes

Low-quality bases (QV<20) were removed using Phred[13]. After vector sequences were trimmed with cross_match, all paired-end reads were assembled with Phrap 1.08081222[14] and Consed 16.0[15]. Both programs allow the positions of mis-assembled clone sequences to be adjusted according to clone (insert) size. The small gap remaining in the assembled sequence was filled by primer walking. Chorion genes were predicted with fgenesh[16].
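To make the QV<20 threshold concrete, the following is a minimal sketch of end trimming on a single read. It is not the algorithm used by Phred or cross_match; it simply drops leading and trailing bases whose quality value is below the threshold, and the function names are hypothetical.

```python
def phred_error_probability(qv):
    """Convert a Phred quality value to an error probability: P = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def trim_low_quality_ends(bases, quals, min_qv=20):
    """Trim bases from both ends of a read until a base with QV >= min_qv is met.

    Illustrative simplification: bases below the threshold inside the
    retained region are kept, unlike a full quality-clipping algorithm.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    keep = [i for i, q in enumerate(quals) if q >= min_qv]
    if not keep:
        return "", []
    start, end = keep[0], keep[-1] + 1
    return bases[start:end], list(quals[start:end])


trimmed, trimmed_quals = trim_low_quality_ends(
    "ACGTACGT", [5, 12, 25, 30, 31, 22, 15, 8]
)
print(trimmed, trimmed_quals)
print(phred_error_probability(20))
```

A quality value of 20 corresponds to an estimated error probability of 1 in 100, so the threshold keeps base calls that are expected to be at least 99% accurate.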

Data Records

Data record 1

The complete sequence of the chorion locus appears under DDBJ AB999997 (Data Citation 1).

Technical Validation

Probe selection and construction of BAC contig

Previous reports revealed that the chorion locus is composed of three types of clusters containing early, middle and late chorion genes[3]. We therefore selected representative ESTs for each of the three types of chorion genes to screen the BAC library (Table 1). Eight of the ten probes were identified and oriented in the published genome of B. mori[7], and both end sequences of the BACs were used to confirm BAC orientation. Primers based on the BAC end sequences were used to confirm the orientation and position of the BACs in the chorion locus by PCR (Supplementary Fig. 1; see Table 2 for primer sequences). The PCR experiment showed that the target BACs were sequentially connected by overlaps and covered the whole chorion locus, except for a small gap region. We obtained the sequence of this gap region, between BACs 076K18 and 077P06, from Bm_scaf166 in the silkworm genome sequence. Together, these strategies enabled us to establish a complete BAC contig covering the chorion locus.

Sequencing and assembly

In a first attempt to obtain the complete sequence of the chorion locus, we used the Ion PGM™, a second-generation sequencing platform characterized by low cost, high throughput and read lengths of up to 289 bp. Unfortunately, the highly repetitive DNA sequences prevented assembly of the individual BACs despite 150-fold coverage. We therefore constructed 2 and 5 kb shotgun libraries for each BAC and sequenced them using the Sanger method. This generated reads of up to 500 bp, which were able to cover major exons of chorion genes, on the order of 500–800 bp. About 2,400 reads were generated for each BAC, which covered the chorion locus 10-fold. The positions of the BACs in the complete chorion locus are shown in Table 3.
Table 3

BACs and their position in the complete chorion locus

BAC       Position in chorion locus (nt)
081P21    1–169,292
018E13    165,668–313,154
503L05    279,542–438,320
076K18    326,219–505,371
077P06    515,752–687,226
049B01    592,429–801,223
544H24    669,876–871,711
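The coordinates in Table 3 can be used to check how neighboring BACs join. The following is a minimal sketch that sorts the BACs by start position and reports, for each adjacent pair, the length of the overlap or of the uncovered gap; the function name and tuple layout are illustrative.

```python
# (BAC name, start, end) in chorion locus coordinates, taken from Table 3.
BACS = [
    ("081P21", 1, 169292),
    ("018E13", 165668, 313154),
    ("503L05", 279542, 438320),
    ("076K18", 326219, 505371),
    ("077P06", 515752, 687226),
    ("049B01", 592429, 801223),
    ("544H24", 669876, 871711),
]


def junctions(bacs):
    """Return (left, right, kind, length) for each pair of neighboring BACs.

    kind is "overlap" when the two intervals share bases and "gap" when
    bases between them are covered by neither BAC. Coordinates are
    1-based and inclusive.
    """
    ordered = sorted(bacs, key=lambda bac: bac[1])
    results = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        shared = left_end - right_start + 1  # positive means overlap
        if shared > 0:
            results.append((left, right, "overlap", shared))
        else:
            results.append((left, right, "gap", -shared))
    return results


for left, right, kind, length in junctions(BACS):
    print(f"{left} / {right}: {kind} of {length:,} nt")
```

With the Table 3 coordinates, every pair overlaps except 076K18 and 077P06, which are separated by 10,380 nt. This is the gap region whose sequence was taken from Bm_scaf166.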

Annotation of chorion genes

The two EST libraries, constructed from day 4 pupal ovaries and day 8 pupal follicular cells, contained ESTs of all known chorion genes. Aligning these ESTs to the chorion locus further confirmed the existence of the predicted chorion genes.

Usage Notes

The complete sequence of the chorion locus described here can be downloaded from DDBJ under accession AB999997. This data descriptor presents a strategy for obtaining precise sequence information for an extended region (>0.8 Mb) of a highly repetitive genome. The complete sequence of the chorion locus and the detailed gene annotation data are provided so that users can study the developmental regulation of gene expression using the silkmoth chorion gene model.

Additional Information

How to cite this article: Chen, Z. et al. Construction, complete sequence, and annotation of a BAC contig covering the silkworm chorion locus. Sci. Data 2:150062 doi: 10.1038/sdata.2015.62 (2015).
References (16 in total)

1.  Organization of the Chorion Genes of BOMBYX MORI, a Multigene Family. II. Partial Localization of Three Gene Clusters.

Authors:  M R Goldsmith; E Clermont-Rattner
Journal:  Genetics       Date:  1979-08       Impact factor: 4.562

2.  Assembling genomic DNA sequences with PHRAP.

Authors:  Melissa de la Bastide; W Richard McCombie
Journal:  Curr Protoc Bioinformatics       Date:  2007-03

3.  Base-calling of automated sequencer traces using phred. II. Error probabilities.

Authors:  B Ewing; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

4.  Consed: a graphical tool for sequence finishing.

Authors:  D Gordon; C Abajian; P Green
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

5.  Organization of the chorion genes of Bombyx mori, a multigene family. I. Evidence for linkage to chromosome 2.

Authors:  M R Goldsmith; G Basehoar
Journal:  Genetics       Date:  1978-10       Impact factor: 4.562

6.  Organization of the chorion genes of Bombyx mori, a multigene family. III. Detailed marker composition of three gene clusters.

Authors:  M R Goldsmith; E Clermont-Rattner
Journal:  Genetics       Date:  1980-09       Impact factor: 4.562

7.  A stable and efficient transformation system for Butyrivibrio fibrisolvens OB156.

Authors:  C E Beard; M A Hefford; R J Forster; S Sontakke; R M Teather; K Gregg
Journal:  Curr Microbiol       Date:  1995-02       Impact factor: 2.188

8.  Evolution of the silk moth chorion gene superfamily: gene families CA and CB.

Authors:  R Lecanidou; G C Rodakis; T H Eickbush; F C Kafatos
Journal:  Proc Natl Acad Sci U S A       Date:  1986-09       Impact factor: 11.205

9.  The genome of a lepidopteran model insect, the silkworm Bombyx mori.

Authors: 
Journal:  Insect Biochem Mol Biol       Date:  2008-12-16       Impact factor: 4.714

10.  A comprehensive analysis of the chorion locus in silkmoth.

Authors:  Zhiwei Chen; Junko Nohata; Huizhen Guo; Shenglong Li; Jianqiu Liu; Youbing Guo; Kimiko Yamamoto; Keiko Kadono-Okuda; Chun Liu; Kallare P Arunkumar; Javaregowda Nagaraju; Yan Zhang; Shiping Liu; Vassiliki Labropoulou; Luc Swevers; Panagiota Tsitoura; Kostas Iatrou; Karumathil P Gopinathan; Marian R Goldsmith; Qingyou Xia; Kazuei Mita
Journal:  Sci Rep       Date:  2015-11-10       Impact factor: 4.379
