Taishi Umezawa, Tetsuya Sakurai, Yasushi Totoki, Atsushi Toyoda, Motoaki Seki, Atsushi Ishiwata, Kenji Akiyama, Atsushi Kurotani, Takuhiro Yoshida, Keiichi Mochida, Mie Kasuga, Daisuke Todaka, Kyonoshin Maruyama, Kazuo Nakashima, Akiko Enju, Saho Mizukado, Selina Ahmed, Kyoko Yoshiwara, Kyuya Harada, Yasutaka Tsubokura, Masaki Hayashi, Shusei Sato, Toyoaki Anai, Masao Ishimoto, Hideyuki Funatsuki, Masayoshi Teraishi, Mitsuru Osaki, Takuro Shinano, Ryo Akashi, Yoshiyuki Sakaki, Kazuko Yamaguchi-Shinozaki, Kazuo Shinozaki.
Abstract
A large collection of full-length cDNAs is essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. We obtained a total of 39 936 soybean cDNA clones (GMFL01 and GMFL02 clone sets) from a full-length-enriched cDNA library constructed from soybean plants grown under various developmental and environmental conditions. Sequencing from the 5′ and 3′ ends of the clones generated 68 661 expressed sequence tags (ESTs). The EST sequences were clustered into 22 674 scaffolds, which included 2580 full-length sequences. In addition, we sequenced 4712 full-length cDNAs. After removing overlaps, we obtained 6570 new full-length soybean cDNA sequences. Our data indicated that 87.7% of the soybean cDNA clones contain complete coding sequences in addition to 5′- and 3′-untranslated regions. All of the obtained data confirmed that our collection of soybean full-length cDNAs covers a wide variety of genes. Comparative analysis of the derived soybean sequences against data from Arabidopsis, rice or other legumes revealed that our collection includes some soybean-specific genes, a large part of which were annotated as having unknown function. The large set of soybean full-length cDNA clones reported in this study will serve as a useful resource for gene discovery in soybean and will also aid precise annotation of the soybean genome.
Year: 2008 PMID: 18927222 PMCID: PMC2608845 DOI: 10.1093/dnares/dsn024
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Collection of RNA samples for constructing a soybean full-length cDNA library
| No. | Sample name | Treatments or strains | Condition 1 | Condition 2 | Tissues | Place (a) |
|---|---|---|---|---|---|---|
| 1 | Drought stress | Removal of media | Hydroponics | Green house | Whole plants | Ibaraki |
| 2 | Salt stress | 100 mM NaCl | Hydroponics | Green house | Whole plants | Ibaraki |
| 3 | Chilling stress | 4°C | Hydroponics | Green house | Whole plants | Ibaraki |
| 4 | Low temperature | 15°C | Pot | Green house | Whole plants | Hokkaido1 |
| 5 | Pi starvation | Nutrient solution without Pi | Hydroponics | Green house | Whole plants | Hokkaido2 |
| 6 | Flooding | Imbibition | Hydroponics | Green house | Whole plants | Hokkaido1 |
| 7 | SCN infected roots | | Soil | Field | Roots | Hokkaido1 |
| 8 | Flower buds | Normal condition | Soil | Field | Flower buds | Chiba |
| 9 | Roots and nodules | | Vermiculite and soil | Field | Roots | Chiba |
| 10 | Developing seeds | Normal condition | Soil | Field | Seeds | Saga |
We prepared RNA samples from soybean plants provided by several laboratories in Japan. The regions, growth conditions and treatments varied widely, as shown here. Additional information is given in Materials and Methods.
(a) The latitude and longitude of each place are as follows: Ibaraki, 36°0′N, 140°1′E; Hokkaido1, 43°1′N, 141°9′E; Hokkaido2, 43°3′N, 141°2′E; Chiba, 35°8′N, 139°9′E; and Saga, 33°2′N, 130°3′E.
Figure 1. Scheme for construction and data processing of a soybean full-length-enriched cDNA library. We constructed a soybean full-length-enriched cDNA library using a biotinylated CAP-trapper method from multiple sources of soybean plants grown under various conditions (Table 1). A total of 39 936 cDNA clones (GMFL01 and GMFL02 clone sets) were sequenced from both ends, yielding 68 661 sequences. These sequences were deposited in DDBJ. The sequences were clustered into 22 674 scaffolds. We subsequently selected 4712 clones for full-length cDNA sequencing and deposited the resulting sequences in DDBJ.
Summary of soybean cDNA sequences for assembly and clustering
| Groups | Records |
|---|---|
| Number of initial clones | 39 936 |
| Number of available clones | 37 834 |
| Number of good sequences | 68 661 |
| 5′-EST sequences | 36 512 |
| 3′-EST sequences | 32 149 |
| Average trimmed EST length (bp) | 526.2 |
| Clones with 5′- and 3′-sequences | 30 827 |
| Contigs | 11 036 |
| Average contig length (bp) | 697.4 |
| Singletons | 15 255 |
| Scaffolds | 22 674 |
| Max. scaffold size (no. of EST) | 199 |
| Average scaffold size (no. of EST) | 1.7 |
| Distinct genes | 13 526 |
| Putative splicing variants | 4325 |
| Full-length sequenced clones | 4712 |
| Full-read clones in EST sequences | 2580 |
| Total full-length cDNA sequences | 6570 |
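The counts in this summary can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the variable names are illustrative, and reading the final row as the two full-length sets combined with overlaps removed is an assumption based on the abstract.

```python
# Counts copied from the summary table above.
five_prime_ests = 36_512
three_prime_ests = 32_149
good_sequences = 68_661
full_length_sequenced = 4_712
full_read_in_ests = 2_580
total_full_length = 6_570

# The 5' and 3' ESTs should add up to the number of good sequences.
est_sum = five_prime_ests + three_prime_ests

# Assumption: the reported total is the union of the two full-length sets,
# so the overlap removed equals the sum of both sets minus that total.
implied_overlap = full_length_sequenced + full_read_in_ests - total_full_length

print(est_sum == good_sequences)  # True
print(implied_overlap)            # 722
```

Under that assumption, 722 full-length sequences would be shared between the fully sequenced clones and the full-read clones found among the ESTs.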
Figure 2. Distribution of the number of soybean cDNA clones in each cluster of the sequence assembly. We derived 68 661 sequences from 39 936 soybean cDNA clones and clustered them into 22 674 scaffolds. Sequence assembly performed with CAP3 reveals a wide distribution in the number of clones per scaffold.
Figure 3. Length distributions of soybean cDNA inserts and ORFs. The sequence lengths of soybean cDNA inserts (A) and their ORFs (B) were obtained from a total of 4712 full-length soybean cDNA sequences. The definitions and calculation methods used are described in the Materials and methods section.
Figure 4. Functional annotation of soybean genes. The 22 674 scaffolds (A) and 4712 full-length sequences (B) of soybean cDNAs were classified into functional groups by the KOG database.[44] The colors of each functional group are indicated in the table. Graphs are outlined with multi-color frames which represent four subcategories: ‘information storage and processing’ (light red), ‘cellular processing and signaling’ (bright yellow), ‘metabolism’ (greenish brown) and ‘poorly characterized’ (pink).
Comparative analysis of cDNAs between soybean and other plants
| Species | No. of records | Hit | No hit | No hit among species | No hit in all searches |
|---|---|---|---|---|---|
| Ath | 31 921 | 21 047 | 1627 | | |
| Osa | 40 041 | 19 969 | 2705 | 1194 | |
| Ptr | 45 555 | 21 277 | 1397 | | 1085 |
| Lja | 148 457 | 13 987 | 8687 | 5789 | |
| Mtr | 232 299 | 14 798 | 7876 | | |
Soybean full-length cDNA sequences from 22 674 scaffolds were submitted to a BLASTX search (e-value < 1e−5) against data sets of Arabidopsis thaliana (Ath), rice (Osa) or poplar (Ptr), or to a BLASTN search (e-value < 1e−30) against data sets of L. japonicus (Lja) or M. truncatula (Mtr). All sequence data were obtained from public databases. The URLs are http://www.arabidopsis.org/ (Ath: 31 921 records), http://rapdb.lab.nig.ac.jp/ (Osa: 40 041 records), http://genome.jgi-psf.org/Poptr1_1/ (Ptr: 45 555 records) and http://www.ncbi.nlm.nih.gov/ (Lja: 148 457 records and Mtr: 232 299 records).
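The hit and no-hit columns above amount to thresholding each search by e-value and then intersecting the no-hit sets. The sketch below illustrates that bookkeeping only. It assumes BLAST tabular output (query ID in column 1, e-value in column 11), which the source does not specify, and all scaffold and subject IDs are invented toy values.

```python
from typing import Iterable, Set


def queries_with_hit(tabular_lines: Iterable[str], max_evalue: float) -> Set[str]:
    """Return query IDs with at least one hit whose e-value is below max_evalue.

    Assumes tab-separated BLAST tabular output: query ID in column 1,
    e-value in column 11.
    """
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) < max_evalue:
            hits.add(fields[0])
    return hits


def no_hit_in_all(all_queries: Set[str], hit_sets: Iterable[Set[str]]) -> Set[str]:
    """Return the queries that had no hit in any of the searches."""
    remaining = set(all_queries)
    for hits in hit_sets:
        remaining -= hits
    return remaining


# Toy data with made-up scaffold and subject identifiers.
scaffolds = {"s1", "s2", "s3"}
blastx_ath = [
    "s1\tAT1G01010.1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-40\t150",
    "s2\tAT2G02020.1\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t30",
]
blastn_lja = [
    "s2\tLj_0001\t95.0\t400\t20\t0\t1\t400\t1\t400\t1e-120\t600",
]

ath_hits = queries_with_hit(blastx_ath, 1e-5)    # BLASTX threshold from the caption
lja_hits = queries_with_hit(blastn_lja, 1e-30)   # BLASTN threshold from the caption
unmatched = no_hit_in_all(scaffolds, [ath_hits, lja_hits])
print(sorted(unmatched))  # ['s3']
```

In the toy data, `s2` fails the BLASTX threshold but passes the BLASTN one, so only `s3` remains unmatched in every search.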
List of data sets for comparative analyses with other plants
| Data set | Source |
|---|---|
| Arabidopsis | TAIR7 release (a) |
| Rice | RAP1 based on the IRGSP sequence build 3 (b) |
| Poplar | JGI Populus trichocarpa ver1.1 (c) |
| | Collected in NCBI (GenBank) as of July 2007 and cleaned from contamination of vector and |
| Non-redundant proteins | NCBI-nr, released 28 May 2007 |
| Orthologous groups of proteins for eukaryotic | NCBI-KOGs, released 3 March 2003 |
The versions and dates of the data sets are listed for Arabidopsis, rice, poplar, L. japonicus, M. truncatula, soybean, non-redundant protein sequences and KOGs. Data sets were obtained from the public databases indicated by the footnote letters.
(a) http://www.arabidopsis.org/.
(b) http://rapdb.lab.nig.ac.jp/.
(c) http://genome.jgi-psf.org/Poptr1_1/.
(d) http://www.ncbi.nlm.nih.gov/.
Putative functions of soybean cDNAs with no homologs in the other plant data sets
| InterPro Name | InterPro ID | No. of genes |
|---|---|---|
| Nodulin | IPR003387 | 24 |
| Proteinase inhibitor I4, serpin | IPR000215 | 5 |
| Albumin I | IPR012512 | 4 |
| Peptidase M, neutral zinc metallopeptidases, zinc-binding site | IPR006025 | 3 |
| Plant lipid transfer/seed storage/trypsin-alpha amylase inhibitor | IPR003612 | 3 |
| Aldo/keto reductase | IPR001395 | 3 |
| Ankyrin | IPR002110 | 2 |
| Peptidase S1 and S6, chymotrypsin/Hap | IPR001254 | 2 |
| ATP-dependent DNA ligase | IPR000977 | 2 |
| Glycine rich | IPR010800 | 1 |
| Protein of unknown function DUF581 | IPR007650 | 1 |
| Zinc finger, C2H2-type | IPR007087 | 1 |
| Late embryogenesis abundant protein 3 | IPR004926 | 1 |
| Aminotransferases class-I pyridoxal-phosphate-binding site | IPR004838 | 1 |
| Immunoglobulin/major histocompatibility complex | IPR003006 | 1 |
| Phosphotransferase system, HPr serine phosphorylation site | IPR002114 | 1 |
| Aldehyde dehydrogenase | IPR002086 | 1 |
| C-5 cytosine-specific DNA methylase | IPR001525 | 1 |
| Annexin | IPR001464 | 1 |
| Lipoxygenase | IPR000907 | 1 |
| Endoplasmic reticulum targeting sequence | IPR000886 | 1 |
| GPCR, family 2, secretin-like | IPR000832 | 1 |
| Oxidoreductase, molybdopterin binding | IPR000572 | 1 |
| Glutelin | IPR000480 | 1 |
| ATPase, F1/V1/A1 complex, alpha/beta subunit, nucleotide-binding | IPR000194 | 1 |
| Glyceraldehyde 3-phosphate dehydrogenase | IPR000173 | 1 |
| Peptidase, cysteine peptidase active site | IPR000169 | 1 |
Figure 5. Length distributions of 5′- and 3′-UTR sequences of soybean cDNA clones. The 5′-UTR (A) and 3′-UTR sequences (B) were derived from 68 661 soybean EST sequences. The definitions and calculation methods are described in the Materials and methods section.
Figure 6. The web-based interface to the soybean full-length cDNA database. Queries can be submitted as nucleotide or peptide sequences, or as keywords for gene functions or annotations. The website includes a full set of BLAST programs for searching five nucleotide data sets: G. max cDNA (RIKEN), G. max mRNA (GenBank), G. max (UniGene), L. japonicus (UniGene) and M. truncatula (UniGene); and four peptide data sets: UniProt-TrEMBL plants, Arabidopsis thaliana (TAIR), Oryza sativa (RAP-DB) and Populus trichocarpa (JGI). These tools can be accessed at the following URL: http://rsoy.psc.riken.jp/