GABI-Kat SimpleSearch: an Arabidopsis thaliana T-DNA mutant database with detailed information for confirmed insertions.

Yong Li, Mario G Rosso, Prisca Viehoever, Bernd Weisshaar.

Abstract

Insertional mutagenesis approaches, especially by T-DNA, play important roles in gene function studies of the model plant Arabidopsis thaliana. GABI-Kat SimpleSearch (http://www.GABI-Kat.de) is a Flanking Sequence Tag (FST)-based database for T-DNA insertion mutants generated by the GABI-Kat project. Currently, the database contains >108,000 mapped FSTs from approximately 64,000 lines, which cover 64% of all annotated A.thaliana protein-coding genes. The web interface allows searching for relevant insertions by gene code, keyword, line identifier, GenBank accession number of the FST, and also by BLAST. A graphic display of the genome region around the gene or the FST helps users select insertion lines of interest. About 3500 insertions were confirmed in the offspring of the plant from which the original FST was generated, and the seeds of these lines are available from the Nottingham Arabidopsis Stock Centre. The database now also contains additional information such as segregation data, gene-specific primers and confirmation sequences. This information not only helps users to evaluate the usefulness of the mutant lines, but also covers a large part of the molecular characterization of the insertion alleles.

Year:  2006        PMID: 17062622      PMCID: PMC1781121          DOI: 10.1093/nar/gkl753

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Insertional mutagenesis approaches using transposable elements or Agrobacterium T-DNA play important roles in plant functional genomics (1–4). The use of T-DNA as an insertional mutagen has several advantages (3,4). T-DNA integration results in stable mutations in the genome, as opposed to transposons, which often excise after integration. Also, the low number of insertions per transformant significantly reduces the additional work required to remove unwanted mutations. The development of a simple in planta Agrobacterium transformation method for Arabidopsis thaliana allowed the high-throughput production of T-DNA insertion mutants in this model plant (5,6). A number of large T-DNA mutagenized A.thaliana populations have been generated that are intensively used for gene function studies and other reverse genetics experiments (7–11). Initially, these populations were screened by PCR for the desired insertion mutants in the gene of interest. The PCR was performed with a gene-specific primer and a T-DNA-specific primer on DNA templates from large pools of mutant plants. The identification of a specific PCR product indicated the existence of a T-DNA insertion in the gene of interest, and the respective mutant was then identified by pool deconvolution. An alternative strategy, which gained popularity in recent years, was to amplify the DNA fragments flanking the T-DNA insertion sites of individual plants with special PCR-based methods, and to sequence-characterize the various insertion alleles. When the whole genome sequence of the species is available, the resulting Flanking Sequence Tags (FSTs) can be mapped to the pseudo-chromosomes and indexed in databases. As a result, the screening for a mutant insertion allele of a given gene is simplified to a search for the corresponding FST in the database. Obviously, the number of insertion mutants and FSTs in the database should be large enough to cover most of the genes (3).
The aim of the GABI-Kat project was to build a large T-DNA mutagenized A.thaliana population with sequence-indexed insertion sites (11). In June 2002, GABI-Kat SimpleSearch, the web interface of the database containing data about the GABI-Kat population, was opened (12). Since then, GABI-Kat lines have been ordered by many scientists from all around the world, and thousands of confirmed insertion mutants have been delivered to the scientific community. This shows that the GABI-Kat population has become one of the major reverse genetics resources for A.thaliana functional genomics. Since we described the basic FST analysis pipeline (12), we have constantly improved the database as well as the interface, and incorporated several major updates. This took place in addition to the regular increase in the amount of FST data held in the database. Here, we summarize and describe the improvements, updates and new features of the SimpleSearch tool, including the recent addition of detailed information on insertion sites from allele-specific sequence data and genetic segregation data from 3509 (number as of June 2006) lines with confirmed insertions.

DATABASE CONTENT

FSTs and annotation

The FST production pipeline and the annotation procedure were described in detail elsewhere (12,13). In brief, the genome sequences flanking the T-DNA insertion site (FSTs) were obtained by an adaptor-ligation PCR method which was adapted to high-throughput conditions (13). Each FST was mapped to the A.thaliana genome by BLAST (14), and annotated with the information of the corresponding BAC clone, and with the AGI gene code when the insertion qualifies as a ‘gene hit’ (12). We define ‘gene hit’ as the insertion site being located between 300 bp upstream of the ATG and 300 bp downstream of stop codon of a gene, and ‘CDSi hit’ as the insertion site being located between ATG and stop codon (CDS plus introns). A.thaliana genome sequence and annotation data from the TIGR version 5 dataset were used as the basis for mapping and annotation (15). Table 1 gives a summary of the data in the SimpleSearch database as of the current release (GK release 21 of June 25, 2006). There are >108 700 FSTs, and they were from >63 800 lines. Based on the TIGR v5 annotation, >16 900 genes (64% of all annotated A.thaliana protein-coding genes excluding pseudogenes) were covered by at least one ‘gene hit’ and >12 200 genes (46%) were covered by at least one ‘CDSi hit’.
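The two hit definitions above can be sketched in a few lines of Python. This is only an illustration, not the GABI-Kat annotation pipeline: the `Gene` record, its field names, the coordinate convention and the inclusive boundaries are all assumptions made for the example. Only the 300 bp window and the two category names come from the text.

```python
from dataclasses import dataclass

# Window from the text: 300 bp upstream of the ATG and 300 bp downstream of the stop codon.
FLANK_BP = 300


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are positions on a pseudo-chromosome."""
    agi_code: str
    atg: int   # position of the start codon
    stop: int  # position of the stop codon (smaller than atg for minus-strand genes)


def classify_insertion(gene: Gene, pos: int, flank: int = FLANK_BP) -> tuple[bool, bool]:
    """Return (is_gene_hit, is_cdsi_hit) for an insertion at position pos.

    Because the window is symmetric, sorting the two coordinates handles
    genes on either strand. Boundaries are treated as inclusive (an assumption).
    """
    lo, hi = sorted((gene.atg, gene.stop))
    is_cdsi_hit = lo <= pos <= hi                   # between ATG and stop: CDS plus introns
    is_gene_hit = lo - flank <= pos <= hi + flank   # CDSi region plus 300 bp on each side
    return is_gene_hit, is_cdsi_hit
```

By these definitions every 'CDSi hit' is also a 'gene hit', which is consistent with the gene coverage figure being larger than the CDSi coverage figure.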
Table 1. Summary of data in the GABI-Kat SimpleSearch database (a)

Data type                          Number of entries
FSTs                               ∼108 700
Lines                              ∼63 800
    Lines with segregation data    ∼6 000
    Lines available in NASC        3 509
Distinct genes covered             16 939
Distinct CDSi covered              12 239
Confirmation sequences             8 840

(a) Numbers are from GK release 21 as of 25 June 2006.

Data concerning confirmed insertions

One problem associated with the T-DNA mutagenized population is that not all the insertions deduced from FSTs that were obtained from T1 plants can be confirmed in the next (T2) generation. The T1 generation refers to plants that were selected for resistance conferred by the inserted T-DNA. Both SAIL (Syngenta population) and GABI-Kat reported a confirmation rate of ∼76% (9,11). At present, the confirmation rate at GABI-Kat is 78%. This number is derived from ∼6000 lines for which confirmation was attempted during the process of in-house confirmation to make sure that only confirmed insertion lines were delivered to users. The confirmation process consists of segregation analysis, genomic DNA extraction from T2 plantlets, PCR using a T-DNA primer and a gene-specific primer designed to fit the insertion site predicted on the basis of the original FST annotation, and sequencing of this allele-specific PCR product if the PCR was successful (11). For segregation analysis, we record the number of seeds plated, the number of seeds germinated and the number of resistant seedlings. This information is very useful as it gives an approximate number of T-DNA loci of the specified line after statistical evaluation according to Mendel's laws. In addition, distorted segregation patterns often indicate that the line contains a mutation in a gene important in pollen or female gametophyte development (16,17). We have stored this information for lines for which confirmation has been attempted in the database, and in total there are 6000 lines for which segregation data are available (Table 1). The database also contains information on ∼20% failed confirmation attempts (search for the FST with GenBank accession No. ‘CR405437’ to see an example). The products resulting from a successful allele-specific PCR were sequenced from both directions, so usually two confirmation sequences were obtained for a confirmed insertion. 
Currently, there are 8840 confirmation sequences in the database, and they are stored together with the primer information so that the allele-specific PCR can be reproduced. Since July 2005, T3 seeds (seeds produced by T2 plants) of confirmed GABI-Kat lines have been transferred to the Nottingham Arabidopsis Stock Centre (NASC). This allows direct user access to potential homozygous mutant materials, and the SimpleSearch database provides the molecular and genetic background information for these lines.
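As an illustration of how the recorded segregation numbers can point to an approximate number of T-DNA loci, the following Python sketch tests the resistant:sensitive counts against the Mendelian ratios expected for one, two or three unlinked loci (3:1, 15:1, 63:1) using a chi-square goodness-of-fit test. It is a sketch under stated assumptions, not the statistical procedure used by GABI-Kat: the function names, the 5% significance level and the limit of three loci are chosen for the example, and the offspring are assumed to come from a self-pollinated parent hemizygous at each locus with a dominant resistance marker.

```python
# Chi-square critical value for 1 degree of freedom at alpha = 0.05 (assumed threshold).
CHI2_CRIT_1DF_P05 = 3.841


def chi_square(resistant: int, germinated: int, n_loci: int) -> float:
    """Chi-square statistic against the Mendelian ratio for n_loci unlinked loci."""
    if germinated <= 0 or not 0 <= resistant <= germinated:
        raise ValueError("need 0 <= resistant <= germinated and germinated > 0")
    p_resistant = 1 - 0.25 ** n_loci          # 3:1, 15:1, 63:1, ...
    sensitive = germinated - resistant
    exp_res = germinated * p_resistant
    exp_sen = germinated * (1 - p_resistant)
    return (resistant - exp_res) ** 2 / exp_res + (sensitive - exp_sen) ** 2 / exp_sen


def compatible_locus_counts(resistant: int, germinated: int, max_loci: int = 3) -> list[int]:
    """Locus counts whose expected ratio is not rejected at the 5% level."""
    return [n for n in range(1, max_loci + 1)
            if chi_square(resistant, germinated, n) <= CHI2_CRIT_1DF_P05]
```

With 100 germinated seeds, 75 resistant seedlings are compatible with a single locus and 94 with two loci, whereas 50 resistant seedlings fit none of the tested ratios, the kind of distorted pattern mentioned above. For small samples the expected number of sensitive seedlings becomes very low at higher locus counts, so the chi-square approximation should then be treated with caution.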

WEB INTERFACE

In addition to the search for insertion alleles by the AGI gene (or locus) code or keyword and the sequence-based search by BLAST, we recently introduced the feature of searching by line ID or GenBank accession number. This gives more thorough access to the data in the database. Searching by line ID or GenBank accession number leads directly to the FST data display page (Figure 1). At the top of the FST page is information for the line from which the FST was generated. This includes the vector used to transform the line, with a link to the GenBank entry of the vector at NCBI, information about line availability and, when available, segregation data. The ‘line availability’ field tells if a given line is dead (e.g. because no viable seeds were produced by the T1 plant), or if it is available from GABI-Kat or NASC. When a line is available from NASC, the NASC code is given and linked to the NASC seed stock detail page of the line, so that the user can order it easily. The segregation data are presented as three numbers: total number of seeds evaluated, number of seeds germinated and number of resistant seedlings. Since more than one FST was produced for many T1 plants in the population, the line information page includes data for one or more FST-derived (predicted) insertion sites or loci. For each displayed insertion, the FASTA format sequence of the original FST, the respective GenBank accession number (which is linked to the GenBank entry at NCBI), a link to the graphic locus view (Figure 2), the information about the location of the T-DNA border/plant DNA junction if detected, the BAC clone code and the confirmation status are shown. If the insertion site qualifies as a gene hit, links to the respective TIGR, TAIR (18), MIPS MAtDB (19) and SIGnAL (10) gene pages are provided. For confirmed lines available in NASC, a link is provided to a page showing all the allele-specific confirmation sequences derived from the respective insertion site.
Unlike the original FST sequences, which are of varied quality, the confirmation sequences are generally of high quality and length. The sequences representing the T-DNA were not trimmed from the confirmation sequences. Therefore, the user has access to the exact T-DNA border/plant genome junction structures of the insertion sites.
Figure 1

Screenshot of the FST page that results from a search for GenBank accession number AL936383. The data are from line 048E04. The meaning of the three numbers describing the result of the genetic segregation data is found by following the link on the numbers. The upper part shows the line-specific information followed by the FST DNA sequence data in the lower part of the page. For the chosen line, data on additional insertions other than the one displayed are available. Therefore, a link to information about all the insertions from the line is given at the bottom.

Figure 2

Screenshot of the graphical view page of gene At4g23270. All annotated genes and FSTs in the genome region centered on the locus At4g23270 are presented in an image map. Mouse-over text on the FSTs and the genes tells the line ID and the gene annotation text, respectively. Clicking on the FST icon leads to the respective FST page.

For all the different kinds of searches, results are organized around the FST page. When the FST page is reached from a search by GenBank accession number or from the BLAST search result page, only the corresponding FST is displayed. If there is more than one insertion from the same line, a link to a page showing all the insertions of the line is given. When the FST page is reached from a search by line ID, all the insertions of the line are shown, and for each insertion the best FST is displayed.

AVAILABILITY

The GABI-Kat SimpleSearch database is freely available at http://www.GABI-Kat.de and can be queried through the web interface described above. The FST data were submitted to EMBL/GenBank/DDBJ as genomic survey sequences, and can also be downloaded in a flat file format from our web site. Finally, users can find our FST data at external A.thaliana web sites such as MIPS MAtDB (19), FLAGdb++ (20) and SIGnAL (10). However, it is worth noting that minor differences do exist in the FST annotation on these sites when compared with SimpleSearch, as they do not necessarily use an identical annotation procedure. Also, the external databases that rely on the FST data in GenBank may indicate the existence of an insertion in a given gene even though the respective line is dead, which means that the insertion allele described by the FST no longer exists.

DISCUSSION

FST-based insertion mutant databases exist for various species for which insertional mutagenesis could be carried out on a large scale. For A.thaliana, most of the insertion mutant databases were initially developed for holding the data generated in their own projects. While they are quite similar in the FST annotation and query interface, each database has its own unique features. Like GABI-Kat SimpleSearch, FLAGdb initially was a project database of mapped FSTs from the Versailles T-DNA population (8). The FST data of FLAGdb have since been integrated into FLAGdb++, a database with a JAVA front-end integrating different high-throughput functional genomics data for A.thaliana (20). ATIDB (Arabidopsis thaliana insertion database) is a federated database for archiving FSTs of transposon or T-DNA insertions for A.thaliana from different sources including SIGnAL, SAIL, GABI-Kat, FLAGdb and the John Innes Centre (21). Besides the various search options, ATIDB provides analysis tools to study the insertion distribution at a genome scale. The T-DNA Express at SIGnAL is currently the most comprehensive database of mapped FSTs for A.thaliana (10). It integrates not only FSTs from the sources collected by ATIDB mentioned above, but also FST data from the Wisconsin T-DNA population (7) and the RIKEN transposon population (22). So far, >360 000 insertion sites have been stored in T-DNA Express, and they cover ∼90% of all A.thaliana genes. The search interface at T-DNA Express provides different ways to access the insertion data. A useful tool at this site is the ‘iSect toolbox’. Among the many functions this tool offers, a very important one is the design of genomic primers to confirm the T-DNA insertions. Although the GABI-Kat SimpleSearch database concentrates on data generated from GABI-Kat, it allows searches and result presentations comparable to these large and complex databases.
What distinguishes SimpleSearch from other databases is that it includes detailed information on single GABI-Kat lines, such as segregation data, line availability information, insertion site-specific primers and confirmation sequences for >3500 confirmed insertion sites. These data show the exact T-DNA border/plant genome junction structures. This detailed information not only helps users to evaluate the usefulness of the mutant lines, but also covers a large part of the molecular characterization of the insertions. The transfer of confirmed lines to NASC is an ongoing process at GABI-Kat. In addition to confirming lines that are requested by users, we have started to confirm a number of key GABI-Kat lines. These key lines were identified by analysing the results from the SIGnAL, SAIL, Wisconsin and GABI-Kat FST datasets for unique T-DNA insertion sites in the A.thaliana accession Columbia at the level of insertion alleles predicted to cause a knock-out of the respective gene. The seeds of these confirmed lines will also become available from NASC, and this will contribute significantly to the goal of finding at least one mutant for every A.thaliana gene. Results from the genetic and molecular insertion confirmation pipeline will constantly be added to SimpleSearch. A few developments to further strengthen the SimpleSearch database are planned for the future. We are tracking published papers which made use of GABI-Kat mutant lines, and these references are currently listed on our web site on an HTML page. In most of these publications, it was clearly mentioned which GABI-Kat line was used. We plan to store the references to these publications in the database and link them to the respective lines for display on the FST page. While GABI-Kat SimpleSearch will continue to function as a project database, we are also exploring the possibility of better interoperability with other A.thaliana databases by means such as a BioMOBY service (23).
  23 in total

1.  An Arabidopsis thaliana T-DNA mutagenized population (GABI-Kat) for flanking sequence tag-based reverse genetics.

Authors:  Mario G Rosso; Yong Li; Nicolai Strizhov; Bernd Reiss; Koen Dekker; Bernd Weisshaar
Journal:  Plant Mol Biol       Date:  2003-09       Impact factor: 4.076

Review 2.  Control of male gametophyte development.

Authors:  Sheila McCormick
Journal:  Plant Cell       Date:  2004-03-22       Impact factor: 11.277

Review 3.  Female gametophyte development.

Authors:  Ramin Yadegari; Gary N Drews
Journal:  Plant Cell       Date:  2004-04-09       Impact factor: 11.277

4.  A collection of 11 800 single-copy Ds transposon insertion lines in Arabidopsis.

Authors:  Takashi Kuromori; Takashi Hirayama; Yuki Kiyosue; Hiroko Takabe; Saho Mizukado; Tetsuya Sakurai; Kenji Akiyama; Asako Kamiya; Takuya Ito; Kazuo Shinozaki
Journal:  Plant J       Date:  2004-03       Impact factor: 6.417

Review 5.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

Review 6.  T-DNA insertion mutagenesis in Arabidopsis: going back and forth.

Authors:  R Azpiroz-Leehan; K A Feldmann
Journal:  Trends Genet       Date:  1997-04       Impact factor: 11.639

7.  High-throughput generation of sequence indexes from T-DNA mutagenized Arabidopsis thaliana lines.

Authors:  Nicolai Strizhov; Yong Li; Mario G Rosso; Prisca Viehoever; Koen A Dekker; Bernd Weisshaar
Journal:  Biotechniques       Date:  2003-12       Impact factor: 1.993

8.  MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics.

Authors:  Heiko Schoof; Rebecca Ernst; Vladimir Nazarov; Lukas Pfeifer; Hans-Werner Mewes; Klaus F X Mayer
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

9.  Genome-wide insertional mutagenesis of Arabidopsis thaliana.

Authors:  José M Alonso; Anna N Stepanova; Thomas J Leisse; Christopher J Kim; Huaming Chen; Paul Shinn; Denise K Stevenson; Justin Zimmerman; Pascual Barajas; Rosa Cheuk; Carmelita Gadrinab; Collen Heller; Albert Jeske; Eric Koesema; Cristina C Meyers; Holly Parker; Lance Prednis; Yasser Ansari; Nathan Choy; Hashim Deen; Michael Geralt; Nisha Hazari; Emily Hom; Meagan Karnes; Celene Mulholland; Ral Ndubaku; Ian Schmidt; Plinio Guzman; Laura Aguilar-Henonin; Markus Schmid; Detlef Weigel; David E Carter; Trudy Marchand; Eddy Risseeuw; Debra Brogden; Albana Zeko; William L Crosby; Charles C Berry; Joseph R Ecker
Journal:  Science       Date:  2003-08-01       Impact factor: 47.728

10.  GABI-Kat SimpleSearch: a flanking sequence tag (FST) database for the identification of T-DNA insertion mutants in Arabidopsis thaliana.

Authors:  Yong Li; Mario G Rosso; Nicolai Strizhov; Prisca Viehoever; Bernd Weisshaar
Journal:  Bioinformatics       Date:  2003-07-22       Impact factor: 6.937
