Nils Kleinboelting, Gunnar Huep, Bernd Weisshaar.
Abstract
SimpleSearch provides access to a database containing information about T-DNA insertion lines of the GABI-Kat collection of Arabidopsis thaliana mutants. These mutants are an important tool for reverse genetics, and GABI-Kat is the second largest collection of such T-DNA insertion mutants. Insertion sites were deduced from flanking sequence tags (FSTs), and the database contains information about mutant plant lines as well as insertion alleles. Here, we describe improvements to the interface (available at http://www.gabi-kat.de/db/genehits.php) and to the database content that have been realized in the last five years. These improvements include the integration of the Araport11 genome sequence annotation data, which contain the recently updated A. thaliana structural gene descriptions, and an updated visualization component that displays groups of insertions with very similar insertion positions, mapped confirmation sequences, and primers. The visualization component provides a quick way to identify insertions of interest, and access to improved data about the exact structure of confirmed insertion alleles. In addition, the database content has been extended by incorporating additional insertion alleles that were detected during the confirmation process, as well as by adding new FSTs that have been produced during continued efforts to fill gaps in FST availability. Finally, the current database content regarding predicted and confirmed insertion alleles as well as primer sequences has been made available as downloadable flat files.
Keywords: Arabidopsis thaliana; T-DNA integration; genomics; insertional mutagenesis; knockout mutants; reverse genetics; systems biology and evolution
Year: 2017 PMID: 28013277 PMCID: PMC5444572 DOI: 10.1093/pcp/pcw205
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Comparison of insertion allele annotation based on TAIR10 to that based on Araport11
| | TAIR10 (2016‐08‐14) | Araport11 (2016‐08‐15) |
|---|---|---|
| Number of T1 plants selected and planted | 92,657 | 92,657 |
| Number of lines with intact T1 DNA and T2 seeds | 89,705 | 89,705 |
| Total number of lines with genome hits | 77,034 | 77,034 |
| Number of lines with gene hits | 52,206 | 56,114 |
| Lines with gene hits, protein coding genes only | 49,062 | 51,620 |
| Lines with hits in RNA-encoding genes | 1,147 | 4,742 |
| Lines with gene hits, pseudogenes only | 1,006 | 1,062 |
| Number of lines with hits in transposable elements | 6,681 | 6,646 |
| Total number of lines with CDSi hits | 47,360 | 47,515 |
| Number of genes with at least one hit | 22,337 | 23,582 |
| Number of protein coding genes with at least one hit | 20,235 | 20,697 |
| Number of CDSi with at least one hit | 14,137 | 14,235 |
a Numbers from immediately before and after update to Araport11.
b Since the data content has not been changed, these numbers are the same.
c Pseudogenes, protein coding and RNA-encoding genes counted.
d These numbers do not add up to the ‘Number of lines with gene hits’ value because a single line can contain hits of different types.
Summary of data added to the GABI-Kat SimpleSearch database since 2011
| Data type | Number of entries (2011‐09‐15) | Number of entries (2016‐08‐15) |
|---|---|---|
| GK FSTs | ∼133,000 | 143,601 |
| Lines | 71,235 | 77,034 |
| with segregation data | 15,289 | 20,037 |
| available at NASC | 9,644 | 13,967 |
| Insertion alleles (predicted genome hits) | 88,580 | 95,233 |
| analyzed with final result | 16,081 | 26,319 |
| delivered to individual users | 6,816 | 7,819 |
| confirmed and available at NASC | 9,653 | 14,280 |
| Distinct genes covered | 21,005 | 24,789 |
| protein coding genes | 19,120 | 20,697 |
| RNA-encoding genes | 182 | 988 |
| pseudogenes | 420 | 481 |
| transposable element genes | 1,283 | 1,416 |
| Distinct CDSi covered | 13,037 | 14,235 |
a Numbers as of September 15, 2011, taken from (Kleinboelting et al. 2012).
b Database release version 24.
c Database release version 28 from August 15, 2016.
d Insertion alleles are different from lines, because a line can contain several insertions. An insertion is expected to be different from another one in the same line if the distance between the two predicted insertion positions is at least 20 kbp (Kleinboelting et al. 2015). The gain of 6,653 predicted insertion alleles (from 88,580 (September 15, 2011) to 95,233 (August 15, 2016)) is in part due to data from the Ecker group (O’Malley et al. in preparation). Selected GK-lines were analyzed by TDNA-Seq using Illumina technology (NCBI accession numbers KG779961 to KG787552), and the resulting predictions have been included in SimpleSearch. In addition, 119 cases are derived from ‘composite FSTs’ as described (Huep et al. 2014).
e A final result can be ‘confirmed’, but also ‘failed to confirm’ or ‘part of a contamination group’; see (Kleinboelting et al. 2012).
f For each confirmed insertion there are confirmation sequences available which are generated from the amplicon that spans the T-DNA/genome sequence junction. For about 1,400 insertions there are data from both (the ‘north’ and the ‘south’) junction of the inserted T-DNA sequences (Kleinboelting et al. 2015).
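The 20 kbp rule from footnote d can be illustrated with a minimal Python sketch. It assumes that each prediction is a `(chromosome, position)` pair and that predictions are compared with their nearest sorted neighbour on the same chromosome; the function name `group_insertions`, the input format, and the handling of chains of nearby predictions are illustrative assumptions, not the procedure used in SimpleSearch.

```python
from collections import defaultdict

# Threshold quoted in footnote d: predictions at least 20 kbp apart
# are expected to represent different insertions.
MIN_DISTANCE_BP = 20_000


def group_insertions(predictions, min_distance=MIN_DISTANCE_BP):
    """Group the predictions of ONE line into putative insertion alleles.

    `predictions` is an iterable of (chromosome, position) pairs
    (hypothetical input format). Returns a list of (chromosome, positions)
    groups; each group stands for one putative insertion allele.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in predictions:
        by_chrom[chrom].append(pos)

    groups = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            # Assumption: compare with the previous sorted position only.
            if pos - current[-1] >= min_distance:
                groups.append((chrom, current))
                current = [pos]
            else:
                current.append(pos)
        groups.append((chrom, current))
    return groups


# Made-up positions for a single hypothetical line.
example = [("Chr1", 1_000_000), ("Chr1", 1_005_000),
           ("Chr1", 1_030_000), ("Chr3", 500)]
alleles = group_insertions(example)
print(len(alleles), "putative insertion alleles")
```

In this made-up example the first two Chr1 predictions are 5 kbp apart and fall into one group, while the third is 25 kbp from its neighbour and is counted separately, which is why a line can contribute more than one insertion allele.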
Fig. 1 Summary of features of the updated visualization component. There are several ways to access the visualization component, usually by clicking one of the triangles in the SimpleSearch content. The visualization offers two views: the ‘Line view’ (top) and the ‘Insertion view’ (bottom). The ‘Line view’ is reached from the detail page of a specific line. In this case, only the primers, confirmation sequences and insertion predictions (triangles) that correspond to the selected line are displayed (1). The user can navigate in the visualization by entering a position or an AGI locus code (2), or by scrolling through the genome using the arrowheads (3). Further information is provided by tooltips (4), and by links to pages within SimpleSearch (clicking the symbols for primers, FSTs, confirmation sequences or the triangles representing lines opens the respective information) or to Araport for genes. Since annotation units/BACs are not supported by Araport11, these are linked to TAIR. The ‘Insertion view’ is reached via the respective button (5). This view shows all insertions within the selected range, but omits the FSTs, confirmation sequences and primers that are related to a specific line/insertion allele. When multiple insertions lie close to each other, they are displayed together in a larger triangle. These larger triangles may show multiple colors if there are confirmed and failed insertion alleles (originating from a contamination) at nearby positions (6); other color combinations in the larger triangles are also possible (7). If the visualization was initially started in the ‘Line view’, it is possible to switch back (8). It is also possible to switch to the primer design tool (Huep et al. 2014) to generate primers for the selected position (9).
Fig. 2 Correction of predicted insertion site using confirmation sequences. For the confirmation of a predicted insertion site, a position-specific primer is generated (A) and the resulting amplicon is sequenced from both directions to generate confirmation sequences (B). These are evaluated using BLAST, and if the evaluation yields a positive result, the best confirmation sequence is used to correct the predicted insertion position (C). If there is a significant distance between the original insertion position and the confirmed insertion position, we additionally import the confirmation sequence as an FST to make the insertion more accessible in external public databases (D).
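The decision logic of steps (C) and (D) can be sketched as follows. This is only an illustration under stated assumptions: the `ConfirmationHit` record, the use of the BLAST bit score to pick the "best" confirmation sequence, and the `fst_import_distance` parameter are hypothetical, since the caption does not say how the best sequence is chosen or what counts as a significant distance.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ConfirmationHit:
    """Hypothetical summary of one evaluated confirmation sequence."""
    sequence_id: str
    position: int      # insertion position implied by the BLAST hit
    bit_score: float   # assumption: used here to rank the sequences
    positive: bool     # outcome of the evaluation


def correct_insertion(
    predicted_pos: int,
    hits: List[ConfirmationHit],
    fst_import_distance: int,
) -> Tuple[int, Optional[str], bool]:
    """Return (position, id of the sequence used, import_as_fst flag)."""
    positive = [h for h in hits if h.positive]
    if not positive:
        # No positive evaluation: keep the original prediction.
        return predicted_pos, None, False

    # Step (C): the best confirmation sequence corrects the position.
    best = max(positive, key=lambda h: h.bit_score)

    # Step (D): flag the sequence for import as an FST when the shift
    # is large; the cutoff is a caller-supplied assumption.
    import_as_fst = abs(best.position - predicted_pos) >= fst_import_distance
    return best.position, best.sequence_id, import_as_fst


# Made-up values for illustration only.
hits = [
    ConfirmationHit("cs1", 10_150, 300.0, True),
    ConfirmationHit("cs2", 10_900, 120.0, True),
]
result = correct_insertion(10_000, hits, fst_import_distance=500)
```

With these made-up values the higher-scoring sequence `cs1` moves the position by 150 bp, which stays below the assumed cutoff of 500 bp, so no additional FST import is flagged.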