Nils Kleinboelting, Gunnar Huep, Andreas Kloetgen, Prisca Viehoever, Bernd Weisshaar.
Abstract
T-DNA insertion mutants are very valuable for reverse genetics in Arabidopsis thaliana. Several projects have generated large sequence-indexed collections of T-DNA insertion lines, of which GABI-Kat is the second largest resource worldwide. User access to the collection and its Flanking Sequence Tags (FSTs) is provided by the front end SimpleSearch (http://www.GABI-Kat.de). Several significant improvements have been implemented recently. The database now relies on the TAIRv10 genome sequence and annotation dataset. All FSTs have been newly mapped using an optimized procedure that leads to improved accuracy of insertion site predictions. A fraction of the collection with weak FST yield was re-analysed by generating new FSTs. Along with newly found predictions for older sequences, about 20,000 new FSTs were included in the database. Information is now included about groups of FSTs that point to the same insertion site, where the insertion is found in several lines but is real in only a single line, and many problematic FST-to-line links have been corrected using new wet-lab data. SimpleSearch currently contains data from ~71,000 lines with predicted insertions covering 62.5% of the 27,206 nuclear protein coding genes, and offers insertion allele-specific data from 9545 confirmed lines that are available from the Nottingham Arabidopsis Stock Centre.
Year: 2011 PMID: 22080561 PMCID: PMC3245140 DOI: 10.1093/nar/gkr1047
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of the improved insertion site prediction. The insertion position is determined using the best BLAST hit of the FST sequence vs. the A. thaliana genome sequence, and the location of the T-DNA within the FST sequence determined by pregap4. If no T-DNA is detected at the start of the FST sequence, the insertion site is located x bases upstream of the BLAST hit, where x is the number of bases before the start of the BLAST hit minus the distance of the T-DNA specific primer to the T-DNA border. Otherwise, the start of the BLAST hit is considered as insertion position.
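The rule in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GABI-Kat pipeline itself: the `FstHit` record and all of its field names are hypothetical, and "upstream" is assumed to mean against the direction in which the FST reads along the genome, encoded here as a `strand` of +1 or -1. The caption does not say what happens when x is negative, so the sketch applies the formula unchanged.

```python
from dataclasses import dataclass


@dataclass
class FstHit:
    """Best BLAST hit of one FST (hypothetical record, names are assumptions)."""
    hit_start: int         # genome coordinate where the BLAST hit starts
    strand: int            # +1 or -1: direction the FST reads along the genome
    bases_before_hit: int  # FST bases preceding the start of the BLAST hit
    tdna_at_start: bool    # True if T-DNA sequence was detected at the FST start
    primer_to_border: int  # distance of the T-DNA specific primer to the border


def predict_insertion_position(hit: FstHit) -> int:
    """Apply the two cases described in the Figure 1 caption."""
    if hit.tdna_at_start:
        # T-DNA detected: the start of the BLAST hit is the insertion position.
        return hit.hit_start
    # No T-DNA detected: move x bases upstream of the BLAST hit.
    x = hit.bases_before_hit - hit.primer_to_border
    # Assumption: upstream is opposite to the FST read direction.
    return hit.hit_start - hit.strand * x


example = FstHit(hit_start=1000, strand=1, bases_before_hit=60,
                 tdna_at_start=False, primer_to_border=40)
print(predict_insertion_position(example))
```

With the illustrative values above, x is 60 - 40 = 20, so the predicted position lies 20 bases before the hit start.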
Figure 2. Definition of gene hits at GABI-Kat. (a) For protein-coding genes with annotated UTR regions in TAIRv10, we differentiate between CDSi hits (insertion position between ATG and STOP), 5′- and 3′-TS2TE hits (insertion position in the 5′- or 3′-UTR) and promoter hits [insertion position up to 300 bp upstream of the transcription start (TS)]. (b) If the UTR is not annotated in TAIRv10 (and for pseudogenes), insertion positions up to 300 bp upstream of ATG or downstream of STOP are considered as 5′- and 3′-hits, respectively. (c) For RNA genes and transposable elements, TS2TE hits are annotated if the insertion is located between TS and transcript end (TE).
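The hit categories of Figure 2 amount to a small decision procedure, sketched below in Python. Several points are assumptions rather than statements from the source: the function name and parameters are hypothetical, coordinates are taken to be strand-normalised so that TS ≤ ATG ≤ STOP ≤ TE, interval boundaries are treated as inclusive, and positions between ATG and STOP in case (b) are labelled CDSi although the caption only spells that out for case (a).

```python
from typing import Optional

FLANK = 300  # bp window named in the caption


def classify_hit(pos: int, kind: str, ts: int, te: int,
                 atg: Optional[int] = None, stop: Optional[int] = None,
                 utr_annotated: bool = True) -> Optional[str]:
    """Return the hit category for an insertion position, or None if no hit.

    kind is one of "protein_coding", "pseudogene", "rna",
    "transposable_element" (hypothetical labels).
    """
    if kind in ("rna", "transposable_element"):
        # Case (c): hit anywhere between transcription start and transcript end.
        return "TS2TE" if ts <= pos <= te else None

    if atg is None or stop is None:
        raise ValueError("atg and stop are required for this gene kind")

    if atg <= pos <= stop:
        # Stated for case (a); assumed to carry over to case (b).
        return "CDSi"

    if kind == "protein_coding" and utr_annotated:
        # Case (a): annotated UTRs plus a 300 bp promoter window.
        if ts <= pos < atg:
            return "5'-TS2TE"
        if stop < pos <= te:
            return "3'-TS2TE"
        if ts - FLANK <= pos < ts:
            return "promoter"
        return None

    # Case (b): no UTR annotation, or a pseudogene.
    if atg - FLANK <= pos < atg:
        return "5'-hit"
    if stop < pos <= stop + FLANK:
        return "3'-hit"
    return None


print(classify_hit(800, "protein_coding", ts=1000, te=3000, atg=1200, stop=2800))
```

Keeping the three cases in separate branches mirrors panels (a) to (c) of the figure, so a change to one definition, for example the promoter window, touches a single branch.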
Summary of data in the GABI-Kat SimpleSearch database
| Data type | Number of entries |
|---|---|
| FSTs | ∼133 000 |
| Lines | 71 235 |
| with segregation data | 15 289 |
| available at NASC | 9644 |
| Insertions with predicted insertion position | 88 580 |
| analysed with final result | 16 081 |
| delivered to individual users | 6816 |
| confirmed and available at NASC | 9653 |
| Distinct genes covered | 21 005 |
| protein coding genes | 19 120 |
| ncRNA coding genes | 182 |
| pseudogenes | 420 |
| transposable element genes | 1283 |
| Gene hits available only from GABI-Kat | 2114 |
| Confirmed ‘GABI-Kat only’ hits at NASC | 1201 |
| ‘GABI-Kat only’ hits to be addressed | 765 |
| Distinct CDSi covered | 13 037 |
a: Numbers as of 15 September 2011.
b: Database release version 24 (the release determines which FSTs and lines are in the database; the data values for the items in the database are updated every 24 h).
c: Insertions are different from lines, because a line can contain several insertions. Example: 011F01, which is confirmed for a genome hit at F26P21 (Chr4) and a TS2TE hit in At5g05180.
d: A final result can be ‘confirmed’, but ‘failed to confirm’ and ‘part of a contamination group’ are also counted as final results.
e: For each confirmed insertion, confirmation sequences are available that were generated from the amplicon spanning the T-DNA/genome sequence junction.
f: Only hits that may cause a NULL allele (CDSi hits and hits in the 5′-UTR) are counted. Only lines in the accession Columbia-0 are considered, which is the accession used by the main FST-based insertion line collections.
g: About 150 ‘GABI-Kat only’ alleles are either already in the queue and waiting for mature T3 seed, or failed to confirm.
Figure 3. Resolution of contamination groups. A contamination group contains predicted insertions in different lines that share very similar insertion positions (within 50 bp at most). After the confirmation process, only one line is confirmed; the others have failed and are considered contaminations. When searching for insertion alleles in SimpleSearch, the user is guided to the confirmed allele if the contamination group has been resolved.
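As a rough illustration of how candidate contamination groups could be found from predicted insertion positions, the Python sketch below collects insertions on the same chromosome whose positions lie within 50 bp and keeps only groups that involve more than one line. This is not the procedure used by GABI-Kat: the `Insertion` record, the line names, and the choice to measure the 50 bp from the first member of each group (so that a whole group spans at most 50 bp) are all assumptions.

```python
from collections import namedtuple

# Hypothetical record: line identifier, chromosome, predicted position.
Insertion = namedtuple("Insertion", "line chrom pos")

MAX_SPAN = 50  # bp, from the Figure 3 caption


def contamination_groups(insertions):
    """Group insertions from different lines that lie within MAX_SPAN bp."""
    groups, current = [], []
    for ins in sorted(insertions, key=lambda i: (i.chrom, i.pos)):
        # Start a new group on a chromosome change or when the span is exceeded.
        if current and (ins.chrom != current[0].chrom
                        or ins.pos - current[0].pos > MAX_SPAN):
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    # A contamination group needs predictions from at least two different lines.
    return [g for g in groups if len({i.line for i in g}) > 1]


predicted = [
    Insertion("lineA", "Chr1", 1000),
    Insertion("lineB", "Chr1", 1030),
    Insertion("lineC", "Chr1", 1200),
    Insertion("lineD", "Chr2", 1010),
]
for group in contamination_groups(predicted):
    print([i.line for i in group])
```

In this made-up input, only lineA and lineB fall within 50 bp of each other on the same chromosome, so they form the single candidate group. Wet-lab confirmation, as described in the caption, would then decide which line carries the real insertion.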