
Update of ASRP: the Arabidopsis Small RNA Project database.

Tyler W H Backman, Christopher M Sullivan, Jason S Cumbie, Zachary A Miller, Elisabeth J Chapman, Noah Fahlgren, Scott A Givan, James C Carrington, Kristin D Kasschau.

Abstract

Development of the Arabidopsis Small RNA Project (ASRP) Database, which provides information and tools for the analysis of microRNA, endogenous siRNA and other small RNA-related features, has been driven by the introduction of high-throughput sequencing technology. To accommodate the demands of increased data, numerous improvements and updates have been made to ASRP, including new ways to access data, more efficient algorithms for handling data, and increased integration with community-wide resources. New search and visualization tools have also been developed to improve access to small RNA classes and their targets. ASRP is publicly available through a web interface at http://asrp.cgrb.oregonstate.edu/db/.


Year: 2007    PMID: 17999994    PMCID: PMC2238918    DOI: 10.1093/nar/gkm997

Source DB: PubMed    Journal: Nucleic Acids Res    ISSN: 0305-1048


INTRODUCTION

High-throughput sequencing has enabled the discovery of hundreds of thousands of unique small RNA sequences from Arabidopsis and other plants. These small RNAs include both conserved and non-conserved microRNAs (miRNAs), which arise from self-complementary foldback structures, and several classes of siRNA that derive from long inverted duplications, bidirectional transcription or the activity of RNA-dependent RNA polymerases (1,2). The Arabidopsis Small RNA Project (ASRP) database was developed to provide a public resource for genome-wide small RNA data from the model plant Arabidopsis thaliana (3). Here, we describe recent updates to the ASRP database and new software algorithms for efficiently searching, accessing and displaying data on small RNAs and related features of the Arabidopsis genome. The ASRP database can be accessed via a website, downloaded as data files, or accessed via the BioMOBY web service (4).

DATABASE CONTENT AND DESIGN

The ASRP database contains data from several sources and continues to grow as new small RNA libraries are sequenced. As of mid-2007, the database contained 218 585 unique small RNA sequences (663 312 reads) from wild-type and mutant plants, generated by picoliter-scale pyrosequencing in the authors’ group (3,5,6). Small RNA sequences are cataloged by 30 distinct small RNA libraries from various developmental stages of Arabidopsis (Col-0 ecotype), including inflorescence, seedling and leaf tissue, and from mutants with a range of defects in small RNA silencing pathways. Small RNAs have been analyzed from mutants with defects in all four Dicer-like genes (dcl1-7, dcl2-1, dcl3-1 and dcl4-2 alleles) and in the three known functional RNA-dependent RNA polymerase (RDR) genes (rdr1-1, rdr2-1 and rdr6-15 alleles) (7,8). Small RNAs and associated features from wild-type and mutant libraries can be viewed graphically on the ASRP website using the Generic Model Organism Database Project genome browser (9). A user can select genome coordinates, small RNA names or other locus-defining codes to view small RNA loci, which are color-coded by size. The viewer also displays transcripts, miRNA precursors, repeat elements and other annotation features. Information pages about small RNAs and other features can be accessed with mouse-over clicks. In addition to small RNA sequences generated in-house, small RNA data from other groups, including the David Bartel group (10,11), are viewable in distinct tracks. Data from other groups are not currently included in ASRP database searches but will be added in the future. Genome-wide DNA methylation data from the Steven Henikoff group (12), chromosome data from NCBI, transcript data from TAIR and repeat element data from Repbase at GIRI can also be displayed (13–15).
Clicking the image of a gene locus in the browser directs the user to an information page containing notes, exon/intron coordinates and the small RNAs that derive from the gene. Importantly, relative transcript accumulation profiles, based on microarray data from a series of RNA silencing-defective mutants (5,16), are shown graphically.
The hardware and software used to maintain and update the database have been designed to accommodate large quantities of data from millions of small RNA sequences. The addition of new ASRP data is expedited by software automation and by efficient database population algorithms written specifically for the ASRP database. For example, results from new small RNA libraries are processed through a data-mining pipeline that uses hashes to parse data from raw reads, and then through an XOR string-matching algorithm to find sequences with identity to the Arabidopsis genome. Custom algorithms have been developed to minimize iterative interactions with the MySQL (http://www.mysql.com) database when new data are introduced. These algorithms employ large hash tables held in RAM to rapidly identify new sequences and update existing ones. Before the database tables are updated, the hash tables are used to create MySQL dump files, and loading these dump files is much more efficient than inserting each row individually. All software has been developed in Perl (http://www.perl.com) and C++ and is available upon request. Frequently requested data and downloads are cached or pre-rendered on the server, enabling the site to serve data to multiple users simultaneously while automatically updating the caches when the data are revised.
The ASRP database uses a multistep system of checks and balances to ensure data quality. Known datasets ranging in size from 10 to 1 million data points are generated with built-in false-discovery and false-positive data.
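The in-memory population strategy described above (collapsing raw reads with hashes, checking a RAM-resident hash table for new sequences, then writing a dump file instead of inserting rows one at a time) can be illustrated with a short Python sketch. It is not the authors' Perl/C++ code: the table names `srna_seq` and `srna_count`, their columns, and the input format are hypothetical, and each library is assumed to be loaded only once.

```python
from collections import defaultdict


def collapse_reads(reads_by_library):
    """Collapse raw reads into unique sequences with per-library read counts.

    reads_by_library maps a library name to an iterable of raw read strings.
    Returns {sequence: {library: count}}; Python dicts are hash tables.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for library, reads in reads_by_library.items():
        for read in reads:
            seq = read.strip().upper()
            if seq:  # skip empty lines
                counts[seq][library] += 1
    return counts


def sql_str(value):
    """Quote a string literal for MySQL (escape backslashes and single quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def build_dump(counts, existing_ids, next_id):
    """Render multi-row INSERT statements for one batch of new library results.

    existing_ids maps known sequences to database IDs and is updated in place;
    a hash lookup decides whether a sequence is new. Hypothetical assumption:
    each library is loaded once, so every (sequence, library) pair is a new row.
    Returns (dump_text, next_free_id).
    """
    seq_rows, count_rows = [], []
    for seq in sorted(counts):
        if seq not in existing_ids:  # new sequence: assign the next free ID
            existing_ids[seq] = next_id
            seq_rows.append(f"({next_id},{sql_str(seq)})")
            next_id += 1
        for library, n in sorted(counts[seq].items()):
            count_rows.append(f"({existing_ids[seq]},{sql_str(library)},{n})")
    statements = []
    if seq_rows:
        statements.append("INSERT INTO srna_seq (id, seq) VALUES "
                          + ",".join(seq_rows) + ";")
    if count_rows:
        statements.append("INSERT INTO srna_count (seq_id, library, n_reads) VALUES "
                          + ",".join(count_rows) + ";")
    return "\n".join(statements), next_id
```

Batching every new row for a table into one multi-row `INSERT` needs a single statement per table rather than one per read; MySQL's `LOAD DATA INFILE` with a tab-delimited file is another bulk-load route.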
The known datasets are used to debug changes to the parsing algorithms and to test the timing and throughput of systems and data-mining operations. Known datasets are reviewed regularly; when changes are made, all code is retested and new baselines are established. Data-handling algorithms are also analyzed for logical flaws and, once converted to code, audited by multiple individuals.
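The XOR string-matching algorithm is not detailed here, so the following Python sketch is one plausible interpretation: reads and genome are encoded at 2 bits per base, a rolling window of the genome is XORed with the packed read, and a zero result marks an exact match on either strand. The helper `audit_known_dataset` is a hypothetical illustration of the known-dataset checks described above, using planted reads that must be recovered and decoys (built-in false positives) that must not match.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pack(seq):
    """Pack an A/C/G/T string into an integer using 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]  # KeyError for non-ACGT read bases
    return value


def xor_matches(read, genome):
    """Return (start, strand) for every exact match of read on either strand.

    Starts are 0-based on the forward strand. A rolling 2-bit window over the
    genome is XORed with the packed read; zero means every base agrees.
    Windows containing non-ACGT characters (e.g. N) are never reported.
    """
    read = read.upper()
    k = len(read)
    if k == 0:
        return []
    mask = (1 << (2 * k)) - 1  # keep only the last k bases in the window
    targets = {"+": pack(read), "-": pack(read.translate(COMPLEMENT)[::-1])}
    hits, window, valid = [], 0, 0
    for i, base in enumerate(genome.upper()):
        code = CODE.get(base)
        if code is None:  # ambiguous base: restart the window
            window, valid = 0, 0
            continue
        window = ((window << 2) | code) & mask
        valid += 1
        if valid >= k:
            for strand, target in targets.items():
                if (window ^ target) == 0:
                    hits.append((i - k + 1, strand))
    return hits


def audit_known_dataset(genome, planted, decoys):
    """Hypothetical known-dataset audit: planted reads must be found at their
    recorded forward-strand starts, and decoys must not match anywhere."""
    failures = []
    for read, start in planted.items():
        if (start, "+") not in xor_matches(read, genome):
            failures.append(("missed", read))
    for decoy in decoys:
        if xor_matches(decoy, genome):
            failures.append(("false_positive", decoy))
    return failures
```

This scan revisits the whole genome for each read, which is fine for a small known dataset but slow for millions of reads; indexing the genome once is the usual way to avoid that cost.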

DATABASE ACCESS AND WEB INTERFACE

There are three primary ways for outside researchers to access ASRP data: the ASRP homepage, direct data file downloads and the BioMOBY web service. The ASRP homepage itself provides multiple routes to information, including a variety of search tools, the genome browser and hyperlinked lists of small RNA families and classes (Figure 1).
Figure 1. Sitemap of functions available using the ASRP website.

The primary search tool allows users to find sequences, genes and small RNA classes by searching with genome coordinates, keywords (such as ‘miR171’) or sequences. Multiple queries of the same or different categories can be run simultaneously, and results are displayed quickly using AJAX technology (http://www.adaptivepath.com/ideas/essays/archives/000385.php). Results are viewed one category at a time. Hyperlinks allow users to view additional information or to download search results as a .csv spreadsheet file. A quick search tool on the navigation toolbar of each page provides rapid access to data for a particular miRNA, tasiRNA, database entry, sequence or gene locus when the user enters a locus name, label or sequence. A third search tool is provided on the ASRP homepage as a visual map of the five Arabidopsis chromosomes; mouse-over clicks on any region of the chromosome diagrams forward the user to the corresponding region in the ASRP genome browser.
Detailed miRNA data are provided through hyperlinked lists sorted by miRNA family and target gene family. The main list displays the numbers of locus-specific and miRNA family-specific reads from different small RNA libraries. A page for each miRNA family shows the sequence variants associated with the family, the reads associated with each sequence, miRNA gene transcript data, miRNA target genes and foldback structures. Reads are displayed both as raw counts per library and normalized to library size, allowing comparison between small RNA libraries. Each miRNA is hyperlinked to miRBase (17). Data on individual tasiRNAs and tasiRNA families are presented in the same format as for miRNAs. All data in the ASRP database are provided both as downloadable files accessible from the homepage and as a large MySQL dump file that contains the entire database.
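The family-level read counts and library size-normalized reads described above can be sketched in Python. The normalization scheme is an assumption, since the text does not say how ASRP scales counts: this example uses reads per million (raw reads * 1,000,000 / total reads in the library). Sequence, library and family names in any usage are hypothetical placeholders.

```python
from collections import defaultdict


def family_counts(sequence_counts, family_of):
    """Sum per-library read counts over all sequences assigned to each family.

    sequence_counts: {sequence: {library: raw_reads}}
    family_of:       {sequence: family_name}; unassigned sequences are skipped.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for seq, per_library in sequence_counts.items():
        family = family_of.get(seq)
        if family is None:
            continue
        for library, n in per_library.items():
            totals[family][library] += n
    return {family: dict(libs) for family, libs in totals.items()}


def normalize(raw_counts, library_sizes, scale=1_000_000):
    """Convert {library: raw_reads} to reads per `scale` reads in each library.

    library_sizes: {library: total reads sequenced in that library}; sizes are
    assumed to be positive.
    """
    return {
        library: count * scale / library_sizes[library]
        for library, count in raw_counts.items()
    }
```

Normalizing after aggregation keeps raw counts available for display alongside the scaled values, matching the two views the family pages offer.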
Downloads of small RNA data are provided for specific small RNA libraries, for miRNAs and for tasiRNAs. Where appropriate, each dataset is provided in three formats: genome coordinates in a general feature format (GFF) file, sequences in FASTA format, and both sequences and coordinates as a comma-separated (.csv) file that can be opened with a wide array of spreadsheet applications. The ASRP database can also be accessed via the BioMOBY (http://biomoby.org) web service using a BioMOBY client, such as the Taverna workbench (http://taverna.sourceforge.net) (4). The BioMOBY web service allows other researchers to integrate data from the ASRP database into their own databases and programs automatically, without manually downloading and converting files. A main advantage of this approach is that updated data propagate quickly and automatically to users, with no need to re-download and re-process files after each update.
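The three download formats can be illustrated with a Python sketch that writes the same records as GFF, FASTA and CSV. The `SmallRNA` record type, its fields, the `ASRP` source label, the default `miRNA` feature type and the `reads` attribute are hypothetical; the GFF output follows the nine-column GFF3 layout with 1-based, inclusive coordinates, which may differ from the exact columns ASRP serves. Names are assumed to contain no characters that GFF3 requires to be escaped (such as `;`, `=` or `,`).

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class SmallRNA:
    name: str
    chrom: str
    start: int      # 1-based, inclusive (GFF convention)
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"
    sequence: str
    reads: int
    feature_type: str = "miRNA"


def to_gff(records, source="ASRP"):
    """Write GFF3: seqid, source, type, start, end, score, strand, phase, attributes."""
    lines = ["##gff-version 3"]
    for r in records:
        attrs = f"ID={r.name};reads={r.reads}"
        lines.append("\t".join([r.chrom, source, r.feature_type, str(r.start),
                                str(r.end), ".", r.strand, ".", attrs]))
    return "\n".join(lines) + "\n"


def to_fasta(records):
    """Write one FASTA entry (header line, sequence line) per record."""
    return "".join(f">{r.name}\n{r.sequence}\n" for r in records)


def to_csv(records):
    """Write coordinates and sequences together as comma-separated values."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "chrom", "start", "end", "strand", "sequence", "reads"])
    for r in records:
        writer.writerow([r.name, r.chrom, r.start, r.end, r.strand, r.sequence, r.reads])
    return buf.getvalue()
```

Generating all three files from one in-memory record list keeps the formats consistent with each other whenever a dataset is regenerated.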

FUTURE DIRECTIONS

Recent sequencing technologies, such as sequencing by synthesis (18), will yield millions of small RNA reads in single runs, soon increasing the ASRP database content by orders of magnitude. New algorithms and data-display techniques are being integrated into the ASRP database to accommodate the increased quantity and resolution of data from our group and from other groups. As other databases begin to offer their data via web services such as BioMOBY, the ASRP database will be modified to integrate more closely with these external resources. We envision that the small RNA component of genetic and epigenetic regulation in Arabidopsis will become much more apparent, and better understood, as more systems-level data are integrated.

REFERENCES

Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S. The generic genome browser: a building block for a model organism system database. Genome Res. 2002.

Mello CC, Conte D. Revealing the world of RNA interference. Nature. 2004.

Baulcombe D. RNA silencing in plants. Nature. 2004.

Allen E, Xie Z, Gustafson AM, Carrington JC. microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell. 2005.

Margulies M, Egholm M, Altman WE, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005.

Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005.

Xie Z, Allen E, Wilken A, Carrington JC. DICER-LIKE 4 functions in trans-acting small interfering RNA biogenesis and vegetative phase change in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2005.

Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC. Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biol. 2007.

Gustafson AM, Allen E, Givan S, Smith D, Carrington JC, Kasschau KD. ASRP: the Arabidopsis Small RNA Project Database. Nucleic Acids Res. 2005.

Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, Jacobsen SE, Carrington JC. Genetic and functional diversification of small RNA pathways in plants. PLoS Biol. 2004.