Ruijia Wang, Ram Nambiar, Dinghai Zheng, Bin Tian.
Abstract
PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which were limited in number and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3' region extraction and deep sequencing (3'READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3' ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, information on PAS conservation across mammals sheds light on the significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data.
Year: 2018 PMID: 29069441 PMCID: PMC5753232 DOI: 10.1093/nar/gkx1000
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of PolyA_DB version 3.1
| Species | Human | Mouse | Rat | Chicken |
|---|---|---|---|---|
| No. of samples used | 24 | 59 | 11 | 9 |
| No. of PAS reads used | 59 090 907 | 153 989 213 | 23 616 600 | 29 104 491 |
| No. of PASs | 108 042 | 202 426 | 61 905 | 65 909 |
| No. of genic PASs | 85 275 | 121 163 | 36 941 | 45 116 |
| No. of genes listed | 20 998 | 21 588 | 14 529 | 12 292 |
| No. of genes with 3′ end extension | 8962 | 12 027 | 8302 | 8352 |
| Median 3′ end extension size (nt) | 758 | 469 | 617 | 1062 |
| No. of mRNA genes | 15 977 | 17 846 | 14 077 | 12 130 |
| No. of ncRNA genes | 5021 | 3742 | 452 | 162 |
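
The counts in the summary table can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and the `derived_stats` helper are hypothetical names introduced here, and the derived ratios (genic fraction, genic PASs per listed gene) are not reported in the source. They are computed solely from the table values above.

```python
# Values copied from the summary table; key names are hypothetical labels.
TABLE = {
    "Human":   {"pas": 108_042, "genic_pas": 85_275,  "genes": 20_998,
                "mrna_genes": 15_977, "ncrna_genes": 5_021},
    "Mouse":   {"pas": 202_426, "genic_pas": 121_163, "genes": 21_588,
                "mrna_genes": 17_846, "ncrna_genes": 3_742},
    "Rat":     {"pas": 61_905,  "genic_pas": 36_941,  "genes": 14_529,
                "mrna_genes": 14_077, "ncrna_genes": 452},
    "Chicken": {"pas": 65_909,  "genic_pas": 45_116,  "genes": 12_292,
                "mrna_genes": 12_130, "ncrna_genes": 162},
}


def derived_stats(row):
    """Return simple ratios and a consistency check for one species row."""
    return {
        # Share of all PASs that are assigned to a gene.
        "genic_fraction": row["genic_pas"] / row["pas"],
        # Rough average; assumes every listed gene can carry genic PASs.
        "genic_pas_per_gene": row["genic_pas"] / row["genes"],
        # mRNA and ncRNA gene counts should add up to the genes listed.
        "gene_counts_consistent":
            row["mrna_genes"] + row["ncrna_genes"] == row["genes"],
    }


for species, row in TABLE.items():
    stats = derived_stats(row)
    print(f"{species}: genic fraction {stats['genic_fraction']:.3f}, "
          f"genic PASs per gene {stats['genic_pas_per_gene']:.2f}, "
          f"gene counts consistent: {stats['gene_counts_consistent']}")
```

For all four species, the mRNA and ncRNA gene counts sum to the number of genes listed, so the table is internally consistent on that point.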
Figure 1. Schematic of PAS identification and presentation in PolyA_DB version 3. The data flow is indicated by arrowed lines. See the main text for details.
Figure 2. An example of a search result from PolyA_DB 3. (A) Gene view. Mouse gene Cstf3 is used as an example. The output includes a summary table of the gene as well as a link to the UCSC genome browser. (B) PolyA SiteView. This table contains information on all individual PASs assigned to the queried gene and their links to the UCSC genome browser.