JASPAR 2020: update of the open-access database of transcription factor binding profiles.

Oriol Fornes1, Jaime A Castro-Mondragon2, Aziz Khan2, Robin van der Lee1, Xi Zhang1, Phillip A Richmond1, Bhavi P Modi1, Solenne Correard1, Marius Gheorghe2, Damir Baranašić3,4, Walter Santana-Garcia5, Ge Tan6, Jeanne Chèneby7, Benoit Ballester7, François Parcy8, Albin Sandelin9, Boris Lenhard3,4,10, Wyeth W Wasserman1, Anthony Mathelier2,11.   

Abstract

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2020        PMID: 31701148      PMCID: PMC7145627          DOI: 10.1093/nar/gkz1001

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Transcription factors (TFs) are proteins involved in the regulation of gene expression at the transcriptional level (1). They interact with DNA in a sequence-specific manner through their DNA-binding domains (DBDs), which are used to classify TFs into structural families (2). The genomic locations where TFs bind to DNA are known as TF binding sites (TFBSs), which are typically short (6–20 bp) and exhibit sequence variability (3). Genome-wide identification of TFBSs is key to understanding transcriptional regulation. As it is not possible to identify all TFBSs for every cell type and cellular condition experimentally, computational modeling of TF-binding specificities has been instrumental in predicting TFBSs in the genome. These computational models aim to represent the complex interplay between nucleotide and/or DNA shape readout at TFBSs (4), and can be used to predict not only the precise location where TFs interact in the genome (5), but also TFs with enriched TFBSs in a set of sequences (6), or the impact of mutations on TF binding (7,8), among other applications. From the plethora of existing computational models (9), position frequency matrices (PFMs) (10) are one of the simplest and (still) most commonly used, although more complex models, for instance based on hidden Markov models or deep learning (11–13), are becoming more common. A PFM is a TF-binding profile that models the DNA-binding specificity of a TF by summarizing the frequencies of each nucleotide at each position from observed TF-DNA interactions. These interactions are usually derived from in vitro assays (e.g. SELEX (14) or protein binding microarrays (15)), which assess the binding affinity of TFs to DNA sequences, or from ChIP-based experiments (e.g. ChIP-seq (16), ChIP-exo (17), or ChIP-nexus (18)), which capture TF-DNA interactions in vivo, by looking for over-represented DNA sequences in regions bound by the ChIP’ed TF.
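To make the PFM definition above concrete, the following minimal Python sketch counts nucleotides per position over a set of aligned binding sites. The function name build_pfm and the toy sites are hypothetical and chosen only for illustration; they are not JASPAR data, and this is not the pipeline used to build JASPAR profiles.

```python
NUCLEOTIDES = "ACGT"


def build_pfm(sites):
    """Count each nucleotide at each position of equal-length aligned sites."""
    if not sites:
        raise ValueError("at least one binding site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    # One row of counts per nucleotide, one column per site position.
    pfm = {nt: [0] * width for nt in NUCLEOTIDES}
    for site in sites:
        for pos, nt in enumerate(site.upper()):
            if nt not in pfm:
                raise ValueError(f"unexpected symbol {nt!r}")
            pfm[nt][pos] += 1
    return pfm


# Hypothetical aligned sites (toy data, not taken from JASPAR).
toy_sites = ["CACGTG", "CACGTG", "AACGTG", "CACGTG", "CAGGTG"]
toy_pfm = build_pfm(toy_sites)
for nt in NUCLEOTIDES:
    print(nt, toy_pfm[nt])
```

Each column of the resulting matrix sums to the number of sites, so dividing a column by that sum gives the per-position nucleotide frequencies.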
With the advent of high-throughput sequencing more than a decade ago, the number of PFMs derived from in vivo and in vitro experiments has increased dramatically, leading to the creation of multiple databases storing PFMs or more complex TF-binding profiles such as JASPAR (19), CIS-BP (20) and HOCOMOCO (21) (see (22) for a comprehensive review). The JASPAR database (http://jaspar.genereg.net/) is one of the most popular databases of TF-binding profiles, and has been maintained for over 15 years (23). As such, many computational tools dedicated to the study of gene regulation incorporate profiles from JASPAR (e.g. TFBSshape (24,25), RSAT (26), MEME (27) or i-cisTarget (6)). At the heart of JASPAR is its CORE collection, which contains TF-binding profiles that are: (i) manually curated (meaning that orthogonal supporting evidence from the literature is required for each profile); (ii) non-redundant (one profile per TF with the exception of TFs with multiple DNA-binding sequence preferences (28)); (iii) associated with TFs from one of six taxa (vertebrates, nematodes, insects, plants, fungi, and urochordata) and (iv) freely available to the community through a user-friendly web interface, a RESTful API (29), and a dedicated R/Bioconductor data package (‘JASPAR2020’). Here, we present the 8th release of JASPAR, which comes with a major expansion and update of its CORE collection. Moreover, we introduce a new collection of unvalidated profiles, which stores quality-controlled PFMs for which our curators could not find orthogonal support. This collection has a dedicated web interface to engage the community of users in the curation of TF-binding profiles. Finally, we have updated the hierarchical clusters of TF-binding profiles, the genomic tracks of predicted TFBSs (now available for 8 genomes), and the profile inference tool.

EXPANSION AND UPDATE OF THE JASPAR CORE COLLECTION

For this 8th release of JASPAR, we added to the CORE collection 245 new TF-binding profiles for TFs in the following taxa: vertebrates (169 profiles, corresponding to an expansion of 29% for this taxon), plants (42 profiles, 9% expansion), nematodes (17 profiles, 65% expansion), insects (10 profiles, 8% expansion) and fungi (7 profiles, 4% expansion). We updated 156 profiles (Table 1). The new PFMs were derived from HT-SELEX (30), PBMs (20), ChIP-seq and DAP-seq experiments (data sourced from CistromeDB (31), ReMap (32,33), GTRD (34), ChIP-atlas (35) and ModERN (36), see Supplementary Text for method details). As previously described, the newly introduced profiles were manually curated to be supported by an orthogonal reference from the literature, which is provided in the metadata of the profiles. Moreover, the TF DBD class and family (following the TFClass classification (2)), the TF UniProt ID (37), and links to the TFBSshape (24,25), ReMap (32,33) and UniBind (38) databases are provided in the profiles metadata (whenever possible). Finally, the profiles previously associated with ID2, ID4 and TRB2 were removed from the CORE collection as these proteins are not TFs (1).
Table 1. Overview of the growth of the number of PFMs in the JASPAR 2020 CORE and unvalidated collections compared to the JASPAR 2018 CORE collection

Taxonomic group | Non-redundant PFMs in JASPAR 2018 | New non-redundant PFMs in JASPAR 2020 | Removed profiles | Updated PFMs in JASPAR 2020 | Total PFMs (non-redundant) in JASPAR 2020 | Total PFMs (all versions) in JASPAR 2020
Vertebrates | 579 | 169 | 2 | 125 | 746 | 1011
Plants | 489 | 42 | 1 | 28 | 530 | 572
Insects | 133 | 10 | 0 | 3 | 143 | 153
Nematodes | 26 | 17 | 0 | 0 | 43 | 43
Fungi | 176 | 7 | 0 | 0 | 183 | 184
Urochordata | 1 | 0 | 0 | 0 | 1 | 1
Total CORE | 1404 | 245 | 3 | 156 | 1646 | 1964
Unvalidated | | | | | 337 | 337
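The per-taxon expansion percentages quoted in the text can be re-derived from the counts in Table 1. The short Python sketch below does this and also checks that each row balances (previous count plus new profiles minus removed profiles equals the current non-redundant total). The names TABLE1 and expansion_percent are introduced here for illustration only.

```python
# Counts copied from Table 1:
# taxon -> (PFMs in JASPAR 2018, new PFMs, removed profiles,
#           non-redundant total in JASPAR 2020)
TABLE1 = {
    "vertebrates": (579, 169, 2, 746),
    "plants": (489, 42, 1, 530),
    "insects": (133, 10, 0, 143),
    "nematodes": (26, 17, 0, 43),
    "fungi": (176, 7, 0, 183),
    "urochordata": (1, 0, 0, 1),
}


def expansion_percent(previous, new):
    """New profiles as a percentage of the previous release's count."""
    return 100.0 * new / previous


expansions = {}
for taxon, (previous, new, removed, total) in TABLE1.items():
    # Each row should balance: previous + new - removed = current total.
    assert previous + new - removed == total, taxon
    expansions[taxon] = expansion_percent(previous, new)
    print(f"{taxon:12s} {expansions[taxon]:5.1f}%")

core_previous = sum(row[0] for row in TABLE1.values())
core_new = sum(row[1] for row in TABLE1.values())
core_total = sum(row[3] for row in TABLE1.values())
print("CORE totals:", core_previous, core_new, core_total)
```

Rounded to whole numbers, the per-taxon values match the percentages given in the text. Computed the same way, 245 new profiles over 1404 previous ones gives roughly 17.5% for the CORE collection as a whole, close to the 18% quoted in the abstract.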
Overall, the JASPAR 2020 CORE collection includes 1646 non-redundant PFMs (746 for vertebrates, 530 for plants, 183 for fungi, 143 for insects, 43 for nematodes and 1 for urochordates) (Table 1; Figure 1). Moreover, we continued with the incorporation of novel transcription factor flexible models (TFFMs), which are hidden Markov-based models capturing dinucleotide dependencies in TF–DNA interactions (11). We introduced new TFFMs for 217 TFs (136 for vertebrates, 38 for plants, 21 for insects, 17 for nematodes, and 5 for fungi) and updated TFFMs for 20 vertebrate TFs, which represents a 50% increase in the number of TFFMs available. All data is available on the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
Figure 1.

JASPAR CORE growth. The number of profiles in each taxon and overall (see legend) through all JASPAR releases.

A NEW COLLECTION OF UNVALIDATED PROFILES FOR COMMUNITY ENGAGEMENT

We introduced a novel ‘unvalidated’ collection to store high-quality (i.e. passing multiple quality controls, see Supplementary Text) TF-binding profiles for which no independent support was found in the literature by our curators. This collection contains 337 PFMs. As these profiles are not yet supported by orthogonal evidence, we recommend that users treat this collection with caution. We encourage the community to engage in the curation of these profiles by providing the JASPAR curators with supporting complementary evidence (from their own work or others) whenever possible. This is facilitated by the availability of an individual submission form for each profile in the ‘unvalidated’ collection (Figure 2).
Figure 2.

Unvalidated TF-binding profile collection. Example with the ZNF793 profile. This high-quality PFM was derived from a ChIP-seq experiment and was built from thousands of potential TFBSs. Further, the TFBSs are enriched around the ChIP-seq peak summits. However, no orthogonal evidence supporting this profile was found by our curators. Users can upload relevant information about the profile in the unvalidated collection through the ‘Community curation’ box.

Further, we started a Q&A forum (https://groups.google.com/forum/#!forum/jaspar) to ease the communication between JASPAR curators and the community; we welcome the community to send us their questions and suggestions, or to report errors in JASPAR.

CLUSTERED PROFILES, GENOMIC TRACKS AND PROFILE INFERENCE TOOL

In the previous releases, we introduced novel features such as hierarchical clustering of TF-binding profiles in the CORE collection to visualize profile similarities, genomic tracks of predicted TFBSs, and an inference tool to predict TF-binding profiles likely recognized by TFs not available in the JASPAR CORE. We improved the profile inference tool using our own implementation of a recently described similarity regression method (20). We updated the generation of genomic tracks that are publicly available through the UCSC Genome Browser data hub (39) for 7 organisms: human (hg19, hg38), mouse (mm10), zebrafish (danRer11), Drosophila melanogaster (dm6), Caenorhabditis elegans (ce10), Arabidopsis thaliana (araTha1) and baker's yeast (sacCer3). For more details on the updated genomic tracks and inference tool, refer to the Supplementary Text. Finally, we generated the hierarchical clusters of available TF-binding profiles for each taxon with RSAT matrix-clustering (40). Users can explore the CORE/unvalidated collection through the trees and access directly the corresponding profiles by clicking on the TF name.
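As a rough illustration of how TFBSs can be predicted from a profile, the Python sketch below converts a PFM into log-odds scores and scans a sequence for windows above a score threshold. It is a generic, simplified example and not the procedure used to build the JASPAR genomic tracks (which is described in the Supplementary Text): the pseudocount, the uniform background, the threshold, the forward-strand-only scan, the function names and the toy matrix are all assumptions made for this example.

```python
import math

NUCLEOTIDES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    width = len(pfm["A"])
    pwm = {nt: [0.0] * width for nt in NUCLEOTIDES}
    for pos in range(width):
        total = sum(pfm[nt][pos] for nt in NUCLEOTIDES) + pseudocount
        for nt in NUCLEOTIDES:
            # Spread the pseudocount over nucleotides by background frequency.
            freq = (pfm[nt][pos] + pseudocount * background) / total
            pwm[nt][pos] = math.log2(freq / background)
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm["A"])
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(nt not in pwm for nt in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[nt][i] for i, nt in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 6-bp profile (toy counts, not a JASPAR matrix).
toy_pfm = {
    "A": [1, 5, 0, 0, 0, 0],
    "C": [4, 0, 4, 0, 0, 0],
    "G": [0, 0, 1, 5, 0, 5],
    "T": [0, 0, 0, 0, 5, 0],
}
toy_pwm = pfm_to_pwm(toy_pfm)
toy_hits = scan("TTCACGTGAA", toy_pwm, threshold=8.0)
print(toy_hits)
```

A genome-scale scan would normally also score the reverse complement and pick thresholds per profile; both are omitted here to keep the sketch short.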

CONCLUSIONS AND PERSPECTIVES

Similar to previous releases, we substantially expanded the CORE collection of the JASPAR database. For this 8th release, we processed more than 18,000 ChIP-seq datasets. A large number of the resulting high-quality TF-binding profiles lacked orthogonal supporting evidence, which motivated us to create the novel ‘unvalidated’ collection of profiles. We expect that upcoming experiments and publications will provide additional supporting evidence for some of these profiles, allowing them to be incorporated into the JASPAR CORE collection. Meanwhile, we would like to extend our invitation to the research community to (i) help us curate these unvalidated profiles (e.g. by pointing us to supporting literature), and (ii) send us their own novel profiles (e.g. determined experimentally) for incorporation in the next release of JASPAR. The JASPAR CORE vertebrates collection now contains 746 profiles, 637 of which are associated with human TFs with known DNA-binding profiles (1), which corresponds to 58% of the 1,107 reported by Lambert et al. (1). While this is an impressive collective achievement by the field (the original JASPAR database only contained 81 profiles, a ∼7% coverage for human TFs), it suggests that targeted experimental efforts to find the binding preferences for the remaining TFs will be important. Although computational approaches can be used to infer missing TF-binding profiles (20,41), especially for non-model organisms, the JASPAR approach is conservative, including only profiles supported by at least two experiments in the literature. This is very important as we stand by the reliability of our data. Since its initial publication in 2004 (23), the JASPAR database has been committed to providing the research community with high-quality, manually curated, non-redundant TF-binding profiles. Lastly, although PFMs have dominated the field of gene regulation for decades, new profile representations have emerged, for example profiles with expanded alphabets to represent methylated bases (42,43), profiles modelling binding energy (44), and profiles derived from deep learning importance scores (45). Depending on how the field evolves and how popular these profiles become, we will consider them for inclusion in JASPAR in the future.