
GETPrime 2.0: gene- and transcript-specific qPCR primers for 13 species including polymorphisms.

Fabrice P A David^1,2, Jacques Rougemont^3,2, Bart Deplancke^4,5.

Abstract

GETPrime (http://bbcftools.epfl.ch/getprime) is a database with a web frontend providing gene- and transcript-specific, pre-computed qPCR primer pairs. The primers have been optimized for genome-wide specificity and for the selective amplification of one or several splice variants of most known genes. To ease selection, primers have also been ranked according to defined criteria such as genome-wide specificity (with BLAST), amplicon size, and isoform coverage. Here, we report a major upgrade (2.0) of the database: eight new species (yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis) now complement the five already included in the previous version (human, mouse, zebrafish, fly, and worm). Furthermore, the genomic reference has been updated to Ensembl v81 (while earlier versions are kept for backward compatibility) as a result of redesigning the back-end database and automating the import of the relevant sections of the Ensembl database in a species-independent fashion. This also allowed us to map known polymorphisms to the primers (on average three per primer for human), with the aim of reducing experimental error when targeting specific strains or individuals. As a further consequence, including future Ensembl releases and other species has now become a relatively straightforward task.


Substances:
DNA Primers
RNA

Year: 2016 PMID: 28053161 PMCID: PMC5210624 DOI: 10.1093/nar/gkw913

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Genome-scale experiments have accumulated massive amounts of data over recent years and have greatly contributed to our understanding of gene expression and its regulatory mechanisms. These experiments have clearly revealed the ubiquitous nature of alternative splicing and isoform dosage effects (1,2). It is therefore key to perform precise, quantitative measurements of selected genes and transcripts to assess specific expression patterns or functions. Such measurements typically rely on the quantitative real-time polymerase chain reaction (qPCR), and the value of a qPCR assay depends in large part on the quality of the primer pair selected for the targeted transcription unit (3). We have therefore undertaken the systematic design of primer pairs for every known gene and transcript of organisms with well-annotated reference genomes, with in silico verification of optimal specificity. The design follows the pipeline described in (4), which we briefly recall here. To design gene-specific (respectively transcript-specific) primer pairs, we first identify, for each gene, the exon junctions included in the largest (respectively smallest) number of isoforms; the corresponding transcript is then processed with PerlPrimer (5) to find the best primer set overlapping these junctions. Candidate primers are then filtered according to (i) genome-wide specificity (BLAST with an E-value of 100) and (ii) not spanning the 5′ or 3′ untranslated regions (UTRs), and ranked according to the number of isoforms they cover, amplicon length, and other primer quality parameters discussed previously (3,4). The top three primer pairs are retained and displayed in the database with a star-based quality flag corresponding to their rank in this list.
If no pair passes the filters, the original primer design constraints are progressively relaxed until a candidate pair emerges; this is the origin of the warnings associated with some primers (the ‘warnings’ column in Figure 1).
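The pipeline steps recalled above (junction choice, filtering, ranking, and constraint relaxation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GETPrime code: the junction encoding, the `Candidate` fields, tie-breaking, the assumption that shorter amplicons rank higher, and the integer "relaxation level" passed to a caller-supplied `design_fn` are all hypothetical. PerlPrimer and BLAST are not called; their outcomes are represented as precomputed fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical junction encoding: (last base of upstream exon, first base of downstream exon).
Junction = Tuple[int, int]


def pick_junction(isoforms: Dict[str, FrozenSet[Junction]],
                  target: Optional[str] = None) -> Junction:
    """Gene-specific design (target=None): the junction shared by the most isoforms.
    Transcript-specific design: among the target's junctions, the one shared by the fewest."""
    counts: Dict[Junction, int] = {}
    for junctions in isoforms.values():
        for j in junctions:
            counts[j] = counts.get(j, 0) + 1
    if target is None:
        return max(counts, key=lambda j: (counts[j], j))  # ties broken by coordinate
    return min(isoforms[target], key=lambda j: (counts[j], j))


@dataclass
class Candidate:
    pair_id: str
    isoforms_covered: int   # isoforms in which the amplicon occurs
    amplicon_len: int       # bp
    blast_specific: bool    # outcome of the genome-wide BLAST specificity check
    spans_utr: bool         # True if a primer falls in the 5' or 3' UTR


def rank_candidates(cands: List[Candidate], top: int = 3) -> List[Candidate]:
    """Filter on specificity and UTR, then rank: more isoforms first, then shorter amplicon."""
    kept = [c for c in cands if c.blast_specific and not c.spans_utr]
    kept.sort(key=lambda c: (-c.isoforms_covered, c.amplicon_len))
    return kept[:top]


def design_with_relaxation(design_fn: Callable[[int], List[Candidate]],
                           max_level: int = 3) -> Tuple[List[Candidate], Optional[str]]:
    """Retry with progressively looser constraints (level 0 = defaults) until a pair passes."""
    for level in range(max_level + 1):
        best = rank_candidates(design_fn(level))
        if best:
            warning = None if level == 0 else f"design constraints relaxed (level {level})"
            return best, warning
    return [], "no primer pair found"
```

The warning string returned by `design_with_relaxation` plays the role of the ‘warnings’ column; what GETPrime actually relaxes at each step is not specified here.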

Figure 1.

The GETPrime 2.0 search interface and tabular display. The figure shows several of the 30 primer pairs found for the human gene MDM1. Results can be downloaded in tab-separated format through the ‘Download’ link. The search is restricted to an organism, an Ensembl release, and a maximum number of lines (the smaller the number, the faster the query). Each result line corresponds to a single primer pair and displays its unique ID, the gene and transcript(s) it targets, its star-based rank (among the best three pairs found for the gene), the fraction of isoforms it covers, the amplicon length, the primer sequences and their respective melting temperatures, and the Ensembl annotation status of the gene (KNOWN or NOVEL). The last two columns provide, respectively, warnings raised when the primer search did not succeed with standard parameters, and a link to the primer pair details page shown in Figure 3.
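Because results are exported as tab-separated text, they can be post-filtered with a few lines of code. The sketch below assumes a header row and hypothetical column names (`isoform_fraction` as a decimal between 0 and 1, `warnings` empty when the default design succeeded); the actual export may name or format these columns differently.

```python
import csv
import io
from typing import Dict, List


def read_tsv(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated export (header row first) into one dict per primer pair."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def full_coverage_without_warnings(rows: List[Dict[str, str]],
                                   fraction_col: str = "isoform_fraction",
                                   warning_col: str = "warnings") -> List[Dict[str, str]]:
    """Keep pairs that cover every isoform and carry no design warning.
    Column names are hypothetical; adapt them to the header of the real file."""
    return [r for r in rows
            if float(r[fraction_col]) >= 1.0
            # a missing trailing field is read as None by DictReader
            and not (r.get(warning_col) or "").strip()]
```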

Figure 3.

The GETPrime 2.0 primer details page. All information about one particular primer pair is summarized on this page: gene and transcript IDs, GETPrime warnings, and detailed information about each forward and reverse primer. Particularly relevant are the SNP positions (in red), the indication of whether a primer spans an intron, and the link to the UCSC display.

Since its inception in 2011, the database has been used continuously and access statistics show a large user base. For example, the GETPrime web interface received nearly 1800 visits (by 1000 users) over the first 6 months of 2016 alone. Individual users also provided constructive feedback to further improve GETPrime, which in large part prompted the major update of the database (2.0) that is presented here.

Data integration

GETPrime 2.0 cross-references a number of data sources to document gene structures, transcript sequences, genome sequences, and annotated variants. The database now incorporates data from three versions of Ensembl (6): 50 (July 2008), 61 (February 2011), and 81 (July 2015). The older versions are kept for backward compatibility with the first release of GETPrime, and updates will be performed on a regular basis. Relevant data from Ensembl are automatically imported into our PostgreSQL database (https://www.postgresql.org). Thanks to the uniform structure of the Ensembl database across species, additional species can now be added easily: we currently host yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis alongside the previously established primer pairs for human, mouse, zebrafish, fly, and worm. Compared to version 1.0 (4), the database schema has been redesigned to speed up queries from the web user interface and to provide two new interaction modes: batch download and a programmatic interface (RESTful API).
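To make the idea of keeping several Ensembl releases side by side concrete, here is a deliberately simplified relational sketch. It is not GETPrime's schema, which the text does not describe, and it uses Python's built-in `sqlite3` instead of PostgreSQL only so that the example is self-contained; table and column names are hypothetical. The query mirrors the web search restrictions listed for Figure 1: one organism, one Ensembl release, and a maximum number of rows.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, heavily simplified schema: every row is keyed by species and
# Ensembl release, so releases 50, 61 and 81 can coexist in the same table.
SCHEMA = """
CREATE TABLE primer_pair (
    pair_id         INTEGER PRIMARY KEY,
    species         TEXT    NOT NULL,
    ensembl_release INTEGER NOT NULL,
    gene_id         TEXT    NOT NULL,
    pair_rank       INTEGER NOT NULL,
    fwd_seq         TEXT    NOT NULL,
    rev_seq         TEXT    NOT NULL
);
CREATE INDEX primer_pair_lookup
    ON primer_pair (species, ensembl_release, gene_id);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def primers_for_gene(conn: sqlite3.Connection, species: str, release: int,
                     gene_id: str, max_rows: int = 100) -> List[Tuple[int, int, str, str]]:
    """Search restricted to one organism and one Ensembl release, capped at max_rows."""
    return conn.execute(
        "SELECT pair_id, pair_rank, fwd_seq, rev_seq FROM primer_pair "
        "WHERE species = ? AND ensembl_release = ? AND gene_id = ? "
        "ORDER BY pair_rank LIMIT ?",
        (species, release, gene_id, max_rows),
    ).fetchall()
```

Keying every row by `(species, ensembl_release)` means that importing a new release adds rows rather than rewriting old ones, which is one simple way to preserve backward compatibility.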

User interface

The user interface of GETPrime 2.0 has been redesigned to make it faster, friendlier, and richer. It is based on a new three-tier Ruby on Rails (RoR) (http://rubyonrails.org) application. Among many other features, this framework improves the efficiency of database queries and simplifies the rendering of web pages. It also implements a RESTful API that allows programmers to access the data directly (see documentation at http://bbcftools.epfl.ch/getprime/api_documentation). A new search engine allows searching by gene name, Ensembl gene or transcript ID, or directly by internal primer pair ID (Figure 1). The search box accepts up to 10 identifiers per search. When only one identifier is provided and it does not match exactly, a regular expression search is performed. This search tool uses the jQuery (mostly its Ajax method) and DataTables JavaScript libraries. Ajax is used to update portions of the web page following user selections without reloading the whole page, which improves the responsiveness and flexibility of the display.

Primers are linked to a view in the UCSC genome browser (7), where they are displayed in their genomic context. In the UCSC view, primer pairs are identified by a unique numeric ID, by the gene and transcript they target, and by their rank in the list of candidates (Figure 2). This UCSC display is generated by uploading a single custom track (a BED file) for each organism and Ensembl version. Both the BED file and the full database (as tab-separated files) can be downloaded directly. Each primer pair in the UCSC view is clickable and links back to its details page on the GETPrime website. This page contains more information than in the previous version of GETPrime: for example, in addition to the genomic positions of the primer sequences, the positions and lengths of any introns they span are reported.
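The identifier search described above (up to 10 identifiers, with a regular-expression fallback when a single identifier has no exact match) can be sketched as follows. This is a hypothetical reimplementation for illustration, not the GETPrime Rails code: the in-memory `index`, the accepted separators, and case-insensitive matching are assumptions.

```python
import re
from typing import Dict, List

MAX_IDS = 10  # the web search accepts up to 10 identifiers


def search(query: str, index: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Look up identifiers in `index` (identifier -> primer-pair IDs).

    Several identifiers: exact matches only. A single identifier without an
    exact match: fall back to a case-insensitive regular-expression search."""
    ids = [t for t in re.split(r"[\s,;]+", query.strip()) if t]
    if len(ids) > MAX_IDS:
        raise ValueError(f"at most {MAX_IDS} identifiers per search")
    if len(ids) == 1 and ids[0] not in index:
        try:
            pattern = re.compile(ids[0], re.IGNORECASE)
        except re.error:  # not a valid regex: treat it as a literal substring
            pattern = re.compile(re.escape(ids[0]), re.IGNORECASE)
        return {k: v for k, v in index.items() if pattern.search(k)}
    return {i: index[i] for i in ids if i in index}
```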

Figure 2.

The UCSC view of GETPrime 2.0 primer pairs. The two primers (in black) of each pair are displayed as thick bars connected by a thin line whose arrows indicate the strand on which the pair amplifies. Primers are mapped to their genomic coordinates, including any intron(s) that a primer spans. In this example, six primer pairs are displayed: for the first three, both the forward and reverse primers span an intron, whereas for the other three only the reverse primer does. The displayed identifier has the format GETPrimeID|Ensembl-gene-ID_GETPrime-rank (e.g. 2111376|ENSG00000111554_3); the other primer pairs for MDM1 are not visible in this screenshot.
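The identifier format above and the BED-based custom track can be illustrated with a short sketch. Assumptions: each primer pair is written as one BED12 line whose blocks are the genomic pieces of both primers, so that an intron-spanning primer appears as a split bar and the amplicon interior as the thin connecting line; whether GETPrime's track uses exactly this layout, score or colour is not stated in the text, and the coordinates in the test are made up. Function names are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # 0-based, half-open genomic interval [start, end)


def getprime_name(pair_id: int, gene_id: str, rank: int) -> str:
    """Build 'GETPrimeID|Ensembl-gene-ID_GETPrime-rank'."""
    return f"{pair_id}|{gene_id}_{rank}"


def parse_getprime_name(name: str) -> Tuple[int, str, int]:
    pair_id, _, rest = name.partition("|")
    gene_id, _, rank = rest.rpartition("_")
    return int(pair_id), gene_id, int(rank)


def bed12_line(chrom: str, strand: str, name: str, blocks: List[Block]) -> str:
    """One BED12 record; thickStart/thickEnd span the whole feature, so every
    block is drawn thick and the gaps between blocks as thin stranded lines."""
    blocks = sorted(blocks)
    start, end = blocks[0][0], blocks[-1][1]
    sizes = ",".join(str(e - s) for s, e in blocks)
    starts = ",".join(str(s - start) for s, _ in blocks)  # relative to chromStart
    fields = [chrom, start, end, name, 0, strand, start, end, 0,
              len(blocks), sizes, starts]
    return "\t".join(str(f) for f in fields)
```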

Sequence polymorphisms

Our knowledge of genomic variation within species, and of how such variants drive molecular and organismal diversity, is rapidly increasing (8–12). One benefit of these advances is that variant information (when available) can now be incorporated into genomic experiments, which matters because genetic variants may be an important source of experimental variability or even failure (13,14). To reduce experimental error, GETPrime 2.0 therefore displays the known SNPs located within its primers, to aid users in the design and interpretation of their experiments. So far, SNPs are covered for human and mouse: they were imported from dbSNP v145 (15) and mapped to the primers that overlap them. The corresponding positions in the primer sequences are highlighted (Figure 3), and a link to the dbSNP evidence allows a more detailed evaluation of the nature and relevance of each polymorphism.
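A minimal sketch of mapping SNP coordinates onto a primer and marking them, assuming primers are stored as 0-based, half-open genomic blocks (two blocks when a primer spans an intron) and SNP positions have already been converted to the same 0-based coordinates. Lower-casing stands in for the red highlighting of the details page; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Block = Tuple[int, int]  # 0-based, half-open genomic interval [start, end)


def snp_offsets(blocks: List[Block], strand: str, snps: Iterable[int]) -> List[int]:
    """0-based offsets, read 5'->3' along the primer, of SNPs falling inside its blocks.
    SNPs in a spanned intron are ignored because they are not part of the primer."""
    blocks = sorted(blocks)
    length = sum(e - s for s, e in blocks)
    offsets = []
    for pos in snps:
        acc = 0  # primer bases contributed by the blocks before the current one
        for s, e in blocks:
            if s <= pos < e:
                off = acc + (pos - s)
                # a minus-strand primer reads as the reverse complement of the genome
                offsets.append(off if strand == "+" else length - 1 - off)
                break
            acc += e - s
    return sorted(offsets)


def highlight(primer_seq: str, offsets: List[int]) -> str:
    """Lower-case the SNP positions (a plain-text stand-in for red highlighting)."""
    chars = list(primer_seq.upper())
    for o in offsets:
        chars[o] = chars[o].lower()
    return "".join(chars)
```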

Database content

The GETPrime 2.0 database currently contains a total of 1 175 874 primer pairs (444 256 in human and 268 855 in mouse), corresponding to an average of six pairs per covered gene across the 13 species, with more than 20 pairs per gene in human and about 12 in mouse. On average, 92% of Ensembl protein-coding genes are covered by the database; the remaining genes correspond to non-unique sequences for which specific primers could not be designed (Table 1). For human and mouse, coverage exceeds 98%. However, some species are still only partially covered because their Ensembl annotation differs from that of human. In particular, for A. carolinensis and macaque, only a fraction of the annotated genes were processed by the pipeline (Table 1). Moreover, the incomplete state of the macaque assembly led to a high failure rate of the pipeline, probably owing to the repetitive nature of unassembled contigs (Table 1). We plan to resolve both issues in a future release. Regarding polymorphisms, a total of 2 864 885 variants were mapped to human primers (492 968 to mouse primers); more than 80% of human primers overlap a documented variant, with an average of about three SNPs per primer. This illustrates the importance of considering this information when designing or using primers.

Table 1.

Global statistics of GETPrime 2.0 for each of the 13 included species

Species	Number of genes in Ensembl v81	Number of genes covered (% of total genes)	Number of primer pairs	Number of variants
Anolis carolinensis	19	19 (100%)	57	n/a
Caenorhabditis elegans	20 447	20 412 (99.8%)	104 810	n/a
Danio rerio	22 337	21 805 (97.6%)	121 576	n/a
Drosophila melanogaster	13 918	13 911 (99.9%)	99 032	n/a
Gallus gallus	5222	5204 (99.6%)	18 791	n/a
Homo sapiens	22 017	21 653 (98.3%)	444 256	2 864 885
Macaca mulatta	8693	1154 (13.2%)	5345	n/a
Mus musculus	22 155	21 835 (98.6%)	268 855	492 968
Ornithorhynchus anatinus	170	149 (87.6%)	606	n/a
Pan troglodytes	140	140 (100%)	474	n/a
Rattus norvegicus	21 470	20 841 (97.0%)	88 311	n/a
Saccharomyces cerevisiae	6692	6620 (98.9%)	19 923	n/a
Tetraodon nigroviridis	1130	1125 (99.6%)	3838	n/a
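The summary figures quoted in the text can be re-derived from Table 1. The sketch below assumes that the "average of six pairs per covered gene" and the 92% coverage are unweighted means over the 13 species; that reading reproduces both numbers, whereas pooling all covered genes gives roughly 8.7 pairs per gene. The per-primer variant count simply divides the human variant total by twice the number of human pairs, so it cannot reproduce the "more than 80% of primers" figure, which would require per-primer data.

```python
# Table 1 rows: species -> (genes in Ensembl v81, genes covered, primer pairs)
TABLE1 = {
    "Anolis carolinensis": (19, 19, 57),
    "Caenorhabditis elegans": (20_447, 20_412, 104_810),
    "Danio rerio": (22_337, 21_805, 121_576),
    "Drosophila melanogaster": (13_918, 13_911, 99_032),
    "Gallus gallus": (5_222, 5_204, 18_791),
    "Homo sapiens": (22_017, 21_653, 444_256),
    "Macaca mulatta": (8_693, 1_154, 5_345),
    "Mus musculus": (22_155, 21_835, 268_855),
    "Ornithorhynchus anatinus": (170, 149, 606),
    "Pan troglodytes": (140, 140, 474),
    "Rattus norvegicus": (21_470, 20_841, 88_311),
    "Saccharomyces cerevisiae": (6_692, 6_620, 19_923),
    "Tetraodon nigroviridis": (1_130, 1_125, 3_838),
}
HUMAN_VARIANTS = 2_864_885  # variants mapped to human primers

n = len(TABLE1)
total_pairs = sum(p for _, _, p in TABLE1.values())  # 1 175 874, matching the text
total_covered = sum(c for _, c, _ in TABLE1.values())
# Unweighted mean over species of (pairs / covered genes): about 6.2
mean_pairs_per_gene = sum(p / c for _, c, p in TABLE1.values()) / n
# Pooled ratio over all covered genes: about 8.7
pooled_pairs_per_gene = total_pairs / total_covered
# Unweighted mean coverage over species: about 91.6%
mean_coverage_pct = 100 * sum(c / g for g, c, _ in TABLE1.values()) / n
# Each pair has two primers: about 3.2 variants per human primer
human_variants_per_primer = HUMAN_VARIANTS / (2 * TABLE1["Homo sapiens"][2])
```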

CONCLUSION AND PERSPECTIVE

The steady access statistics of the GETPrime database indicate that the primer information it contains is useful, and GETPrime 2.0 responds to the user feedback we have received, namely requests to update the genomic data, extend the database to new species, and cross-reference new types of genomic data (polymorphisms). Our plan for the future is to maintain the availability of the database, keep it up to date, and add new species when possible. In addition, we intend GETPrime to closely follow and reflect the growth of genomic data resources at Ensembl and elsewhere. Another important aspect would be broader experimental validation of our in silico-designed primers. One way to achieve this is through user feedback: we intend to implement a system that allows users to flag primers that have been used successfully (or even unsuccessfully) in experiments, including links to the respective papers.

REFERENCES (15 in total)

1. dbSNP: the NCBI database of genetic variation.

Authors: S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. How to do successful gene expression analysis using real-time PCR.

Authors: Stefaan Derveaux; Jo Vandesompele; Jan Hellemans
Journal: Methods Date: 2009-12-05 Impact factor: 3.608

3. The role of regulatory variation in complex traits and disease. (Review)

Authors: Frank W Albert; Leonid Kruglyak
Journal: Nat Rev Genet Date: 2015-02-24 Impact factor: 53.242

4. Alternative splicing: a pivotal step between eukaryotic transcription and translation. (Review)

Authors: Alberto R Kornblihtt; Ignacio E Schor; Mariano Alló; Gwendal Dujardin; Ezequiel Petrillo; Manuel J Muñoz
Journal: Nat Rev Mol Cell Biol Date: 2013-02-06 Impact factor: 94.444

5. Mouse genomic variation and its effect on phenotypes and gene regulation.

Authors: Thomas M Keane; Leo Goodstadt; Petr Danecek; Michael A White; Kim Wong; Binnaz Yalcin; Andreas Heger; Avigail Agam; Guy Slater; Martin Goodson; Nicholas A Furlotte; Eleazar Eskin; Christoffer Nellåker; Helen Whitley; James Cleak; Deborah Janowitz; Polinka Hernandez-Pliego; Andrew Edwards; T Grant Belgard; Peter L Oliver; Rebecca E McIntyre; Amarjit Bhomra; Jérôme Nicod; Xiangchao Gan; Wei Yuan; Louise van der Weyden; Charles A Steward; Sendu Bala; Jim Stalker; Richard Mott; Richard Durbin; Ian J Jackson; Anne Czechanski; José Afonso Guerra-Assunção; Leah Rae Donahue; Laura G Reinholdt; Bret A Payseur; Chris P Ponting; Ewan Birney; Jonathan Flint; David J Adams
Journal: Nature Date: 2011-09-14 Impact factor: 49.962

6. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

7. Ensembl 2016.

Authors: Andrew Yates; Wasiu Akanni; M Ridwan Amode; Daniel Barrell; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Nathan Johnson; Thomas Juettemann; Stephen Keenan; Ilias Lavidas; Fergal J Martin; Thomas Maurel; William McLaren; Daniel N Murphy; Rishi Nag; Michael Nuhn; Anne Parker; Mateus Patricio; Miguel Pignatelli; Matthew Rahtz; Harpreet Singh Riat; Daniel Sheppard; Kieron Taylor; Anja Thormann; Alessandro Vullo; Steven P Wilder; Amonida Zadissa; Ewan Birney; Jennifer Harrow; Matthieu Muffato; Emily Perry; Magali Ruffier; Giulietta Spudich; Stephen J Trevanion; Fiona Cunningham; Bronwen L Aken; Daniel R Zerbino; Paul Flicek
Journal: Nucleic Acids Res Date: 2015-12-19 Impact factor: 16.971

8. Evaluation of the impact of single nucleotide polymorphisms and primer mismatches on quantitative PCR.

Authors: Brian Boyle; Nancy Dallaire; John MacKay
Journal: BMC Biotechnol Date: 2009-08-28 Impact factor: 2.563

9. Sequence polymorphism can produce serious artefacts in real-time PCR assays: hard lessons from Pacific oysters.

Authors: Nicolas Taris; Robert P Lang; Mark D Camara
Journal: BMC Genomics Date: 2008-05-20 Impact factor: 3.969

10. Extensive transcriptional heterogeneity revealed by isoform profiling.

Authors: Vicent Pelechano; Wu Wei; Lars M Steinmetz
Journal: Nature Date: 2013-04-24 Impact factor: 49.962
