Wolfgang F Bluhm, Bojan Beran, Chunxiao Bi, Dimitris Dimitropoulos, Andreas Prlic, Gregory B Quinn, Peter W Rose, Chaitali Shah, Jasmine Young, Benjamin Yukich, Helen M Berman, Philip E Bourne.
Abstract
The RCSB Protein Data Bank (RCSB PDB, www.pdb.org) is a key online resource for structural biology and related scientific disciplines. The website is used on average by 165,000 unique visitors per month, and more than 2000 other websites link to it. The amount and complexity of PDB data, as well as the expectations for its use, are growing rapidly. Therefore, ensuring the reliability and robustness of the RCSB PDB query and distribution systems is crucially important and increasingly challenging. This article describes quality assurance for the RCSB PDB website at several distinct levels, including: (i) hardware redundancy and failover, (ii) testing protocols for weekly database updates, (iii) testing and release procedures for major software updates and (iv) miscellaneous monitoring and troubleshooting tools and practices. As such, it provides suggestions for how other websites might be operated.
Year: 2011 PMID: 21382834 PMCID: PMC3056270 DOI: 10.1093/database/bar003
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Schematic representation of hardware redundancy and DNS failover. There are four clusters in three separate geographical locations: San Diego Supercomputer Center (SDSC) and Skaggs School of Pharmacy and Pharmaceutical Sciences (SSPPS), both at the University of California, San Diego (UCSD), and Rutgers, the State University of New Jersey. Each cluster contains multiple load-balanced Web and FTP servers. A third-party DNS provider is used to manage the DNS entries for the website (www.pdb.org) and the FTP site (ftp.wwpdb.org), including failover in case the primary cluster fails.
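The failover idea in Figure 1 can be illustrated with a minimal sketch. The caption does not describe how the DNS provider decides where to point the entry, so the priority-ordered selection below is an assumption, as are the cluster labels, their ordering, and the `is_healthy` callback.

```python
def pick_cluster(clusters_by_priority, is_healthy):
    """Return the first healthy cluster in priority order, or None.

    This mimics priority-based DNS failover in the simplest possible way;
    it is not the DNS provider's actual algorithm.
    """
    for name in clusters_by_priority:
        if is_healthy(name):
            return name
    return None


# Hypothetical labels and ordering for the four clusters in three locations.
CLUSTERS = ["SDSC 1", "SDSC 2", "SSPPS", "Rutgers"]

# Simulate an outage of the primary cluster.
down = {"SDSC 1"}
target = pick_cluster(CLUSTERS, lambda name: name not in down)
print(target)
```

In this simulated outage the selection falls through to the next cluster in the list; if every cluster is down, the function returns `None` and the caller has to decide what to do.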
Figure 2. Staggered weekly update schedule of the Web and FTP servers. The overall aim is to balance the need for advance staging of the update (red and orange) with as much failover to current data (green) as possible at any given time. The update cycle begins on Friday with the second SDSC cluster (SDSC 2). Two more clusters are updated on Monday and Tuesday. On Wednesday at 00:00 UTC, the update is made public by switching the DNS entry between the two SDSC clusters (thick outlines). A few hours are allowed for the DNS change to propagate before the update of the final, now out-of-date cluster (blue) is started. Green with thick outlines shows “live” clusters serving data to the public. Other clusters in green have the same current content. Clusters in red are being updated. Orange denotes a cluster with a finished update that contains “staged” data not yet available for public release. Blue shows a cluster whose data are out of date compared with the live public site.
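The colour coding in Figure 2 amounts to a small state model, and the Wednesday DNS switch can be sketched as a transition over it. The state names follow the caption; the `publish` function, the label "SDSC 1" for the first SDSC cluster, and the assumption that all three other clusters are staged before the switch are illustrative choices, not details taken from the caption.

```python
from enum import Enum


class State(Enum):
    LIVE = "green, thick outline"   # serving data to the public
    CURRENT = "green"               # same content as the live cluster
    UPDATING = "red"                # update in progress
    STAGED = "orange"               # update finished, not yet public
    STALE = "blue"                  # out of date relative to the live site


def publish(states, new_live):
    """Model the DNS switch: the staged cluster `new_live` becomes live."""
    if states[new_live] is not State.STAGED:
        raise ValueError(f"{new_live} has no staged update to publish")
    after = {}
    for name, state in states.items():
        if name == new_live:
            after[name] = State.LIVE
        elif state is State.STAGED:
            after[name] = State.CURRENT   # now matches the public data
        elif state in (State.LIVE, State.CURRENT):
            after[name] = State.STALE     # still holds last week's data
        else:
            after[name] = state
    return after


# Assumed situation just before Wednesday 00:00 UTC.
before = {
    "SDSC 1": State.LIVE,
    "SDSC 2": State.STAGED,
    "SSPPS": State.STAGED,
    "Rutgers": State.STAGED,
}
after = publish(before, "SDSC 2")
for name, state in after.items():
    print(name, state.name)
```

After the switch, only the previously live cluster is stale, which matches the caption's description of a single out-of-date cluster being updated last.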
Figure 3. Screenshot of a fully executed Selenium IDE test suite in a Firefox browser. We have developed an extensive suite of testing scripts with several goals, such as checking key elements of the user interface, verifying the integrity of the weekly update, and comparing results between multiple servers. The top-left panel shows the list of test scripts. The top-middle panel shows, as an example, the script for verifying that keyword searches are up to date. It selects the first entry ID from the weekly release, extracts a keyword from its title, and then performs a search for this keyword and asserts that the search results include the given entry ID. The top-right panel contains the controls for executing tests and shows the test results. The bottom panel is a regular browser window that shows the web pages being loaded by the scripts.
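The logic of the keyword-search check in Figure 3 can be sketched outside Selenium. The caption does not say how the keyword is extracted from the title, so the longest-word heuristic below is an assumption; the function names, the entry IDs, and the in-memory `fake_search` stand-in for the website's search are likewise hypothetical.

```python
import re


def pick_keyword(title):
    """Return the longest alphabetic word in the title (assumed heuristic)."""
    words = re.findall(r"[A-Za-z]+", title)
    if not words:
        raise ValueError("title contains no usable keyword")
    return max(words, key=len)


def keyword_search_is_current(entry_id, title, search):
    """Check that searching for a title keyword returns the new entry.

    `search` is any callable mapping a keyword to a collection of entry IDs;
    in a real test it would drive the website's search instead.
    """
    keyword = pick_keyword(title)
    return entry_id in search(keyword)


def fake_search(keyword):
    """Hypothetical stand-in for the site search, backed by a tiny index."""
    index = {"lysozyme": ["1ABC", "2XYZ"]}
    return index.get(keyword.lower(), [])


# Made-up first entry of a weekly release.
ok = keyword_search_is_current("1ABC", "Crystal form of lysozyme", fake_search)
print(ok)
```

Passing the search as a callable keeps the assertion logic separate from the browser automation, so the same check could be pointed at several servers to compare their results.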
Code maintenance and troubleshooting tools most commonly used by RCSB PDB website developers
| Software tool | URL | Description |
|---|---|---|
| Code maintenance tools | | |
| JUnit | junit.org | Java unit testing framework |
| TestNG | testng.org | ‘Next Generation’ Java testing framework |
| FindBugs | findbugs.sourceforge.net | Program for finding bugs in Java code |
| FireBug | getfirebug.com | Firefox extension for web development |
| UCDetector | ucdetector.org | Eclipse plugin for finding unnecessary Java code |
| Troubleshooting tools | | |
| JConsole | download.oracle.com/javase/6/docs/technotes/tools/share/jconsole.html | Java monitoring and management console |
| LambdaProbe | lambdaprobe.org | Tomcat monitoring and management tool |
| jstack | download.oracle.com/javase/6/docs/technotes/tools/share/jstack.html | Java utility for monitoring stack traces |
| jmap | download.oracle.com/javase/6/docs/technotes/tools/share/jmap.html | Java utility for monitoring memory utilization |
| hprof | java.sun.com/developer/technicalArticles/Programming/HPROF.html | Java utility for heap and CPU monitoring |
Figure 4. Screenshot of the Cacti monitoring tool (www.cacti.net). Parameters such as internal and external traffic, CPU and memory usage, thread counts and Tomcat session counts are collected every few minutes and presented in graphical form for each server or cluster. The time window starts just after the cluster had been reinstalled with a new quarterly software release and shows the server load during successive Selenium tests on each server. Server names are redacted for security reasons.