
Analysis of the conformations of the HIV-1 protease from a large crystallographic data set.

Abstract

The HIV-1 protease performs essential roles in viral maturation by processing specific cleavage sites in the Gag and Gag-Pol precursor polyproteins to release their mature forms. Here the analysis of a large HIV-1 protease data set (containing 552 dimer structures) is reported. These data are related to the article entitled "Conformations of the HIV-1 protease: a crystal structure data set analysis" (Palese, 2017) [1].


Year: 2017 PMID: 29124093 PMCID: PMC5671413 DOI: 10.1016/j.dib.2017.09.076

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Value of the data

The described data set includes a very large number of the publicly available structures of the HIV-1 protease. The database can be useful in drug design and analysis studies. The evidence that different sequences adopt preferential conformations could provide an interesting benchmark for the computational prediction and fine tuning of protein structures.

Data

Data sets

The large HIV-1 protease data set used in the analysis is reported in csv format (file name HIV-1_dataset.csv). Data in this file are arranged in columns (headers in the first row): the first column reports the PDB id of each entry; the second column reports the internal sequence id; the last two columns report the calculated first and second principal component projections, respectively (calculated by the truncated SVD method [1]).

The high quality structures are listed in the file HIV-1_HQ_dataset.csv. This file reports the PDB id and the available quality data (R observed, R all, R work, R free, refinement resolution, and the R difference); the last column reports the sequence cluster id.

The full set of fluctuations (see [1]) is reported in the file fluctuations.csv. Each row in this file represents an eigenvector (297 eigenvectors describe the monomer), and each amino acid is reported as a column (99 amino acids compose the monomer).

The first and second principal modes calculated for the monomer data set are reported as animated GIF images (see [1] for details). Some relevant modes are reported as nmd files [1], [2], [3]. Supplementary material related to this article (Video 1, Video 2) can be found online at doi:10.1016/j.dib.2017.09.076.
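As an illustration of the column layout described above, the following minimal Python sketch reads the PDB id, the sequence id and the two projections by position rather than by header name, since the header names are not given here. The function names `read_projections` and `color_for`, and the two sample rows, are hypothetical; the coloring rule follows the caption of Fig. 2, and the treatment of a value of exactly zero is an assumption.

```python
import csv
import io


def read_projections(stream):
    """Return (pdb_id, sequence_id, pc1, pc2) tuples from a CSV text stream.

    Columns are taken by position: first, second, and the last two.
    """
    reader = csv.reader(stream)
    next(reader)  # the first row holds the headers
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        records.append((row[0], row[1], float(row[-2]), float(row[-1])))
    return records


def color_for(pc2):
    """Blue for a negative second projection, red for a positive one.

    The case pc2 == 0 is not covered by the caption; it is mapped to
    red here as an assumption.
    """
    return "blue" if pc2 < 0 else "red"


# Hypothetical sample: header names, ids and values are made up.
sample = io.StringIO(
    "pdb_id,seq_id,pc1,pc2\n"
    "XXXX,1,0.5,-1.25\n"
    "YYYY,2,-0.75,2.0\n"
)
records = read_projections(sample)
colors = [color_for(pc2) for (_, _, _, pc2) in records]

# With the real file, the same function could be used as:
#   with open("HIV-1_dataset.csv", newline="") as handle:
#       records = read_projections(handle)
```

Reading by position keeps the sketch independent of the exact header spelling, at the cost of relying on the column order stated in the text.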

Video 1

Mode-1

Video 2

Mode-2

Some results of the analysis reported in [1] on the data set described above are shown in Fig. 1, Fig. 2, Fig. 3, Fig. 4. The reader may refer to [1] for full details.

Fig. 1

The PCA of the monomer structures calculated by the covariance matrix method.

Fig. 2

PCA projection of the dimer data set. The entries are colored in blue if their second PC was negative, in red if positive.

Fig. 3

Random projection of the dimer data set. Color code for each entry is the same as in Fig. 2.

Fig. 4

PCA of the HQ dimer data set (truncated SVD method).
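The captions above mention PCA computed by the covariance matrix method, PCA computed by truncated SVD, and a random projection. The NumPy sketch below shows one common way to write each of them for a data matrix with one sample per row. It is not the code used in [1]: mean-centering before the SVD, the Gaussian random matrix, and all function names are assumptions, and the input is synthetic.

```python
import numpy as np


def pca_covariance(X, k=2):
    """Project X on the top-k eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)                      # center each column
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return Xc @ eigvecs[:, order]


def pca_svd(X, k=2):
    """Project X on its top-k right singular vectors (SVD truncated to k)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]                      # equals Xc @ Vt[:k].T


def random_projection(X, k=2, seed=0):
    """Project X on k random Gaussian directions (assumed variant)."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return (X - X.mean(axis=0)) @ R


# Synthetic data: 20 samples, 6 degrees of freedom with distinct variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])

proj_cov = pca_covariance(X)
proj_svd = pca_svd(X)
proj_rnd = random_projection(X)
```

On centered data the two PCA routes give the same projections up to the sign of each component, which is why comparisons between them are best made on absolute values or after fixing a sign convention.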


Relevant sequence clusters in the data set

Some of the sequence clusters of the HIV-1 protease data set discussed in [1] are reported in Fig. 5; differences with respect to the Consensus B sequence (Stanford HIV database) [2], [3], [4], [5] are shown in red.

Fig. 5

Some sequence groups of the HIV-1 protease data set (see [1]).

Experimental design, materials and methods

The structures sharing 90% identity with the Consensus B sequence (Stanford HIV database) [4], [5], [6], [7] were initially considered. The X-ray structures of the HIV-1 protease were obtained from the PDB [8], [9], [10]. A total of 581 structures in the PDB met this criterion. Among these, the structures obtained by X-ray, in dimeric form, classified with the E.C. number 3.4.23.16 (HIV-1 retropepsin), and with a refinement resolution of 3.1 Å or better were further selected. The number of alpha-carbon atoms in the downloaded pdb files was checked with the bash grep command after deleting the multiple conformations with the bash sed command. A few structures required a further manual editing step. Finally, 552 HIV-1 protease structures, as dimers, were included in the data set.

The structures contained in the data set were aligned to a common reference by Tcl (www.tcl.tk) scripting in VMD [3]. The new atomic coordinates were stored in a pdb file. For the analysis, the Cartesian coordinates of the alpha-carbon atoms of the superposed structures were extracted and arranged in matrix form by a Tcl script in VMD. Brackets in the resulting text file were removed in vi (www.vim.org). As a result, the coarse-grained conformations were arranged in a matrix in which each row represents a sample and each column a degree of freedom. This data matrix was analyzed by the methods described in [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], as reported in [1].
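The alpha-carbon selection and the arrangement of coordinates into one row per structure can be sketched in Python as below. This is a stand-in for the grep, sed and Tcl steps described above, not a reproduction of them: the rule of keeping only alternate-location codes blank or "A" is an assumption, the helper names are hypothetical, and the four-atom sample is a toy fragment rather than a real entry. Column positions follow the fixed-width PDB ATOM record.

```python
def atom_line(serial, name, alt, res, chain, resseq, x, y, z):
    """Build a fixed-width PDB ATOM record (helper for the toy sample)."""
    return "ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, alt, res, chain, resseq, x, y, z)


def ca_coordinates(pdb_text):
    """Return the flattened x, y, z coordinates of the alpha-carbon atoms.

    Only ATOM records named CA are kept; of multiple conformations, only
    those with a blank or 'A' alternate-location code are retained
    (an assumed rule).
    """
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":      # atom name, columns 13-16
            continue
        if line[16] not in (" ", "A"):       # altLoc, column 17
            continue
        # x, y, z occupy columns 31-38, 39-46 and 47-54
        coords.extend(float(line[i:i + 8]) for i in (30, 38, 46))
    return coords


def data_matrix(structures):
    """Stack one coordinate row per structure; all rows must match in length."""
    rows = [ca_coordinates(text) for text in structures]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("structures differ in number of alpha-carbon atoms")
    return rows


# Toy two-residue fragment with one alternate conformation (made-up values).
sample = "\n".join([
    atom_line(1, " N  ", " ", "PRO", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, " CA ", "A", "PRO", "A", 1, 1.0, 2.0, 3.0),
    atom_line(3, " CA ", "B", "PRO", "A", 1, 1.1, 2.1, 3.1),
    atom_line(4, " CA ", " ", "GLN", "A", 2, 4.0, 5.0, 6.0),
])
row = ca_coordinates(sample)
matrix = data_matrix([sample, sample])
```

With three Cartesian coordinates per alpha carbon, a 99-residue monomer gives 297 columns, which matches the 297 eigenvectors reported for the monomer in fluctuations.csv.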

Specifications Table

Subject area	Chemistry, Biology.
More specific subject area	Biochemistry, HIV-1 protease structure.
Type of data	Table (csv files), text file, figure, animated figures.
How data was acquired	Input data for analysis were obtained as pdb files from public database.
Data format	Raw: pdb files (as text files). Analyzed: table (csv files), text file, graph, animated GIF.
Experimental factors	Raw pdb files were checked for quality.
Experimental features	The pdb files included in the database were analyzed by different computational protocols.
Data source location	Not applicable.
Data accessibility	Analyzed data are within this article.

14 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01

2. Human immunodeficiency virus type 1 reverse transcriptase and protease mutation search engine for queries.

Authors: R W Shafer; D R Jung; B J Betts
Journal: Nat Med Date: 2000-11

3. Announcing the worldwide Protein Data Bank.

Authors: Helen Berman; Kim Henrick; Haruki Nakamura
Journal: Nat Struct Biol Date: 2003-12

4. Correlation Analysis of Trp-Cage Dynamics in Folded and Unfolded States.

Authors: Luigi L Palese
Journal: J Phys Chem B Date: 2015-12-14

5. Amyloid beta(1-42) in aqueous environments: effects of ionic strength and E22Q (Dutch) mutation.

Authors: Fabrizio Bossis; Luigi L Palese
Journal: Biochim Biophys Acta Date: 2013-09-06

6. VMD: visual molecular dynamics.

Authors: W Humphrey; A Dalke; K Schulten
Journal: J Mol Graph Date: 1996-02

7. Conformations of the HIV-1 protease: A crystal structure data set analysis.

Authors: Luigi Leonardo Palese
Journal: Biochim Biophys Acta Proteins Proteom Date: 2017-08-26

8. Rationale and uses of a public HIV drug-resistance database.

Authors: Robert W Shafer
Journal: J Infect Dis Date: 2006-09-15

9. Human immunodeficiency virus reverse transcriptase and protease sequence database.

Authors: Soo-Yon Rhee; Matthew J Gonzales; Rami Kantor; Bradley J Betts; Jaideep Ravela; Robert W Shafer
Journal: Nucleic Acids Res Date: 2003-01-01

10. ProDy: protein dynamics inferred from theory and experiments.

Authors: Ahmet Bakan; Lidio M Meireles; Ivet Bahar
Journal: Bioinformatics Date: 2011-04-05
