Marco Pinto Corujo¹, Adewale Olamoyesan², Anastasiia Tukova², Dale Ang², Erik Goormaghtigh³, Jason Peterson⁴, Victor Sharov⁴, Nikola Chmel¹, Alison Rodger¹,².
Abstract
A protein's structure is the key to its function. As protein structure can vary with environment, it is important to be able to determine it over a wide range of concentrations, temperatures, formulation vehicles, and states. Robust, reproducible, validated methods are required for applications including batch-to-batch comparisons of biopharmaceutical products. Circular dichroism (CD) is widely used for this purpose, but an alternative is required for concentrations above 10 mg/mL or for solutions with chiral buffer components that absorb far-UV light. Infrared (IR) absorbance spectra of the protein Amide I region (1,600–1,700 cm⁻¹) contain information about secondary structure; they require higher concentrations than CD and often have spectral windows complementary to those of CD. In this paper, we consider a number of approaches to extract structural information from a protein infrared spectrum and determine their reliability for regulatory and research purposes. In particular, we compare direct and second-derivative band-fitting with a self-organising map (SOM) approach applied to a number of different reference sets. The SOM approach proved significantly more accurate than the band-fitting approaches for solution spectra. As there is no validated benchmark method available for infrared structure fitting, SOMSpec, the SOM software used here, was implemented in a leave-one-out validation (LOOV) approach for solid-state transmission and thin-film attenuated total reflectance (ATR) reference sets. We then tested SOMSpec with the thin-film ATR reference set against 68 solution spectra and found that the average prediction error for helix (α + 3₁₀) and β-sheet was less than 6% for proteins with less than 40% helix. This is quantitatively better than other available approaches. The visual output format of SOMSpec aids identification of poor predictions. We also demonstrated how to convert aqueous ATR spectra to and from transmission spectra for structure fitting.
Fourier self-deconvolution did not improve the average structure predictions.
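The SOM approach with leave-one-out validation can be illustrated with a toy sketch. The code below is not SOMSpec. It is a minimal, pure-Python self-organising map built on several assumptions: each training vector is an Amide I spectrum with its helix, β-sheet, and other fractions appended; the whole vector is used to organise the map; and a left-out protein is predicted by reading the fractions from the node whose spectral part best matches its spectrum. The grid size, the learning-rate and neighbourhood schedules, the function names (`train_som`, `predict`, `leave_one_out`, `toy_spectrum`), and the Gaussian "spectra" are all hypothetical and synthetic.

```python
import math
import random

# Toy SOM + leave-one-out validation (LOOV) sketch; NOT the SOMSpec code.
# Each training vector = [spectrum..., helix, sheet, other] (assumed layout).


def best_matching_node(nodes, x, n):
    """Grid position whose first n weights are closest (squared Euclidean) to x[:n]."""
    return min(nodes, key=lambda k: sum((nodes[k][i] - x[i]) ** 2 for i in range(n)))


def train_som(data, rows=5, cols=5, epochs=60, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM on concatenated spectrum + fraction vectors."""
    rng = random.Random(seed)
    # Initialise each node as a copy of a randomly chosen reference vector.
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_matching_node(nodes, x, len(x))  # whole vector in training
            for (r, c), w in nodes.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = lr * math.exp(-d2 / (2.0 * sigma ** 2))  # 0 <= h <= lr0
                for i in range(len(w)):
                    w[i] += h * (x[i] - w[i])    # convex step towards x
    return nodes


def predict(nodes, spectrum):
    """Return (predicted spectrum, predicted fractions) from the best spectral match."""
    n = len(spectrum)
    w = nodes[best_matching_node(nodes, spectrum, n)]
    return w[:n], w[n:]


def leave_one_out(spectra, fractions, **som_kwargs):
    """Predict each protein with a SOM trained on all the others; return deviations."""
    deviations = []
    for k in range(len(spectra)):
        train = [s + f for j, (s, f) in enumerate(zip(spectra, fractions)) if j != k]
        nodes = train_som(train, **som_kwargs)
        _, predicted = predict(nodes, spectra[k])
        deviations.append([p - t for p, t in zip(predicted, fractions[k])])
    return deviations


def toy_spectrum(helix, sheet, other):
    """Synthetic Amide I band: one Gaussian per class (centres are illustrative)."""
    centres = (1655.0, 1630.0, 1645.0)
    return [sum(a * math.exp(-((nu - c) / 12.0) ** 2)
                for a, c in zip((helix, sheet, other), centres))
            for nu in range(1600, 1701, 10)]


if __name__ == "__main__":
    truth = [[0.7, 0.0, 0.3], [0.5, 0.1, 0.4], [0.3, 0.2, 0.5], [0.2, 0.4, 0.4],
             [0.1, 0.5, 0.4], [0.0, 0.6, 0.4], [0.4, 0.3, 0.3], [0.6, 0.1, 0.3]]
    spectra = [toy_spectrum(*f) for f in truth]
    devs = leave_one_out(spectra, truth)
    print("mean |helix deviation|:", sum(abs(d[0]) for d in devs) / len(devs))
```

With `lr0` at most 1, every update is a convex combination of a node and a training vector, so predicted fractions stay within the range spanned by the reference fractions and, if each reference set sums to 1, so do the predictions. That is why the Other deviation equals minus the sum of the helix and β-sheet deviations in this sketch.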
Keywords: infrared absorbance; protein; secondary structure; self-organising map; validation
Year: 2022 PMID: 35155377 PMCID: PMC8830495 DOI: 10.3389/fchem.2021.784625
Source DB: PubMed Journal: Front Chem ISSN: 2296-2646 Impact factor: 5.221
FIGURE 1 (A) A 50-protein thin-film ATR reference set (see Supplementary Table S1 for the list of proteins). Inset: Amide I maxima plotted versus total α-helix (red) and β-sheet (blue) content. Proteins F1–F7 (>60% helix) are purple; F8–F14 (45–59% helix) are blue; F15–F21 (34–44% helix) are turquoise; F22–F28 (26–33% helix) are green; F29–F33, F36, and F38 (17–25% helix) are yellow; F34, F35, F37, F39, and F40 (10–16% helix) are orange; F41–F50 (<10% helix) are red, with the unfolded F50 dotted. (B) Overlay of selected normalised ATR thin-film (solid lines) and aqueous transmission (dashed lines) spectra. (C) LOOV deviations of secondary structure (SS) predictions from PDB structures for helix (α-helix + 3₁₀-helix) and β-sheet for the Amide I 50-protein thin-film reference set, in order of decreasing helix content from left to right. The normalised root-mean-square deviation (NRMSD) of the spectral fit, multiplied by 5, is overlaid. Other category deviations are minus the sum of the helix and β-sheet deviations. (D) SOMSpec LOOV output for phosphoglycerate kinase (F17) from the 50-protein film reference set, a relatively poor-quality example. In the map, U1, U2, and U3 are the best matching nodes for the test protein; these can be expressed as linear combinations of their neighbouring reference set nodes. The proteins can be identified from Supplementary Table S1: because the test protein F17 is left out of the reference set, proteins R1–R16 correspond to F1–F16, and R17–R49 correspond to F18–F50. The real spectrum is the input data for F17, and the predicted spectrum is the SOMSpec output.
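Panel (C) overlays 5 × NRMSD of the spectral fit and reports Other deviations as minus the sum of the helix and β-sheet deviations. The sketch below computes both quantities. It assumes the NRMSD definition commonly used in CD analysis, sqrt(Σ(real − predicted)² / Σ real²), since the exact normalisation is not stated here; the function names and dictionary keys are hypothetical.

```python
import math


def nrmsd(real, predicted):
    """Normalised root-mean-square deviation between spectra on the same grid.

    Assumed definition: sqrt(sum((real - predicted)**2) / sum(real**2)).
    """
    if len(real) != len(predicted):
        raise ValueError("spectra must share the same wavenumber grid")
    num = sum((r - p) ** 2 for r, p in zip(real, predicted))
    den = sum(r ** 2 for r in real)
    return math.sqrt(num / den)


def category_deviations(predicted, pdb):
    """Prediction minus PDB for helix and sheet; Other is minus their sum,
    which holds when both sets of fractions add up to 1."""
    d_helix = predicted["helix"] - pdb["helix"]
    d_sheet = predicted["sheet"] - pdb["sheet"]
    return {"helix": d_helix, "sheet": d_sheet, "other": -(d_helix + d_sheet)}


# Illustrative numbers only (not data from the paper).
real = [0.2, 0.6, 1.0, 0.7, 0.3]
fit = [0.25, 0.55, 0.95, 0.72, 0.30]
print("5 x NRMSD:", 5 * nrmsd(real, fit))
print(category_deviations({"helix": 0.40, "sheet": 0.25},
                          {"helix": 0.35, "sheet": 0.30}))
```

Scaling NRMSD by 5 only brings it onto the same axis as the fractional deviations; it does not change the ranking of good and poor fits.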
FIGURE 2 Deviations of predictions from PDB structures for average helix (α-helix + 3₁₀-helix) and β-sheet content for the Amide I band of 68 aqueous test proteins, presented in order of decreasing helix content from left to right. Other category deviations are minus the sum of the helix and β-sheet deviations.
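The abstract summarises these deviations as an average prediction error below 6% for proteins with less than 40% helix. Assuming "average prediction error" means the mean absolute deviation per category (an interpretation, not a definition taken from the paper), the calculation can be sketched with the hypothetical helper below; the records are made-up numbers, not data from Figure 2.

```python
from statistics import mean


def mean_abs_error(records, max_helix=None):
    """Mean absolute helix and sheet deviations over a set of test proteins.

    Each record is a dict with hypothetical keys: 'pdb_helix' (PDB helix
    fraction), 'd_helix' and 'd_sheet' (prediction minus PDB). If max_helix
    is given, only proteins with pdb_helix below it are averaged.
    """
    subset = [r for r in records if max_helix is None or r["pdb_helix"] < max_helix]
    if not subset:
        raise ValueError("no proteins in the selected helix range")
    return (mean(abs(r["d_helix"]) for r in subset),
            mean(abs(r["d_sheet"]) for r in subset))


records = [  # made-up numbers for illustration only
    {"pdb_helix": 0.70, "d_helix": 0.10, "d_sheet": -0.02},
    {"pdb_helix": 0.30, "d_helix": -0.04, "d_sheet": 0.06},
    {"pdb_helix": 0.10, "d_helix": 0.02, "d_sheet": -0.04},
]
print(mean_abs_error(records))                  # all proteins
print(mean_abs_error(records, max_helix=0.40))  # proteins with < 40% helix
```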
FIGURE 3 (A) Amide I transmission IR spectra of 30 solid-state proteins, normalised to 1. Proteins with α-helix content >45% are shown with broad lines and have maxima above 1,650 cm⁻¹. Papain and lysozyme are shown as broad dashed lines (see text). Colour coding of spectra is the same as in Figure 1: >60% helix purple; 45–59% helix blue; 34–44% helix turquoise; 26–33% helix green; 17–25% helix yellow; 10–16% helix orange; <10% helix red. Inset: Amide I maxima plotted against α-helix and β-sheet content. (B) LOOV deviations of secondary structure predictions from PDB structures for α-helix and β-sheet for the Amide I band of the 30-protein solid-state reference set, presented in order of decreasing helix content from left to right (for protein identities see Supplementary Material). Other category deviations are minus the sum of the helix and β-sheet deviations.
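Figure 3A overlays normalised spectra, and its inset plots each Amide I maximum against helix and β-sheet content. The helpers below are a hypothetical sketch of those two steps. They assume that "normalised to 1" means scaling so the largest absorbance is 1 and that spectra sit on a uniform, ascending wavenumber grid; the three-point parabolic refinement of the maximum is an illustrative choice, not necessarily what the authors used.

```python
import math


def normalise_to_max(absorbance):
    """Scale a spectrum so its largest value is 1 (assumed normalisation)."""
    peak = max(absorbance)
    if peak <= 0:
        raise ValueError("spectrum has no positive maximum")
    return [a / peak for a in absorbance]


def amide_i_maximum(wavenumbers, absorbance):
    """Wavenumber of the band maximum, refined with a three-point parabola.

    Assumes a uniformly spaced, ascending wavenumber grid.
    """
    i = max(range(len(absorbance)), key=lambda k: absorbance[k])
    if i == 0 or i == len(absorbance) - 1:
        return wavenumbers[i]  # maximum at the edge: no refinement possible
    y0, y1, y2 = absorbance[i - 1], absorbance[i], absorbance[i + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature == 0:
        return wavenumbers[i]
    step = wavenumbers[i + 1] - wavenumbers[i]
    # Vertex of the parabola through the three points, in grid steps from i.
    return wavenumbers[i] + 0.5 * (y0 - y2) / curvature * step


grid = list(range(1600, 1702, 2))  # synthetic 2 cm-1 grid
spectrum = [0.8 * math.exp(-((nu - 1653) / 10) ** 2) for nu in grid]
print(amide_i_maximum(grid, normalise_to_max(spectrum)))  # near 1653
```

The refinement matters here because the true maximum (1653 cm⁻¹) falls between grid points; a plain argmax would report 1652 cm⁻¹.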
FIGURE 4 Deviations of secondary structure predictions from PDB structures for helix and β-sheet for the Amide I band of the 50-protein film reference set, presented in order of decreasing helix content from left to right, for (A) direct Gaussian band-fitting and (B) the second-derivative fitting approach reported by Yang et al. (2015). See Supplementary Material for protein identities. Other category deviations are minus the sum of the helix and β-sheet deviations.
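Second-derivative approaches locate overlapping Amide I components as negative minima of the second derivative of the absorbance. The sketch below shows only that band-location step on a synthetic two-Gaussian spectrum, using plain central differences with no smoothing. It is not the Yang et al. (2015) protocol: real spectra are noisy and would normally be smoothed (for example with a Savitzky–Golay filter) before differentiating, and the located bands would still need to be assigned to structure classes and fitted. The function names are hypothetical.

```python
import math


def second_derivative(y, step):
    """Central-difference second derivative at interior points (no smoothing)."""
    return [(y[i - 1] - 2.0 * y[i] + y[i + 1]) / step ** 2
            for i in range(1, len(y) - 1)]


def band_positions(wavenumbers, absorbance):
    """Wavenumbers of negative local minima of the second derivative."""
    step = wavenumbers[1] - wavenumbers[0]  # assumes a uniform grid
    d2 = second_derivative(absorbance, step)
    centres = wavenumbers[1:-1]             # d2[k] belongs to wavenumbers[k + 1]
    return [centres[k] for k in range(1, len(d2) - 1)
            if d2[k] < 0 and d2[k] < d2[k - 1] and d2[k] <= d2[k + 1]]


# Synthetic spectrum: two overlapping Gaussians (illustrative positions only).
grid = list(range(1600, 1701))
spectrum = [math.exp(-((nu - 1655) / 10) ** 2)
            + 0.6 * math.exp(-((nu - 1630) / 10) ** 2) for nu in grid]
print(band_positions(grid, spectrum))  # expected near the two synthetic centres
```

Direct band-fitting (panel A) would instead fit a sum of Gaussians to the absorbance itself; the second derivative is often used to choose the number and starting positions of those bands.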