Calculation of the relative metastabilities of proteins using the CHNOSZ software package
© Dick; licensee BioMed Central Ltd. 2008
Received: 08 March 2008
Accepted: 03 October 2008
Published: 03 October 2008
Proteins of various compositions are required by organisms inhabiting different environments. The energetic demands for protein formation are a function of the compositions of proteins as well as geochemical variables including temperature, pressure, oxygen fugacity and pH. The purpose of this study was to explore the dependence of metastable equilibrium states of protein systems on changes in the geochemical variables.
A software package called CHNOSZ implementing the revised Helgeson-Kirkham-Flowers (HKF) equations of state and group additivity for ionized unfolded aqueous proteins was developed. The program can be used to calculate standard molal Gibbs energies and other thermodynamic properties of reactions and to make chemical speciation and predominance diagrams that represent the metastable equilibrium distributions of proteins. The approach takes account of the chemical affinities of reactions in open systems characterized by the chemical potentials of basis species. The thermodynamic database included with the package permits application of the software to mineral and other inorganic systems as well as systems of proteins or other biomolecules.
Metastable equilibrium activity diagrams were generated for model cell-surface proteins from archaea and bacteria adapted to growth in environments that differ in temperature and chemical conditions. The predicted metastable equilibrium distributions of the proteins can be compared with the optimal growth temperatures of the organisms and with geochemical variables. The results suggest that a thermodynamic assessment of protein metastability may be useful for integrating bio- and geochemical observations.
Owing to the growing body of compositional data for microbial proteins and the exploration of environments that are extreme from the human standpoint, it has become possible in recent years to draw correlations between the compositions of proteins and environmental parameters such as temperature. Accounting for the underlying causes of the observed correlations between environmental parameters and protein composition is an ongoing challenge. Biochemical approaches are based in part on the notion that proteins from thermophilic and hyperthermophilic organisms should have greater structural stabilities than their mesophilic counterparts. Compositional features of thermophilic proteins that may enhance their structural stabilities include increased numbers of hydrophobic residues, stronger charge interactions on the protein surfaces, and other properties of the amino acid sequence. However, it has also been suggested that, at least for sulfur, the elemental makeup of proteins is correlated with the chemical compositions of the environment. This study was motivated by the desire to explore a possible thermodynamic explanation for the relationship between protein composition and the extracellular environment, which is shaped in part by geochemical constraints.
A thermodynamic assessment of protein metastability provides a framework for describing the relationship between geochemistry and protein composition that until now has received relatively little attention. The geochemical literature abounds with examples of theoretical calculation of the compositions of stable and/or metastable equilibrium reference states as a way to predict the distributions of, and reaction pathways among, minerals and inorganic or organic aqueous species [5, 6]. In recent years, the calculation [7–11] and experimental investigation [12–14] of metastable equilibrium states in biogeochemical systems has gained traction. The primary advantage of extending a framework of this type to proteins and other biomacromolecules is that it places biochemical reactions in the same context as observations on the inorganic systems to which microbial metabolic pathways are coupled. Temperature, pressure, oxidation state and pH are just some of the variables that are commonly measured in geochemical studies that also appear explicitly in the thermodynamic representation of protein metastability reactions.
This study was undertaken in order to explore the thermodynamic relationships between geochemical variables and protein composition for model proteins from a number of organisms adapted to different environments. The cell-surface glycoproteins in archaea and the surface-layer proteins in bacteria [15, 16] were chosen for this purpose because they are intimately associated with the extracellular aquatic and mineralogical setting.
Because experimental values of the standard molal Gibbs energies of the model proteins were not available, they were calculated using previously reported group additivity and equations of state algorithms that are referenced to ionized unfolded aqueous proteins [17, 18]. These values are requisite for calculating the composition of the metastable equilibrium state in an open system described by chemical potentials of basis species, or perfectly mobile components [19–22]. The predicted chemical activities of species can then be displayed on chemical predominance and/or speciation diagrams whose axes correspond to intensive chemical variables. Because of the lack of integration of algorithms for calculating thermodynamic properties of proteins in available geochemical equilibrium software packages, the task of calculating and graphically representing the metastable equilibrium distributions of the proteins was managed through development of the CHNOSZ software package, which is introduced in this study.
The implementation of the thermodynamic algorithms and data into the package is described first below. The results of the calculations for the model system of proteins are then described and are displayed primarily in the form of diagrams depicting the calculated metastable equilibrium distributions of the proteins. The graphical depictions shown below are only limited portrayals of the metastable equilibrium states of systems of proteins, which are in fact multidimensional functions of thermodynamic variables. The predicted response of at least one of the metastability reactions between proteins from hyperthermophilic and mesophilic organisms appears to be aligned with the differences in temperature, pressure and oxidation state between their environments. However, more tests in other systems will be required to assess the generality of the approach. Some potential implications of the findings are addressed briefly in the concluding remarks, and the paper is finished with a section devoted to the methods adopted for writing protein metastability reactions and computing their thermodynamic properties.
The CHNOSZ software package consists of source code, data files, and documentation. It is written for the cross-platform R software environment. The package can be freely downloaded from the project website at http://www.chnosz.net. The features of the package, its basic program structure, and the thermodynamic database are summarized in the following paragraphs.
CHNOSZ was developed in order to ease calculations of 1) the standard molal thermodynamic properties of chemical species and reactions as a function of temperature and pressure, 2) the standard molal thermodynamic properties and equations of state parameters of neutral and ionized proteins using group additivity algorithms, 3) the chemical affinities of formation reactions of species of interest from basis species describing the system, and to assist in 4) generating metastable equilibrium activity diagrams for systems of biomolecules and/or other species.
The functions provided in CHNOSZ are suitable for either interactive use or scripted operation. The diagrams that are produced can be viewed on screen or saved as postscript files. Because the thermodynamic database includes the chemical formulas of species in addition to their standard molal thermodynamic properties, functions operating on user-input chemical reactions have the option to check, and possibly automatically correct, the mass balance of the reactions. This feature can speed up user interaction with the program and the writing of program scripts. The program has been designed with features in mind and is not presently optimized for speed. Most of the diagrams shown below can be produced in under a minute, but temperature-pressure diagrams of the same resolution require substantially more computational time, owing to the number of times the equations of state subroutines are called.
The package was developed with the goal of analyzing protein reactions, but the range of systems that can be studied using the software is limited only by the species available in the thermodynamic database, to which the user can make either temporary or persistent additions or updates. Complete documentation of the functions, including examples derived from the geochemical literature and this study, is provided with the package. Usage of the major functions in CHNOSZ is summarized below.
The accessory function water implements two computational options for calculating the thermodynamic and electrostatic properties of liquid H2O as a function of temperature and pressure. The first of these options provides an interface to the FORTRAN subroutine named H2O92D.F that was distributed with SUPCRT92 and that is included in the CHNOSZ source package. The calculation of the properties of liquid H2O in this case is consistent with data and equations from Refs. [25–27] and others (see Ref. ). The stated temperature and pressure limits of applicability for these calculations, described in Ref. , are from 0.01°C and PSAT (i.e., 1 bar at temperatures below 100°C and the saturation vapor pressure of H2O at higher temperatures) to 2250°C and 30000 bar. However, electrostatic properties of the solvent, which are required by the revised Helgeson-Kirkham-Flowers (HKF) equations of state for aqueous species, cannot be computed above 1000°C and 5000 bar. An alternative computational option for the properties of liquid H2O corresponds to the IAPWS-95 formulation for thermodynamic properties coupled with equations for electrostatic properties taken from Ref. .
The functions denoted by eos in Fig. 1 actually consist of two functions, hkf, for calculating as a function of temperature and pressure the standard molal thermodynamic properties of aqueous species using the revised HKF equations of state [30–33], and cgl, for calculating the properties of crystalline, gaseous and liquid (except H2O) species. The heat capacity equation implemented in CHNOSZ for these species contains up to six terms, as used in Ref. ; the first three terms are those in the Maier-Kelley equation [35, 36] which is used in the SUPCRT92 package.
The accessory function info provides a bridge between the thermodynamic and protein databases and the other functions. The function known as makeup is concerned with conversion between various computer- and human-readable representations of the chemical compositions of species. Its primary purpose is to transform the chemical formulas of species contained in the thermodynamic database (e.g., 'C4H6NO4-' for aspartate) into dataframe objects (which in R are similar to matrices with named columns and rows) so that other functions or makeup itself can perform further calculations on the stoichiometries of species. This function is also responsible for transforming a compositional dataframe back into a one-line chemical formula, and for calculating the reaction coefficients of basis species in formation reactions of the species of interest. It is with the aid of this function that subcrt checks whether a user-input chemical reaction is balanced with respect to mass and charge and automatically corrects the reaction if the necessary basis species have been defined.
The primary function subcrt and the related accessory functions permit calculation of the standard molal Gibbs energies of protein formation reactions and corresponding values of the equilibrium constants (K r in Eqn. M7). Calculation of the activity products and chemical affinities of reactions (Q r and A r in Eqn. M7) is implemented in the sequence of primary functions basis, species, affinity that is depicted in Fig. 1.
Two conditions are required of a valid set of basis species in CHNOSZ: 1) the number of basis species is equal to the number of elements (and charge, if present); and 2) the stoichiometric matrix denoting the elemental composition (and charge, if present) of the basis species, which is square according to condition (1), is non-singular, i.e., it has a real inverse. These two conditions ensure that a formation reaction for any species of interest in the system can be written using only positive or negative real numbers as reaction coefficients on the basis species. The basis species themselves can be any species that are present in the thermodynamic database, including nonionized proteins. The function basis also permits redefining the physical states of basis species (if a corresponding species in that state is present in the thermodynamic database) and/or setting the activities (a) or fugacities (f) of the basis species to be used in the following calculations. These values have default settings given by log a = -3 for aqueous species, log f = 0 for gases and log a = 0 for other species. The function basis can also be used to assign a buffer to one or more basis species so that the activities or fugacities of those basis species are taken from the buffer system.
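The two conditions on the basis species can be illustrated with a short, self-contained Python sketch. It is not CHNOSZ code (the package itself is written in R); the names BASIS, solve and formation_coefficients are hypothetical, and glycine (C2H5NO2) is used only as a small stand-in for a species of interest. The sketch builds the stoichiometric matrix for the basis species named in this paper and solves for the reaction coefficients of a formation reaction using exact rational arithmetic.

```python
from fractions import Fraction

# Columns of the stoichiometric matrix: five elements plus charge (Z).
ELEMENTS = ["C", "H", "N", "O", "S", "Z"]

# Basis species used in the paper, one row per species.
BASIS = {
    "CO2": [1, 0, 0, 2, 0, 0],
    "H2O": [0, 2, 0, 1, 0, 0],
    "NH3": [0, 3, 1, 0, 0, 0],
    "H2S": [0, 2, 0, 0, 1, 0],
    "O2":  [0, 0, 0, 2, 0, 0],
    "H+":  [0, 1, 0, 0, 0, 1],
}

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with exact fractions.

    Returns None when the matrix is singular (condition 2 fails).
    """
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def formation_coefficients(composition):
    """Coefficients on the basis species that reproduce `composition`."""
    rows = list(BASIS.values())
    # Condition 1: as many basis species as elements (plus charge).
    if len(rows) != len(ELEMENTS):
        raise ValueError("number of basis species must equal number of elements")
    # Each element gives one balance equation, so the system matrix is the
    # transpose of the stoichiometric matrix.
    transposed = [list(col) for col in zip(*rows)]
    x = solve(transposed, composition)
    # Condition 2: the stoichiometric matrix must be non-singular.
    if x is None:
        raise ValueError("stoichiometric matrix is singular")
    return dict(zip(BASIS, x))

# Glycine, C2H5NO2, uncharged (illustrative species, not from the paper).
coeffs = formation_coefficients([2, 5, 1, 2, 0, 0])
for name, value in coeffs.items():
    print(name, value)
```

Under these assumptions the solution corresponds to 2 CO2 + H2O + NH3 - 1.5 O2, which balances C, H, N, O, S and charge for glycine. A negative coefficient marks a basis species that appears on the opposite side of the formation reaction.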
After defining the basis species, the user can select any number of species of interest using the primary function species. The user may also call species to remove species or to alter the chemical activities or fugacities of the species of interest to be used in the calculations of chemical affinity. These values default to log a = -3 for aqueous species, log f = 0 for gases and log a = 0 for other species.
The function affinity permits calculation of log Q r and A r of formation reactions (such as those represented generically by Reaction M1) using Eqn. (M7) taking into account the activities and/or fugacities of the basis species and the species of interest. The contributions of the Q r and K r terms to the calculation are denoted conceptually in Fig. 1 by the two arrows, from the top and left, respectively, pointing toward the box labeled affinity. The calculations of chemical affinity can be carried out at a single point in temperature, pressure, chemical activity space, or as a function of one or two of T, P and logarithms of chemical activity or fugacity of the basis species. The accessory function buffer is invoked by affinity if one or more basis species were previously associated with a buffer system; the activities or fugacities of the basis species constrained in this way are then used by the program to calculate log Q r using Eqn. (M5).
The results of the calculations performed by affinity are accepted as input by diagram, which produces the diagrams using plotting functions provided in the R distribution. Many options are available for adding labels and legends and otherwise customizing the plot style.
The database of thermodynamic properties packaged with CHNOSZ is contained in a file named OBIGT.csv. Work on this database was motivated by a software project developed by H. C. Helgeson and coworkers, named OrganoBioGeoTherm, that provides a Windows interface to the SUPCRT92 program (J. J. Donovan, personal communication).
The thermodynamic data file has records for over 2500 inorganic, organic and biochemical crystalline, gaseous, liquid and aqueous species. The thermodynamic data were originally taken from the data file distributed with the SUPCRT92 package. Updates since that time were taken from the SLOP98 data file downloaded from http://geopig.asu.edu and from recent reports of thermodynamic data and revised HKF equations of state parameters for aqueous inorganic and organic species, as well as proteins and other species of biogeochemical interest [[38–40], and others]. The records in the data file include the names, states and chemical formulas of the species, up to two literature citations, and values of the standard molal thermodynamic properties at 25°C and 1 bar and equations of state parameters. The comma-separated-value (.csv) file format permits rapid reading of the data file by the CHNOSZ program or other software as well as addition to or modification of the file contents by the user. The CHNOSZ package also provides utility functions that can be used to export or import thermodynamic data to or from the SUPCRT92 data file format.
The data file protein.csv of amino acid compositions of proteins has records for over 200 proteins including those referred to in the present study. The user can add the composition of a protein to CHNOSZ by modifying this file, or at run time by inputting the amino acid composition of the protein at the command line or requesting a search of the online Swiss-Prot database http://www.expasy.org through the function called protein.
Table 1. Model proteins used in the present study (table body not recovered; surviving column header: T opt).
The relative metastabilities of the model proteins were calculated as a function of temperature, pressure and chemical activities or fugacities of basis species. Results of the calculations are presented below primarily on metastable equilibrium activity diagrams depicting either the predominant protein species as a function of two intensive variables, or on speciation diagrams showing the metastable equilibrium chemical activities of proteins as a function of a single variable. The computations were carried out using the CHNOSZ software package together with a program script for use with the package that is provided in Additional File 1.
where R stands for the gas constant and log K1 and A1 denote, respectively, the logarithm of the equilibrium constant and the chemical affinity of Reaction 1.
Execution of the first command shown in Example 3 defines the basis species characterizing the chemical system. Here, 'CHNOS+' is a keyword that identifies the basis species used in this paper and that appear in Reaction 1. The second command defines the species of interest, corresponding to the proteins listed in Table 1. With the third command, the chemical affinities of the formation reactions of each of the proteins are calculated on a two-dimensional grid as a function of pH and log fO2(g), and the results are assigned to a temporary object. Finally, the fourth command instructs the program to produce a metastable equilibrium activity diagram for the system, which in this case is a predominance diagram as a function of pH and log fO2(g). The reference temperature and pressure and activities of the basis species and proteins are not explicitly specified in Example 3, and are set to default values by the program that correspond to those described in the Methods.
Using CHNOSZ, the chemical affinities of Reaction 4 and its counterparts for any other specified proteins of interest are first computed using Eqn. (M7). The chemical affinities of the formation reactions are then compared with one another to determine the theoretically predominant protein given the input conditions, which is the one with the highest chemical affinity of formation per residue. In this way, it is possible to generate predominance diagrams like those shown in Figs. 3a and 3b for any number of proteins. The diagram shown in Fig. 3a was produced using all ten proteins listed in Table 1, but only some of the proteins predominate at different points in the diagram. Removing these proteins from consideration leads to the results shown in Fig. 3b, where the metastability relationships among some of the less metastable proteins are depicted.
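The comparison step described above can be sketched in a few lines of Python. This is only an illustration of the selection rule as stated in the text (highest chemical affinity of formation per residue); the function name, the protein names and all numerical values are hypothetical and are not taken from the paper or from CHNOSZ.

```python
def predominant(affinities, lengths):
    """Return the protein with the highest affinity of formation per residue.

    affinities: dict of protein name -> A/2.303RT for the whole protein
    lengths:    dict of protein name -> number of residues
    """
    per_residue = {name: affinities[name] / lengths[name] for name in affinities}
    return max(per_residue, key=per_residue.get)

# Hypothetical values chosen only to exercise the function.
affinities = {"protein_A": 120.0, "protein_B": 150.0}
lengths = {"protein_A": 300, "protein_B": 500}

winner = predominant(affinities, lengths)
print(winner)
```

In this made-up case protein_B has the larger affinity for the whole molecule, but protein_A has the larger value per residue (0.4 versus 0.3), so it is the one reported. Repeating the comparison at every point of a grid of two intensive variables gives the fields of a predominance diagram.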
Let us propose to write the formulas of proteins in metastability reactions as residue equivalents instead of whole protein formulas. The chemical formula or any standard molal thermodynamic property of a residue equivalent of a protein is defined to be that of the protein divided by the length of the protein. In contrast, assuming activity coefficients of proteins and residue equivalents to be unity, the chemical activity of the residue equivalent of the jth protein (aresidue, j) is equal to the chemical activity of the protein (a j ) multiplied by the length of the protein (n j ):
$$a_{\mathrm{residue},j} = n_j \times a_j. \qquad (5)$$
Let us now consider conditions such that the metastable equilibrium activities of the proteins are each equal to 10^-3. From Eqn. (5) we have aresidue,CSG_METJA = 0.530 and aresidue,CSG_METVO = 0.553, so log (aresidue,CSG_METJA/aresidue,CSG_METVO) = -0.018. Now, if log fO2(g) is decreased by one unit, it follows from Eqn. (7) that to maintain metastable equilibrium, log (aresidue,CSG_METJA/aresidue,CSG_METVO) = -0.018 + 0.163 = 0.145. Supposing aresidue,CSG_METJA to be held constant at 0.530 (aCSG_METJA = 10^-3), aresidue,CSG_METVO would be 0.380 (aCSG_METVO = 10^-3.16). This type of assessment leads to the results shown graphically in Fig. 5b, where it can be seen that the metastable equilibrium activities of the proteins as a function of log fO2(g) are within a few log units of each other, even for the non-predominant proteins.
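The arithmetic in the preceding paragraph can be retraced with a short Python sketch. The protein lengths (530 and 553 residues) are not stated explicitly here; they are inferred from the quoted residue activities of 0.530 and 0.553 at a protein activity of 10^-3. The shift of 0.163 per unit decrease in log fO2(g) is taken from the text as given, and the variable names are hypothetical.

```python
import math

a_protein = 1e-3                # starting activity of each protein
n_metja, n_metvo = 530, 553     # lengths inferred from the quoted residue activities

# Activity of the residue equivalent: length times protein activity.
a_res_metja = n_metja * a_protein
a_res_metvo = n_metvo * a_protein
log_ratio = math.log10(a_res_metja / a_res_metvo)

# Change in the log activity ratio for a one-unit decrease in log fO2(g),
# as quoted in the text.
shift = 0.163
new_log_ratio = log_ratio + shift

# Hold the CSG_METJA residue activity constant and solve for CSG_METVO.
new_a_res_metvo = a_res_metja / 10 ** new_log_ratio
new_log_a_metvo = math.log10(new_a_res_metvo / n_metvo)

print(round(log_ratio, 3))
print(round(new_log_ratio, 3))
print(round(new_a_res_metvo, 3))
print(round(new_log_a_metvo, 2))
```

Because the residue activity of CSG_METJA is held fixed, the whole shift of 0.163 log units is taken up by CSG_METVO, which is why its protein activity falls from 10^-3 to about 10^-3.16.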
where the value on the right-hand side corresponds to initial activities of the proteins each equal to 10^-3. Solving Eqns. (12)–(14) gives = 0.307, = 0.776 and A(8 or 9)/2.303RT = 0.703.
The addition of any protein to the system increases by one the number of unknowns in Eqn. (14) but also provides another equation in the form of Eqns. (12) and (13). The procedure to set up and solve these equations has been encoded in a general form in CHNOSZ and was used to produce the diagrams shown in Fig. 5. The CHNOSZ program includes options to analyze the protein formation reactions using whole protein formulas or their residue equivalents, which were used to construct Figs. 5a and 5b, respectively. The logarithm of total activity of protein residues is 0.8211 in each of these figures, which corresponds to the sum of the activities of the residue equivalents of the ten model proteins whose starting activities are 10^-3.
where denotes the number of proteins in the system, represents the total activity of protein residues, and . The degrees of formation of the proteins corresponding to the logarithms of activities shown in Fig. 5b are depicted in the figure in Additional File 2. This degree of formation diagram aids in visualization of the computed relative abundances of the proteins on a non-logarithmic scale.
The residue-equivalent approach was used in this study only to produce the diagrams shown in Fig. 5b and Additional File 2. The predominance diagrams shown elsewhere were produced using whole protein formulas in the formation reactions. Extending the residue-equivalent method to these diagrams would subtly alter the positions of the predominance field boundaries, more so for reactions between proteins that differ significantly in length. The differences in the locations of the predominance field boundaries can be assessed in part by comparing the locations of the crossover between predominant proteins in Figs. 5a and 5b.
We can recover nominal values of log fO2(g) in the natural environments of M. voltae and M. jannaschii from geochemical data. The first of these organisms was originally isolated from the sediment of an estuary and the other inhabits submarine hydrothermal vent environments. Reported values of aH2(aq) (the activity of dissolved hydrogen) were converted to log fO2(g) using the law of mass action for H2O ⇌ H2(aq) + 0.5O2(g) evaluated at 25°C and 1 bar, giving a nominal range of log fO2(g) for estuarine sediment of -73 to -70. Values of log fO2(g) in mixed hydrothermal vent fluid and seawater at 100°C are in the range of -65 to -60. The first of these ranges would plot near the CSG_HALJP – CSG_METVO boundary in Fig. 6a at 25°C and the second one near the boundary between CSG_METVO and CSG_METJA at 100°C. This observation might support the notion that proteins from hyperthermophilic organisms like M. jannaschii are thermodynamically favored relative to those from mesophilic organisms by increasing temperature accompanied by changes in the geochemical oxidation state.
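The conversion from dissolved hydrogen activity to oxygen fugacity mentioned above can be written out explicitly. The following is a sketch that assumes unit activity of liquid water (log a_H2O = 0) and unit activity and fugacity coefficients; K here denotes the equilibrium constant of the water dissociation reaction at the temperature and pressure of interest, and its numerical value is not given in this section.

```latex
% Law of mass action for H2O <=> H2(aq) + 0.5 O2(g)
\log K = \log a_{\mathrm{H_2(aq)}} + \tfrac{1}{2}\log f_{\mathrm{O_2(g)}} - \log a_{\mathrm{H_2O}}

% Assuming a_{H2O} = 1 and solving for the oxygen fugacity:
\log f_{\mathrm{O_2(g)}} = 2\left(\log K - \log a_{\mathrm{H_2(aq)}}\right)
```

Under these assumptions, each tenfold increase in the activity of dissolved hydrogen lowers log fO2(g) by two units at fixed temperature and pressure.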
It appears in Fig. 6b that increasing pressure also generally favors those proteins in lower oxidation states, but that the dependence of equilibrium log fO2(g) values on pressure is small relative to their dependence on temperature.
The chemical activities of basis species buffered by reacting protein assemblages correspond to the locations of the (pseudo)invariant points on metastable equilibrium predominance diagrams. Equal activities of three proteins correspond to the triple point, which is a pseudoinvariant point, in the predominance diagram shown in Fig. 3b. The number of independent variables on the axes of this diagram is two; in an eight-dimensional predominance diagram (of temperature, pressure and six chemical activities) one could distinguish the true invariant points in this system where nine proteins coexist with equal metastable equilibrium activities.
where the rows on the right-hand side and in the stoichiometric matrix on the left-hand side correspond to the proteins from the METXX organisms listed in Table 1. Solving Eqn. (18) gives /2.303RT = -0.739, log = -8.44, log = 7.92, log = 27.92 and log = -13.09. These values signify that the formation reactions of the proteins per residue are energetically unfavorable ( is negative) and that the hypothetical protein assemblage may not be metastably present (for example, the large positive values for and differ from probable natural ranges). Unambiguous identification of a natural metastable protein assemblage may require more comprehensive calculations coupled with insight gained from experiments and observations in the field.
A computer program called CHNOSZ was introduced in this paper for producing metastable equilibrium chemical activity diagrams for proteins. The methods used here were borrowed from geochemistry, and the program with the accompanying thermodynamic database is suitable for performing thermodynamic calculations in inorganic and mineral systems as well as organic and biochemical systems, or combinations thereof.
To investigate the utility of the program for a geochemical description of protein reactions, metastability diagrams were produced for surface-layer proteins from a number of bacteria and archaea. The diagrams show either the metastably predominant proteins as a function of two intensive variables or the metastable equilibrium chemical activities of proteins as a function of one variable. The primary variables of interest in this study were log fO2(g), pH, temperature and pressure. It was found that the predicted metastable equilibrium state of the system responded dramatically to changes in these variables. Representing the proteins in reactions by their residue equivalents instead of whole protein formulas gave rise to predicted equilibrium states in which many proteins coexist metastably with comparable chemical activities.
In the preceding sections we have considered the theoretical metastable equilibrium relationships among only a few model proteins. Because the software is now available to do so, a plethora of predictions concerning the energetically favorable outcomes of any number of overall protein mutation reactions is now within reach. Consideration of the results presented above, and of the wide range of model systems that could potentially be investigated in a similar manner, leads to the conclusion that the metastable equilibrium distribution of proteins in many cases does not mirror geobiochemical reality. Nevertheless, the ability to quantify the characteristics of metastable equilibrium reference states as a function of geochemical variables may be of utility in identifying specific pathways in evolution where the resulting proteins are relatively energetically favored. These particular outcomes may reflect a tendency for natural selection to increase the fit between phenotypes and their environments.
A thermodynamic and geochemical perspective on the relative metastabilities of proteins permits a quantitative integration of observations on the geosphere and biosphere. This study has only touched the surface of the myriad possible environments and organisms, the properties and chemical compositions of which are becoming more well constrained through experiment and observation. As these data grow in abundance, they will provide other opportunities where thermodynamic description of the chemical speciation of proteins can be tested and calibrated.
The thermodynamic conventions and relations used to compute the relative metastabilities of proteins in the present study are summarized below. The computational assessment depends first on the adoption of standard states for the species appearing in chemical reactions.
The standard state convention adopted for aqueous species other than H2O corresponds to unit activity of a hypothetical one molal solution referenced to infinite dilution at any temperature and pressure [30, 47]. The conventional standard molal thermodynamic properties of both the aqueous electron and proton are taken to be zero at all temperatures and pressures. For gases, the standard state convention is unit fugacity of the hypothetical pure ideal gas at 1 bar and any temperature. The standard state convention adopted for solids and liquids, including H2O, corresponds to unit activity of the pure substance at any temperature and pressure.
The compositions of species of interest, such as proteins, are represented by linear combination of the compositions of basis species in a system (for an application in geochemical systems, see Ref. ). The number of basis species is the minimum required to write formation reactions for all possible species of interest. There are no thermodynamic restrictions on the actual identities of the basis species, and the basis species do not necessarily correspond to thermodynamic components in the system of interest. Hence, the choice of basis species may be constrained by the chemical activities that can be measured in a system or that are thought to behave as perfectly mobile components. The basis species used in the present study are CO2(aq), H2O, NH3(aq), H2S(aq), H+ and O2(g).
The reaction coefficients on the basis species in Reaction M1 are completely determined by the chemical formulas of the protein and of the basis species. Depending on the sign of the coefficients in front of the basis species, they would appear in specific statements of Reaction M1 as reactants or products.
which corresponds to the difference between specific statements of Reaction M1 for j = 2 and j = 1, divided by n2 or n1, respectively. Here, 1/n1 and 1/n2 denote the conservation coefficients for the corresponding proteins. Reaction M2 is balanced with respect to mass and charge for any values of n1 and n2. If n1 = n2 = 1, Reaction M2 denotes the mass balance constraints for the formation of one mole of product protein at the expense of one mole of reactant protein. Other values may be chosen for n1 and n2, depending on what is specified about the conservation constraints in the system. For example, if n1 = C1 and n2 = C2, the protein metastability reaction conserves carbon (i.e., the coefficient on CO2(aq) in Reaction M2 becomes zero). The protein metastability reactions considered in the present study are written for n j equal to the length of the jth protein.
where a i represents the chemical activity of the ith species in the reaction. For gaseous species, a i in Eqn. (M5) is replaced by the fugacity of the species (f i ). Activity and fugacity coefficients are taken in a first approximation in this study to be unity.
where $\mu_i^{\circ}$ denotes the standard chemical potential of the ith species and $f_i^{\circ}$ stands for the fugacity of the species in its standard state, which is unity for gases.
The chemical affinities of reactions ($A_r$) can be computed from
$$A_r = 2.303\,RT \log (K_r/Q_r),$$
In an equilibrium state, $A_r = 0$ for metastability reactions and Eqn. (M8) reduces to the logarithmic analog of the law of mass action equation for Reaction M2.
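The affinity relation is simple to evaluate once log K and log Q are known. The following sketch assumes unit activity coefficients (as in the text), a gas constant in cal mol^-1 K^-1 to match the kcal-based values quoted in this paper, and the usual convention that reaction coefficients are negative for reactants and positive for products. The function names, the example reaction and its log K value are hypothetical.

```python
R_CAL = 1.9872  # gas constant in cal mol^-1 K^-1


def log_activity_product(coefficients, log_activities):
    """log Q = sum of (reaction coefficient * log activity) over all species."""
    return sum(nu * log_activities[name] for name, nu in coefficients.items())


def affinity(log_k, log_q, temperature_k=298.15):
    """Chemical affinity in cal mol^-1 from A = 2.303 RT log(K/Q)."""
    return 2.303 * R_CAL * temperature_k * (log_k - log_q)


# Hypothetical reaction: reactant -> product + 2 H+, with both proteins at
# an activity of 10^-3 and pH 7 (the reference values given in the text).
coefficients = {"reactant": -1, "product": 1, "H+": 2}
log_activities = {"reactant": -3.0, "product": -3.0, "H+": -7.0}

log_q = log_activity_product(coefficients, log_activities)
log_k = -15.0  # assumed value, for illustration only

print("log Q =", log_q)
print("A (cal/mol) =", affinity(log_k, log_q))
```

A positive affinity means the reaction as written is favored, a negative value means the reverse direction is favored, and log K = log Q gives zero affinity, which is the equilibrium condition stated above.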
The reference temperature and pressure correspond to 25°C and 1 bar, respectively. The reference chemical activities of basis species used in this study are given by $\log a_{\mathrm{H_2O}} = 0$, $\log a_{\mathrm{CO_2(aq)}} = -3$, $\log a_{\mathrm{NH_3(aq)}} = -4$, $\log a_{\mathrm{H_2S(aq)}} = -7$ and $\log a_{\mathrm{H^+}} = -7$ (pH 7). The reference value for $\log a_{\mathrm{H_2O}}$ corresponds to pure water, and the others are nominal values that generally fall within the compositional ranges of hydrothermal fluids and seawater. The reference chemical activities of proteins are taken to be $10^{-3}$, which is a nominal value that is similar to experimental concentrations used in protein unfolding studies.
The standard molal thermodynamic properties of aqueous species as a function of temperature and pressure can be evaluated using the revised Helgeson-Kirkham-Flowers (HKF) equations of state [30–33, 54, 55]. The temperature dependence of the standard molal thermodynamic properties of crystalline, gaseous and liquid species other than H2O is calculated using a standard equation for heat capacity [34, 35, 56]. For the basis species other than H+ and e-, values of the standard molal thermodynamic properties and of the equations of state parameters were taken from Refs. [55, 57] (CO2(aq), NH3(aq) and H2S(aq)) and [58, 59] (O2(g)). The equations of state adopted for liquid H2O in the present study are those used in the SUPCRT92 software package.
where, for the ith type of ionizable sidechain or backbone group, $n_{i,j}$ represents the number of moles of the group in one mole of protein, $\alpha_i$ denotes the degree of ionization of the group ($0 < \alpha_i < 1$), and $\Delta G^{\circ}_{\mathrm{ion},i}$ corresponds to the standard molal Gibbs energy of ionization of the group. Values of $\alpha_i$ and $\Delta G^{\circ}_{\mathrm{ion},i}$ in Eqn. (M10) were taken in a simple approximation to be equal for all occurrences of a given ionizable group. It may be possible to refine this approach in the future by taking account of interactions of charged residues on the protein surfaces (Ref. and others since).
where $\mathrm{p}K_i$ represents the negative logarithm of the equilibrium constant for the deprotonation reaction of the ith ionizable group.
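For a single group that deprotonates independently of the others, the fraction deprotonated follows from pK and pH by the standard mass-action (Henderson-Hasselbalch) relation, and the standard Gibbs energy of the deprotonation reaction follows from pK as 2.303 RT pK. The sketch below illustrates those two textbook relations only; it is not a transcription of the paper's equations, the function names and the pK of 4.0 are hypothetical, and for a basic group (whose charged form is the protonated one) the ionized fraction would be one minus the value returned here.

```python
R_CAL = 1.9872  # gas constant in cal mol^-1 K^-1


def fraction_deprotonated(pk, ph):
    """Fraction of an independent ionizable group that is deprotonated at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pk - ph))


def deprotonation_gibbs_energy(pk, temperature_k=298.15):
    """Standard Gibbs energy of deprotonation in cal mol^-1, from dG = 2.303 RT pK."""
    return 2.303 * R_CAL * temperature_k * pk


# Hypothetical acidic group with pK = 4.0, evaluated across a pH range.
for ph in (2.0, 4.0, 7.0):
    print(ph, fraction_deprotonated(4.0, ph))
```

At pH equal to pK the group is half deprotonated, and the fraction stays strictly between 0 and 1 at any finite pH, consistent with the bounds on the degree of ionization given above.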
represents the total number of amino acid residues, or length of the protein. Values of for the model proteins considered in the present study were retrieved from the Swiss-Prot/UniProt protein sequence database  (see Table 1).
The thermodynamic properties of unfolded aqueous proteins calculated using the above equations are taken in a first approximation to be representative of the proteins of interest, which may be folded and/or present in crystalline form in cells. Two observations lend support to the applicability of the unfolded protein reference state for the present calculations: 1) The standard molal Gibbs energies of protein folding would tend to cancel each other in metastability reactions, in which proteins appear on both sides of the reaction. 2) The Gibbs energy of unfolding for a small to average-sized protein is about two or three orders of magnitude smaller than the standard molal Gibbs energy for the unfolded protein itself. For example, the Gibbs energy of unfolding of chicken lysozyme is ~14.5 kcal mol$^{-1}$ at 25°C, but the standard molal Gibbs energy of this protein at 25°C and 1 bar is approximately $-4.2 \times 10^{3}$ kcal mol$^{-1}$ (see Figs. 2a and 2b). The size of the unfolding property in this case is much smaller than the ca. ± 5% uncertainty ascribed to the group additivity algorithm. It should be noted, however, that the compositional consequences of protein folding include changes in ionization state and preferential surface exposure of charged residues, which would be manifested by changes in the reaction coefficients of basis species that might affect the outcome of metastability calculations to a greater extent than the differences in Gibbs free energy alone.
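The magnitude comparison for chicken lysozyme can be reproduced with a few lines of arithmetic. This sketch uses only the two values quoted above and the ca. 5% uncertainty; the variable names are illustrative.

```python
import math

dg_unfolding = 14.5  # kcal mol^-1, Gibbs energy of unfolding at 25 degrees C
dg_protein = -4.2e3  # kcal mol^-1, standard molal Gibbs energy of the unfolded protein

# How many orders of magnitude separate the two quantities.
ratio = abs(dg_protein) / dg_unfolding
orders_of_magnitude = math.log10(ratio)

# Absolute size of a 5% uncertainty on the protein's Gibbs energy.
uncertainty = 0.05 * abs(dg_protein)

print("ratio:", round(ratio))
print("orders of magnitude:", round(orders_of_magnitude, 2))
print("5% uncertainty (kcal/mol):", uncertainty)
print("unfolding term below uncertainty:", dg_unfolding < uncertainty)
```

The ratio falls between two and three orders of magnitude, and the unfolding term is more than ten times smaller than the uncertainty of the group additivity estimate.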
I would like to acknowledge the late Professor Harold C. Helgeson for his friendship and advice during the Ph.D. research project that provided the foundation for this paper. This work was supported by grants EAR-0309829 from the U.S. National Science Foundation and DE-FG02-03ER15418 from the U.S. Department of Energy.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.