Ammm:Umbrella sampling and WHAM

From biowiki
Jump to navigation Jump to search

Reaction Coordinate

  • A reaction coordinate is an abstract one-dimensional coordinate which represents progress along a reaction pathway.
    • Real coordinate system: bond length, bond angle, torsion
    • Non geometric parameters: bond order, Hydrogen bonds, RMSD
  • Reaction coordinates are often plotted against free energy to demonstrate in some schematic form the potential energy profile associated to the reaction.

Umbrella Sampling (US)

Umbrella sampling is a method in computational physics and chemistry, which can enhance the sampling of different systems when it is hard to realize ergodicity due to the energy landscape. It involves the importance sampling in statistical mechanics and was first recommended by Torrie and Valleau in 1977[1].In many cases, the configuration space may have multiple high-energy barriers which are hard to conquer in the limited simulation time with common molecular dynamics (MD). Therefore, it is of high probability that the trajectory is trapped in the local minimum (local state) and leaves other inaccessible important states unsampled. In order to remove the impact of the energy barriers (activation case), the researchers can chosose the umbrella sampling (US) to bridge the gap between different configuration states by adding an extra bias potential. This bias potential should cancel out the influence of these energy barriers so that the energy landscape will become more smooth, making it easier to reach other states.

Why do we need biased potential?

Features of method

  • Pre-determined collective variables are required to describe the configuration space related to reaction.
  • The target of this method is to handle activation problems in sampling, where high energy barriers can be smoothened by adding extra bias potential. (Convert activation into diffusion)
  • The extra bias potential causes the loss of temporal properties.
  • This method supports parallel MD simulations with different bias potentials at the same time so that computational cost is reduced.
  • This method demands post-processing to remove the effects of bias potential.

Weighted ensemble

The original ensemble can be modified by using a weight function. The weight function should be positive here.

In the equation above, the denotes the weighted ensemble average and the probability distribution function can be shown as:

To reflect the connection between bias potential and the weight function, we can represent the weight function as an exponential form: , where means the bias potential.

Free energy calculation

After defining a collective variable (CV) , we can calculate the free energy at any point along the direction of CV by using the probability distribution function. (The reference point is chosen at )

When the bias potential is added, the free energy change under the weighted ensemble can be given by:

In this case, the original free energy change should be recovered by combining these two equations above: (the denominators have been offset in the ratio)

The best bias potential can help remove all the possible barriers on the energy landscape, which means the free energy change under this bias potential should be 0 everywhere. So, it is obvious to know that the best bias potential must counterbalance the true free energy.

Challenges of Umbrella Sampling

  • Pick up the best bias potential along the CVs to realize the sufficiently flat energy landscape
  • Ensure complete scanning of all the configuration space along the CVs under bias potential
  • Perform discretization of the probability distribution along CVs

WHAM

Historty

  • Initially developed in 1989 by Ferrenberg and Swenden: Ferrenberg-Swenden reweighting [2]
  • Generalized by Kumar et al.: Weighted Histogram Analysis Method [3]

Basic Idea

The Weighted Histogram Analysis Method (WHAM) is a practical technique for the calculation of potentials of mean force (PMFs) from a group of umbrella sampling simulations. It can be applied to analyze the energy data based on not only real collective variables (have relations to the atom coordinates) but also virtual variables (temperature or interactions in the system). This technique can give out a systematic way to improve the choice of the bias potential from a set of different bias potentials so that it yields uniform sampling along the collective variables. Meanwhile, the multiple simulations can start from different points along collective variables, then the full sampling can be realized. Also, the underlying theory of this method provides us with a practical way to do histogram analysis and calculate free energy under discretized language.

Discretization of configuration space along collective variable

Assume we have run simulations at the same time, with each using a different or same bias potential and temperature.

The whole configuration space can be divided into bins. The middle bins share the same width while the beginning and end bins have the width . After the division, the configuration space becomes a histogram graph, and the energy value for each bin in simulation should be given by the following equation: (energy for middle bins is set to be the value of the middle point of this bin, while the energy of the beginning bin is set to be the value at the starting point and the energy of the end bin is set to be the value at the endpoint)

Then, the bias potential energy of the simulation along collective variable can be expressed in discretized language:

is characteristic function: if is in the range then it takes value , otherwise it is .

From the operation of dividing the whole configuration space into discrete bins, it is obvious that more bins indicate higher resolution, but more snapshots from the simulations are required. To make the expression more clear, we directly use to denote the function in the following.

State density & probability distribution function

The most important property in WHAM is state density (or energy state density) . In an NVT ensemble, the generalized state density can be shown as:

then represents the volume of microstates in . The partition function can be illustrated by using state density:

is the configurational integral. From these equations above, the probability distribution function can be expressed as:

In WHAM, the probability and state density can be calculated by using statistical data: is the number of counts at in simulation , is the total number of snapshots in simulation .

denotes the proportionality constant unique for k-th simulation (related to configurational integral), which can be modified further to make the following equations more pretty: .

Derivation of WHAM[4]

When bias potential is shown explicitly, the state density becomes:

Because the original energy distribution is the same for all the simulations and will be reduced in the following deduction, then we can remove that in state density.

The state density with the best bias potential can be given by a linear combination of state density functions from all simulations, with each one being assigned a weight . The sum of all the weights should be .

To get the best state density, we need to minimize the mean square deviation (MSD)

The MSD of can be given by multiplication of a correlation factor and its average[5]: (the correlation factor is roughly equal if the division and running time are the same for all simulations, so it can be canceled)

An estimate can be made to obtain the average :

which can be compared with .

Then: .

The minimization can be finished by using the method of Lagrangian multipliers:

Because , then .

.

Then:

In bin :

The un-normalized density at bin in simulation can be given by:

Because is related to configurational integral:

The most common case is that all the temperatures are the same, and the correlation factors are cancelled out. Then the equations and can be simplified into:

Self-consistent algorithm

In equations and , the unknown values are and . We can work out their values by using the self-consistent method.

At first, we can assume and put them into Equation to obtain .

Next step: Put all into Equation to get the values of .

Repeat the two steps above until the deviations of all these values are smaller than pre-set tolerance. The true free energy change between any two bins in simulation without bias potential can be calculated by:

Procedure of umbrella sampling

  • Define collective variables appropriate to describe the reaction space
  • Divide the whole reaction space into bins with proper resolution
  • Generate different configurations at each bin by using steered MD (or directly do pulling during different simulations)
  • Set bias potential for each configuration (commonly used: harmonic restrain)
  • Run multiple MDs with different bias potentials and record the trajectories & energy
  • Put data into the WHAM processing program (use the module from Gromacs, …)

Examples

Butane[6]

  • Protocol
    • 18 independent simulations
      • 500ps
      • Restraint spring constant = 0.02 kcal/mol-deg
    • WHAM
      • 90 bins (40/bin)
      • Enforced periodcity
  • Histograms from Individual Trajectories

  • Histograms of Combined Trajectories

  • Butane PMF

Ion Channel [7]

Tutorial of US & WHAM

The Gromacs provides us with a good tutorial to show the way to perform Umbrella Sampling & WHAM post-processing: http://www.mdtutorials.com/gmx/umbrella/ [8]

References

  1. Torrie, G. M.; Valleau, J. P. (1977). "Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling". Journal of Computational Physics. 23 (2): 187–199. doi:10.1016/0021-9991(77)90121-8
  2. Ferrenberg; Swenden, Optimized Monte Carlo data Analysis, Phys.Rev.Lett. 1989, 63, 1195-1198
  3. Kumar S, Bouzida D, Swendsen RH, Kollmann PA, Rosenberg JM: The weighted histogramfckLRanalysis method for free-energy calculations on biomolecules. I. The method. J Comp ChemfckLR1992, 13: 1011-1021.
  4. Roux B: Extension to the weighted histogram analysis method: combining umbrella sampling with free energy calculations. Comp Phys Commu 2001, 135: 40-57.
  5. Ferrenberg, A. M., and Swendsen, R. H. (1989) Optimized Monte Carlo Data Analysis. Phys. Rev. Lett 63, 1195–1198.
  6. membrane.urmc.rochester.edu/wham/wham_talk.pdf
  7. Zhifeng Jing, Joshua A. Rackers , Lawrence R. Pratt, Chengwen Liu, Susan B. Rempe and Pengyu Ren. Thermodynamics of ion binding and occupancy in potassium channels. Chem. Sci., 2021, 12, 8920-8930
  8. Justin A. Lemkul and David R. Bevan. Assessing the Stability of Alzheimer's Amyloid Protofibrils Using Molecular Dynamics. J. Phys. Chem. B 2010, 114, 4, 1652–1660