Research ex:OpenMM
Software site: tinker-OpenMM
http://biomol.bme.utexas.edu/tinker-openmm/index.php/Main_Page
Theory:
This relase has been designed for the calculation of protein-ligand binding free energies on a cluster of CUDA compatible GPUs. In order to allow for the efficient of use of multiple GPUs, this release includes the ability to perform independant lambda-based energy perturbation calculations that can be summed to calculate ligand binding free energy. The parameter lambda is a desciption of the strength of interaction between a ligand and its environment ( protein, water, etc.). By first calculating the free energy difference that results from the electrostatic lambda being lowered from 1 to 0, and then lowering the Van Der Waals lambda from 1 to 0, we can calculate the magnitude of the interaction between a ligand and its protein environment. This binding calculation also contains the contribution of desolvation, so this calculation of the strength of ligand-environment interaction is repeated in a ligand-water environment. This desolvation energy can be subtracted from the protein-ligand simulation energy to get the free energy of binding.
The Van Der Waals force utalized in this release is a modified version of a standard 14-7 equation, shown below:
Note the important feature that when lambda is one, this term is identical to the standard 14-7 equation, and that at lambda=0, the energy resulting from the Van Der Waals force is equal to zero. Since energy is a state function, the energy value at intermediate lambdas is insignificant. Since this equation is identical to a standard 14-7 interaction at lambda=1, this modified force can be used for all environmental-environmental and intraligand interactions without affecting simulation results by merely setting lambda equal to one for these interactions.
Electrostatic lambda has been implemented by using lambda as a multiplicitive scalar to the permenant charge, dipole, quadupole and polarizability parameters of each particle. For each ligand atom, these parameters have been scaled down by a factor of lambda. At lambda=0, these particles have no associated electronic charge, and thus have a zero magnitude electrostatic force and energy.
Installation
The softcore modified openMM Source can be obtained by running the command git clone https://github.com/pren/openmm-amoeba.git
Environmental variables:
enter the following environmental variable export commands into ~/.bashrc. Change prefixes to match your system.
cuda libraries:
export PATH=$PATH:/usr/local/cuda-7.5/bin/ export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:/usr/local/cuda-7.5/lib:$LD_LIBRARY_PATH export CUDA_HOME=/usr/local/cuda-7.5/
openMM directories( use the same prefix as in the installation below)::
export LD_LIBRARY_PATH=/users/mh43854/openMMstable4/bin/lib:$LD_LIBRARY_PATH export PATH=/users/mh43854/openMMstable4/bin/:$PATH (should be same folder as used for CMAKE_INSTALL_PREFIX in openMM make below); export OPENMM_PLUGIN_DIR=/users/mh43854/openMMstable4/bin/lib/plugins export OPENMM_CUDA_COMPILER=/usr/local/cuda-7.5/bin/nvcc
Step 1 Install openMM softcore
Make sure to enter above environmental variables before starting. Unzip the provided openMM zip file. Make a empty build directory, and inside this build folder run
ccmake../openmm-amoeba-master/ press t to go into advanced options and add the -std=c++0x flag into CXX flags (need for softcore mod using tuples.Then press 'g' to generate a makefile.
Note that this should be done on a machine with a CUDA compatible graphics card and a CUDA installation. make sure OPENMM_BUILD_AMOEBA_CUDA_LIB, OPENMM_BUILD_AMOEBA_PLUGIN, and OPENMM_BUILD_CUDA_LIB are on, and that CUDA_TOOLKIT_ROOT_DIR points to a valid cuda ( version 7.5+) installation. Set and record the option under "CMAKE_INSTALL_PREFIX" (use it in the PATH above);
Still inside this build folder, run
make make install make PythonInstall (if you use python to run openmm; not required for tinker-openmm)
Step 2 Make modified tinker dynamic_omm.x executable
. Download tinker by running git clone https://github.com/jayponder/tinker.git
First, set the environmental CC variable to icc, and F77 to ifort. Make sure icc is in your PATH variable. First, you need to compile fftw by entering the /fftw/ folder, and running ./configure --prefix=/Users/ponder/tinker/fftw --enable-threads( changing the prefix to be the absolute path of the current folder , then make , then make install. The prefix options changes installation directory. It defaults to /usr/lib/, but can be changed to allow install without root access..
copy /make/Makefile into /source/. modify this makefile so TINKERDIR is set to the tinker home directory, uncomment the appropriate platorm( intel is advised for performance reasons, and must match the fftw set above ), and then run make.
Copy the contents of /openmm/ into source, modifying the makefile to set TINKERDIR to the base untarred directory of the softcore modified tinker( the folder that contains the /source/ folder), and OPENMMDIR to the home directory of the openMM installation, containing the bin and lib directories( this was what you set in step 1 of the installation, not the build folder you ran make inside). Uncomment the compile options corresponding to the appropriate platform( again, use iniel), and make this Makefile. modify the libs line as shown above. This should result in the creation of a dynamic_omm.x executable which is used to run tinker simulations on a GPU using CUDA, as well as executables for tinker.
input setup
ligand
File containing a ligand in a water box should be in xyzformat( use tinker's pdbxyz for conversion if needed). To prepare a ligand parameter file, one can use Poltype( more information <a href="http://biomol.bme.utexas.edu/wiki/index.php/Research_ex:Poltype">here</a>). This will output a key file with ligand parameters for use in the simulation.
Waterbox addition
Both the ligand alone( solvation) and ligand-protein complex(binding) simulations require the addition of a waterbox with slightly more than 1/2 the larger of the VDW and Electrostatics cutoff. Premade water boxes have been provided athttp://biomol.bme.utexas.edu/~mh43854/openmm/water_boxes/ .
The sizes of these boxes are:
| box | size(Angstroms) |
| 216water | 18 |
| 500water | 24 |
| 1600water | 36 |
| 4kwater | 49 |
| 32kwater | ~100 |
| 96kwater | ~160 |
If you want to create a water box of a size not included in this folder, use crystal.x from tinker, option 4 to create a larger box from one of the smaller boxes.
The protein-ligand complex file also needs to be in xyz format
protein-ligand complex
To solvate the ligand ( for solvation simulations) and ligand-protein complex( for binding complexes), run xyzedit.x option 20 to soak the system inside the waterbox.
solvation key file
An example key file for solvation is shown below:
parameters /users/mh43854/openeye.prm
ligand -1 24
vdw-lambda 1.0
ele-lambda 0.5
archive
a-axis 40.000
integrator RESPA
ewald
neighbor-list
vdw-cutoff 12.0
ewald-cutoff 8.0
polar-eps 0.00001
polar-predict
The parameters setting should be set to the appropriate AMOEBA parameter file. The ligand line should contain the starting atom of the ligand( with a "-" sign prepended to indicate range) followed by the last atom of the ligand. If multiple atom index ranges are needed to be treated as ligand( such as a multi-ligand system), just add a new set of two numbers to this line( ex. if ranges 1-24 and 50-90 are ligands, the ligand line should read ligand -1 24 -50 90).
If using the preparation script described below, don't include vdw-lambda or ele-lambda lines( these will be added by the provided script).
These desired parameters should be prepended to the keyfile output by poltype.
binding key file
The binding key file parameters are identical to the solvation keyfile except for the addition of the following lines to enable a harmonic restraint of the form k(distance-targetdistance)^2 between the ligand and its surrounding protein atoms(protein atoms in residues within a ~5Å radius of the ligand). The distanced utalized in this calculation is the distance between the centroid of the ligand and the selected protein atoms).
GROUP 1 -1 24
GROUP 2 -25 150
RESTRAIN-GROUPS 1 2 15.0 0
The group lines aboce contain the group number followed by atom ranges with the same syntax as used in defining the ligand line( described under solvation key file heading). The first two numbers correspond to the group numbers to add restraints to, the next number corresponds to k, , and the last line corresponds to target distance in angstroms( most often 0).
Note that the energy needs to be corrected on both the end and beginning points to result in accurate energies.
For the correction going from full strength with restrain to full strength without restraint, one can either calculate this energy diffence by running extra simulations using bar, or by weaking restraint with larger Electrostatics.
For the correction at no intermolecular interaction( going from a low entropy, high energy state to a higher entropy, lower energy state, the magnitude is equal to RT*ln(C0)*V1. C0 is standard concentration, and V1 is the constrained volume, equal to (2piRT/k)3/.2.
running simulations
ren lab nodes
/users/mh43854/openMMstable2/tinker/bin contains the dynamic_omm.x executable. extension corresponds to the node number. use .205 for nodes 205 and 206, and .sugar for sugar and mars
environmental setup
Make sure the CUDA_HOME, OPENMM_PLUGIN_DIR, and OPENMMDIR environmental variables are setup correctly( this should have been done in installation. These variables are described in installation under step 1. Note that the OPENMM_PLUGIN_DIR and OPENMMDIR will be used to select library files, so be sure these environmental variables are correct. It would be advisable to always keep the ~/.bashrc file up to date with these variables so you don't have to constantly export.
Minimize system
The system created above needs to be minimized using the minimize.x routine inside tinker.
Heat and equilibriate
The systems need to be heated in order to start runs at 300K. Heating should be done for 100ps each step, increasing by 50K each step. This step should be done at NVT, with a reasonable estimate of box volume as the water box volume.
An example heating script setup( to put in a shell script) is:
preparation of simulation filesystem
A python script for the creation of the required files and folder is available here [http://biomol.bme.utexas.edu/~mh43854/openmm/scripts/foldercreatenodyn.py . Run inside the folder you want to place simulation results in. Example usage is: Python foldercreatenodyn.py bindingkeyfile.key(described above) bindingxyz.xyz solvationkeyfile.key(described above) solvationxyz.xyz . If you have a dyn velocity file, use http://biomol.bme.utexas.edu/~mh43854/openmm/scripts/foldercreate.py
To run foldercreate.py, append the dyn file to the end of each "block" of two commands. The parameter line should be an absolute( not relative) path. You do not need to include the lambda lines( these will be handled by the script). This script will make bind and solv folders, each containing subfolders with the keyfile for each simulation run.
running a simulation
select a card for the simulation by running the command export CUDA_VISIBLE_DEVICES=0. This environmental variable selects which GPU is used for this simulation. Note that GPUs are indexed starting at 0. Each GPU should only be running one simulation at a time. Running multiple simulations on a card will results in a dramatic reduction in performance.
Simulations are run using the dynamic_omm.x executable created during tinker compilation. This program can be run as an interactive terminal( in which case, include no arguments). To run, execute the command dynamic_omm.x input.xyz 6000000( number of steps) 2.0( timestep, in fs) 5.0( write frequency, in ps) 2( cannonical, NVT ensemble) 298( temperature, in Kelvin) N.
To ease this process, I have created a script to create .sh scripts for starting simulations. It is available at http://biomol.bme.utexas.edu/~mh43854/openmm/scripts/createRunScript.py. To use, run python createRunScript.py number of GPUS simulationRunScript( the command described above). Note that you do not need to include the xyz line(it will automatically include prod.xyz, ,which will be created by the folder creation script above.
Calculation binding free energy using Bennet Acceptance Ratio(BAR)
The tinker program bar is used to estimate the difference in energy between two lambda states( example, vdw-lambda=100, ele-lambda=90 to vdw-lambda=100, ele-lambda=80). To use, execute bar trajectory1.arc 100(starting frame) 5000(ending frame) 1(step size) trajectory2.arc 100 5000 1.Note that the step size should equal 1, as this step is must faster than actually running the simulation. This program will print out the resulting energy to the terminal. The energy for this state transition is listed as BAR Free Energy Estimate.
Note that in order to get the most commonly used metric ( negative energy difference= favorable binding ), one should use the lower lambda trajectory as state 1, and the higher as state 2. To calculate the binding free energy, subtract the sum of the solvation free energies from the sum of the binding free energies.
I have provided an automated script for the creation of shell scripts for bar >http://biomol.bme.utexas.edu/~mh43854/openmm/scripts/barscriptcreate.py</a>) To use, run barscriptcreate.py path/to/bar.x number of desired scripts startingframe endingframe stepsize inside the bind/ or solv/folder. The number of scripts should be the number of nodes in your server you want to use( each bar tasks takes up a whole node). simply ssh into the desired node, and run one of the scripts ( one script per node).
A parser for bar results is available ( <a href="http://biomol.bme.utexas.edu/~mh43854/openmm/scripts/readBAR.py">http://biomol.bme.utexas.edu/~mh43854/openmm/scripts/readBAR.py</a>). It will take the output of barscriptcreate.py, and print a summery file. run inside the bind/ or solv/ folder.