Software:GB Param


Pavel

crsparam.f in tinker-bat

Bradley

http://biomol.bme.utexas.edu/wiki/index.php/Research:Dna#GAY-BERNE_PARAMETERIZATION

/home/others/hamiltonba/dev/crs-param-cpp/trunk

Brad has programs that generate all-atom dimer configurations at various distances. At each distance the molecule is rotated about the symmetry axis, and the resulting energies are combined into a Boltzmann-averaged all-atom energy for that distance.
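A minimal sketch of the Boltzmann averaging step, to make the idea concrete. It is not taken from Brad's scripts: the function name, the temperature, the energy unit (kcal/mol), and the sample energies are all assumptions made for illustration.

```python
import math

# Assumed constant: Boltzmann constant in kcal/(mol*K). The energy unit and
# temperature used by the original scripts are not stated on this page.
K_B = 0.0019872041


def boltzmann_average(energies, temperature=298.15):
    """Boltzmann-weighted mean of the energies sampled at one distance."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift by the minimum so exp() cannot overflow
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, energies)) / total


# Hypothetical energies (kcal/mol) from four rotations at one separation.
rotation_energies = [-2.1, -1.8, -0.5, 0.3]
average_energy = boltzmann_average(rotation_energies)
print(average_energy)
```

Low-energy orientations dominate the weighted mean, so the result lies below the plain arithmetic mean of the samples. Energies from additional orientations can be appended to the same list before averaging.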

/home/other/hamiltonab/nuc/vdw2

He has one script for each configuration: face, t-shape, and cross.

Besides rotating, the face-face configuration also needs to be averaged over flipping the molecules. For a non-benzene molecule, e.g. a base ring, the flipped orientation cannot be sampled by rotation.
xbase may be newer than base.pl

Kelly

This is based on Brad's work: /home/others/kstanton/dev/crs-param-cpp/trunk

Required files:

The parameterization program: coarse_grain.exe

The Config file: CrsParamConfig.ini

The post parameterization plot script: param_results_plot.txt


To check out the parameterization source, compile the program, and run it:

1. Check out the source using Subversion: svn co svn+ssh://Bme-Earth/subversion/crs-param-cpp

Another useful command is svn list svn+ssh://Bme-Earth/subversion, which shows the directory structure of the Subversion repository on Bme-Earth.

2. Run make in the trunk directory. This should compile the source.

3. Edit the config file CrsParamConfig.ini and make any changes. It is important to provide the correct directory for the Lennard-Jones data if running from weighted Lennard-Jones data, or the correct archive file if running from xyz archive data.

3.1 Make sure that the Lennard-Jones data is present in the specified directory and that the energy versus distance file with the name given in the config file is also in that directory. If you would like the results plot after the parameterization, make sure that param_results_plot.txt is located in the same directory, as it is run on completion of parameterization.

4. Run the coarse_grain.exe program to obtain optimized parameters for the structures provided.


Program Flow

1. Configs are initialized from the config file

2. The archive file is split into xyz files

3. An array (a vector container object) of molecule objects is made. Each molecule object represents an xyz file that holds a pair of geometries whose energy is to be tested.

4. Each molecule object is initialized from the xyz files.

5. Analyze is called to get the all-atom energy.

5.1 Alternatively, instead of using an archive, the energy versus distance data can be obtained from Lennard-Jones data.

6. xyzeul is called to get the Gay-Berne coordinates and angles.

7. All of this information is stored in the molecule object.

8. The array of molecules is then compared against the outlier cutoff energy, and all molecule pairs with too large an energy are discarded.

9. The optimizer is then called with a minimization function and the scaled initial guess parameters (guesses and scales come from the config file).

10. Minimization is performed on a least-squares basis over all geometries: the optimizer minimizes the squared difference between the Gay-Berne energy and the all-atom energy, summed over all geometries. Egb (the GB calculation) is invoked from a molecule method that calculates the GB energy at the current parameters on the fly. Scaling is applied within the optimizer as well.

11. The RMS is calculated and the final parameters are sent to stdout.
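The outlier filter, the least-squares objective, and the final RMS from the steps above can be sketched as follows. This is an illustration only, not the program's C++ code: all function names are hypothetical, and a 12-6 Lennard-Jones expression stands in for the Gay-Berne energy so that the sketch stays short and runnable.

```python
import math


def toy_energy(params, distance):
    # Stand-in model (12-6 Lennard-Jones), NOT the Gay-Berne form used by
    # the real program; it only makes the sketch runnable.
    epsilon, sigma = params
    ratio6 = (sigma / distance) ** 6
    return 4.0 * epsilon * (ratio6 * ratio6 - ratio6)


def drop_outliers(geometries, threshold):
    """Keep (distance, all_atom_energy) pairs at or below the cutoff."""
    return [(d, e) for d, e in geometries if e <= threshold]


def sum_squared_error(params, geometries):
    """Least-squares objective summed over all retained geometries."""
    return sum((toy_energy(params, d) - e) ** 2 for d, e in geometries)


def rms_error(params, geometries):
    """Root-mean-square deviation between model and reference energies."""
    return math.sqrt(sum_squared_error(params, geometries) / len(geometries))


# Hypothetical reference data generated from the toy model itself,
# plus one strongly repulsive point that the cutoff should remove.
true_params = (0.3, 3.5)
geometries = [(d, toy_energy(true_params, d)) for d in (3.6, 4.0, 4.5, 5.0)]
geometries.append((2.5, toy_energy(true_params, 2.5)))
kept = drop_outliers(geometries, threshold=5.0)
print(len(kept), rms_error(true_params, kept))
```

An optimizer would repeatedly evaluate sum_squared_error at trial parameters. The RMS is reported once, at the final parameters.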

Config file

You must specify initial guesses for all the Gay-Berne parameters. These are doubles.

Ex. dInitE0 = 0.5

To leave a parameter out of the optimization, set its optimize flag to boolean false (0).

Ex. bFindE0Flag = 0 #don’t optimize this

Ex. bFindE0Flag = 1 #optimize this
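For scripts that need to read these settings, a small parser along the following lines may be enough. It assumes the file consists of `key = value` lines with optional `#` comments, which is inferred from the examples above. The program's own C++ parser is not shown on this page and may behave differently, and parse_config is a hypothetical helper name.

```python
def parse_config(text):
    """Parse 'key = value  # comment' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key = value
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Sample text built from the example lines on this page.
sample = """
dInitE0 = 0.5
bFindE0Flag = 0 #don't optimize this
"""
cfg = parse_config(sample)
init_e0 = float(cfg["dInitE0"])          # initial guesses are doubles
optimize_e0 = cfg["bFindE0Flag"] == "1"  # flags are 0 or 1
print(init_e0, optimize_e0)
```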

strLJEnergyFile

strLJEnergyDir

If you are using Lennard-Jones data from, for example, Brad's program, put the energy versus distance information in the file named by strLJEnergyFile in the config file. This file must reside in the directory named by strLJEnergyDir in the config file. The same directory must also contain the corresponding xyz files, named using their atomic separation distance. Ex: d.10.9.xyz, where 10.9 is the separation distance in angstroms. The energy versus distance file is tab-delimited, distance then energy, with each line separated by a carriage return.
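Before a long run it can help to check that the energy file and the xyz files agree. The sketch below reads the tab-delimited table and lists the xyz files that are missing. The helper names and sample energies are hypothetical, and the code assumes the xyz file name reuses the distance exactly as it is written in the energy file.

```python
import os


def read_lj_energy_table(text):
    """Return (distance_token, energy) pairs from tab-delimited lines."""
    rows = []
    for line in text.splitlines():  # handles \n, \r\n and bare \r endings
        line = line.strip()
        if not line:
            continue
        distance_token, energy_token = line.split("\t")
        rows.append((distance_token, float(energy_token)))
    return rows


def xyz_name(distance_token):
    # Assumption: the file name reuses the distance exactly as written
    # in the energy file, e.g. "10.9" -> "d.10.9.xyz".
    return "d." + distance_token + ".xyz"


def missing_xyz_files(rows, directory):
    """List the xyz files the table expects but the directory lacks."""
    return [xyz_name(d) for d, _ in rows
            if not os.path.isfile(os.path.join(directory, xyz_name(d)))]


# Hypothetical two-line energy table (distance in angstroms, then energy).
table = "10.9\t-0.12\n11.0\t-0.10\n"
rows = read_lj_energy_table(table)
print([xyz_name(d) for d, _ in rows])
```

Keeping the distance as a string avoids float formatting surprises when the file name is rebuilt.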

The arc file is the archive of geometries to be tested (if testing using an archive)

The Tinker key file is needed so that analyze knows to use AMOEBA or whichever all-atom parameter set you want.

The scale values are scaling factors for the optimizer.

Ex. dScaleDw = 7.0e1

Fminimum is an optimizer parameter (I don’t know what this does).

Fgrdmin is the minimum gradient goal for the optimizer. The optimizer stops when it reaches this value.

fDelta is the size of the delta value used in calculating the numerical gradients with respect to the parameters.
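To illustrate how a delta value and a gradient goal typically interact, here is a minimal sketch. It assumes a forward difference and a Euclidean gradient norm, neither of which is confirmed for the actual optimizer. The function names and the quadratic test function are hypothetical.

```python
import math


def numerical_gradient(func, params, delta):
    """Forward-difference gradient of func at params with step delta."""
    base = func(params)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += delta  # perturb one parameter at a time
        grad.append((func(shifted) - base) / delta)
    return grad


def gradient_norm(grad):
    # Assumption: convergence is judged on the Euclidean norm.
    return math.sqrt(sum(g * g for g in grad))


def quadratic(p):
    # Hypothetical objective with its minimum at (1.0, -0.5).
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 0.5) ** 2


f_delta = 1e-6    # plays the role of fDelta
f_grdmin = 1e-3   # plays the role of Fgrdmin
grad_start = numerical_gradient(quadratic, [0.0, 0.0], f_delta)
grad_min = numerical_gradient(quadratic, [1.0, -0.5], f_delta)
print(gradient_norm(grad_start) < f_grdmin, gradient_norm(grad_min) < f_grdmin)
```

A smaller delta reduces the truncation error of the difference quotient but increases round-off error, so the value is a compromise.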

dOutlierEnergyThreshold is the cutoff energy of the all-atom model above which a given geometry is excluded as an outlier.

bUseLJdata: 1 specifies to use the supplied Lennard-Jones data in the required format.

0 specifies to obtain parameters using analyze and an archive file (this requires analyze in the same directory).

Needed to compile and run

Source files (.c, .cpp, and .h)

Libtinker.a

Makefile


Needed to run

Executable

Config file

Xyzeul (no longer required)

Tinker.key

Amoeba

Analyze (only if not using Lennard-Jones data)

Specified archive file (only if not using Lennard-Jones data)

Lennard-Jones data directory with correct file (only if using Lennard-Jones data)

Gbvsallatom plot script (only needed for output graphs)


Needed to debug

Fortran and C++ source files

Gdb

Eclipse (technically not required, but using gdb manually without an IDE can be time consuming)

Code compiled with the -g option (should be turned on in the makefile)

libtinker.a must first be compiled with -g, then the parameterization code compiled with -g

This is currently on in the makefile, and the libtinker.a in the crs-param-cpp directory is already compiled with -g

Scripting

Currently all of the input to the parameterization program comes from either the config file or the archive file. This means that Perl scripts etc. must modify the config file to specify a different archive or different initial guesses. Because it may be useful to specify initial guesses or archive files on the command line, this may be added.

All output is sent to stdout, with the exception of the GB vs all-atom energy vs distance profile plot, which is written separately.

Output for long jobs can be sent to a file using the ">" redirection operator.

Ex. coarse_grain.exe > output.txt
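A wrapper script for this workflow could look like the sketch below: it rewrites one value in the config file, runs the program, and captures stdout in a log file. The helper names are hypothetical, and the `key = value` format is inferred from the config examples. The sketch also assumes that the executable is started as ./coarse_grain.exe from the directory containing the config file.

```python
import subprocess


def set_config_value(text, key, value):
    """Return config text with `key` set to `value`, keeping comments."""
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        body, sep, comment = line.partition("#")
        name, eq, _old = body.partition("=")
        if eq and name.strip() == key:
            lines[i] = f"{key} = {value}" + (f" #{comment}" if sep else "")
            found = True
    if not found:
        raise KeyError(key)
    return "\n".join(lines) + "\n"


def run_job(config_path, key, value, log_path, exe="./coarse_grain.exe"):
    """Edit the config in place, run the program, and log its stdout."""
    with open(config_path) as fh:
        text = fh.read()
    with open(config_path, "w") as fh:
        fh.write(set_config_value(text, key, value))
    with open(log_path, "w") as log:
        # Assumption: the program reads the config from its working directory.
        return subprocess.run([exe], stdout=log).returncode


# Example call (not executed here):
# run_job("CrsParamConfig.ini", "dInitE0", 0.7, "output.txt")
```

Calling run_job in a loop over several initial guesses gives one log file per run without editing the config by hand.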

Future Directions

Integration with Brad's RNA program to obtain Gay-Berne values for biologically important molecules. (Done)

Fortran 90 ?