Tutorial:kturn tut: Difference between revisions

From biowiki
Jump to navigation Jump to search
Brendan (talk | contribs)
Page created
 
Brendan (talk | contribs)
No edit summary
Line 1: Line 1:
= Purpose =
= Purpose =
This page documents the standard protocol for preparing and running molecular dynamics (MD) simulations of RNA k-turn systems (and other similar small biomolecular solutes) using AMOEBA in Tinker9. It takes a starting PDB structure through system preparation, equilibration, and production MD. Umbrella sampling / pulling is '''not''' covered here.
This page documents the standard protocol for preparing and running molecular dynamics (MD) simulations of RNA k-turn systems (and other similar small biomolecular solutes) using AMOEBA in Tinker9. It takes a starting PDB structure through system preparation, equilibration, and production MD. Umbrella sampling / pulling is '''not''' covered here; see the umbrella sampling / REUS pages for that.
 
The protocol is implemented by <code>~/scripts/base_masterscript.sh</code> (copied into each project as <code>masterscript.sh</code>). This page describes what that script does, the steps before it runs, and the reasoning behind each choice.


= Overview =
= Overview =


# Obtain a clean starting PDB
# Obtain a clean starting PDB
# Standardize the PDB via a <code>pdbxyz</code> → <code>xyzpdb</code> round-trip, which also generates the <code>.seq</code> file
# Standardize the PDB via a <code>pdbxyz</code> → <code>xyzpdb</code> round-trip
# Solvate + neutralize + build a key file with Tinker-GUI
# Solvate + neutralize + build a key file with <code>Tinker-GUI/cli.py</code>
# Energy minimization (<code>minimize9</code>)
# Energy minimization (<code>minimize9</code>)
# Heating: 5 temperatures with strong position restraints on the solute (NVT)
# Heating: 5 temperatures with strong position restraints on the solute (NVT)
Line 12: Line 14:
# Production: unrestrained NPT MD
# Production: unrestrained NPT MD


Force field: AMOEBA (<code>amoebabio18.prm</code>). Engine: [https://github.com/TinkerTools/tinker Tinker8] and [https://github.com/TinkerTools/tinker9 Tinker9] (<code>dynamic9</code>, <code>minimize9</code>, <code>pdbxyz</code>, <code>xyzpdb</code>).
Force field: AMOEBA (<code>amoebabio18.prm</code>). Engine: Tinker9 (<code>dynamic9</code>, <code>minimize9</code>).


= 1. PDB preparation =
= 1. PDB preparation =
Line 18: Line 20:
== Source PDB ==
== Source PDB ==


Start from a crystal / cryo-EM structure in PDB format. A good test / example system for this protocol is '''[https://www.rcsb.org/structure/5FJ0 PDB 5FJ0]''' — a 27-nt RNA k-turn. Download the file in '''PDB format''' (on RCSB: ''Download Files → PDB Format''); the direct link is [https://files.rcsb.org/download/5FJ0.pdb files.rcsb.org/download/5FJ0.pdb]. The mmCIF format will not work with <code>pdbxyz</code>.
Start from a crystal / cryo-EM structure in PDB format. Before running anything:
 
Before running anything:


* Decide the protonation states of ionizable residues (for RNA/DNA the standard assumption is deprotonated phosphates; for proteins use PROPKA or similar).
* Decide the protonation states of ionizable residues (for RNA/DNA the standard assumption is deprotonated phosphates; for proteins use PROPKA or similar).
Line 28: Line 28:
== pdbxyz → xyzpdb round-trip ==
== pdbxyz → xyzpdb round-trip ==


The goal of this step is to produce a PDB in the exact format that <code>xyzpdb</code> will emit from then on, so that any future PDB written from an MD trajectory (for visualization, diffing, or restart) is line-for-line comparable to the starting reference.
The goal of this step is to produce a PDB in the exact format that <code>xyzpdb</code> will emit from then on, so that any future PDB written from an MD trajectory (e.g. for visualization, diffing, or restart) is line-for-line comparable to the starting reference.


  pdbxyz  myfile.pdb  -k tinker.key
  pdbxyz  input.pdb  -k tinker.key
  xyzpdb  myfile.xyz  -k tinker.key
  xyzpdb  input.xyz  -k tinker.key


Where <code>tinker.key</code> points at the AMOEBA parameter file:
Where <code>tinker.key</code> points at the AMOEBA parameter file:
parameters /home/yw24267/programs/tinker/Tinker8/2501/params/amoebabio18.prm


parameters /path/to/Tinker8/latest/params/amoebabio18.prm
The resulting <code>input.pdb</code> (overwriting the input, so back up the original) is the '''canonical''' PDB for this system. Use this as your reference structure from this point on — not the crystal PDB — when comparing to simulation frames.
 
'''pdbxyz outputs two files:'''
* <code>myfile.xyz</code> — the Tinker-format coordinate file
* <code>myfile.seq</code> — a sequence / base-pair map of the solute. This file is required whenever <code>xyzpdb</code> is run.
 
'''xyzpdb requires the .seq file to share the same prefix as the .xyz''' — there is no command-line option to point at a specific .seq. If you call <code>xyzpdb myfile.xyz</code>, Tinker looks for <code>myfile.seq</code> in the same directory. If you rename or move the .xyz, rename / copy the .seq to match.
 
'''Output files are not overwritten.''' Tinker writes to the next available numeric suffix:
 
myfile.pdb      (original)
myfile.pdb_2    (first xyzpdb output)
myfile.pdb_3    (next xyzpdb output)
 
Same convention for <code>.xyz_2</code>, <code>.xyz_3</code>, etc. The '''canonical''' PDB for this system is the first xyzpdb output (<code>myfile.pdb_2</code>). Use that — not the raw crystal PDB — as your reference structure from this point on when comparing to simulation frames.


'''Why do the round-trip:'''
'''Why do the round-trip:'''
* The crystal PDB contains headers (<code>HEADER</code>, <code>TITLE</code>, <code>SOURCE</code>, <code>COMPND</code>, <code>REMARK</code>, <code>MASTER</code>, <code>CONECT</code>…) and metadata that <code>xyzpdb</code> does not emit.
* The crystal PDB contains headers (<code>HEADER</code>, <code>TITLE</code>, <code>SOURCE</code>, <code>COMPND</code>, <code>REMARK</code>, <code>MASTER</code>, <code>CONECT</code>…) and metadata that <code>xyzpdb</code> does not emit.
* Hydrogens added by <code>pdbxyz</code> get standardized names/ordering.
* Hydrogens added by <code>pdbxyz</code> get standardized names/ordering.
* Atom numbering, chain ID handling, and record termination all match the Tinker convention after the round-trip.
* Atom numbering, chain ID handling, and record termination match the Tinker convention after the round-trip.


Without this step, a diff between the input PDB and a later xyzpdb output is dominated by format noise rather than real coordinate differences.
Without this step, a diff between the input PDB and later xyzpdb output is dominated by format noise rather than real coordinate differences.


= 2. Solvation and neutralization (Tinker-GUI) =
= 2. Solvation and neutralization (Tinker-GUI) =


System building is handled by <code>cli.py</code> in [https://github.com/prenlab/Tinker-GUI Tinker-GUI] at <code>/path/to/Tinker-GUI</code>. The program takes a YAML file and produces a minimization-ready <code>.xyz</code> / <code>.key</code> pair, plus intermediate files in <code>temp/</code>.
System building is handled by <code>cli.py</code> in [https://github.com/prenlab/Tinker-GUI Tinker-GUI]. The CLI takes a YAML config and produces a minimized-ready <code>.xyz</code> / <code>.key</code> pair plus intermediates in <code>temp/</code>.


== config.yaml ==
== config.yaml ==
Line 68: Line 54:


<syntaxhighlight lang="yaml">
<syntaxhighlight lang="yaml">
tinker_path: /path/to/Tinker8/latest/bin
tinker_path: /home/yw24267/programs/tinker/Tinker8/2501/bin
amoeba_prm:  /path/to/Tinker8/latest/params/amoebabio18.prm
amoeba_prm:  /home/yw24267/programs/tinker/Tinker8/2501/params/amoebabio18.prm
output_prefix: minimize
output_prefix: minimize
solutes:
solutes:
   nucleic_acid:
   nucleic_acid:
     - myfile.pdb           # use the round-tripped PDB from step 1
     - myrna.pdb             # use the round-tripped PDB from step 1
solvent:
solvent:
   name: water
   name: water
Line 97: Line 83:


'''Key fields:'''
'''Key fields:'''
* <code>output_prefix</code> — the prefix for the final output files. If set to <code>X</code>, Tinker-GUI writes <code>X_final.xyz</code> and <code>X_final.key</code>. In the rest of this page we use <code>minimize</code>, giving <code>minimize_final.xyz</code> / <code>minimize_final.key</code>.
* <code>output_prefix: minimize</code> — the downstream masterscript expects <code>minimize_final.xyz</code> / <code>minimize_final.key</code>, so keep this as <code>minimize</code>.
* <code>neutralizers</code> — Tinker-GUI adds whichever of these are needed to zero out the net charge. For RNA (negative backbone) you'll typically need Mg²⁺ and/or Na⁺.
* <code>neutralizers</code> — the CLI adds whichever of these are needed to zero out the net charge. For RNA (negative backbone) you'll typically need Mg²⁺ and/or Na⁺.
* <code>salts.concentrations</code> — set to 0.0 for pure counter-ion neutralization. Use non-zero values (mol/L) to add bulk salt on top of neutralization.
* <code>salts.concentrations</code> — set to 0.0 for pure counter-ion neutralization. Use non-zero values (mol/L) to add bulk salt on top of neutralization.
* <code>buffer: 12.0</code> — distance from solute to the nearest box face. 12–15 Å is standard; 15 Å is safer for larger conformational changes.
* <code>buffer: 12.0</code> — distance from solute to the nearest box face. 12–15 Å is standard; 15 Å is safer for larger conformational changes.


== Running Tinker-GUI ==
== Running the CLI ==


  python /path/to/Tinker-GUI/cli.py -c config.yaml
  python /home/bae969/programs/Tinker-GUI/cli.py -c config.yaml


'''Outputs:'''
'''Outputs:'''
* <code>minimize_final.xyz</code> — solvated, neutralized starting structure
* <code>minimize_final.xyz</code> — solvated, neutralized starting structure
* <code>minimize_final.key</code> — matching key file with box dimensions and parameter path
* <code>minimize_final.key</code> — matching key file with box dimensions and parameter path
* <code>temp/</code> — intermediate files (solute-only xyz, water box, per-ion-addition snapshots, logs)
* <code>temp/</code> — intermediate files (solute-only xyz, water box, per-ion-addition snapshots, logs). '''Do not delete''' <code>temp/minimize.xyz</code> — the masterscript reads it to determine how many solute atoms to restrain.
 
'''Rerunning:''' if Tinker-GUI errors or you need to change config and re-run, always clear the intermediate directory first:
 
rm -rf temp
 
Otherwise Tinker-GUI may pick up stale intermediates and silently produce a mismatched output. The same applies if you need to rebuild from scratch for any reason — delete <code>temp/</code>.


== Sanity check ==
== Sanity check ==
Line 123: Line 103:
  analyze  minimize_final.xyz  -k minimize_final.key  em
  analyze  minimize_final.xyz  -k minimize_final.key  em


* '''Net charge should be 0''' (within rounding). If it isn't, see below.
* '''Net charge should be 0''' (within rounding). If not, the neutralizer list in config.yaml needs adjustment.
* Total dipole should be finite (no NaNs).
* Total dipole should be finite (no NaNs).


Line 129: Line 109:


* Every atom should have multipole and vdW parameters. Missing parameters on any atom means an unrecognized residue / atom name that <code>pdbxyz</code> couldn't map.
* Every atom should have multipole and vdW parameters. Missing parameters on any atom means an unrecognized residue / atom name that <code>pdbxyz</code> couldn't map.
=== Fixing a non-zero net charge ===
If <code>analyze ... em</code> reports a non-zero net charge, either the neutralizer list in <code>config.yaml</code> was wrong, or the counts Tinker-GUI added don't quite cancel the solute charge. Don't re-run the full pipeline — edit the <code>.xyz</code> directly. Two operations cover every case:
'''Converting a water to an ion''' (when net charge is negative and you need another cation):
# Find a water molecule whose oxygen is far from the solute (e.g. ≥ 5 Å away).
# Delete both hydrogen lines of that water.
# Change the oxygen's atom type (last integer before the connectivity columns) to the desired ion's AMOEBA type — e.g. for Na⁺ in amoebabio18 the oxygen type (<code>349</code>) becomes Na⁺ type (<code>352</code>).
# Clear the oxygen line's connectivity entries (an ion has no bonds).
# Update the atom count on line 1 of the file to match the new total (decrease by 2 for every water → ion conversion).
# '''Do not renumber the remaining atom indices''' — Tinker is tolerant of gaps.
'''Deleting an ion''' (when net charge is positive and you have one more cation than needed):
# Delete the ion's atom line.
# Decrease the atom count on line 1 by 1.
# '''Do not renumber''' — leave the gap in the indices.
After either operation, re-run <code>analyze ... em</code> to confirm the net charge is now 0. (There's also a script in <code>~/scripts/</code> that does the water → Na⁺ replacement in bulk while respecting distance filters — useful when many waters need replacing to reach a target salt concentration.)


= 3. Minimization =
= 3. Minimization =


Performed by <code>minimize9</code> with a gradient tolerance of 0.5 kcal/mol/Å, using a key file that includes a position restraint on the solute:
Performed by <code>minimize9</code> with a gradient tolerance of 0.5 kcal/mol/Å using a key file that '''includes a position restraint on the solute''':


  RESTRAIN-POSITION  -1  N  50.0
  RESTRAIN-POSITION  -1  N  50.0


'''<code>N</code> = number of atoms in the original PDB''' (the solute only — the RNA). <code>RESTRAIN-POSITION -1 N 50.0</code> restrains atoms 1 through N, leaving everything after atom N (added water, neutralizing ions, bulk salt) free to move. The force constant 50.0 is in kcal/mol/Ų.
where <code>N</code> is the number of solute atoms (read by the masterscript from <code>temp/minimize.xyz</code>). The restraint keeps the solute near its starting coordinates while water and ions relax around it. The force constant 50.0 is in kcal/mol/Ų.
 
'''If the source PDB already contained waters or ions that you kept through pdbxyz''', set <code>N</code> to just the RNA atom count, not the RNA + original-waters count — otherwise those waters / ions are held in place too (a 1000-atom RNA plus 15 crystal waters gives a pdbxyz .xyz with 1045 atoms, but you should still set <code>N = 1000</code> so the crystal waters can move).


Command:
The exact command run by the masterscript:


  minimize9  minimize_final.xyz  -k minimize.key  0.5
  minimize9  minimize_final.xyz  -k minimize.key  0.5
The trailing <code>0.5</code> is the gradient tolerance (RMS gradient at which minimization stops, kcal/mol/Å). '''0.5 is a good default'''; if minimization is unusually slow or grinds without converging, loosen it to e.g. <code>2.0</code> — larger value = looser convergence = fewer iterations.


'''Why a restraint during minimization:''' without it, small clashes between water/ion placement and the solute can pull the solute geometry off the reference structure before dynamics even starts. The restraint also suppresses any collapse of the experimentally-determined conformation during the initial energy drop.
'''Why a restraint during minimization:''' without it, small clashes between water/ion placement and the solute can pull the solute geometry off the reference structure before dynamics even starts. The restraint also suppresses any collapse of the experimentally-determined conformation during the initial energy drop.
Line 169: Line 126:
= 4. Heating (restrained, NVT) =
= 4. Heating (restrained, NVT) =


Five sequential NVT runs at increasing temperatures. All use the same key file (<code>heating.key</code>) with <code>RESTRAIN-POSITION -1 N 50.0</code> on the solute and <code>polar-eps 0.01</code>.
Five sequential NVT runs at increasing temperatures, all with the same key file (<code>heating.key</code>, FC = 50 kcal/mol/Ų on all solute atoms, <code>polar-eps 0.01</code>):


{| class="wikitable"
{| class="wikitable"
Line 185: Line 142:
|}
|}


Command used for each temperature:
Command used for each:


  dynamic9  pre_production.xyz  -k heating.key  200000  2.0  10.0  2  <T>
  dynamic9  pre_production.xyz  -k heating.key  200000  2.0  10.0  2  <T> 1.0


Argument order: <code>nsteps  dt_fs  write_interval_ps  ensemble(2=NVT)  T_K</code>. NVT does not take a pressure argument — do '''not''' append <code>1.0</code> at the end.
Argument order: <code>nsteps  dt_fs  write_interval_ps  ensemble(2=NVT)  T_K P_atm</code>.


The output trajectory <code>pre_production.arc</code> accumulates across all five heating steps (Tinker9 appends to an existing <code>.arc</code>). Total heating time: 2 ns across 5 temperature rungs.
The output trajectory <code>pre_production.arc</code> accumulates across all five heating steps (Tinker9 appends to an existing <code>.arc</code>). Total heating time: 2 ns across 5 temperature rungs.
Line 200: Line 157:
= 5. Restraint phase-down (NVT, 298 K) =
= 5. Restraint phase-down (NVT, 298 K) =


Four sequential NVT runs at 298 K, stepping the solute position-restraint force constant down:
Four sequential NVT runs at 298 K, each stepping the solute position-restraint force constant down:


{| class="wikitable"
{| class="wikitable"
Line 211: Line 168:
| Restraint_0.1  || 0.1  || 200 000 || 400
| Restraint_0.1  || 0.1  || 200 000 || 400
|-
|-
| Restraint_None  || — (no restraint line) || 200 000 || 400
| Restraint_None  || —   || 200 000 || 400
|}
|}


Each step uses a separate key file (<code>restraint_10.key</code>, <code>restraint_1.key</code>, <code>restraint_0.1.key</code>, <code>restraint_none.key</code>) that differs only in the RESTRAIN-POSITION line (absent in the last one).
Each step uses a separate key file (<code>restraint_10.key</code>, <code>restraint_1.key</code>, <code>restraint_0.1.key</code>, <code>restraint_none.key</code>) that differs only in the RESTRAIN-POSITION line (absent in the last one).


Command (NVT, no pressure argument):
Command:


  dynamic9  pre_production.xyz  -k restraint_<FC>.key  200000  2.0  10.0  2  298
  dynamic9  pre_production.xyz  -k restraint_<FC>.key  200000  2.0  10.0  2  298 1.0


'''Why step down rather than release at once:''' an abrupt release at FC = 50 → 0 can cause a sudden "spring release" where the solute snaps to a lower-energy conformation that is different from the experimental reference. A 50 → 10 → 1 → 0.1 → 0 sequence lets the solute relax gradually while solvent / ion distribution tracks it.
'''Why step down rather than release at once:''' an abrupt release at FC = 50 → 0 can cause a sudden "spring release" where the solute snaps to a lower-energy conformation that is different from the experimental reference. A 50 → 10 → 1 → 0.1 → 0 sequence lets the solute relax gradually while solvent/ion distribution tracks it.


After this stage the system is fully equilibrated at 298 K, unrestrained, at the constant-volume box from solvation. 1.6 ns total across the 4 steps.
After this stage the system is fully equilibrated at 298 K, unrestrained, at the constant-volume box from solvation. 1.6 ns total across the 4 steps.
Line 226: Line 183:
= 6. Production (NPT) =
= 6. Production (NPT) =


The production run is NPT at 298 K, 1 atm, unrestrained:
The production run is NPT at 298 K, 1 atm, unrestrained, with a tighter polarization convergence:


* <code>polar-eps 0.0001</code> (vs 0.01 during equilibration)
* <code>barostat MonteCarlo</code>, <code>thermostat bussi</code>
* <code>barostat MonteCarlo</code>, <code>thermostat bussi</code>
* <code>integrator respa</code>, 2 fs timestep
* <code>integrator respa</code>, 2 fs timestep
* <code>polar-eps</code> — tunable, see below


'''Starting frame:''' extract the final frame of the pre-production trajectory into <code>production.xyz</code>:
'''Starting frame:''' the masterscript extracts the '''final frame''' of the <code>pre_production.arc</code> trajectory (the last <code>xyz_linecount</code> lines) into <code>production.xyz</code>, then launches production from there:


xyz_linecount=$(wc -l < pre_production.xyz)
  tail -n "$xyz_linecount" pre_production.arc > production.xyz
  tail -n "$xyz_linecount" pre_production.arc > production.xyz
 
  dynamic9  production.xyz  -k production.key  20000000 2.0  10.0  4  298  1.0
Then launch production (NPT, so a pressure argument is required):
 
  dynamic9  production.xyz  -k production.key  5000000 2.0  10.0  4  298  1.0


Argument order (NPT): <code>nsteps  dt_fs  write_interval_ps  ensemble(4=NPT)  T_K  P_atm</code>.
Argument order (NPT): <code>nsteps  dt_fs  write_interval_ps  ensemble(4=NPT)  T_K  P_atm</code>.


'''Production length is case-dependent.''' 5 000 000 steps × 2 fs = '''10 ns''' is an arbitrary default that's fine for a first pass; adjust <code>nsteps</code> based on the system and the scientific question. Short equilibration-like checks can be 5 ns; publication-quality sampling for slow degrees of freedom may need 50–100 ns or more.
20 000 000 steps × 2 fs = '''40 ns''' production. Frames are written every 10 ps (4 000 frames total). Adjust <code>nsteps</code> for the system / question — 20–100 ns is typical.
 
== polar-eps tradeoff ==
 
<code>polar-eps</code> sets the convergence threshold for the induced-dipole solver each step. '''It directly trades simulation speed against energy / force accuracy.'''
 
{| class="wikitable"
! polar-eps !! Relative speed !! Notes
|-
| 0.01    || fastest  || Often sufficient for structural observables and general equilibration
|-
| 0.001    || medium  || Reasonable middle ground for routine production
|-
| 0.0001  || slow    || High accuracy; use when energetics / forces matter (e.g. free energy work)
|-
| 0.00001  || slowest  || Very precise; rarely needed for structural work
|}
 
More zeros = tighter convergence = more solver iterations per step = slower. Pick based on the tradeoff: 0.01 is often good enough, 0.0001 is very precise. Default to something around 0.001 unless you know you need tighter.


'''Why NPT for production:''' the box dimensions from the solvation step correspond to a density that is close to but not exactly 1.0 g/cc at 298 K under the AMOEBA potential. NPT lets the box settle to the correct density. Skipping this and running NVT with the as-built box can leave the system slightly compressed or stretched, which biases structural observables.
'''Why NPT for production:''' the box dimensions from the solvation step correspond to a density that is close to but not exactly 1.0 g/cc at 298 K. NPT lets the box settle to the correct density under the AMOEBA potential. Skipping this and running NVT with the as-built box can leave the system slightly compressed or stretched, which biases structural observables.


= 7. Running on a GPU =
'''Why tighter <code>polar-eps</code>:''' 0.01 is fine for equilibration (faster, noise is averaged out over the short run). For production the induced-dipole error should be as small as practical because it enters every observable; 1e-4 is the routine choice and 1e-5 is worth it if energetics are the main observable.


All Tinker9 MD above runs on a single GPU. Select which GPU before launching:
= 7. Running the whole pipeline =


export CUDA_VISIBLE_DEVICES=0
Copy <code>~/scripts/base_masterscript.sh</code> into the project directory as <code>masterscript.sh</code>. It assumes:


The integer is the GPU index on that node. Change it (<code>1</code>, <code>2</code>, …) to select a different GPU on multi-GPU nodes.
* <code>minimize_final.xyz</code> and <code>minimize_final.key</code> are present in the current directory (outputs of step 2).
* <code>temp/minimize.xyz</code> is present (so <code>N</code> = solute atom count can be read).
* Tinker9 binaries are at <code>/home/yw24267/programs/tinker/Tinker9/latest/build</code> (edit <code>TINKER_BIN</code> if yours differ).
* The correct GPU is selected via <code>CUDA_VISIBLE_DEVICES</code>.


Typical launch on a GPU node:
Typical launch on a GPU node:
Line 280: Line 218:
  nohup bash masterscript.sh > masterscript.log 2>&1 &
  nohup bash masterscript.sh > masterscript.log 2>&1 &


= Notes and common pitfalls =
'''What the script generates on the fly''' (overwriting each run):
 
* <code>heating.key</code>, <code>minimize.key</code> (FC = 50)
* '''Induced dipole not converging''' during heating almost always means the starting structure has a bad contact. Re-check minimization went to a reasonable gradient, and that the solvation step didn't place a water atop a solute atom. You can temporarily add <code>USOLVE-CUTOFF 0.0</code> to the heating key if convergence is marginal; remove for production.
* <code>restraint_10.key</code>, <code>restraint_1.key</code>, <code>restraint_0.1.key</code>
* '''RESTRAIN-POSITION force constant is in kcal/mol/Ų''', not kcal/mol. Don't change the numbers without understanding the scale.
* <code>restraint_none.key</code>, <code>production.key</code>
* '''Restarting a crashed run:''' Tinker9 resumes from <code>.dyn</code> if present. To restart from scratch, remove both <code>.dyn</code> and <code>.arc</code> (otherwise the new trajectory appends to the old one).
* '''Stopping a running MD cleanly:''' <code>touch pre_production.end</code> (or <code>production.end</code>). Tinker9 checks for this file each frame and stops gracefully, leaving a valid <code>.dyn</code> to resume from.
* '''Always <code>rm -rf temp</code>''' before re-running Tinker-GUI.


= Example masterscript =
All derived from <code>minimize_final.key</code> — it reads the <code>a-axis</code> / <code>b-axis</code> / <code>c-axis</code> / <code>pme-grid</code> lines from there and appends them, then adds or omits the <code>RESTRAIN-POSITION</code> line and sets the appropriate <code>polar-eps</code>.


A reference Bash script that runs minimization → heating → restraint phase-down → production using the key-file and convention described above. Edit <code>TINKER_BIN</code> and the parameter path at the top to match your installation.
= Summary of wall-clock and file sizes =


<syntaxhighlight lang="bash">
{| class="wikitable"
#!/usr/bin/env bash
! Stage !! MD time !! Runtime (rough, single GPU) !! Output
set -euo pipefail
|-
| Minimization                || —      || minutes      || minimize_final.xyz_2
|-
| Heating (5 × 400 ps)        || 2.0 ns || ~few hours    || pre_production.arc (appended)
|-
| Restraint phase-down        || 1.6 ns || ~few hours    || pre_production.arc (appended)
|-
| Production (20 M steps)    || 40 ns  || ~1–2 days    || production.arc
|}


# -------------------------------------------------------------------
Runtime scales with system size and GPU generation; the numbers above are order-of-magnitude for a ~700-atom solute in a ~60 Å water box on a modern GPU.
# Configuration
# -------------------------------------------------------------------
TINKER_BIN="/path/to/Tinker9/latest/build"
AMOEBA_PRM="/path/to/Tinker8/latest/params/amoebabio18.prm"


COUNT_FILE="./minimize_final.xyz"
= Notes and common pitfalls =
INIT_KEY="./minimize_final.key"


if [[ ! -f "$COUNT_FILE" ]]; then
* '''Induced dipole not converging''' during heating almost always means the starting structure has a bad contact. Re-check minimization went to a reasonable gradient, and that the solvation step didn't place a water atop a solute atom. You can temporarily add <code>USOLVE-CUTOFF 0.0</code> to the heating key if convergence is marginal; remove for production.
    echo "Error: $COUNT_FILE not found." >&2; exit 1
* '''Net charge non-zero after solvation''' means the <code>neutralizers</code> list in config.yaml didn't cover the real net charge. Re-run with the right ion set.
fi
* '''RESTRAIN-POSITION force constant is in kcal/mol/Ų''', not kcal/mol. The masterscript embeds this unit convention; do not change the numbers without understanding the scale.
if [[ ! -f "$INIT_KEY" ]]; then
* '''Do not delete <code>temp/</code>''' until after the masterscript has read <code>temp/minimize.xyz</code>. If you've deleted it, re-running <code>Tinker-GUI/cli.py</code> regenerates it.
    echo "Error: $INIT_KEY not found." >&2; exit 1
* '''Restarting a crashed run:''' Tinker9 resumes from <code>.dyn</code> if present. To restart from scratch, remove both <code>.dyn</code> and <code>.arc</code> (otherwise the new trajectory appends to the old one).
fi
* '''Stopping a running MD cleanly:''' <code>touch pre_production.end</code> (or <code>production.end</code>). Tinker9 checks for this file each frame and stops gracefully, leaving a valid <code>.dyn</code> to resume from.
 
# N = number of solute atoms in the original PDB (from the solute-only xyz
# produced by Tinker-GUI in temp/). Adjust manually if you want to restrain
# a different range.
first_line="$(head -n 1 ./temp/minimize.xyz)"
N="$(echo "$first_line" | grep -oE '[0-9]+' | head -n 1)"
if [[ -z "$N" ]]; then
    echo "Error: could not read atom count from ./temp/minimize.xyz" >&2
    exit 1
fi
 
# -------------------------------------------------------------------
# Pull a-axis / b-axis / c-axis lines from the starting key file
# -------------------------------------------------------------------
get_line_or_empty() {
    local key="${1}"
    grep -i -E "^[[:space:]]*$key\b" "$INIT_KEY" | head -n1 || true
}
A_AXIS="$(get_line_or_empty 'a-axis')"
B_AXIS="$(get_line_or_empty 'b-axis')"
C_AXIS="$(get_line_or_empty 'c-axis')"


append_axes() {
= See also =
  local outfile="$1"
  [[ -n "$A_AXIS" ]] && echo "$A_AXIS" >> "$outfile"
  [[ -n "$B_AXIS" ]] && echo "$B_AXIS" >> "$outfile"
  [[ -n "$C_AXIS" ]] && echo "$C_AXIS" >> "$outfile"
}


# -------------------------------------------------------------------
* [[Tutorial:tinkertut|Using Tinker and AMOEBA]] — general Tinker/AMOEBA reference
# Key file generator
* [[Software:tinkergpu|Tinker GPU]] — compiling and running Tinker9
# -------------------------------------------------------------------
* [https://github.com/prenlab/Tinker-GUI Tinker-GUI on GitHub]
write_key_file() {
  local outfile="$1"
  local restraint_val="$2"
  local polar_eps="$3"
  local add_restrain="$4"
 
  cat > "$outfile" <<EOF
parameters $AMOEBA_PRM
integrator respa
archive
 
ewald
ewald-cutoff 7.0
 
vdw-cutoff 12.0
vdw-correction
 
neighbor-list
polar-predict
 
polar-eps $polar_eps
 
thermostat bussi
barostat MonteCarlo
 
EOF
 
  if [[ "$add_restrain" == "yes" && -n "$restraint_val" ]]; then
    echo "RESTRAIN-POSITION -1 $N $restraint_val" >> "$outfile"
  fi
  append_axes "$outfile"
}
 
# filename              FC    polar-eps  restraint?
write_key_file "heating.key"        "50.0"  "0.01"    "yes"
write_key_file "minimize.key"        "50.0"  "0.01"    "yes"
write_key_file "restraint_10.key"    "10.0"  "0.01"    "yes"
write_key_file "restraint_1.key"    "1.0"  "0.01"    "yes"
write_key_file "restraint_0.1.key"  "0.1"  "0.01"    "yes"
write_key_file "restraint_none.key"  ""      "0.01"    "no"
write_key_file "production.key"      ""      "0.001"  "no"    # tune polar-eps to taste
 
# -------------------------------------------------------------------
# Minimization
# -------------------------------------------------------------------
echo "--- Energy minimization ---"
"$TINKER_BIN/minimize9" minimize_final.xyz -k minimize.key 0.5
 
cp minimize_final.xyz pre_production.xyz
 
# -------------------------------------------------------------------
# Heating (NVT, no pressure arg)
# -------------------------------------------------------------------
echo "--- Heating (5 rungs) ---"
for T in 10 75 150 225 298; do
    "$TINKER_BIN/dynamic9" pre_production.xyz -k heating.key \
        200000 2.0 10.0 2 "$T"
done
 
# -------------------------------------------------------------------
# Restraint phase-down (NVT, 298 K)
# -------------------------------------------------------------------
echo "--- Restraint phase-down ---"
for FC in 10 1 0.1 none; do
    "$TINKER_BIN/dynamic9" pre_production.xyz -k "restraint_${FC}.key" \
        200000 2.0 10.0 2 298
done
 
# -------------------------------------------------------------------
# Production (NPT, with pressure arg)
# -------------------------------------------------------------------
echo "--- Production ---"
xyz_linecount=$(wc -l < pre_production.xyz)
tail -n "$xyz_linecount" pre_production.arc > production.xyz
 
"$TINKER_BIN/dynamic9" production.xyz -k production.key \
    5000000 2.0 10.0 4 298 1.0
 
echo "--- Done ---"
</syntaxhighlight>

Revision as of 16:57, 22 April 2026

Purpose

This page documents the standard protocol for preparing and running molecular dynamics (MD) simulations of RNA k-turn systems (and other similar small biomolecular solutes) using AMOEBA in Tinker9. It takes a starting PDB structure through system preparation, equilibration, and production MD. Umbrella sampling / pulling is not covered here; see the umbrella sampling / REUS pages for that.

The protocol is implemented by ~/scripts/base_masterscript.sh (copied into each project as masterscript.sh). This page describes what that script does, the steps before it runs, and the reasoning behind each choice.

Overview

  1. Obtain a clean starting PDB
  2. Standardize the PDB via a pdbxyzxyzpdb round-trip
  3. Solvate + neutralize + build a key file with Tinker-GUI/cli.py
  4. Energy minimization (minimize9)
  5. Heating: 5 temperatures with strong position restraints on the solute (NVT)
  6. Restraint phase-down: gradually relax position restraints at 298 K (NVT)
  7. Production: unrestrained NPT MD

Force field: AMOEBA (amoebabio18.prm). Engine: Tinker9 (dynamic9, minimize9).

1. PDB preparation

Source PDB

Start from a crystal / cryo-EM structure in PDB format. Before running anything:

  • Decide the protonation states of ionizable residues (for RNA/DNA the standard assumption is deprotonated phosphates; for proteins use PROPKA or similar).
  • Remove or keep heteroatoms intentionally (crystallographic waters, buffer ions, ligands). pdbxyz will silently drop anything it doesn't recognize.
  • Decide what to do with missing atoms/residues. If the structure is incomplete, fix it externally first (e.g. with PDBFixer) before the steps below.

pdbxyz → xyzpdb round-trip

The goal of this step is to produce a PDB in the exact format that xyzpdb will emit from then on, so that any future PDB written from an MD trajectory (e.g. for visualization, diffing, or restart) is line-for-line comparable to the starting reference.

pdbxyz  input.pdb  -k tinker.key
xyzpdb  input.xyz  -k tinker.key

Where tinker.key points at the AMOEBA parameter file:

parameters /home/yw24267/programs/tinker/Tinker8/2501/params/amoebabio18.prm

The resulting input.pdb (overwriting the input, so back up the original) is the canonical PDB for this system. Use this as your reference structure from this point on — not the crystal PDB — when comparing to simulation frames.

Why do the round-trip:

  • The crystal PDB contains headers (HEADER, TITLE, SOURCE, COMPND, REMARK, MASTER, CONECT…) and metadata that xyzpdb does not emit.
  • Hydrogens added by pdbxyz get standardized names/ordering.
  • Atom numbering, chain ID handling, and record termination match the Tinker convention after the round-trip.

Without this step, a diff between the input PDB and later xyzpdb output is dominated by format noise rather than real coordinate differences.

2. Solvation and neutralization (Tinker-GUI)

System building is handled by cli.py in Tinker-GUI. The CLI takes a YAML config and produces a minimized-ready .xyz / .key pair plus intermediates in temp/.

config.yaml

Example for an RNA k-turn system with neutralizing Mg²⁺/Na⁺/Cl⁻, no added bulk salt, 12 Å buffer:

tinker_path: /home/yw24267/programs/tinker/Tinker8/2501/bin
amoeba_prm:  /home/yw24267/programs/tinker/Tinker8/2501/params/amoebabio18.prm
output_prefix: minimize
solutes:
  nucleic_acid:
    - myrna.pdb             # use the round-tripped PDB from step 1
solvent:
  name: water
ions:
  neutralizers:
    - Mg2+
    - Na+
    - Cl-
  salts:
    names:
      - Mg2+
      - Na+
      - Cl-
    concentrations:
      - 0.0
      - 0.0
      - 0.0
box:
  type: rectangular          # cubic or rectangular
  buffer: 12.0               # Å between solute and box wall
  # size: [60, 60, 60]       # if set, overrides buffer

Key fields:

  • output_prefix: minimize — the downstream masterscript expects minimize_final.xyz / minimize_final.key, so keep this as minimize.
  • neutralizers — the CLI adds whichever of these are needed to zero out the net charge. For RNA (negative backbone) you'll typically need Mg²⁺ and/or Na⁺.
  • salts.concentrations — set to 0.0 for pure counter-ion neutralization. Use non-zero values (mol/L) to add bulk salt on top of neutralization.
  • buffer: 12.0 — distance from solute to the nearest box face. 12–15 Å is standard; 15 Å is safer for larger conformational changes.

Running the CLI

python /home/bae969/programs/Tinker-GUI/cli.py -c config.yaml

Outputs:

  • minimize_final.xyz — solvated, neutralized starting structure
  • minimize_final.key — matching key file with box dimensions and parameter path
  • temp/ — intermediate files (solute-only xyz, water box, per-ion-addition snapshots, logs). Do not delete temp/minimize.xyz — the masterscript reads it to determine how many solute atoms to restrain.

Sanity check

Before proceeding, verify the system is sensible:

analyze  minimize_final.xyz  -k minimize_final.key  em
  • Net charge should be 0 (within rounding). If not, the neutralizer list in config.yaml needs adjustment.
  • Total dipole should be finite (no NaNs).
analyze  minimize_final.xyz  -k minimize_final.key  ep
  • Every atom should have multipole and vdW parameters. Missing parameters on any atom means an unrecognized residue / atom name that pdbxyz couldn't map.

3. Minimization

Performed by minimize9 with a gradient tolerance of 0.5 kcal/mol/Å using a key file that includes a position restraint on the solute:

RESTRAIN-POSITION  -1  N  50.0

where N is the number of solute atoms (read by the masterscript from temp/minimize.xyz). The restraint keeps the solute near its starting coordinates while water and ions relax around it. The force constant 50.0 is in kcal/mol/Ų.

The exact command run by the masterscript:

minimize9  minimize_final.xyz  -k minimize.key  0.5

Why a restraint during minimization: without it, small clashes between water/ion placement and the solute can pull the solute geometry off the reference structure before dynamics even starts. The restraint also suppresses any collapse of the experimentally-determined conformation during the initial energy drop.

4. Heating (restrained, NVT)

Five sequential NVT runs at increasing temperatures, all with the same key file (heating.key, FC = 50 kcal/mol/Ų on all solute atoms, polar-eps 0.01):

Step Temperature (K) Steps dt (fs) Total (ps)
Heating_to_10K 10 200 000 2.0 400
Heating_to_75K 75 200 000 2.0 400
Heating_to_150K 150 200 000 2.0 400
Heating_to_225K 225 200 000 2.0 400
Heating_to_298K 298 200 000 2.0 400

Command used for each:

dynamic9  pre_production.xyz  -k heating.key  200000  2.0  10.0  2  <T>  1.0

Argument order: nsteps dt_fs write_interval_ps ensemble(2=NVT) T_K P_atm.

The output trajectory pre_production.arc accumulates across all five heating steps (Tinker9 appends to an existing .arc). Total heating time: 2 ns across 5 temperature rungs.

Why this schedule:

  • Starting at 10 K avoids the initial velocity kick that would shake apart a minimized-but-cold system.
  • Going straight to 298 K routinely produces "induced dipole not converging" errors in AMOEBA because of transient bad contacts. Gradual heating in 5 rungs drains that risk.
  • The strong restraint (FC = 50) keeps the solute essentially frozen while solvent and ions thermalize.

5. Restraint phase-down (NVT, 298 K)

Four sequential NVT runs at 298 K, each stepping the solute position-restraint force constant down:

Step FC (kcal/mol/Ų) Steps Total (ps)
Restraint_10 10.0 200 000 400
Restraint_1 1.0 200 000 400
Restraint_0.1 0.1 200 000 400
Restraint_None 200 000 400

Each step uses a separate key file (restraint_10.key, restraint_1.key, restraint_0.1.key, restraint_none.key) that differs only in the RESTRAIN-POSITION line (absent in the last one).

Command:

dynamic9  pre_production.xyz  -k restraint_<FC>.key  200000  2.0  10.0  2  298  1.0

Why step down rather than release at once: an abrupt release at FC = 50 → 0 can cause a sudden "spring release" where the solute snaps to a lower-energy conformation that is different from the experimental reference. A 50 → 10 → 1 → 0.1 → 0 sequence lets the solute relax gradually while solvent/ion distribution tracks it.

After this stage the system is fully equilibrated at 298 K, unrestrained, at the constant-volume box from solvation. 1.6 ns total across the 4 steps.

6. Production (NPT)

The production run is NPT at 298 K, 1 atm, unrestrained, with a tighter polarization convergence:

  • polar-eps 0.0001 (vs 0.01 during equilibration)
  • barostat MonteCarlo, thermostat bussi
  • integrator respa, 2 fs timestep

Starting frame: the masterscript extracts the final frame of the pre_production.arc trajectory (the last xyz_linecount lines) into production.xyz, then launches production from there:

tail -n "$xyz_linecount" pre_production.arc > production.xyz
dynamic9  production.xyz  -k production.key  20000000  2.0  10.0  4  298  1.0

Argument order (NPT): nsteps dt_fs write_interval_ps ensemble(4=NPT) T_K P_atm.

20 000 000 steps × 2 fs = 40 ns production. Frames are written every 10 ps (4 000 frames total). Adjust nsteps for the system / question — 20–100 ns is typical.

Why NPT for production: the box dimensions from the solvation step correspond to a density that is close to but not exactly 1.0 g/cc at 298 K. NPT lets the box settle to the correct density under the AMOEBA potential. Skipping this and running NVT with the as-built box can leave the system slightly compressed or stretched, which biases structural observables.

Why tighter polar-eps: 0.01 is fine for equilibration (faster, noise is averaged out over the short run). For production the induced-dipole error should be as small as practical because it enters every observable; 1e-4 is the routine choice and 1e-5 is worth it if energetics are the main observable.

7. Running the whole pipeline

Copy ~/scripts/base_masterscript.sh into the project directory as masterscript.sh. It assumes:

  • minimize_final.xyz and minimize_final.key are present in the current directory (outputs of step 2).
  • temp/minimize.xyz is present (so N = solute atom count can be read).
  • Tinker9 binaries are at /home/yw24267/programs/tinker/Tinker9/latest/build (edit TINKER_BIN if yours differ).
  • The correct GPU is selected via CUDA_VISIBLE_DEVICES.

Typical launch on a GPU node:

ssh nodeXXX
cd /path/to/project
export CUDA_VISIBLE_DEVICES=0
nohup bash masterscript.sh > masterscript.log 2>&1 &

What the script generates on the fly (overwriting each run):

  • heating.key, minimize.key (FC = 50)
  • restraint_10.key, restraint_1.key, restraint_0.1.key
  • restraint_none.key, production.key

All derived from minimize_final.key — it reads the a-axis / b-axis / c-axis / pme-grid lines from there and appends them, then adds or omits the RESTRAIN-POSITION line and sets the appropriate polar-eps.

Summary of wall-clock and file sizes

Stage MD time Runtime (rough, single GPU) Output
Minimization minutes minimize_final.xyz_2
Heating (5 × 400 ps) 2.0 ns ~few hours pre_production.arc (appended)
Restraint phase-down 1.6 ns ~few hours pre_production.arc (appended)
Production (20 M steps) 40 ns ~1–2 days production.arc

Runtime scales with system size and GPU generation; the numbers above are order-of-magnitude for a ~700-atom solute in a ~60 Å water box on a modern GPU.

Notes and common pitfalls

  • Induced dipole not converging during heating almost always means the starting structure has a bad contact. Re-check minimization went to a reasonable gradient, and that the solvation step didn't place a water atop a solute atom. You can temporarily add USOLVE-CUTOFF 0.0 to the heating key if convergence is marginal; remove for production.
  • Net charge non-zero after solvation means the neutralizers list in config.yaml didn't cover the real net charge. Re-run with the right ion set.
  • RESTRAIN-POSITION force constant is in kcal/mol/Ų, not kcal/mol. The masterscript embeds this unit convention; do not change the numbers without understanding the scale.
  • Do not delete temp/ until after the masterscript has read temp/minimize.xyz. If you've deleted it, re-running Tinker-GUI/cli.py regenerates it.
  • Restarting a crashed run: Tinker9 resumes from .dyn if present. To restart from scratch, remove both .dyn and .arc (otherwise the new trajectory appends to the old one).
  • Stopping a running MD cleanly: touch pre_production.end (or production.end). Tinker9 checks for this file each frame and stops gracefully, leaving a valid .dyn to resume from.

See also