MOIL - on line tutorial

1. What is Moil?
2. What is Molecular Dynamics?
3. Why Moil?

4. Getting started.
5. Defining the force field
6. Preparing simulation system.
7. Performing energy minimization.
8. Running dynamics.

9. Case study: simulating ligand photodissociation in myoglobin

1. What is MOIL ?

MOIL is a molecular dynamics (MD) package written by Ron Elber and co-workers. There are also
some other features, which go beyond MD - see section Why MOIL?
You may find  a paper describing most of the MOIL modules in a detailed manner:
Elber R., Roitberg A., Simmerling C., Goldstein R., Li H., Verkhiver G., Keasar C., Zhang J. and Ulitsky A. 
MOIL: A program for simulation of macromolecules. Comp. Phys. Comm. 91: 159-189 (1994)

The purpose of this page is to provide some additional on-line help in the form of a tutorial that would
make using MOIL easier and more enjoyable. You may also check MOIL page .

2. What is Molecular Dynamics?

Well, as you landed here while browsing through 3W,  you probably know this very well.
Nevertheless (just in case) I am going to attach here a page devoted to MD, based on the article that I
am supposed to finish soon ...

As for now, I will discuss the most basic items only. MD deals with Newtonian dynamics i.e. it solves
the Newton's equations of motion:

  d2X(t)/dt2 = 1/M (-grad U(X))     ... (1)  

MOIL is a suite of programs aiming at studying proteins and other biologically important
macromolecules. It can be also used for computer simulations of liquids. Thus, our goal would be
usually to compute the time evolution (trajectory X(t)) of a large (composed of thousands of atoms)
and  inhomogeneuos system. In order to solve the differential equations of motion  (1)  and to obtain
trajectory X(t) one has to specify:

a) the initial coordinates and velocities
b) the potential energy function U(X).

The equations of motion are solved in MOIL using the Verlet algorithm, which is very robust
(symplectic as mathematicians would say) and commonly used numerical integrator.

The potential U(X) is an empirical function and it is often called  force field. It takes the form of a
linear combination of a number of  terms, corresponding to different types of interactions. Parameters
of U(X) are chosen to reproduce experimental data in terms of thermodynamical averages - for
example heat of vaporizations.  In MOIL U(X) takes the following form:

    U =  1/2 Sum bonds kb (b-b0)2 + 1/2 Sum angles ka (a-a0)2 +
           1/2 Sum impr kw (w-w0)2 + 1/2 Sum torsions kt [ 1+ cos(nt+d) ] + 
           Sum atom pairs ij [  qi qj / Rij + Aij / Rij12- Bij / Rij6]      ...    (2)

One can clearly recognize the so-called covalent interactions i.e. chemical bonds, angles and torsions,
as well as the so-called non-bonded interactions i.e. electrostatic interaction (between partial
charges)  and Van der Waals interactions (representing the interactions between induced charges in
the form of the Lennard-Jones 6-12 potential).

The variables b, a, w, t and Rij stand for the actual bond lengths, valence bond angles, so-called
improper torsions (out of plane displacements), dihedral angles and distances between particles.
The remaining symbols denote parameters of the force field e.g. b0 is an equilibrium bond length and qi
is a partial charge of particle i. Particles that are most apart in the torsion angle setup (particles 1 and
4 in the definition of the torsion) may be regarded as interacting through non-bonded interactions to
some extent (tuned by additional parametr). In fact MOIL force field is a combination of two other
force fields: AMBER and OPLS.

Coming back to more practical questions - MOIL will compute a trajectory for you provided that you
feed it with a force field (sets of parameters for proteins and some other systems are already available)
and initial coordinates of the system (initial velocities may be sampled from Boltzmann distribution).

3. Why MOIL?

Well, you may certainly find many MD packages. There are however specific features included in
MOIL, as they were introduced by Ron Elber and his co-workers. These include:

a) Locally Enhanced Sampling (LES)
b) Landau-Zerner model based approach to instantaneous dissociation and  recombination of ligands
c) Stochastic Path (STO) for long time dynamics and  other functional approaches to reaction path

One of the main difficulties in MD concerns time scales of various biologically (or chemically) important
processes, given the limitation of numerical integrators - the time step used in the standard Verlet
algorithm cannot be much larger than just a femtosecond because there are usually fast motions in the
system (e.g. vibrations of bonds involving light atoms, for which the time scale is of the order of 10 fs).
Thus folding of proteins for instance, which may take even seconds, cannot be approached using
standard MD. If you want to perform simulation with a much larger time step - use MOIL and STO -
see page STO for more details.

Last but certainly not least - you may get the source code from the author (as long as he does not
change his mind, which I doubt) and you may start building your own features.

4. Getting started

MOIL is written in Fortran 77 and is available on many UNIX platforms (SGI, AIX, Solaris, Linux) and
for WinNT. The distribution is a compressed and tarred file and contains quite a bit of documentation,
tutorials, benchmarks and examples. The force field is defined in text files included in subdirectory
moil.mop, whereas the source code is in subdirectory moil.export.

In the latter you may find Makefile, which has to be edited sometimes or replaced with a machine specific one (while moving from one platform to another). Provided that you pick up the right version of
the Makefile or uncomment the correct flags in the Makefile, typing make should produce executable
files for all the modules included in the package (there are quite a few of them, so it may take a while).

The next thing to do is to try some examples or to read this page (or the documentation in the
subdirectory moil.doc). You may also check the MOIL page.

5. Defining the force field

Defining a force field is equivalent to specifying all the parameters of the potential energy function U(X)
or in other words all the parameters that define the Hamiltonian to be used e.g. equilibrium bond
distances and force constants, partial charges, vdw parameters.

Each system (molecule) is defined in MOIL in terms of `monomers' - parts or individual chemical
entities it is composed of. In case of proteins these are just amino acids, in case of other systems
monomer may simply refer to individual chemical molecules (for example, each water molecule is a

In any case all the monomers have to be defined in a text file, which is then read when building
so-called connectivity file (see section Preparing the simulation system).

Monomers are defined in terms of particles (you may think of generalized atoms) and topology of
chemical (covalent) connections between them (including parameters specifying strength of those
connections). The parameters are defined in a separate file.

The standard MOIL force field is defined in two files included in directory moil.mop:


Suppose that our system consists of benzene molecules (it could be just liquid benzene) and we
want to call them BENZ throughout the simulations (specifically in the connectivity file and CRD
file with the coordinates). Then our ALL.MONO should contain the following section (sign ~ is used
within the MOIL to indicate comments inserted into input files - commented lines are marked in red):

~   we will only define one monomer, but more can be listed
~   (start with MONO=    finish with DONE, just like for this one);
~   this monomer has 6 particles and zero total charge:
MONO=(BENZ) #prt=6 chrg=0.
~   the unique particle names differentiate the atoms within the
~   monomer, even though they may be of the same type:
~ don't forget to state which particles are linked together:
C1-C2 C2-C3 C3-C4 C4-C5 C5-C6 C6-C1
~    tell the program that's it for this file:
~    (the angles, torsions, and improper torsions are generated automatically,
~    so there is no need to specify them in the monomer file)
~    add *EOD if end of ALL.MONO

The file ALL.PROP should then define the bonds, angles and torsions in consistency with the above
defined topology:

~   first, the info about the individual particle types:
~    here C and H are treated as a united atom:
PNAM=(CH) PMAS=13.0  PCHG=0.00 PEPS=0.110 PSGM=3.750
~    end this subsection:
~    name   name    force constant     equilibrium value
CH     CH                469.0               1.40
CH   CH   CH             85.0             120.0
~  If you are not sure what a torsion angle is:
~  For a set of 4 atoms connected in a sequence, ABCD,
~  a proper torsion is the angle between the planes of ABC and BCD
~                          B        D
~                       /    \    /
~                     A         C
~    NOTE: The torsion parameters are derived from AMBER
~    but MOIL uses a different notation:
~          E = SUM V(n)/2( 1 + cos(n*phi+gamma)
~    as k(n)=V(n)/2 and phase=cos(gamma)
~    So the MOIL data structure is:
~    name  name  name  name  k(1)  k(2)  k(3)   n   cos(gamma)
CH  CH  CH  CH        0.0000   5.3000   0.0000  2    -1.0
~   the k's are the Fourier coefficients V(n)/2
~   and -1 in this case corresponds to angle
~   gamma equal to 180 degrees.
~  Now if you are not sure what an improper torsion is, here is the
~  atomic arrangement it describes. Basically, it is the angle between
~  planes ACD and BCD, giving one an idea as to how much atom C goes out
~  of the plane defined by atoms ABD.
~                 A     B
~                   \ /
~                    C
~                    |
~                    D
~ Thus, benzene doesn't really have improper torsions in our rendition
~ (i.e., for CH taken as a unified atom)
~ taken from CHARMM paper (Brooks et al. J. Comput. Chem ...)
CH  CH  CH  CH   25.0     0.0
~    end of data for this section - add *EOD if end of ALL.PROP:

The example shown above was taken from a short tutorial that was written by Danuta Rojewska.
You may find there other examples of how to build force field for your favorite molecules.
In the case study presented below I will refer to a force field already included in ALL.MONO and
ALL.PROP - see section 9..

6. Preparing the simulation system

As was already mentioned, one needs basically two things to run MD simulation: force field definition
and initial coordinates. Suppose that the force field data is already included in the files ALL.MONO and
ALL.PROP. Then one may use module conn (short form of connectivity) to get the so-called
connectivity file, which contains complete description of the system in terms of chemical entities (monomers, atoms, bonds etc.) and incorporates definition of all the relevant parameters of the force
field (e.g. force constants and equilibrium distances for the bonds involved). See  MOIL page for further

Consider an example corresponding to a case study of  section 9. The simulation system is
in this case composed of a protein called myoglobin (containing heme prostethic group at its
active site), a small ligand attached (chemically bonded) to the heme, a number of waters and
counterions. It is a realistic example used by us.

Since the ligand (carbonmonoxide in this particualr case) can dissociate from heme  we need to describe
the system at two electronic states. Thus, we need a counterpart of the system without the bond
between Fe of the heme and CO and with a new set of parameters i.e. in practice we need two
connectivity files: one for bound (oxy) and another for unbound (deoxy) form of the heme.

Another complication arises due to the use of LES for the ligand diffusion in the protein matrix (which
means that we will deal with a number of copies of CO). Suppose that we want to introduce three LES
copies of carbonmonoxide, then the input for conn module should look as follows:

~ get connectivity for  myoglobin with LESed CO ligand
file prop name=(../../moil.mop/ALL.PROP)   read
file mono name=(../../moil.mop/ALL.MONO)   read
file poly name=(poly.mb)  read
file ubon name=(addb.mb)  read
file uedi name=(edit.mb)  read
file wcon name=(les3-29loxy-unbd.WCON)  wovr
MULT pick chem mono CO done #cpy=3

If we type now conn < input > output , then we should obtain connectivity file, provided that:

  1. conn module was properly built during compilation and can be found when we invoke it
  2. files ALL.* do exist, contain the relevant sections and are accesible in the directory specified
  3. file poly.mb can be found in the current directory and it contains consistent definition of the system - see below
  4. files add.mb and edit.mb are correctly built to add and/or remove some bonds and angles
Building properly the executable conn module means among other things that the memory
(arrays used in the program) should be allocated such that there is enough room to contain description
of our system. This is Fortran - no dynamic allocation!!

If your system is large you have to remember to adjust the predefined sizes of arrays in the LENGTH.BLOCK, which is just a text file in the subdirectory moil.export/COMMON/ , and then to
recompile MOIL typing make in the directory moil.export again. This remark is also crucial for other
MOIL modules.

Regarding the remaining files used to define input of the conn module, they are:

        poly.mb :

MOLC=(XXXX) #mon=2480
TIP3 TIP3   ....   ......   TIP3
        addb.mb :
~mors atm1=1486 atm2=1533 requ=1.9
~ bond between Fe and CO
bond chem HIS 94 NE2 HEM1 156 FE
~ bond between Fe and proximal HIS
~ in case of oxy form use HEME instead of HEM1, comment unbd,  uncomment line below unbd
        edit.mb :
remo angle chem HEM1 156 NA HEM1 156 FE HEM1 144 NC
remo angle chem HEM1 156 NB HEM1 156 FE HEM1 144 ND

The number of water molecules (TIP3 monomers) must be consistent with the total number of
monomers in poly.mb.  The line:

bond chem HIS 94 NE2 HEM1 156 FE

of thr add.mb file defines additional bond between proximal histidine and iron atom (Fe) of the heme.
There are angles that have to be removed from the standard definition due to the specific structure of
metallo-porphyrin (heme) - see edit.mb.

Such defined input of the conn module should result in the connectivity file for the unbound (deoxy)
form of myoglobin with three LES copies of the ligand. In order to obtain system with CO chemically
bound to heme, follow the remark from addb.mb to generate another connectivity file, which could be
called 29loxy-bound.WCON. The line (which is currently commented):

mors atm1=1486 atm2=1533 requ=1.9

should result in additonal bond (defined by a Morse potential) between Fe of the heme and C of
CO. Note that this additional bond is defined by the indices of the ordered particles in already existing
connectivity file. It might be generated without this line in add.mb first, to check whether numbering is
correct. One should also comment in this case the last line of the main input file:

~MULT pick chem mono CO done #cpy=3

as in the bound (oxy) form one does not need all the copies - for the equilibration of the system it is
better to stay with just one copy and add others while preparing the connectivity file of the unbound
structure, with the bond between CO and Fe removed and 3 copies of CO.

And the last remark - you'd better check your connectivity file before you start simulations.
It certainly costs much less to re-run conn than to re-run all your production trajectories if you
discover that something  was wrong too late ...

If the connectivity file is ready one has to prepare another file containing the initial coordinates of all
the particles in the system in CHARMM format (CRD). In our case it is easier since we may use
crystal structure of myoglobin, which may be taken from the  PDB. The structures used in our
simulations are listed in section 9. This is going to be the most common situation as long as you deal
with proteins.

For other molecules one may go to Cambridge Crystallographic Data or in the worst case prepare this
file manually. See the MOIL page to read about conversion of protein coordinates in pdb format into
crd one. Ones this is done one may use small utility solvate, which adds water molecules to
the system to build a simulation box with explicit solvent molecules in the framework of periodic
boundary conditions.

When you reach this point you should read carefully the section: Molecular Dynamics protocols .
What I am going to describe now is a standard protocol to prepare our system to `produce' trajectories.
It consists of `bad-contact' run, minimization, heating and equilibration of the system
This is a critical phase to get reliable statistics from your working trajectories.

The first step is a short MD trajectory with a frozen protein, to avoid so-called `bad-contacts'
between the `just soluted' water molecules and the protein. We use the module dyna
and assume that it was properly built etc. Then we may type dyna < input > output
where the file input contains something like:

~ DYNAMICS - myo in water box
~ myo frozen
file conn name=(29loxy-bound.WCON) read
file rcrd name=(MBCO.CRD)  read
file wcrd name=(MBCO.DCD) bina wovr
file wvel name=(MBCO.VCD) bina wovr
nfrz pick chem mono TIP3 done
#ste=5500 #equ=5500 info=1 #crd=100 #vel=20000 #lis=10
#scl=1  #tmp=1  rand=-3151187 step=0.001 
tmpi=50  tmpf=50 
relx=12.  rvmx=9.   cdie   epsi=1.
v14f=8. e14f=2.
symm xtra=39 ytra=54 ztra=56
sym2 xtr2=38 ytr2=53 ztr2=55

We specify the name of the connectivity file, CRD file with initial coordinates and then
files to which the future coordinates and velocities will be saved. Next we use MOIL script language
to define which entities should not be frozen:

nfrz pick chem mono TIP3 done

Only TIP3 (nickname for water) monomers will be relaxed. The next part of the input will be described
in  the section: running dynamics.

Some comments are mandatory. The temperature in this run is low (tmpi=tmpf=50 K).
This is to take away the extra `heat' generated by bad contacts before integrator looses stability.
In the worst case scenario one may move some waters (or remove them) by hand. To avoid such
manipulations as much as possible one may define in MOIL two sizes of the simulation box
(symm and sym2). Starting form the larger box reduces the effect of bad contacts due to an additional
buffer region. Note also that scaling the velocities, to maintain the temperature,  is done at each
step (#scl=1), which should also protect us against overheating. You should inspect your structure
(using  Moil-View for instance) after this run.

There is another step to be undertaken, before we really start equilibrating our system. It is described
in the next section.

7. Performing energy minimization

A crystal structure of the protein from the PDB is used. This structure was obtained under
specific conditions (temperature, pH) and can has specific crystal symmetry.  Our goal is to
simulate the protein in the liquid phase.

It is expected that our protein will adopt a different conformation in solution. The same is true for the
structure of the water around the protein. Let's say that our force field is expected to reproduce
(at least on average) the structure and dynamics of the system in solution. Then, before running
production trajectories to calculate observables, we should allow the structure to adjust to the
force field to be used and to specific conditions chosen - most importantly the temperature and the
type of the solution.

In order to do so we run energy minimization. Our aim here is not to find a global energy minimum
for a specific form of the potential energy function. Rather we relax internal stress in our system
or in other words we assume that the crystal structure represents a point close to a `good' local
minimum, which we would like to reach just doing something in the spirit of simple gradient steepest
descent minimization. To do so in MOIL one has to run one of the energy minimizers - e.g. the module called mini_pwl (Powell's method).

In our particular case the input is:

~ minimization run - use mini_pwl
file rcon name=(29loxy-bound.WCON)  read
file rcrd name=(MBCO.CRD)  read
file wcrd name=(MBCO_MIN.CRD)  wovr
file wmin name=(mb.min)    wovr
mors Dmor=30.  alph=2. 
relx=10. rvmx=9. epsi=1.  cdie 
tolf=0.01  mistep=500    list=50 
symm xtra=40 ytra=54 ztra=56

Notice, that we optimize the bound (oxy) form with the chemical bond between  CO and Fe. This is
why we use connectivity file of the bound form and we specify explicitly bond between ligand and
heme (ligand - heme bond are not part of the fixed covalent structure of the system). It is constructed
as an additional Morse potentials corresponding to certain bond length and  strength.
We use only one box now (symm) assuming that `bad contacts' had been eliminated and
the starting configuration is of reasonable energy. The number of minimization steps is defined by mistep.

Other keywords are used to specify cutoff distances to restrict the range of elctrostatic and vdw series
(relx and rvmx). However, in this case the keyword ewald is used, which means that the electrostatic
series will be summed up to infinity using Ewald summation in the form of PME as implemented by
Darden et. al. Since Ewald summation splits the long range series into two parts: one in real space, which correspond to standard summation with cutoff, and another in k-space, the relx is still provided
(now it merely specifies the division into r- and k-space sums).

If everything went successfully you should be getting much lower energy and structure matching the
ideal covalent values e.g. bond lengths should correspond to that of equilibrium distances defined in

8. Running dynamics

Suppose now that we have a minimized structure and we want to perform room temperature
simulations, corresponding to the experimental conditions. Minimization means that we
freeze our structure at 0 K (there are no vibrations etc.). How do we impose life (motions,
temperatures etc. - there are many good synonyms) on our system?

We might start from the minimized structure, sample velocities from Boltzmann distribution at
300 K and go for picnic and return to collect data in a couple of days (months ...). It is not that
difficult to imagine however that if we do so, the internal motions will not correspond to that of the
system in the thermodynamical equilibrium. Those poor cold guys there will suddenly get big kicks
due to this exceptionally rapid heating and I assure you - your data, when you get back from the picnic,
is going to be meaningless. The common protocol to avoid the problem (that I admittedly simplified)
is to perform thermal equilibration of the system, as described below and in section 9.

By now, you should know most of the standard keywords that may be useful in our case study. The
input for the dyna module that is going to be used for the equilibration and for the production runs is
certainly very similar. However, in the heating/equilibration phase we start from a low temperature
(e.g. tmpi=10) and over a number of steps (e.g.  #equ=50000, #ste=100000 - yes, yes - you may
need as many as this to get your system finally equilibrated - watch fluctuation in your system to decide when you can stop) we gradually increase the temperature (we `heat up' the system) to the
final value (e.g. tmpf=300) and then we allow the system to thermally equilibrate at the final

In the production run we most probably use a constant temperature (it certainly depends on the
ensemble you want to simulate) and in our case it is temperature 300 K, as specified in the following
input example:

~ DYNAMICS myo in water box
~ full run
file conn name=(les3-29loxy-unbd.WCON) read
file rcrd name=(MBCO_EQ.CRD)  read
file wcrd name=(MBCO.DCD) bina wovr
file wvel name=(MBCO.VCD) bina wovr
~mors alph=2.0 Dmor=30.
#ste=500000 #equ=1 info=500 #crd=1000 #vel=20000 #lis=10
#scl=500 #tmp=1 rand=-3151187 step=0.001
cent kcnt=8.0 mshk shkb shac=0.0001 shav=0.0001
tmpi=300 tmpf=300 relx=9. rvmx=9. epsi=1. cdie 
symm xtra=40 ytra=54 ztra=56
ewald dtol=0.000001 grdx=32 grdy=32 grdz=32

There is still a number of things that need to be explained here. First of all we assume, that using a
modified (according to remarks given previously) version of the above dyna input and starting from
minimized structure MBCO_MIN.CRD we ended up with already equilibrated system stored in the
file called MBCO_EQ.CRD. Secondly - the connectivity file has been  replaced by its counterpart for
the unbound (deoxy) form. This means that we remove the bond between CO and Fe all of the sudden
(notice that the line of input defining Morse bond is commented and the name of the connectivity file
is different), modeling in this way photodissociation - see section 9.

The heme and CO are left (at the initial position corresponding to oxy form geometry) without the bond
and they have adjust to the new force field (when we replace connectivity we in fact replace force field
as well) - indeed, iron quickly goes out of the heme plane and CO finds its favorite place in distal
pocket of myoglobin to escape finally after some nanoseconds ...

Another comment refers to the use of the PME method. Previously we simply specified ewald keyword,
allowing the program to adjust all the necessary parameters according to some defaults. It is worth
commenting that the same applies to many keywords - for instance, it is enough to give a keyword
shkb (and mshk if water molecules are to be constrained as well) to get the bonds with hydrogen
atoms constrained - see section 9 - and thus increasing the time step to 1 fs - step=0.001.
The other specifications (shac, shav) are optional. In case of cent, which introduces additional
harmonic force to keep the center of mass of our system in the center of the box, the strength of
the force constant is specified by an optional parameter kcnt (there is some default value).

When using PME one may play with the following parameters (if you are not sure about their meaning
you'd better leave it to the program):

ewald -> the ONLY  necessary keyword for PME - all other parameters are optional (defaults in parenth.):
dtol (5*10E-5) -> tolerance for direct space summation - it essentially
                  sets up the Ewald coefficient (for a given direct space cutoff)
grdx, grdy, grdz (function of box sizes) -> (receiprocal sum) grid dimensions  - one grid point per
                  A is recommended choice, choose also powers of 2,3 or 5 if possible (for speed of fft)
iord (4) ->       interpoletion order + 1
sgdx, sgdy, sgdz (1) -> additional scaling parametrs for further adjustment of xgrd, ygrd, zgrd

There are some issues related to the accuracy of PME, which should be considered if you really
want to set up the values of the parameters yourself:

1. Suppose the size of the box is fixed. Let the tolerance of the direct sum
   be also fixed. Then enlarging the cutoff for the direct sum means that
   you don't have to "dump" the direct energy as much as previously -
   you get the same (1/cutoff)*erfc(ewaldcof*cutoff) with smaller ewaldcof,
   which means in turn, that you include more in the direct sum (erfc in-
   creases as ewaldcof decreases for a given r) comparing to the total
   energy. Thus, enlarging the cutoff when the dtol and box sizes are
   fixed (cutoff is now nothing but parameter for controling the ewaldcof
   and hence the splitting of the total energy into direct and receiprocal
   energies) you "feel" less the grid inaccuracy of the receiprocal sum.

2. It may be cheaper however to keep your cutoff small and enlarge the
   grid density instead (using xgrd, ygrd, zgrd) or/and enlarge the inter-
   polation order for the receiprocal sum.

3. For the evaluation of the direct sum one uses interpolated values of
   erfc - in order to increase accuracy of its evaluation you must enlarge
   the number of grid points per unit interval - perft.

4. You may also tune the splitting into direct and receiprocal part by
   changing the tolerance dtol when keeping other parameters fixed.
   Decreasing dtol, you "dump" more direct energy and you shift more to the
   receiprocal part (opposit to 1. - good when you want to check invariance
   of your results with respect to changes of ewaldcof).

The above solutions are certainly only applicable if you have some memory
still available and enough time to await for the results. See also PME setup to
check what we used for myoglobin simulations.

9. Case study: simulating ligand photodissociation in myoglobin

Let us now consider a `real' example. The purpose of this section is to illustrate logical and
technical problems that may appear while doing computer simulation with MOIL package.
In fact however, the basic steps described below are part of the common Molecular Dynamics protocol
and may be treated as short MD tutorial. I learned those things from Ron Elber while working with
him on this project. I learned much more than just tricks of the trade - there are non-trivial issues when
linking microscopic models and macroscopic observables by the virtue of computer simulations.

Based on the alpha version of the paper:   J.Meller and R.Elber;
Computer Simulations of Carbon Monoxide Photodissociation in Myoglobin:
Structural Interpretation of the B states, Biophysical Journal, 74, (1998)

You may actually skip Abstract and Introduction (but on the other hand they may help to understand
what is described in terms of computational protocols) and go to Methods.

    9-1.  Abstract

Photodissociation of carbon monoxide from fully solvated sperm whale myoglobin and its Phe29 mutant is studied using classical molecular dynamics simulation. Particle Meshed Ewald (PME) algorithm is employed to account for the long range electrostatic interactions using periodic boundary
conditions. The CO distribution in the haem pocket after dissociation is investigated and the results
are compared to recent experimental and theoretical findings. Two different models of photodissociation are compared. It is found that on average the ligand is confined to a small part of the haem pocket around the low temperature x-ray positions (Hartmann et. al. 1996), with almost parallel orientation with respect to the haem plane, in agreement with the recent experimental studies using femtosecond time-resolved infrared absorption spectroscopy (Lim et. al. 1995a). The impact of point mutation Phe29, as well as the different choice of simulation box are discussed. The point mutation Leu29 -> Phe29 does not influence significantly the angular and spatial distribution of CO in the haem pocket. The haem relaxation, as measured by the displacement of the haem iron atom with respect to the haem plane, is found to be essentially completed within the first 3 ps. Finally, the ligand diffusion is examined, using the Locally Enhanced Sampling (LES) (Elber and Karplus 1990) method and also the standard molecular dynamics with single ligand copy in a 'shaky' protein environment, enabling the ligand escape in the computationally available time, proving the existance of multiple escaping channels, as suggested by previous works (Verkhiver et. al. 1992). 

    9-2.  Introduction

The fundamental role of myoglobin and haemoglobin in understanding of protein structure and function, follows from their well defined structural and spectroscopical features, manifesting during conformational changes associated with their physiological functions (Karplus 1972). These globin proteins are responsible for currying and storage of oxygen in vertebrate organisms, due to the ability of binding small ligands to the haem prostetic groups that form their active sites. Upon binding or dissociating a ligand, both myoglobin and haemoglobin undergo a series of characteristic structural transitions, that have been extensively studied using various experimental techniques and computer simulations (Kuczera 1997). 

Recent experimental advances allowed a more detailed description of some aspects of ligand binding reaction and ligand and protein dynamics after photodissociation. Low temperature x-ray (Hartmann et. al. 1996, Schlichting et. al. 1994, Srajer et. al. 1996) and room temperature infrared femtosecond time-resolved studies (Lim et. al. 1995a, 1995b, 1997) of the photolyzed state of the carbonmonoxy-myoglobin (MbCO) revealed that the CO is confined to a small region of the myoglobin haem pocket, with the parallel orientation with respect to the haem plane. 

Based on those findings Lim, Jackson and Anfinrud (Lim et. al. 1995a) suggested a model of a docking site, in which the CO
adopts, unfavorable for rebinding, parallel with respect to the haem plane orientation . The existence of such a docking site
for CO, could possibly explain the ability of the haem proteins to discriminate between binding of O_2 and CO. It is known
that while a free haem binds CO molecule about 1000 times stronger then O_2 molecule, in the haem proteins this ratio is reduced approximately 50 times (Springer et. al. 1994). 

The x-ray low temperature and room temperature infrared spectroscopy experiments are in a sense complementary. While the latter one can provide only the orientation distribution of the CO molecule in the haem pocket and it infers the rotationally constraint environment (docking site) from the narrow features in the infrared absorbance spectra, the x-ray structure provide directly the relative position of CO with respect to the haem. So far, two different positions of CO have been reported (Teng et. al. 1994, Schlichting et. al. 1994). If these two different locations are also present in the room temperature structure of photodissociated MbCO, the results of the infrared polarized absorbance experiment by Lim et. al. (Lim et. al. 1995b) impose that steric constraints for CO rotation are identical in both positions. 

A number of molecular dynamics simulations, aiming at the description of a ligand photodissociation process, has been reported in the past. They examined the protein relaxation following photolysis, subsequent rebinding of the ligand or its escape from the haem pocket and further diffusion through the protein environment, as well as presence of different time scales for the protein transitions (Kuczera 1997). In particular, the rotationally constrained environment for the CO motions in the haem pocket has been predicted (Straub and Karplus 1991). However, the detailed picture of the initial stages of the ligand dynamics in terms of its distribution has not been investigated until the very recent studies of dissociated CO in myoglobin by Vitkup, Petsko and Karplus (Vitkup et. al. 1997) and by Ma, Huo and Straub (Ma et. al. 1997). These molecular dynamics simulations were certainly stimulated by the above mentioned new experiments, and indicated the existence of favorite CO orientation and location in the haem pocket, in accordance with the experimental results. 

Yet the detailed picture arising from those theoretical studies provide a number of intriguing suggestions, that may be used for further experimental and theoretical refinement of the model of ligand discrimination in the haem proteins. In the work by Vitkup et. al. 28 room temperature, 10 ps trajectories for photodissociated CO in myoglobin with a solvent shell of 492 molecules were used to generate the probability distribution for the carbon monoxide molecule in the haem pocket. Two minima have been located in apparent agreement with CO positions observed in the different low temperature x-ray structures. First minimum corresponds to CO position reported by Schlichting et. al., whereas the second corresponds to position indicated by Teng et. al. In another structure analyzed by Hartmann et. al. both minima were observed. Some dynamical aspects of the CO distribution are discussed and another binding discrimination mechanism is suggested, namely the orientation of CO oxygen toward haem iron while CO approaches Fe atom. This is due to the orientation distribution, which is found to be dependent on the distance between Fe atom and CO molecule. When the ligand is close to the iron, its orientation with respect to the haem plane normal changes by about 40^o (with the carbon atom pointing away from the iron).  As discussed previously, it seems that the experiment of Lim et. al. (Lim et. al. 1997) does not support the presence of two minima with quite different orientations of CO with respect to the haem plane. 

In the work by Ma et. al. a fully solvated photolyzed MbCO with approximately 3000 water molecules is considered. However only four 10 ps trajectories are reported. The observed spatial and angular distributions of CO are found to be in qualitative agreement with the experimental results, although they are quite broad. There are some differences with the 
vacuum simulations of the reference Vitkup et. al. In particular, the distance between CO and haem plane is found to be much larger than measured in x-ray low temperature structures. Both reported simulations provide an interesting insight into the problem, proving that molecular dynamics may be valuable complement to the experimental studies. Nevertheless, due to restricted statistics or lack of appropriate treatment of solvation they may not be conclusive. Moreover, in both cases an approximation to the long range interactions, which are cut at distances larger than several Angstroms is used. 

In light of recent development of computer facilities and new algorithms to account for the long range interactions in an exact way, such oversimplified model should be replaced by a more appropriate technique. One such technique has been recently proposed by Darden, York and Pedersen (Darden et. al. 1993). The Particle Meshed Ewald (PME) algorithm allows to avoid the computational bottleneck of the traditional Ewald summations for the long range interactions, due to the use of the Fast Fourier Transform over the grid representing the effective reciprocal potential. This enables replacing the previously common cutoff calculations without significant increasing in the CPU requirements. 

The primary purpose of the present paper is to provide a more comprehensive and systematic molecular dynamics study of photodissociated carbonmonoxy-myoglobin at room temperature, using  advanced simulation protocol, with fully solvated 
protein and full treatment of long range electrostatic interactions. As the new technique is used, the second goal of the paper is to study the impact of the PME method on the final results, depending on the particular choice of the representation of the studied system. 

To accomplish these goals photodissociated trajectories of a CO molecule in in native myoglobin (Leu29) and its 29 mutant (Phe29) were used to provide orientational and spatial distributions of the CO in the haem pocket. Simultaneously, the protein relaxation, as measured by the displacement of the iron atom with respect to the haem plane is investigated. For both considered mutants two different models of the photodissociation process, available in the literature, are used and cross-checked. The first model employs an electronic repulsion curve and two site CO molecule model, whereas the second 
model uses three site CO model and does not introduce an electronic repulsion potential upon dissociation (see Section Methods). 

For the sake of studying the features of the PME method, additionally two different box simulations (for each model of 
dissociation) are performed. The distribution of the CO molecule in the haem pocket is given in this study by multiple, short time (10 ps) trajectories, that are analyzed in terms of local changes upon photodissociation. It is very likely however, that the influence of PME methodology would manifest more clearly in case of long time scale phenomena, such as the diffusion of the ligand through the protein environment. The diffusion is known to be crucially dependent on the protein fluctuations, 
that could be potentially affected by the presence of a lattice of the copies of the system, as employed in the PME methodology. Therefore, a number of diffusion trajectories is also studied, using Locally Enhanced Sampling (LES) multiple copies protocol (Elber and Karplus 1990) or a single copy in a somewhat 'shaky' (due to fast equilibration) protein simulations in order to enable ligand escape to the solvent on the time scale of hundreds of picoseconds, which is much shorter than the observed experimentally time of order of hundreds of nanoseconds (Kuczera 1997). 

In the next section we give a detailed analysis of the molecular dynamics protocol used in the present studies. The results and discussion can be found in the paper published in Biophysics Journal.

    9-3. Methods

    A: Computational model: potentials and simulation systems

All the calculation presented in this paper were performed using the program MOIL. Standard molecular dynamics methodology is employed with the protein potential energy function of MOIL, as described in ref. Elber et. al. 1994.

The unified atom approach is applied, with only polar hydrogen atoms represented explicitly, unless specifically stated otherwise. The heme force field parameters of Kuczera et. al. 1989 are used. The water molecules are represented by TIP3 model of Jorgensen et. al. 1983.

The electrostatic interactions are not truncated. The PME method (see subsection B) has been implemented in MOIL and used to take into account the long range electrostatic interactions within the periodic boundary condition models of the systems under consideration. The much faster decaying van der Waals interactions are truncated at 9 Angstroms for the sake of computational efficiency.

Starting from the crystal structures of Carver et. al. 1992. from the hexagonal P6 crystals of the native carbonmonoxy-myoglobin and its 29 mutants (Phe29, Trp29), obtained from a synthetic gene (Phillips et. al. 1990), a fully solvated, neutral (neutral pH) simulation systems are prepared.

Two different rectangular unit cells of periodic boundary condition representation of the system are constructed for the native MbCO. First unit cell (called later Box A) has dimensions 40 A - 54 A - 56 A and contains the myoglobin protein, CO molecule bound to haem iron and 2627 water molecules (including 334 crystallographic waters). SO_4 ion present in the crystallographic structure and two sodium ions (to insure neutrality of the unit cell) are also included.

Second unit cell (Box B) has dimensions 50 A - 45 A - 50 A. Hence it has slightly smaller volume (by about 10%) and due to different orientation of myoglobin in the box, it introduces different packing of the lattice copies of the system. Because of smaller volume and different orientation of the protein there are 2332 water molecules in the box (including again crystal structure waters). For the 29 mutants (Phe29, Trp29) simulation boxes have the same dimensions as the Box A for the native structure. The orientation of myoglobin is also very similar to that used when employing Box A for Leu29.

The methionin residue has been added to the N-terminus of myoglobin, as a result of the recombinant
experiments, and it is present in the x-ray structures of Phe29 and Trp29 (Carver et. al. 1992), used as a starting point in this work. The additional Met0 residue is subsequently kept in all the reported mutant simulations. 2709 and 2598 water molecules are included in the Phe29 and Trp29 simulations, respectively. One sodium ion is added to insure the neutrality of the systems (SO_4 ion is included as previously).

For each box simulations, two dissociation models (as described in subsection C) are used, two
different representation of CO molecule, and two defferent models of the electronic repulsive potential between iron and ligand upon photodissociation. The photodissociated trajectories were generated according to the molecular dynamics protocol described in detail in subsection E.

    B: Particle Meshed Ewald method

The macromolecular systems in solution are commonly represented in molecular dynamics simulations using periodic boundary conditions. The long range electrostatic interactions are either truncated using finite cutoffs or evaluated using Ewald summation or reaction field method. It is known, that the currently available empirical force fields may lead to the artifactual behavior of the simulated system, when using truncated Coulombic interactions (Steinbach and Brooks 1994). On the other hand, the traditional implementations of the Ewald sum and reaction field methods are prohibitively expensive for large systems.

Recent methodological developments enabled however an efficient treatment of macromolecular electrostatics. Fast multiple algorithm of Greengard and Rokhlin (Greengard 1988) and Particle Meshed Ewald algorithm of Darden, York and Pedersen (Darden et. al. 1993) emerged, with the latter one being employed in the present work. The PME algorithm has been implemented in MOIL using the Fortran subroutines provided by the authors of the method. A comprehensive description of the PME methodology may be found in the ref. Essmann et. al. 1995.

Briefly, in the PME method the reciprocal Ewald sum is approximated by a discrete convolution on an interpolating grid, using the three dimensional fast Fourier transform. It has been confirmed by several numerical studies, that when the direct and reciprocal contributions are reasonably balanced by a proper choice of direct sum cutoff and Ewald coefficient, the grid density of about one grid point per Angstrom and the 4th interpolation order for the cubic spline grid interpolation of the effective reciprocal potential, is sufficient for achieving good accuracy in terms of rms force error and conservation of the total energy in microcanonical ensemble simulations (Essmann et. al. 1995).

After some experimentation, the following PME setup has been chosen for the simulations
presented here. The direct sum Coulombic cutoff was 9.1 Angstrom and the tolerance for the direct contributions on the cutoff radius sphere was equal to 1.E-6. Those two combined parameters effectively define the Ewald coefficient and hence the trade off between direct and reciprocal sums. The fourth order cubic spline interpolation and 32-32-32 points grid is used (note, that for the sake of efficiency of FFT powers of 2,3 or 5 should be used). It has been verified in a series of control simulations that the use of a finer grid (64-64-64) brings about only very small corrections to the reciprocal forces, without significant impact on the control trajectories. The error function, which is extensively used during evaluation of the direct and correction terms is tabulated with the density 5000 points per unit interval.

    C: Two models of photodissociation of CO

The photodissociation process is usually modeled by a sudden transition to the unliganded state (Kuczera 1997). Here we employ the same approach. Starting with the coordinates from the liganded trajectories, the liganded heme potential energy is replaced by an unliganded one. The Fe-C bond is replaced either by a weak repulsive interaction to simulate the motion of the system on the repulsive electronic excited state hypersurface, as used in the previous studies of geminate recombination (Li et. al. 1993), or it is just replaced by non-bonded (electrostatic and van der Waals) interactions between haem atoms and CO molecule, as employed in ref. Vitkup et. al. and Ma et. al.

In the first dissociation protocol, with the exponential repulsion potential between ligand and iron, the
two-site carbon monoxide model is employed (Brooks et. al. 1983, Elber and Karplus 1990). It consists of Lenard-Jones sites and small charges (0.021 au) of opposite sign, located at the ligand atomic
positions and approximately reproduces the experimental dipole moment. The above protocol, as used in the previous CO geminate recombination studies will be further referred to as Photodiss. A.

The second dissociation protocol (Photodiss. B), in which only the non-bonded interactions between haem and CO molecule are activated upon dissociation, uses a three-site `quadrupolar' CO model (Straub and Karplus 1991). This model was fit to reproduce experimental dipole and quadruple moments, as well as some ab initio quantum chemistry theoretical results, such as interaction energies with water. It consists of three relatively large point charges located at the carbon and oxygen nuclei (-0.75 and -0.85 au respectively) and at the center of mass (1.6 au) (Straub and Karplus 1991). Both dissociation protocols are used in the present work and the influence of a given model of the photodissociation process is examined.

    D: Locally Enhanced Sampling and ligand diffusion

It is has been experimentally observed that the diffusion of carbon monoxy ligand through the myoglobin environment and the final escape to the solvent is a process of the time scale of hundreds of nanoseconds. Hence, it is not directly accessible for the standard nowadays molecular dynamics
simulations, hardly excessing a few nanoseconds for the system of this size with significant number of trajectories.

To test the effect of using the PME methodology for ligand diffusion pathways in the protein matrix, the Locally Enhanced Sampling (LES) method (Elber and Karplus 1990) is used. LES is a mean field approach enhancing the sampling of the conformation space due to the presence of multiple copies and their higher (with respect to `cold' protein) effective temperature. Several copies of the ligand may be introduce within LES protocol. In this study we use three copies in the LES diffusion path trajectories.

It should be noted, that LES method may be easily joint with the PME methodology. As the LES copies do not `see' each other they may occupy the same position, causing singularity of the so-called correction terms. This may be however taken into account by a proper limit expressions or introducing simple approximation: when the two LES particles are on the top of each other (in practice there is a threshold for the distance between them) one particle is formed for the reciprocal calculations. The correction terms associated with this pair of particles are then not calculated, whereas the reciprocal forces are again redistributed among particles forming one virtual entity during reciprocal calculations.

    E: Molecular dynamics protocols

Starting from the x-ray structures simulation boxes has been constructed, including additional solvent water molecules and sodium counter-ions. For each box, 15 ps long `bad contacts' runs, with the protein structure frozen, have been performed, followed by 1500 steps of conjugated gradient minimization with the bound haem parameters. Next, the liganded systems have been slowly heated up from 0 to 300 K, over the period of 60 ps and further equilibrated at 300 K over additional 20 or 40 ps period (depending on the number of the structures to be picked up). From these equilibration runs, initial liganded structures for the unliganded trajectories have been sampled, with the separation between them of at least 0.5 ps.

Changing the force field parameters of a bound haem to unbound ones and the character of the interactions between haem and CO molecule, one initiates the photodissociated trajectories. In all the cases, the SHAKE algorithm has been employed to constrain the bonds involving hydrogen atoms (Van Gusteren and Beremdsen 1977). The Verlet integration algorithm has been used, with the time step 1 fs. The non-bonded neighbors lists have been updated every 10 steps. Velocities have been sampled from the Maxwell distribution. During the equilibration period the temperature has been scaled to 300 K every step. When generating the photodissociated trajectories, the temperature always remained in the window 300+-15 K, due to the use of PME and not truncated electrostatics, and the rescaling has not been used.

During the equilibration runs, the rms deviation from the x-ray structures stabilized at about 1.3-1.5 Angstrom for backbone C_alpha atoms and 1.7-2.0 Angstrom for all the backbone and side chains atoms except for termini. This is slightly more than the usual rms deviation reported in the vacuum simulations with solvation shells (Loncharich and Brooks 1992).

For the native structure, 30 photodissociated trajectories, each 10 ps long have been generated for each box (Box A, Box B) and for each dissociation protocol and CO model used (Photodiss. A, Photodiss B). This gives 120 trajectories (each 10 ps long): 30 trajectories for Box A and Phtodiss. A, 30 trajectories for Box A and Photodiss. B a.s.o. Only one box (Box A) is used for the Phe29 mutant. Ninety 10 ps long trajectories have been generated to investigate to impact of this point mutation: 30 trajectories for Photodiss. A and 30 trajectories for Photodiss. B., as well as 30 additional trajectories for Photodiss. B with all atom representation of the phenylalanine rings in the haem pocket (Phe29, Phe 43), in order to check the influence of the non-polar hydrogen atoms on the dynamics of ligand.

These multiple trajectories are combined to get the probability distribution for the CO ligand in the haem pocket and to investigate the protein relaxation upon dissociation. A number of 500 ps trajectories has been also generated to check the long time scale relaxation, using Box A and both two-site and three-site models of CO for the native structure. Several LES multiple copy and single copy in a `shaky' (due to fast heating and equilibration over the period of 5 ps only) protein environment diffusive trajectories have been also generated.


Darden T.A., York D.M. and Pedersen L.G. 1993. Particle mesh Ewald: An N*log(N) method for Ewald sums in large systems. J. Chem. Phys. 98: 10089-10092

Elber R. and Karplus M. 1990. Enhanced sampling in molecular dynamics: Use of the time-dependent Hartree approximation for a simulation of carbon monoxide diffusion through myoglobin. J. Am. Chem. Soc. 112: 9161-9175

Elber R., Roitberg A., Simmerling C., Goldstein R., Li H., Verkhiver G., Keasar C., Zhang J. and Ulitsky A. 1994. MOIL: A program for simulaton of macromolecules. Comp. Phys. Comm. 91: 159-189

Essmann U., Perera L., Berkowitz M.L., Darden T. and Pedersen L.G. 1995. A smooth particle mesh Ewald method. J. Chem. Phys. 103 (19): 8577-8593

Gordon R.G. 1965. Molecular Motion in Infrared and Raman Spectra. J. Chem. Phys. 43 (4): 1307-1312

Jorgensen W.L., Chandrasekhar J., Madura J.D., Impey R.W. and Klein M.L. 1983. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79: 926-935

Karplus M. 1972. Structure and function of hemoglobin. Proceedings of the R.A. Welch Foundation Conference on Chemical Research XVI (Theoretical Chemistry): 121-158

Kuczera K. 1997. Dynamics and Thermodynamics of Globins. in Recent Developments in Theoretical Studies of Proteins, World Scientific, Singapore, ed. Ron Elber: 1-64

Li H., Elber R. and Straub J.E. 1993. Molecular Dynamics Simuation of NO Recombination to Myoglobin Mutants. J. Biol. Chem. 268: 17908-17916

Lim M., Jackson T.A. and Anfinrud P.A. 1995b. Binding of CO to Myoglobin from a Heme Pocket. Docking Site to Form Nearly Linear Fe-C-O. Science 269: 962-965

Phillips G.N., Arduini R.M., Springer B.A. and Sligar S.G. 1990. Crystal Structure of Myoglobin From a Synthetic Gene. Proteins 7: 358-365

Schlichting I., Berendzen J., Phillips Jr. G.N. and Sweet R.M. 1994. Crystal structure of photolyzed carbonmonoxymyoglobin. Nature 371: 808-812

Srajer V., Teng T., Ursby T., Pradervand C., Ren Z., Adaci S., Schildkamp W., Bourgeois D., Wulff M. and Moffat K. 1996. Photolosysis of the Carbon Monoxide Complex of Myoglobin: Nanosecond Time-Resolved Crystallography. Science 274: 1726-1729

Van Gunsteren W.F. and Beremdsen H.J.C. 1977. Algorithms for molecular dynamics and constraint dynamics. Mol. Phys. 34: 1311-1327

Verkhiver G., Elber R. and Gibson Q.H. 1992. Microscopic modeling of ligand diffusion through a protein: Carbon monoxide in leghemoglobin. J. Am. Chem. Soc. 114: 7866-????

Author: Jarek Meller