User:WillWare/Mixed-mode molecular simulation

Molecular dynamics
The most accurate approach to molecular simulation involves finding numerical solutions to the Schrödinger equation for the probability densities of the electrons. The nuclei are massive enough to be treated as charged point masses. The electron probability densities permit the computation of electrostatic force vectors on the nuclei, and the differential equations of motion can then be solved numerically. To be computationally feasible, all this requires some simplifications (the Born-Oppenheimer approximation, Hartree-Fock).

'Say a lot more about the various levels of simulation. Damian has some notes, check for more. See the Wikipedia DFT article.'

Molecular dynamics is a further simplification. Electrons are ignored and the forces between nuclei are treated as non-linear springs, with constants determined by experiment and by quantum mechanical prediction. This mass-spring system is computationally much cheaper, and accurate enough for fairly large-scale work on biomolecules. MD is an atom-by-atom calculation, and it is necessary to iterate equations using timesteps of about a femtosecond (10^-15 seconds).
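To make the mass-spring picture concrete, here is a minimal sketch of one bond between two atoms, integrated with the velocity Verlet scheme that MD codes commonly use. Everything in it is an assumption for illustration: the function names, the one-dimensional setup, the cubic term standing in for "non-linear", and the reduced units (mass, length and time all of order 1). A real force field has many more terms and a time step of about a femtosecond.

```python
def bond_force(r, k=1.0, r0=1.0, k3=-0.5):
    """Force on the bond length r: a harmonic spring plus a cubic (non-linear) term."""
    x = r - r0
    return -k * x - k3 * x * x


def bond_energy(r, k=1.0, r0=1.0, k3=-0.5):
    """Potential energy whose negative derivative is bond_force."""
    x = r - r0
    return 0.5 * k * x * x + k3 * x ** 3 / 3.0


def total_energy(x1, x2, v1, v2, m1=1.0, m2=1.0):
    """Kinetic energy of both atoms plus the bond energy."""
    return 0.5 * m1 * v1 ** 2 + 0.5 * m2 * v2 ** 2 + bond_energy(x2 - x1)


def simulate(x1, x2, v1, v2, m1=1.0, m2=1.0, dt=0.01, steps=1000):
    """Advance two bonded atoms on a line with velocity Verlet."""
    f = bond_force(x2 - x1)  # force on atom 2; atom 1 feels -f
    for _ in range(steps):
        # Half-step velocity update with the current force.
        v1 += 0.5 * dt * (-f) / m1
        v2 += 0.5 * dt * f / m2
        # Full-step position update.
        x1 += dt * v1
        x2 += dt * v2
        # Recompute the force, then finish the velocity update.
        f = bond_force(x2 - x1)
        v1 += 0.5 * dt * (-f) / m1
        v2 += 0.5 * dt * f / m2
    return x1, x2, v1, v2
```

Starting from a slightly stretched bond, for example `simulate(0.0, 1.1, 0.0, 0.0)`, the total energy should stay nearly constant as long as the time step is small compared with the vibration period. That requirement is why atom-by-atom work is stuck with such small time steps.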

Modeling cellular structures at different levels of detail
Cellular structures include things like proteins, membranes, organelles, large and small molecules.
 * High-res -- a structure is modeled on an atom-by-atom basis using molecular dynamics methods, with a time step typically around a femtosecond (10^-15 seconds). Lots of code has been developed for this purpose, e.g. Amber, CHARMM, GROMACS, NAMD. In many cases the code is optimized for operation on a small cluster.
 * Low-res -- a structure is modeled using physics engine methods, such as rigid body dynamics, soft body dynamics, or finite element methods, at vastly lower computational cost than atom-by-atom simulation. Time steps for modeling these accurately should be in the picosecond (10^-12 seconds) or nanosecond (10^-9 seconds) range. Low-res models include lumped electrostatic charges. They may also have special widgets to represent binding sites and other special bits of machinery.

You want a computational model of a cell that is affordable for hobbyists and high schools. It should help to produce little movies of processes in the cell. It should be accurate in its portrayal of physical and chemical phenomena. It should be able to represent useful and interesting biological phenomena, some of them taking tens of minutes to happen. These goals may be feasible if almost all the modeling can be done at low-res without loss of accuracy.
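A quick bit of arithmetic shows why the time step matters so much for these goals. The sketch below counts how many steps a ten-minute process would need at each time step mentioned above. The ten-minute duration is just an assumed example of "tens of minutes", and the function name is made up for illustration.

```python
FS_PER_SECOND = 10 ** 15  # femtoseconds in one second


def steps_needed(duration_s, dt_fs):
    """Number of time steps of size dt_fs (in femtoseconds) to cover duration_s seconds."""
    return duration_s * FS_PER_SECOND // dt_fs


# Ten minutes (600 s) of simulated time at femtosecond, picosecond and nanosecond steps.
for name, dt_fs in [("1 fs", 1), ("1 ps", 10 ** 3), ("1 ns", 10 ** 6)]:
    print(name, steps_needed(600, dt_fs))
```

That works out to 6 x 10^17 steps at a femtosecond, 6 x 10^14 at a picosecond, and 6 x 10^11 at a nanosecond. Each low-res step is also much cheaper than an atom-by-atom step, so the real saving is larger than the step counts alone suggest.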

Some of the biological phenomena that should be modelable would include:
 * A virus attack, including self-assembly of virus particles
 * Binding between a ligand and a receptor

Creating a low-res model

 * Get a standard MD force field running on a laptop. For convenience this would probably be MMTK so that stuff could be done in Python.
 * Set up a protein molecule in water. Make sure atom trajectories due to thermal motion are reproducible.
 * Fix one end of the protein in space and nudge the other end. It will bounce around. Record average trajectories for a few bunches of atoms at the free end.
 * Treating each bunch of atoms as a rigid body (including the fixed atoms), translate per-bunch average trajectories into trajectories and orientations for rigid bodies.
 * Look for spring and damper constants to create connectors between rigid bodies that give the same kind of motion.
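The last step above, looking for spring and damper constants, could be sketched as a least-squares fit. The code below is a deliberately simplified illustration and not the actual procedure: it assumes a single degree of freedom x(t) (say, the displacement of one rigid body along one axis) obeying mass*a = -k*x - c*v, estimates v and a from the recorded trajectory by central differences, and solves for k and c. The function names and the synthetic test data are hypothetical.

```python
import math


def fit_spring_damper(xs, dt, mass):
    """Least-squares fit of k and c in  mass*a = -k*x - c*v  from samples xs."""
    sxx = sxv = svv = sxa = sva = 0.0
    for i in range(1, len(xs) - 1):
        x = xs[i]
        v = (xs[i + 1] - xs[i - 1]) / (2.0 * dt)           # central difference
        a = (xs[i + 1] - 2.0 * xs[i] + xs[i - 1]) / dt ** 2
        sxx += x * x
        sxv += x * v
        svv += v * v
        sxa += x * a
        sva += v * a
    # Normal equations: [sxx sxv; sxv svv] [k; c] = -mass * [sxa; sva]
    det = sxx * svv - sxv * sxv
    k = -mass * (sxa * svv - sva * sxv) / det
    c = -mass * (sva * sxx - sxa * sxv) / det
    return k, c


def damped_oscillation(k, c, mass, dt, n, amplitude=1.0):
    """Synthetic underdamped trajectory, standing in for an averaged MD trajectory."""
    gamma = c / (2.0 * mass)
    omega = math.sqrt(k / mass - gamma ** 2)
    return [amplitude * math.exp(-gamma * i * dt) * math.cos(omega * i * dt)
            for i in range(n)]


xs = damped_oscillation(k=4.0, c=0.4, mass=1.0, dt=0.01, n=2000)
k_fit, c_fit = fit_spring_damper(xs, dt=0.01, mass=1.0)
print(k_fit, c_fit)  # should come out close to the k and c used to make the data
```

Real trajectories from MD will be noisy and will involve rotations as well as translations, so a practical version would need averaging over many runs and a fit over all six degrees of freedom per connector.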

You run a low-res model constantly, and certain events in the low-res world will trigger high-res modeling in certain regions. For instance when a ligand approaches a binding site, you want details of how that interaction works, so you turn on high-res modeling there, and things slow down. When you've modeled that phenomenon and it's finished, you go back to just the low-res modeling.

You need two triggers. One trigger happens in the low-res world, and it says "You need to start high-res modeling, and you need to do that in this region right here". The other trigger happens in the high-res world and it says "You've got what you needed from the high-res model, the phenomenon of interest is done, you can go back to low-res modeling".
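The two triggers can be sketched as a small state machine. This is only an illustration under assumed rules: the low-res trigger fires when a ligand comes within some distance of a binding site and returns the region to refine, and the high-res trigger fires when the ligand has either bound or escaped. All names, distances and units here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    center: tuple
    radius: float


def low_res_trigger(ligand_pos, site_pos, threshold=2.0, radius=3.0):
    """Low-res world: return the Region needing high-res modeling, or None."""
    if math.dist(ligand_pos, site_pos) < threshold:
        return Region(center=site_pos, radius=radius)
    return None


def high_res_trigger(ligand_pos, site_pos, bound_dist=0.3, escape_dist=4.0):
    """High-res world: True once the ligand has bound or has wandered off."""
    d = math.dist(ligand_pos, site_pos)
    return d < bound_dist or d > escape_dist


def run(ligand_path, site_pos):
    """Return the modeling mode ("low" or "high") used at each point of the path."""
    mode, armed, modes = "low", True, []
    for pos in ligand_path:
        if mode == "low":
            region = low_res_trigger(pos, site_pos)
            if region is None:
                armed = True   # ligand has moved away, allow a new trigger later
            elif armed:
                mode = "high"  # start high-res modeling inside `region`
        elif high_res_trigger(pos, site_pos):
            mode, armed = "low", False  # phenomenon finished, back to low-res
        modes.append(mode)
    return modes
```

The `armed` flag is a design choice of this sketch: without it, a ligand that stays bound would immediately re-trigger high-res modeling on the next low-res step.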

There will need to be some kind of protocol for overlaying high-res and low-res models. Low-res models should graciously accept different sizes of time steps. This overlay protocol needs to be something that anybody can implement easily because you're going to need to have lots of different researchers contributing different organelles, proteins, structures, and other stuff.

The protocol doesn't just define the interface between those two models. It also tells a researcher how to develop the low-res model using information from the high-res model.
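One small piece of such a protocol might be a common stepping interface. The sketch below is a guess at what that could look like, not a worked-out design: every model exposes a `step` method that accepts whatever time step it is given, and a hypothetical `overlay_step` driver runs many small high-res steps inside each large low-res step. Time is kept in integer femtoseconds to avoid rounding trouble, and `ToyModel` is a stand-in that only tracks time.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Anything that can be advanced in time by a caller-chosen time step."""

    def __init__(self):
        self.time_fs = 0

    @abstractmethod
    def step(self, dt_fs):
        """Advance the model by dt_fs femtoseconds."""


class ToyModel(Model):
    """Placeholder model: no physics, just counts steps and elapsed time."""

    def __init__(self):
        super().__init__()
        self.steps = 0

    def step(self, dt_fs):
        self.time_fs += dt_fs
        self.steps += 1


def overlay_step(low, high, dt_low_fs, dt_high_fs):
    """Advance both models by dt_low_fs, sub-stepping the high-res model."""
    if dt_low_fs % dt_high_fs != 0:
        raise ValueError("low-res step must be a whole number of high-res steps")
    for _ in range(dt_low_fs // dt_high_fs):
        high.step(dt_high_fs)
    low.step(dt_low_fs)


low, high = ToyModel(), ToyModel()
overlay_step(low, high, dt_low_fs=1000, dt_high_fs=1)  # 1 ps low-res, 1 fs high-res
print(low.steps, high.steps, low.time_fs == high.time_fs)
```

A real protocol would also have to say how forces and positions are exchanged at the boundary between the two models during those sub-steps, which this sketch leaves out entirely.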

Soft body dynamics
Soft bodies are finite-element models composed of rigid bodies (or maybe just points with mass) connected by springs and dampers.
 * http://panoramix.ift.uni.wroc.pl/~maq/soft2d/index.php - this guy appears to be widely cited among game programmers
 * http://panoramix.ift.uni.wroc.pl/~maq/soft2d/howtosoftbody.pdf - PDF version of the same thing
 * http://software.intel.com/en-us/articles/multi-core-simulation-of-soft-body-characters-using-cloth/
 * http://www.aidandoolan.com/blog/post/Refactoring-C-Soft-Body-Physics-Code.aspx
 * http://www.academypublisher.com/jcp/vol02/no08/jcp02083443.pdf
 * http://www.ep.liu.se/ecp/010/007/ecp01007.pdf
 * http://www.merl.com/papers/docs/TR2001-11.pdf
 * http://www-ljk.imag.fr/Publications/Basilic/com.lmc.publi.PUBLI_Inproceedings@117681e94b6_5bcece/CADS06.pdf
 * http://www.teknikus.dk/tj/gdc2001.htm
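As a minimal sketch of the "points with mass connected by springs and dampers" idea from the start of this section, the code below steps a 2D soft body with semi-implicit Euler integration. It is not taken from any of the links above; the function names, the tuple layout for springs, and the triangle example are all assumptions made for illustration.

```python
import math


def spring_damper_force(p1, p2, v1, v2, k, c, rest):
    """Force on point 1 from a spring and damper connecting it to point 2 (2D)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist                             # unit vector from 1 to 2
    rel_v = (v2[0] - v1[0]) * ux + (v2[1] - v1[1]) * uy       # rate of length change
    mag = k * (dist - rest) + c * rel_v
    return mag * ux, mag * uy


def step(pos, vel, mass, springs, dt):
    """One semi-implicit Euler step; springs are (i, j, k, c, rest) tuples."""
    forces = [[0.0, 0.0] for _ in pos]
    for i, j, k, c, rest in springs:
        fx, fy = spring_damper_force(pos[i], pos[j], vel[i], vel[j], k, c, rest)
        forces[i][0] += fx
        forces[i][1] += fy
        forces[j][0] -= fx   # equal and opposite force on the other end
        forces[j][1] -= fy
    for n in range(len(pos)):
        for axis in (0, 1):
            vel[n][axis] += dt * forces[n][axis] / mass[n]    # velocity first
            pos[n][axis] += dt * vel[n][axis]                 # then position


# A stretched triangle that relaxes toward edges of rest length 1.
pos = [[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]]
vel = [[0.0, 0.0] for _ in pos]
mass = [1.0, 1.0, 1.0]
springs = [(0, 1, 10.0, 2.0, 1.0), (1, 2, 10.0, 2.0, 1.0), (2, 0, 10.0, 2.0, 1.0)]
for _ in range(3000):
    step(pos, vel, mass, springs, dt=0.01)
print([round(math.dist(pos[i], pos[j]), 3) for i, j, *_ in springs])
```

The dampers are what make the triangle settle instead of oscillating forever, and the same spring and damper constants are what the fitting procedure for low-res models would have to supply.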

Software

 * http://avogadro.openmolecules.net/wiki/Main_Page -- Avogadro is for designing and displaying molecules. It doesn't do simulation.
 * http://ambermd.org/ -- The Folding@Home people say that Amber is better than Tinker or Gromacs, at least for the work they're doing.
 * http://www.msg.ameslab.gov/GAMESS/ -- GAMESS is more about quantum mechanical calculations than the simple mechanical model with the energy terms.
 * http://dasher.wustl.edu/tinker/ -- A bit dated, according to the Folding@Home folks.
 * http://www.ks.uiuc.edu/Research/namd/
 * http://folding.stanford.edu/English/FAQ-gromacs
 * http://www.stanford.edu/group/pandegroup/folding/AMBER.html
 * http://www.stanford.edu/group/pandegroup/folding/QMD.html
 * http://www.ameslab.gov/hoomd/about.html -- HOOMD stands for Highly Optimized Object-oriented Molecular Dynamics. It performs general-purpose molecular dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain performance roughly equivalent to dozens of processor cores on a fast cluster. The object-oriented design of HOOMD makes it versatile and expandable. A number of different force fields and integrators are present in the current version, and additional ones can be added easily. Simulations are set up and run using very simple Python scripts, allowing complete control over the force field, integrator, all parameters, how many time steps are run, and so on. The scripting system is designed to be as simple as possible for the non-programmer.
 * http://wiki.simtk.org/openmm/ -- OpenMM is a library which provides tools for modern molecular modeling simulation. As a library it can be hooked into any code, allowing that code to do molecular modeling with minimal extra coding. Moreover, OpenMM has a strong emphasis on hardware acceleration, thus providing not just a consistent API, but much greater performance than what one could get from just about any other code available.
 * http://dirac.cnrs-orleans.fr/MMTK/getting-mmtk/installing-mmtk -- MMTK is molecular modeling in Python but it has a boatload of prerequisites: Python, Numerical Python, netCDF (use the option --enable-shared when running the configure script), Scientific Python

Tasks

 * Build a molecular modeling workstation and create a HOWTO for doing that, which would include instructional material about the science and how the programs all work and the motivation for all of that, and how to make videos and put them on YouTube or DVDs.
 * Teach myself all the QM that I've ignored all this time. This could be done in fits and starts, and is not too difficult.
 * Work with the GAMESS code, see if it can be CUDA-fied. I'll need to prove that the accuracy of the high-res model isn't obliterated by being surrounded by low-res modeling.
 * Get mixed-mode simulation working for a few proteins, including the two triggers identified above.
 * Come up with a logarithmic scale of target phenomena and objects. Start with small proteins, work up to larger proteins, then viral assembly, then a ribosome building a protein, eventually an entire mitochondrion, and ultimately an entire cell.

Related pages

 * GPUs and CUDA
 * Physics engines
 * Mixed-mode molecular simulation
 * Whole-cell simulation