Validation of small molecule crystal structures with Open Force Field

A short project with Sam Kennedy (grad student at CU Boulder) to validate small molecule crystal structures with Open Force Field.

Outline:

  • Download/load crystal structures from the COD (Crystallography Open Database)

  • Convert crystals into formats that OpenFF can use:

  • Select structures from COD:

    • For now, just take structures that are ordinary organic crystals and skip the unusual cases. Eventually, we will want to figure out how to handle those too.

    • Can start by just selecting a few (5-10) COD files to get the process working.

    • COD has a SMILES database, but some stereochemistries are not explicitly defined, which will cause errors when generating an OpenFF molecule.

  • Minimize the structures using different force fields:

    • Start with OpenFF (different versions)

    • Example here: could be used to start with (energy minimize rather than simulate)

    • Issue: we may need to do lattice minimization (minimize the box vectors as well as the atoms), which will require implementing some code.

    • Will need to minimize unit cells as well. Many programs implement this; we should figure out the best algorithms:

      • https://dasher.wustl.edu/tinker/downloads/tinker-guide.pdf

      • Basically, one needs to minimize in 3*natoms + 6 variables (box vectors).

        • Use any optimization routine (in scipy?) and calculate the derivatives as follows:

          • For coordinate derivatives, get the forces from OpenMM (the derivative is minus the force).

          • For box vector derivatives, pick a spacing “eps”, increase and decrease each box vector variable by “eps”, get the OpenMM energy at each point, then calculate the finite difference derivative. Eps should probably be around 0.001 relative to the variable.

  • Observe how much the RMSD changes from the experimental structure.

    • Note: there are other more direct comparisons to experiment (see below), but let’s get the other things working first

    • Look at RMSD20 and RMSD15

  • Goals: make it a workflow that can run relatively easily and be plugged into benchmarking

    • Will synchronize with @Simon Boothroyd once the basic version is working.
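
A minimal sketch of the lattice minimization step described in the outline above, assuming a flat variable vector of 3*natoms coordinates followed by 6 box parameters. The names energy_fn, forces_fn, energy_and_gradient, minimize_lattice, rel_eps and min_eps are hypothetical, and the quadratic toy energy only stands in for OpenMM so that the example runs on its own. With OpenMM, energy_fn and forces_fn would presumably wrap context.setPeriodicBoxVectors, context.setPositions and context.getState(getEnergy=True, getForces=True). L-BFGS-B is just one possible scipy routine, not a choice made in these notes.

```python
import numpy as np
from scipy.optimize import minimize


def energy_and_gradient(x, n_atoms, energy_fn, forces_fn, rel_eps=1e-3, min_eps=1e-4):
    """Return (energy, gradient) for x = [3*n_atoms coordinates, 6 box parameters]."""
    n_coord = 3 * n_atoms
    coords = x[:n_coord].reshape(n_atoms, 3)
    box = x[n_coord:].copy()

    energy = energy_fn(coords, box)
    grad = np.empty_like(x)

    # Coordinate derivatives: the gradient is minus the force.
    grad[:n_coord] = -np.asarray(forces_fn(coords, box)).ravel()

    # Box derivatives: central finite difference, one box parameter at a time.
    for i in range(6):
        # Step is relative to the variable; the floor (an assumption added
        # here) avoids a zero step when a box component is zero.
        eps = max(rel_eps * abs(box[i]), min_eps)
        plus, minus = box.copy(), box.copy()
        plus[i] += eps
        minus[i] -= eps
        grad[n_coord + i] = (energy_fn(coords, plus) - energy_fn(coords, minus)) / (2 * eps)

    return energy, grad


def minimize_lattice(coords0, box0, energy_fn, forces_fn):
    """Minimize over atom coordinates and box parameters together."""
    n_atoms = len(coords0)
    x0 = np.concatenate([np.ravel(coords0), np.ravel(box0)]).astype(float)
    return minimize(energy_and_gradient, x0, args=(n_atoms, energy_fn, forces_fn),
                    jac=True, method="L-BFGS-B")


# Toy stand-in for OpenMM: a quadratic well around a known structure and box.
target_coords = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
target_box = np.array([3.0, 0.0, 3.0, 0.0, 0.0, 3.0])


def toy_energy(coords, box):
    return float(np.sum((coords - target_coords) ** 2) + np.sum((box - target_box) ** 2))


def toy_forces(coords, box):
    return -2.0 * (coords - target_coords)


result = minimize_lattice(target_coords + 0.1, target_box + 0.2, toy_energy, toy_forces)
print(result.x[-6:])  # optimized box parameters
```

Each gradient evaluation costs 12 extra energy calls for the box part, which should be cheap compared with the rest of the workflow for single unit cells.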

Previous notes:

 

Tentative plan/outline

  • Download/load in crystal structures.

    • Questions:

      • Start with just downloading (svn) the most recent database for COD?

      • What’s the right way to convert from CIFs to mol2’s to parameterize and PDB structures to run?

  • Run a minimization of the sets of crystals

    • Questions:

      • Presumably using OpenMM

      • How do we select the crystals data set? Will probably need to convert from CIF to mol2, and do some informatics on what is present.

  • Observe how much the RMSD changes from the experimental structure.

  • Question:

    • As an alternative to seeing how much the RMSD changes, do we want to try to match electron density?

      • May have to see how many of the structures have electron density.

      • Do we have a useable tool for predicting electron density from a structure?

      • In theory, we could run a short finite-temperature MD simulation of the minimized structure and match the electron density. This may be overkill to start.

      • Can we just stick with RMSD from the experimental structure to start with until the student gets the project working?

  • Useful notes:

    • Some code about parsing the COD using RDKit: . See the links below as well.

    • To get CIF into OpenFF, see PyCIFRW ( )

    • OpenEye toolkit can also load CIF files:
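
Whichever CIF reader ends up being used, the cell lengths and angles (CIF tags _cell_length_a, _cell_length_b, _cell_length_c, _cell_angle_alpha, _cell_angle_beta, _cell_angle_gamma) have to be turned into box vectors before a simulation package can use them. A minimal sketch of that conversion using the standard crystallographic formula; the function names and the example cell are hypothetical. CIF lengths are in ångström while OpenMM works in nm by default, so a unit conversion is still needed and is left out here.

```python
import math


def parse_cif_number(text):
    """Strip a trailing standard uncertainty, e.g. '5.432(1)' -> 5.432."""
    return float(text.split("(")[0])


def cell_to_box_vectors(a, b, c, alpha, beta, gamma):
    """Convert cell lengths and angles (degrees) to three box vectors.

    The result has a along x and b in the xy plane, so only six
    components (ax, bx, by, cx, cy, cz) are non-zero in general.
    """
    alpha, beta, gamma = (math.radians(v) for v in (alpha, beta, gamma))
    ax = a
    bx = b * math.cos(gamma)
    by = b * math.sin(gamma)
    cx = c * math.cos(beta)
    cy = c * (math.cos(alpha) - math.cos(beta) * math.cos(gamma)) / math.sin(gamma)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return [(ax, 0.0, 0.0), (bx, by, 0.0), (cx, cy, cz)]


# Made-up monoclinic cell, with one value written the way CIF files often
# store numbers (uncertainty in parentheses).
box = cell_to_box_vectors(parse_cif_number("5.000(2)"), 6.0, 7.0, 90.0, 100.0, 90.0)
for vector in box:
    print(vector)
```

Strongly skewed cells may need a further reduction step before OpenMM accepts the vectors; that is not handled in this sketch.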