Validation of small molecule crystal structures with Open Force Field
A short project with Sam Kennedy (grad student at CU Boulder) to
Outline:
Download/load in crystal structures for COD
Convert crystals into formats that OpenFF can use:
To get CIF into OpenFF, see PyCIFRW (https://github.com/openforcefield/openff-toolkit/pull/982/files )
Open Eye toolkit can also load CIF files: https://docs.eyesopen.com/toolkits/python/oechemtk/OEChemConstants/OEIFlavor.html?highlight=cif#OEChem::OEIFlavor::CIF
Use openBabel to convert CIF to pdb file and generate openFF Molecule object with pdb + SMILES string
Use PyMol to generate an appropriate unit cell and supercell based on crystal symmetry information in cif/pdb files. Use supercell to generate openMM topology.
Select structures from COF:
For now, just take structures that are organic crystals and not weird. Eventually, we will want to figure these out.
Can start by just selecting a few (5-10) COF files to get the process working.
COF has SMILES database, but some stereochemistries are not explicitly defined, which will cause errors when generating openFF molecule.
Minimize the structures using different force fields:
Start with OpenFF (different versions)
Example here: could be used to start with (energy minimize rather than simulate)
Issue: we may need to do lattice minimization (minimize the box vectors as well as the atoms), which will require implementing some code.
Will need to minimize unit cells as well. Many programs implement this, we should figure out the best algorithms:
Basically, one needs to minimize in 3*natoms + 6 variables (box vectors).
Use any optimization routine (in scipy?) and calculate the derivatives as follows:
For coordianate derivitives, calculate the force from OpenMM
for box vector derivatives, pick a spacing “esp”, and increase/decrease the box vectors by “esp” by calling the OpenMM energies, then calculate the finite difference derivative. Eps should probably be around 0.001 relative of the variable.
Observe how much the RMSD changes from the experimental structure.
Note: there are other more direct comparisons to experiment (see below), but let’s get the other things working first
Look at RMSD20 and RMSD15
Goals: make it a workflow that can run relatively easily and be plugged into benchmarking
Will synchronize with @Simon Boothroyd once the basic version is working.
Previous notes:
Tentative plan/outline
Download/load in crystal structures.
Questions:
from CCSD ( ) or from Crystallography Open Database (http://www.crystallography.net/cod/)?
Start with just downloading (svn) the most recent database for OCD?
What’s the right way to convert from CIFs to mol2’s to parameterize and PDB structures to run?
Run a minimization of the sets of crystals
Questions:
Presumably using OpenMM
How do we select the crystals data set? Will probably need to convert from CIF to mol2, and do some informatics on what is present.
Observe how much the RMSD changes from the experimental structure.
Question:
As an alternative to seeing how much the RMSD changes, do we want to try to match electron density?
May have to see how many of the structures have electron density.
Do we have a useable tool for predicting electron density from a structure?
In theory, we could run a short finite-temperature MD simulation of the minimized structure and match the electron density. This may be overkill to start.
Can we just stick with RSMD from experimental structure to start with until the student gets the project working?
USeful notes:
Some code about parsing the COD using rdkit: . See the links below as well.
To get CIF into OpenFF, see PyCIFRW ( )
Open Eye toolkit can also load CIF files: