2020-11-18 Benchmarking for Industry Partners - Development Meeting notes

Date

18 Nov 2020

Participants

Goals

Updates from project team members
Identify and address development issues encountered
Identify and address project risks

Discussion topics

Item

Notes

Progress updates

DH: Started porting functions from benchmarkff, switching from OpenEye to RDKit and splitting long functions into more granular pieces
- also SDF files Victoria used were multi-conformer SDFs
- likely need to store molecule id in the filename, but also internally as SDF data pairs; this can be used as a check for our loading machinery
JW: We read the data pairs in with the openforcefield toolkit; writing out only handles a single conformer.
- data pairs go into offtk.Molecule.properties; can round trip from and to SDFs
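
The filename/data-pair cross-check DH describes could look roughly like the sketch below. It reads SD data items with the Python standard library only, so it does not depend on any toolkit. The tag name "molecule_id", the helper names, and the "<id>.sdf" filename convention are assumptions for illustration, not decisions recorded in these notes.

```python
import re
from pathlib import Path

# Matches an SD data header such as "> <groupID>" and captures the tag name.
_DATA_HEADER = re.compile(r"^>.*<([^>]+)>")


def read_sd_data(sdf_text):
    """Return the SD data pairs of the first record in an SDF string."""
    data = {}
    tag = None
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):  # end of the first record
            break
        match = _DATA_HEADER.match(line)
        if match:
            tag = match.group(1)
            data[tag] = []
        elif tag is not None:
            if line.strip():
                data[tag].append(line)
            else:
                tag = None  # a blank line terminates the data item
    return {key: "\n".join(value) for key, value in data.items()}


def id_matches_filename(path, sdf_text, tag="molecule_id"):
    """True if the ID stored inside the file equals the filename stem."""
    return read_sd_data(sdf_text).get(tag) == Path(path).stem
```

A loader could call id_matches_filename() on every input file and refuse to continue on a mismatch, which gives an independent check on whatever machinery parsed the molecule itself.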
JH: any danger of using the toolkit on any of these steps? Will it e.g. add hydrogens / change protonation state?
- JW: should be alright; haven’t seen a case where it changes hydrogen positions
JH: Started the Deployment Procedure document. This is the document we’ll be referring back to constantly to get partners up and running, and especially to debug issues
JW: Started on ID assignment; realized it will be a major validation step
- filling in test data into openff/benchmark/data
- Using click for the CLI
- Need to ensure we don’t use OpenEye for anything by specifying toolkit_registry everywhere, or just pop off the OpenEye handler at the start under the imports
- Being strict on inputs so we get what we need downstream.
- DH: What about conformers that are already provided by partners?
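
One way to picture ID assignment acting as a validation step is sketched below. The "GROUP-MMMMM-CC" layout (group code, molecule index, conformer index), the three-letter group code, and all function names are assumptions for illustration; the notes do not fix a format. Inputs are plain strings identifying the parent molecule (for example a canonical SMILES produced upstream), and anything malformed raises an error rather than being silently repaired.

```python
import re

# Hypothetical ID layout: 3-letter group code, molecule index, conformer index.
ID_PATTERN = re.compile(r"^[A-Z]{3}-\d{5}-\d{2}$")


def assign_ids(group, molecule_keys):
    """Assign one ID per input conformer, in input order.

    molecule_keys holds one string per conformer that identifies its parent
    molecule; a repeated key is treated as a further conformer of that molecule.
    """
    if not re.fullmatch(r"[A-Z]{3}", group):
        raise ValueError(f"group code must be three capital letters: {group!r}")
    mol_index = {}   # key -> molecule index
    conf_count = {}  # key -> number of conformers seen so far
    ids = []
    for key in molecule_keys:
        if not isinstance(key, str) or not key.strip():
            raise ValueError(f"empty or non-string molecule key: {key!r}")
        if key not in mol_index:
            mol_index[key] = len(mol_index)
            conf_count[key] = 0
        new_id = f"{group}-{mol_index[key]:05d}-{conf_count[key]:02d}"
        conf_count[key] += 1
        if not ID_PATTERN.match(new_id):
            # An index outgrew its fixed width; fail instead of emitting a bad ID.
            raise ValueError(f"index overflow produced invalid ID {new_id}")
        ids.append(new_id)
    return ids
```

Partner-provided conformers fit the same scheme in this sketch: they arrive as repeated keys and simply receive consecutive conformer indices.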
DD: working on energy minimization deployment / optimization execution
- JH: probably better not to abstract away datasets; would make it hard for users to expand their use of QCF later for other internal purposes (would have to relearn)

JW: Code snippet for handling SD data in OpenFF mols:

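from openforcefield.topology import Molecule

# Build ethanol from SMILES and generate 3D coordinates
offmol = Molecule.from_smiles('CCO')
offmol.generate_conformers()
# Entries in Molecule.properties are written out as SD data pairs
offmol.properties['groupID'] = 'OpenFF'
offmol.to_file('test.sdf', file_format='sdf')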

(off-dev) jeffreywagner@JW-MBP$ cat test.sdf

-OEChem-11182007173D
9 8 0 0 0 0 0 0 0999 V2000
1.0616 -0.2681 -0.0006 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9101 0.9126 -0.4220 C 0 0 0 0 0 0 0 0 0 0 0 0
1.5170 1.3236 -1.7228 O 0 0 0 0 0 0 0 0 0 0 0 0
1.3393 -0.6108 1.0001 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.0000 -0.0002 -0.0006 H 0 0 0 0 0 0 0 0 0 0 0 0
1.1801 -1.0990 -0.7040 H 0 0 0 0 0 0 0 0 0 0 0 0
2.9684 0.6361 -0.4446 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7730 1.7499 0.2687 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0787 2.0794 -1.9612 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
1 4 1 0 0 0 0
1 5 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
M END

> <groupID>
OpenFF

The "> <groupID>" tag and its value "OpenFF" at the end of the file come from the properties entry set in the code

Risks

DH: QM calculations first in original Lim paper; then MM starting from the QM
- bit of a scientific question: want to ensure that we end up in basically the same minima in both MM and QM
- JH: could achieve the two-stage compute with an error-cycling-like approach: QM results that have finished are pulled, then used as starting points for additions to the MM dataset
JH: Exporting to SDF after the QM step could be sufficient to allow Schrodinger/OPLS3e execution
DH: Will check with Mobley whether this chaining is desirable/necessary
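
The error-cycling-like chaining JH describes can be sketched as a function that is called repeatedly: each pass collects QM records that have finished and have not yet been forwarded, and returns their geometries as starting points for the MM dataset. The record layout ("status", "final_geometry") and the function name are hypothetical stand-ins, not QCFractal API, and whether the chaining is wanted at all is still to be confirmed with Mobley.

```python
def collect_mm_inputs(qm_records, already_forwarded):
    """Return {id: geometry} for finished QM records not yet forwarded.

    qm_records: dict mapping a molecule ID to a dict with a "status" string
        and, once complete, a "final_geometry" (hypothetical layout).
    already_forwarded: set of IDs handed to the MM stage on earlier passes;
        it is updated in place so repeated calls never forward an ID twice.
    """
    new_inputs = {}
    for mol_id, record in qm_records.items():
        if mol_id in already_forwarded:
            continue
        if record.get("status") != "COMPLETE":
            continue  # still running or errored; pick it up on a later pass
        new_inputs[mol_id] = record["final_geometry"]
        already_forwarded.add(mol_id)
    return new_inputs
```

If MM and QM should instead start from identical conformer geometries, no such loop is needed and both datasets can be submitted up front.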

Action items

David Hahn will reach out to Mobley on whether MM optimizations should start from QM minima, or whether MM and QM should both start from precisely the same conformer geometries