2020-11-18 Benchmarking for Industry Partners - Development Meeting notes

Date

Nov 18, 2020

Participants

  • @David Dotson

  • @David Hahn

  • @Joshua Horton

  • @Jeffrey Wagner

Goals

  • Updates from project team members

  • Identify and address development issues encountered

  • Identify and address project risks

Discussion topics

Item

Notes

Progress updates

  • DH: started porting functions from benchmarkff, switching from OpenEye to RDKit and splitting long functions into more granular pieces

    • also, the SDF files Victoria used were multi-conformer SDFs

    • likely need to store molecule id in the filename, but also internally as SDF data pairs; this can be used as a check for our loading machinery

  • JW: we pull in the data pairs with the openforcefield toolkit; writing out only supports a single conformer.

    • data pairs go into offtk.Molecule.properties; can round trip from and to SDFs

  • JH: any danger of using the toolkit on any of these steps? Will it e.g. add hydrogens / change protonation state?

    • JW: should be alright; haven’t seen a case where it changes hydrogen positions

  • JH: Deployment Procedure started here: https://openforcefield.atlassian.net/wiki/spaces/FF/pages/873922575. This is the document we’ll refer back to constantly to get partners up and running, and especially to debug issues

  • JW: Started on id assignment, realized it will be a major validation step

    • filling in test data into openff/benchmark/data

    • Using click for the CLI

    • Need to ensure we don’t use OpenEye for anything by specifying toolkit_registry everywhere, or just pop off the OpenEye handler at the start under the imports

    • Being strict on inputs so we get what we need downstream.

    • DH: what about conformers that are already provided by partners?

  • DD: working on energy minimization deployment / optimization execution

    • JH: probably better not to abstract away datasets; would make it hard for users to expand their use of QCF later for other internal purposes (would have to relearn)



JW: Code snippet for handling SD data in OpenFF mols:

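from openforcefield.topology import Molecule

# Build ethanol from SMILES and generate 3D coordinates
offmol = Molecule.from_smiles('CCO')
offmol.generate_conformers()
# Entries in .properties are written out as SD data pairs
offmol.properties['groupID'] = 'OpenFF'
offmol.to_file('test.sdf', file_format='sdf')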

 

(off-dev) jeffreywagner@JW-MBP$ cat test.sdf

  -OEChem-11182007173D

  9  8  0     0  0  0  0  0  0999 V2000
    1.0616   -0.2681   -0.0006 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9101    0.9126   -0.4220 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5170    1.3236   -1.7228 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.3393   -0.6108    1.0001 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0000   -0.0002   -0.0006 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1801   -1.0990   -0.7040 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.9684    0.6361   -0.4446 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.7730    1.7499    0.2687 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.0787    2.0794   -1.9612 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  1  4  1  0  0  0  0
  1  5  1  0  0  0  0
  1  6  1  0  0  0  0
  2  7  1  0  0  0  0
  2  8  1  0  0  0  0
  3  9  1  0  0  0  0
M  END

> <groupID>
OpenFF

The last two lines of the output (the groupID tag and its value, OpenFF) come from the properties entry set in the code above.
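The filename/data-pair consistency check raised under the progress updates could look roughly like the sketch below. It is plain Python with no toolkit dependency and is only an illustration: the helper names read_sd_data and id_matches_filename, and the convention that the file stem equals the stored ID, are assumptions and not the project's actual loading code.

```python
from pathlib import Path


def read_sd_data(sdf_text):
    """Return one {tag: value} dict per record found in an SDF string.

    Only the data block after the 'M  END' line is read; the atom and
    bond blocks are skipped.
    """
    records = []
    data, tag, in_data = {}, None, False
    for line in sdf_text.splitlines():
        stripped = line.strip()
        if stripped == "$$$$":
            # End of one record: store it and reset the state
            records.append(data)
            data, tag, in_data = {}, None, False
            continue
        if stripped.split() == ["M", "END"]:
            in_data = True
            continue
        if not in_data:
            continue
        if stripped.startswith(">") and "<" in stripped:
            # Data header line such as '> <groupID>'
            tag = stripped[stripped.index("<") + 1:stripped.rindex(">")]
            data[tag] = ""
        elif tag is not None:
            if stripped == "":
                tag = None  # a blank line terminates the value
            elif data[tag]:
                data[tag] += "\n" + stripped
            else:
                data[tag] = stripped
    if in_data:
        # Last record had no trailing '$$$$' delimiter
        records.append(data)
    return records


def id_matches_filename(path, sdf_text, tag):
    """True if every record stores the file stem under the given tag.

    Assumed convention (hypothetical): '<id>.sdf' holds molecule '<id>'.
    """
    records = read_sd_data(sdf_text)
    stem = Path(path).stem
    return bool(records) and all(r.get(tag) == stem for r in records)
```

Because the parser splits on the record delimiter, a multi-conformer SDF yields one dictionary per conformer, so the same check covers files like the ones Victoria used.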

Risks

  • DH: in the original Lim paper, QM calculations were run first; MM then started from the QM results

    • bit of a scientific question: want to ensure that we end up in basically the same minima in both MM and QM

    • JH: could achieve two-stage compute with an error-cycling-like approach; QM data that has finished is pulled, then used as the starting point for additions to the MM dataset

  • JH: exporting to SDF after the QM step could be sufficient to allow Schrodinger/OPLS3e execution

  • DH: Will verify with Mobley on whether this chaining is desirable/necessary
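JH's error-cycling-like idea for two-stage compute can be pictured as a pass that is run repeatedly: each pass pulls whatever QM results have finished and queues them as MM starting points. The sketch below uses plain dictionaries as stand-ins for the QM and MM datasets; the function name promote_finished_qm, the status strings, and the record layout are hypothetical and do not reflect the QCFractal API.

```python
def promote_finished_qm(qm_records, mm_inputs):
    """Copy final geometries of completed QM records into the MM inputs.

    qm_records: {molecule_id: {"status": str, "final_geometry": object}}
    mm_inputs:  {molecule_id: geometry}, updated in place
    Returns the ids that were added during this pass.
    """
    added = []
    for mol_id, record in qm_records.items():
        if record["status"] != "COMPLETE":
            continue  # still running or errored; pick it up on a later pass
        if mol_id in mm_inputs:
            continue  # already queued for MM on an earlier pass
        mm_inputs[mol_id] = record["final_geometry"]
        added.append(mol_id)
    return added


# Two passes over a toy dataset (hypothetical ids and geometries)
qm = {
    "mol-a": {"status": "COMPLETE", "final_geometry": [[0.0, 0.0, 0.0]]},
    "mol-b": {"status": "INCOMPLETE", "final_geometry": None},
}
mm = {}
first_pass = promote_finished_qm(qm, mm)

# mol-b finishes later and is picked up by the next pass
qm["mol-b"] = {"status": "COMPLETE", "final_geometry": [[1.0, 0.0, 0.0]]}
second_pass = promote_finished_qm(qm, mm)
```

Each pass is idempotent for molecules already queued, so it can be rerun on a schedule without duplicating MM submissions.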

Action items

@David Hahn will reach out to Mobley on whether MM optimizations should start from QM minima, or whether MM and QM should both start from precisely the same conformer geometries

Decisions