2020-11-18 Benchmarking for Industry Partners - Development Meeting notes

Date

18 Nov 2020

Participants

David Dotson

Goals

Discussion topics

Item

Notes

Progress updates

DH: started porting functions from benchmarkff, switching from OpenEye to RDKit and splitting long functions into more granular pieces
- also SDF files Victoria used were multi-conformer SDFs
- likely need to store molecule id in the filename, but also internally as SDF data pairs; this can be used as a check for our loading machinery
JW: we pull in the data pairs with the openforcefield toolkit; writing out only handles a single conformer.
- data pairs go into offtk.Molecule.properties; can round trip from and to SDFs
JH: any danger of using the toolkit on any of these steps? Will it e.g. add hydrogens / change protonation state?
- JW: should be alright; haven’t seen a case where it changes hydrogen positions
JH: started the Deployment Procedure document. This is the document we’ll be referring back to constantly to get partners up and running, and especially to debug issues
JW: Started on id assignment, realized it will be a major validation step
- filling in test data into openff/benchmark/data
- Using click for the CLI
- Need to ensure we don’t use OpenEye for anything by specifying toolkit_registry everywhere, or just pop off the OpenEye handler at the start under the imports
- Being strict on inputs so we get what we need downstream.
- DH: what about conformers that are already provided by partners?
DD: working on energy minimization deployment / optimization execution
- JH: probably better not to abstract away datasets; would make it hard for users to expand their use of QCF later for other internal purposes (would have to relearn)

JW: Code snippet for handling SD data in OpenFF mols:

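from openforcefield.topology import Molecule
# Build ethanol from SMILES and generate 3D conformers
offmol = Molecule.from_smiles('CCO')
offmol.generate_conformers()
# Entries in Molecule.properties are written out as SD data pairs
offmol.properties['groupID'] = 'OpenFF'
offmol.to_file('test.sdf', file_format='sdf')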

(off-dev) jeffreywagner@JW-MBP$ cat test.sdf

-OEChem-11182007173D
9 8 0 0 0 0 0 0 0999 V2000
1.0616 -0.2681 -0.0006 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9101 0.9126 -0.4220 C 0 0 0 0 0 0 0 0 0 0 0 0
1.5170 1.3236 -1.7228 O 0 0 0 0 0 0 0 0 0 0 0 0
1.3393 -0.6108 1.0001 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.0000 -0.0002 -0.0006 H 0 0 0 0 0 0 0 0 0 0 0 0
1.1801 -1.0990 -0.7040 H 0 0 0 0 0 0 0 0 0 0 0 0
2.9684 0.6361 -0.4446 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7730 1.7499 0.2687 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0787 2.0794 -1.9612 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
1 4 1 0 0 0 0
1 5 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
M END

> <groupID>
OpenFF

The final two lines (> <groupID> and OpenFF) come from the entry added to offmol.properties in the code.
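The check DH suggested in the progress updates (molecule id stored in the filename and also as an SD data pair) could be sketched roughly as below. This is a minimal, standard-library-only illustration, not the benchmark package's loader: the tag name molecule_id, the id format OFF-00001-00, and both function names are hypothetical, and the parser only handles simple single-record data items like the one shown above.

```python
import re
from pathlib import Path

# Matches an SD data header line such as "> <groupID>".
_TAG_LINE = re.compile(r"^>\s*<([^>]+)>")


def read_sd_data(sdf_text):
    """Return the SD data pairs of the first record in an SDF string as a dict."""
    data = {}
    lines = sdf_text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].startswith("$$$$"):
            break  # end of the first record
        match = _TAG_LINE.match(lines[i])
        if match:
            tag = match.group(1)
            values = []
            i += 1
            # A data item's value runs until the next blank line.
            while i < len(lines) and lines[i].strip() and not lines[i].startswith("$$$$"):
                values.append(lines[i])
                i += 1
            data[tag] = "\n".join(values)
        else:
            i += 1
    return data


def check_molecule_id(filename, sdf_text, tag="molecule_id"):
    """True if the id stored under `tag` equals the filename stem (hypothetical convention)."""
    internal_id = read_sd_data(sdf_text).get(tag)
    return internal_id is not None and internal_id.strip() == Path(filename).stem


# Hypothetical record: empty molecule block, only the data pair matters here.
example = "\n".join([
    "ethanol",
    "  hypothetical",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <molecule_id>",
    "OFF-00001-00",
    "",
    "$$$$",
])
print(check_molecule_id("OFF-00001-00.sdf", example))
print(check_molecule_id("OFF-00002-00.sdf", example))
```

A mismatch between the two ids would flag either a misnamed file or a problem in the loading machinery, which is the validation role discussed above.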

Risks

DH: QM calculations first in original Lim paper; then MM starting from the QM
- bit of a scientific question: want to ensure that we end up in basically the same minima in both MM and QM
- JH: could achieve the two-stage compute with an error-cycling-like approach: QM data that has finished is pulled, then used as the starting point for additions to the MM dataset
JH: exporting to SDF after the QM step could be sufficient to allow Schrodinger/OPLS3e execution
DH: Will verify with Mobley on whether this chaining is desirable/necessary
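The two-stage idea JH describes could look something like the following sketch, assuming each cycle has access to a mapping of molecule id to QM status. It uses plain Python data only; the status strings, ids, and function names are hypothetical stand-ins and not QCFractal API calls.

```python
def next_mm_inputs(qm_status, mm_submitted):
    """Return ids whose QM step has finished and that are not yet in the MM dataset."""
    return sorted(
        mol_id
        for mol_id, status in qm_status.items()
        if status == "COMPLETE" and mol_id not in mm_submitted
    )


def cycle_once(qm_status, mm_submitted):
    """One pass of the cycle: pick up newly finished QM results and mark them as submitted for MM."""
    new_ids = next_mm_inputs(qm_status, mm_submitted)
    mm_submitted.update(new_ids)
    return new_ids


# Hypothetical state: one finished QM record, one still running, one errored.
qm_status = {"mol-a": "COMPLETE", "mol-b": "INCOMPLETE", "mol-c": "ERROR"}
mm_submitted = set()

first_pass = cycle_once(qm_status, mm_submitted)

# Later, mol-b finishes; the next pass picks up only the new record.
qm_status["mol-b"] = "COMPLETE"
second_pass = cycle_once(qm_status, mm_submitted)

print(first_pass, second_pass)
```

Running the pass repeatedly is safe because ids already submitted for MM are skipped, so the MM dataset only grows as QM results arrive.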

Action items

Decisions