Updates | | JH: Tried using the CLI; it is broken at the moment. Tried to build the conda env; it doesn’t work on macOS. We should make an analysis environment that works on macOS. Tests of the CLI: can directly test CLI components as library functions.
JW: Merged conformer generation on Monday. The code we use in the toolkit to read SDFs doesn’t split on 3D vs. 2D inputs. We do round-tripping to ensure we catch errors in validation. This increases the observed error rate to about 5% on test datasets; these errors would otherwise have happened further down the workflow.
JW: This week: if the user provides no geometry, just a graph, the job is straightforward for conformer generation. If they do provide geometry, then we have to do post-trimming, guaranteeing that we keep the ones they provided, by doing a greedy search: generate conformers (up to 10); start with the first user conformer, compare to the others, drop those within 2 Å; repeat for the other user conformers.
DD: May want to start with e.g. 30 generated conformers, since, if only 10 are generated, the culling may leave far fewer than 10 total after comparison to user conformers.
DH: One approach would be to generate conformers with a cutoff criterion of 0 Å (to get a guaranteed number of conformers), then filter them as desired afterwards.
JW: If there is an energy ordering of conformers from e.g. RDKit, then we have to be careful we don’t systematically drop certain conformers.
DH: If it’s not possible to meet the 10-conformer threshold, that’s okay, but it’s nice to hit 10, even if they are similar.
JW: Would a crude minimum be two conformers? If we have a molecule that has no flexibility, should it be removed?
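The post-trimming step described above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the function names are hypothetical, conformers are plain lists of (x, y, z) tuples, and the RMSD here is a bare coordinate RMSD with no alignment or symmetry handling, which a real workflow would almost certainly need. Whether generated conformers should also be compared against each other is not settled in the notes, so the sketch only compares them to the user conformers.

```python
import math


def rmsd(conf_a, conf_b):
    """Coordinate RMSD in angstroms (assumption: no alignment, same atom order)."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def prune_against_user(user_confs, generated_confs, cutoff=2.0, max_total=10):
    """Keep every user conformer; drop generated ones within `cutoff` of any of them."""
    survivors = list(generated_confs)
    # Walk the user conformers in turn, as in the greedy search described in the notes.
    for reference in user_confs:
        survivors = [c for c in survivors if rmsd(c, reference) >= cutoff]
    # Fill the remaining slots (if any) with surviving generated conformers.
    n_extra = max(0, max_total - len(user_confs))
    return list(user_confs) + survivors[:n_extra]


user = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
generated = [
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],  # 0.5 Å from the user conformer: dropped
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],  # 3.0 Å from the user conformer: kept
]
result = prune_against_user(user, generated)
```

Passing a larger pool of generated conformers (e.g. 30, as DD suggests) only changes the length of `generated_confs`; the cap is applied after culling.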
JW: Will pursue an iterative-ratcheting approach for the RMS cutoff filter, starting at 5 Å. What about cases where a user puts in two conformers that are very similar? Should we throw any out?
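One possible reading of the iterative-ratcheting idea is sketched below. The notes only fix the starting cutoff (5 Å); the direction of the ratchet (lowering the cutoff until enough conformers survive), the 0.5 Å step, the 0 Å floor, and all function names are assumptions made for illustration. The RMSD is again a bare coordinate RMSD without alignment.

```python
import math


def rmsd(conf_a, conf_b):
    """Coordinate RMSD in angstroms (assumption: no alignment, same atom order)."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def greedy_select(confs, cutoff):
    """Keep a conformer only if it is at least `cutoff` away from all kept so far."""
    kept = []
    for conf in confs:
        if all(rmsd(conf, other) >= cutoff for other in kept):
            kept.append(conf)
    return kept


def ratchet_select(confs, target=10, start=5.0, step=0.5, floor=0.0):
    """Lower the cutoff from `start` until `target` conformers survive or `floor` is hit."""
    cutoff = start
    while True:
        kept = greedy_select(confs, cutoff)
        if len(kept) >= target or cutoff <= floor:
            return kept[:target], cutoff
        cutoff = max(floor, cutoff - step)


# Four single-atom "conformers" spaced 1 Å apart along x.
confs = [[(float(x), 0.0, 0.0)] for x in range(4)]
selected, final_cutoff = ratchet_select(confs, target=3)
```

Because `greedy_select` walks the conformers in input order, an energy-ordered list would systematically favour the earliest entries, which is the concern JW raises above. Shuffling or reordering the input before selection is one way to probe that.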
(General) – Should we do any MM minimization of QM geometries? What if conformers “wander off” from the QM minimum during MM optimization?
DD – Could either use the QM minimum only, or record two separate energies for each QM geometry (MM energy at the QM minimum geometry, and MM energy after brief minimization).
JW – Maybe we could make it a constrained MM minimization, so that MM minimization can clean up things like simple bond stretches and improve the signal-to-noise ratio for “useful” conformer energetics.
DD + JH – Could it actually be useful to have some geometries minimize to the same geometry? Or end up with closely spaced output geometries?
(General) – Different strategies here would shift the focus of this benchmarking between “getting energies right” and “getting geometries right”. No study to date does a good job of measuring both, and there are some major inherent differences in workflows for each.
Decision – There are a bunch of ways we could go on this, but let’s stick closely to the Lim paper for initial work.
DH: Playing with the drug-like molecules. Integrating analysis into the CLI; the CLI commands now work: openff-benchmark report compare-force-fields. QM reference comparison to MM: RMSD, torsion fingerprints. Plots are also produced: 2D scatter plots, correlation plots.
DD: Will add units to all quantities exported.
DH: Specify ref_method.
DD: ref_method can refer to spec keys; I’ll need to add it into the SDFs.
DH: Report is given as a set of CSVs; these are the inputs for any visualization that follows.
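As a rough illustration of a report CSV that carries units and a ref_method field, here is a minimal sketch using only the standard library. The column names, the unit strings, the "name (unit)" header convention, and the example values are all hypothetical; the actual report layout is not specified in these notes.

```python
import csv
import io

# Hypothetical columns: (name, unit). Unitless columns carry None.
COLUMNS = [
    ("molecule", None),
    ("method", None),
    ("ref_method", None),
    ("rmsd", "angstrom"),
    ("relative_energy", "kcal/mol"),
]


def header():
    """Put the unit next to each quantity name so downstream plots can label axes."""
    return [f"{name} ({unit})" if unit else name for name, unit in COLUMNS]


def to_csv(rows):
    """Serialize a list of row dicts to CSV text with a unit-annotated header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header())
    for row in rows:
        writer.writerow([row[name] for name, _ in COLUMNS])
    return buffer.getvalue()


example_rows = [
    {
        "molecule": "mol-1",
        "method": "ff-a",
        "ref_method": "qm-ref",
        "rmsd": 0.42,
        "relative_energy": 1.3,
    }
]
report_text = to_csv(example_rows)
```

Keeping the units in the header keeps each CSV self-describing, so a visualization step can read them without consulting the SDFs.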
DD: Trying to test out components and standardize how the CLI looks and how tests are run. Working on ingesting the Fox and Swope sets. Some issues with modifications to initial steps – will have a working session with JW.
DD – Preparing to submit public industry datasets to QCA. Will need CLI components to prepare these molecules.
|