2021-03-03 Benchmarking for Industry Partners - Development Meeting notes

Date

03 Mar 2021

Participants

Goals

Discuss action for iodine cases, which are likely just wrong
Updates from team

Discussion topics

Item	Presenter	Notes
Iodine cases	Jeff	JW: Bill Swope has identified that his iodine cases produce bizarre energies (~40 hartrees away from expected), though some “succeed” at optimizations and many fail with SCF convergence errors.
		    A more expensive computational approach would be needed to address iodine cases at this time.
		    `scf_type` `pk`, `out_of_core`, or `direct` appear to give consistent results; it is not clear whether they are correct.
		For season 1, do we:
		    (a) Take action on iodine, introducing a protocol specific to iodine for partners to execute? For `optimize execute` and `optimize execute-from-server` we could add a `--scf-type` option that allows alternatives to `df` to be chosen.
		    (b) Take action on iodine, introducing a more expensive basis set to use for all calculations?
		    (c) Take action on iodine, filtering out these cases explicitly from downstream analysis?
		    (d) Take no action on iodine, understanding that these cases will likely be wrong?
		Discussion:
		    DH: a bit worried about option (a), which would require partners to do surgery on their data files, deleting only iodine-containing cases, then running only those; error prone.
		    DD: in agreement with David Hahn.
		    JW: leaning toward option (c); perhaps we can talk ourselves into (a) or (b).
		    DH: could still see offering option (a), but not introducing it as part of any protocol; gives e.g. Bill Swope something to work with there.
		    DD: I’m in favor of option (d); the chosen basis set is the one Parsley is fitted to, and this is now a known problem; action could be taken for season 2.
		    DH: could be a bit unfair to the other FFs, as in this case OpenFF would fit the QM cases pretty well, I think.
		    JW: if we proceeded with option (c), would this be a one-line grep feeding to `rm`?
		    DH: would like an analysis script option that takes a SMARTS pattern to subset results production.
		    JH: good that we found an issue, but we know that the `psi4` results are wrong, so some action is needed.
		    DH: perhaps this is something that can just be communicated by partners?
		    JW: I like the idea of a SMARTS filter; could solve this problem and be used to solve this category of problem in the future.
		    DH: perhaps this would be a nice first task for Lorenzo?
		        new Janssen post-doc; DH’s successor will continue partially on the PLBenchmarks work
		        DD: will add him to the invite for this call.
		    JW: would be happy to take a first swing at the SMARTS filter; would like either DH or Lorenzo to review.
		    DD: 3 weeks until the 3/25 plenary call; ideally would be able to tell partners to re-run step 5 of the protocol to give analysis results, then give instructions on where to drop their results.
		    JW: can aim for this; can lean on option (d) if this proves difficult.
		    DD: can ask partners to report (at their discretion) which molecules have iodine in them.
		    JW: concerned about coverage report consistency if we do filtering like this.
		    DD: is it technically an issue if the coverage report doesn’t exactly match the set of molecules coming from the end of the protocol?
		    DH: I don’t believe it’s too big a deal; even some filtering of iodines won’t impact the numbers much; the overall distribution of SMARTS matches in the coverage report for each partner is more interesting.
		    JW: for a refactor, it would be worthwhile putting together a `Dataset` class or similar that can perform filtering and other operations consistently at any point in the protocol.
		Decision: JW will take a stab at the SMARTS filter in the `report {match-minima, compare-forcefields}` commands.
		    JH: if we start putting together a `Dataset` class, we’ll probably end up remaking a lot of functionality in `openff-qcsubmit`; would want to start there if possible.
		    DD: could approach this with composition, where the `openff-benchmark` `Dataset` (or whatever we end up calling it) has an `openff-qcsubmit` `OptimizationDataset` object and calls this internally as its state store.
		    JW: taking a look at docs; will give this some thought, in particular the SMARTS filter machinery.
		    JH: take a look here; there is also an option to subclass components where needed to inject functionality that isn’t present.
Updates		DH: no update; working again on the Schrodinger pathway on Friday.
		    Onboarding Lorenzo this week; will get him started with running the burn-in set through the protocol.
		    Gary and David Mobley are interested in driving forward torsiondrives for benchmarking.
		JH: no update; feel free to reach out for the torsiondrive stuff, implementing something similar for qcsubmit.
		JW: copied logic from QCSubmit to do the checking for stereochemistry issues, but wasn’t sure where to plug it into the export logic.
		DD: unit test coverage of the `optimize` command tree is up, now at 74%; just PR #65 to finish up, then release this week.
Season 2		We will seek information from Gary on his preferences, and aim for this discussion to be an agenda item for the 3/24 meeting
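For reference, option (a) from the iodine discussion could look roughly like the sketch below. It is a minimal illustration, not the `openff-benchmark` implementation: it uses `argparse` from the standard library purely as a stand-in for whatever CLI framework the project uses, and `build_parser` and `psi4_keywords` are hypothetical helper names. The four choices are the `scf_type` values named in the notes, and treating `df` as the default is an assumption based on the notes describing the others as alternatives to it.

```python
import argparse

# scf_type values named in the meeting notes. Treating "df" as the default
# is an assumption drawn from the notes, not from the real CLI.
SCF_TYPES = ("df", "pk", "out_of_core", "direct")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `optimize` command tree."""
    parser = argparse.ArgumentParser(prog="optimize")
    subcommands = parser.add_subparsers(dest="command", required=True)
    # Both subcommands named in the notes get the same option.
    for name in ("execute", "execute-from-server"):
        command = subcommands.add_parser(name)
        command.add_argument(
            "--scf-type",
            choices=SCF_TYPES,
            default="df",
            help="SCF algorithm to request from the QM program",
        )
    return parser


def psi4_keywords(scf_type: str) -> dict:
    """Translate the CLI choice into a keywords mapping for the QM program."""
    return {"scf_type": scf_type}


args = build_parser().parse_args(["execute", "--scf-type", "pk"])
keywords = psi4_keywords(args.scf_type)
```

Restricting the option to a fixed set of choices means a mistyped value is rejected when the command line is parsed, before any calculation is started.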

Action items

Jeffrey Wagner will create initial implementation of SMARTS filtering for the `report {match-minima, compare-forcefields}` commands
David Dotson will add initial torsiondrive execution PR to openff-benchmark; pursue assistance from Joshua Horton and Lorenzo to familiarize him
David Dotson will invite Lorenzo to this weekly benchmarking development call
David Hahn will get Lorenzo started with the protocol, burn-in set
David Dotson will finish integration work in openff-benchmark#65; pursue final review from Jeffrey Wagner; proceed to openff-benchmark release
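The filtering action item can be illustrated with a standard-library stand-in that covers only the iodine case. This is a sketch under stated assumptions, not the planned implementation: it assumes molecules are available as V2000 mol blocks (as found in SDF files), and `elements_in_molblock` and `drop_molecules_with` are hypothetical names. A real SMARTS filter would delegate matching to a cheminformatics toolkit, where the pattern `[#53]` (atomic number 53) selects the same iodine-containing molecules.

```python
from typing import FrozenSet, Iterable, List, Set


def elements_in_molblock(molblock: str) -> Set[str]:
    """Return the element symbols found in the atom block of a V2000 mol block."""
    lines = molblock.splitlines()
    # Line 4 is the counts line; its first three columns hold the atom count.
    n_atoms = int(lines[3][:3])
    # Each atom line reads "x y z symbol ...", so the symbol is the 4th field.
    return {line.split()[3] for line in lines[4:4 + n_atoms]}


def drop_molecules_with(
    molblocks: Iterable[str],
    excluded: FrozenSet[str] = frozenset({"I"}),
) -> List[str]:
    """Keep only the mol blocks that contain none of the excluded elements."""
    return [mb for mb in molblocks if not (elements_in_molblock(mb) & excluded)]


# Two tiny hand-written mol blocks (heavy atoms only) for illustration.
IODOMETHANE = "\n".join([
    "iodomethane",
    "  hypothetical-example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.1400    0.0000    0.0000 I   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])
ETHANE = "\n".join([
    "ethane",
    "  hypothetical-example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5400    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

kept = drop_molecules_with([IODOMETHANE, ETHANE])
```

Returning a new list rather than deleting files leaves the partner's data untouched, which sidesteps the error-prone "surgery" on data files raised in the discussion.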
