Discuss action for iodine cases, which are likely just wrong
Updates from team
Discussion topics
Item: Iodine cases
Presenter: Jeff
Notes:
JW: Bill Swope has identified that his iodine cases produce bizarre energies (~40 hartrees away from expected), though some “succeed” at optimizations and many fail with SCF convergence errors
a more expensive computation approach would be needed to address iodine cases at this time
scf_type values pk, out_of_core, or direct appear to give consistent results; not clear if they are correct
For season 1, do we:
(a) Take action on iodine, introducing a protocol specific to iodine for partners to execute
affects optimize execute and optimize execute-from-server
we could add a --scf-type option that allows alternatives to df to be chosen
(b) Take action on iodine, introducing a more expensive basis set to use for all calculations
(c) Take action on iodine, filtering out these cases explicitly from downstream analysis
(d) Take no action on iodine, understanding that these cases will likely be wrong
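The --scf-type idea listed above could look roughly like the following sketch. It uses Python's standard argparse rather than whatever CLI framework the benchmark tooling actually uses, and it does not call Psi4. The parser name, the qm_keywords helper, the default of df, and the list of choices are all assumptions based only on the values mentioned in these notes.

```python
import argparse

# SCF algorithms mentioned in these notes; "df" as the default is an assumption.
SCF_TYPES = ("df", "pk", "out_of_core", "direct")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for an `optimize execute`-style command."""
    parser = argparse.ArgumentParser(prog="optimize-execute-sketch")
    parser.add_argument(
        "--scf-type",
        choices=SCF_TYPES,
        default="df",
        help="SCF algorithm keyword to hand to the QM program",
    )
    return parser


def qm_keywords(args: argparse.Namespace) -> dict:
    """Translate parsed CLI arguments into a keyword dict for the QM task."""
    return {"scf_type": args.scf_type}


# No flag given: the assumed default is used.
default_keywords = qm_keywords(build_parser().parse_args([]))
# A partner re-running iodine cases could pick one of the alternatives.
iodine_keywords = qm_keywords(build_parser().parse_args(["--scf-type", "pk"]))
```

Restricting the flag to a fixed set of choices means a typo fails at argument parsing instead of partway through a long calculation.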
Discussion:
DH: a bit worried about option (a), which would require partners to do surgery on their data files (picking out only the iodine-containing cases, then running only those); error prone
DD: in agreement with David Hahn
JW: leaning toward option (c); perhaps we can talk ourselves into (a) or (b)
DH: could still see offering option (a), but not introducing as part of any protocol; gives e.g. Bill Swope something to work with there
DD: I’m in favor of option (d); the chosen basis set is the one Parsley is fitted to, and this is now a known problem; action could be taken for season 2
DH: could be a bit unfair to the other FFs, as in this case OpenFF would fit the QM cases pretty well I think
JW: if we proceeded with option (c), would this be a one-line grep feeding to rm?
DH: would like analysis script option to take a SMARTS pattern that would subset results production
JH: good that we found an issue, but we know that the psi4 results are wrong, so some action is needed
DH: perhaps this is something that can just be communicated by partners?
JW: I like the idea of a SMARTS filter; could solve this problem and be used to solve this category of problem in the future
DH: perhaps this would be a nice first task for Lorenzo?
Lorenzo is a new Janssen post-doc and DH’s successor; will continue partially on the PLBenchmarks work
DD: will add him to the invite for this call
JW: would be happy to take a first swing at the SMARTS filter; would like either DH or Lorenzo to review
DD: 3 weeks until the 3/25 plenary call; ideally would be able to tell partners to re-run step 5 of protocol to give analysis results, then instructions on where to drop their results
JW: can aim for this, can lean on option (d) if this proves difficult
DD: can ask partners to report (at their discretion) which molecules have iodine in them
JW: concerned about coverage report consistency if we do filtering like this
DD: is it technically an issue if the coverage report doesn’t exactly match the set of molecules coming from the end of the protocol?
DH: I don’t believe it’s too big a deal; even some filtering of iodines won’t impact the numbers much; the overall distribution of SMARTS matches in the coverage report for each partner is more interesting
JW: for a refactor, would be worthwhile putting together a Dataset class or similar that can perform filtering and other operations consistently at any point in the protocol
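The filtering idea discussed above can be sketched as a predicate-based subset step. To stay dependency-free, this sketch does not do a real SMARTS match: it uses a regular expression on SMILES strings as a stand-in for a pattern such as [#53], and a real implementation would delegate matching to a cheminformatics toolkit. The names contains_iodine and split_by_filter and the example records are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, str]  # (molecule name, SMILES)

# Stand-in for a SMARTS match on iodine: an uppercase "I" in a SMILES string
# that is not followed by a lowercase letter (which would indicate In or Ir).
_IODINE = re.compile(r"I(?![a-z])")


def contains_iodine(smiles: str) -> bool:
    """Return True if the SMILES string appears to contain an iodine atom."""
    return _IODINE.search(smiles) is not None


def split_by_filter(
    records: Iterable[Record], exclude: Callable[[str], bool]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (kept, dropped) according to the exclude predicate."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for name, smiles in records:
        (dropped if exclude(smiles) else kept).append((name, smiles))
    return kept, dropped


records = [
    ("mol-a", "CCO"),
    ("mol-b", "c1ccccc1I"),     # iodobenzene
    ("mol-c", "C[Si](C)(C)C"),  # silicon, not iodine
    ("mol-d", "[In]"),          # indium, not iodine
]
kept, dropped = split_by_filter(records, contains_iodine)
```

Returning the dropped records along with the kept ones makes it possible to report exactly which molecules were excluded, which speaks to the concern about the coverage report and the analysis results drifting apart.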
Decision:
JW will take a stab at the SMARTS filter in the report {match-minima, compare-forcefields} commands
JH: If we start putting together a Dataset class, we’ll probably end up remaking a lot of functionality in openff-qcsubmit; would want to start there if possible
DD: could approach this with composition, where the openff-benchmark Dataset (or whatever we end up calling it) has an openff-qcsubmit OptimizationDataset object and calls this internally as its state store
JW: taking a look at docs; will give this some thought, in particular the SMARTS filter machinery
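The composition approach DD describes might be shaped like the sketch below. It deliberately does not use the openff-qcsubmit API: OptimizationStoreStandIn is a placeholder for the wrapped dataset object, and BenchmarkDataset, its methods, and the example entries are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OptimizationStoreStandIn:
    """Placeholder for the wrapped dataset object; NOT the openff-qcsubmit API."""

    entries: Dict[str, str] = field(default_factory=dict)  # name -> SMILES


class BenchmarkDataset:
    """Hypothetical wrapper that keeps all of its state in the wrapped store."""

    def __init__(self, store: OptimizationStoreStandIn) -> None:
        self._store = store

    def __len__(self) -> int:
        return len(self._store.entries)

    def names(self) -> List[str]:
        return sorted(self._store.entries)

    def filter(self, keep: Callable[[str], bool]) -> "BenchmarkDataset":
        """Return a new dataset holding only entries whose SMILES pass `keep`."""
        kept = {n: s for n, s in self._store.entries.items() if keep(s)}
        return BenchmarkDataset(OptimizationStoreStandIn(kept))


full = BenchmarkDataset(OptimizationStoreStandIn({"mol-a": "CCO", "mol-b": "CI"}))
# Crude iodine check for illustration only; it would also reject In or Ir.
no_iodine = full.filter(lambda smiles: "I" not in smiles)
```

Because filter returns a new wrapper instead of mutating the store, the unfiltered dataset stays available for steps such as the coverage report.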