Interchange release/ productionization discussion | Working together on other unblockers Regression testing This migration has to be accompanied by a suite of regression tests. In this case the perfect must not be the enemy of the good, and yes while there may be many permutations of possible inputs, covering even a subset of those is absolutely better than covering none. I'd also be very cautious about making arguments that 'the silent bugs will be there for the first Interchange users anyway'. While bugs do happen, you need to be showing in the regression tests that in at least 90% of use cases (most of which boil down to - here is a box of molecules, where's my OpenMM system...) the behaviour hasn't changed from what the previous TK version, or if it does, show unambiguously that what the TK was previously doing was actually a silent bug. Regression tests would have caught some of the most severe TK bugs such as #807 and #990.Moreover, the regression tests for such a critical change must be publicly documented and easily reproducible. Ideally I'd be able to look over a list of 'this is exactly what was tested' so I can see if my use case is covered. That way I can easily say, 'well what about my X use case' if it is missing from the list?For this particular migration, at minimum I would need to see: for every molecule in the Enamine 10K diversity set and the public OpenFF benchmark set, and every condensed phase systems used to fit Sage: the OpenMM system created using openff-2.0.0.offxml and the new version of the TK matches exactly (to within 6 d.p.) the OpenMM system created using the old TK. If differences occur, they either would have to be fixed or very clearly documented as to why they are permissible (i.e. a bug was fixed or a previously un-supported feature was now supported)
for a small set of diverse molecules that match a force field containing a bond parameter with WBO interpolation, a torsion parameter with WBO, every type of virtual site and every permutation of the match attribute: the OpenMM system created using said force field, and the new version of the TK matches exactly (to with 6 d.p.) the OpenMM system created using the old TK. The test must show that the parameters do get applied, and they must also use randomised values to ensure that changes to parameter attributes are actually captured and its not just the default values that get passed through the geometry of v-sites created using the new OpenMM system matches the geometry using the old OpenMM system
for every attribute of every built-in parameter handler (e.g. cutoff, v-site distance, bond length):
Feature parity After the migration every 'feature' that was previously supported by the OpenFF toolkit on the 01/01/2022 must either be supported, or a concrete roadmap for re-adding support must be given because, after all, tomorrow never comes. A feature freeze should be in place on the TK's FF and handler classes until previous feature parity is met (i.e. no moving targets).In this case, I'd define feature parity as follows: every built-in handler and their associated attributes must be able to be stored in an Interchange object and be exported to an OpenMM system. since 0.2, the toolkit has given the option to ingest user provided parameter handler classes. These must be able to be stored in an Interchange object, and at minimum be able to be exported to an OpenMM system.
I know @Matt Thompson says there is some ambiguity here as 'For implementation details and/or gaps in the existing SMIRNOFF spec, it’s unclear if Interchange must duplicate the toolkit’s behaviour, follow the spec, or start from the ground up.' I think this can be also resolved concretely.The SMIRNOFF spec which is the specification of what can exist within a SMIRNOFF force field and the OpenFF toolkit is currently the reference implementation of how the specification should be implemented. Interchange should aim to follow the OpenFF TK implementation exactly as it is the reference. In cases where the OpenFF toolkit contradicts the spec, an issue should be raised on the TK repo and the SMIRNOFF committee should weigh in on whether the spec should be altered to match the reference, or the reference should be fixed to match the spec. I hope all of this is a very concrete ask, but please let me know if anything is not clear and if so, we can have a call to discuss. |
| MT – Sticking points Combining nonbonded forces will be very hard GBSA would be several weeks. Plugins in their current form will not be supported ToolkitRegistries will be hard to wire through Energy tests – How to go from files on disk to energies? Unresolved spec ambiguities. JW – This can be a good time for a discontinuity in output systems/force types - Could have a table of input permutations and list the inputs combinations where the outputs will now be different. MT – 99% of users use the same nonbonded settings (the defaults from parsley/sage, with or without periodic input). We can look for consistency with other input combinations but the space of inputs is absolutely huge and we can’t cover them all.
Inputs to permute Permutable settings vdW ES constraints Topology periodicness GBSA Vsites chargeincrements? WBOs Non-LJ vdW?
Plugin spec Should new functional forms be supported automagically? Or should people experimenting with new functional forms be required to implement their own functional forms? (specifically “should interchange automagically generate equivalent code to the current create_force methods in plugins, or should plugins need to write a new method to tell interchange how to export to OpenMM?” JW – Researchers who have written plugins will need to write new exporters, we shouldn’t support it automagically. “You’ll have to rewrite parts of your plugin, but it will be about as complex as writing the create_force , just with different rules.”
How to incorporate plugins into current Interchange?
- Jeffrey Wagner will fill out spec for completion of “OpenMM Export” item in 2022 roadmap. Jeff will assemble stakeholders (SB as primary), drivers (MT as primary), and have a deadline for speccing in two weeks.
|