2021-05-24 Core Developers meeting notes

Participants

Discussion topics

Item	Notes
Updates	SB Worked with MT on System object, found some bugs and suggested some PRs Smirnoffee can now do energy evals for entire system object. Can get gradients for the same Synced up with DD on openff-benchmark. JW – I’ll be handlingt he benchmarking refactor in July - Sept. Let me know if there are specific design points to know about. SB – Parallelization, migration to toolkit, more testing MT Mostly System stuff. SB helped a lot with his review. MT – SB suggestd making create_openff_system effectively be a classmethod fo the System object JW – I’m interested to see exactly how this works. MT - PR is here: Met with TH who volunteered to be a guinea pig for System. In 11-12 months he’ll be moving on to nonbonded stuff (right now he’s doing valence terms). He’s very interested in using parameter change propagation in his experiments. System work, not very exciting. Moves openff-utilities and openff-units into openforcefield org, made alpha releases. Openff-utilities is on conda-forge. openff-units is in a PR that requires review from C-F staff. After both of those, I’ll open a C-F PR to make the System package. SB – Turnaround on C-F hasn’t been as fast as I would like. We should make sure to keep pinging those PRs to keep them moving. Openff-units and openff-utils are an effort to reduce code duplication across packages – They have oft-repeated code blocks, in one standard form. Utils is currently pure-python and can have quick turnaround for PRs into packages. OpenFF units is a kind of wrapper around pint. Even if we end up married to simtk units, it will be good to stick with the openff-units package. IP Focused on `perceive_residues` – We now have working code to assign residues to SDF input. Handles capped and uncapped residues. Works on the T4 lysozyme in toolkit data. SB – When/under which circumstances will residue perception happen? IP – When a molecule is read from SDF, it won’t have residue info. The user will have to explicitly run `perceive_residues` on the inputs that they want. SB – OpenMM tries to unconditionally perceive residues, which has been a problem in the past. Now we’re looking at getting “Standard” residue definition from the CCD (`components.cif`). Big hurdle is finding a good MMCIF parser. Two main candidates: Biopython, or pycifrw. Pycifrw is slower and has high memory usage, but it can handle multi-entry CIF files. Biopython can’t handle multi-entry CIFs. Made some progress on deterministic RDKit conf gen, by canonically ordering the atoms before conf gen. Found that the 2021 version added some features, and the default behavior changed to do heavy-atom RMSD. So now RDKit generates less conformers than OpenEye. It turns out that the test molecule for the test that should have caught it, in the order that we were inputting it as SMILES, didn’t notice this change. But in a reversed order, it would have caught it. SB – Toolkit has a canonical-atom-ordering method, but it doesn’t return the mapping between the old and new order. It would be good to expand this method to return the canonical ordering. SB – If you canonicalize molecules using OpenEye, make sure that the methods are in sync (their main canonicalize and omega canonicalize can behaave differently) JW – We’ll canonicalize unconditionally during conf gen using both packages. IP + JW – We also found some counterintuitive behavior when setting `omega.SetCanonOrder(True)` Background: 2021-05-20 Topology Working Session Meeting notes JW: Beginning work with Andrew Dalke this week. Will initially work on getting a better molecule test set, and documenting assumptions for taking in molecules from RDKit+OpenEye MT – Separating integration vs. unit tests? JW – He’ll mostly touch test_toolkits (all integration tests) and test_forcefields (some of each) Moving through PRs – Need to do charge rounding + switch_width before next release Flaky? behavior in CI tests Topology work Deterministic conf gen with IP Blog post PB: Sulfonamide debugging Fitting experiments – Looking at gradients for optgeo targets and torsiondrives JW – Woudl system object help with lookig at gradients? PB – SB made a package (graffan) that wraps ForceBalance, which I can use to get gradients for torsiondrive+optgeo targets. Cyclic and acyclic alkanes LW Working on charges. Built interactive display that lets me iterate over how charges can be applied to fragmented molecules, and how the fragments can be stitched back together JW – How bad are these numbers? LW – On a technical level, this is easy to implement. I’m still looking at how accurate they are. Working on how to make matching large SMARTS patterns more memory efficient JW – I’m working on other changes to make SMARTS matching more efficient. If we have some options on how to do this, we can try to merge this into the OFF toolkit. SB – Two-stage matching? Do heavy atoms first, and then use those to narrow down the regions that the all-atom search could look at? I’ve done this before in some other projects. SB – Also, it may be worthwhile to bring these cases to the RDKit+OpenEye devs, since their fixes could be used by everyone. JW – Agree TG – How many atoms are in the SMARTS / how much memory does it take? LW – 50-200 atoms. Blows up to about 11 GB. LW – Made polymeterizer package, currently private. TG Working on fitting, nothing super interesting on that front. Working on vsites. In the new PR that Matt’s looking at, there’s a new addition that needs to go in. In the chargeincrements, there’s an arbitrary number of charge attributes. The mapping of charges to particles gets mangled by canonical ordering at some point. TG – So, if we have a chargeincrement with indices 0 and 1, then we need to ensure that they get mapped to the JW – Is this about trying to do different things with chemically equivalent atoms? TG – No. Let’s say I have a vsite that tags 5 atoms, but we only have 2 chargeincrements. Then, find_matches puts the atoms in the order of the match. Then we transform the… JW – … is this related to that other thing we did in the design of the vsitehandler …? TG – No TG – I have a proposed solution, but I need to think about it a bit more (General) – This is very complex, TG+SB+MT+JW will meet after sprint planning about this
Sprint planning	Will return at 50 past the hour
Charge increments on vsites	TG – The problem is that the chargeincrement indices go out of sync with the smarts indices. When I look at it, the only place where the indexing problem is solved is in `ChargeIncrementHandler`.create_force and `VirtualSiteHAndler.create_force` In the vectorized parameter PR, we don’t use create_force. So should I duplicate this code? Or could I move itinto something like `find_smarts_matches`? JW – Could this overlap with the idea of implementing a `find_multi_smarts_matches` method? When we call label_molecules, it only returns atom indices. But vsites will need to include particle indices. TG – If I have a particle, I need to know how it maps to the charge increment list. So, if I do a virtualsitehandler.find_matches, it gives me specific permutations that I use for orientation. TG – I’d like it to be so that, if I run find_matches, the keys in the returned dictionary correspond to the atom order in the chargeincrement list JW – This sounds reasonable. We could change the behavior of find_matches to change thekeys of this dictionary to be up-to-date before they’re returned. JW – It’s good by me to transform the keys of the matches to be in the correct order for the chargeincrement list (these lines: ) TG – There’s something wrong with this – For chargeincrementhandler, if you have the same SMARTS, but different ordering, is that allowed? JW – If two ChargeIncrement parameters applied to the same atoms (even in a different order), they’d overwrite/”compete” with each other. TG – ChargeIncrementHandler and VSiteHandler are the only handlers where the number of terms depends on the number of tagged atoms in the SMARTS. MT – Also LibraryCharges. JW – I’d be fine with these three handlers having the results of their `find_matches` be guaranteed to be in the same order as their indexed attributes. It’d be best for these changes to be in the places where we override `find_matches` in the specific ParameterHandler classes, instead of in the base ParameterHandler class. TG – Should this be rolled into the vectorization PR? JW – No, it should be a separate PR MT – Should these changes be required for the next release? JW – I don’t think so, the vectorization PR doesn’t need to happen before the next release. (General) – What should be the order of PRs? VSite PR (#909) find_matches ordering PR Vectorization PR (#940)
Switch-width	JW – I think it should be `force.setSwitchingDistance(self.cutoff - self.switch_width)` MT – Agree SB – I think we should do BOTH this, and update the SMIRNOFF spec to require the switching function. I think we should NOT merge this PR, since the SMIRNOFF spec doesn’t say enough about how to implement the switching width. So the SMIRNOFF spec needs to include some explicit description of the switching function, and a few other things. MT – Agree. This seems well scoped. JW – We can’t add a new attribute to the 0.3 vdW tag in the SMIRNOFF spec. Adding a new attribute requires a version change. But it would be LESS bad to change the existing definition of the 0.3 spec to give a meaning to the switch_width other than “the switch_width is always ignored and set to 0” MT – It’s not clear that assigning a meaning to it after the fact is worse than always assuming 0. JW – I think we need to change the 0.3 spec to say explciitly how switch_width is interpreted. Given that we need to make this change, I think it’d make the most sense to say “the previous implementation was wrong, the spec meant to say `cutoff-switch_width`) MT – Two questions: Should we make an 0.4 vdW tag version? JW – Yes MT – Yes Should we retroactively modify 0.3 vdW tag with additional meaning? JW – Yes MT – This seems like a really bad precedent to set, It would prevent people from reproducing our previous results using future versions of the infrastructure. JW – We’ll need to eventually write an upconverter from 0.3 to 0.4 vdW spec. How will that upconverter interpret the 0.3 switch values? MT – So the more general question is “how should the upconverter handle known bugs?”. And more generally, how do we handle the fact that this bug is baked into the values in the FFs?

Participants

Discussion topics

Action items

Decisions