ForceBalance future plans

How to resolve current PRs / how to best scope changes to prioritize your attention? FB meeting prep - thinking we can say something like “we have the following options: …”. What do you think about these paths? Thoughts on:
- ForceBalance future plans/roadmap
- Feature reduction
- Code rearrangement
- Maintenance
- Liability/“make-right” for various users after breaking releases
- “Partial forking”/separation of responsibility via plugins/external code
LPW – Past year of FB hasn’t had much activity - my focus has been on geometry and transition state optimization. I still have big items that I’m excited about on the ForceBalance roadmap, and may get to them in 2023. I’ve seen the PRs from the last year; often it’s hard to get the context of everything that’s going on. But I haven’t gotten the sense that I was holding anything up, so if that’s what’s happening, maybe it’s a communication issue.
JW – Could be useful to have a sort of inbox/outbox for organizations, where we can request stuff from each other, but have to put the requests in a specific order, and maybe also have a way to indicate when things are “super important”.
(Re: pyMBAR 4)
LPW – This is kinda tricky - pyMBAR 3 was in many ways worse/buggier than 2, so for a while I vendored 2. Now with pyMBAR 4 I think I’m rightly hesitant. It’s my responsibility to make sure that previous studies still get the same results.
MT – I’m not sure that validating pyMBAR 4 is your responsibility. Trying to verify/validate upstreams will add a lot of cost, and tracking which version ranges are trusted will add a lot of complexity.
JW – At OpenFF we aim for reproducibility by saying old versions of software are available, and folks can install old versions of deps to reproduce historical work. But it becomes excessively cumbersome to guarantee reproducibility indefinitely.
LPW – pyMBAR is just an implementation of a mathematical method, so outputs from different releases may differ by some insignificant amount, within the statistical noise. So there’s some range of behavior change that would be acceptable. But I’m not yet convinced that pyMBAR 4 is a correct implementation.
MT – Could we add a test for this? It could check that FB takes the same step given the same situation.
LPW – Would be good to check whether the value of the objective function is the same, and that the gradient is the same… So specifically, maybe we want to see whether pyMBAR gives the same output given the same input.
JW – Are there other deps that ForceBalance would be this cautious with?
LPW – The other deps are pretty lightweight - we had an API break from networkx a little while ago and updated for that. But OpenMM/GROMACS are optional deps or deps-of-deps that wouldn’t get this level of scrutiny.
MT – Maybe to rephrase - JW may be saying that there’s behavior that’s important to FB but isn’t tested, and it comes down to human judgment/is unautomatable.
LPW – I think we could write a test for this, like checking the objective function value or gradient.
JW – Maybe to be more direct - requiring more testing for pymbar is an unanticipated cost, and I don’t know how many other things like this are in ForceBalance; it may make it hard for us to keep to our external-facing deadlines.
LPW – I think this is a unique case for pymbar specifically. If you look at the ForceBalance deps, there aren’t many of them, and the list of testing deps isn’t very long, so we’re probably not too susceptible to this.
MT – If we know that this is a one-off for pymbar, that would be reassuring, so maybe that’s the key question.
LPW – It does seem like a lot of the newer packages and their deps are rapidly updating, and it makes them require constant maintenance or they’ll break/become irrelevant. I’m philosophically a little different, in that I want the outputs of my code to be consistent, even over long timescales, and even when deps are updating. So the answer to the pymbar release may be to add tests, and this sort of situation probably won’t come up very often.
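A minimal sketch of the kind of same-input/same-output regression test discussed above, assuming pymbar 4’s snake_case API; the reference file name is a placeholder for an array that would be generated once with a pymbar version LPW trusts:

```python
# Sketch of a pymbar regression test: the same input must give the same
# free-energy output, within a tight numerical tolerance.
# Assumes the pymbar 4 API (MBAR.compute_free_energy_differences); the
# reference file is a placeholder generated once with a trusted version.
import numpy as np
from pymbar import MBAR

def test_mbar_same_input_same_output():
    rng = np.random.default_rng(2022)  # fixed seed -> deterministic input
    n_states, n_samples = 3, 200
    # Reduced potentials of all samples evaluated in every state (toy data).
    u_kn = rng.normal(size=(n_states, n_states * n_samples))
    N_k = np.array([n_samples] * n_states)

    mbar = MBAR(u_kn, N_k)
    delta_f = mbar.compute_free_energy_differences()["Delta_f"]

    reference = np.load("reference_delta_f.npy")  # placeholder file name
    np.testing.assert_allclose(delta_f, reference, atol=1e-6)
```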
LPW – But I don’t want an 8-hour overhead every time I start coding something new where I have to debug breaking changes from the deps.
MT – I think I’m a little philosophically different, in that I would recommend that folks install old versions of a stack to reproduce old results. But for the latest work, I’m in favor of continuous testing and frequent releases. This is certainly something that requires constant work/attention, and it’s clearly a slightly different situation than yours.
LPW – I think we can be philosophically different and still make this work. So with pyMBAR we can add a new test. But it may be hard for FB to change at the pace that we’d like it to change if I need to understand+review PRs and need to keep old results reproducible. But I think that we can handle differences in philosophy through discussions and one-off decisions.
MT – For some portion of the pyMBAR API, we can have a bunch of try/except blocks that give multi-version compatibility against pymbar. I did this in openmmforcefields recently, so I could do the same in Evaluator as well, and this would let our stack move independently of FB’s pymbar pins. May need to change some other places as well. (A sketch of this pattern is below.)
LPW – Didn’t suggest this before, but that would work.
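MT’s openmmforcefields changes aren’t reproduced here; the following is a minimal sketch of the general try/except pattern, based on the known camelCase (pymbar 3) to snake_case (pymbar 4) renames. Exact keyword and dict-key names should be verified against the pinned versions:

```python
# Import-time shims that present one interface regardless of whether
# pymbar 3 or pymbar 4 is installed.
from pymbar import MBAR  # class name is unchanged across major versions

try:
    # pymbar >= 4
    from pymbar.timeseries import detect_equilibration, subsample_correlated_data
except ImportError:
    # pymbar 3.x exposed the same functions under camelCase names
    from pymbar.timeseries import (
        detectEquilibration as detect_equilibration,
        subsampleCorrelatedData as subsample_correlated_data,
    )

def free_energy_differences(mbar: MBAR):
    """Return the Delta_f matrix regardless of pymbar major version."""
    try:
        return mbar.compute_free_energy_differences()["Delta_f"]            # pymbar 4
    except AttributeError:
        return mbar.getFreeEnergyDifferences(return_dict=True)["Delta_f"]   # pymbar 3
```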
JW – Seems like three options:
- Make the OpenFF stack multi-version-compatible against pymbar 3 and 4
- Add a test for pymbar behavior in ForceBalance
- Add a test for pymbar behavior in pymbar
LPW – A test for pymbar would require computing the result for a 40x50x40000 matrix.
MT – That would require a lot more work than making our own stack multi-version compatible.
JW – We wouldn’t necessarily need to add this test to any test suite; we could just run it one time if we’re concerned that the pymbar 3 → 4 transition fundamentally broke something. (A sketch of such a one-time check follows below.)
LPW – That’s right, I could test this myself.
JW – We’ll go ahead with making our stack multi-version-compatible, and LPW will test for identical behavior with new pymbar.
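A one-time check along these lines could be a short script run once in a pymbar 3 environment and once in a pymbar 4 environment, with the saved outputs compared afterwards. The toy input and file names here are illustrative; the real check would use the full-size matrix LPW describes:

```python
# One-off cross-version check (not part of any test suite): run once per
# pymbar environment, then diff the saved arrays.
import numpy as np
import pymbar
from pymbar import MBAR

rng = np.random.default_rng(0)  # identical synthetic input in both runs
u_kn = rng.normal(size=(3, 600))
N_k = np.array([200, 200, 200])

mbar = MBAR(u_kn, N_k)
try:
    result = mbar.compute_free_energy_differences()["Delta_f"]            # pymbar 4
except AttributeError:
    result = mbar.getFreeEnergyDifferences(return_dict=True)["Delta_f"]   # pymbar 3

np.save(f"delta_f_pymbar-{pymbar.__version__}.npy", result)
# Afterwards, load both files and compare with np.allclose, allowing for
# the insignificant statistical-noise-level differences LPW mentions.
```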
JW – For something like the OpenFF Toolkit 0.11 changes, if they weren’t intertwined with pymbar, could I review MT’s PR and make the release myself? Or what level of notification would you like?
LPW – You could say “I need to merge+release this in two weeks”, and if I don’t understand the changes, we could have a meeting to understand them. But I’d prefer to be in the loop on these PRs, and not have you solely review+merge them. This is important to keep in mind the context/functionality needed by non-OpenFF users.
JW – What about hosting something like smirnoffio in our own repo, so we can be responsible for updating it and fixing it if we break it? Would you find that architecturally possible, and/or advisable?
LPW – That could work well.
LPW + MT – (openmmio.py… updates to openmm/simtk import paths/API… multi-version compatibility, reproducibility…)
LPW – smirnoffio could become a separately-hosted file; that could work.
MT – We’d need to make a plugin interface to ForceBalance to make that work. There may be some technical issues where that wouldn’t be as simple as we’d like. (A hypothetical sketch is below.)
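No such plugin interface exists in ForceBalance today; purely as a hypothetical sketch, an entry-point-based registry could let a separately-hosted package register a SMIRNOFF target. The entry-point group, package, and class names below are all invented for illustration:

```python
# Hypothetical sketch only: ForceBalance does not currently expose a
# plugin interface. This shows how an entry-point registry could let a
# separately-hosted smirnoffio register its target class with FB.
# Requires Python >= 3.10 for entry_points(group=...).
from importlib.metadata import entry_points

def discover_external_targets():
    """Collect target classes advertised by third-party packages."""
    targets = {}
    for ep in entry_points(group="forcebalance.targets"):  # invented group name
        targets[ep.name] = ep.load()
    return targets

# The external package (hypothetically hosted by OpenFF) would declare
# in its pyproject.toml:
#
#   [project.entry-points."forcebalance.targets"]
#   smirnoff = "openff_fb_targets.smirnoffio:SMIRNOFFTarget"
```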
JW – Process moving forward will be more regular Slack DMs and ideally ~quarterly meetings. Though I’ll note that our roadmap planning is coming up, so we’ll update you in early/mid 2023.