2023-01-17 Protein-ligand benchmarks meeting notes

Participants

@David Dotson
@John Chodera
@David W.H. Swenson
@Irfan Alibay
Ivy Zhang
@Iván Pulido
Jenke Scheen
@Mike Henry
@Richard Gowers
@Jeffrey Wagner

Goals

DD : fah-alchemy name change - nominations
- review proposed name list, nominate your favorite
  - each nomination needs to be seconded to threshold interest
- nominated names will be assembled into a Slack poll on #free-energy-benchmarking; voting will last three days
- the top three names will be placed in a second round of polling; the winner of this poll will be the new name of fah-alchemy
DD : new meeting invite will go out for next week
DD : current sprint - deadline 1/24
- architecture overview : https://drive.google.com/file/d/1Elw5vWYXuGKSuO-E3jNMxkYnSQaioVF8/view?usp=share_link
- coordination board status : fah-alchemy : Phase 1 - MVP
- updates on In Progress cards
- seeking volunteers for unassigned cards in Available
- DD : fah-alchemy 0.1.0 milestone
IZ: Cinnabar bug
JC : object models for representing free energies + uncertainties

Discussion topics

Item	Notes

Item	Notes
DD : `fah-alchemy` name change - nominations	https://openforcefieldgroup.slack.com/archives/C02GG1D331P/p1673413806334149 review proposed name list, nominate your favorite each nomination needs to be seconded to threshold interest nominated names will be assembled into a Slack poll on `#free-energy-benchmarking`; voting will last three days the top three names will be placed in a second round of polling; the winner of this poll will be the new name of `fah-alchemy` `scaled-alchemy` `alchemy-at-scale` `alchemscale` `alchemiscale` `scalechemy` `alchemagic` `alchemage` `alchemicloud` `alchemega` `alchemistar` `alchemistra` `alchemotron` `alchemistron` `alchemistrat` `alchemistrata` `alchemistrati` `alchemistream` `alchemistorm` `alchemyx` `alchemicore` `alchemore` `alembic` `cucurbit` `firelute` `chrysopoeia` `alkahest` `alchemylab` `aludel` `athanor` `paracelsus` Nominations: `alchemicore` `alchemylab` `alchemscale` `alchemicloud` `alchemitron` `alchemiscale` `alchemyx` DD – multiple votes in the initial round? (General) – yes MH – Twitter poll once we have the top 3? DD – Nah, let’s keep it internal IA – Do we have the rights to these/are they used anywhere else? MH – I’ll check these out MH - All are available on pypi and conda DS – What messages are we trying to convey here? JC – The concepts of alchemy, scalability, distributed computing, and object models.
new meeting invite will go out for next week	DD – Current meeting is on Diego’s account. Going to remake it with a new zoom link. JW – I’ll handle this
DD : current sprint - deadline 1/24	architecture overview : https://drive.google.com/file/d/1Elw5vWYXuGKSuO-E3jNMxkYnSQaioVF8/view?usp=share_link coordination board status : fah-alchemy : Phase 1 - MVP updates on In Progress cards seeking volunteers for unassigned cards in Available DD : `fah-alchemy` 0.1.0 milestone fah-alchemy #25 (deployment PR) MH – Ready for review. Would love to have some eyes on this. I’ll do further testing on EC2 before I click merge. I’d like to walk through this with DD and/or Hugo on screenshare for review. DD – Let’s do that just following this call MH – Sounds good. gufe #111 DD – Ready for review. Implements topological approach and protocoldagresults. Met with OpenFE on Friday to hash this out, now it’s ready for review. gufe #110 MH – Still in progress DS – Need anything from me? We can probably pair up on this and knock it out in an hour. MH – I’ll coordinate this. IP – I’m working on making this work with noneq cycling. Will update you if I find anything. Perses 1066 (noneq cycling) IP – Met with DS last week, implemented free energy estimate with error… IP – JC reviewed, I need to make changes in response to that. IP – Some other stuff was moved from perses to gufe so I need to check into those and do any needed restructuring. DD – Want to pair up with anyone on this? IP – First let me get up to date on gufe 110, then I’ll reach out to pair up. plb 83 IA – Just awaiting review. IP – This is waiting on me. I noticed that we have some conflicting results but I think it’s a problem with my script. So I need to do that and check that the results are consistent. IA – Thanks for looking into this, IP. IA – PLB 87 is probably also ready for review. Some things we can resolve now IA – TPO entry in TPK2 has a linker. Is this standard? We’d mentioned that we’d remove anything nonstandard, should we remove this? JC – I was thinking of removing cofactors. The TPO should stay. RG – The PDB says that the TPO is fine and follows the rules. IA – There are CONECT records for Zns and Mgs, which I’m not used to. IP – I remember MBoby leaving those because she classified those as structurally relevant. So we should check with her. IA – I have no strong opinions for or against. JC – OpenMM can’t handle this. I’ve seen these done both ways. DD – Details? JC – These are structural ions that may be required for protein to hold its conformation. Sometimes CONECT records are added for residues that are coordinating these. I think there may be another way to handle this in the PDB - we’d done a survey and 15% of PDB entries had coordinated ions and they’re handled in different ways. DD – If the PDB entries do this differently, are we free to pick a solution? JC – I’d say to just get rid of them. Conventionally there are 3 ways to do it … Give them off site charges in geometrically reasonable places Define harmonic bonds IA – I’d advocate just adding them to the LINK section. JC + DD – Agree JC will review this PR fah-alchemy 66 DD - Hugo is working on this, for adding some CLI functionality. This is ready for review. Any Qs for me to pass to Hugo? fah-alchemy 56 DD – We have end to end tests in place for submitting networks, making tasks, actioning tasks, executing tasks, pushing results,and puilling results. Adding tests for successes and failures. Hugo already reviewed. I could also use a review from DS. DS – Given this structure we can’t make it identical to my implementation, but we should be able toput a wrapper around this to make it look very similar. DD – That’s fine, if there are easy changes I can make to simplify this/make them more similarlet me know fah-alchemy 34 DD – I’ll return to this once I’ve finished #56 fah-alchemy 46 (notebook documentation) DD – IIRC this was going to be migrated to GUFE. Mind if I do this, RG? RG – Sure. We can add a notebooks directory at the top level of docs. MH – It may make sphinx stuff easier to put it in the docs folder. But the docs env has everything so that should be capable of testing. JC – Can we include this notebook in CI? DS – If the notebook does analysis it may get bulky. Could store (something) in S3 DD – This notebook builds a single alchemical network and… oh it is a little tricky. But it should be able to run fast. JC – Has anyone used jupyter-book as an alternative to nbval and sphinx? Toni mey just pointed me to it and it may be a good alternative to nbval. https://jupyterbook.org/en/stable/intro.html DS – Not sure how that works if you’re deploying to the same place as your sphinx docs. DD – This touches on “do docs live in their own repo?” RG – To get this actioned, let’s focus on getting this into gufe for now and putting it somewhere. Then we can experiment with things later. MH – jupyter-book looks promising, but we can experiment with it later DS – I’m a fan of keeping notebooks out of the core repo if possible. Especially if they have images. MH – … DS – … DD – Is it OK to move this PR into GUFE or do we know it should go into a separate repo? DS – Not a blocker for now. RG – We have an example notebook repo already, so let’s put it there. https://github.com/OpenFreeEnergy/ExampleNotebooks DD – I’ll do this. PLB 78 (moving data into python package) DD – I’m waiting for some of IA’s PRs before I go ahead with this. fah-alchemy 42 DD – I’m coordinating with Hugo on this
IZ: Cinnabar bug	IZ – Some discussion already on the issue. To recap - I ran cinnabar on the dataset. Most things looked good, though there were some cases where RMSE and MUE were outside 95% CI(?). Talked with JC and IA and figured we want to figure out how to resolve this. IZ – There are two types of error/noise. One is subsampling within dataset. Another is to add outside noise…. Maybe we have two settings, one where we add noise… JC – Usually we just do bootstrapping to account for finite sample size to predict whether we’d see a difference between methodologies on larger datasets. HBM implemented an alternative approach… which may have been less typical. Maybe we should change the behavior so just using the finite sample size bootstrapping is used, and the other ones would be an optional feature that would need to be manually enabled. IA – Two things happening here. One is a bugfix where when you use … with noise then you should be using the peaked mean instead of the MLE one…. I’d propose doing the bugfix now. Then we could change the default behavior later. Then in the future we could implement the behavior change in a more major release. JC – I see the old behavior as entirely a bug. It’s making it hard to compare the same data on a result. It’s impossible to find statisical dsignificance between results as is. IA – Could implement the default change immediately and make it a major release. MH – That’s a good idea. JC – The packaging has changed anyway IA – Related question: How urgent is this? Would RG’s upcoming major API breaks fall into the same timeline? JC – This would be a good change to have. This has caused trouble in my lab multiple times in the last few weeks. So I’d like to get this out the door immediately. MH – The larger API changes are fairly decoupled from this. I don’t think this will cause much duplicate work. IA – Sounds good to me RG – Sure, we can put this is. MH – IZ, can you carry this forward? IZ – Sure, could use advice on how to expose kwargs and a code review. JC – Would call the `bootstrap_calculated/computed_uncertainty` and `bootstrap_experimental_uncertainty` IZ – … JC – … JC – Lines 117 and 118 in the PR code snippet should be controlled yb different flags IA – Is there a reasonable use case where we’d want to decouple these? JC – … IA – OK, that’s fine
JC : object models for representing free energies + uncertainties	JC – Related to noneq cycling. It looks liek there’s separate API calls for get free energies and get uncertainties. Does the objet have a stderr and MLE storae? Or will there be more optionas like storing boostrap samples or other things? DD – Using noneq cyc as an example, there are different methods to get estimate and get uncertainty. But the method has to do the same work either way… JC – Big Q for object model are do it only represent dE and stderr, or can we do more? DD – Only exposing get energy, get uncertainty, and get rate of convergence. IA – The reason we implemented get_uncertainty is so that protoocls could define their own way of getting uncert. So this leaves the door open for user defined method. That was the intention. JC – This will be difficult - If you have different watys of propagating uncertainty through a network you need to have different ways to represent them. So in the future it would be good to expose additional types of uncertainty to support things like diffnet. Also the uncerts are often not following a normal distribution. Not urgent, but worth thinking about in the future. DD – JC, could you write up a proposal on this? For noneq cycling we may have one set but for something else we may have another set. JC – Sure. Please tag me on an issue and I’ll write up my thoughts. DD – Sure, I’ll do this MH – This makes sense in GUFE, where protocols can extend it. This will also help making analysis tools modular. DD – When it comes to free energies, things like get_estimate are pretty straightforward compared to get_uncertainty. So the uncertainty implementations/API will need room to expand.

Meetings

2023-01-17 Protein-ligand benchmarks meeting notes

Participants

Goals

Discussion topics

Action items

Decisions