2022-05-17 Protein-ligand benchmarks meeting notes

Participants

  • @Diego Nolasco

  • @Jeffrey Wagner

  • @Irfan Alibay

  • @Richard Gowers

  • @David Dotson

  • David Swenson

  • @Lorenzo D'Amore

  • @Mike Henry

Goals

  • RG : Overall scope of gufe & arsenic

  • DS : Results store proposal

  • RG : openff-arsenic status

    • Proposed FEMap API

    • Proposed plotting design patterns

Discussion topics

Item

Presenter

Notes

Scope of gufe

RG

  • RG – we decided upfront that we wanted shared data models, which we placed in gufe. What we didn’t do upfront was define the boundaries of gufe (what goes in, what doesn’t).

  • “GUFE is the stuff in the box”

  • https://docs.google.com/document/d/1k7qakmhanwYkUMPpwR8zDApd9nphTuOQSOxQIP7RCWY/edit?usp=sharing

  • (status as of the time of this meeting)

  • Red = doesn’t really exist yet, yellow = kinda exists, green = pretty close to final

  • RG – DD, does this look good to you?

    • DD – At a glance it seems consistent with the picture in my head

  • JW – question on the ProteinComponent: does this include crystallographic waters?

    • RG – yes, cofactors, crystal waters, etc. are considered part of the protein, or biological unit

    • JW – what about protonation states?

      • RG – as a first pass we treat this as read-only

  • DD – Does Arsenic take in multiple edges and use a maximum likelihood estimator?

    • IA + RG – Yes

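For reference, here is a minimal sketch of how several relative edge estimates can be combined into per-node values. It uses weighted least squares, which is the maximum likelihood estimate when each edge has an independent Gaussian error with known sigma. This is not arsenic’s implementation. The function name `mle_node_values`, the `(i, j, ddg, sigma)` edge tuples, and fixing one reference node to 0 are assumptions for illustration, and the graph is assumed to be connected.

```python
import numpy as np


def mle_node_values(nodes, edges, reference):
    """Hypothetical helper: per-node free energies from relative edge estimates.

    nodes: list of hashable node labels
    edges: list of (i, j, ddg, sigma), meaning G[j] - G[i] ~ Normal(ddg, sigma**2)
    reference: node whose value is fixed to 0.0 (removes the additive offset)
    Assumes the graph is connected and every sigma > 0.
    """
    free = [n for n in nodes if n != reference]
    col = {n: k for k, n in enumerate(free)}
    A = np.zeros((len(edges), len(free)))
    b = np.zeros(len(edges))
    for row, (i, j, ddg, sigma) in enumerate(edges):
        # Scaling each row by 1/sigma gives 1/sigma**2 weights in the normal equations.
        w = 1.0 / sigma
        if j in col:
            A[row, col[j]] += w
        if i in col:
            A[row, col[i]] -= w
        b[row] = w * ddg
    # Weighted least squares == MLE under independent Gaussian edge errors.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    values = {reference: 0.0}
    values.update({n: float(x[col[n]]) for n in free})
    return values
```

With a consistent triangle (A→B = 1.0, B→C = 2.0, A→C = 3.0) the estimate reproduces the edges exactly. If A→C is 3.3 instead, the 0.3 cycle closure error is split evenly across the three equal-sigma edges, giving B = 1.1 and C = 3.2. In general, on a single cycle each edge absorbs a share of the error proportional to its variance.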

Results storage

DS

  • https://docs.google.com/document/u/1/d/1wcTy4TqBIGbxdt9p2reVTY4J5iKAzF673DQsZeZ8vg8/edit?pli=1

  • DS – Some aspects of the results store will be in GUFE, others won’t.

  • DS – The big question is “how do we label this data?” The plan is to store results as bytes and perform conversion as late as possible, so results will be indexed by a string that looks like a file path.

    • DD – Was talking with IA yesterday. When you do a switching calculation, you need to do equilibrium calculations on the endpoints. If the labels let us recognize when we’re requesting a duplicate calculation, we could save a good amount of work. But this raises the question of how we’d ensure that the results are in the results store….(?)

    • (Full details of this discussion are in comments on that doc/ongoing discussion will happen offline)

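DS’s plan (results stored as bytes, indexed by path-like strings, converted as late as possible) and DD’s deduplication idea can be illustrated together. The sketch below is hypothetical and does not reflect the design in the linked doc. The `ResultStore` class, the `equilibrium/<hash>/result.json` key scheme, and the use of a SHA-256 hash of JSON-serialized inputs are all assumptions chosen for illustration.

```python
import hashlib
import json


class ResultStore:
    """Hypothetical in-memory results store: raw bytes keyed by path-like strings."""

    def __init__(self):
        self._blobs = {}  # maps location string -> bytes

    def store(self, location, data):
        self._blobs[location] = data

    def exists(self, location):
        return location in self._blobs

    def load(self, location):
        # Return raw bytes; callers decide how (and when) to deserialize.
        return self._blobs[location]


def endpoint_location(system, protocol):
    """Hypothetical label: derive a path-like key from the calculation inputs.

    Identical inputs give identical keys, so a duplicate endpoint calculation
    can be detected before it is run.
    """
    payload = json.dumps({"system": system, "protocol": protocol}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"equilibrium/{digest}/result.json"


def run_or_reuse(store, system, protocol, run):
    """Call run(system, protocol) only if no result exists under the derived key."""
    location = endpoint_location(system, protocol)
    if not store.exists(location):
        result = run(system, protocol)
        store.store(location, json.dumps(result).encode("utf-8"))
    # Conversion from bytes happens as late as possible: only when the value is needed.
    return json.loads(store.load(location).decode("utf-8"))
```

Requesting the same endpoint twice runs the calculation once. The second request finds the existing key and only deserializes the stored bytes. Whether the result is actually present in a shared store, which was DD’s open question, is not addressed by this sketch.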

openff-arsenic status

RG

  • RG – “arsenic” name is already taken on PyPI

  • Proposed FEMap API

    • Pretty large API break coming

    • Right now FEMap constructor requires a CSV

    • FEMap object lightly wraps a networkx graph. I think we’ll drop the wrapper and just hold a nx graph directly.

    • A RelativeResult will represent an edge, and an AbsoluteResult will represent a node.

    • DD – Is the goal to use GUFE objects inside arsenic?

      • RG – I don’t think that’s necessary. Since nx allows any hashable object to be used as a node label, and GUFE objects will be hashable, I don’t think we need to do anything special.

      • DD – Should be easy to have a polymorphic-ish API - Let GUFE users use things natively, but let other users provide hashable objects from other worlds

      • RG – If we want people to be able to contribute other math/estimators, we’ll want to keep the internal data models generic.

    • JW – It’d be great to do the API break and the name change in the same release; that keeps it easy for users.

  • Proposed plotting design patterns

    • The functionality here is pretty good.

    • The methods here don’t even take a FEMap object; they take nx graphs, so that will go well with our proposed API plans.

    • I don’t like that there’s so much math in the plotting functions. I’d like to keep the plotting code and the data-processing code separate, so we’ll make steps toward that in the future.

    • Currently you can choose either plotly or matplotlib via a kwarg. Both have merits. JC recommended generalizing this to a plotting function that could push to multiple backends. DSwenson recommended polyviews (sp?). But I’m not sure the cost/benefit would work out for us there.

      • JW – It’s a lot of work to make and maintain ToolkitRegistry/ToolkitWrapper-style interfaces.

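The proposed direction for arsenic could look roughly like this. The wrapper is dropped in favor of a plain networkx graph, any hashable object serves as a node label, relative results become edges and absolute results become nodes, and data processing stays out of the plotting code. All class, field, and function names below (`Ligand`, `RelativeResult`, `AbsoluteResult`, `build_graph`, `edge_table`) are hypothetical and are not arsenic’s API. `Ligand` only stands in for a hashable object such as a GUFE component.

```python
from dataclasses import dataclass

import networkx as nx


@dataclass(frozen=True)
class Ligand:
    """Hypothetical hashable node label (frozen dataclasses are hashable)."""
    name: str


@dataclass(frozen=True)
class RelativeResult:
    """Hypothetical edge payload: DDG going from ligand_a to ligand_b."""
    ligand_a: Ligand
    ligand_b: Ligand
    ddg: float
    uncertainty: float


@dataclass(frozen=True)
class AbsoluteResult:
    """Hypothetical node payload: absolute DG for one ligand."""
    ligand: Ligand
    dg: float
    uncertainty: float


def build_graph(relative, absolute=()):
    """Hold results directly in an nx.DiGraph instead of a wrapper class."""
    graph = nx.DiGraph()
    for r in absolute:
        graph.add_node(r.ligand, dg=r.dg, uncertainty=r.uncertainty)
    for r in relative:
        # add_edge creates any missing nodes; any hashable object can be a node.
        graph.add_edge(r.ligand_a, r.ligand_b, ddg=r.ddg, uncertainty=r.uncertainty)
    return graph


def edge_table(graph):
    """Data-processing step, kept separate from plotting: one row per edge."""
    return [
        (a.name, b.name, data["ddg"], data["uncertainty"])
        for a, b, data in graph.edges(data=True)
    ]
```

A plotting function, whether matplotlib or plotly, would then receive only the rows from `edge_table(graph)`, so no statistics would live in the plotting code itself.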

  • JW – Would be good to start setting time constraints. DN and I will be meeting with our protein FF people to find out when we need to run protein-ligand calcs.

  • RG – Everything yellow in my diagram is targeted for August. We need to make our release then.

  • DD – When does CCavender need the PL stuff running?

    • JW – DN and I are meeting with CC tomorrow, and we’ll try to come to the meeting next week with a rough deadline for the OpenFF user stories

  • DN – It will be worth understanding deliverables and scope. For OpenFE it is absolutely necessary to have this running in August, so it’d be good to know what the deadlines are for the OpenFE components.

    • DD – I see that there are different timescales, and that OpenFE needs to have stuff out in August. OpenFF needs this on Chapin’s deadline. Choderalab and associates/the new center need this on a different deadline (roughly a month). So I need these stakeholders to tell me what their deadlines are so we can reason about this. But it’s clear that we’re benefitting greatly from having involvement from the different developer communities here.

    • JW – I think there are two kinds of deadlines: “MVP” deadlines and “final” deadlines. Sometimes we just need “something that runs today and there are no downsides to radically changing it later”, but sometimes we need “a final decision about a core component that will get cemented into place and we can’t really change it later.” So next week I’ll try to identify which of OpenFF’s deadlines are which kind.


Action items

Decisions