2022-03-29 Protein-ligand benchmarks meeting notes

Participants

@David Dotson
@Jeffry Setiadi
@Jeffrey Wagner
@Richard Gowers
@Diego Nolasco (Deactivated)
@John Chodera
Irfan Alibay
David Swenson

Goals

RG : introduce gufe; shared data model approach
JC : merits of using openff.toolkit.topology.Topology for ChemicalState
JC : consider alternative names for gufe
DD : Questions for JC - FAH project onboarding process
DN : Plans for advancing to having a backlog? Decision on including/excluding user stories?

Discussion topics

Item	Notes

General updates
    DD is offline next Tuesday - No F@H meeting (or at least no decisions)
    Next week’s meeting is cancelled (DD + RG + JC + JW approve)
introduce `gufe`; shared data model approach
    RG – DD and I got together over the past week and realized that we have a lot of objects in common, namely objects that represent chemistry, so we should be able to share maintenance of those objects. So we were thinking of having a standalone repo for data representations.
    RG – Looking at “stable release” in a few weeks. Goal is for this to be a package that DD can work with during early design phases.
    RG – not sure about sharing edges yet, even if nodes are shared between `openfe` and `fah-alchemy`; whether edges have shared use will become clearer as edges geared toward `fah-alchemy` are created
    JW – is it accurate to call this an interface?
    RG – this is a data models package, first and foremost
    JC – see this as a definition for the inputs to a calculation, and perhaps the outputs, but not the compute components that implement e.g. protocols
        an aspirational goal would be to have dummy mockups of inputs mapping to outputs; basically empty function bodies in the middle
    JW – Can this be seen as a SUBSET of the final interface?
    JC – Maybe, will require further discussion. But this could certainly follow a plugin/base class implementation.
    DD – I view this as basically the running design of components for the INPUTS. The design of the results objects isn’t being finalized yet.
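The "inputs only, empty function bodies in the middle" idea can be illustrated with a minimal sketch. The class names `ChemicalState`, `Transformation`, and `AlchemicalNetwork` appear elsewhere in these notes; every field, the `run_protocol` function, and the example states are assumptions made up for illustration and are not the `gufe` API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ChemicalState:
    """Node: a named set of component labels (hypothetical fields)."""
    name: str
    components: Tuple[str, ...]


@dataclass(frozen=True)
class Transformation:
    """Edge: a directed change from one chemical state to another."""
    start: ChemicalState
    end: ChemicalState


@dataclass(frozen=True)
class AlchemicalNetwork:
    """Graph of chemical states connected by transformations."""
    edges: FrozenSet[Transformation]

    @property
    def nodes(self) -> FrozenSet[ChemicalState]:
        # Collect both endpoints of every edge; duplicates collapse in the set.
        return frozenset(s for e in self.edges for s in (e.start, e.end))


def run_protocol(edge: Transformation) -> dict:
    """Hypothetical compute entry point, deliberately left unimplemented."""
    raise NotImplementedError("compute components live outside the data model")


state_a = ChemicalState("complex_a", ("protein", "ligand_a", "solvent"))
state_b = ChemicalState("complex_b", ("protein", "ligand_b", "solvent"))
network = AlchemicalNetwork(frozenset({Transformation(state_a, state_b)}))
print(len(network.nodes))
```

Frozen dataclasses are used here only because they are hashable and immutable by default, which lines up with the hashability and mutability concerns raised in the discussion.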
merits of using `openff.toolkit.topology.Topology` for `ChemicalState`
    JC – There’s a need for a tagging system that identifies “this is a ligand” or “this protein needs to be mutated”. The OFF Molecule handles names and other metadata in conversions to/from RD and OE mols. Some hesitation about loading from PDB.
        Would like to take advantage of solved work in `openff.toolkit` on sharp edges in e.g. RDKit conversions
    DD – Big issues seemed to be hashability and mutability. The OFF Toolkit’s chemical representations may be too heavy for OFE’s needs. But the information content of these two things is fundamentally different.
    DD – JC – OFE component hashability = ?
    RG – Order, Element, formal charge, connectivity, NOT positions
    DS – Equality needs to be “correct”. Needs
    JC – OFFMol could support multiple definitions of equality.
    …
    JC – Definition of “Ligand” - Could be a small mol, or a peptide, or…
    DS – We discussed this yesterday and will make things more explicit.
    JC – Atom order is important, will be needed to understand how to interpret coordinates. E.g., crystal waters
    RG – one thing we do differently from the `openff` `Topology` is that we don’t have explicit solvent
        A protein component could contain crystal waters; not just the amino acid chain
    JC – the biological unit is the more operative term here, which often needs to be generated from a PDB entry; e.g. a dimer
    JW – You’re trying to understand your requirements, and so there are a couple of strategies
        an interface strategy, where requirements can change and you can swap out underlying implementations
            at some point requirements will solidify, and you can perhaps simplify, though this wouldn’t be easy
        direct use of say an `openff` `Molecule`
    RG – we have basically subclassed `openff` `Molecule` but removed the things we don’t want/need it to do
        this is valuable because it simplifies our surface, avoids behaviors we don’t want to support
        we don’t have vsites, for example, which are a FF-support feature; out of context for this data structure
    DD – I’m for the composition / interface approach
    JW – I’m fine with the interface approach
        would like to push for `openff` `Molecule` usage where possible
    RG – I think we can put an `openff` `Molecule` in there; wasn’t sure if the serialization would change, however, so that made me nervous
        need to know if that’s stable on a six-month timescale
    JC – think there have been more broken RDKit packages, but point well made
        if there are things you think you need, can you pass these on to the `openff` developers? These will help improve the interoperability of the surrounding stack
    RG – we don’t think equality is wrong, just different; didn’t feel it was an arguable case toward a change in `openff` toolkit
    JW – I think this repo can be a useful point for determining what’s possible for movement in `openff` toolkit
        some things may make a lot of sense making their way into the `toolkit`
    JC – happy to move forward with this approach if use-cases can be made to work with them
        if there are no problems, then proceed
    DD – Proposal: Let’s proceed with this composition approach (as implemented in GUFE) for chemical components.
    JC – I think that there will be issues on this route, but those issues will illustrate actual needs
    (All approvers + driver agree)
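The equality rule RG describes (atom order, element, formal charge, connectivity, NOT positions) can be sketched as follows. This is a rough illustration only: `MoleculeComponent` and its fields are hypothetical names, "Order" is read here as atom order, and the sketch does not wrap or reproduce the `openff` `Molecule`.

```python
from dataclasses import dataclass, field
from typing import Tuple

Atom = Tuple[str, int]                 # (element symbol, formal charge)
Bond = Tuple[int, int]                 # (atom index, atom index)
Position = Tuple[float, float, float]  # Cartesian coordinates


@dataclass(frozen=True)
class MoleculeComponent:
    """Hypothetical component: identity ignores coordinates."""
    atoms: Tuple[Atom, ...]   # tuple order is the atom order
    bonds: Tuple[Bond, ...]
    # compare=False keeps positions out of both __eq__ and __hash__.
    positions: Tuple[Position, ...] = field(default=(), compare=False)


water_atoms = (("O", 0), ("H", 0), ("H", 0))
water_bonds = ((0, 1), (0, 2))

pose_1 = MoleculeComponent(
    water_atoms, water_bonds,
    ((0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)),
)
pose_2 = MoleculeComponent(
    water_atoms, water_bonds,
    ((5.0, 5.0, 5.0), (5.96, 5.0, 5.0), (4.76, 5.93, 5.0)),
)
# Same chemistry, different atom order: treated as a different component.
reordered = MoleculeComponent((("H", 0), ("O", 0), ("H", 0)), ((1, 0), (1, 2)))

print(pose_1 == pose_2, hash(pose_1) == hash(pose_2), pose_1 == reordered)
```

Two poses of the same molecule compare equal and hash identically, while a reordered copy does not, which matches JC's point that atom order determines how coordinates are interpreted.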
consider alternative names for `gufe`
    JC – Interested to know about JW’s decision to refactor openff packages into a common namespace. I’d recommended doing the same thing in GUFE issues/6.
    JW – This makes it easier for users to find packages
        namespace approach is cosmetic, but works as implicit documentation; indicates the bounds of our ecosystem, what’s intended to work well together vs. what isn’t necessarily
        if you’re not married to any names now, recommend doing it
        however, not clear to me how many users would be turned away by not having it
    RG – understand pov on namespace package; do feel however that namespace packages are counterintuitive, and in particular if you type `import openff` nothing comes in by design
        we’re also trying to go for a strategy where everything works out of the box, but want to go for a batteries-included approach
        we fail at import stubs, e.g. `openfe.gromacs` if dependencies not present
    JC – have you considered metapackages? could make e.g. `conda install openfe` give everything we need
    DS – We’ve had this discussion before in OpenFE and considered these ideas. We’re happy to hear if there are new reasons to change our decision but I did propose the metapackage idea before.
    RG – there is a concept of branding here, do see the value
        I don’t want to go for an approach with `openfe-` because it appears exclusionary to new contributors that are not affiliated with the org
        prefer it to look like an undifferentiated zoo so that it is actually more inviting
    JW – agree with your approach of a more disparate constellation of software intentionally NOT branded together; appreciate that you’ve given it thought
    JC – is `gufe` globally unique?
    RG – yes, but will double check
        also important to note that `openfe` is intended as our user surface, not this
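RG's remark about failing at import stubs when dependencies are missing could look roughly like the sketch below. The helper `require_optional` is a hypothetical name and this is not how `openfe` implements it; it only illustrates failing early with a clear message when an optional dependency is absent.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency or fail with an explanatory error."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'"
        )
    return importlib.import_module(module_name)


# 'json' ships with Python, so this import succeeds.
json_module = require_optional("json", "demo feature")

try:
    # Hypothetical package name that is assumed not to be installed.
    require_optional("not_a_real_md_engine_pkg", "GROMACS support")
except ImportError as err:
    message = str(err)
    print(message)
```

A stub module would call such a helper at import time, so a user without the dependency sees the reason immediately rather than a failure deep inside a calculation.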
Questions for JC - FAH project onboarding process
    JC – Best way to do this is to have a deep dive with whoever is doing it. They’ll need input files in a particular format (there are example scripts for OpenMM; for GROMACS you’ll need to get in touch with the Voelz group)
        Have to get onboarded via Bowman group
        would spin up a WS, likely in Chodera Lab environment
        would need input files for benchmarking and testing; can get in touch with Voelz group for gromacs
        Operative concepts are PROJECT, RUN, CLONE, GEN
        PROJECT creation is automatable
            create a directory on a work server; create an XML file, restart work server process; RESTful API for this
            do also need to set constraints on assignment server; believe there is a RESTful API for this
        there is a roll-out process, can be worked around, or we create our own approach
    DD – Is it possible to remove all human touchpoints from this?
    JC – Yes. You’d place files in a particular path, generate an XML file describing the contents of the project. Then there’s a REST API to talk to in controlling the project. For manual projects there’s a testing/rollout phase at small scale to ensure that it doesn’t crash lots of volunteer computers. But once the automation has had some trust/experience in production we can probably skip those.
    DD – So creation of a project doesn’t have human touchpoints?
    JC – We have to go to the assignment server to set the constraints for that project. There is (I think) a RESTful API for this, but we can pre-create a bunch of these that we can populate over time. Just need an estimate of how many projects we need to make.
    DD – For a star graph with RBFE, how many projects is that?
    JC – Two projects - One for the ligand+solvent system, one for the protein-ligand complexes. Though there is a way around that to get it down to one.
    DD – So, it’s system size that draws the boundaries between projects/limits project size?
    JC – Yes. Basically I imagine there being “buckets” for groups of atoms…
    DD – Good, that will be the usual pattern.
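The "buckets by system size" idea can be sketched as below. The bucket boundaries, atom counts, and leg names are invented for illustration; the notes do not give actual F@H project limits, and nothing here talks to a work server or assignment server.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List

# Hypothetical atom-count boundaries between project buckets.
BUCKET_BOUNDARIES = (10_000, 50_000, 100_000)


def bucket_for(n_atoms: int) -> int:
    """Return the index of the size bucket a system falls into."""
    return bisect_right(BUCKET_BOUNDARIES, n_atoms)


def assign_projects(system_sizes: Dict[str, int]) -> Dict[int, List[str]]:
    """Group systems so that each size bucket maps to one project."""
    projects: Dict[int, List[str]] = defaultdict(list)
    for name, n_atoms in system_sizes.items():
        projects[bucket_for(n_atoms)].append(name)
    return dict(projects)


# Star graph around a central ligand: each edge has a solvent leg and a
# complex leg. Atom counts are made up for the example.
star_graph_legs = {
    "center->ligA solvent": 3_200,
    "center->ligB solvent": 3_150,
    "center->ligA complex": 62_000,
    "center->ligB complex": 61_800,
}

projects = assign_projects(star_graph_legs)
print(len(projects))
```

With these assumed sizes the solvent legs and complex legs land in two separate buckets, consistent with JC's estimate of two projects for a star graph.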
Plans for advancing to having a backlog? Decision on including/excluding user stories?	DN – Let’s discuss in next meeting

Action items

@David Dotson will continue with PRs against openfe/gufe, defining Transformation (edge) structure and up through AlchemicalNetwork
