2022-03-29 Protein-ligand benchmarks meeting notes

Participants

  • @David Dotson

  • @Jeffry Setiadi

  • @Jeffrey Wagner

  • @Richard Gowers

  • @Diego Nolasco (Deactivated)

  • @John Chodera

  • Irfan Alibay

  • David Swenson

Goals

  • RG : introduce gufe; shared data model approach

  • JC : merits of using openff.toolkit.topology.Topology for ChemicalState

  • JC : consider alternative names for gufe

  • DD : Questions for JC - FAH project onboarding process

  • DN : Plans for advancing to having a backlog? Decision on including/excluding user stories?

Discussion topics

Item

Notes

General updates

  • DD is offline next Tuesday - No F@H meeting (or at least no decisions)

    • Next week’s meeting is cancelled (DD + RG + JC + JW approve)

introduce gufe; shared data model approach



  • RG – DD and I got together over the past week and realized that we have a lot of objects in common, namely objects that represent chemistry, so we should be able to share maintenance of those objects. So we were thinking of having a standalone repo for data representations.

  • RG – Looking at “stable release” in a few weeks. Goal is for this to be a package that DD can work with during early design phases.

  • RG – not sure about sharing edges yet, even if nodes are shared between openfe and fah-alchemy; will become clearer as edges geared toward fah-alchemy are created as to whether these have shared use

  • JW – is it accurate to call this an interface?

    • RG – this is a data models package, first and foremost

    • JC – see this as a definition for the inputs to a calculation, and perhaps the outputs, but not the compute components that implement e.g. protocols

      • an aspirational goal would be to have dummy mockups of inputs mapping to outputs; basically empty function bodies in the middle

    • JW – Can this be seen as a SUBSET of the final interface?

    • JC – Maybe, will require further discussion. But this could certainly follow a plugin/base class implementation.

  • DD – I view this as basically the running design of components for the INPUTS. The design of the results objects isn’t being finalized yet.
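
JC's aspirational goal above (inputs mapping to outputs with essentially empty function bodies in between, following a plugin/base class pattern) could look roughly like the sketch below. All class and field names here (CalculationInput, CalculationResult, Protocol, DummyProtocol, estimate_kcal_mol) are hypothetical and chosen only for illustration; they are not gufe's API, and the results design was explicitly not finalized in this meeting.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CalculationInput:
    """Hypothetical input definition: says what to compute, not how."""
    ligand_a: str
    ligand_b: str


@dataclass(frozen=True)
class CalculationResult:
    """Hypothetical output definition (placeholder only)."""
    estimate_kcal_mol: Optional[float] = None


class Protocol(ABC):
    """Base class that a compute package would subclass (plugin style)."""

    @abstractmethod
    def run(self, inputs: CalculationInput) -> CalculationResult:
        ...


class DummyProtocol(Protocol):
    """Mockup: maps inputs to an empty result without computing anything."""

    def run(self, inputs: CalculationInput) -> CalculationResult:
        return CalculationResult()


result = DummyProtocol().run(CalculationInput("ligand_1", "ligand_2"))
```

In this arrangement the data models package would own only the input and output definitions and the abstract base class; the compute components that implement protocols would live elsewhere.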

merits of using openff.toolkit.topology.Topology for ChemicalState

  •  

    • JC – There’s a need for a tagging system that identifies “this is a ligand” or “this protein needs to be mutated”. The OFF Molecule handles names and other metadata in conversions to/from RD and OE mols. Some hesitation about loading from PDB.

      • Would like to take advantage of solved work in openff.toolkit on sharp edges in e.g. RDKit conversions

    • DD – Big issues seemed to be hashability and mutability. The OFF Toolkit’s chemical representations may be too heavy for OFE’s needs. But the information content of these two things is fundamentally different.

    • DD –

    • JC – OFE component hashability = ?

      • RG – Order, Element, formal charge, connectivity, NOT positions

    • DS – Equality needs to be “correct”. Needs

    • JC – OFFMol could support multiple definitions of equality.

    • JC – Definition of “Ligand” - Could be a small mol, or a peptide, or…

      • DS – We discussed this yesterday and will make things more explicit.

    • JC – Atom order is important; it will be needed to understand how to interpret coordinates, e.g. for crystal waters

    • RG – one thing we do differently from the openff Topology is that we don’t have explicit solvent

      • A protein component could contain crystal waters; not just the amino acid chain

      • JC – the biological unit is the more operative term here, which often needs to be generated from a PDB entry; e.g. a dimer

    • JW – You’re trying to understand your requirements, so there are a couple of strategies:

      • an interface strategy, where requirements can change and you can swap out underlying implementations

        • at some points requirements will solidify, and you can perhaps simplify, though this wouldn’t be easy

      • direct use of say an openff Molecule

  • RG – we have basically subclassed openff Molecule but removed the things we don’t want/need it to do

    • this is valuable because it simplifies our surface, avoids behaviors we don’t want to support

    • we don’t have vsites, for example, which are a FF-support feature; out of context for this data structure

  • DD – I’m for the composition / interface approach

  • JW – I’m fine with the interface approach

    • would like to push for openff Molecule usage where possible

  • RG – I think we can put an openff Molecule in there; wasn’t sure if the serialization would change, however, so that made me nervous

    • need to know if that’s stable on six month timescale

  • JC – think there have been more broken RDKit packages, but point well made

    • if there are things you think you need, can you pass these on to the openff developers? These will help improve the interoperability of the surrounding stack

    • RG – we don’t think equality is wrong, just different; didn’t feel there was an arguable case for a change in openff toolkit

    • JW – I think this repo can be a useful point for determining what’s possible for movement in openff toolkit

      • some things may make a lot of sense making their way into the toolkit

    • JC – happy to move forward with this approach if use-cases can be made to work with them

      • if there are no problems, then proceed

  • DD – Proposal: Let’s proceed with this composition approach (as implemented in GUFE) for chemical components.

    • JC – I think that there will be issues on this route, but those issues will illustrate actual needs

    • (All approvers +driver agree)
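
A minimal sketch of the composition approach combined with the equality criteria RG listed above (atom order, element, formal charge, connectivity, and NOT positions). StandInMolecule is a hypothetical placeholder for whatever object ends up being wrapped (for example an openff Molecule); it is not the toolkit's API, and the Component class is an illustration rather than gufe's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StandInMolecule:
    """Hypothetical stand-in for the wrapped molecule object."""
    elements: List[str]
    formal_charges: List[int]
    bonds: List[Tuple[int, int]]                 # atom index pairs
    positions: List[Tuple[float, float, float]]


class Component:
    """Wraps a molecule (composition) and exposes a narrow, hashable surface."""

    def __init__(self, mol: StandInMolecule):
        self._mol = mol
        # Identity is frozen at construction so later mutation of the wrapped
        # object cannot silently change the hash.
        atoms = tuple(zip(mol.elements, mol.formal_charges))  # order matters
        # Sort each index pair so bond (0, 1) equals bond (1, 0).
        bonds = frozenset(tuple(sorted(bond)) for bond in mol.bonds)
        self._key = (atoms, bonds)               # positions deliberately excluded

    def __eq__(self, other):
        if not isinstance(other, Component):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        return hash(self._key)


water_a = StandInMolecule(["O", "H", "H"], [0, 0, 0], [(0, 1), (0, 2)],
                          [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)])
# Same chemistry and atom order, different coordinates and bond pair ordering.
water_b = StandInMolecule(["O", "H", "H"], [0, 0, 0], [(1, 0), (2, 0)],
                          [(5.0, 5.0, 5.0), (5.96, 5.0, 5.0), (4.76, 5.93, 5.0)])
# Same chemistry, different atom order.
reordered = StandInMolecule(["H", "O", "H"], [0, 0, 0], [(1, 0), (1, 2)],
                            [(0.96, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.24, 0.93, 0.0)])

same_despite_positions = Component(water_a) == Component(water_b)
differs_by_atom_order = Component(water_a) != Component(reordered)
```

Because the wrapper owns the definition of equality, the underlying molecule class could be swapped later without changing how components hash or compare, which is the flexibility JW described for the interface strategy.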

consider alternative names for gufe

  • JC – Interested to know about JW’s decision to refactor openff packages into a common namespace. I’d recommended doing the same thing in GUFE issues/6.

  • JW – This makes it easier for users to find packages

    • namespace approach is cosmetic, but works as implicit documentation; indicates the bounds of our ecosystem, what’s intended to work well together vs. what isn’t necessarily

    • if you’re not married to any names, now, recommend doing it

    • however, not clear to me how many users would be turned away by not having it

  • RG – understand pov on namespace package; do feel however that namespace packages are counterintuitive, and in particular if you type import openff nothing comes in by design

    • we’re also going for a batteries-included approach, where everything works out of the box

      • we fail at import stubs, e.g. openfe.gromacs, if dependencies are not present

  • JC – have you considered metapackages?

    • could make e.g. conda install openfe give everything we need

  • DS – We’ve had this discussion before in OpenFE and considered these ideas. We’re happy to hear if there are new reasons to change our decision but I did propose the metapackage idea before.

  • RG – there is a concept of branding here, do see the value

    • I don’t want to go for an approach with openfe- because it appears exclusionary to new contributors that are not affiliated with the org

    • prefer it to look like an undifferentiated zoo so that it is actually more inviting

  • JW – agree with your approach of a more disparate constellation of software intentionally NOT branded together; appreciate that you’ve given it thought

  • JC – is gufe globally unique?

    • RG – yes, but will double check

      • also important to note that openfe is intended as our user surface, not this
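
One way to read RG's point about import stubs failing when dependencies are missing is a small guard that an optional submodule calls at import time. The sketch below is an assumption about how such a stub might behave, not OpenFE's actual code; the helper name require and the feature label are hypothetical.

```python
import importlib
import importlib.util


def require(module_name: str, feature: str):
    """Import a top-level module, or raise an ImportError naming the feature.

    Only top-level names are supported here: find_spec on a dotted name
    raises if the parent package itself is missing.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}', "
            "which is not installed."
        )
    return importlib.import_module(module_name)


# A stub module would call this at its top, so the failure happens at import
# time with a clear message rather than later with an obscure error.
json_module = require("json", "JSON export")
```

A metapackage, as JC suggested, would address the same problem from the other side by making sure the optional dependencies are installed in the first place.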

Questions for JC - FAH project onboarding process

  • JC – Best way to do this is to have a deep dive with whoever is doi. They’ll need input files in a particular format (there are example scripts for OpenMM; for GROMACS you’ll need to get in touch with the Voelz group)

  • Have to get onboarded via Bowman group

    • would spin up a WS, likely in Chodera Lab environment

    • would need input files for benchmarking and testing; can get in touch with Voelz group for gromacs

    • Operative concepts are PROJECT, RUN, CLONE, GEN

      • PROJECT creation is automatable

        • create a directory on a work server; create an XML file, restart work server process; RESTful API for this

        • do also need to set constraints on assignment server; believe there is a RESTful API for this

        • there is a roll-out process, can be worked-around, or we create our own approach

  • DD – Is it possible to remove all human touchpoints from this?

    • JC – Yes. You’d place files in a particular path, generate an XML file describing the contents of the project. Then there’s a REST API to talk to in controlling the project. For manual projects there’s a testing/rollout phase at small scale to ensure that it doesn’t crash lots of volunteer computers. But once the automation has had some trust/experience in production we can probably skip those.

    • DD – So creation of a project doesn’t have human touchpoints?

      • JC – We have to go to the assignment server to set the constraints for that project. There is (I think) a RESTful API for this, but we can pre-create a bunch of these that we can populate over time. Just need an estimate of how many projects we need to make.

    • DD – For a star graph with RBFE, how many projects is that?

      • JC – Two projects - One for the ligand+solvent system, one for the protein-ligand complexes. Though there is a way around that to get it down to one.

      • DD – So, it’s system size that draws the boundaries between projects/limits project size?

      • JC – Yes. Basically I imagine there being “buckets” for groups of atoms…

      • DD – Good, that will be the usual pattern.
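
The "buckets" idea above can be sketched as grouping simulation legs into projects by system size. Everything in this sketch is an assumption made for illustration: the function name assign_projects, the bucket width, and the atom counts are made-up placeholders, not Folding@home parameters.

```python
from collections import defaultdict


def assign_projects(legs, bucket_size=50_000):
    """Group (leg_name, n_atoms) pairs into projects by atom-count bucket."""
    projects = defaultdict(list)
    for name, n_atoms in legs:
        # Integer division places similarly sized systems in the same bucket.
        projects[n_atoms // bucket_size].append(name)
    return dict(projects)


# Star graph for RBFE: every ligand is connected to one central ligand, and
# each edge has a solvent leg and a complex leg.
center = "lig0"
ligands = ["lig1", "lig2", "lig3"]
legs = []
for lig in ligands:
    edge = f"{center}->{lig}"
    legs.append((f"{edge}/solvent", 3_000))    # placeholder atom count
    legs.append((f"{edge}/complex", 60_000))   # placeholder atom count

projects = assign_projects(legs)
```

With these placeholder sizes the solvent legs and the complex legs fall into separate buckets, which matches JC's answer of two projects for a star graph.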

Plans for advancing to having a backlog? Decision on including/excluding user stories?

  • DN – Let’s discuss in next meeting

Action items

@David Dotson will continue with PRs against openfe/gufe, defining Transformation (edge) structure and up through AlchemicalNetwork
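
For orientation, the node/edge structure named in this action item could be modeled along the following lines. This is only a sketch under assumptions: the class names ChemicalState, Transformation, and AlchemicalNetwork come from these notes, but every field and the nodes property are hypothetical and do not describe the actual PRs.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ChemicalState:
    """Node: a named set of components (hypothetical fields)."""
    name: str
    components: FrozenSet[str]


@dataclass(frozen=True)
class Transformation:
    """Edge: connects a start state to an end state."""
    start: ChemicalState
    end: ChemicalState


@dataclass(frozen=True)
class AlchemicalNetwork:
    """Graph defined by its edges; nodes are derived from them."""
    edges: FrozenSet[Transformation]

    @property
    def nodes(self) -> FrozenSet[ChemicalState]:
        return frozenset(s for e in self.edges for s in (e.start, e.end))


a = ChemicalState("ligA", frozenset({"ligA", "solvent"}))
b = ChemicalState("ligB", frozenset({"ligB", "solvent"}))
c = ChemicalState("ligC", frozenset({"ligC", "solvent"}))
network = AlchemicalNetwork(frozenset({Transformation(a, b), Transformation(a, c)}))
```

Frozen dataclasses make each object immutable and hashable, so shared nodes are deduplicated when the node set is derived from the edges.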

Decisions