2022-07-26 Protein-ligand benchmarks meeting notes

Participants

  • @Irfan Alibay

  • Ben Ries

  • @John Chodera

  • @David Dotson

  • @David W.H. Swenson

  • @Jeffrey Wagner

  • Levi Naden

  • @Iván Pulido

  • @Lorenzo D'Amore

  • @Richard Gowers

Goals

  • DD : fah-alchemy - current board status

    • fah-alchemy : Phase 1 - MVP

    • 5 weeks out from 9/1 deadline for biopolymer benchmarking

    • currently focused on resolving approach to tokenization and serialization used within gufe via #39 and #36

  • DD : request rework of Milestones and Deadlines with DN to reflect current state, forward milestones

  • MH : ProtocolSettings taxonomy update

  • DD : protein-ligand-benchmark - 0.3.0 update

  • JC : can we add two more large systems to the free energy benchmarks?

Discussion topics

Item

Presenter

Notes

fah-alchemy - current board status

@David Dotson

  • fah-alchemy : Phase 1 - MVP

  • 5 weeks out from 9/1 deadline for biopolymer benchmarking

  • currently focused on resolving approach to tokenization and serialization used within gufe via #39 and #36.

    • Been working with DS and RG on this. Have a working session after this call to iron out some tough details. Requires a balance between completeness and size. Also will become a template for how we serialize DAGs in general.
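A minimal sketch of the tokenization idea discussed here, assuming each object can be reduced to a JSON-serializable dict. The function names `to_token` and `serialize_dag` are hypothetical and this is not gufe's actual implementation; it only illustrates how content-derived tokens let a DAG store each node once (size) while still recording every node and edge (completeness).

```python
import hashlib
import json


def to_token(obj_dict: dict) -> str:
    """Return a deterministic token for a JSON-serializable dict (hypothetical helper)."""
    # sort_keys and fixed separators give a canonical string, so the
    # insertion order of the keys does not change the resulting token.
    canonical = json.dumps(obj_dict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def serialize_dag(nodes: dict, edges: list) -> dict:
    """Store each distinct node payload once, keyed by token; edges refer to tokens."""
    tokens = {name: to_token(payload) for name, payload in nodes.items()}
    return {
        # Nodes with identical payloads share a token and are stored only once.
        "nodes": {tokens[name]: payload for name, payload in nodes.items()},
        "edges": sorted([tokens[a], tokens[b]] for a, b in edges),
    }
```

Because the token depends only on content, two nodes with identical payloads collapse to a single stored entry, which is one way to trade off completeness against size.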


request rework of Milestones and Deadlines with DN to reflect current state, forward milestones

@David Dotson

  • DD – The current project page is out of date. Wanted to get approval before I start modifying the page.

    • RG + JW + JC – Approved

    • JW – One way that I’ve done this before is to make the desired edits with red font/red highlights. Then when the approvers are gathered next, go through them item by item and quickly approve them.

ProtocolSettings taxonomy update

@Mike Henry

  • MH – Made a PR for protocol settings, currently hashing out what belongs in it and what doesn’t. My starting point was the SMIRNOFF spec since that’s so extensive. Wondering if it’s too complex because it allows for things like non-harmonic bonds.

  • MH – Also looking at typing/serialization/unit handling. So I want to make sure that we are safely handling things like cutoffs from different ecosystems where we want to unambiguously record the units.

  • JC – looks like you’re getting down to the level of bonds; is this something that Interchange does?

    • aware that for e.g. GAFF functional form is fully specified

      • but electrostatics is separate


  • MH – If a user brings in their own FF file, we need to associate it with a functional form. For example, we need to handle the factor of 1/2 in amber bond constants.

  • IA (chat) – I think where I'm blocked on this a little bit (despite yesterday's conversation where I was convinced this made sense) is conceptualising where this "copy" of the force field information is being used. If we're feeding these xml/itp/frcmod/etc... files getting the data out, storing it, and then passing these files on to some parameterisation step, what are we getting out of the extracted settings?

    I know ben mentioned gromos (which I have no idea about), but essentially is that data we can extract at the parameterisation step?

  • JC – Is this a parallel way of specifying a FF that would be redundant with the SMIRNOFF spec? I’m worried that this could turn into the N+1 formats problem.

    • MH – I have it set up to read this info if an OFFXML is passed in. So if the FF format basically says that this is going to be a harmonic angle, then I’m good with that. But if the information comes from an ambiguous source, I need the user to tell me what the functional form is. Maybe this is overkill right now?

    • JC – This could create conflicts. If you have an AMBER FF but then change the bonds to Morse, what do you do with the information? So it becomes complex to think about where different info would get stored and how it would get transformed.

    • MH – They’d need to tell us it’s a morse potential at loading-time

    • JC – I guess this requires us to decide what we want this product to do - Should it be a flexible, general tool for users? Or should it just meet our immediate project requirements?

    • MH – I’d initially just hoped to cover OpenFF/OMM XML. But it seems like we’ll need to prepare for other FFs as well. So the important thing to do now is to decide the spec/scope for the current work.

    • JC – Can you provide a concrete example of what we’ll need to support (like, what would users load in and what would they expect to export it to?)

    • MH – It depends on when the “system” object gets created in the protocol. So I want the important FF stuff to be in there somewhere, but I’m not sure where the transformation will happen. It’s not clear where in the process the user would bring in their own FF/other inputs.

    • JC – Would this be a situation where we store some information that we later feed into an outside typing/parameter assignment engine?

    • MH – …. There’s no way to tell whether the FF that someone gave us will successfully apply to their input.

    • DD – One idea we’ve kicked around is, because a protocol has its settings, it has this info hanging off it, and so it would make sense to have a validator that ensures that the inputs (FF+input structure) make sense.

    • JC – So the object that represents these “level 1” settings would know how/whether it could apply to inputs? So the protocol would know which FF would be used on solvents, which FF on small molecules, which one on proteins/receptors, etc. OMMForcefields can handle some amount of flexibility WRT applying different FFs to different components with some customization.

    • MH – The protocol will have that info for each edge.

    • JC – So the FF application machinery will be independently implemented in each protocol?

    • DD – Can probably share a base implementation for the most part.

    • JC – Does this belong in protocolsettings? To keep things local, you’d think that you’d want a separate object. …

    • DD – Right now we’re looking at making a pydantic class that does some basic validation, but it doesn’t validate the chemistry/physics. So an edge has a protocol, the protocol has protocolsettings, and that does basic validation.

    • JC – The idea that we’ll make a universal FF wrapper seems like it would be a useful object outside of its attachment to a protocol.

    • MH – Then we also need to figure out how Interchange fits into this, since I think we’re talking about something that largely overlaps with its jurisdiction. It seems like our shortest path will be contributing to, or wrapping, Interchange.

    • RG – Is Interchange limited to OFFXML?

    • JC – No. Interchange is a general container for parameterized systems.

    • JC – It seems like this functionality we’re describing could be handled by a series of docker containers. Right now we’re trying to discuss solutions to this using a bunch of new classes that hold external FFs rather than letting the native tools handle those FFs.

    • RG – I think JC’s right that we’re trying to capture the entire FF, and we’re not doing a great job. There are two parts: capturing the entire FF, and applying the FF, and both of those are hard. So I think this is a bridge too far.

    • JC – I think we want a nice little API that insulates us from the future. For this case we just need to support OpenFF force fields, which is just a few lines. We can also support GAFF and many of the AMBER FFs using OpenMMForceFields. So that scope should get us to the MVP. Then we can support further things later.

  • JW – what would be the downside of having standard-named Docker containers?

    • RG – our partners wouldn’t be able to run Docker in their environments

    • DD – It would be very complicated/backwards devops-wise to have our stack DEPEND on docker containers.

  • MH – I’ll need to chat with OpenFE to discuss in more detail how we want the information flow to work.
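The unit-handling and functional-form points above can be sketched as a small settings object that does basic validation only (no chemistry/physics checks). This is an illustration under stated assumptions, not the PR under discussion: the class names `Cutoff` and `BondSettings`, the convention labels `"half_k"` and `"k"`, and the supported unit set are all hypothetical, and standard-library dataclasses stand in for the pydantic class DD mentions. The one physical fact relied on is that AMBER writes the harmonic bond energy as K (r - r0)^2, while other tools (e.g. OpenMM) use 1/2 k (r - r0)^2.

```python
from dataclasses import dataclass

# Conversion factors to nanometers; a deliberately tiny, assumed unit set.
_TO_NM = {"nanometer": 1.0, "angstrom": 0.1}

# Prefactor c in E = c * k * (r - r0)**2 for each hypothetical convention label.
_HARMONIC_PREFACTOR = {"half_k": 0.5, "k": 1.0}


@dataclass(frozen=True)
class Cutoff:
    """A length that always carries its unit, so it is never ambiguous."""
    value: float
    unit: str

    def __post_init__(self):
        if self.unit not in _TO_NM:
            raise ValueError(f"unknown length unit: {self.unit!r}")
        if self.value <= 0:
            raise ValueError("cutoff must be positive")

    def in_nanometers(self) -> float:
        return self.value * _TO_NM[self.unit]


@dataclass(frozen=True)
class BondSettings:
    """Records which functional form and prefactor convention a FF source uses."""
    functional_form: str  # e.g. "harmonic"; must be stated by the user if ambiguous
    convention: str       # "k" for AMBER-style K(r-r0)^2, "half_k" for 1/2 k(r-r0)^2

    def __post_init__(self):
        if self.convention not in _HARMONIC_PREFACTOR:
            raise ValueError(f"unknown convention: {self.convention!r}")

    def to_half_k(self, k: float) -> float:
        """Convert a force constant into the 1/2 k (r - r0)^2 convention."""
        if self.functional_form != "harmonic":
            raise ValueError("conversion is only defined for harmonic bonds")
        return k * _HARMONIC_PREFACTOR[self.convention] / 0.5
```

For example, `BondSettings("harmonic", "k").to_half_k(300.0)` doubles an AMBER-style constant, while a `"half_k"` source passes through unchanged. A non-harmonic form such as Morse is rejected rather than silently converted, which mirrors the point that the user must declare the functional form at load time.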

protein-ligand-benchmark - 0.3.0 update

@David Dotson

  • IA – Had a good meeting with JC and MB last week about how to fix things up. It turns out that there were historical issues in how these proteins were set up. E.g., BACE was set up at pH 7 while the assay was run at pH 4(?). Another one had different sequences in its experiment and model (human vs. rat). MB has done great work on this so far.

    • BACE and PDE10 will take some more work than the other systems; should we just drop these for now, since they will take much longer (long tail of hard systems)?

    • my view is we should proceed with what works, then try to add these hard ones back in 0.4.0

    • JC – I think that going for the simplest thing that runs is a good idea. But we should record why we dropped what we dropped.

    • IA – will open up some issues to encode this, and then add them to the 0.4.0 milestone

    • JW – I think some of the conflict around this is we’re making big changes; for OpenFF what we need is something that works first of all. So yes, I’ve come to the mentality that we should proceed with the minimum of what we can actually use.

      • can also opt to use e.g. -rc versioning, or even make no release at all

  • JC – So you want to make it a pre-release?

  • JW – I think that will be a useful compromise to reach for if people are concerned about the quality of the dataset. I’m also fine with this just being a full release too.

  • DD – I think we shouldn’t kick the can and instead just make a full release. Otherwise we’ll always be short of perfection and we’ll never make a full release.

  • JC – We’re already short of a 1.0 release, so we are already signalling that this isn’t super final and great.

    • DD – Agree. We have room to make a big statement of suitability when we make the 1.0 release, and if that needs more major evolutions then we can make a 2.0 etc

    • JC – Agree

    • JW – Agree

Adding two systems to the free energy benchmark

@John Chodera

  • JC – There are two 150-300 compound targets that we’ve compiled in conjunction with Relay. This was with the goal of having large compound series to simulate optimization pathways. The data seems high-quality and I’d recommend we ingest these. This doesn’t need to happen immediately, but I’d like to know whether these would be of future interest. Would folks be interested in expanding the set with more highly curated data?

    • DD – Do these sets meet the “blue criteria” in the benchmarking paper?

      • JC – Yes. The one thing is that one of the targets has two phosphorylation sites. Also we haven’t run these through spruce. Targets are JAK2 and CDK6.

      • DD – Great, then I think these are suitable for the benchmarking set, but they should be tagged for a release project board after 0.3.0

    • IA – Seeing the difficulty that MB had collecting the assay data, I’d like to check that this is super clear already.

      • JC – We have CHEMBL IDs for all the ligands, which can be used to trace them back to the assays

    • RG – With datasets there was an idea to have them not live in the repo, but rather have them live somewhere else and be accessed at runtime. This could be something to reach consensus on before we commit to adding targets. See Issue #18.

      • DD – This should be alleviated by the removal of git LFS and the use of just simple inputs (pdb) instead of bulky inputs.

      • RG – How will the targets be versioned separately from the entire dataset?

      • DD – I don’t think we have the bandwidth to handle multiple datasets across multiple locations.

      • MH – We could take detailed notes in the changelog.

      • JC – I think the changelog is our best bet right now. In the future we

    • JW – One place where we run into versioning trouble is openff-forcefields. We started with the version of the repo being the version of the FF, but this became awkward when we added bugfixes to old FF lines. So I think having subversions (perhaps on a per-target basis?) could be done, with the repo version being separate.

      • JC – could having a different versioning scheme on a per-target basis help as well (e.g. alphabetical characters for target versions, numerical or semver for repo)?

      • JW – version numbers of the entire repo would solve the problem of reproducibility. If a user said “I ran JAK2 from version X of the repo” then we can track down the exact inputs, even if they have a different target version. So this is safe. BUT it gets complicated if a user wants to see whether their individual target has been updated in a new version. So we’d need to have good changelog discipline so people can easily see if their target of interest has changed.


    • DD – Would people need individual target versions, or would they run the entire repo as a whole?

      • JC – Both are valid uses.

      • IP – For debugging, it will be useful to run just a few systems rather than everything.

    • JC – I think we have decided to

      • a single changelog for the repo, with use of consistent directory names for referencing targets

      • DD – will create issue; add to 0.3.0 milestone

Action items

Decisions

Related pages