2022-09-27 Protein-ligand benchmarks meeting notes

Participants

  • @Diego Nolasco (Deactivated)

  • @David W.H. Swenson

  • @Irfan Alibay

  • @Iván Pulido

  • Jenke Scheen

  • @John Chodera

  • @Mike Henry

  • @Jeffrey Wagner

Discussion topics

Item

Notes

General updates

  • JW – OpenFF is running a series of “follow-up workshops” to our annual meeting. The first one is in ~5 hours and focuses on advanced toolkit usage to simulate a protein with a PTM. The second one is in ~24 hours and covers usage of Interchange. No RSVP is needed; see this page for details.

@David W.H. Swenson

  • Protein serialization update (gufe#45):

  • DS – I’m just watching CI run on the final set of changes. I’ll merge once this passes.

@Mike Henry

  • MH – Working on AmberTools on M1 macs. Have a local M1 mac mini that we may be able to set up for CI runs and share with neighboring projects. Some details of virtualization/build envs are yet to be resolved. Probably going to be container-based.

  • IA – We’ve been getting M1 CI on cirrus - Maybe you could use this as well. It’s “free”, but it seems like they may get overwhelmed.

    • MH – I’ll contact you offline

@Iván Pulido

Nonequilibrium Cycling Protocol (perses#1066) update:

  • Component objects attributes for proteins, small molecules and solvent.

  • IP – Big challenge right now is changing relative FEP object setup in perses2 to be built out of objects instead of files. This works now - I ran a series of calcs with it and that’s merged. What we’re missing now is to plug this in with GUFE objects. So I need to write code to extract OpenMM topologies, positions, and other things from GUFE and insert them into the protocol. Once that’s done, DD can start playing with it, but I anticipate needing to write more tests while he does that.

    • JW – Could you contact DD once this is ready for him to try out?

    • IP – Will do once it’s ready, I have a scheduled meeting today but he’s probably offline

    • JC – Let’s discuss at our Thursday meeting if we need to catch his ear.

  • DS – For protein components, getting topologies and positions should be a one-liner

    • IP – Thanks!

@Mike Henry

  • ProtocolSettings taxonomy update (gufe#37):

  • MH – Really close to wrapping this up. There’s something going on unit-wise, related to having a nested hierarchy with some quantities at lower levels (like json). It sometimes raises an error during (de)serialization and I haven’t found a root cause.

    • JW – OFFTK has an unsafe behavior where it attempts to turn every item in a serialized object into a quantity. This is really dangerous and may be your root cause.

    • MT – I don’t think it’s OFFTK - This is more related to our implementation in pydantic, and I think the right place to discuss is in openff-models. I think we could benefit from separating a strict vs. non-strict option for quantity serialization.
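As an illustration of the strict vs. non-strict distinction MT suggests, here is a minimal pure-Python sketch. It does not use OFFTK, pydantic, or openff-models; the `Quantity` class, the `{"value", "unit"}` tagging scheme, and both function names are hypothetical. Non-strict mode coerces any string that merely looks like "number unit" (the hazard JW describes), while strict mode converts only explicitly tagged entries.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical stand-in for a unit-bearing value."""
    value: float
    unit: str


# Matches strings such as "4.0 femtosecond"; a hypothetical loose encoding.
_LOOSE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s+([A-Za-z_]+)\s*$")


def deserialize_strict(obj):
    """Convert only dicts explicitly tagged as {"value": ..., "unit": ...}."""
    if isinstance(obj, dict):
        if set(obj) == {"value", "unit"}:
            return Quantity(float(obj["value"]), obj["unit"])
        return {key: deserialize_strict(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [deserialize_strict(item) for item in obj]
    return obj


def deserialize_nonstrict(obj):
    """Additionally coerce any string that merely looks like a quantity."""
    if isinstance(obj, dict):
        if set(obj) == {"value", "unit"}:
            return Quantity(float(obj["value"]), obj["unit"])
        return {key: deserialize_nonstrict(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [deserialize_nonstrict(item) for item in obj]
    if isinstance(obj, str):
        match = _LOOSE.match(obj)
        if match:
            return Quantity(float(match.group(1)), match.group(2))
    return obj


settings = {
    "integrator": {"timestep": {"value": 4.0, "unit": "femtosecond"}},
    "label": "2 replicas",  # plain text that happens to look like a quantity
}

strict = deserialize_strict(settings)
loose = deserialize_nonstrict(settings)
```

In this sketch the nested timestep becomes a `Quantity` in both modes, but only the non-strict path silently rewrites the free-text `label` field.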

Melissa Boby/ @Iván Pulido

  • protein-ligand-benchmark - 0.3.0 update

  • IP – Last meeting, we settled on me running one iteration of each edge to ensure that inputs are loadable/runnable. I tried this and commented that there are 3 targets where running fails. IA also saw this issue. There are some where we’re going to run more iterations, but it’s not clear whether I’m doing that or OpenFE is.

  • IA – On the status of files, I’m just finalizing some fixes. The issues we’ve found are related to residue naming/composition. I’m travelling tonight but may be able to fix this on the plane. I’d also like to coordinate with IP to conserve compute. So IP will run with perses and we’ll run with the OpenFE stack. We can chat later in the week on this.

  • JW – Cool, I recall RG saying that the results from this repreparation are comparable to those from the previous state.

    • IA – I’ll have results ready for our board next Tuesday, may be able to show that here as well.

  • (after meeting)

    • IA – Two things are shaping up - Ligand network definition and atom mapping

    • IP – For each target, we have edges.yaml, but this isn’t synced with the reprepared ligand files. For testing I’m filtering out the errors caused by this (just running the stuff that works out of the box).

    • IA – What I’d propose is that we ditch edges.yaml completely… Does perses have a way to make a network?

    • IP – No

    • IA – OpenFE has a stopgap network maker that I can run on this in the meantime. Any preference on star map vs. minimum spanning graph?

      • IP – No strong opinion, people like DHahn might have a better answer.

      • IA – I’ll assume we want to do a minimum spanning graph.

      • IP – IIRC DHahn basically extracted these manually from papers.

    • IA – If I run an edges2.yaml formatted as (ligand a, ligand b, atom mapping), would that work? So like the minimal information for perses to know the atom correspondence.

    • IP – Perses basically needs to know (ligand a, ligand b) and will try to make its own atom mapping. There’s some stochasticity in the atom map that’s created. It’s also possible to provide a user-defined atom map, but it’s a little hacky so I’d need to help with it.

  • IA – Which FF?

    • IP – AMBER with TIP3P standard for protein+solvent, Openff-2.0.0 for small molecules.

    • IA – This is the default in the CLI yaml?

    • IP – Yes

    • IA – HMR? (2fs vs. 4fs timesteps?)

      • IP – 4fs timesteps

    • IA – Could you send me your settings like n_steps_between_exchanges, other equil info…?

    • IP – We’re using the CLI yaml defaults which should cover a lot of this. I tried upping the number of equil iterations to be nonzero but didn’t see a change in results

      • IA – TYK2 is a really easy system, so it might not capture the importance of equilibration.

  • IA – What’s our big picture goal here?

    • IP – JC requested that I run everything from the repo. I anticipated that this would take ~a week. I think it may be better to base the amount of testing on what the hard deadline requires.

    • IA – There’s probably a middle ground here, where we run a full set of simulations from Perses. So I’ll send over the sampling settings I’m using and we can sync up on that. I’d like to think about some way to keep from swamping JC’s cluster. What we probably need to do is two things:

      • Both in OpenFE and Perses, try running one iteration of everything once the dataset is fixed and ensure that it doesn’t crash.

      • Perses should do at least one replica of a majority of the dataset. Looks like the whole dataset is 136 transformations.

        • IP – Could run some edges for all of them, like 10 edges per target.

        • IA – It may be better to run ALL edges for a SUBSET of the targets, so that there’s a full set of information.

        • IP – That may make more sense - We know that things CAN run from the 1-iteration tests. So the next thing to test could be all transformations for a few targets.

        • IA – And actually, we may have the files ready for this - The ones that are having loading errors will probably be changing in the future as MBoby further refines the dataset/prep workflow. So the ones that DO pass the one-iteration tests right now are more likely to be stable.

        • IP – Could you send me which targets those are?

        • IA – Yes, I’ll send you the inputs (targets, ligands, edges, mappings, ffs (will use yaml defaults), sampling settings) that we choose to run with OpenFE software, so that you can also run them with Perses. I’m still getting NaNs from some targets and I have to debug a bit more.

        • IP – I was getting NaNs for a lot of transformations when I started working on Perses. It ended up being related to minimization/equil. I found that the FIRE minimizer from OpenMM solved it

          • IA – We’re using the FIRE minimizer, I think based on the same information as you.

      • IP – Perses can do BOTH noneq and repex. We’re talking about repex here, right?

        • IA – Yes, let’s both do repex. Then this will be a good benchmark once Dotson starts trying noneq.
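The ligand-network options discussed above (star map vs. minimum spanning graph) and the proposed (ligand a, ligand b, atom mapping) edge record can be sketched in plain Python. This is not OpenFE’s stopgap network maker or Perses code; the ligand names, the dissimilarity values, and the record keys are made up for illustration. The spanning graph is built with Kruskal’s algorithm and a small union-find.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def minimum_spanning_edges(distances: Dict[Edge, float]) -> List[Edge]:
    """Kruskal's algorithm: keep the cheapest edges that join separate groups."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    chosen: List[Edge] = []
    for (lig_a, lig_b), _ in sorted(distances.items(), key=lambda kv: kv[1]):
        root_a, root_b = find(lig_a), find(lig_b)
        if root_a != root_b:
            parent[root_a] = root_b
            chosen.append((lig_a, lig_b))
    return chosen


def star_edges(center: str, ligands: List[str]) -> List[Edge]:
    """Star map: every other ligand is connected to one central ligand."""
    return [(center, lig) for lig in ligands if lig != center]


# Hypothetical pairwise dissimilarities (lower = more similar).
distances = {
    ("lig_1", "lig_2"): 0.2,
    ("lig_1", "lig_3"): 0.9,
    ("lig_2", "lig_3"): 0.3,
    ("lig_3", "lig_4"): 0.4,
    ("lig_1", "lig_4"): 0.8,
}

mst = minimum_spanning_edges(distances)

# Edge records in the proposed (ligand a, ligand b, atom mapping) shape.
# The mapping is left as None here, since IP notes Perses can build its own.
records = [{"ligand_a": a, "ligand_b": b, "mapping": None} for a, b in mst]
```

Both layouts connect N ligands with N-1 edges; the star map routes everything through one reference ligand, while the spanning graph favors the most similar pairs.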


Action items

Decisions