2021-12-08 BespokeFit meeting notes

Participants

  • @David Mobley

  • @Daniel Cole

  • @Pavan Behara

  • @Joshua Horton

  • Mary Pitman

  • @Jessica Maat (Deactivated)

  • @Mateusz Bieniek

  • @Jeffrey Wagner

  • @Simon Boothroyd

Discussion topics

Item

Notes


FE calcs

  • DC – JH has been fitting bespoke FFs for an example set of mols for TYK2 from the JACS set. Interested in benchmarking the performance of these parameters for PL FEs using the same methods as Gapsys and Hahn. SB pointed us to a repo that has much of the data from that paper. So MB in my group is looking into using that repo and installing pmx. Looks like there were no problems installing, but there are a few things missing from the repo that prevent rerunning.

  • MB – Right, we’ve installed pmx and we’re interested in using it. However there’s a specific protocol that they’re using - The script to produce hybrids works, but I didn’t find any higher-level script to do the 50ps simulation+switching ligands.

  • DC – Right, we’re interested in getting the script to automate at that level

  • MB – Would be really handy to have a tutorial, since that would be a lot more straightforward than trying to put it together myself.

  • DC – Commented similar thoughts on the Sage manuscript - It would be really helpful to circulate open workflows.

  • DM (in chat)

    • At the high level with PMX there are two main types of things one might want to do I think, which are rather different:

      • Prepare some NEW system for simulation and do free energy calculations

      • Do free energy calculations with an already prepared system like those in PLBenchmarks

    • (1) requires a lot lot lot more… Mary has been doing some in that area.

  • MP –

  • DM – OpenFF was explicitly not supposed to do FE calcs, so DH deliberately avoided trying to do feature development in PMX

  • MP – I used a subset of the pmx package, called “pmx workflow scripts”, which may automate the steps you need. I can share this with you, and I can also share a PowerPoint on it. I have some scripts that can make the folder structure and name files correctly. One big thing is that the ligand and protein have to be input separately.

  • DC – I think we have the file/folder structure handled.

  • MP – I didn’t see anything to automatically calculate the edges. You have to start out with the ligands and proteins….

  • DC – If I recall RFEs correctly, you have to start with the endpoints, and then …

  • MP – So it does need to do some amount of simulation, and then periodically write out/process the trajectory. So it can run on a supercomputer, and process the supercomputer and submit next steps (?) automatically

  • MB – I didn’t know about the PMX workflow scripts, that would simplify things a lot for me.

  • MP – Yunhui Ge in the Mobley lab has a GitHub where he has these scripts, I’ll share that.

  • DM – This is all kinda getting passed around from DHahn, where it originated. One key question is “who is going to do the real work to make this something more official?” Right now these tools seem to get passed around a lot and there isn’t a single responsible owner. The De Groot group doesn’t seem to be in this role/they’re satisfied with the current functionality.

  • DC – This would be good to keep in mind as MB works through it - Basically, figuring out where it can be improved.

  • MB – I can keep track of that - Basically where the automation could be improved or tested.

  • DM – Dangerous to do too much work, since then you become the de facto owner

  • MP – It’s a bit finicky, there are some places where it, for example, requires things of itp files that go beyond the basic GROMACS spec. So there is a bit of cruft.

  • DC – We’re hoping to gradually move toward openmm as a more long-term future.

  • SB – Is the point of these scripts to set

  • MP – Setup as in preparing the molecules, or running the sims?

  • SB – Does it start from parameterized protein+ligand and then just figure out which sims to run, or does it start earlier, with chemical protein+ligand?

  • MP – It should be able to do the whole thing - start from ligand and protein, and go all the way through sims to get FE numbers

  • DM – One tough thing that OFF needs to do is take the same inputs and parameterize components with different FFs than in the standard workflow.

  • DM – Hm, specifically “if you wanted to change the parameters for the ligand in a bunch of calculations that we could set up outside PMX, could we use the PMX stuff?”

  • MP – No, when I tried to do things with a modified itp file, pmx didn’t help me. But in the future we could probably look at the functions that pmx is calling internally and get new things into automation.

  • SB – I’d be up to try implementing the workflow where we have some minimal functionality to take a different FF and pass parameters through pmx. Would like to meet with MP and MB to determine the minimum viable product here before I start work.

  • DM – Let’s consolidate this discussion on the #binding-benchmarks channel.

    • MP – I’ll drop the link to Yunhui Ge’s scripts there as well [1]

      • (JW pasted full post at bottom of this document)

  • SB – MP and MB, let’s schedule a chat in the next few days to define this scope.

    • MP – Sounds good

    • MB – Sounds good, I’ll keep trying to use the workflow.

  • DM – MP is also working on automating ligand placement for starting sims, but that isn’t in scope for this current work.

    • DC – We’re actually interested in that, too.

    • MP – My workflow superimposes new ligands onto known bound ligands and energy-minimizes them to provide a starting point for FE calcs.

    • DC – MB has been looking at building R groups on bound ligands/scaffolds and scoring them.

    • MP – I’d figured that the software already existed, somewhat to my surprise.


Update from SB on DEXP

(SB had left by this point, will postpone full update)

  • JH – We’ve done some fits and are still benchmarking this using AbSolv.

Silicon data selection

  • DM – Matt Clark, who used to do MC-based ABFE calcs, works at Reaxys now. He saw my ask for data on LinkedIn and got in touch. So the question isn’t just “does the data exist?” but also “can we release it under compatible terms?”. I showed some of the data on Slack and got followup questions from Owen and others, which I’ll send back to them.

  • DC – I looked at the data, there were more molecules than I could look at by hand

  • JH – Data look good.

  • DC – As background - Some new drugs have silicon. But many tools have issues processing them - For example RDKit can make confs, but OE can’t. And until we have an entire toolchain that can handle silicon-containing mols, then we can’t support it. But if we start making some progress on this, then we can break into the chicken-and-egg cycle of getting those into other tools.

  • DC – The process that CRingrose showed a few weeks ago to get parameter values from QM should work fine here.

  • DM – Agree, that will work for valence terms, but not vdW. Reaxys datasets could have both hvap and density.

  • JW – So that covers attractive and repulsive interactions?

    • DM – Yes.
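
For context on why hvap and density together constrain vdW terms, here is a minimal sketch of how the two properties are commonly estimated from simulation averages. The function names, the units, and the ideal-gas approximation ΔHvap ≈ ⟨U_gas⟩ − ⟨U_liq⟩/N + RT are assumptions for illustration; none of this was stated in the meeting or taken from a specific OpenFF tool.

```python
# Illustrative only: names and units below are hypothetical choices.
R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)
AVOGADRO = 6.02214076e23         # molecules per mole


def enthalpy_of_vaporization(u_gas, u_liquid_box, n_molecules, temperature):
    """Approximate hvap in kJ/mol.

    u_gas: mean potential energy of one gas-phase molecule (kJ/mol)
    u_liquid_box: mean potential energy of the whole liquid box (kJ/mol)
    n_molecules: number of molecules in the liquid box
    temperature: temperature in K
    Assumes an ideal gas and a negligible liquid molar volume.
    """
    return u_gas - u_liquid_box / n_molecules + R_KJ_PER_MOL_K * temperature


def mass_density(molar_mass, n_molecules, box_volume_nm3):
    """Liquid density in g/mL from molar mass (g/mol) and mean box volume (nm^3)."""
    volume_ml = box_volume_nm3 * 1e-21  # 1 nm^3 = 1e-21 cm^3 = 1e-21 mL
    return molar_mass * n_molecules / (AVOGADRO * volume_ml)
```

The liquid-phase energy in hvap depends mostly on how strongly molecules attract each other, while the density depends on how closely they can pack, which is why fitting to both is a common way to pin down vdW parameters.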

 

  • DM – My student, Danielle, is working through using BespokeFit on hosts. Thanks for your help, Josh.

  • JH – You’re welcome. It’s been helpful to have early users.

 

[1]

 

@Mateusz Bieniek @Simon Boothroyd During our meeting, I said that I would share the link for the pmxworkflow scripts that were developed by David Hahn and that I have been running. The version that I am using is located at https://github.com/MobleyLab/PME-RF-benchmark/tree/main/SI/analysis and is in the compressed folder “packages.zip”. More information on PLBenchmarks, for which this workflow was developed, is located at: https://github.com/openforcefield/protein-ligand-benchmark

^ Here you can find information on the folder architecture expected as both input and output for the pmxworkflow scripts. However, I have attached a pdf that shows the minimum input folder setup and filenames (highlighted) you need in place to run the workflow steps from start to finish when you build a new system. The workflow can be run with the script “run.py”, which should be located at the spot highlighted in green in the pdf.

Sample data to get started with:

The attached run.py script will run the setup for the dataset called “2020-02-07_tyk2” created by David Hahn here: https://drive.google.com/drive/folders/1A8ncO30eaS1vE1czGl3lwaIkjXgMyc4G

This example can run with the first few steps of the workflow commented out in run.py. Importantly, this example data is already in the correct folder and filename format to run the workflow. Beyond ‘run.py’ you will need to include in the same folder, ‘targets.yml’ (attached in this thread) as seen in the attached pdf. You may have more questions when you go to run the full workflow, which I can answer here or in a later meeting.
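
A quick pre-flight check along the following lines may help confirm that a working folder has the two files named above before launching anything. This is only a sketch: the helper name `missing_workflow_files` is hypothetical, it is not part of pmx or the pmxworkflow scripts, and it does not check the rest of the folder layout shown in the pdf.

```python
from pathlib import Path

# The two files the post says must sit in the same folder.
REQUIRED_FILES = ("run.py", "targets.yml")


def missing_workflow_files(workdir, required=REQUIRED_FILES):
    """Return the required file names that are not present in workdir."""
    root = Path(workdir)
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_workflow_files(".")` from the intended run folder returns an empty list when both files are in place, and otherwise the names that still need to be copied in.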

 

Action items

Decisions