2022-09-06 Protein-ligand benchmarks meeting notes

Participants

Goals

DD : update to milestones and deadlines - seeking approval
- https://openforcefield.atlassian.net/wiki/spaces/IN/pages/1447231531/Protein-Ligand+Benchmarks+Automation+via+Folding+Home
RG : clarification on long-term custodianship of fah-alchemy
DD : fah-alchemy - current board status
- fah-alchemy : Phase 1 - MVP
- 3 weeks out from 10/1 deadline for ASAP, biopolymer benchmarking
- gufe#36 nearly ready for merge (David Dotson finishing touches, reviews received from David W.H. Swenson, Richard Gowers)
- gufe#45 is in progress; gives protein tokenization/serialization (Ben Ries, David W.H. Swenson, Richard Gowers)
- gufe#42 almost merge-able after #45 is in place (David W.H. Swenson)
- David Dotson development effort is focused on Executor (service API), Scheduler (compute), and ResultServer (storage) for FAH
  - AlchemicalNetwork storage into ResultServer::neo4j works; working on full-fidelity roundtripping
  - turning attention to compute (dask+distributed cluster) this week
DS, BR : Protein serialization update (gufe#45):
IP : Nonequilibrium Cycling Protocol (perses#1066) update:
MH : ProtocolSettings taxonomy (gufe#37) update:
MB : protein-ligand-benchmark - 0.3.0 update

Discussion topics

Item	Notes
DD : update to milestones and deadlines - seeking approval	https://openforcefield.atlassian.net/wiki/spaces/IN/pages/1447231531/Protein-Ligand+Benchmarks+Automation+via+Folding+Home JC – Approve JW – Approve RG – Approve
RG : clarification on long-term custodianship of `fah-alchemy`	JC – I’d figured that my lab would be the major custodian, with OpenFF also playing a role RG – my understanding is that it would be constrained by what can be run through the Folding@Home cores, so it’s of limited use to us for developing new methods JC – Can implement custom behavior in integrator - some room for custom logic in C++, but very limited and we can’t run untrusted python code. RG – So I’m correct in understanding that if someone comes in with a new method, it would be of limited use to benchmark that immediately with F@H? JC – Right, you could benchmark alternate approaches for ligand preparation and network definition, but there are some changes that wouldn’t be in scope. RG – the thing about mappings is useful to point out; valuable to know which areas it could be useful to us in JC – to be clear, would only consider OpenFE’s role as a tertiary level of support; first/second would be Chodera Lab and OpenFF
DD : `fah-alchemy` - current board status	DD – PLB #52 - I haven’t seen much movement on this in the past week. Do we have any updates? IP – I’d asked MB for further processing - One partial blocker is whether we want ligands to be in separate files or joined. JC – think it may be better to do consolidated ligands in an SDF file JW – my big concern on that is that there really is no spec for what a multi-molecule SDF means; or at least, different tools interpret it very differently. Separate files keep them explicit as different ligands. https://docs.eyesopen.com/toolkits/cpp/oechemtk/oemol.html#dude-where-s-my-sd-data JC – since we’re only planning to use one copy of a ligand per set, not sure that this is a huge risk right now; in the context of this repo, it may make sense to make the statement that these are multi-molecule SDFs, not merely multi-conformer JW – should make it clear on insertion that we don’t support multi-conformer RG – what is the issue with having multiple conformers? JC – the point of us doing things in this context is that we want to provide a single conformer with our best guess as to what should be used; the understanding then is that our SDFs are multi-molecule, but not multi-conformer JW – I’m fine with this approach; if we do want to support multi-conformer later, then we’ll have to re-engineer our approach Decision - we can have multiple structures in a single SDF file; our infrastructure will assume this ALWAYS corresponds to multiple distinct molecules and NEVER more than one conformer of a molecule. JW + JC + RG – Approve. Board status: 3 weeks out from 10/1 deadline for ASAP, biopolymer benchmarking; `gufe`#36 nearly ready for merge (David Dotson finishing touches, reviews received from David W.H. Swenson, Richard Gowers); `gufe`#45 is in progress, gives protein tokenization/serialization (Ben Ries, David W.H. Swenson, Richard Gowers); `gufe`#42 merge-able after #45 is in place (David W.H. Swenson); David Dotson development effort is focused on Executor (service API), Scheduler (compute), and ResultServer (storage) for FAH; `AlchemicalNetwork` storage into `ResultServer::neo4j` works, working on full-fidelity roundtripping; turning attention to compute (dask+distributed cluster) this week
DS, BR : Protein serialization update (`gufe`#45):	BR – GUFE #45 - Need from_pdb and to_pdb via OpenMM, and to_dict and from_dict by Wednesday. The features are in theory implemented and just need testing. So I anticipate that this will be done tomorrow evening in Europe. JC – the protein PDB files in the repo make a great unit test, what are you looking at? BR – Yes, those are good, also want to read in PDB and write out and compare them. JC – You should be able to retrieve PDBs from the repo BR – Already did it. JW – are you able to test for correctness of bond orders, formal charges? BR – Bond orders are translations of formal charges, so if one of them is wrong then I should get a chemistry error. JW – you said bond orders are translations of formal charges; in that case you would expect RDKit to be loud about this. Are you applying bond orders and implying formal charges from this? JW – Now that we have our toolkit release cut, would be happy to work directly with you on a further iteration of this, but I agree that it’s important for you to reach your immediate deadline and can work with you on the next iteration in the future. BR – yes, would take you up on this after our initial implementation is in place DS – Agenda says #42 is “mergeable”, but I don’t think it is - I need to make additions to unit testing and will inform once that’s done.
IP : Nonequilibrium Cycling Protocol (`perses`#1066) update:	IP – I’ve hit a blocker - I can reproduce protein mutation cases - but when I try to reproduce examples for protein-ligand I’m hitting problems. These pathways use different API points so I need to debug the code paths for the latter (getting `NoneType` errors, need to dive in and figure out where). I think it's a perses issue and will discuss this at the perses devs meeting today. Already solved other issues I’ve seen with serialization and breaking changes in the new OpenFF release; now hitting issues that appear to be perses issues, which we can address in our working session. I don’t think I’ll be able to complete this by the end of this week; it will extend into next week. JW – can you clarify; it’s working for protein mutations, but not protein-ligand transformations? IP – there are two different API points, and yes one of them works, but the other currently needs work DD – Anything we can do to help? IP – I’d like to check in with you after the Perses devs call (3 PM US Eastern) in case we need to coordinate further. DD – Will do, we’ll talk offline to coordinate a time.
MH : `ProtocolSettings` taxonomy (`gufe`#37) update:	MH – I’ve spent some more dev cycles on this. Added unit support, but there are a few issues with using our base model on generics that have units. Working with MT on a unified approach to this. Also adding some convenience functions (like ensuring we don’t have negative temps in K). But basically implementing dimensionality checking (so lengths can be in different units, but they all must be measurements of length). DD – It’s useful to consider this an area that we’ll continuously iterate on. JW – are you getting all the Matt time you want? Really want to leverage him here MH – yes I am, but if you want to give him your blessing that may help get some more time JW – excellent, I’ll tell him to spend all the time he wants on this to free him up LN – I also have resources we can contribute from MolSSI on pydantic usage; will reach out after call to see how best to contribute
MB : `protein-ligand-benchmark` - 0.3.0 update	DD – MB unavailable for this meeting, I’ll try to get an update offline and update on GH. IP – MB is moving forward with this - I’ve gotten a few questions. Some issues with Schrodinger’s inconsistent handling of peptides. Started using some tools from Bonvin lab and has had a good experience. Some complexity around xtal waters. JC – From our end, we decided that waters and ions could be in PDB, but more complex small molecules (“spectator ligands”) would need to be explicitly defined outside of a PDB file. IP – In an example I’ve been working on, the spectator ligand is in the PDB. There is probably an issue here with the source of chemical information for those ligands. JC – I unfortunately won’t have time to work with MB this week DD – I’ll reach out to MB and offer assistance.
	JC – OpenFF PIs are going to brainstorm what we need to collect for the NIH renewal application. One big question is how we measure impact/usage of our tools. So if folks have ideas here I’d love to hear. Maybe a callback when a user loads/downloads a file. Current ideas include monitoring how many FE calcs we’ve run. DD – Will be a little tough to measure this for fah-alchemy since job submission won’t be publicly available. But maybe metrics around community engagement with PLBenchmarks. JC – Yeah, but it will be good to measure throughput/jobs processed on F@H. DD – That makes sense. For fah-alchemy we won’t have unique users, but we could measure job throughput.
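
The SDF decision recorded under the `fah-alchemy` board status item (multiple structures in one file always mean distinct molecules, never multiple conformers) could be checked on insertion along the following lines. This is a minimal sketch, not the project's implementation: it uses only the Python standard library, the function names are hypothetical, and it assumes each ligand carries a unique title line, so a repeated title is treated as a possible multi-conformer entry. A real check would more likely compare chemical identity with a cheminformatics toolkit.

```python
from __future__ import annotations

from collections import Counter


def split_sdf_records(sdf_text: str) -> list[str]:
    """Split SDF text into records on the '$$$$' delimiter line."""
    records: list[str] = []
    current: list[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a final delimiter, if it is non-empty.
    if any(part.strip() for part in current):
        records.append("\n".join(current))
    return records


def record_title(record: str) -> str:
    """Return the first line of a record, which holds the molecule title."""
    lines = record.splitlines()
    return lines[0].strip() if lines else ""


def find_repeated_titles(sdf_text: str) -> list[str]:
    """Return titles that occur more than once (ASSUMPTION: titles identify ligands)."""
    counts = Counter(record_title(r) for r in split_sdf_records(sdf_text))
    return sorted(title for title, n in counts.items() if n > 1)
```

An empty result from `find_repeated_titles` is consistent with the multi-molecule assumption; a non-empty one flags entries to inspect by hand before the file is accepted.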

Action items

Decisions