2022-09-20 Protein-ligand benchmarks meeting notes

Participants

  • @David W.H. Swenson

  • Ben Ries

  • @Mike Henry

  • @Iván Pulido

  • Levi Naden

  • Jenke Scheen

  • @Diego Nolasco (Deactivated)

  • @Richard Gowers

  • @John Chodera

Discussion topics

Presenter

Notes

@David W.H. Swenson Ben Ries

  • Protein serialization update (gufe#45):

    • DS – Finishing up tests, just pushed a commit - BR will pick up from my work. All the functionality is there, just adding tests/making things easier to maintain now.

    • BR – This is soon to be merged.

  • (Last meeting) – Decided that the OpenMM PR isn’t a “must” for the next major OpenMM release. Only GUFE functionality is a “must” for initial OpenFE release.

  • JW – Now that OFF Toolkit 0.11 is out with protein-reading functionality, we have effort available to look at further iterations of PDB reading functionality. Spoke with BRies and RGowers - We all want a PDB loader that yields bond orders + formal charges. Tentatively planned to work together on a fork of OMM’s PDBReader class.

  • DS – We haven’t yet split PDBFile into a new repo. If JW has effort available right now and needs somewhere to put ideas, I could start a repo for it. But if not then this would be more natural in a few weeks.

    • RG – We should discuss with PEastman before we go ahead with this, so unless there’s a pressing need I’d like to avoid antagonizing him by not communicating about this choice.

    • JC – I’m also looking forward to this functionality for espaloma.

    • JW – No urgent need to open a new repo right now. OpenFF and OpenMM have a standing meeting once a month - The next will be 4 weeks from now.

@Iván Pulido

  • Nonequilibrium Cycling Protocol (perses#1066) update:

    • Component objects attributes for proteins, small molecules and solvent.

    • IP – Still working on getting the core relative FE object in Perses to be built from objects instead of files. This is turning into a big refactor but I’m nearing the finish line. Question for RG – In GUFE, we have protein/solvent/small molecule components. The protein component is changing from using OpenMM to using RDKit, which is slightly disruptive since we use OMM objects for protein representation.

    • RG – I think this should be possible via OpenFF - OFFMol.from_rdkit(rdmol_protein).to_topology().to_openmm()

    • JC – Can we use an OFFMol in Perses for small molecule components instead? Can export OFFMol (which is available from GUFE) to OpenEye.

    • IP – We need OFFmol for small molecules and OpenMM objects for protein/receptor.

    • DS + BR – We may be internally storing chemistry as RDKit for all components - So we have pathways to get to OpenMM topology and openMM positions for these components.

  • IP – Are the chemical states already usable? Last time I looked, there were PRs that needed to be merged.

    • RG – I think they’re finished (PRs are merged)

    • DS – Still waiting on serialization of other things higher up in the hierarchy - The network and transformation objects aren’t serializable.

    • RG – There’s also a PR about mapping open, but that’s higher up. So system and below should work fine right now.
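A minimal sketch of the conversion path RG suggested above (RDKit protein, then OpenFF Molecule, then OpenMM Topology). The function name is hypothetical, and `allow_undefined_stereo=True` is an assumption made here because protein structures often lack full stereo annotation. Whether this path is what GUFE or Perses ends up using is not settled in these notes.

```python
def rdkit_protein_to_openmm_topology(rdmol_protein):
    """Convert an RDKit molecule to an OpenMM Topology by way of OpenFF.

    Hypothetical helper, not part of GUFE, Perses, or the OpenFF Toolkit.
    """
    # Imported lazily so the sketch can be loaded without the toolkit installed.
    from openff.toolkit.topology import Molecule

    # Assumption: tolerate undefined stereocenters rather than raising.
    offmol = Molecule.from_rdkit(rdmol_protein, allow_undefined_stereo=True)

    # An OpenMM Topology carries no coordinates, so positions would have
    # to be extracted from the molecule's conformers separately.
    return offmol.to_topology().to_openmm()
```

This only covers the topology; the "OpenMM positions" mentioned by DS + BR would need a separate step.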

@Mike Henry

  • ProtocolSettings taxonomy update (gufe#37):

    • MH – Met with MThompson and some OpenFE people last week – Looks like there are some intricacies in serialization re: supporting different python versions. So I’m proceeding with some explicit unit requirements instead of just dimension (eg temperature must be in K in the initial implementation). This should keep type checkers happy with our MVP while keeping the door open to more elegant functionality later. Will open for review soon and then it should be good to merge.

  • MH – Also, I got access to AMBER’s GitLab and I’m working on M1 support for AmberTools. Things are a bit more complex than they seem. So I’ll try to fix it in the source, but if that doesn’t work, I’ll just manually build the conda packages and request conda-forge to upload manually.
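To illustrate the "explicit unit instead of just dimension" idea MH described for ProtocolSettings, here is a small stdlib-only sketch. `Quantity` and `ThermoSettings` are hypothetical stand-ins invented for this example, not the actual gufe#37 classes or any real unit library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical stand-in for a unit-aware value."""
    value: float
    unit: str


@dataclass(frozen=True)
class ThermoSettings:
    """Hypothetical settings object that pins temperature to kelvin."""
    temperature: Quantity

    def __post_init__(self):
        # Require the exact unit, not just the temperature dimension,
        # so "celsius" is rejected even though it is dimensionally valid.
        if self.temperature.unit != "kelvin":
            raise ValueError(
                f"temperature must be in kelvin, got {self.temperature.unit!r}"
            )
        if self.temperature.value <= 0:
            raise ValueError("temperature in kelvin must be positive")


settings = ThermoSettings(temperature=Quantity(298.15, "kelvin"))
```

Pinning the unit makes the stored value unambiguous for serialization, at the cost of asking callers to convert before constructing the settings.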

Melissa Boby?

  • protein-ligand-benchmark - 0.3.0 update

    • IAlibay (earlier on Slack) – I won't be able to join in this week - my update on the PLB side of things is "still working on it", sorry it's been a pretty busy week back but I'll get things done soon.

    • IP – MBoby asked if she was needed, but is quite busy with other things. I recently gave feedback on the PR noting that it’s good that there are some new docked ligands, but there are 4 cases where the number of ligands decreased. MB will check into this.

    • JC – Have you been able to test these systems? It’ll be important to ensure that these simulate before we make the release. It’ll be good to have these checks automated.

    • IP – That’s still a to-do item for me.

    • JW – Would this be blocking to the overall project? As in, would we need to rush out a release if this falls behind?

    • JC – We can run from the branch. The big question is whether the re-preparation decreases the accuracy a lot.

    • IP – In that case, would we need to run these calcs to completion?

    • JC – Something like that, it would be fine to “spot check” without running the whole set. Do you think we are too limited to run through these in a week?

    • IP – We’d previously done 300 jobs in a week - Runtimes were largely system-dependent. Some take a few hours, some take a whole day.

    • JW – This is becoming a bit chicken-and-egg - What if we needed F@H to run enough of the dataset to have confidence in the result? How could we distinguish between regressions due to protocol vs. data?

    • JC – I think running a subset of the data would solve this without needing F@H. We have lots of compute at MSKCC

    • IP – Was seeing slower-than-anticipated performance at MSKCC, I’m talking with sysadmins. Which systems would be representative?

    • JC – I’d think that we should run 1 iteration of each system to test for technical issues, and then 10 edges per system to completion to assess accuracy. Does this sound appropriate?

    • MH – I think these tests make sense, especially the technical “1 iteration” ones.

    • JW – I think it’s OK to have tolerance for botched releases - Coming from the very delayed OFF Toolkit release, aiming for perfection has infinite cost, and I’d prefer to have a broken first release that we can fix than a big delay.

    • JS (chat) – IIRC Chris Bayly had a nice plot of the benchmarking series' system sizes vs computing cost (with orion)

    • MH – Could we run an allocation to run this on orion?

    • JC – Maybe not, since they prepared systems using their own software, and they generally frown on assisting competitors.

    • JS – MCL1 was their fastest system

    • JC – Could we wrap SireMol in a protocol?

    • JS – Like, SOMD? I think that should be possible. JMichiel would probably be happy to be involved.

  • DS – IA reported that OpenFE is already running through these datasets.

    • RG – We also ran through 3 targets from the Hahn set to double check against gromacs runs.

    • JC – (clarification) – DH originally ran these calculations himself on Janssen resources. One level of quality control is to ensure that our new results aren’t radically different.

    • IA chat – we are running the new systems, we are comparing against Hahn's results. we aren't re-running Hahn's initial conditions.

    • JW – Is this comparing results from …

    • RG – This is saying (new structures+new workflow) are yielding similar results to (hahn structures+hahn workflow)

    • JW – That’s great news - I’d love to see more details when you’re comfortable sharing.
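JC's proposed check for the 0.3.0 systems (a 1-iteration technical test per system, then 10 edges per system run to completion) could be planned with something like the sketch below. The function name, the edge representation, and the example data are all hypothetical. Reading "1 iteration of each system" as a single-iteration run on one edge per system is also an assumption.

```python
import random


def build_test_plan(edges_by_system, n_full=10, seed=0):
    """Return, per system, one smoke-test edge and up to n_full edges to run fully."""
    rng = random.Random(seed)  # fixed seed so the chosen subset is reproducible
    plan = {}
    for system in sorted(edges_by_system):
        edges = sorted(edges_by_system[system])
        if not edges:
            continue  # nothing to run for a system with no edges
        full = rng.sample(edges, min(n_full, len(edges)))
        plan[system] = {
            "smoke_edge": edges[0],      # run for a single iteration only
            "full_edges": sorted(full),  # run to completion to assess accuracy
        }
    return plan


# Hypothetical edge lists; real ones would come from the benchmark's network files.
example = {
    "system_a": [(f"lig{i}", f"lig{i + 1}") for i in range(12)],
    "system_b": [("lig0", "lig1"), ("lig1", "lig2"), ("lig0", "lig2")],
}
plan = build_test_plan(example)
```

Systems with fewer than 10 edges simply run all of them, and fixing the seed keeps the subset stable between the old and new preparations so that differences are easier to attribute to the data.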

 

 

Action items

Decisions