2022-05-10 Protein-ligand benchmarks meeting notes

Participants

  • @Diego Nolasco (Deactivated)

  • @David Dotson

  • Antonia Mey

  • @Richard Gowers

  • David Swenson

  • @Irfan Alibay

  • @Iván Pulido

  • @Lorenzo D'Amore

  • @Mike Henry

  • @John Chodera

  • @Jeffrey Wagner

Goals

  • DN : deadline for 0.3.0 release - 2022.06.03 - seeking approval

  • DD : protein-ligand-benchmark - 0.3.0 milestone review, issue assistance, rebalance

  • DD : Settings Taxonomy - working group spin out, develop proposal for this group

  • DD : Results Storage - working group spin out, develop proposal for this group

  • IA : analysis package inventory - establish current state, where effort may be effectively directed

    • openff-arsenic :

    • alchemlyb :

      • pymbar (grouped in here)

    • perses :

  • RG : protocol execution architecture - current state

Discussion topics

Item

Presenter

Notes

Deadline for the 0.3.0 release - 2022-06-03

@Diego Nolasco (Deactivated)

  • DN – Last week, AM said that there’s a manuscript coming that will require a release of the recent changes in PLBenchmarks. Any objection to a 06/03 deadline for this release?

    • RG, JW, JC approve

PLBenchmarks issue review

@David Dotson

  • LD – I’m working on a summary of the repo contents, but I noticed that there are two more targets coming in via a different PR.

    • DD – Go ahead and just make a summary of the current state.

  • JC – Melissa (bobym) opened a PR to fix the bad structures. The PR used Maestro. But we don’t mention how we handle biological units (do we use monomer, dimer, others?). Should we publish best practices on this?

    • AM – It’s probably best to state what we used in the manuscript.

    • JC – Agreed, we could make a table of this. I’ve asked bobym to open an issue on the manuscript repo

    • JC – I’m having bobym add these files in an 04_maestro folder, and then we could have an 05_openeye using spruce. But would it be better to just overwrite the existing GROMACS files?

    • RG – I’d prefer to overwrite the existing GROMACS files.

    • JC – Agree

    • AM – One question is whether we want to provide any FF information with these structures

      • JC – Right now we’re thinking that we don’t want that.

      • JW – Agree, having a single input file would be good. RG mentioned that we could require parseability by some number of tools.

    • AM – Agreed, I’d like to test out bobym’s prepared structures by running them through BioSimSpace. I could have one of my students try running the structures in the PR.

    • DD – JC, where would you expect to see the biological unit annotations in the repo?

      • JC – Probably the yaml file in the 00 folder for each target, and the details would be in the raw PDB data, and the manuscript should have something like a table.

    • IA – Would it make sense to turn that into a CI step?

    • IP – Do we need to do anything special since this is Git LFS?

      • DD – If you’re making commits, you’ll need to have git lfs installed.

      • JC – I told bobym not to use git lfs since the pdbs and sdfs are less than 1MB each. Maybe we could make the pdbs, sdfs, and metadata NOT be in lfs, and the ecosystem-specific files like GROs and ITPs be in LFS.

      • JC – So two options:

        • Delete package-specific files altogether

        • Put package-specific files in a specific subfolder that gets put into LFS.

      • What are the problems of scale? How big is too big?
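The storage split JC describes could be checked automatically, for example as the kind of CI step IA asks about. The following is only a minimal sketch under stated assumptions: the suffix sets, the 1 MB threshold (taken from the "less than 1MB" remark), and the `audit_tree` helper are all hypothetical and are not part of the PLBenchmarks repo.

```python
from pathlib import Path

# Hypothetical policy, following the proposal above: small, engine-agnostic
# inputs stay in plain git; engine-specific files are expected to live in LFS.
PLAIN_GIT_SUFFIXES = {".pdb", ".sdf", ".yaml", ".yml"}
LFS_SUFFIXES = {".gro", ".itp"}
SIZE_LIMIT_BYTES = 1_000_000  # assumed threshold, not an agreed value


def audit_tree(root):
    """Return (path, reason) pairs for files that break the assumed policy."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside the .git folder.
        if not path.is_file() or ".git" in path.parts:
            continue
        suffix = path.suffix.lower()
        too_big = path.stat().st_size > SIZE_LIMIT_BYTES
        if suffix in PLAIN_GIT_SUFFIXES and too_big:
            problems.append((path, "plain-git file exceeds size limit"))
        elif suffix not in PLAIN_GIT_SUFFIXES | LFS_SUFFIXES and too_big:
            problems.append((path, "large file with no storage rule"))
    return problems
```

A CI job could call `audit_tree(".")` and fail when the returned list is non-empty. The open question of "how big is too big" then reduces to choosing `SIZE_LIMIT_BYTES`.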

Settings Taxonomy

@David Dotson

Results Storage

@David Dotson

  • https://docs.google.com/document/d/1NOv01BuTM_Sw_VIc5pipRNOT2g4Tre2dGtZ7-ysy3UI/edit?usp=sharing

  • DD – Could we spin out a working group on this?

    • DS – I could lead that, but OpenFE needs this resolved ASAP.

    • RG – This may be something that’s going to take a long time to resolve, and so this may benefit from decoupling “The F@H results store” and “The OFE results model”

    • JC – Are we disagreeing that much? There are only a few differences about format and hierarchy.

    • DS – I think there are fundamental differences between what users need from OFE and F@H here. A database that’s going to accumulate a lot of edges over time may need a different architecture than a bunch of one-shot runs using OFE.

    • JC – I think these are the same thing

    • JC – What if we look at this as a single network for execution and not worry about storage?

      • don’t focus on archival, etc. right now

    • JW – Would it be possible to basically move forward quickly with a bad design and then announce that there will be an API break in the future?

    • DD – DS, could you run this working group?

    • DS – I think we’re talking about two different timescales, and OpenFE will need to have an implementation built before this working group would even have time to meet.

    • DD – It would be fine if these followed different timescales.

    • RG – I think there’s a philosophical difference about how we imagine these models getting unified, and it’s not clear how these designs would come together. We’re imagining a just-in-time data merge, where the outputs stay in their original format and aren’t converted until that’s needed.

      • similar to providing an interface like mdtraj to existing formats

    • JC – yes I think that may be the big philosophical difference

      • RG – probably going to use alchemlyb as our interface

        • has support for some existing engines, not hard to add more

        • pushing step of producing common data models to as late as possible

        • keeping engine-specific files to meet user needs; allows them to use their own tools to understand what’s going on

        • our analysis can also consume those files

    • JC – So you’d plan on keeping things in the package-specific formats until they need to be converted?

      • RG – Yes, this would let people work directly with formats that they’re familiar with, and still leave the door open for unification later.

    • JC – definitely prefer instead to steer folks to data models

      • RG – agree that’s better design, but I don’t think we’re in a position to force our users into using our data models

    • RG – compromise solution may be that we produce the unified format, but also want to have the option to store the engine-specific outputs as well

    • DD – Something like an “opt-in” approach: the sims run and produce package-specific output files. By default we won’t store those raw output files, but we’d optionally let users set a flag to keep them. We could also make it so that if there’s a failure the raw outputs get stored.

      • RG – That could work, I’d just want the legacy formats to be “first class citizens” in the results store

      • JC – We could even start with that design, where the legacy formats are the default at first, and then as we start adopting a common format we could make those be default.

      • DS – I agree with that strategy. We’d use legacy files as the resources at first, and then transition to a better solution.

      • JC – Right, we’d operate with these outputs being compressed folders.

      • DS – Yes, and we’d just need to figure out how to store the metadata that describes these tarballs.

      • DS – I’ll lead this working group

        • RG – Approved

        • DS – I’ll make a call on Slack for this.

        • DD – I’ll be in this working group
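The opt-in retention idea discussed above (keep raw engine outputs only when a flag is set or the run failed, stored as a compressed folder with metadata alongside) can be sketched as follows. This is an illustration only, not a design the group agreed on: the function names, the JSON sidecar file, and the metadata keys are all assumptions.

```python
import json
import tarfile
from pathlib import Path


def should_keep_raw(keep_raw: bool, failed: bool) -> bool:
    """Raw outputs are kept if the user opted in or the run failed."""
    return keep_raw or failed


def archive_raw_outputs(run_dir, store_dir, metadata, keep_raw=False, failed=False):
    """Tar up run_dir into store_dir and write a JSON sidecar describing it.

    Returns the tarball path, or None when nothing is stored.
    """
    if not should_keep_raw(keep_raw, failed):
        return None
    run_dir, store_dir = Path(run_dir), Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)

    # Engine-specific files are stored untouched inside a compressed folder.
    tar_path = store_dir / f"{run_dir.name}.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(run_dir, arcname=run_dir.name)

    # Hypothetical sidecar: metadata that describes the tarball.
    record = {**metadata, "failed": failed, "archive": tar_path.name}
    (store_dir / f"{run_dir.name}.json").write_text(json.dumps(record, indent=2))
    return tar_path
```

Flipping the default of `keep_raw` to `True` would give the variant JC suggests, where legacy formats are the default at first and the common format takes over later.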

Analysis package inventory

@Irfan Alibay

  • analysis package inventory - establish current state, where effort may be effectively directed

    • openff-arsenic :

    • alchemlyb :

    • perses :

  • JW – on openff-arsenic; OpenFF doesn’t have any lead on this, so would be happy to move this to OpenFE org / total control

    • JC – this package was intended to encapsulate best practices; think that is aligned with OpenFE’s prerogative

      • that said, analysis tools don’t need to live in here; was intended to be the entry-point for users

      • Diffnet MLE is included in here currently; not sure if it’s the best place for it?

    • RG – arsenic is something we’d be kinda interested to adopt. It seems like it’s almost two packages - One for number crunching and another for plotting/visualization best practices. So we may reorganize the package to be a bit lighter weight or

    • JW – we have experience making two conda-forge packages from one repo; differ in their install instructions

      • e.g. plain bagel and everything bagel packaged version

      • MH – there are recent improvements in packaging that avoid pulling down a whole X11 stack for plotting

    • IA – I have a meeting with AM tomorrow about this repo. For example, are all plotting deps needed, or could we refactor some of them (e.g. plotly) out?

      • AM – With plotly you can now click on data points which leads to a much richer visualization.

      • DD – Agree it would be a big loss to remove plotly.

      • JW – If the interface between the “number crunching” and “visualization” parts is complicated then it may be good to keep this repo as a monolith, just so that the interface moves in lockstep and you’re not needing to track version intercompatibility.

  • JC – Plotting in arsenic could use some help

    • AM – I was hoping to contribute on this
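One way to keep a single repo while letting a "plain" install skip the plotting stack is to treat plotly as an optional dependency that is only imported when a plot is requested. The sketch below illustrates that pattern under assumptions: `has_module`, `summarize`, and `plot_summary` are hypothetical names and do not reflect how arsenic is actually organized.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Check whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def summarize(values):
    """Number-crunching part: needs only the standard library."""
    n = len(values)
    return {"n": n, "mean": sum(values) / n}


def plot_summary(values):
    """Plotting part: only works when the optional plotly dependency is present."""
    if not has_module("plotly"):
        raise ImportError("plotting requires the optional 'plotly' dependency")
    import plotly.graph_objects as go  # deferred so the core stays lightweight

    return go.Figure(data=go.Scatter(y=list(values), mode="markers"))
```

Because both parts live in one package, the interface between them moves in lockstep, which addresses JW's concern about tracking version intercompatibility.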

Protocol execution architecture

@Richard Gowers

 

Action items

Decisions