2022-02-15 Protein-ligand benchmarks meeting notes

Participants

 

  • @Diego Nolasco (Deactivated)

  • @Jeffrey Wagner

  • @John Chodera

  • Irfan Alibay

  • @Richard Gowers

  • David Swenson

  • @Jeffry Setiadi

  • @David Dotson

  • @Lorenzo D'Amore

Discussion topics

Item

Presenter

Notes

Project organization

@Jeffrey Wagner

  • Vote and secure approval for:

    • Approvers

    • The process for making binding decisions

    • Discussion + decision venues (you should feel pressure NOT to have discussions in private channels/DMs, and should instead move them to these venues)

  • First three meetings will be on formalizing requirements. Planning to solidify first draft of requirements by the end of meeting 3.

  • F@H interface is a “should” for Rosemary. Free energy benchmarks are a must-have, and the F@H interface will be by far the most straightforward pathway to them.

Notes:

  • Unanimity of primary driver and two approvers accepted

  • Veto authority of primary driver and any approver accepted

  • Decision forum accepted

Technical overview

@David Dotson

  • Overall architecture - High level details

    • DD –

    • JC – An advantage of colocation would be that the Chodera lab could cover hosting expenses for OpenFF (both technical and personnel).

    • JC – I’d like to share as much as possible with OpenFE - Molecule transformations, network mapping, etc.

    • JC – Can make data available on AWS so that people can re-download it at their own discretion

      • DD – Could save data on AWS S3. Different tiers of speeds/schedules, could be accessed by a public URI.

      • DD – Interested in getting output spec information from OpenFE.

    • JC – What’s missing here is info about how inputs are specified.

    • JW – how do we avoid versioning hell on the stuff inside the green box?

      • JC – First thing is to standardize what goes in and out of the green box. Could freeze an environment.

      • JW – how do you do this where there aren’t any automation details for free energy calculations in the green box?

        • JC – what you get is a person doing all the work of generating inputs, which manifests as a slow and expensive feedback loop

          • we want to be able to make iterations fast and cheap; this is scalability

          • being able to get information about failures is key for FF development / drug discovery / advancing the free energy calculation infrastructure

        • JC – would like us to come up with inputs that are a starting point, then we can iterate on them to build out downstream components

  • Current plans/functionality

    • JC – ideally we are in a position where in 3-6 months we have a functional green box that supports one engine

      • has a RESTful API where protein-ligand-benchmarks can be dumped in

      • uses OpenFE components and common data models

    • JW – would like to see free energy benchmarking results, want things to be reproducible and isolatable

      • network planner, system generator, etc.: need to be able to hold everything constant except one component, and to run locally with only the suspected problem inputs

    • JW – do we get to specify which version of GROMACS / OpenMM we want to run on the client machine?

      • JC – to some extent, yes; within a few recent releases can specify core versions

        • could be of some use to OpenFE folks; if you wanted to try a large selection of different network planners, you could use Folding@Home to test them

      • JC – public benchmarking data will be the only way for OpenFE to identify failures; won’t be able to operate on proprietary data

    • JW – how do we want to get visibility into errors, and how do we delete or invalidate previous errors?

      • are we planning on having the ability to delete?

      • JC – We’ve definitely needed to trash datasets/results before.

      • JC – It’s become helpful to be able to pause or delete individual runs/edges.

      • JC – an adaptive supervisory process is possible later on that can add or delete edges in a network graph and propagate those changes into the work server

      • JW – Generally agree. This could become complex so I’d be flexible on this

  • Collect use cases/user stories

    • DD – Let’s populate the issue tracker of the fah-alchemy repo.

      • These stories will be the basis of discussion next week.

    • JS – How black-box/agnostic would this be?

      • DD – We can try to loop as much in as possible, and these user stories will help us identify what business logic goes into the green box and what is externalized to users of the green box

    • JW – I think JS may have multiple individual requests - Things like “make the inputs agnostic enough to take host and guest systems” and “allow people to use pAPRika as a workflow component”

    • JS – Could this run ForceBalance optimizations internally?

      • JW – I don’t think so, but this could be a backend for optimizations if we engineer it right.

      • DD – Understanding your use case, as well as the involvement of forcebalance in it, will allow us to ensure that we are including the information you need in outputs to use it productively.

  • JC – I’d talked to RGowers about shared object models/APIs, and about keeping in touch on that.

    • RGowers – This is being collected on our github

  • Identify functionality gaps

    • Use cases eventually need to be written up in the fah-alchemy GitHub repo issue tracker
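
Illustrative sketch only (not discussed in the meeting): the pause / delete / invalidate behaviour raised under “Current plans/functionality” could be modelled as a network whose edges each carry a status flag. All names here (`BenchmarkNetwork`, `EdgeStatus`, `Edge`, and every method) are hypothetical and do not reflect any existing fah-alchemy or OpenFE API. Invalidating is kept separate from deleting so that suspect results stay inspectable.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeStatus(Enum):
    ACTIVE = "active"            # eligible to be sent out as work
    PAUSED = "paused"            # temporarily withheld, results kept
    INVALIDATED = "invalidated"  # results kept for inspection but not trusted


@dataclass
class Edge:
    """Hypothetical record for one ligand-to-ligand transformation."""
    status: EdgeStatus = EdgeStatus.ACTIVE
    results: list = field(default_factory=list)


class BenchmarkNetwork:
    """Hypothetical network of transformations keyed by (ligand_a, ligand_b)."""

    def __init__(self):
        self.edges = {}

    def add_edge(self, ligand_a, ligand_b):
        # Adding an existing edge is a no-op, so its status and results survive.
        self.edges.setdefault((ligand_a, ligand_b), Edge())

    def pause_edge(self, ligand_a, ligand_b):
        self.edges[(ligand_a, ligand_b)].status = EdgeStatus.PAUSED

    def invalidate_edge(self, ligand_a, ligand_b):
        self.edges[(ligand_a, ligand_b)].status = EdgeStatus.INVALIDATED

    def delete_edge(self, ligand_a, ligand_b):
        # Removes the edge and any results attached to it.
        del self.edges[(ligand_a, ligand_b)]

    def runnable_edges(self):
        # Only ACTIVE edges would be propagated to a work server.
        return [key for key, edge in self.edges.items()
                if edge.status is EdgeStatus.ACTIVE]


network = BenchmarkNetwork()
network.add_edge("ligand_1", "ligand_2")
network.add_edge("ligand_2", "ligand_3")
network.pause_edge("ligand_1", "ligand_2")
print(network.runnable_edges())
```

A supervisory process of the kind JC mentioned could, under these assumptions, call `add_edge` and `delete_edge` and then push `runnable_edges()` to the work server.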

Action items

@David Dotson will make an announcement in #free-energy-benchmarking on user story collection via fah-alchemy

Decisions