2022-03-08 Protein-ligand benchmarks meeting notes

Participants

  • @Richard Gowers

  • @Diego Nolasco (Deactivated)

  • @Jeffrey Wagner

  • Irfan Alibay

  • David Swenson

  • @Chapin Cavender

  • @John Chodera

  • @David Dotson

Goals

  • DN: Propose RG as new approver - seek approval

  • JW: Propose milestone updates - seek approval (proposed changes in red)

  • DD: Review fah-alchemy user stories #7 through #10 - discuss and gather live feedback

  • DD: Propose candidate data model - discuss and gather feedback

Discussion topics

Item

Presenter

Notes

Propose RG as new approver - seek approval

@Diego Nolasco (Deactivated)

  • Approved by JC, JW, DD

Propose milestone updates - seek approval (proposed changes in red)

@Jeffrey Wagner

  • (All changes approved)

Review fah-alchemy user stories #7 through #10 - discuss and gather live feedback

@David Dotson

  • DD – Story #7

    • JC – The basic idea of Fragalysis is that we all benefit from workflows where other folks' resources fund the creation of data. Diamond Light Source's Fragalysis program mines Enamine REAL to overlay bond vectors of docked poses that can be elaborated. One thing Diamond would like to offer is automated free energy calcs that try elaborating these bond vectors to propose good binders. This could support a lot of really good open science. So this workflow would take a list of structures related to X-ray ligands and submit them for calculations.

    • DD – So, this would take a network of ligands, make an RBFE graph from it, and compute predicted affinity changes (see the network sketch after these notes).

  • JW – it looks like for each of these purchasable compounds they already have coordinates?

    • JC – actually they don’t do docking. But other user stories also require pose/coordinate prediction.

    • JW – do other use cases require docking?

    • DD – Do we expect docking to happen “in the green box”?

    • JC – In a lot of the FE workflows the poses will need to be predicted somewhere. Industry folks may have their favorite pose prediction method but we should offer something built in if they don’t have that.

    • IA – an issue you face with the XChem screens is that crystallographic maps can be a bit blurry

      • the ligand may be somewhat artificially placed there and may not bind as tightly as it appears

    • JC – Also there’s frequently degeneracy in the binding modes. So we’ll want some sort of equilibration in this workflow to ensure the pose/protonation/etc are reasonable.

      • might want preparatory step of use cases that allows extended equilibration of prepared system

  • JW – in my experience with constrained docking, the atom mapping is tricky to work out

    • if the user can't pre-specify it, we're inviting issues that are hard to solve

    • RG – this is currently external to our tooling, but something we anticipate needing a solution for

    • JC – we have several strategies to do pose prediction. Agree that we need to have a user override for pose prediction.

    • JW – I know perses can do ligand growing-ish methods

    • JC – We can also use Omega to do constrained conformer generation. We also have a method that uses FF valence terms, and an ML-based model for pose prediction.

    • JW – might it complicate the picture to try to put this in the green box upfront?

    • JC – we do have existing components we can use already, though; we will encounter issues, but this lets us confront them

    • JW – would we ever support two reference ligands?

    • JC – that might be departing a lot from what's needed; we're aiming for a minimal API that carries at least the information a docking strategy needs to consume

    • JW – I’d be interested to see how the API would look for this user story.

    • JC – If we don’t have a component to generate the poses, we’d still need a component to determine whether a pose is reasonable

    • DD – Agree, we could have tests to find things like steric clashes or other obvious geometry issues (see the pose-check sketch after these notes)

    • JW – Agree, I like the idea of initial equilibration/sanity check

    • DD – Is it OK if the system doesn’t do docking upfront?

    • JC – We should determine how many of our use cases are going to require coord gen. Like, we should probably be able to support a H → F mutation of a ligand.

    • DD – I see, so we do probably want a minimum case here, even if it’s just to hold space for a more elaborate implementation

  • User story 8

    • JC – Folks in pharma often need to benchmark tools internally on large datasets. This takes a lot of time and duplicated effort. Also, we are interested in curating datasets from large databases for our own purposes. So we could automate running FE calcs from BindingDB. E.g., we could select for bromodomains or kinases. Then we could use this as a data source for a FF/workflow performance dashboard.

    • DD – This would be good as an “application approach”, where we prove that our stuff actually works out of the box.

    • JC – This would be similar to CELPP and other continuous evaluations to show that tools actually work.

    • IA – How prepared are the entries?

      • do we then need a workflow that adds protons in a sensible way, or is that something we can expect from users?

      • JC – Will require protonation, protein structure cleanup, ligand prep, and network prep.

      • DD – This would be a way to ensure that we don’t overfit to our current benchmark set in system implementation

    • RG – I like the idea of being “ahead of the curve”, since this will help us have a canary for the proteins/systems that we’re not actively looking at

    • DD – Agree, I anticipate that pulling in initial iterations will have large failure rates, but they’ll be instructive

    • IA – might it be useful, then, to store the intermediate structures as they move through the system? That way folks have a fighting chance of understanding them

    • JC – Does it make sense to cache intermediate results for some amount of time?

      • IA – Depends on the storage requirements. Any sort of intermediate storage would be great.

      • DD – Could have a retention policy, like archiving or deleting things once they haven't been accessed for a while, or having an age-based policy

      • JC – Timescales on the scale of weeks would be fine by me.

      • JC – What/who would manage this lifecycle? People? Programs?

        • the retention policy would in practice be implemented at the S3 level where possible, using existing lifecycle functionality (see the lifecycle sketch after these notes)

      • DD – Data store would ultimately be S3, but to the user it would all look like a REST API.

      • JC – Big question is how the data is structured, which parts of the process say what the retention policy is for the data they produce, etc

        • what component handles talking to e.g. S3, do individual components write their own intermediates? These are questions that need to be addressed

    • JW – If our backend tools are continuously being updated, are we concerned that our work won't be reproducible?

      • RG – if you have the versions of every component you used, you have everything you need?

        • JC – if you’re using external resources or if components are stochastic, then the answer is no

          • we should get statistically similar results, not necessarily exactly the same thing throughout the pipeline

      • JW – but even for loop modelling, we can't put a constraint on externally-developed tools, so we're at the mercy of their changes. What should our policy be towards this?

      • JC – the best thing you can do is consider your dependencies carefully

      • JW – take docking as an example; if we have a docking component that releases a new version that changes results fundamentally, how do we react to it? We’d need to rerun a ton of FE calcs, have a human review to determine whether outputs change, etc

        • JC – This will be an aspect of establishing best practices - We can publish results of using different components, and use that to make recommendations about standard practice. Also, what’s the alternative? Externalizing selection of best practices?

        • DD – Agree that we don’t want to externalize this. We do want to detect when a new version of software significantly changes results. This can help inform us of failures.

        • JW – This sounds like a full-time job for five people.

        • IA – There's no solution that doesn't require human intervention. It's effectively impossible to get exactly the same results on different days.

        • DD – If we build the system to be introspectable, then the differences in performance will be actual useful data.

        • JC – This project doesn’t commit us to do A/B testing of everything, but this will ENABLE people to do A/B testing. This would be a big use case for OpenFE to continue offering updated recommendations.

    • JW – would like to adopt a policy of keeping component versions fixed and updating them at discrete intervals, certainly not continuously (basically, component updates should be OPT IN; see the version-pin sketch after these notes)
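
The sketch below illustrates one possible shape for the RBFE network mentioned in story #7: a minimal, hypothetical Python/networkx representation in which nodes are ligand identifiers and edges are proposed relative transformations carrying predicted ΔΔG values. The function names and the star topology around the X-ray reference ligand are assumptions for illustration, not fah-alchemy API.

```python
# Hypothetical sketch: an RBFE network as a networkx graph. Nodes are ligand
# identifiers; edges are proposed relative transformations annotated with a
# predicted ddG (kcal/mol). Names and topology are illustrative only.
import networkx as nx


def build_rbfe_network(candidate_ligands, reference_ligand):
    """Build a simple star-shaped RBFE graph around an X-ray reference ligand."""
    graph = nx.Graph()
    graph.add_node(reference_ligand)
    for ligand in candidate_ligands:
        # Each candidate is connected to the reference by one proposed transformation.
        graph.add_edge(reference_ligand, ligand)
    return graph


def annotate_predicted_ddg(graph, edge_ddg):
    """Attach computed ddG estimates (kcal/mol) to the corresponding edges."""
    for (lig_a, lig_b), ddg in edge_ddg.items():
        graph.edges[lig_a, lig_b]["ddG"] = ddg
    return graph


# Example usage with placeholder ligand names and made-up numbers:
network = build_rbfe_network(["lig-1", "lig-2"], "xray-ref")
annotate_predicted_ddg(network, {("xray-ref", "lig-1"): -0.8, ("xray-ref", "lig-2"): 1.2})
```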
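As a companion to the pose sanity-check discussion, here is a minimal sketch of the kind of geometry test DD mentions: flagging obvious steric clashes from raw coordinates. It assumes plain numpy arrays of Cartesian coordinates and an illustrative 2.0 Å cutoff; `has_steric_clash` is a hypothetical helper, not an existing component.

```python
# Hypothetical sketch: flag obvious steric clashes between a ligand pose and the
# protein by checking minimum interatomic distances. Coordinates are (N, 3)
# numpy arrays in angstroms; the 2.0 A cutoff is illustrative, not a recommendation.
import numpy as np


def has_steric_clash(ligand_coords, protein_coords, cutoff=2.0):
    """Return True if any ligand atom lies within `cutoff` angstroms of a protein atom."""
    # All pairwise ligand-protein distances via broadcasting.
    diffs = ligand_coords[:, None, :] - protein_coords[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    return bool((distances < cutoff).any())
```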
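For the retention-policy point (intermediates kept on the order of weeks, with expiry handled by S3 itself), the sketch below shows how an age-based rule could be expressed as an S3 lifecycle configuration via boto3. The bucket name, key prefix, and 30-day window are placeholders, not decisions.

```python
# Hypothetical sketch: age-based retention for intermediate results, implemented
# with S3's built-in lifecycle functionality. Bucket, prefix, and the 30-day
# window are placeholders chosen to match the "weeks" timescale discussed.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="fah-alchemy-intermediates",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-stale-intermediates",
                "Filter": {"Prefix": "intermediates/"},  # placeholder key prefix
                "Status": "Enabled",
                "Expiration": {"Days": 30},  # delete objects ~30 days after creation
            }
        ]
    },
)
```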
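Finally, JW's opt-in update policy could be enforced mechanically. The sketch below shows a version-pin check, assuming a hand-maintained pin set and the standard importlib.metadata module; the package names and versions are examples only, not recommendations.

```python
# Hypothetical sketch: make component updates opt-in by failing fast when the
# installed versions drift from an explicitly pinned set. Pins shown are examples.
from importlib.metadata import version

PINNED_COMPONENTS = {
    "openmm": "7.7.0",   # example pin, not a recommendation
    "perses": "0.10.1",  # example pin, not a recommendation
}


def check_component_pins(pins=PINNED_COMPONENTS):
    """Raise if any pinned component is installed at a different version."""
    mismatches = {
        name: (version(name), wanted)
        for name, wanted in pins.items()
        if version(name) != wanted
    }
    if mismatches:
        raise RuntimeError(f"Components differ from pinned versions: {mismatches}")
```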

Propose candidate data model - discuss and gather feedback

@David Dotson

Action items

Next meeting will continue with user stories #9 and #10 and the candidate data model proposal.

Decisions