2022-04-19 Protein-ligand benchmarks meeting notes


Participants

  • @Diego Nolasco (Deactivated)

  • @David Dotson

  • @John Chodera

  • @Richard Gowers

  • David Swenson

  • @Irfan Alibay

  • @Iván Pulido

  • @Mike Henry

  • @Jeffrey Wagner

Goals

  • DD : protein-ligand-benchmark issue divide-and-conquer for milestone 0.3.0

  • DD : any value in Transformation distinct from Protocol? Better to remove an abstraction layer?

  • RG : ResultsNetwork - should individual transformations encode their own TransformationResult with some required attributes/fields?

  • JC: Defining API points for free energy submission/execution

  • IA (if time allows - low priority) - towards a defined means of storing historical free energy data

    • Post discussions with D. Hahn, being able to store the existing OpenFF benchmark data somewhere in an easily accessible manner would be very useful for all involved.

Discussion topics

Item

Notes


protein-ligand-benchmark issue divide-and-conquer for milestone 0.3.0

  • DD – IA and others have been identifying items to clean up the protein-ligand-benchmark repo

  • DD – #30 assigned to JC, will delegate to Melissa

  • DD – Three others are delegated to Lorenzo

    • RG – I’m coassigning myself on #24

  • DD – #31 delegated to JC

  • DD – Units migration delegated to IA.

  • DD – LiveComms citation and docs delegated to MH

    • JC – Migrate to github pages?

    • MH – I’m in favor of RTD - already exists for this repo, simpler permissions, etc

  • DD – Jnk1 charges?

    • RG – I think this had funny

    • JC – I think we should toss jnk1 since it doesn’t meet our quality standards - “Not trustworthy” X-ray data

    • MH – I believe we are planning to delete this in #31.

    • RG – May still be worth reaching out to DHahn, if he’s been using this. It may be that we need to update this with what he’s been using (for the record) and then deprecate it.

    • JC – I think it fails multiple inclusion criteria - It doesn’t meet the X-ray standards OR the dynamic range standards. So I’d be in favor of directly deprecating. If there are better jnk1 structures we could use those.

  • RTD on PR builds

    • MH – I can handle this, if you give me permissions

    • JW – I’ve reached out to DHahn to get us permissions

  • (some others, recorded on GH issue tracker)

  • JW will disable LGTM

  • DD – Adding citation.cff

    • IP – I’ll do this

  • IP – Have you discussed separating API from the data itself? This repo has BOTH data AND the API to access it. It’s kinda a pain that you have to download the whole dataset to just get the reading code.

    • DD – We’d discussed that a bit earlier. Like, scikit-learn does a kind of lazy downloading for its example/test data.


  • JW – Who are reviewers and mergers?

    • DD – Anyone can review. Only one review is required

    • RG – DH opened #36 last week to recommend multiple reviewers.

    • MH – It’d be cool to have policies and pull request templates - Like the “add a new target” template could have a checklist with the steps required to get it merged.

    • RG – Should that be a target for the 0.3 milestone?

    • DD – Nah, let’s plan it for the 0.4 milestone
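The lazy-downloading idea raised in the API/data separation discussion above could look roughly like the sketch below: data is fetched only on first use and read from a local cache afterwards. This is a minimal sketch, not the repo's API. The names `fetch_dataset`, `fake_download`, and `target_x` are hypothetical, and the downloader is passed in as a function so that the example runs without network access.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_dataset(name: str, cache_dir: Path,
                  download: Callable[[str, Path], None]) -> Path:
    """Return the cached path for `name`, downloading only if it is missing."""
    target = Path(cache_dir) / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(name, target)  # only reached on a cache miss
    return target


# Hypothetical stand-in for a real download; it records each call.
calls: List[str] = []


def fake_download(name: str, target: Path) -> None:
    calls.append(name)
    target.write_text("placeholder data for " + name)


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_dataset("target_x", Path(tmp), fake_download)
    second = fetch_dataset("target_x", Path(tmp), fake_download)
    same_path = first == second  # second call is served from the cache
```

With this split, installing the reading code would not require downloading the whole dataset up front.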


any value in Transformation distinct from Protocol? Better to remove an abstraction layer?

  • DD – I’m working on GUFE #13 - Trying to get it ready for review. I’d love to have a working session with someone from OpenFE to move this forward.

    • RG – Yeah, I’d be happy to work with you this week.

    • RG and DD will work to push this PR forward later this week

  • DD – Possibly related to the “submission/execution API” point below - I’ve been thinking about how to do this. Up until now, I’ve been thinking about transformations as having their own protocol. But that may not be necessary to do a… Is there a useful distinction between a transformation and a protocol?

    • RG – The transformation object is the mapping object in this case, right?

    • JC – If you’re computing the FE between two species… There are a few considerations. You can do replica exchange, lambda switching, etc. But then there’s ALSO the possibility of different atom mappings. So we should answer what we’re looking at here - Just free energies of two species? What if we switch the protein? What if we switch the solvent? Different atom mappings will also lead to different efficiencies. Then there’s also the question of different force fields.

    • DD – Along those lines, my question is whether a transformation should HAVE a protocol, or if a transformation should BE a protocol?

    • JC – The bigger question is “do you want to separate things that affect the EFFICIENCY of the calculation from things that affect the ACCURACY/RESULT of the calculation?”

    • JC – Is the transformation tied to an environment? (Is the free energy difference between two mols in JUST a protein, or in a protein AND solvent?)

    • DD – A transformation connects two chemical systems. A chemical system connects many components - Could be a protein component, solvent component, and ligand component.

    • JC – If you only want to relate transformations that happen in the same environment, then you… Do you have one transformation per FE, or two?

    • DD – I think we’ll have two. We discussed this yesterday in OpenFE meeting.
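One way to picture the "a transformation HAS a protocol" option from the discussion above is the sketch below. It is illustrative only and is not the GUFE or OpenFE API. The class layout, the `shared_environment` helper, and the component names (`protein_x`, `ligand_a`, `ligand_b`) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChemicalSystem:
    """A named set of components, keyed by role (protein, solvent, ligand)."""
    name: str
    components: Dict[str, str]


@dataclass(frozen=True)
class Protocol:
    """How the calculation is run (hypothetical, reduced to a method name)."""
    method: str


@dataclass(frozen=True)
class Transformation:
    """Connects two chemical systems and carries the protocol used to do so."""
    initial: ChemicalSystem
    final: ChemicalSystem
    protocol: Protocol

    def shared_environment(self) -> Dict[str, str]:
        """Components that are identical at both ends of the transformation."""
        return {
            role: component
            for role, component in self.initial.components.items()
            if self.final.components.get(role) == component
        }


protocol = Protocol(method="replica exchange")

# Two transformations per relative free energy: one in the complex, one in solvent.
complex_leg = Transformation(
    initial=ChemicalSystem(
        "complex_a",
        {"protein": "protein_x", "solvent": "water", "ligand": "ligand_a"},
    ),
    final=ChemicalSystem(
        "complex_b",
        {"protein": "protein_x", "solvent": "water", "ligand": "ligand_b"},
    ),
    protocol=protocol,
)
solvent_leg = Transformation(
    initial=ChemicalSystem("solvated_a", {"solvent": "water", "ligand": "ligand_a"}),
    final=ChemicalSystem("solvated_b", {"solvent": "water", "ligand": "ligand_b"}),
    protocol=protocol,
)
legs = [complex_leg, solvent_leg]
```

Here the environment is implied by the components that do not change, which makes JC's question about relating only transformations in the same environment something the objects can answer.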


ResultsNetwork - should individual transformations encode their own TransformationResult with some required attributes/fields?

  • RG – From a free energy protocol, we have a results object coming off. We were wondering how unified the API for the results object should be. Like, what’s the maximum of things that we’d want from a results API. Like, we’re pretty sure there will be a delta G, but what else?

    • Will there be a container that lets you choose which DDGs you want to calculate?

    • JC – We have some standard statistics we make from the pyMBAR package - Energies, uncertainty, other estimate details. Needs to be in units.

    • RG – Length is a good addition

    • JC – can also be other properties you want, such as statistical fluctuation (how difficult the calculation was, how much additional simulation time needed to reduce variance by 1 unit)

  • DD – We were considering the idea that each protocol… We have the RelativeLigandTransform object.

  • JC – Is this an abstract base class? If we think this is a real, workable API, we should make an abstract base class. Can we import it?

  • DD – Yes, it will probably make its way into GUFE. Right now this has a bunch of settings objects that can be used to define general transformations. What is the minimal/maximal set of things we need to know?

  • JC – Minimum is: free energy difference and units, uncertainty, and statistical difficulty (a measure of how much additional simulation time is needed to reduce uncertainty)

  • IA – What’s a transform here? Can people do enthalpy instead of free energy?

  • JC – Could break this into deltaH and TdeltaS component.

    • standardizing on kT as unit is a good idea

    • also the results object should record how much work (e.g. simulation time) was required to arrive at result to calculate statistical fluctuation

  • IP (chat) – we use this in openmmtools real time analysis output https://github.com/choderalab/openmmtools/blob/870d81ab5a751f666bbbc6a2b3d6a264c36f0e5f/openmmtools/multistate/multistatesampler.py#L1573-L1577 just in case

  • JC – I’m particularly interested in hands-on testing and feedback

  • IA – still need to make sure we have a useful results API

    • DD – If they all have dGs then that should satisfy minimal requirements

    • IA – But if you do a RBFE vs ABFE, those have different meanings.

    • DD – Some ideas we discussed yesterday were convenience methods/functions that consume a network and know which data they can use and how to interpret them to give useful answers.

    • JC – Tricky thing here is that they must have the same atom mapping and the same alchemical transformation. Otherwise they can’t be combined/compared.

    • DD – I see. So that info needs to be accessible through the API

    • JC – Yes, this is why I think we should separate “accuracy factors” (force field, atom mapping, alchemical transformation) from “efficiency factors” (which just affect runtime)
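The minimal result fields discussed above (free energy difference in kT, uncertainty, simulation time spent) and the split between accuracy and efficiency factors can be sketched as follows. This is a rough illustration, not an agreed API: every class, field, and function name here is hypothetical. The `total_time_to_reach` estimate additionally assumes that the uncertainty shrinks as one over the square root of simulation time, which holds only for well-behaved, decorrelated sampling.

```python
from dataclasses import dataclass
from typing import Tuple

# Molar gas constant in kcal/(mol K), used to convert kT to kcal/mol.
R_KCAL_PER_MOL_K = 0.0019872043


@dataclass(frozen=True)
class AccuracyFactors:
    """Settings that change the answer, not just the runtime."""
    force_field: str
    atom_mapping: Tuple[Tuple[int, int], ...]
    alchemical_transformation: str


@dataclass(frozen=True)
class TransformationResult:
    delta_g_kt: float          # free energy difference, in units of kT
    uncertainty_kt: float      # statistical uncertainty, in units of kT
    temperature_k: float       # needed to convert kT to other units
    simulation_time_ns: float  # work spent to reach this uncertainty
    accuracy: AccuracyFactors

    def delta_g_kcal_per_mol(self) -> float:
        return self.delta_g_kt * R_KCAL_PER_MOL_K * self.temperature_k

    def total_time_to_reach(self, target_uncertainty_kt: float) -> float:
        """Estimated total time (ns), assuming uncertainty ~ 1/sqrt(time)."""
        ratio = self.uncertainty_kt / target_uncertainty_kt
        return self.simulation_time_ns * ratio ** 2


def comparable(a: TransformationResult, b: TransformationResult) -> bool:
    """Results can be combined only if their accuracy factors match."""
    return a.accuracy == b.accuracy


# Placeholder values for illustration only, not real benchmark data.
factors = AccuracyFactors("force_field_x", ((0, 0), (1, 1)), "relative")
short_run = TransformationResult(-10.0, 0.4, 298.15, 10.0, factors)
long_run = TransformationResult(-10.2, 0.2, 298.15, 40.0, factors)
other_mapping = TransformationResult(
    -9.0, 0.3, 298.15, 10.0,
    AccuracyFactors("force_field_x", ((0, 1), (1, 0)), "relative"),
)
```

In this layout `short_run` and `long_run` differ only in efficiency-related quantities and remain comparable, while `other_mapping` does not, because its atom mapping differs.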


Defining API points for free energy submission/execution

  • JC – If OpenFE already has APIs laid out here, we should see if we can build to that.

  • (This is referring to the API presented by the FEMethod base class)

  • JC (sarcastically) – Do we like making people’s lives difficult? (We should call this FreeEnergyMethod)


(if time allows - low priority) - towards a defined means of storing historical free energy data

  • IA – Post discussions with D. Hahn, being able to store the existing OpenFF benchmark data somewhere in an easily accessible manner would be very useful for all involved.

  • IA – Like, recording software versions for future interpretability.

    • IA – Also, do we know where the existing calculated deltaG results live?

    • DD – Maybe just on Janssen’s clusters right now.

    • JC – We could use an exchange format for this data.

    • DD – We could get the data from Janssen and store on Amazon S3. But I’m not sure how big this would be. What’s the minimum we’d need? Just dGs? Or more?

      • JC + RG – The object we discussed above would be great. So deltaG, uncertainty, force field, setup info, version of protein-ligand benchmark.

      • IA – I doubt they’re storing raw trajectory data. So the above items would be great.

    • DD will follow up with DHahn to see how much of the above we can get from Janssen, and store it on an S3 bucket.

      • RG – That would be useful

    • IA (chat) – David Hahn did point me to this ~ hour ago, but I don't think everything is there: https://github.com/dfhahn/protein-ligand-benchmark-analysis
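An exchange format for the historical results could be as simple as one JSON record per calculation, carrying the items listed above (deltaG, uncertainty, force field, setup info, protein-ligand-benchmark version) plus software versions. The sketch below is only an assumption about what such a record might contain. The `HistoricalResult` class, its field names, and all values are hypothetical placeholders, not real benchmark data.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class HistoricalResult:
    target: str
    ligand_a: str
    ligand_b: str
    delta_g_kt: float
    uncertainty_kt: float
    force_field: str
    setup_info: Dict[str, str]
    benchmark_version: str
    software_versions: Dict[str, str]


def to_json(record: HistoricalResult) -> str:
    """Serialize one record to a stable, human-readable JSON string."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


def from_json(text: str) -> HistoricalResult:
    """Rebuild a record from the JSON produced by to_json."""
    return HistoricalResult(**json.loads(text))


# Placeholder values for illustration only.
record = HistoricalResult(
    target="target_x",
    ligand_a="ligand_a",
    ligand_b="ligand_b",
    delta_g_kt=-1.5,
    uncertainty_kt=0.3,
    force_field="force_field_x",
    setup_info={"water_model": "water_model_x"},
    benchmark_version="x.y.z",
    software_versions={"engine": "x.y.z"},
)
restored = from_json(to_json(record))
```

Records of this size are small enough that the full set of results could be stored as plain files, with no raw trajectory data required.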

 

  • JW – When will we be able to start determining whether we include/exclude user stories?

    • DD – Once we make the next GUFE release, we’ll have tangible, defined objects that we can reason with to determine whether we can support various user stories.

Action items

@David Dotson will reach out to @David Hahn, get information on deltaG, uncertainty, force field, setup info, version of protein-ligand benchmark in historical benchmarks; also need dataset size, suitability for S3 archival
@John Chodera will review OpenFE current state for current execution API via https://try.openfree.energy
@David Dotson will work with @Richard Gowers to push gufe#13 to review/merge

Decisions
