2022-06-21 Protein-ligand benchmarks meeting notes

Participants

  • @Jeffrey Wagner

  • @David Hahn

  • @John Chodera

  • @Irfan Alibay

  • @David W.H. Swenson

  • @Richard Gowers

  • @Iván Pulido

  • @Mike Henry

  • @Lorenzo D'Amore

  • @Diego Nolasco

Goals

  •  

Discussion topics

Item

Notes

General updates

  • JW – Biopolymer toolkit update: RGowers contributed an improvement that uses RDKit instead of networkx, going from ~40s to ~2s for a test system.

    • JC – Why not use OEChem or RDKit directly to read in the PDB?

      • JW – Both had issues: OEChem failed on non-protein components, and RDKit assigned every bond an order of one.

    • JC – Be a bit cautious about using OpenMM as a perfect validator - There are some additional templates/edge cases where it won’t be fully spec-compliant.

    • RG – We’ve looked at separating out the PDBFile object from the rest of OpenMM.

      • JC – And the mmCIF reader?

      • RG – Yes

      • JW – We’d be interested in helping with this.

    • JW – Once the OE PR is merged, I was planning on beginning the 1-month RC period. I may be able to get that in this week, though I’m busy with annual meeting prep.
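
A note on the bond-order point in the PDB-reading thread above: the CONECT records in a PDB file list which atoms are bonded but have no field for bond order, so a reader working from the file alone has to infer orders from residue templates or geometry. The following is a minimal sketch using only the Python standard library and a made-up three-atom fragment. The function name `read_conect` and the `LIG` fragment are hypothetical illustrations, not part of any toolkit discussed in this meeting.

```python
def read_conect(pdb_text):
    """Collect bonded atom-serial pairs from CONECT records.

    CONECT uses fixed columns: the central atom serial in columns 7-11,
    then bonded atom serials in columns 12-16, 17-21, 22-26 and 27-31.
    There is no column for bond order.
    """
    bonds = set()
    for line in pdb_text.splitlines():
        if not line.startswith("CONECT"):
            continue
        center = int(line[6:11])
        for start in range(11, 31, 5):
            field = line[start:start + 5].strip()
            if field:
                # frozenset makes the pair order-independent (1-2 == 2-1)
                bonds.add(frozenset((center, int(field))))
    return bonds


# Hypothetical fragment: chemically a C=O plus one C-C bond,
# but the file only records who is connected to whom.
EXAMPLE = """\
HETATM    1  C1  LIG A   1       0.000   0.000   0.000  1.00  0.00           C
HETATM    2  O1  LIG A   1       1.220   0.000   0.000  1.00  0.00           O
HETATM    3  C2  LIG A   1      -0.760   1.300   0.000  1.00  0.00           C
CONECT    1    2    3
CONECT    2    1
CONECT    3    1
"""

bonds = read_conect(EXAMPLE)
```

Both bonds come back as plain atom pairs; nothing in the records distinguishes the carbonyl from the single bond.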


Dataset cleaning questions

  • DHahn - Protein-ligand benchmark data questions. Two things:

    • Best practices paper - MShirts had asked about status, can I go ahead with this?

      • Answered below, DH should go ahead with previous manuscript as-is

    • I was seeing the updates on PLBenchmarks repo. Was wondering about general plan for dataset.

      • JC – For the datasets, OpenFE have agreed to adopt the dataset as a benchmark that will continue to be curated for community-wide benchmarking. One of the first tasks was to clean up the repo to be of broader use to the community. One downside was that the initial files were only suitable for use in GROMACS, so we’re looking at generalizing those.

      • One issue was that some ligands failed to dock in constrained docking. So we’re working through those manually.

      • MB noticed that some aspects of protein prep were ambiguous, e.g. capping and use of the biological unit. So I think the current draft of the paper should go out, with some coordination of a 1.1 and eventually 2.0 version of the paper. But you should go ahead with the 1.0 version of the paper.

        • DH – That makes sense. I did wonder about the redocking: changing the poses changes the science of the question, and we can’t really compare to previous calculations once we’ve done that. The initial poses had a lot of expert input in their generation, and it’s dangerous to lose track of that.

        • JC – We’re running benchmarks on the new poses right now, so we’ll see how the new results look. Some of the initial poses weren’t compatible with the protocols that we wanted to use, and for some targets, like JNK1, the underlying xtal data wasn’t suitable for inclusion.

        • DH – Understood, I’m glad to hear that the new sets are being run.

    • DH – Would there be a way to also generalize the previous input files/poses?

      • JC – We’re working on the benchmark of the newly docked poses

      • DH – Maybe we could reformat the original poses as well, though? For example, have Maestro clean up the files and write out the existing poses as more general PDB files?

      • JC – We’d talked about this a bit, but I don’t think that can be done.


Feedback on biopolymer branch

https://docs.google.com/document/d/1ceQWSCQRYRBWORFJ3Jf5g_494t9ojCbYLGOjzi915cc/edit?usp=sharing

 

Action items

Decisions