2022-04-12 Protein-ligand benchmarks meeting notes

Participants

  • @Richard Gowers

  • @Irfan Alibay

  • David Swenson

  • @Jeffrey Wagner

  • @John Chodera

  • @Diego Nolasco (Deactivated)

  • @David Dotson

  • @Mike Henry

Goals

  • DD : current development status, blockers, next priorities

  • DN : backlog population, system of record for coordinating effort on work units (issues, PRs)

  • RG : needs for gufe::ProteinComponent protein representation, identify gaps in using OpenFF Toolkit Molecule

    • Current status of reading in PDB files

    • Storage of “protein” contents within Molecule

  • DD : protein-ligand-benchmark advancement, LiveCOMS revision

Discussion topics

Item

Presenter

Notes

Current development status, blockers, next priorities

@David Dotson

Backlog population, system of record for coordinating effort on work units (issues, PRs)

@Diego Nolasco (Deactivated)

  • DN – I like working with work breakdown structures (WBSes). This will help keep things in scope and prevent late items from creeping onto the backlog. Each item would be labeled “must”, “could”, “should”, or “want/won’t”.

  • DD and DN will meet to make a WBS, send JW an optional invite

Needs for protein representation

@Richard Gowers

Needs for gufe::ProteinComponent protein representation, identify gaps in using OpenFF Toolkit Molecule

  • Current status of reading in PDB files

  • Storage of “protein” contents within Molecule

  • JW

    • 0.10.X loading of proteins

      • Will need to be a MOL/SDF

    • 0.11 changes:

      • Deprecation of TopologyMolecule

      • Add atom hierarchy metadata and iterators

  • RG: most interested in protein representation, not necessarily force field application at this time; do want to know what changes are coming

    • we were looking at hacking OpenMM’s PDBFile to populate an OpenFF Molecule

  • JC: can you describe what you’re trying to accomplish first?

  • RG: would like to ultimately have the OpenFF Molecule as the molecular representation; working on the path to getting there from a PDB

  • JC: what is the state of reading in protein details from a PDB, in particular getting chemical information out of it (nontrivial)?

  • JW: at this time, working to preserve residue and chain info from PDB; need to do three things

    • need to hold information in OpenFF Molecule such that it can be round-tripped to and from PDB

  • JW: zoom out though; current release can’t read protein PDBs at all

    • pre-release version would be better, but can change

  • RG: does the OpenFF Molecule data structure change significantly?

    • JW: no longer supporting virtualsites on the Molecule itself

      • currently don’t have residue information, but will have that

  • JC: having users for this functionality can be useful positive pressure

  • JW: don’t want to cause friction for OpenFE; is this a problem?

  • RG: one way we can use current stable release is to patch in PDB perception code in our ProteinComponent, then use current stable version of toolkit; basically hanging this information on the side until we have the new release?

  • JW: problem is you’ll have a mol where you don’t know bond orders, formal charges

  • JC: wouldn’t it make sense to invest the effort to fix issues in the toolkit itself?

    • RG: sure if it’s pretty close to 90% there, we can go that route

    • MH: agree, if it’s pretty close then good approach

    • JW: agree, and think it would be great to have users banging on it

    • RG: in terms of exporters, just need to be able to export to an OpenMM Topology; status?

      • need to be able to export with an Amber FF for a start

      • JW: yes can do that already, so aiming to be able to do that

    • IA: how likely would OpenFF 0.11.0 be in the next month, 2 months?

      • trying to meet partner timetable

    • JW: aiming for alpha release in next month

      • having this project as a consumer helps justify higher prioritization, faster release

      • aiming by May 12; does this work?

      • MH: can also commit patches where we see issues directly, too, if that’s okay

  • JW (to DN): prioritizing effort on the biopolymer PR, since it is blocking this project

  • JC: is there basic documentation for getting started with the branch?

    • JW: docstrings currently being re-arranged

      • there is a notebook in Interchange
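A minimal sketch of the round-trip requirement JW describes above (keeping residue and chain information so it can be written to and read back from a PDB). It uses only the fixed-column ATOM record layout of the PDB format; the `AtomRecord` class and both helper functions are hypothetical names for illustration and are not part of the OpenFF Toolkit or gufe. Note that nothing in these columns records bond orders, which is the gap JW points out for molecules built from a PDB alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomRecord:
    """Hypothetical container for the hierarchy data one PDB ATOM line carries."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def to_pdb_line(atom: AtomRecord) -> str:
    """Write one fixed-column ATOM record (occupancy and B-factor are placeholders)."""
    # Simplified naming convention: names shorter than 4 characters start in column 14.
    name = atom.name if len(atom.name) == 4 else f" {atom.name:<3}"
    return "".join([
        f"{'ATOM':<6}",                                  # columns 1-6, record name
        f"{atom.serial:>5}",                             # columns 7-11, atom serial
        " ",                                             # column 12, unused
        name,                                            # columns 13-16, atom name
        " ",                                             # column 17, altLoc (blank)
        f"{atom.res_name:>3}",                           # columns 18-20, residue name
        " ",                                             # column 21, unused
        f"{atom.chain_id:1}",                            # column 22, chain ID
        f"{atom.res_seq:>4}",                            # columns 23-26, residue number
        " ",                                             # column 27, insertion code (blank)
        "   ",                                           # columns 28-30, unused
        f"{atom.x:>8.3f}{atom.y:>8.3f}{atom.z:>8.3f}",   # columns 31-54, coordinates
        f"{1.0:>6.2f}{0.0:>6.2f}",                       # columns 55-66, occupancy, B-factor
        " " * 10,                                        # columns 67-76, unused
        f"{atom.element:>2}",                            # columns 77-78, element symbol
    ])


def from_pdb_line(line: str) -> AtomRecord:
    """Read the same fields back by slicing the fixed columns."""
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


# Usage: residue and chain metadata survive a write/read cycle.
ca = AtomRecord(1, "CA", "ALA", "A", 5, 1.234, -5.678, 9.0, "C")
assert from_pdb_line(to_pdb_line(ca)) == ca
```

The sketch handles a single atom record only; connectivity, bond orders, and formal charges would still have to come from somewhere other than these columns.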

protein-ligand-benchmark advancement, LiveCOMS revision

@David Dotson

  • DD – RG, IA, as I understand it, OpenFE is looking to coordinate activities on the protein-ligand-benchmark repo.

  • JC – This repo needs cleanup. I’ve opened issues about the specific aspects that need work. Contributors will be credited in future publications.

  • JW – In theory this repo is owned by the Science team, but we are low on personnel resources there, so it doesn’t explicitly have an owner

  • JC: the LiveCOMS repo (https://github.com/openforcefield/FE-Benchmarks-Best-Practices)

  • RG: OpenFE can take over maintenance (part of remit from board)

    • want to coordinate with existing players where possible

    • DD: intention was to have Lorenzo heavily involved; will work with JW to get LD what he needs to engage, including discussion with GT to get at least 80% of his time

      • JW: also happy to take on direct management

      • DD: yes would be open to that

  • IA (slides): https://docs.google.com/presentation/d/1qvE8qgWIr33BY-jA0X4DEkflLHSvdYPH/edit#slide=id.p1

    • (not mentioned in slides) – Needs reference to paper

    • JC – Could explicitly specify requirements for inclusion (like, metals? and other stuff)

    • IA: taking less of a gromacs-first approach might be better; e.g. just switching to PDB-oriented approach

    • IA: consistency issues

    • RG: in terms of representation of the protein, would it make more sense to serialize an OpenFF Molecule? PDB has a lot of ambiguity

    • IA: yes, that would make sense

    • JW: so idea is that you would load an OpenFF Molecule from a serialized representation

      • however not currently supported across versions of Toolkit

    • RG: wouldn’t be hard to crank out a stable format

    • JC: would be a worthwhile use of engineering effort

    • JW: with the upcoming release we’re gaining residue fields and losing virtualsites

      • frankly, the way we’re doing aromaticity doesn’t make sense, so that may change in a future release

      • one-time upconverters would be possible to read old serialized versions

    • MH: do want to version the output format; that way upconverting is possible

    • IA: I realize we want to move away from PDBs, but ideally going forward we don’t have to change our representation in this repo

      • perhaps starting point should be PDBs then?

    • RG: if you start from PDB, changes to the perception of the PDB would impact results

    • JC: in the short term need “well-prepared” PDBs with explicit protonation, missing residues built in, at least conformant with PDB spec, etc.; too early for standardization of serialized format for Toolkit Molecule given data model is changing so much currently

    • RG – Maybe mmcif?

      • JC – mmcif can optionally record bond order and formal charge.

      • JW – I’m a big fan of mmcif as a standard representation

    • JC – Short-term, I could use OE Spruce to clean up many structures. But some structures still aren’t going to be acceptable because of their resolution and other factors

      • JC – I will fix the things that I can quickly fix this week (re-refine from PDB source and rebuild structures)

    • JC – It’d be great if IA could open issues on the repo for these discrete points:

      • IA – Can do. RG tasked me to remediate this.

      • DD – Review policy?

        • JW – OpenFF won’t add a required reviewer.

        • JC – LiveCOMS-style repos should be decentralized, so any one person’s approving review will be sufficient.

        • (General) – Agree

      • protein-ligand-benchmarks conda package?

        • DD – May be a good idea. Git LFS hosting could get expensive quickly.
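A minimal sketch of the versioned-output idea raised by MH and JW above (tag the serialized form with a version so that one-time upconverters can read older files). Everything here is an assumption for illustration: the JSON layout, the `schema_version` key, the field names, and the function names are hypothetical and do not reflect the Toolkit's actual serialization. The version 1 to 2 step simply mirrors the change JW mentions, gaining residue fields and dropping virtual sites.

```python
import json

CURRENT_VERSION = 2  # hypothetical current schema version


def _v1_to_v2(doc: dict) -> dict:
    """One-time upconverter: drop virtual sites, add empty residue fields."""
    out = dict(doc)
    out.pop("virtual_sites", None)
    out["atoms"] = [
        dict(atom, residue_name=None, residue_number=None, chain_id=None)
        for atom in doc["atoms"]
    ]
    out["schema_version"] = 2
    return out


# Maps an old version number to the function that lifts it one version up.
UPCONVERTERS = {1: _v1_to_v2}


def load_molecule_dict(text: str) -> dict:
    """Parse a serialized molecule and upconvert it to the current version."""
    doc = json.loads(text)
    version = doc.get("schema_version")
    if version is None:
        raise ValueError("serialized molecule has no schema_version")
    if version > CURRENT_VERSION:
        raise ValueError(f"schema_version {version} is newer than this reader")
    # Apply upconverters one step at a time until the document is current.
    while version < CURRENT_VERSION:
        doc = UPCONVERTERS[version](doc)
        version = doc["schema_version"]
    return doc


# Usage: an old-style record is lifted to the current layout on load.
old_text = json.dumps(
    {"schema_version": 1, "atoms": [{"element": "C"}], "virtual_sites": []}
)
upgraded = load_molecule_dict(old_text)
assert upgraded["schema_version"] == CURRENT_VERSION
```

Chaining single-step upconverters means each release only needs to add one function for the previous version, rather than a converter from every historical version.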

Additional topics

 

  • JC: would like to think about using the objects in gufe; from the perses end, could create Protocols that can be tossed in

  • DD – So perhaps Perses could use gufe objects in constructing protocols.

    • JC – It would be necessary to be somewhat flexible in how protocols are defined, so that different atom mappings or other modular replacements for aspects of strategies can be slotted in.
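A minimal sketch of the flexibility JC asks for above, where the atom-mapping step is a swappable component of a protocol. All names here (`Mapper`, `map_by_name`, `map_by_position`, `TransformationPlan`) are hypothetical and are not the gufe or perses API; molecules are reduced to lists of atom names purely to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Mol = List[str]                      # hypothetical stand-in: a molecule as atom names
AtomMapping = Dict[int, int]         # index in molecule A -> index in molecule B
Mapper = Callable[[Mol, Mol], AtomMapping]


def map_by_name(mol_a: Mol, mol_b: Mol) -> AtomMapping:
    """Map atoms that share a name in both molecules."""
    index_b = {name: j for j, name in enumerate(mol_b)}
    return {i: index_b[name] for i, name in enumerate(mol_a) if name in index_b}


def map_by_position(mol_a: Mol, mol_b: Mol) -> AtomMapping:
    """Map atoms by their position in the atom list."""
    return {i: i for i in range(min(len(mol_a), len(mol_b)))}


@dataclass
class TransformationPlan:
    """Hypothetical protocol object whose mapping strategy is slotted in."""
    mapper: Mapper

    def plan(self, mol_a: Mol, mol_b: Mol) -> dict:
        mapping = self.mapper(mol_a, mol_b)
        unmapped = [i for i in range(len(mol_a)) if i not in mapping]
        return {"mapping": mapping, "unmapped_a": unmapped}


# Usage: the same protocol runs with either mapping strategy.
ligand_a = ["C1", "C2", "H1"]
ligand_b = ["C2", "C1", "F1"]
by_name = TransformationPlan(mapper=map_by_name).plan(ligand_a, ligand_b)
by_position = TransformationPlan(mapper=map_by_position).plan(ligand_a, ligand_b)
```

Because the protocol only depends on the callable's signature, a different mapping strategy can be substituted without touching the protocol itself.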

Action items

@David Dotson will schedule time with @Diego Nolasco (Deactivated) and @Jeffrey Wagner (optional) to create work-breakdown structure (WBS) for this project; present to working group for feedback and approval
@John Chodera will address protein-ligand-benchmark#20 via PR

Decisions