2022-04-12 Protein-ligand benchmarks meeting notes

Participants

Goals

DD : current development status, blockers, next priorities
DN : backlog population, system of record for coordinating effort on work units (issues, PRs)
RG : needs for gufe::ProteinComponent protein representation, identify gaps in using OpenFF Toolkit Molecule
- Current status of reading in PDB files
- Storage of “protein” contents within Molecule
DD : protein-ligand-benchmark advancement, LiveCOMS revision

Discussion topics

Item	Presenter	Notes
Current development status, blockers, next priorities	David Dotson
Backlog population, system of record for coordinating effort on work units (issues, PRs)	Diego Nolasco
DN: I like working with work breakdown structures (WBSes). This will help keep things in scope and prevent late items from creeping onto the backlog.
- Each item would be described as “must”, “could”, “should”, or “want/won’t”.
- DD and DN will meet to make a WBS and send JW an optional invite.
Needs for protein representation	Richard Gowers
Needs for `gufe::ProteinComponent` protein representation; identify gaps in using OpenFF Toolkit `Molecule`
- Current status of reading in PDB files
- Storage of “protein” contents within `Molecule`
JW:
- 0.10.X loading of proteins: will need to be a MOL/SDF
- 0.11 changes: deprecation of TopologyMolecule; add atom hierarchy metadata and iterators
RG: most interested in protein representation, not necessarily force field application at this time; do want to know what changes are coming
- were looking at hacking OpenMM’s `PDBFile` to populate an OpenFF `Molecule`
JC: can you describe what you’re trying to accomplish first?
RG: would like to ultimately have the OpenFF `Molecule` as the molecular representation; working on the path to getting there from a PDB
JC: what is the state of reading in protein details from a PDB, in particular getting chemical information out of it (nontrivial)?
JW: at this time, working to preserve residue and chain info from PDB; need to do three things
- need to hold information in OpenFF `Molecule` such that it can be round-tripped to and from PDB
- …
JW: zoom out though; current release can’t read protein PDBs at all
- pre-release version would be better, but can change
RG: does the OpenFF `Molecule` data structure change significantly?
JW: no longer supporting virtualsites on the `Molecule` itself
- currently don’t have residue information, but will have that
JC: having users for this functionality can be useful positive pressure
JW: don’t want to cause friction for OpenFE; is this a problem?
RG: one way we can use current stable release is to patch in PDB perception code in our `ProteinComponent`, then use current stable version of toolkit; basically hanging this information on the side until we have the new release?
JW: problem is you’ll have a mol where you don’t know bond orders, formal charges
JC: wouldn’t it make sense to invest the effort to fix issues in the toolkit itself?
RG: sure if it’s pretty close to 90% there, we can go that route
MH: agree, if it’s pretty close then good approach
JW: agree, and think it would be great to have users banging on it
RG: in terms of exporters, just need to be able to export to an OpenMM `Topology`; status?
- need to be able to export with an Amber FF for a start
JW: yes can do that already, so aiming to be able to do that
IA: how likely would OpenFF 0.11.0 be in the next month, 2 months? trying to meet partner timetable
JW: aiming for alpha release in next month
- having this project as a consumer helps justify higher prioritization, faster release
- aiming by May 12; does this work?
MH: can also commit patches where we see issues directly, too, if that’s okay
JW: DN, prioritizing effort to biopolymer PR, since this is blocking for this project
JC: is there basic documentation for getting started with the branch?
JW: docstrings currently being re-arranged
- there is a notebook in Interchange
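To make the round-trip point above concrete: a minimal sketch, using only the Python standard library, of pulling residue and chain information out of a fixed-column PDB `ATOM` record and writing it back. The names `AtomRecord`, `parse_atom_line`, and `format_atom_line` are hypothetical and are not part of the OpenFF Toolkit, gufe, or OpenMM. The sketch also shows why a plain PDB leaves chemistry underdetermined: the record has no bond-order field, and the formal-charge columns (79-80) are optional and often blank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomRecord:
    """Hypothetical container for the per-atom fields discussed above."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    element: str
    formal_charge: Optional[str]  # raw text of columns 79-80; None when blank


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM line using the fixed PDB column layout."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    line = line.rstrip("\n").ljust(80)  # pad so short lines slice safely
    charge = line[78:80].strip()
    return AtomRecord(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        element=line[76:78].strip(),   # columns 77-78
        formal_charge=charge or None,  # columns 79-80, frequently empty
    )


def format_atom_line(rec: AtomRecord, x: float = 0.0, y: float = 0.0,
                     z: float = 0.0) -> str:
    """Write the record back out as an 80-column ATOM line.

    Simplification: names shorter than four characters are shifted one column
    right, which matches the usual convention for one-letter elements only.
    Occupancy and B-factor are filled with placeholder values.
    """
    name_field = rec.name if len(rec.name) == 4 else f" {rec.name:<3}"
    charge = rec.formal_charge or ""
    return (
        f"ATOM  {rec.serial:>5d} {name_field} {rec.res_name:>3} {rec.chain_id}"
        f"{rec.res_seq:>4d}    {x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        f"          {rec.element:>2}{charge:<2}"
    )
```

Parsing the output of `format_atom_line` returns an equal `AtomRecord`, so residue name, residue number, and chain survive the trip; bond orders never enter the record at all, which is the gap JW describes.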
`protein-ligand-benchmark` advancement, LiveCOMS revision	David Dotson
DD: RG, IA, as I understand it, OpenFE is looking to coordinate activities on the protein-ligand-benchmark repo.
JC: This repo needs cleanup. I’ve opened issues about the specific aspects that need work. Contributors will be credited in future publications.
JW: In theory this repo is owned by the Science team, but we are low on personnel resources there, so doesn’t explicitly have an owner
JC: the LiveCOMS repo (https://github.com/openforcefield/FE-Benchmarks-Best-Practices)
RG: OpenFE can take over maintenance (part of remit from board)
- want to coordinate with existing players where possible
DD: intention was to have Lorenzo heavily involved; will work with JW to get LD what he needs to engage, including discussion with GT to get at least 80% of his time
JW: also happy to take on direct management
DD: yes would be open to that
IA: https://docs.google.com/presentation/d/1qvE8qgWIr33BY-jA0X4DEkflLHSvdYPH/edit#slide=id.p1 (not mentioned in slides) – Needs reference to paper
JC: Could explicitly specify requirements for inclusion (like, metals? and other stuff)
IA: taking less of a gromacs-first approach might be better; e.g. just switching to PDB-oriented approach
IA: consistency issues
RG: in terms of representation of the protein, would it make more sense to serialize an OpenFF `Molecule`? PDB has a lot of ambiguity
IA: yes, that would make sense
JW: so idea is that you would load an OpenFF `Molecule` from a serialized representation
- however not currently supported across versions of Toolkit
RG: wouldn’t be hard to crank out a stable format
JC: would be a worthwhile use of engineering effort
JW: with the upcoming release we’re gaining `residue` fields and losing virtualsites
- frankly way we’re doing aromaticity doesn’t make sense, so that may change in a future release
- one-time upconverters would be possible to read old serialized versions
MH: do want to version the output format; that way upconverting is possible
IA: I realize we want to move away from PDBs, but ideally going forward we don’t have to change our representation in this repo
- perhaps starting point should be PDBs then?
RG: if you start from PDB, changes to the perception of the PDB would impact results
JC: in the short term need “well-prepared” PDBs with explicit protonation, missing residues built in, at least conformant with PDB spec, etc.; too early for standardization of serialized format for Toolkit `Molecule` given data model is changing so much currently
RG: Maybe mmcif?
JC: mmcif can optionally record bond order and formal charge.
JW: I’m a big fan of mmcif as a standard representation
JC: Short-term, I could use OE Spruce to clean up many structures. But some structures still aren’t going to be acceptable because of their resolution and other factors
JC: I will fix the things that I can quickly fix this week (re-refine from PDB source and rebuild structures)
JC: It’d be great if IA could open issues on the repo for these discrete points.
IA: Can do. RG tasked me to remediate this.
DD: Review policy?
JW: OpenFF won’t add a required reviewer.
JC: LiveCOMS style repos should be decentralized. So any one person’s approving review will be sufficient.
(General): Agree
protein-ligand-benchmarks conda package?
DD: May be a good idea. Git LFS hosting could get expensive quickly.
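The idea of versioning the serialized format so that old files can be upconverted once could look roughly like the following. This is a minimal sketch under stated assumptions: the JSON layout and the keys `schema_version`, `atoms`, `virtual_sites`, and `residue` are hypothetical and chosen only to mirror the "gaining residue fields, losing virtualsites" change mentioned above. It is not the Toolkit's actual serialization.

```python
import json
from typing import Any, Callable, Dict

CURRENT_VERSION = 2  # hypothetical schema version written by this sketch

Doc = Dict[str, Any]


def _upconvert_v1_to_v2(doc: Doc) -> Doc:
    """Hypothetical v1 -> v2 step: drop virtual sites, add a residue field."""
    out = {key: value for key, value in doc.items() if key != "virtual_sites"}
    out["atoms"] = [dict(atom, residue=None) for atom in doc["atoms"]]
    out["schema_version"] = 2
    return out


# One entry per historical version; each function moves a document up one step.
UPCONVERTERS: Dict[int, Callable[[Doc], Doc]] = {1: _upconvert_v1_to_v2}


def dumps(doc: Doc) -> str:
    """Serialize, always stamping the current schema version."""
    return json.dumps(dict(doc, schema_version=CURRENT_VERSION), sort_keys=True)


def loads(text: str) -> Doc:
    """Deserialize, applying upconverters until the document is current."""
    doc = json.loads(text)
    version = doc.get("schema_version")
    if not isinstance(version, int) or not 1 <= version <= CURRENT_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    while version < CURRENT_VERSION:
        doc = UPCONVERTERS[version](doc)
        version = doc["schema_version"]
    return doc
```

Because every file carries its version, a reader can refuse files newer than it understands and chain single-step converters for older ones, so each data-model change only needs one new converter.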
Additional topics
JC: would like to think about using the objects in `gufe`, from `perses` end
- could create `Protocol`s in that can be tossed
DD: So perhaps Perses could use gufe objects in constructing protocols.
JC: It would be necessary to be somewhat flexible in how protocols are defined, so that different atom mappings or other modular replacements for aspects of strategies can be slotted in.
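One way to read the "modular replacements" point is as a protocol object that takes its atom-mapping strategy as a parameter. The sketch below is purely illustrative: `RelativeProtocolSketch`, `map_by_name`, and `map_by_position` are hypothetical names and do not reflect the actual gufe or perses APIs. Molecules are reduced to lists of atom names to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# A mapper takes the atom names of two ligands and returns index pairs A -> B.
AtomMapper = Callable[[List[str], List[str]], Dict[int, int]]


def map_by_position(atoms_a: List[str], atoms_b: List[str]) -> Dict[int, int]:
    """Map atom i in A to atom i in B for the overlapping index range."""
    return {i: i for i in range(min(len(atoms_a), len(atoms_b)))}


def map_by_name(atoms_a: List[str], atoms_b: List[str]) -> Dict[int, int]:
    """Map atoms that share a name; assumes names are unique within B."""
    index_b = {name: j for j, name in enumerate(atoms_b)}
    return {i: index_b[name] for i, name in enumerate(atoms_a) if name in index_b}


@dataclass
class RelativeProtocolSketch:
    """Hypothetical protocol whose mapping strategy can be swapped out."""
    mapper: AtomMapper = map_by_name

    def plan(self, atoms_a: List[str], atoms_b: List[str]) -> Dict[str, Any]:
        """Return the mapping plus the atoms of A left without a partner."""
        mapping = self.mapper(atoms_a, atoms_b)
        unmapped = [i for i in range(len(atoms_a)) if i not in mapping]
        return {"mapping": mapping, "unmapped_a": unmapped}
```

Swapping strategies is then a constructor argument, for example `RelativeProtocolSketch(mapper=map_by_position)`, and the rest of the protocol does not need to know which mapper was chosen.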

Action items

David Dotson will schedule time with Diego Nolasco and Jeffrey Wagner (optional) to create a work breakdown structure (WBS) for this project, then present it to the working group for feedback and approval
John Chodera will address protein-ligand-benchmark#20 via PR
