Current development status, blockers, next priorities
@David Dotson
Backlog population, system of record for coordinating effort on work units (issues, PRs)
@Diego Nolasco (Deactivated)
DN – I like working with work breakdown structures (WBSes). This will help keep things in scope and prevent late items from creeping onto the backlog. Each item would be described as “must”, “could”, “should”, or “want/won’t”.
DD and DN will meet to make a WBS and will send JW an optional invite
Needs for protein representation
@Richard Gowers
Needs for gufe::ProteinComponent protein representation, identify gaps in using OpenFF Toolkit Molecule
Current status of reading in PDB files
Storage of “protein” contents within Molecule
JW
0.10.X loading of proteins
Input will need to be a MOL/SDF file
0.11 changes:
Deprecation of TopologyMolecule
Add atom hierarchy metadata and iterators
RG: most interested in protein representation, not necessarily force field application at this time; do want to know what changes are coming
we were looking at hacking OpenMM’s PDBFile to populate an OpenFF Molecule
JC: can you describe what you’re trying to accomplish first?
RG: would like to ultimately have the OpenFF Molecule as the molecular representation; working on the path to getting there from a PDB
JC: what is the state of reading in protein details from a PDB, in particular getting chemical information out of it (nontrivial)?
JW: at this time, working to preserve residue and chain info from PDB; need to do three things
need to hold information in OpenFF Molecule such that it can be round-tripped to and from PDB
…
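(Editor's sketch) To illustrate what round-tripping residue and chain information to and from PDB involves, here is a minimal standalone example using only the fixed-width columns of a PDB ATOM record. `AtomRecord`, `parse_atom_line`, and `format_atom_line` are hypothetical names and are not part of the OpenFF Toolkit; occupancy and temperature factor are written as fixed placeholders, so only the hierarchy fields, coordinates, and element round-trip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomRecord:
    """Per-atom fields from one PDB ATOM line (hypothetical container)."""
    serial: int
    name: str            # raw 4-character atom-name field, kept verbatim
    residue_name: str
    chain_id: str
    residue_number: int
    insertion_code: str  # single character, " " when absent
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Slice the fixed-width columns of a PDB ATOM record."""
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16],
        residue_name=line[17:20].strip(),
        chain_id=line[21],
        residue_number=int(line[22:26]),
        insertion_code=line[26],
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


def format_atom_line(atom: AtomRecord) -> str:
    """Write the record back out in the same column layout."""
    return (
        f"ATOM  {atom.serial:>5} {atom.name:<4} {atom.residue_name:>3} "
        f"{atom.chain_id}{atom.residue_number:>4}{atom.insertion_code}   "
        f"{atom.x:8.3f}{atom.y:8.3f}{atom.z:8.3f}"
        "  1.00  0.00"   # placeholder occupancy and temperature factor
        f"{'':10}{atom.element:>2}"
    )


record = AtomRecord(1, " N  ", "ALA", "A", 1, " ", 11.104, 6.134, -6.504, "N")
line = format_atom_line(record)
# Residue name, chain ID, and residue number survive the round trip.
assert parse_atom_line(line) == record
```

Note that nothing in these columns carries bond orders or formal charges, which is the chemical-perception gap raised in this discussion.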
JW: zoom out though; current release can’t read protein PDBs at all
the pre-release version would be better, but it can still change
RG: does the OpenFF Molecule data structure change significantly?
JW: no longer supporting virtualsites on the Molecule itself
currently don’t have residue information, but will have that
JC: having users for this functionality can be useful positive pressure
JW: don’t want to cause friction for OpenFE; is this a problem?
RG: one way we can use the current stable release of the toolkit is to patch PDB perception code into our ProteinComponent; basically hanging this information on the side until we have the new release?
JW: problem is you’ll have a mol where you don’t know bond orders, formal charges
JC: wouldn’t it make sense to invest the effort to fix issues in the toolkit itself?
RG: sure if it’s pretty close to 90% there, we can go that route
MH: agree, if it’s pretty close then good approach
JW: agree, and think it would be great to have users banging on it
RG: in terms of exporters, just need to be able to export to an OpenMM Topology; status?
need to be able to export with an Amber FF for a start
JW: yes can do that already, so aiming to be able to do that
IA: how likely is an OpenFF 0.11.0 release in the next month or two?
trying to meet partner timetable
JW: aiming for alpha release in next month
having this project as a consumer helps justify higher prioritization, faster release
aiming by May 12; does this work?
MH: can also commit patches where we see issues directly, too, if that’s okay
JW: DN, we are prioritizing effort on the biopolymer PR, since it is blocking for this project
JC: is there basic documentation for getting started with the branch?
RG: OpenFE can take over maintenance (part of remit from board)
want to coordinate with existing players where possible
DD: intention was to have Lorenzo heavily involved; will work with JW to get LD what he needs to engage, including discussion with GT to get at least 80% of his time
(not mentioned in slides) – Needs reference to paper
JC – Could explicitly specify requirements for inclusion (like, metals? and other stuff)
IA: taking less of a GROMACS-first approach might be better; e.g. just switching to a PDB-oriented approach
IA: consistency issues
RG: in terms of representation of the protein, would it make more sense to serialize an OpenFF Molecule? PDB has a lot of ambiguity
IA: yes, that would make sense
JW: so idea is that you would load an OpenFF Molecule from a serialized representation
however not currently supported across versions of Toolkit
RG: wouldn’t be hard to crank out a stable format
JC: would be a worthwhile use of engineering effort
JW: with the upcoming release we’re gaining residue fields and losing virtualsites
frankly, the way we’re doing aromaticity doesn’t make sense, so that may change in a future release
one-time upconverters that read old serialized versions would be possible
MH: do want to version the output format; that way upconverting is possible
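(Editor's sketch) The versioning and upconverter idea discussed here could look roughly like the following. This is a minimal illustration using plain dictionaries and JSON, not the Toolkit's actual serialization; the field names (`format_version`, `virtual_sites`, `metadata`) and the version numbers are assumptions chosen to mirror the changes mentioned (losing virtual sites, gaining residue fields).

```python
import json

CURRENT_VERSION = 2  # hypothetical version number of the newest format


def _upconvert_v1_to_v2(payload: dict) -> dict:
    """v1 -> v2: drop virtual sites, add an (empty) metadata dict per atom."""
    new = dict(payload)
    new.pop("virtual_sites", None)
    new["atoms"] = [
        dict(atom, metadata=atom.get("metadata", {})) for atom in payload["atoms"]
    ]
    new["format_version"] = 2
    return new


# One converter per step; reading an old file chains them until current.
UPCONVERTERS = {1: _upconvert_v1_to_v2}


def dumps(molecule: dict) -> str:
    """Serialize, always stamping the current format version."""
    return json.dumps(dict(molecule, format_version=CURRENT_VERSION), sort_keys=True)


def loads(text: str) -> dict:
    """Deserialize, upconverting older payloads one version at a time."""
    payload = json.loads(text)
    version = payload.get("format_version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"format version {version} is newer than supported")
    while version < CURRENT_VERSION:
        payload = UPCONVERTERS[version](payload)
        version = payload["format_version"]
    return payload


old_text = json.dumps(
    {"format_version": 1, "atoms": [{"element": "N"}], "virtual_sites": [{"type": "x"}]}
)
molecule = loads(old_text)
assert molecule["format_version"] == 2
assert "virtual_sites" not in molecule
```

Stamping the version on write is what makes the one-time upconversion possible: the reader never has to guess which layout it is looking at.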
IA: I realize we want to move away from PDBs, but ideally going forward we don’t have to change our representation in this repo
perhaps starting point should be PDBs then?
RG: if you start from PDB, changes to the perception of the PDB would impact results
JC: in the short term need “well-prepared” PDBs with explicit protonation, missing residues built in, at least conformant with PDB spec, etc.; too early for standardization of serialized format for Toolkit Molecule given data model is changing so much currently
RG – Maybe mmCIF?
JC – mmCIF can optionally record bond order and formal charge.
JW – I’m a big fan of mmCIF as a standard representation
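(Editor's sketch) As an illustration of the bond-order point: in PDBx/mmCIF, bond orders for a chemical component appear in the `_chem_comp_bond` category (item `value_order`, with values such as `SING` and `DOUB`), and per-atom formal charges can be carried in `_atom_site.pdbx_formal_charge`. The toy reader below handles only a single whitespace-separated `loop_` block with no quoting; `read_simple_loop` is a hypothetical helper, and a real mmCIF parser should be used in practice.

```python
CIF_SNIPPET = """\
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
GLY N  CA SING
GLY CA C  SING
GLY C  O  DOUB
"""

# Mapping from mmCIF bond-order codes to integers (subset, for illustration).
BOND_ORDERS = {"SING": 1, "DOUB": 2, "TRIP": 3}


def read_simple_loop(text: str) -> list:
    """Read one unquoted, whitespace-separated loop_ block into row dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "loop_":
        raise ValueError("expected a loop_ block")
    keys = []
    body_start = 1
    for ln in lines[1:]:
        if not ln.startswith("_"):
            break
        keys.append(ln)
        body_start += 1
    rows = []
    for ln in lines[body_start:]:
        values = ln.split()
        if len(values) != len(keys):
            raise ValueError(f"row has {len(values)} values, expected {len(keys)}")
        rows.append(dict(zip(keys, values)))
    return rows


bonds = [
    (
        row["_chem_comp_bond.atom_id_1"],
        row["_chem_comp_bond.atom_id_2"],
        BOND_ORDERS[row["_chem_comp_bond.value_order"]],
    )
    for row in read_simple_loop(CIF_SNIPPET)
]
assert bonds[-1] == ("C", "O", 2)  # the carbonyl is recorded as a double bond
```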
JC – Short-term, I could use OE Spruce to clean up many structures. But some structures still aren’t going to be acceptable because of their resolution and other factors
JC – I will fix the things that I can quickly fix this week (re-refine from PDB source and rebuild structures)
JC – It’d be great if IA could open issues on the repo for these discrete points.
IA – Can do. RG tasked me to remediate this.
DD – Review policy?
JW – OpenFF won’t add a required reviewer.
JC – LiveCOMS style repos should be decentralized. So any one person’s approving review will be sufficient.
(General) – Agree
protein-ligand-benchmarks conda package?
DD – May be a good idea. Git LFS hosting could get expensive quickly.
Additional topics
JC: would like to think about using the objects in gufe; from the Perses end, could create Protocols in that can be tossed
DD – So perhaps Perses could use gufe objects in constructing protocols.
JC – It would be necessary to be somewhat flexible in how protocols are defined, so that different atom mappings or other modular replacements for aspects of strategies can be slotted in.
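(Editor's sketch) One way to picture "slotting in" a different atom mapping is to treat the mapper as a plain callable that the protocol receives at construction time. Everything below is hypothetical and is not the gufe or Perses API: `RelativeProtocol`, `map_by_name`, and `map_by_position` are illustrative names, and atoms are represented only by their name strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# An atom mapper takes two atom-name lists and returns index pairs A -> B.
AtomMapper = Callable[[Sequence[str], Sequence[str]], Dict[int, int]]


def map_by_name(atoms_a: Sequence[str], atoms_b: Sequence[str]) -> Dict[int, int]:
    """Map atoms that share the same name."""
    index_b = {name: j for j, name in enumerate(atoms_b)}
    return {i: index_b[name] for i, name in enumerate(atoms_a) if name in index_b}


def map_by_position(atoms_a: Sequence[str], atoms_b: Sequence[str]) -> Dict[int, int]:
    """Map atoms by their order in the list, ignoring names."""
    return {i: i for i in range(min(len(atoms_a), len(atoms_b)))}


@dataclass
class RelativeProtocol:
    """Toy protocol whose mapping strategy is a swappable component."""
    mapper: AtomMapper

    def plan(self, atoms_a: Sequence[str], atoms_b: Sequence[str]) -> dict:
        mapping = self.mapper(atoms_a, atoms_b)
        return {
            "mapping": mapping,
            "unmapped_in_a": len(atoms_a) - len(mapping),
            "unmapped_in_b": len(atoms_b) - len(mapping),
        }


ligand_a = ["C1", "C2", "H1"]
ligand_b = ["C2", "C1", "O1"]

by_name = RelativeProtocol(mapper=map_by_name).plan(ligand_a, ligand_b)
by_position = RelativeProtocol(mapper=map_by_position).plan(ligand_a, ligand_b)
assert by_name["mapping"] == {0: 1, 1: 0}
assert by_position["mapping"] == {0: 0, 1: 1, 2: 2}
```

The protocol logic is identical in both cases; only the injected mapper differs, which is the kind of modular replacement described above.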
Action items
@David Dotson will schedule time with @Diego Nolasco (Deactivated) and @Jeffrey Wagner (optional) to create work-breakdown structure (WBS) for this project; present to working group for feedback and approval
@John Chodera will address protein-ligand-benchmark#20 via PR