2021-02-17 Biopolymer infrastructure Meeting notes

Date

Feb 17, 2021

Participants

  • @Jeffrey Wagner

  • @Simon Boothroyd

  • @Matt Thompson

  • @David Dotson

Discussion topics

Item

Notes

Recap previous meeting

John API suggestions

  • Change one residue into another (OK if this produces a new topology and atom mapping)

  • Phosphorylate

  • Covalent ligands

  • Take a residue, and get a capped molecule with atom mapping

  • Take a residue, and get a UNcapped molecule with atom mapping → Make sure that these fragments can make it to an OEMol/RDMol with some basic capabilities.

  • Can get further use case info/feedback from Dominic Rufa

 

  • SB – What do we want as the long-term vision for these objects vs. what would Chapin need in the short term? Would we expect major refactor to be done in time for his start?

  • DD – What’s expected molecule/system size?

    • JW – Up to 1000 residues, hundreds of thousands of atoms including water

    • DD – In MDAnalysis, we used topology objects that had slot-based classes. But we had scaling issues when we got near a million atoms. Had a hierarchy of atom/residue/segment mappings, which are now efficient because they’re based on numpy.

  • (General) – Numpy/number-based indexing/bookkeeping is more efficient, but it assumes that the system doesn’t change (i.e., no atoms are added or removed). So a big decision is the extent to which our biopolymers will be mutable. This will determine whether we make a “worldbuilding” guarantee to the users.

    • SB – We almost certainly want to allow mutability

    • JW – Agree

    • SB – My preference is to have an immutable inner data model, and a mutable layer around it that offers safe mutation/copy+replacement of the inner model. (so the rebuilding may or may not happen, but the user won’t know). Anything else makes bookkeeping really hard.

    • MT – This idea doesn’t remove the need to handle lots of complexity. I’m not comfortable with the idea that anything which we don’t explicitly allow isn’t available to users.

      • JW – Agree. Maybe we could offer a “safe” API that makes some guarantees, but also make it possible for people to directly access underlying data and take risks

      • SB – We’ll probably want to start with a small API and take user feedback for new mutators. People will be free to make mutations by making a subclass of our stuff. This could be viewed more as a documentation challenge than an API challenge.

      • JW – We’ll need to have formal cache-invalidation logic in our mutators, so if a new mutator calls three existing mutators, it will be able to declare which caches/atom groups it invalidates by combining those three. And if we make that information readily available to developers, then they can safely implement their own mutators.

      • MT + SB – This may not be a day-0 issue

  • SB – We should make sure we have a plan to improve performance of OE and RDKit biopolymers on SMARTS matching.

    • JW – This is on my roadmap. Looking into caching to_rdkit/openeye outputs, reversing SMARTS search, parallelizing SMARTS search, deduplicating parameters.

    • DD – Timing differences between OE and RDKit?

      • JW – I need to send them an example. Right now my only code uses OFFTK, and it will take some work to disentangle an example from OpenFF toolkit code.

    • SB – See if RDKit can support a single SMARTS-matching call that takes multiple SMARTS patterns

      • Best gains for now are likely to come from how we use RDKit/OE from Python
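One way to picture SB's "immutable inner data model, mutable layer around it" together with JW's declared cache invalidation is the minimal sketch below. Every name in it (`FrozenTopology`, `Biopolymer`, `invalidates`, `combined`, and the cache names `"sequence"` and `"length"`) is hypothetical and chosen only for illustration; none of it is an existing OpenFF API, and a real topology would hold atoms and bonds rather than just residue names.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class FrozenTopology:
    """Immutable inner model (hypothetical): just a tuple of residue names."""
    residues: Tuple[str, ...]


def invalidates(*cache_names):
    """Decorator: record which named caches a mutator invalidates."""
    def mark(func):
        func.invalidates = frozenset(cache_names)
        return func
    return mark


def combined(*mutators):
    """Union of the caches invalidated by several existing mutators."""
    return frozenset().union(*(m.invalidates for m in mutators))


@invalidates("sequence")
def mutate_residue(top, index, new_name):
    # Pure function: builds a new FrozenTopology, never edits the old one.
    residues = list(top.residues)
    residues[index] = new_name
    return replace(top, residues=tuple(residues))


@invalidates("sequence", "length")
def append_residue(top, name):
    return replace(top, residues=top.residues + (name,))


# A composite mutator declares its invalidations by combining its parts.
@invalidates(*combined(mutate_residue, append_residue))
def mutate_and_extend(top, index, new_name, extra):
    return append_residue(mutate_residue(top, index, new_name), extra)


class Biopolymer:
    """Mutable layer: each mutation swaps in a new immutable inner model."""

    def __init__(self, residues):
        self._inner = FrozenTopology(tuple(residues))
        self._caches: Dict[str, object] = {}

    @property
    def residues(self):
        return self._inner.residues

    def cached(self, name, compute):
        # Compute a derived value once and reuse it until it is invalidated.
        if name not in self._caches:
            self._caches[name] = compute(self._inner)
        return self._caches[name]

    def apply(self, mutator, *args):
        # Replace the inner model, then drop only the caches the mutator declared.
        self._inner = mutator(self._inner, *args)
        for name in mutator.invalidates:
            self._caches.pop(name, None)


bp = Biopolymer(["ALA", "SER", "GLY"])
bp.cached("length", lambda top: len(top.residues))
bp.apply(mutate_residue, 1, "THR")  # the "length" cache survives this mutation
```

Because mutators are plain functions that carry their own `invalidates` set, a third-party developer could write a new mutator and get correct cache handling by combining the declarations of the mutators it calls.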
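The caching and deduplication ideas JW lists for SMARTS matching can be sketched without committing to a toolkit. In the sketch below, `match_deduplicated`, `key_fn`, and `match_fn` are hypothetical names; `toy_match` is a stand-in substring matcher on strings, not real SMARTS matching, and in practice `match_fn` would wrap a toolkit call such as RDKit's `Mol.GetSubstructMatches`. It also assumes that molecules sharing a key have identical atom ordering, so that match indices can be reused.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Match = Tuple[int, ...]


def match_deduplicated(
    molecules: Sequence[object],
    patterns: Sequence[str],
    key_fn: Callable[[object], Hashable],
    match_fn: Callable[[object, str], List[Match]],
) -> List[Dict[str, List[Match]]]:
    """Run each (unique molecule, unique pattern) search once and share results."""
    unique_patterns = list(dict.fromkeys(patterns))  # drop repeats, keep order
    cache: Dict[Hashable, Dict[str, List[Match]]] = {}
    results = []
    for mol in molecules:
        key = key_fn(mol)
        if key not in cache:
            cache[key] = {p: match_fn(mol, p) for p in unique_patterns}
        results.append(cache[key])
    return results


calls = []  # records every underlying search, to show how few are needed


def toy_match(mol: str, pattern: str) -> List[Match]:
    """Stand-in matcher: index tuples where `pattern` occurs as a substring."""
    calls.append((mol, pattern))
    n = len(pattern)
    return [
        tuple(range(i, i + n))
        for i in range(len(mol) - n + 1)
        if mol[i:i + n] == pattern
    ]


# 1000 identical "waters" plus one other molecule, with a repeated pattern.
molecules = ["HOH"] * 1000 + ["CCO"]
results = match_deduplicated(molecules, ["OH", "CO", "OH"], key_fn=str, match_fn=toy_match)
```

For a solvated system, where most molecules are copies of water, this kind of keying means the number of underlying searches scales with the number of distinct molecules rather than the total count.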

Next steps

  • Where should tests/infrastructure proposals go?

    • → SB will give JW, MT, DD write access

    • Eventually this will live in OFF Toolkit

  • Who should do it? (should we all work on the same thing, or different things?)

    • JW + SB – It’ll be good to start by populating the repo with tests, all marked xfail, and then to start working on implementations that make increasing numbers of tests pass.

  • Direct users:

    • Chapin coming in May

    • Dominic Rufa (and other perses people) from Chodera Lab; Chris Bayly from OpenEye

  • How to gather feedback?

    • SB – I’m in favor of making a strawman that they can attack, since it’s easier to propose changes to something that exists.

    • MT – Agree, though it’s easy for that to get pushed far into the future and then we never get feedback.

    • JW – I will schedule a meeting with Dominic in a month, and we’ll present whatever we have.
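The "populate the repo with tests, all marked xfail" workflow from JW + SB could look roughly like the sketch below. It uses the standard library's `unittest.expectedFailure` so that it runs without extra dependencies; pytest's `pytest.mark.xfail` marker plays the same role. The function `mutate_residue` and its return values are hypothetical placeholders for whatever the aspirational API turns out to be.

```python
import io
import unittest


def mutate_residue(topology, index, new_name):
    """Hypothetical API entry point, intentionally left unimplemented."""
    raise NotImplementedError


class TestAspirationalBiopolymerAPI(unittest.TestCase):
    # Each test documents desired behavior and is expected to fail for now.

    @unittest.expectedFailure
    def test_mutate_residue_returns_new_topology(self):
        new_top, _ = mutate_residue(["ALA", "SER"], 1, "THR")
        self.assertEqual(new_top, ["ALA", "THR"])

    @unittest.expectedFailure
    def test_mutate_residue_reports_atom_mapping(self):
        _, mapping = mutate_residue(["ALA", "SER"], 1, "THR")
        self.assertIsInstance(mapping, dict)


def run_suite():
    """Run the aspirational tests quietly and return the TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestAspirationalBiopolymerAPI)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_suite()
```

As implementations land, a test that starts passing is reported as an unexpected success, which is the cue to remove its marker and count it as a regular test.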

Action items

@Jeffrey Wagner will contact Dominic Rufa to coordinate a meeting around Mar 17 to show the aspirational biopolymer API, gather his needs, and ask him to distill down Perses’s effective biopolymer API.

Decisions

  1. @Jeffrey Wagner will own biopolymer infrastructure, and @David Dotson may begin assisting in a large capacity in the future.