| |
---|
Recap previous meeting | |
John API suggestions | Change one residue into another (OK if this produces a new topology and atom mapping)
Phosphorylate
Covalent ligands
Take a residue, and get a capped molecule with atom mapping
Take a residue, and get an UNcapped molecule with atom mapping
→ Make sure that these fragments can make it to an OEMol/RDMol with some basic capabilities.
Can get further use case info/feedback from Dominic Rufa
|
| SB – What do we want as the long-term vision for these objects vs. what would Chapin need in the short term? Would we expect a major refactor to be done in time for his start?
DD – What’s the expected molecule/system size?
JW – Up to 1000 residues, hundreds of thousands of atoms including water.
DD – In MDAnalysis, we used topology objects that had slot-based classes, but we had scaling issues when we got near a million atoms. We had a hierarchy of atom/residue/segment mappings, which are now efficient because they’re based on numpy.
(General) – Numpy/number-based indexing/bookkeeping is more efficient, but it assumes that the system doesn’t change (no atoms added/removed). So a big decision is the extent to which our biopolymers will be mutable. This will determine whether we make a “worldbuilding” guarantee to the users.
SB – We almost certainly want to allow mutability.
JW – Agree.
SB – My preference is to have an immutable inner data model, and a mutable layer around it that offers safe mutation/copy+replacement of the inner model (so the rebuilding may or may not happen, but the user won’t know). Anything else makes bookkeeping really hard.
MT – This idea doesn’t remove the need to handle lots of complexity. I’m not comfortable with the idea that anything which we don’t explicitly allow isn’t available to users.
JW – Agree. Maybe we could offer a “safe” API that makes some guarantees, but also make it possible for people to directly access the underlying data and take risks.
SB – We’ll probably want to start with a small API and take user feedback for new mutators. People will be free to make mutations by subclassing our stuff. This could be viewed more as a documentation challenge than an API challenge.
JW – We’ll need to have formal cache-invalidation logic in our mutators, so if a new mutator calls three existing mutators, it will be able to declare which caches/atom groups it invalidates by combining those three. And if we make that information readily available to developers, then they can safely implement their own mutators.
MT + SB – This may not be a day-0 issue.
SB – We should make sure we have a plan to improve performance of OE and RDKit biopolymers on SMARTS matching.
JW – This is on my roadmap. Looking into caching to_rdkit/to_openeye outputs, reversing SMARTS search, parallelizing SMARTS search, deduplicating parameters.
DD – Are there timing differences between OE and RDKit?
SB – See whether RDKit can support a single SMARTS-matching call that takes multiple SMARTS.
|
Next steps | Where should tests/infrastructure proposals go?
→ SB will give JW, MT, DD write access
Eventually this will live in OFF Toolkit
Who should do it? (Should we all work on the same thing, or different things?)
Direct users: How to gather feedback?
SB – I’m in favor of making a strawman that they can attack, since it’s easier to propose changes to something that exists.
MT – Agree, though it’s easy for that to get pushed far into the future and then we never get feedback.
JW – I will schedule a meeting with Dominic in a month, and we’ll present whatever we have.
|
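
The mutability discussion above (SB's immutable inner model behind a mutable wrapper, and JW's mutators that declare which caches they invalidate) could be sketched roughly as follows. This is only an illustration to attack, not a proposed API: `InnerTopology`, `Biopolymer`, the `mutator` decorator, the cache names, and the toy `phosphorylate` are all hypothetical names invented here, and the "chemistry" is a placeholder.

```python
from dataclasses import dataclass, replace
from functools import wraps


@dataclass(frozen=True)
class InnerTopology:
    """Hypothetical immutable inner model: parallel tuples indexed by atom/residue."""
    atom_names: tuple
    atom_residue: tuple    # residue index of each atom
    residue_names: tuple


def mutator(*invalidates, calls=()):
    """Declare which caches a mutator invalidates.

    The declared set is the union of the mutator's own cache names and
    the sets declared by any existing mutators listed in `calls`.
    """
    declared = frozenset(invalidates).union(*(m.invalidates for m in calls))

    def decorate(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            result = func(self, *args, **kwargs)
            for key in declared:          # drop every cache this mutator touches
                self._cache.pop(key, None)
            return result
        wrapper.invalidates = declared    # readable by developers writing new mutators
        return wrapper
    return decorate


class Biopolymer:
    """Hypothetical mutable layer: mutation is copy + replacement of the inner model."""

    def __init__(self, inner):
        self._inner = inner
        self._cache = {}

    @property
    def inner(self):
        return self._inner

    def residue_atoms(self):
        """Cached residue -> atom-index groups."""
        if "residue_atoms" not in self._cache:
            groups = [[] for _ in self._inner.residue_names]
            for atom, res in enumerate(self._inner.atom_residue):
                groups[res].append(atom)
            self._cache["residue_atoms"] = tuple(tuple(g) for g in groups)
        return self._cache["residue_atoms"]

    def sequence(self):
        """Cached residue-name string."""
        if "sequence" not in self._cache:
            self._cache["sequence"] = "-".join(self._inner.residue_names)
        return self._cache["sequence"]

    @mutator("sequence")
    def rename_residue(self, index, new_name):
        names = list(self._inner.residue_names)
        names[index] = new_name
        self._inner = replace(self._inner, residue_names=tuple(names))

    @mutator("residue_atoms")
    def add_atom(self, name, residue_index):
        self._inner = replace(
            self._inner,
            atom_names=self._inner.atom_names + (name,),
            atom_residue=self._inner.atom_residue + (residue_index,),
        )

    @mutator(calls=(rename_residue, add_atom))
    def phosphorylate(self, index):
        # Toy placeholder, not real chemistry: rename the residue and add one atom.
        self.rename_residue(index, "SEP")
        self.add_atom("P", index)
```

A composite mutator such as `phosphorylate` declares nothing itself and inherits its invalidation set from the mutators it calls, so `Biopolymer.phosphorylate.invalidates` contains both `"sequence"` and `"residue_atoms"`. Because every mutation replaces the frozen inner object rather than editing it, a reference to the old `inner` taken before a mutation stays valid and unchanged, which is the "user won't know whether a rebuild happened" property from the discussion.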