2021-04-19 Developers Coffee Meeting notes

Date

Apr 19, 2021

Participants

@Jeffrey Wagner
@David Dotson
@Lily Wang
@Pavan Behara
@Iván Pulido
@Simon Boothroyd
@Matt Thompson

Discussion topics

Item	Notes

Updates

MT
- Moved from using OpenFF topology+MDTraj topology stapled on, to using MDTraj topology for most conversions. Mostly had what I needed, but had to add angles and a few other api points.
- Worked on difficult ParmEd edge-case tests. AMBER format/conversion looks daunting.
- Tried to do some gromacs-on-conda-forge packaging, but didn’t get far
- Met with Eastman/Chodera/Swails/Shirts on how System object will replace ParmEd
- Been using openff-units, will start using openff-utilities
- Codecov blew up. Considering whether it’s healthy to have so many external services.
  - JW – I’d be happy to either remove secrets from our main repo, or remove a lot of the free integrations
  - MT – I’d be happy to move away from having so many free services, and do more of them on our own.
  - SB – I’ve been doing this myself, and 90% of things that LGTM will catch could be caught by local linting. Some of the other 10% is pretty useful. I’m a fan of codecov.
SB
- We’ve been looking for better optimization around FF optimization infra. Part of this will be handled in bespokefit. But other parts will be around maintaining/referencing datasets, where you can combine datasets/results and slice/customize how they mix. I’ve been working on putting this into qcsubmit so that bespokefit can use it.
  - PB – BP mentioned that we can’t make custom datasets on QCA. Will this enable custom dataset creation?
  - SB – I’ve done this more on the QM side than the phys prop side, using a special REST API. It would be a lot more work to get this supported in QCA.
  - PB – For recordkeeping, is there a way to upload a results JSON for particular entries?
  - SB – I kinda think we should have a GitHub repo for each study that we do, where we can put up custom datasets in a recorded way.
  - DD – Is it currently the case that QCA collections don’t meet the need for fitting?
  - SB – Yes. There’s the mutability issue, where more results can be completed over time and datasets can be extended. Also, there’s post-download filtering (e.g. for conformers that are good with sterics/electrostatics). So, the best thing to do seems to be to keep track of the individual molecule record ids.
  - DD – This makes sense to me.
  - SB – So, this is why a change in record ids would be bad. Also, we want to support ways to access records from different QCA servers.
  - SB – I’m working with JH on an aspirational API for this flexible datasets/record pulldown functionality
  - DD – I think the new QCPortal client should be able to support these; hopefully caching will make the performance a bit better.
- Worked on fragmenter refactor, thanks LW for review. I think the results are going to be comparable to the old version, but not exactly the same. There will still be a difference between OE and AT because of the use of restraints in minimization.
- Encountered an issue where YANK’s mdtraj interface wouldn’t support atom slicing in an HFE calculation, which led to a crash. To avoid this, use mdtraj <=1.9.4.
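The record-id bookkeeping SB describes above (pinning individual record ids rather than relying on a mutable dataset, combining results from more than one server, and filtering after download) could look roughly like the sketch below. All class and method names are hypothetical and chosen for illustration only; this is not the qcsubmit or QCPortal API, and the server addresses are placeholders.

```python
import json
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical type: a record is identified by (server address, record id).
RecordKey = Tuple[str, str]


@dataclass(frozen=True)
class ResultCollection:
    """A fixed set of record references, independent of any mutable dataset."""

    records: FrozenSet[RecordKey]

    @classmethod
    def from_ids(cls, server: str, record_ids: Iterable) -> "ResultCollection":
        return cls(frozenset((server, str(rid)) for rid in record_ids))

    def union(self, other: "ResultCollection") -> "ResultCollection":
        # Combining collections never alters either input.
        return ResultCollection(self.records | other.records)

    def filter(self, keep: Callable[[RecordKey], bool]) -> "ResultCollection":
        # Post-download filtering produces a new, smaller collection.
        return ResultCollection(frozenset(k for k in self.records if keep(k)))

    def to_json(self) -> str:
        # Sorted so the file is stable under version control.
        return json.dumps(sorted(self.records))


# Placeholder servers and ids, for illustration only.
a = ResultCollection.from_ids("https://server-a.example", [1, 2, 3])
b = ResultCollection.from_ids("https://server-b.example", [3, 4])
combined = a.union(b)
only_a = combined.filter(lambda key: key[0] == "https://server-a.example")
```

Because the collection is only a list of references, the JSON from `to_json()` is small enough to commit to a per-study repository of the kind SB suggests.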
IP
- Working on aspirational API for topology. Working on functionality like formal charge/bond order perception from protein PDB, residue name/number assignment from SDF.
- Meeting today with MoSDeF folks to gather topology requirements.
- Meeting tomorrow with Perses team to gather requirements for topology refactor.
- Did some small PRs – developer envs and other docs.
- Studied MDTraj internals to understand what people will expect from biopolymer topology.
- MT – A word of caution – It’ll be good to satisfy all potential users, but this may be too wide of a net. I wouldn’t prioritize materials science stuff as much as biopolymer stuff.
- IP – One of the things we’ve been focusing on is giving the users a way to extend the API to whatever they need, so there are places where we expose plugin interfaces for users' own needs. For example, topology hierarchy should be customizable.
- SB – Would this be based on inheritance or composition?
- IP – Mostly composition, but still working on the details
- SB – One heads up, pydantic doesn’t like inheritance. So this will require careful planning
  - DD – One change we’re making in QCPortal is to make the users interact with a friendly python object, which internally stores its state in a pydantic object.
- JW – We’re trying to design the topology to not make a lot of assumptions about the presence/meaning of resname/type/num/id/index, and let users define which hierarchy they expect for different parts of their topology
  - DD – A lot of this seems to be drawing on MDTraj conventions, will it work well with MDAnalysis?
  - IP – I actually prefer MDAnalysis, and I like that the atomgroups and residuegroups are more generic. Hesitant about making it a dependency
  - DD – I wouldn’t advocate making it a dependency, but it would be worth researching their object model and seeing what features of it might be integrated.
  - LW – We did a small survey of selection languages in VMD, pymol, MDA, MDT (focusing mostly on resid and insertion code behaviour)
- LW – For the bond order perception problem, Cedric was a GSoC student who used MDA+RDKit to do bond order+formal charge perception from PDB – Got 99%+ accuracy on chembl.
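The composition-based, user-defined hierarchy that IP and JW describe above could be sketched as follows. This is a minimal illustration under stated assumptions, not the planned OpenFF topology API: `Atom`, `HierarchyScheme`, `Topology`, and the metadata keys are all hypothetical names, and plain dataclasses are used instead of pydantic to keep the example dependency-free.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List


@dataclass
class Atom:
    name: str
    # Free-form metadata: no key (residue name, chain, ...) is required.
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class HierarchyScheme:
    """Groups atoms by a user-supplied key function (composition, not subclassing)."""

    name: str
    key: Callable[[Atom], Hashable]

    def group(self, atoms: List[Atom]) -> Dict[Hashable, List[Atom]]:
        groups = defaultdict(list)
        for atom in atoms:
            groups[self.key(atom)].append(atom)
        return dict(groups)


@dataclass
class Topology:
    atoms: List[Atom]
    schemes: Dict[str, HierarchyScheme] = field(default_factory=dict)

    def add_scheme(self, scheme: HierarchyScheme) -> None:
        self.schemes[scheme.name] = scheme

    def iterate(self, scheme_name: str) -> Dict[Hashable, List[Atom]]:
        return self.schemes[scheme_name].group(self.atoms)


atoms = [
    Atom("N", {"chain": "A", "resnum": 1, "resname": "ALA"}),
    Atom("CA", {"chain": "A", "resnum": 1, "resname": "ALA"}),
    Atom("N", {"chain": "A", "resnum": 2, "resname": "GLY"}),
    Atom("O", {"chain": "B", "resnum": 1, "resname": "HOH"}),
    Atom("C1"),  # e.g. a ligand atom with no residue metadata at all
]
topology = Topology(atoms)
topology.add_scheme(HierarchyScheme(
    "residues",
    lambda a: (a.metadata.get("chain"), a.metadata.get("resnum"), a.metadata.get("resname")),
))
topology.add_scheme(HierarchyScheme("chains", lambda a: a.metadata.get("chain")))
residues = topology.iterate("residues")
chains = topology.iterate("chains")
```

Here the topology itself never interprets residue or chain fields; each scheme decides what a grouping means, so atoms without metadata simply fall into their own group.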
DD
- Added compute tag-based routing to qca-dataset-submission (in a PR currently, hoping to merge this week). Thanks, PB, for the review
- Now managing Lilac and PRP QCF workers.
- Next, I’ll work on standards v3 implementation – TG had made a set of policies for how we want future QC datasets to look, so we’ll try to implement automated enforcement of these policies. Hopefully this will make it clear what is in a dataset and when a dataset name can be considered stable.
- Partner benchmarks: Tested Schrodinger pathway at Genentech. Worked with Bill Swope to evaluate OPLS3 performance. ffbuilder didn’t work on his setup. Seems like a dependency issue, he’s reaching out to others.
- Worked with Lorenzo on torsiondrive execution. Found some ambiguity about how different backends perceive “rotatable”.
  - SB – Are you in sync with JH’s needs/plans with this?
  - DD – JH is on the benchmarking calls where we work on this, but he isn’t a user yet.
  - JH – It’d be good to try and ensure this will interface nicely with bespokefit.
  - DD – Should the torsiondrive executor live outside openff-benchmark? Maybe in QCSubmit?
  - SB – I think it should live in its own package, or somewhere closer to the QC infrastructure.
  - DD – We have an optimization executor that lives in openff-benchmark as well, should that live in the same spot?
  - SB – That would be good. It’ll be great if we can easily access QC without spinning up a whole fractal server. Some questions about how to locally store/cache results in an appropriate way.
  - DD – We had spoken about this on Friday, and I see the overlap. This will interface with some deep manager code changes.
  - DD – Name recommendations?
  - JW – We need to keep the different compute pathways identical during the pharma partner benchmarking “season 1”. But we should have cut the final release for this, and if we DO need a subsequent release, we can continue openff-benchmark development from our previous release/tag.
  - DD – Agree. We can separate out local execution without negatively affecting benchmarking.
  - SB – Can we recap why we’re not using snowflake for local execution?
  - DD – For optimizations, snowflake is just fine. For torsiondrives, a lot of the code lives in QCFractal. JH had asked why QCEngine couldn’t be used instead, but the QCEngine architecture is quite strict, and it’s not suitable for torsiondrives.
  - SB – What are the blockers to using snowflake?
  - DD – We tried this. We ran into trouble because when a failure occurs in the server or manager, it’s really hard to disentangle where the error occurred and how to fix it. Basically, there was a lot of complexity because of all the process boundaries.
  - SB – Will the fractal infrastructure be able to make use of the new code going into this tool?
  - DD – We could create new components in QCFractal to contain this, though we’d be kinda at the whims of MolSSI’s release cycle. The process that I prefer is that we prototype functionality here, and then, at the end, decide whether to push the final code into the Fractal CLI.
  - SB – That sounds like a good path. We’ll want to ensure that we keep the pathway open to merge this upstream into QCF later.
  - DD – The argument for not pushing this upstream is that torsiondrives don’t require all the dependencies of QCF. So there isn’t much code duplication, since most of it lives in the torsiondrive package. One could make the argument that the code could live in torsiondrive, and it does have its own CLI. The place where we’d do this would be TorsionDrive’s td_api method. This is what our local torsiondrive tool accesses.
  - JW – The best path forward is probably to plan in the medium term for this functionality to go into its own repo+package.
  - DD – Agree. This causes some fragmentation but should give us the flexibility we need.
  - SB – Agree, this is what I’d initially thought.
- This week, I’ll be working on PLBenchmarks automation. Trying to figure out how to loop in F@H for compute in protein-ligand benchmarks.
JW
- Worked with IP on topology aspirational API creation. It’s interesting to see what sorts of molecule mutability touchpoints can unlock new functionality.
  - MT – Would this rely on an inner model/outer model convention?
    - JW – Probably? We plan for most mutability operations to return a new molecule
    - SB – That would be consistent with an IMmutable internal representation.
  - JW – Coordinate generation for modified proteins will be really hard. How should we do this? If we don’t, how should we recommend that users do it?
    - DD – In MDAnalysis, we discussed this a lot. Worldbuilding is hard. In our object model, deletion is particularly hard, but addition is also pretty hard. We had planned a few pathways for this, but all of them were fundamentally hard.
- Infrastructure space is now public, will be working with KCJ on reorganization
- Learned that OpenMM NonbondedForces are as computationally efficient as having two CustomNonbondedForces
- Worked with SB to cut openff-1.3.1-alpha.1 release
- Codecov exploded
- MT – Any progress on the simtk vs. pint units? Were you going to bring this up with the PIs?
  - JW – Not yet. I’ll bring this up with PIs.
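The inner model/outer model idea raised in the topology discussion above (a friendly outer object wrapping an immutable inner state, with edit operations returning a new molecule) could be sketched like this. The names `_MoleculeState`, `Molecule`, and `with_atom` are hypothetical, and a frozen standard-library dataclass stands in for the pydantic model DD mentions.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class _MoleculeState:
    """Inner model: immutable, hashable, easy to serialize."""

    atoms: Tuple[str, ...]              # element symbols
    bonds: Tuple[Tuple[int, int], ...]  # pairs of atom indices


class Molecule:
    """Outer model: the object users interact with."""

    def __init__(self, state: _MoleculeState):
        self._state = state

    @property
    def n_atoms(self) -> int:
        return len(self._state.atoms)

    @property
    def bonds(self) -> Tuple[Tuple[int, int], ...]:
        return self._state.bonds

    def with_atom(self, element: str, bonded_to: int) -> "Molecule":
        """Return a NEW molecule with one extra atom; self is left unchanged."""
        if not 0 <= bonded_to < self.n_atoms:
            raise IndexError(f"no atom with index {bonded_to}")
        new_index = self.n_atoms
        new_state = replace(
            self._state,
            atoms=self._state.atoms + (element,),
            bonds=self._state.bonds + ((bonded_to, new_index),),
        )
        return Molecule(new_state)


original = Molecule(_MoleculeState(atoms=("C", "O"), bonds=((0, 1),)))
modified = original.with_atom("H", bonded_to=1)
```

The sketch deliberately leaves out coordinates: as JW and DD note, generating coordinates for the modified molecule is the hard part and is not addressed here.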
PB
- Working on theory benchmarking.
LW
- Had a working session last week, identified science questions – How do you fragment a polymer into monomers for parameter creation? How do you make conformers for QM submissions?
- Reviewed fragmenter and constructure, opened a PR for constructure for some ideas that I had.
- I’m looking to try out QCArchive submissions. Will run compute at ANU.
  - DD – Yeah, please do put together a dataset and let me know when it’s ready for review. The queue is a bit busy right now but we can pick when to submit it. (LW will compute on ANU). You can use the same dataset preparation scheme as we use on qca-dataset-submission.
  - PB – Could I join in?
  - (General) – Yes
  - JW – Is any special preparation needed for qca manager creation/spinup?
  - PB – The same ones from benchmarking should work well
  - DD – Yes
  - DD – Generally, put server somewhere stable, like your workstation, if the cluster nodes can access it (the compute nodes need to initiate TCP connections from themselves to the server)
  - Deployment Procedure
  - Optimization Benchmarking Protocol - Season 1 | 4. Optimization execution
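DD's point above, that compute nodes must be able to open TCP connections to the server, can be checked from a compute node before launching managers. The helper below is a generic sketch using only the Python standard library; `can_reach` is a hypothetical name, and it only tests that a TCP connection can be opened, not that a QCFractal server is answering.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened from here."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Run from an interactive session on a compute node, for example `can_reach("my-workstation.example.org", 7777)`, where both the host name and the port are placeholders for the actual server address.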
