2021-02-08 Core Devs Coffee

Date

Feb 8, 2021

Participants

@Jeffrey Wagner
@Simon Boothroyd
@David Dotson
@Pavan Behara
@Matt Thompson
@Mike Henry

Discussion topics

Item	Notes


Roundtable updates

SB
- Worked on pydantic-based molecule, Topology, biopolymer. Met with JW+MT on this.
- Shared scripts for gradients+evaluations with PB.
- Lots of science work.
- JW and SB will talk about how to delegate/assign subtasks to team members on research areas.
MT
- Mostly meetings. Topology meeting – very promising.
- Some progress on VU coordination – Starting to set deadlines through 2023
- Building up energy-based tests. Basically, if we’re going to continue having a central representation for systems, then we should be able to get engines to agree on energies. Hopefully this framework will extend into a common structure as we add new engines.
- Tried using MyST to make docs for System object. Fairly close to Jupyter Book infrastructure.
  - https://openff-system.readthedocs.io/en/latest/
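A minimal sketch of what an energy-agreement test between engines could look like, based on MT's description above. The function name, the per-term dictionary layout, the tolerance, and the numbers are all illustrative assumptions, not the actual test framework or real results.

```python
def compare_energies(reports, tolerance=1e-3):
    """Find energy terms on which two engines disagree by more than `tolerance`.

    `reports` maps a (hypothetical) engine name to a dict of
    {energy term: value}. Only terms reported by both engines are compared.
    """
    mismatches = {}
    engines = sorted(reports)
    for i, first in enumerate(engines):
        for second in engines[i + 1:]:
            shared_terms = sorted(set(reports[first]) & set(reports[second]))
            for term in shared_terms:
                diff = abs(reports[first][term] - reports[second][term])
                if diff > tolerance:
                    mismatches[(first, second, term)] = diff
    return mismatches


# Placeholder numbers, for illustration only.
reports = {
    "engine_a": {"bond": 10.0, "angle": 5.0},
    "engine_b": {"bond": 10.0004, "angle": 5.5},
}
mismatches = compare_energies(reports)
```

Because the comparison is pairwise over whatever engines appear in `reports`, adding a new engine only requires adding another entry to the dictionary.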
MH
- Got initial django webapp hosted – working on setting up HTTPS
  - https://www.samplchallenges.org/
  - http://app.samplchallenges.org/
- Working on dependency resolution on perses – Requires the openmmtools toolchain on conda-forge
  - JW – If you have bandwidth, try banging around with conda install -c conda-forge openff-toolkit
  - SB – OpenMMForcefields+Perses are still undergoing migration. OMMFFs is almost ready to get merged, just waiting on JC review. Then QCEngine infrastructure can be updated.
- JW – Let me know if I can help with validation.
DD
- Kicked off partner benchmark. Previously we had just done a “burn in”, where they all ran on the same molecule set, and ensured they could get the software running on their internal compute. Now they’re rolling “production”, where they are running on their own molecule sets.
- Aiming to have computation done in 60 days.
- Next, I’m going to take each partner’s open data sets and run them through our own QCA queue. Need to decide on how to record molecule names/indices – Whether to use QCA automation’s default naming scheme (CMILES) or benchmarking IDs.
  - JW – If standards V3 uses CMILES for names, let’s stick with that. Otherwise let’s chat about potential downstream effects.
  - (General) – The document above lets each dataset describe its own naming scheme. So let’s use benchmarking IDs
- I’ll be getting back to my role managing the public OpenFF QCArchive queue. Currently aiming to figure out what’s wrong with PEPCONF. Will aim to get the trajectory written out from failed jobs so we can see what’s going on.
- PB, would you have time to work on this today?
  - PB – Yes, let’s schedule offline
    - DD will follow up with proposed meeting times.
  - PB – Tried running one of the errored jobs locally (on UCI cluster). It took a few steps and stopped. Didn’t get much useful error output. Currently trying to reproduce on my personal machine.
  - DD – If this fails quickly, this will be a great error report for the psi4 devs.
- Also looking to implement some changes to dataset lifecycle automation. Especially priority-shifting. This is especially important for torsiondrives, since newly-spawned optimizations don’t inherit the priority of their parent torsiondrive.
- Error cycling can also be improved – Would like to “priority-decrement” failed jobs, to keep from using lots of compute on jobs that are more likely to fail.
- Will also begin scoping out infrastructure changes we need for automated implementation of standards V3. Hoping to reduce human effort/bureaucracy required for dataset submission. I’m optimistic about this, because standards V3 were based on things we were already doing, so we have something quite concrete to implement.
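The two queue changes DD describes above (children inheriting their parent's priority, and "priority-decrement" on failure) can be sketched as follows. This is only an illustration under stated assumptions: the `Job` class, the integer priority scale (higher runs first), and both helper functions are hypothetical and are not the QCArchive or lifecycle-automation API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical queue entry; a higher priority value runs first."""
    name: str
    priority: int
    parent: Optional["Job"] = None
    failures: int = 0


def spawn_child(parent: Job, name: str) -> Job:
    """Create a child job that inherits its parent's current priority."""
    return Job(name=name, priority=parent.priority, parent=parent)


def record_failure(job: Job, floor: int = 0) -> int:
    """Count a failure and lower the job's priority by one, never below `floor`."""
    job.failures += 1
    job.priority = max(floor, job.priority - 1)
    return job.priority


torsiondrive = Job(name="torsiondrive", priority=2)
optimization = spawn_child(torsiondrive, "optimization-0")
```

Clamping at a floor keeps repeatedly failing jobs in the queue at the lowest priority rather than dropping them, which is one possible policy among several.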
JW
- Worked on benchmarking DevOps and RMS changes (see #benchmarking-partners channel for more info)
- Worked on “project planning” – Let me know if you know someone who would be a good 50%ish contractor for infrastructure load-balancing.
- Next up, I’ll be ready for benchmarking support, but will try to begin fighting down Toolkit PR/Issue backlog.
PB
- Most time was spent on WBO work and FF release meeting.
- Worked on submitting Genentech set. Had an issue with fragmenter – All molecules were returning 0 outputs from QCSubmit factory – Either stereoisomer enumeration or fragmentation was the cause of this.
  - JW – Maybe due to namespace migration/package name changes?
  - DD – QCSubmit uses a lot of multiprocessing pools – That might be swallowing errors.
  - PB – Looks like they definitely die in fragmenter (not stereo enumeration)
  - PB – Can look at this when DD and I meet about debugging PEPCONF.
- Worked on debugging PEPCONF.
- Will be continuing work on WBO studies. Working with SB on gradient analysis.
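On DD's point that worker pools may be swallowing errors: one generic way to make per-item failures visible is to have each worker return its traceback alongside its input, then separate successes from failures afterwards. The sketch below is not QCSubmit or fragmenter code; `fragment`, `safe_call`, and `run_all` are hypothetical names, and `ThreadPool` is used only so the example runs anywhere (it exposes the same `map` interface as `multiprocessing.Pool`).

```python
import traceback
from multiprocessing.pool import ThreadPool


def fragment(smiles):
    """Hypothetical stand-in for a per-molecule processing step."""
    if "X" in smiles:
        raise ValueError(f"cannot process {smiles!r}")
    return smiles.upper()


def safe_call(item):
    """Run the step and return (input, output, traceback text or None)."""
    try:
        return item, fragment(item), None
    except Exception:
        return item, None, traceback.format_exc()


def run_all(items, workers=2):
    """Map over items in a pool, keeping failures instead of dropping them."""
    with ThreadPool(workers) as pool:
        rows = pool.map(safe_call, items)
    results = {item: out for item, out, err in rows if err is None}
    errors = {item: err for item, out, err in rows if err is not None}
    return results, errors


results, errors = run_all(["cc", "cX"])
```

With this pattern an empty `results` dictionary comes with a populated `errors` dictionary, so "0 outputs" can be traced to the step that raised.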

Sprint planning at 9 AM Pacific
