2022-03-07 Core Developers meeting notes

Participants

Irfan Alibay
@Simon Boothroyd
@Chapin Cavender
@David Dotson
@Diego Nolasco
@Matt Thompson
@Jeffrey Wagner

Discussion topics

Item	Notes


General updates

JW – Lots of people will be out at OE CUP this week - I know of Owen, Michael S, John, Pavan, David M

Individual updates

SB
- Mainly getting bespokefit ready for industry use. Documentation, QoL improvements to API, CLI work. I think it’s in a pretty good place moving forward. Ideally this will go out to industry folks in a couple weeks.
- JW – I saw QCEngine PR that you DMed this morning - Is this a hard blocker or a nice-to-have?
  - SB – It’s a quality-of-life improvement. Right now we can only run one optimization at a time for TorsionDrives. But the PR will allow us to run one opt per core. So without it the runtime goes up a lot.
  - JW – Do we have the same problem with QCFractal?
  - SB – QCF runs things fully parallel. But it’s super opaque and hard to debug/develop. So we’re just using QCEngine for bespokefit.
  - DD – We came to the same conclusion with the industry benchmark.
  - JW – Is there a QCEngine release planned with the next big QCF release?
  - DD – Usually it’s the other way around - A QCEngine release motivates a QCF release. Also frequently drives new releases for elemental.
  - DD – I’ll reach out to LBurns to take a look at this and push for merge+release.
- JW – I was going to ask about release coordination, but SB coordinated having JH reach out to BSwope. This is great, since JH knows the detailed status of the software.
  - SB – I figured we should offer to let JH be the initial point of contact, and then have OFF take over when it becomes more arduous.
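To illustrate the one-opt-per-core point SB raised for TorsionDrives, here is a minimal sketch of fanning independent optimizations out across cores with the Python standard library. It does not use QCEngine's API: `run_optimization` and `run_grid_points` are hypothetical names, and the "energy" returned is a placeholder rather than a real calculation.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_optimization(grid_angle):
    """Hypothetical stand-in for one constrained optimization at a grid point.

    A real worker would call out to a QC program; this placeholder just
    returns the angle together with a made-up "energy".
    """
    return grid_angle, float(grid_angle) ** 2


def run_grid_points(grid_angles, n_workers=None, executor_cls=ProcessPoolExecutor):
    """Run one optimization per worker, up to n_workers at a time.

    executor_cls defaults to a process pool (one process per core);
    ThreadPoolExecutor can be passed instead for a quick local check.
    """
    n_workers = n_workers or os.cpu_count() or 1
    with executor_cls(max_workers=n_workers) as pool:
        # pool.map preserves input order, so results line up with grid_angles.
        return dict(pool.map(run_optimization, grid_angles))
```

With a serial loop the wall time scales with the number of grid points; with a pool it scales roughly with the number of grid points divided by the number of workers, which is the runtime difference described above.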
IA
- Just popping in to say hi.
- OpenFE just finished the first sprint. Put together an example notebook showcasing how we use OpenFE components, showing how we go from ligands to final numbers.
- Interested in understanding what you’re planning to do for PL benchmarking (what fraction of PL benchmark repo), whether this can be joined with our HPC resource request.
  - DD – I think we’ll probably do all the systems from the PL benchmarks repo.
- IA – I’m probably limited to ~10k total gpu hours. The particular Oxford HPC system I’m targeting doesn’t have an accounting system. But we really shouldn’t exceed our actual request. We’ll probably use the latest version of OpenFF so we can report results back if we coordinate a bit.
- JW – Currently just doing small molecule parameters with Sage. With Rosemary we’d do protein as well.
  - IA – To keep the combinatorics down, we’ll probably just do small molecule parameters.
- DD – If allocation will be a little tight, then I’d recommend focusing on the systems most useful to OpenFE.
- JW – Does OpenFE have plans for protein prep?
  - IA – RG knows more, we don’t have it as a high priority for year 1. Protein prep will be hard and we’ll likely recommend use of something like PDBFixer.
DN
- Was in an OpenFE planning meeting and mentioned that I’m making a strategy statement/document. Some members of their board reached out to me to express support. So I’m looking at making OpenFF and OpenFE strategic plans. Looking at making a 2023-2027 strategic plan now. This will help us understand our mission and help industry partners make decisions about continuing to support our efforts.
- Strategic planning
  - Clear long term vision
  - Short term expectations alignment
  - Make sure everyone has the information they need
- This will necessarily need updates/changes, and I’ll be talking with each of you to gather information in the future.
CC
- Continued work on revising LiveCoMS review, more work to do this week.
- Putting together a more detailed plan for Rosemary benchmark - protein systems, expected trajectory length, infrastructure needed for analysis. Laid out the high-level plan on the fah-alchemy repo (Benchmark proteins (eventually other biomolecules) using F@H · Issue #9 · OpenFreeEnergy/alchemiscale).
  - Trying to make this tiered, to separate out “wants” and “needs”. This will allow us to consider additional proteins and/or water models.
  - Making a more detailed spec for what will need to be added to Evaluator to enable a detailed analysis of the benchmark simulations.
MT
- Interchange/toolkit regression testing done except for virtual sites.
- Started deep dive into virtual sites. Expected support is looking like anything that can be written up in SMIRNOFF, with the caveats:
  - Some patterns are ambiguous and no guarantees are made about which of multiple possible matches makes its way into the system
  - The toolkit has some bugs, so old toolkit behavior might be different
- Toolkit now applies a switching function in its OpenMM export
  - Previously it did not apply one at all, which was a bug
  - Results for condensed phase systems are likely slightly different, though I have not tested this myself
  - Interchange applies the switching function as I understand it should be applied
- ForceBalance has broken CI and is not compatible with OpenFF Toolkit 0.10.3. Hoping to fix both with JW this week
- Wrote up minimal docs (just expanding the README) for units package: GitHub - openforcefield/openff-units: A common units module for the OpenFF software stack
  - SB – It’d be good to ensure that this doesn’t break or massively slow down FB - We had this issue a while ago. So make sure that FB quantity parsing survived this - Breadcrumb here:
- Other CI nursing/fixing/updating
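For context on MT's switching-function item, here is a minimal sketch of the quintic switching form documented for OpenMM's NonbondedForce, applied to a Lennard-Jones term. It is not the toolkit's or Interchange's export code; the function names and the distances and parameters used in the comments are illustrative assumptions.

```python
def switching_function(r, r_switch, r_cutoff):
    """Quintic switch: 1 below r_switch, 0 at or beyond r_cutoff, smooth in between.

    S(x) = 1 - 6x^5 + 15x^4 - 10x^3, with x = (r - r_switch) / (r_cutoff - r_switch).
    """
    if r_switch >= r_cutoff:
        raise ValueError("r_switch must be smaller than r_cutoff")
    if r <= r_switch:
        return 1.0
    if r >= r_cutoff:
        return 0.0
    x = (r - r_switch) / (r_cutoff - r_switch)
    return 1.0 - 6.0 * x**5 + 15.0 * x**4 - 10.0 * x**3


def lennard_jones(r, sigma, epsilon):
    """Plain 12-6 Lennard-Jones energy (no cutoff handling)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def switched_lennard_jones(r, sigma, epsilon, r_switch, r_cutoff):
    """Lennard-Jones energy scaled by the switching function.

    Without the switch the energy jumps to zero at the cutoff; with it the
    energy goes to zero smoothly between r_switch and r_cutoff.
    Example distances (assumed, in nm): r_switch=0.8, r_cutoff=0.9.
    """
    return switching_function(r, r_switch, r_cutoff) * lennard_jones(r, sigma, epsilon)
```

Because the switch only changes energies in the window between the switching distance and the cutoff, it affects only pairs of atoms separated by close to the cutoff distance, which is consistent with the expectation above that condensed phase results shift only slightly.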
DD
- Protein-Ligand Benchmarks
  - had our third working group meeting; I laid out my impressions from reviewing the user stories the working group went through the previous week
  - decided to proceed with developing a candidate data model, based on the work OpenFE has done, perhaps even subclassing it, and drawing on previous work John has done as well as the Covid Moonshot
  - finished research cycles on OpenFE showcase (need to review latest revision shown today by Irfan), as well as John's drafted vision
  - today using my notes and conclusions as inputs to create draft data model
  - at tomorrow's meeting we will review the 4 new user stories; will also present current thinking on data model for free energy network
  - DD – Would like to work with JW and DN on agenda for tomorrow
    - JW and DD will meet later today to draft agenda, submit to DN for final approval.
- QCArchive
  - Ben recovered space on public QCA; removed old SPICE set records, including wavefunctions; storage usage at around 52% now, vs. over 90% earlier last week
    - DD – PB asked (on behalf of BP) about other things that we could remove (eg bond orders and other data from optimization trajectories). THuefner posted on Slack that he is using molecule geometry and energies from many optimization trajectories.
    - DD – Looks like we missed the time window for the NIH equipment supplement. Window will open again in one year.
  - ESP dataset appears complete; now systematic errors only
    - SB – Thanks for getting this through so quickly. This looks great, only 24 failing opts out of 50k.
  - making forward progress on peptide datasets; fast enough, Chapin?
    - CC – It seems to be pretty bursty. There will be thousands of errors reporting None all at once, then after error cycling many of them will finish successfully
    - DD – I had shut down PRP managers a while ago because we’d fail through the whole queue really quickly. But whatever used to be happening seems to be resolved now, since the jobs are now getting run successfully.
    - CC – I’ll do some sleuthing on what’s going on with the failures, to see whether it’s a geometric/torsiondrive issue.
  - burning quickly through remaining SPICE sets, which is great news; eager to get these off our plate completely. Doing about 15k opts per day, about halfway through the whole set now. I’ll ping JH on this.
JW
- OFFTK 0.10.3 release due to OFFTK mis-assigning charges to water. Broke ForceBalance in the process. MT and I will work on getting a FB release out this week.
- F@H planning continues, there’s been a lot of interest so tomorrow we will update the timeline to have more time for speccing requirements.
- Working on vsite regression tests for interchange rollout. There are some cases where multiple behaviors are acceptable, so it’s not very straightforward. Found some bugs in the process. I got the hard part done last week (making sure the correct number of vsites were made, ensuring that they do/don’t override each other via SMIRNOFF rules). This week I’ll focus on making sure the geometry, vdW, and electrostatics parameters are correct.