2021-03-10 Benchmarking for Industry Partners - Development Meeting notes

Date

Mar 10, 2021

Participants

  • @Jeffrey Wagner

  • @David Dotson

  • @David Hahn

  • @Joshua Horton

  • @Lorenzo D'Amore

Discussion topics

Item

Notes

Rescheduling meeting?

  • DD – Could we shift this to an hour later?

    • (General) – Yes

Updates

  • JW –

    • Toolkit 0.8.4 stable is now online – Can get off omnia/label/benchmark, though QCSubmit will still be on omnia/label/rc

      • This isn’t urgent (performance will be exactly the same), but we’ll want to eventually have this migrated over.

    • Haven’t made SMARTS filter

  • DD –

    • Worked on PR #65 to detect connectivity rearrangement. Refactored a bit, still working on it.

    • Worked with Bill Swope to make a local patch for export step. His process takes ~40 hours since they’ve run so many calculations (~3600 mols). Right now he’s running an experiment where he exports 1/3rd of dataset.

    • Katharina Meier (Bayer) is unable to use anaconda for policy reasons. She thinks it’s resolvable.

      • JW – Would single file installer resolve this?

      • DD – Bayer completed the burn-in. So they have a technical path to run things. It seems likely that it’s a security policy issue.

        • [committed] DD will reach out to Bayer, get status update, mention single-file installer as alternative if not already tried, ask again if there are ways we can assist

    • DD – Started conversation with Gary about doing a season 2. We’ll discuss that at the next benchmarking partners meeting. Some questions as to whether we want to allocate infrastructure personnel-time to this.

  • LD –

    • I was able to install and use the benchmarking workflow (running through Docker from JW’s instructions)

    • Did small tests with a few molecules on a mac workstation. Followed protocol on Confluence, I thought it was excellent.

    • I’ve been told that the Janssen laptop is on its way – That will give me access to a linux machine/cluster with more resources.

    • Meeting tomorrow with Davids H and M, and will continue planning.

    • Mostly replicating results from David H right now, getting familiar with infrastructure.

  • Decision – We will move discussion about benchmarking infrastructure into the #developers channel

  • JH –

    • Working on other projects the past week. Nothing to report benchmarking related.

    • In the near future, I’ll be starting to get torsiondrives running. I’d like to coordinate on getting a single interface for this running.

      • DD – Agree

      • JH – Is the plan for the initial implementation to go into the benchmark package?

      • DD – Yes. It’ll start there, but once it’s mature we’ll absorb it into a more developed package.

    • (General) – Fragmenter pinning to old OpenEye breaks our environments. We should push on solving that ASAP.

    • JH – I’m planning on doing a bespoke workflow test on the pharma partner public set. It’s not a huge rush, but please let me know once the submission is made, since I’ll need to fragment the mols.

      • DD – So it’s only necessary to see the submitted molecules, and not the results?

      • JH – Yes, at least initially. I’ll need to resubmit the molecules in fragmented form for torsion drives to get the bespoke parameters.

  • DH –

    • Working with LD, helping get the workflow running for him.

    • I’d like to have LD use the dedicated torsiondrive script.

      • DD – I’d made an early version of this with JH. It’s still a bit early and may not be stable. Current implementation launches a local QCFractal server, and deletes the database once it’s exported the results. Currently it involves a lot of processes talking to each other, so it’s somewhat unstable and hard to debug. I’m working on refactoring this so that it doesn’t require subprocesses and is hopefully much more stable.

      • DH – I wasn’t able to find it. Could you point me to where it lives?

      • DD – It’s not online yet – I’ll point you to it when it goes up (~next week)

      • DH – Could LD work on a pull request to get this integrated into the workflow?

      • DD – I can tag LD as an initial reviewer when I put the code up.

      • LD – What level of computational resources is required?

      • DD – About the same as the normal optimizations running in the current workflow

      • (General) – Current psi4 runs will be somewhat slow in Docker. Hopefully LD will have access to Janssen Linux resources by the time the PR opens. In the meantime, JW can share how to make a multi-core docker container that may speed things up.

    • DH – Is the openff-benchmark PR the same code as JH is looking to use?

      • DD – It’s the beginning of functionality that bespokefit could integrate later (once it’s stable)

    • DH – Worked on Schrodinger CLI paths in a PR. Recent updates make it possible to merge ffbuilder results into your storage directory. Previously, if you ran the FF builder twice, there was an error since there was already data present in the output directory. So now it merges the data first, and creates a backup directory with the previous contents. This would only come up if you run it a second time with a different set of parameters.

      • DH – Once I have the PR ready to go, I’ll ping LD, JW, and Bill Swope to try it out.

      • DD – Could we announce this work at the March 24 meeting?

      • DH – Yes, but it’s important to note that this won’t include OPLS4, since I don’t have access yet.

      • DH – One handy thing is that OPLS4 will be able to re-use QM calculations from the OPLS3e calculations. So I’ll want to make sure we don’t delete those.

    • DH – Working on Lim paper review. One reviewer asked what’s the difference between custom and default OPLS3 parameters. I’ve redone the calculation with “default” OPLS3e, and shown that there is a slight improvement (expected) when you use the custom params

      • OpenFF may be performing somewhere between the OPLS3e default and custom case

      • The improvement between custom params and default isn’t so great, given the expense of running the QM to get the custom params

    • DH – Over 2000 conformers are complete in the production run. Currently hitting a bit of a bottleneck with other users on the cluster, so throughput is not as high as before.
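
The merge-and-backup behavior DH describes for re-running the FF builder could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the PR: the function name merge_with_backup and the ".bakN" backup naming are hypothetical, and the sketch backs up the previous contents before merging, with new files overwriting old files of the same name.

```python
import shutil
from pathlib import Path
from typing import Optional


def merge_with_backup(new_results: Path, storage: Path) -> Optional[Path]:
    """Merge new_results into storage, keeping a copy of the prior contents.

    Hypothetical helper for illustration only. Returns the backup directory,
    or None if storage was missing or empty (nothing to back up).
    """
    backup = None
    if storage.exists() and any(storage.iterdir()):
        # Find an unused sibling name: storage.bak1, storage.bak2, ...
        n = 1
        while (candidate := storage.with_name(f"{storage.name}.bak{n}")).exists():
            n += 1
        backup = candidate
        # Preserve the previous contents before anything is overwritten.
        shutil.copytree(storage, backup)
    # dirs_exist_ok (Python 3.8+) lets copytree write into an existing
    # directory instead of raising; files with the same name are overwritten,
    # files only present in storage are left in place.
    shutil.copytree(new_results, storage, dirs_exist_ok=True)
    return backup
```

With this approach a second run with different parameters no longer fails on the existing output directory, and earlier results (for example QM data that a later force field could re-use) remain available in the backup.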

Action items

@David Dotson will shift this meeting 1 hour later, starting next week (already done, since set to AZ time, which doesn’t change)
@David Dotson will set production envs for openff-benchmark to use omnia/label/main
@David Dotson will reach out to Katharina, get status update and offer assistance
@David Dotson will work with @Jeffrey Wagner to refactor fragmenter for openff namespace inclusion, unblock e.g. openff-qcsubmit
@David Dotson will notify @Joshua Horton, @David Hahn on torsiondrive executor PR when up; will tag @Lorenzo D'Amore as a reviewer
@David Hahn will notify @Lorenzo D'Amore, @Jeffrey Wagner, and Bill Swope (Github BillSwope) to try out the Schrodinger CLI paths PR once it is ready

Decisions