2021-08-11 Industry benchmarks meeting notes

Participants

@David Hahn
@Jeffrey Wagner
@Lorenzo D'Amore
@David Dotson
@Joshua Horton
@Simon Boothroyd

Goals

All partner data is in!
Manuscript needs, Sage and OPLS
Public dataset status and needs
Updates from team

Discussion topics

Item	Presenter	Notes
All partner data is in!		Roche has submitted their results! We now have all partner results in.
OPLS results		LD: for Janssen, the terms with Schrodinger are pretty restrictive, and they may be different for Roche; Roche's terms could be that they can share results with others, but cannot post publicly, etc.
		DD: will ask directly for clarification before any public posting
		LD: Roche only shared default-parameter results
		DH: Schrodinger may complain if he only benchmarks with default parameters, since that’s not what they recommend using.
		DD: I’ll ask XL about both topics (whether he has permission to share, and whether we could also
		JW: we should recognize that we are not going to be able to competently navigate the legal landscape of 10 different agreements
		DD: verification via email of 1) can you share these results with us? and 2) can these results be made public? An in-writing response should protect our org and theirs.
Manuscript needs from Gary, Sage and OPLS		LD: we will have two datasets:
		(1) aggregated results from all partners
		(2) public dataset from QCArchive
		gaff, smirnoff, all parsley versions; also think sage should be included for the manuscript
		GT would like to have OPLS results from as many pharma partners as possible. We don’t think we’ll get 10/10 doing OPLS4 with both default and custom parameters, but a subset of the first dataset will have OPLS results.
		LD: Also considering which FFs to compare. In the SI we can do all vs. all, but in the main text we’ll do SMIRNOFF, GAFF, Parsley 1.3, and Sage (full release if possible). For the reduced dataset, we’ll do OPLS4 with custom and default parameters. Then for the overlapping dataset, we’d do Sage, GAFF, and OPLS.
		DD: what are some action items we must address to support Gary’s work?
		LD: public dataset: assessing its current state and exporting data: QM data export; MM data export; QM&MM (backup, likely not used for manuscript); QMthenMM; adding Sage to the MM compute specs
		DD: we’ll add Sage RC.2 to the public dataset; Sage itself is imminent though, so should we include it?
		SB: working on param renaming, but yes, it is imminent, and would be preferable for me
		JW: even though Sage has renaming, the results should be the same in terms of optimizations; the physics should be the same
		DD: can put Sage as a compute spec on each of these sets at any time later for completeness, even if RC.2 is physically the same
		JW: don’t want anything blocked by the full release of Sage; think we should proceed with RC.2 for the benchmark, not worry about the Sage release itself here
		DD: will proceed with `openff-forcefields` 2.0.0-rc.2 deployment for prod QCA compute
		Also, need to coordinate pharma partners on Sage, OPLS (where possible)
		DD: can Lorenzo, David H., and Gary coordinate the partners for gathering these results, since we are now beyond the scope of Season 1? Happy to assist, but will be in a supporting role.
		LD: yes, can do
		JW: we have clarity on what was executed with strict versioning, so technical details should all be covered well; even if e.g. the `psi4` package disappears, we have single-file installers for all releases of `openff-benchmark`
		DD: The lifetime of those on GH should be good. They’re attached to releases, which should be static and long-lived.
		DD: can ask Lori to avoid deletion of the package we used for benchmarking
		DD: will communicate above to Gary
Updates from team		JH: Just the thing from last week. Useful for the refactor: qcsubmit outputs can go straight to qcengine now.
		DD: I had found that having QCF as a dependency balloons the complexity of the environment, so I’m looking forward to being able to run without that. Very few partners chose to use the server approach (only Swope used it IIRC).
		SB: Nothing to report
		LD: Update on TFox analysis. I’m refining the violation plots to be better normalized/scaled. Using normalization, t133 comes to the top (7%, 1 violation out of 14 uses). Redid analysis to control for number of violations and results are looking clearer. Next steps will be to run more torsiondrives of “high violation” torsions to gather more info. Still brainstorming exactly how to do that.
		JW: made a new toolkit release; may see differences in partial charge assignment. Entering the month for doing benchmarking refactor work; aiming to collect ideas on key directions in coming weeks.
		DD: Not much more to report than above. I’m very happy with the way that this is wrapping up. Would be valuable to get feedback/run a survey on how season 1 went.
		JW: I’d wait until everyone’s done submitting everything, otherwise we’ll have to do a lot of work defining what “season 1” was and asking them to separate their thoughts about that from the remaining work they may be doing.
		DD: Ok, I’ll start drafting these questions but won’t send them out yet.
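
Note on the normalization in LD's update: a minimal sketch of one way the per-torsion rate could be computed, assuming "normalized" means violations divided by the number of times a torsion parameter is used. Only the t133 figures (1 violation out of 14 uses) come from these notes; the other parameter names and counts, and the function name `violation_rates`, are hypothetical placeholders and not LD's actual analysis code.

```python
def violation_rates(uses, violations):
    """Return {torsion_id: violations / uses} for every torsion used at least once."""
    rates = {}
    for torsion_id, n_uses in uses.items():
        if n_uses > 0:
            # Torsions with no recorded violations get a rate of 0.0.
            rates[torsion_id] = violations.get(torsion_id, 0) / n_uses
    return rates


# Only the t133 counts come from the notes; the rest are made-up placeholders.
uses = {"t133": 14, "hypothetical_a": 400, "hypothetical_b": 250}
violations = {"t133": 1, "hypothetical_a": 12, "hypothetical_b": 5}

rates = violation_rates(uses, violations)

# Rank by rate instead of raw count, so rarely used torsions are not hidden.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for torsion_id, rate in ranked:
    print(f"{torsion_id}: {rate:.1%}")
```

With these placeholder counts, the torsion with the most raw violations drops below t133 once the counts are divided by uses, which matches the effect described in the update.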

Action items

@David Dotson will communicate with Roche to verify in writing whether their OPLS results can be shared with us and made public

@David Dotson will announce that all partner results are in

@Lorenzo D'Amore will export QM data for the public dataset from QCArchive (assisted by @David Dotson)

@David Dotson will shepherd public dataset MM computations, add Sage RC.2

@Lorenzo D'Amore and @David Hahn will coordinate partner results for Sage RC.2, OPLS

@Jeffrey Wagner will begin collecting ideas for benchmarking infrastructure refactor in coming weeks

@David Dotson will prepare questionnaire for partners as an input to our retrospective; will wait to distribute
