2021-07-08 Industry Benchmarks Partners Call

Participants

  • David Dotson

  • Thomas Fox

  • Arjun Narayanan

  • Bai Xue

  • Bill Swope

  • Christina Schindler

  • David Hahn

  • Ian Craig

  • Katharine Meier

  • Kaushik Lakkaraju

  • Lorenzo D'Amore

  • Xavier Lucas

  • Pieter in't Veld

  • Jeffrey Wagner

Discussion topics

Publication discussion

  • GT –

    • Goal is to build upon the previous Lim and Hahn paper.

    • This is timely because we’re going to follow up in the future with more sophisticated comparisons, like protein-ligand, which will be good but carry a lot of complexity. So this is one of our last stops for simple, less complex analysis.

    • The conformer energetics will constitute the basis of the manuscript (a sketch of the core comparison appears after this list). We’re interested in hearing more from people about different ideas for analysis. Everyone who contributed is going to be a coauthor.

    • CS – Is an interesting aspect of this project the collaborative benchmarking itself? Should we highlight this as a means of method development in our field?

      • GT – Great idea. We’ll have the space and scope to discuss this. Like, are we learning something more from the volume of the study? The initial study was 3,300 mols; this one is around 50k, so it’s possible that we’re going to find a clearer signal here.

    • I’m thinking that there will be a few key figures, like in the short version of the Lim and Hahn paper. For us, if we find some small internal structures that are especially problematic, we could probably release them publicly.

    • TF – We should emphasize that the format of the workflow, with accessible directories and Python APIs, made it easy to hack on the data myself. I did my “local analysis”/”violation analysis”, and I’d love to talk about that in the paper.

      • GT – Great idea. This could be a really good direction.

    • CS – Is the plan to include OPLS results if we have them? Will everyone be able to contribute that?

      • GT – We’re trying to do this internally, but not everyone will be able to do it in time.

      • CS – I ran it as well, but I’ll have to rerun it since DH made an update. In our datasets, we found a big difference between general vs. custom parameters. We’ve already applied to Schrodinger for permission to release.

    • TF – Is there something novel that we can say based on the scale of this project, that the previous datasets weren’t large enough to resolve?

      • AN – I had heard “50k molecules” earlier in this meeting. Is that number the unique mols, or the number of confs?

      • (This seems to be conformers; the internal sets have about 7k unique mols, and the combined public+private sets should have around 14k.)

    • CS – It’s worth discussing why 1.3 shows worse performance than 1.2

      • JW – My understanding is that the training sets were substantially different – 1.3 training set had more congested molecules, and that may have added noise that leads to degraded performance

      • (General) – It’ll be good to discuss why 1.3 is showing degraded performance, but if possible we’d like to avoid getting dragged into a detailed comparison of 1.2 and 1.3. So if we can bring in 2.0.0, we can sidestep needing to discuss this at length.

      • PiV – It won’t be too hard to include 2.0.0, right? The same infrastructure should be able to handle it.

        • DD – Correct.

      • PiV – Are there any objections to running 2.0.0?

        • (General) – No objections

      • AN – So, is it worth running with the release candidate, or should we wait until the full release is out?

        • (General) – We want to wait for the full release. But we can speed up the release by benchmarking the release candidate and showing improvement (and no cases of serious degradation).

      • OpenFF will release a new openff-benchmark package that can run the Sage release candidate (2.0.0-rc.1). CS, PiV, and pharma partners will run the benchmark using the release candidate and submit results in the next few weeks.

  • GT will start an Overleaf document with a paper draft, and that will be the common place to work on the manuscript.
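
For reference, a minimal sketch of the conformer-energetics comparison mentioned above, in the spirit of the Lim and Hahn analysis: within each molecule, QM and MM energies are each referenced to that method’s lowest-energy conformer, and the two sets of relative energies are compared. The function and variable names below are illustrative, not the openff-benchmark API, and the paper’s exact choice of reference conformer may differ.

    import numpy as np

    def relative_energies(energies):
        # Reference each conformer's energy to the molecule's minimum.
        e = np.asarray(energies, dtype=float)
        return e - e.min()

    def dde_rmse(qm_energies, mm_energies):
        # RMSE between QM and MM relative conformer energies
        # (illustrative metric; not the openff-benchmark API).
        dqm = relative_energies(qm_energies)
        dmm = relative_energies(mm_energies)
        return np.sqrt(np.mean((dmm - dqm) ** 2))

    # Illustrative use: one molecule with four matched conformers (kcal/mol).
    qm = [0.0, 1.2, 2.5, 0.7]
    mm = [0.1, 1.0, 3.1, 0.5]
    print(dde_rmse(qm, mm))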

Season 2

  • BS – What will be the focus of season 2?

    • DD – Torsiondrives, and comparison with ML potentials like ANI.

  • GT – Is there added value to doing torsiondrives? I’m genuinely curious.

  • GT – I’d be interested in seeing dimer interaction energies, but we don’t have internal datasets on these. I did see a DE Shaw paper that looked at dimer energies on a big dataset, though it’s of questionable value.

    • BS – I like the idea of interaction energies. The benchmarks so far are testing INTRAmolecular behavior, but I think that INTERmolecular interactions would add value.

    • AN – Agree

    • JW – Are there unique datasets that you have internally that would be good for testing?

    • GT – Dimer energies could come straight from QM. (A sketch of the basic bookkeeping appears after this section.)

  • GT – We do have some high-quality datasets internally, like logP, but those will be tricky to benchmark against.

    • JW – I’ll check with Simon about whether we already have the infrastructure to test against logPs.

  • GT – Generally, I’m interested in season 2, and I think torsiondrives will be interesting. I’m going to keep thinking about this.

  • TF – What about IR spectra?

    • JW – I’m not really sure. I’ll ask at the next FF fitting meeting.

    • DH – The FFs aren’t necessarily fit to reproduce IR spectra.
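
If dimer interaction energies do make it into season 2, the basic quantity is the dimer energy minus the energies of the isolated monomers. A minimal sketch of that bookkeeping, assuming single-point energies are already in hand (names and values are illustrative; QM reference values would also need to control basis-set superposition error, e.g. with a counterpoise correction):

    def interaction_energy(e_dimer, e_monomer_a, e_monomer_b):
        # E_int = E(AB) - E(A) - E(B); all energies in the same units.
        # Illustrative helper, not part of any existing benchmark code.
        return e_dimer - e_monomer_a - e_monomer_b

    # Illustrative use (kcal/mol): a weakly bound dimer.
    print(interaction_energy(-150.0, -80.0, -65.0))  # -> -5.0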

OpenFF to-dos

  • DD will push to make the new openff-benchmark release, and add a test for loading 2.0.0-rc.1.

    • JW will review this PR today.

  • DD will make a protocol supplement for running evaluations on 2.0.0-rc.1

  • DH will add a small update to the analysis, removing QM-to-QM analysis (it’s always all 0s).

  • LD will push the change that puts the logic for computing the intersection of methods in the plotting code, rather than assuming that the intersection was already computed in the spreadsheets (this will let us run the analyses of different methods as separate commands in bash; a sketch of this intersection logic appears after this list).

  • DH will ask GT whether Janssen will run OPLS on the public set, or if work will be split with other partners like CS.
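
A minimal sketch of the method-intersection logic from LD’s to-do above, assuming each method’s analysis is exported as a CSV keyed by a conformer identifier (the column and file names are illustrative, not the actual spreadsheet schema):

    from functools import reduce

    import pandas as pd

    def load_method(path):
        # Load one method's results, indexed by conformer identifier.
        # "molecule_id" is an assumed column name, not the real schema.
        return pd.read_csv(path).set_index("molecule_id")

    def intersect_methods(frames):
        # Restrict every method's results to the conformers present in all
        # of them, so plots only compare records shared by every method.
        common = reduce(lambda a, b: a.intersection(b), (f.index for f in frames))
        return [f.loc[common] for f in frames]

    # Illustrative use: each method analyzed by a separate command, then
    # intersected only at plotting time (file names are hypothetical).
    # frames = intersect_methods(
    #     [load_method(p) for p in ["method-a.csv", "method-b.csv"]]
    # )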