2021-05-26 Benchmarking for Industry Partners - Development Meeting notes

Participants

  • @Simon Boothroyd

  • @Lorenzo D'Amore

  • @Joshua Horton

  • @David Hahn

  • @David Dotson

  • @Jeffrey Wagner

Goals

  • Updates from team

  • Needs for partner workshop

  • Active PRs

  • Industry benchmark reroll

Discussion topics


Updates from team

  • LD: Janssen results are in the public drop zone!

    • OPLS outliers, trying to assess what is wrong for these cases

      • 2/1000 molecules

    • Met with DM, DH: decided to do a small benchmark of torsion profiles

    • DM asked whether electron donating groups in aniline were bending the nitrogen (since it will switch between planar and tetrahedral).

    • (Shows a case where an amine group is connected to a phenyl ring substituted with C(N)=C.) In FF optimization, I saw it become planar with all OpenFF versions. Aniline with an ortho electron donating group became planar with OpenFF 1.0-1.2, but tetrahedral (10 degrees out of plane) with 1.3 (this may not have been the case; need to re-check the data). In QM, pure aniline is 15 degrees out of plane, but with an electron donating group it’s 25 degrees. With an electron withdrawing group it becomes more planar.

    • SB – This is cool, let’s talk about this more in the Friday improper meeting.

    • LD – Do you mean testing the planarity in the benchmarking?

    • SB – Want to take common moieties and get an internal-coordinate view, like amine planarity, etc.; when we do the benchmarking we see glaringly obvious deficiencies

      • having internal coord RMSDs for specific chemistries is definitely of interest

      • definitely want to be able to automate this and run it routinely; see the sketch below
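
A minimal sketch of the kind of automated internal-coordinate check discussed here: measuring pyramidalization at aniline-like nitrogens with RDKit. The SDF file name and the SMARTS pattern are illustrative assumptions, not part of openff-benchmark.

```python
from rdkit import Chem
from rdkit.Chem import rdMolTransforms

# Aniline-like nitrogen: trivalent N bearing two hydrogens on an aromatic carbon.
# (Illustrative SMARTS, not an openff-benchmark definition.)
ANILINE_N = Chem.MolFromSmarts("[NX3;H2][c]")

def pyramidalization_deg(mol):
    """360 minus the sum of the three bond angles around each matched nitrogen:
    0 deg is perfectly planar, ~31 deg is fully tetrahedral."""
    conf = mol.GetConformer()
    values = []
    for n_idx, _c_idx in mol.GetSubstructMatches(ANILINE_N):
        nbrs = [a.GetIdx() for a in mol.GetAtomWithIdx(n_idx).GetNeighbors()]
        if len(nbrs) != 3:
            continue
        a, b, c = nbrs
        angle_sum = (rdMolTransforms.GetAngleDeg(conf, a, n_idx, b)
                     + rdMolTransforms.GetAngleDeg(conf, b, n_idx, c)
                     + rdMolTransforms.GetAngleDeg(conf, a, n_idx, c))
        values.append(360.0 - angle_sum)
    return values

# Hypothetical input file of optimized geometries; explicit hydrogens required.
for mol in Chem.SDMolSupplier("optimized.sdf", removeHs=False):
    if mol is not None:
        print(mol.GetProp("_Name"), pyramidalization_deg(mol))
```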

    • LD – Benchmarking with OPLS4 – working with DH on the new Schrodinger branch

  • DH

    • Adapted the Schrodinger PR to work with newer Schrodinger versions. Also updated the documentation, e.g. to better explain the ffbuilder directory usage. I think it’s ready to be merged.

    • LD – Should we tell people that there’s an ff-REbuilder tool available?

    • DH – I’ll document that.

    • DD – How do you specify which OPLS version to use?

      • DH – It silently switches depending on which Schrodinger suite version you have installed. In many cases you can’t use anything other than what your current Schrodinger suite version supports.

      • DD – Thanks, DH. I sent Kaushik the new instructions but haven’t heard from him.

      • DH – I haven’t heard anything either. I emailed him today to say that there’s a new version of the branch available.

    • DD – Should we write a protocol document for this, as a supplement to our previous protocols?

      • DH – Let’s have Kaushik try it before we announce it. I do like the idea of a single place with all the protocol info. I can also ask Gary Tresadern about whether we want this data included in the publication.

      • DD – Sounds good. I can write the protocol document once we’re ready. Even if this data doesn’t get into the upcoming publication, the partners would probably find it valuable.

    • JW: would caution against adding the Schrodinger pathway to production for Season 1

      • we’ve already allocated effort here; don’t want to expand scope

      • if DH and LD can be primary on responding to partners, that could work

      • SB: I think the value-add is high here vs. effort; helps us to see if we’re heading in the wrong direction on an important metric for partners

      • JW: with benchmarking we’re having this discrete Season 1; don’t want to expand scope of this

        • am fine with just making tooling available for folks to run at will, with no time pressure on features and bug fixes

      • SB: still think the value is high here; comparison to Schrodinger is one of the top 2 things partners are interested in

        • if we could suggest that partners run this and tell us how it went, without committing to publishing the results, that would be valuable

      • DH – In terms of licensing, it’s “officially” the same if the benchmarking data leaves the organization

      • DD – This seems like a grey area. I agree with SB that there’s a big value add to making the tools available even if we don’t collect results.

      • DH – Agree. Whether or not this goes into the publication people will find value in it.

      • DD – We should be ready for the situation where partners have tested this on their sets and want to contribute their data

      • JW – I’ll reiterate that we don’t have infrastructure personnel-time to run devops and user support, so this would need to be handled by DH and LD if Season 1 continues.

    • DH – I also found that the openff-benchmark analysis from the industry dataset matched the rankings from the Lim set. This is very nice to see.

      • SB – I also saw the same thing. And this is even better, because one criticism of the original Lim set was that the test and training sets were too similar. So using the industry set somewhat solves this problem.

    • DH will keep following up with Kaushik to get feedback on Schrodinger PR.

    • DH will begin writing a protocol doc for the Schrodinger optimization tool.

  • SB

    • I’ve been playing around with torsiondrive benchmarks, especially with the JACS fragmented set. I’m trying to figure out what metrics would be valuable

      • metric for barrier heights; metric for profile shape

      • RMSD, and also RMSD that is influenced by torsion barrier height.

    • Trying some benchmarking in parallel to Season 1. DD really helped by getting the package updated to use openff-toolkit (>0.9).

    • Parallelization is a concern; need to see if it’s possible to batch-parallelize analyses in particular (e.g. TFD); see the sketch below
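
A minimal sketch of batch-parallelizing a per-molecule analysis across processes; `analyze` and the record list are illustrative placeholders, not openff-benchmark API.

```python
from multiprocessing import Pool

def analyze(record):
    """Placeholder for a per-record metric calculation (e.g. TFD)."""
    return record  # stand-in for real analysis output

if __name__ == "__main__":
    records = list(range(100))  # stand-in for optimization results loaded from disk
    with Pool() as pool:
        # chunksize batches records per worker to amortize inter-process overhead
        results = pool.map(analyze, records, chunksize=8)
```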

    • Rethinking how we use RMSD for different terms could be good. For large molecules, Cartesian RMSD is not a useful metric, so per-term RMSDs (for angles, impropers, etc.) could be better; see the sketch after these bullets

    • RMSDs for specific moieties are of strong interest

      • checkmol – a powerful tool that I use

      • Have a C++/Python port of this written but unpublished. The original author agreed to change the license to be OpenFF-compatible.
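
A minimal sketch of a per-term RMSD as discussed above, assuming the QM and MM structures share atom indexing and each carry one conformer: an RMSD over all bond angles rather than a single Cartesian RMSD.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdMolTransforms

def angle_rmsd_deg(qm, mm):
    """RMSD (degrees) over all i-j-k bond angles of two same-indexed structures."""
    qm_conf, mm_conf = qm.GetConformer(), mm.GetConformer()
    diffs = []
    for atom in qm.GetAtoms():
        j = atom.GetIdx()
        nbrs = [a.GetIdx() for a in atom.GetNeighbors()]
        for x in range(len(nbrs)):
            for y in range(x + 1, len(nbrs)):
                i, k = nbrs[x], nbrs[y]
                diffs.append(rdMolTransforms.GetAngleDeg(qm_conf, i, j, k)
                             - rdMolTransforms.GetAngleDeg(mm_conf, i, j, k))
    if not diffs:  # e.g. a diatomic has no angles
        return 0.0
    return float(np.sqrt(np.mean(np.square(diffs))))
```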

    • DD – Do you see these metrics as being a part of the benchmarking infrastructure, or living somewhere else?

      • SB – I’d envision this living in openff-benchmark. My current goal is to have a single “analyze everything” tool that can consume the output of openff-benchmark's optimizations.

      • DD – Agree. I’d be happy to see openff-benchmark become a library of metrics.

      • SB – Agree. I think this could eventually be fully schema-ified. Then this would be easily compatible with an eventual web visualization frontend.

      • DD – This sounds good. Most of the data flow goes between SDF and OpenFF Molecule objects, and we could replace SDF with OFFMol JSON.

      • SB – Could imagine having a JSON model that defines what a season is: which datasets were used, which metrics/analyses were run and how, etc.; see the sketch below

      • JW – Agree. This plan sounds great.
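
A minimal sketch (field names are assumptions, not an agreed schema) of the kind of season-defining JSON model discussed above, written with pydantic since the OpenFF stack already uses it:

```python
from typing import List
from pydantic import BaseModel

class AnalysisSpec(BaseModel):
    """One metric/analysis that was run, and how."""
    name: str           # e.g. "rmsd", "tfd", "angle-rmsd"
    options: dict = {}  # method-specific settings

class Season(BaseModel):
    """Everything needed to describe and reproduce a benchmarking season."""
    season: str              # e.g. "1"
    datasets: List[str]      # identifiers of the datasets used
    force_fields: List[str]  # e.g. ["openff-1.3.0", "gaff-2.11"]
    analyses: List[AnalysisSpec]

# Illustrative values only.
spec = Season(
    season="1",
    datasets=["industry-public-v1.1"],
    force_fields=["openff-1.3.0", "gaff-2.11"],
    analyses=[AnalysisSpec(name="rmsd"), AnalysisSpec(name="tfd")],
)
print(spec.json(indent=2))
```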

  • JH

    • Largely the same as what Simon’s been working on.

  • JW

    • Nothing new to report

  • DD

    • Worked with JW on the openff-toolkit update PR, and changed the testing logic to be more forward-looking.

    • JW – New QCSubmit release?

      • JH – Planned in “weeks”

      • SB – New QCSubmit release will require a fragmenter conda-forge release (ETA 1-2 weeks based on past experience)

        • AFTER that, expecting another 1-2 week turnaround to get openff-qcsubmit on conda-forge

    • We’d previously heavily pinned dependencies to make the industry benchmark results standard across partner sites. Now we’re removing as many pins as we can, which should help with env flexibility, but it’s possible that this will change results.

    • I noticed that unpinning QCEngine breaks things – the 0.19 version crashes when we try to use it.

    • Working with LD on reproducing Lim dataset. We’re blocked on pulling down old QCA datasets, waiting on BP to finish migration.

    • Adding multiprocessing to server export (off-b PR #82)

    • Putting optimization executor into geopt – Once this is done, I’ll replace openff-benchmark's internal executor with calls to this.

    • Messaged Christina Schindler and got a new public submission from Merck with explicit hydrogens. Will make a new v1.1 industry public dataset that replaces Merck’s old submission with the new one.

    • Industry public dataset is still being computed on QCA – 90% complete.

    • Checking for missing TFDs; still crunching data to check for this.

      • DH – Any results on this?

      • DD – I haven’t reproduced this yet, just need more results.

      • DH – If the molecule doesn’t have any rotatable bonds then we shouldn’t get any TFDs; see the sketch after this discussion

      • Set of questions:

        • if TFD is missing for some methods but not all for a single molecule/conformer, that is unexpected

        • if TFD is missing for some conformers of a single molecule, but not all, that is also unexpected

      • JW: identified BRI-00973 in the Fox set; we see TFD for some conformers, but not all

        • matched_openff-1.3.0

        • matched_gaff-2.11

      • DH: we observe a different reference being used for comparison, which could be a case of different connectivity between the compared conformers; that would produce non-matching SMILES between the conformers

      • DH: seems like RDKit produces canonical SMILES?

        • JW: did you see any of these cases in your own results?

        • LD: let me check; I will investigate these questions on the Janssen set

      • JW: the Fox set might actually be pretty challenging, so not surprising if we’re getting exotic issues

      • DH: so yes, LD, we can probably look at some examples from our set and see whether these are cases of stereochemistry or connectivity changes
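
A hedged sketch of the mechanism DH describes above: RDKit’s TFD is undefined for a molecule without torsions, so the call fails rather than returning a number and the entry ends up missing. The exact exception types caught here are an assumption.

```python
from rdkit.Chem import TorsionFingerprints

def tfd_or_none(ref, probe):
    """TFD between two structures of the same molecule, or None when it
    cannot be computed (e.g. no rotatable bonds, mismatched SMILES)."""
    try:
        return TorsionFingerprints.GetTFDBetweenMolecules(ref, probe)
    except (ValueError, ZeroDivisionError, IndexError):
        return None
```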

  • DH – The case of this that I can look at in the Janssen set has double-bond stereochemistry in a macrocycle. The root cause appears to be that some RDKit-generated conformers of this macrocycle are very distorted, such that the double bond either represents a different stereochemistry even before QM, or the structure is so bad that the QM drives it to change stereo.

    • (General) – This is a to-do item for season 2

    • LD – The QM optimization preserves the stereochemistry, so this is probably a problem with the generated conformer.

    • JW – To solve this, we could re-perceive stereo from 3D immediately after conformer generation and ensure that it matches what we expect; see the sketch below. We’re probably not catching this problem now because we blindly trust the stereochemistry defined in the BONDS section of the SDF, regardless of the 3D coordinates.
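
A minimal sketch of the check JW proposes, assuming the OpenFF Toolkit and RDKit; the helper name and the round-trip route through RDKit are illustrative.

```python
from openff.toolkit.topology import Molecule
from rdkit import Chem

def stereo_consistent_with_3d(mol):
    """True if stereochemistry re-perceived from the 3D coordinates matches
    the molecule's declared stereochemistry. Requires at least one conformer."""
    declared = mol.to_smiles(isomeric=True, mapped=False)
    rd = mol.to_rdkit()                   # conformers carry across
    Chem.AssignStereochemistryFrom3D(rd)  # overwrite stereo from coordinates
    reperceived = Molecule.from_rdkit(rd, allow_undefined_stereo=True)
    return reperceived.to_smiles(isomeric=True, mapped=False) == declared
```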

  • DH: can we ask Thomas whether he checks that the generated conformers (for, say, BRI-00973) have the same stereochemistry

Action items

@David Dotson will announce Janssen submission to all partners
@David Hahn will follow up with Kaushik on Schrodinger testing feedback
@David Hahn will draft protocol doc for Schrodinger optimization execution
@David Dotson will create issue capturing desire for pluggable metric architecture in openff-benchmark, such as per-moiety RMSD; another issue for schemafied data models within openff-benchmark for refactor work
@David Dotson will track down QCEngine 0.19.0 breaking changes with openff-benchmark
@David Dotson will roll Christina’s explicit-hydrogen set into industry benchmark set v1.1
@David Dotson will ask Thomas Fox to check generated conformers for BRI-00973, see if they have the same stereochemistry

Decisions