2021-05-12 Benchmarking for Industry Partners - Development Meeting notes

Participants

 

  • @Lorenzo D'Amore

  • @David Hahn

  • @Jeffrey Wagner

  • @David Dotson

  • @Joshua Horton

Goals

  • Updates from team, spin out action items

Discussion topics

Item

Presenter

Notes

Updates from Team

 

  • LD: Completed production run at Janssen

    • also have data for OPLS3, default and custom parameters

    • now running the analysis, but hitting issues

      • analysis is not taking into account the MM result

      • looking closely at command line

    • finished work on the analysis for Bill and Xavier(!)

      • output is now as expected in both cases

      • need to take notebook code and put it into a PR on openff-benchmark

      • DH: for Bill’s analysis, it would be nice to have a plot of the energy distributions at different levels of RMSD

    • Also ran torsion benchmark on small dataset

      • M05-2X functional

      • openff 1.3.0 overestimates the barrier in many cases

      • up to 3 kcal/mol overestimation

      • JW: considering where this fits in

        • definitely of interest to the fitting effort

        • already effort going on to benchmark QM methods (Pavan and Simon)

        • we probably need to choose a gold-standard QM method/basis set to compare B3LYP against

      • DH: it would probably be good to have a small dataset to see the spread in results for different method/basis set combinations

        • JW: would almost say that if we have two equivalent methods that are similar in cost, would be better if we benchmark against the one we didn’t use for fitting

  • DH: will take care of Katharina’s error

    • otherwise have question from Gary: how far along is the public submission?

      • DD: about 20% complete; bumping to high priority for a week to give it a boost

      • DD: Have 4/10 submissions in shareable results; we now have two Genentech submissions; the expanded set is a superset of their standard submission

  • JW: working with Bill to understand dipole moment discrepancies

    • related to using Molecule.from_smiles instead of Molecule.from_mapped_smiles.

    • It turns out that the order of the atoms in the molecule can affect the partial charges, even if the connection table is the same. This is because RDKit will generate different conformers depending on the atom order. Some conformers will then undergo a proton transfer in AM1 calculations, leading to really bad partial charges. It is impossible to inspect an atom ordering and know whether it is “good” or “bad” without running the whole

    • He also wanted geom opt to use same charges as dipole charge calculations

  • JH: no real change

  • DD:

    • DD: JW and I met with AbbVie. I sent them a “welcome packet” to get them started on the benchmarking package.

    • XL contacted me to ask about the progress of torsiondrive infrastructure. I did about an hour of work last night to get this into a functional state. I’m sending him the install instructions so that he can start experimenting with it.

      • LD – I can start testing this.

      • DD – I’d made a mistake where one method was using a generator instead of a tuple. That bug should be fixed now.

    • Thomas Fox sent us a SMILES string exhibiting the same chemistry as the problem molecule he encountered

      • mentioned that some tfd values missing in results

        • we see missing tfd values for BRI-00015 for gaff-2.11, but not openff-1.3.0

        • not clear how this can happen at this time

        • JW: might be able to reproduce this in the public dataset

          • strange thing is other analyses worked fine on the same molecule e.g. RMSD, DE

        • [decision] try to reproduce tfd missing in public dataset

        • DD: will take on reproducing this on the public dataset; could execute MM at home for a start

        • JW: Will respond to Thomas Fox’s email in some useful way
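
Note on the generator-vs-tuple bug DD mentioned above: the notes do not record the exact failure, so the following is only a minimal sketch of one common way this class of bug shows up in Python, assuming the generator was iterated more than once. All names (`count_twice`, `ids`, the `mol-*` identifiers) are hypothetical and are not taken from openff-benchmark.

```python
def count_twice(items):
    """Iterate over `items` twice and return how many elements each pass saw."""
    first_pass = sum(1 for _ in items)
    second_pass = sum(1 for _ in items)
    return first_pass, second_pass


# Hypothetical identifiers, for illustration only.
ids = ["mol-a", "mol-b", "mol-c"]

as_generator = (i for i in ids)  # single-use: exhausted after the first pass
as_tuple = tuple(ids)            # materialized: can be iterated repeatedly

generator_counts = count_twice(as_generator)
tuple_counts = count_twice(as_tuple)

print("generator:", generator_counts)
print("tuple:    ", tuple_counts)
```

With a generator, the second pass silently sees no elements, so downstream results can go missing without any error being raised; materializing into a tuple avoids that.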

Action items

@David Hahn will address the error Katharina encountered
@Lorenzo D'Amore will create PR against openff-benchmark for analyses from Xavier, Bill
@David Dotson will execute MM on public industry benchmark dataset locally; attempt to reproduce cases of missing tfd
@Jeffrey Wagner will correspond with Thomas Fox on his provided SMILES for problem case

Decisions