2021-08-25 Industry benchmarks meeting notes

Participants

  • @Lorenzo D'Amore

  • @David Hahn

  • @Joshua Horton

  • @David Dotson

Goals

  • Benchmarking Workshop on 9/1

    • Live session

    • Aggregated results to show

    • Public dataset: which analysis / off method

    • Pavan results with Fox's analysis

  • Sage + OPLS coordination

  • Public dataset status and needs

  • Updates from team

Discussion topics

Item

Presenter

Notes

Benchmarking Workshop on 9/1

Lorenzo

  • Working on live session component

    • chemistry from public dataset, example is a torsiondrive that is out of sync (violation)

    • atom indices reported for the violations don’t seem to match what I observe from the SDF in Avogadro

      • JH: looks like atom indices being reordered?

      • DH: perhaps try out PyMOL and see if it gives the same behavior?

        • Did you at some point convert the molecule to SMILES and back? That could reorder the atom indices

      • JH: instead of Molecule.from_smiles use Molecule.from_mapped_smiles

        • DH: looks like you can avoid the conversion to/from SMILES; best to do that if possible

  • Aggregated results to show

    • relative violation analysis with seaborn

  • Executing torsiondrive will take too long during session, so perhaps have a pre-computed one?

    • BRI-00593-00

      • JH: seen some things like this; one moiety turns into a weird triangle from the FF

  • Public results with Fox’s analysis

    • DH: thresholds for torsions must be carefully chosen. For some torsions, however, a violation beyond 30° is impossible given the multiplicity

  • LD: providing notebook + dataset

  • For slide deck:

    • season 1 recap

    • small molecule benchmarking, leading into and motivating live session

    • LD: would it be better to do intro, show some season 1 results in live session?

      • then give a talk on protein-ligand benchmarking at the end?

      • Question:

        • put protein-ligand benchmarking talk before or after live session?

    • proposed schedule:

      • Intro slides:

        • season 1 recap

        • Sage performance

          • protein ligand systems

          • small molecules

        • analyzing problem cases from small molecules

      • live session: remainder of time

    • DH: aiming to have slides done by Monday at the latest

    • DH: do we have any partners attending that didn’t participate in season 1 benchmark?

      • DD: Maybe AbbVie? otherwise wouldn’t be too many

        • perhaps include enough in the season 1 recap so folks aren’t completely lost

  • LD: thinking to show aggregated results from season 1 benchmark, have Sage results from 2 partners (Janssen, Roche) so far

    • should we include this?

    • DD: I’d say yes, can use it as a motivator for additional partners to submit their Sage results!

    • LD: which analyses?

      • have four to choose from

      • thinking of going forward with just one, compare-forcefields, due to speed

      • worried about errors we might hit for these large datasets

        • DD: I’ll stand by for on-the-fly fixes

  • LD: aggregated results from public set?

    • pulled the specs Gary wants for the publication

      • smirnoff, openff-1.3.0, gaff, and openff Sage

  • DH: perhaps just focus on the public dataset; that keeps it fairly simple for the presentation

    • LD: will include mention of Sage and OPLS for internal results to motivate submission
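
One reading of DH's point about torsion thresholds and multiplicity, sketched as arithmetic: if a torsion has periodicity n, equivalent positions repeat every 360/n degrees, so no angle can sit further than 180/n degrees from the nearest one. For n = 6 that bound is 30°, which would make a larger violation threshold unreachable. The function names and the `phase_deg` parameter below are hypothetical, and this is not the analysis code used in the benchmark.

```python
def max_deviation_deg(periodicity: int) -> float:
    """Largest possible distance (degrees) to the nearest equivalent position.

    Assumption: equivalent positions are spaced 360/periodicity degrees apart.
    """
    return 180.0 / periodicity


def deviation_from_nearest_minimum(
    angle_deg: float, periodicity: int, phase_deg: float = 0.0
) -> float:
    """Distance (degrees) from angle_deg to the nearest equivalent position.

    phase_deg is a hypothetical offset locating one of the positions.
    """
    period = 360.0 / periodicity
    offset = (angle_deg - phase_deg) % period
    # The nearest position is either behind (offset) or ahead (period - offset).
    return min(offset, period - offset)


def threshold_is_reachable(threshold_deg: float, periodicity: int) -> bool:
    """True if a deviation at least as large as threshold_deg can occur at all."""
    return threshold_deg <= max_deviation_deg(periodicity)


if __name__ == "__main__":
    for n in (1, 2, 3, 6):
        print(n, max_deviation_deg(n), threshold_is_reachable(40.0, n))
```

Under these assumptions, a 40° threshold could never be exceeded for a sixfold torsion, so a per-torsion threshold scaled by periodicity may be worth considering.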

Action items

@Lorenzo D'Amore will remove to/from SMILES conversion from torsion analysis
@Lorenzo D'Amore and @David Hahn will work to complete slide deck for Monday (8/30)
@Lorenzo D'Amore will work with @David Dotson on interactive workshop components
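
As background to the SMILES action item, a toolkit-free sketch of why JH suggested Molecule.from_mapped_smiles: in a mapped SMILES every atom carries an explicit map index, and that index (not the position in the string) fixes the atom order. The regex and helper names below are hypothetical and only handle simple bracket atoms; the toolkit's own parser does the real work, and avoiding the SMILES round trip altogether, as DH suggested, sidesteps the issue.

```python
import re

# Matches bracket atoms such as [C:1] or [H:3]; captures (element, map index).
# Illustrative only: not a full SMILES parser.
_MAPPED_ATOM = re.compile(r"\[(?:\d+)?([A-Z][a-z]?|[a-z]{1,2})[^\]:]*:(\d+)\]")


def string_order(mapped_smiles: str) -> list[str]:
    """Elements in the order they appear in the SMILES string."""
    return [element for element, _ in _MAPPED_ATOM.findall(mapped_smiles)]


def map_order(mapped_smiles: str) -> list[str]:
    """Elements ordered by their atom-map index (1-based in the string)."""
    atoms = _MAPPED_ATOM.findall(mapped_smiles)
    return [element for element, _ in sorted(atoms, key=lambda a: int(a[1]))]


if __name__ == "__main__":
    # Methanol, written so that string order and map order differ.
    methanol = "[H:3][C:1]([H:4])([H:5])[O:2][H:6]"
    print(string_order(methanol))  # ['H', 'C', 'H', 'H', 'O', 'H']
    print(map_order(methanol))     # ['C', 'O', 'H', 'H', 'H', 'H']
```

If violation indices were computed against one ordering and inspected against another, index 0 would point at a hydrogen in one case and the carbon in the other, which matches the kind of mismatch seen in Avogadro.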

Decisions