
https://github.com/ntBre/benchmarking/blob/0e27189c9334a6aa4176b86cc986d7450d7a9328/misc/triage.py

  • BW – Yeah, I go through QCA IDs and search by SMILES. This would seem to be best as a post-processing step, since otherwise it’d just use a lot of disk space.

  • MT – We could implement tagging by parameter, or searching by SMIRKS. Wasn’t thinking of storing this in database, rather a second location.

  • LM – Yeah, agree this would be best in post processing.

  • JW – Is OFFMol I/O being slow an issue here? I’ve put very little work into optimizing this.

  • BW – My normal thing is to … Takes 45 mins on 9800 mols.

  • LW – 45 minutes is a bit long, 5 minutes wouldn’t be bad

  • (The slow part is Molecule.from_smiles)

  • JW – 5 min for 10k mol is quite slow for toolkits like RDKit. It might be easier to not go through OpenFF

  • BW – I’d tried using RDKit before but there was lots of back-and-forth

  • LW – The slow/hard part was kekulization

  • BW – I use AddHs and sanitize

  • LW (irrelevant side note) – ff.label_molecules(Topology.from_molecules([...])) could be hacked to create a super RDKit mol that could be labelled all at once
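The RDKit-only route discussed above can be sketched as follows. This is illustrative only, not the code BW or LW actually ran; it assumes the slow step is SMILES parsing through the OpenFF layer, and shows deferred sanitization so kekulization failures surface explicitly:

```python
from rdkit import Chem

def mol_from_smiles(smiles: str) -> Chem.Mol:
    """Parse a SMILES string with RDKit directly, skipping the
    OpenFF Molecule layer. Sanitization (which includes
    kekulization by default) is run explicitly so that failures
    surface early with a clear error."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError(f"RDKit could not parse: {smiles}")
    Chem.SanitizeMol(mol)   # aromaticity, kekulization, valence checks
    mol = Chem.AddHs(mol)   # make hydrogens explicit
    return mol
```

For ethanol ("CCO") this yields a 9-atom molecule (2 C, 1 O, 6 H) once hydrogens are explicit.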

  • Torsion-constrained data

    • It’ll be in this repo (for torsion constrained minimization):

  • Feature requests (Presenter: LW)

    • MT – Both discussed in issues that LW opened:

      • Slice benchmark results by parameter

        • MT – Could be handy if you have existing code that does this/shows what you want.

        • LM – So far, I do SMIRKS search of QM dataset to build these filtered sets. BW also had something that slices by parameter.

    https://github.com/lilyminium/openff-strike-team/tree/add-torsions

  • Brent’s code for parameter stuff:

    https://github.com/ntBre/benchmarking/blob/0e27189c9334a6aa4176b86cc986d7450d7a9328/misc/triage.py
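The SMIRKS/SMARTS slicing LM describes could look something like the sketch below. This is a hypothetical helper, not the code in either linked repo; it filters a list of SMILES by an RDKit substructure match as a post-processing step:

```python
from rdkit import Chem

def filter_by_smarts(smiles_list, smarts):
    """Keep only the molecules matching a SMARTS substructure
    pattern -- a simple post-processing filter for slicing a
    benchmark dataset by chemistry.

    Unparseable SMILES are silently skipped; a production
    version would want to log them instead."""
    query = Chem.MolFromSmarts(smarts)
    hits = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is not None and mol.HasSubstructMatch(query):
            hits.append(smi)
    return hits
```

For example, filtering ["CCO", "CC", "c1ccccc1O"] with the hydroxyl pattern "[OX2H]" keeps ethanol and phenol but drops ethane.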

  • LW – I’ll link some code snippets to do this. The first item above would be really handy.

    • MT –

  • LW – Also, a related feature request would be to invoke checkmol for slicing. Checkmol is a tool that groups molecules by functional group.

    • LM – I haven’t used that but it sounds good.
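A hypothetical sketch of how a checkmol-based slice could be wired up. Assumptions to verify against the installed version: that the `checkmol` binary is on PATH, that it takes a molfile path as its argument, and that its default output is one functional-group name per line:

```python
import subprocess
from collections import defaultdict

def parse_checkmol_output(text):
    """Parse checkmol-style output (assumed: one functional-group
    name per line) into a list of group names."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def group_by_functional_group(molfile_paths):
    """Map functional-group name -> list of molfile paths, by
    running checkmol on each file. The CLI invocation is an
    assumption; check `checkmol -h` for the real options."""
    groups = defaultdict(list)
    for path in molfile_paths:
        result = subprocess.run(
            ["checkmol", path], capture_output=True, text=True
        )
        for name in parse_checkmol_output(result.stdout):
            groups[name].append(path)
    return dict(groups)
```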

  • https://openforcefieldgroup.slack.com/archives/C03T3LLVC1J/p1713340639866169?thread_ts=1713285654.391259&cid=C03T3LLVC1J

  • Trello update (Presenter: MT)

    https://trello.com/b/dzvFZnv4/infrastructure?filter=label:benchmarking

  • Lessons learned re: 2.2 release benchmarking (Presenter: JW)

    • JW – Worth re-doing comparisons between versions of the benchmarking code (using the same data)?

      • (This in comparison to re-running old data (like 2.0) whenever the benchmarking code changes)

        • LM – I expect re-running the 2.0 data with the new code (YAMMBS 0.0.7 compared to Simon/Pavan’s scripts) to give pretty much the same results, at least for the DDE numbers, now that QM minima are used for DDE

        • JW – do we have the CSV for the old benchmarking data so we can compare?

        • LW + LM – I believe the data is in the Sage release notes
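Once both CSVs are in hand, the version-to-version comparison could be sketched like this (the column names `id` and `dde` are assumptions; adjust to the actual export format):

```python
import csv
import statistics

def load_ddes(path):
    """Read a CSV of per-record DDEs into {record id: DDE}.
    Assumed columns: 'id' and 'dde' (kcal/mol)."""
    with open(path, newline="") as f:
        return {row["id"]: float(row["dde"]) for row in csv.DictReader(f)}

def compare_ddes(old, new):
    """Summarize per-record differences between two DDE runs,
    restricted to records present in both."""
    shared = old.keys() & new.keys()
    diffs = [new[k] - old[k] for k in shared]
    return {
        "n_shared": len(shared),
        "mean_diff": statistics.mean(diffs) if diffs else 0.0,
        "max_abs_diff": max((abs(d) for d in diffs), default=0.0),
    }
```

For example, comparing {"a": 1.0, "b": 2.0} against {"a": 1.5, "b": 2.0, "c": 9.0} reports 2 shared records, a mean difference of 0.25, and a max absolute difference of 0.5.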

      • JW – And can we check for agreement on SFE benchmarks?

        • MT – 100-200 solute-solvent combos were used previously.

          • Here’s the SFE data used in the 2.0 paper:

            https://github.com/openforcefield/openff-sage/blob/main/data-set-curation/physical-property/benchmarks/data-sets/sage-fsolv-test-v1.csv

          • Phase 1 of validation: MO will use the new code and 10 combos as a validation set for the new code (the stuff I’m contributing to the OpenFE skunkworks).

          • Phase 2 could be rerunning the SFE calcs that were done for a previous FF, and we could compare to that in the future if we wanted.

        • (General) – The convergence criteria may have always been extremely strict, even in industry benchmarking, even in ForceBalance(?)

      • LW – Most of the issues seemed to be around qcsubmit, QCA, and other tools, not much ended up being with YAMMBS.

        • LM – A big problem was with timeouts from QCA. Another big problem was with Dask doing charge checks. But I solved both of those (and can share solutions if needed!)

    ✅ Action items

    •  

    ⤴ Decisions
