Item

Notes

Fitting mols

  • Convert all 1.2.0 release tarball mol2s to SMI, then those to PDF

    • HJ – We wouldn’t have used ALL of the torsions in the structures in our torsiondrives.

      • Could highlight driven torsions using the atom-map indices in each target's entry label, e.g.

          "entry_label": "c1cc[c:1](cc1)[CH2:2][N:3]2[CH2:4]CCCC2=O",

        in fb-fit/targets/td_OpenFF_Gen_2_Torsion_Set_1_Roche_2_020_C12H15NO/metadata.json

    • Will all vibration frequencies be present in optimizations?

      • Unlikely

  • Separate by data/QC job type?

    • Make two pdf and smi files – One with all unique molecules (period), and another with molecules separated by data type, and highlighting driven torsions.

      • Let’s take the low-hanging fruit first (just list all unique SMILES and 2D structures), and get more elaborate if people ask. Scrolling through ~20 pages of highlighted torsions isn’t feasible anyway.

  • Identifier in PDF, which is also attached to SMILES

    • Name? SMILES itself?

    • Try doing SMILES in really small font, and only have three or four rows, so that people can Ctrl-F for their molecule of interest in the PDF.

  • What’s the best way to make benchmarking 3D structures available?

    • The set is hard to wrangle (due to overlap with fitting data), so let’s not worry about 3D structures initially
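
If highlighted torsions are requested later, the atom-map numbers in an entry label like the one quoted above could be turned into atom indices for a depiction tool. The following is a minimal sketch using only the standard library, not the script used for the release. It assumes the label is an ordinary SMILES whose driven atoms carry map numbers 1 to 4, and that the depiction toolkit keeps atoms in SMILES order; the function name `driven_torsion_atoms` is hypothetical.

```python
import re

# Matches one SMILES atom: a bracket atom, a two-letter halogen, or a
# one-letter organic-subset atom (aliphatic or aromatic).
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Captures the atom-map number at the end of a bracket atom, e.g. "[CH2:2]".
MAP_RE = re.compile(r":(\d+)\]$")


def driven_torsion_atoms(mapped_smiles):
    """Return 0-based atom indices (SMILES order), sorted by map number.

    Hypothetical helper: assumes explicit hydrogens, if any, appear only
    inside bracket atoms, so every regex match is one heavy atom.
    """
    by_map = {}
    for index, match in enumerate(ATOM_RE.finditer(mapped_smiles)):
        found = MAP_RE.search(match.group())
        if found:
            by_map[int(found.group(1))] = index
    return tuple(by_map[m] for m in sorted(by_map))


# Entry label quoted in the notes above.
label = "c1cc[c:1](cc1)[CH2:2][N:3]2[CH2:4]CCCC2=O"
torsion = driven_torsion_atoms(label)
print(torsion)
```

The regex tokenizer is a deliberate shortcut. A cheminformatics toolkit would be the more robust way to read the atom maps, at the cost of an extra dependency to record in the provenance.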

Benchmarking mols

  • Just repost Hahn’s molecules

    • Are these the same as in benchmarkff?

      https://github.com/MobleyLab/benchmarkff/blob/74f500241810cdfc99ead0392d8ea0be5e93cb00/molecules/molecule_PDFs/molecules.pdf
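
One way to answer the "are these the same?" question is to compare the two SMILES lists as sets. The sketch below is an illustration under assumptions: it treats the first whitespace-separated field of each line as the SMILES, and it compares strings exactly, so it is only meaningful if both lists were canonicalized by the same toolkit. The function names and the example data are made up.

```python
def read_smiles(lines):
    """Return the set of SMILES found in the first column of .smi-style lines."""
    found = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            found.add(fields[0])
    return found


def compare_sets(first, second):
    """Summarise the overlap between two SMILES collections."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Made-up example data, not taken from either repository.
hahn = read_smiles(["CCO mol_1", "c1ccccc1 mol_2", ""])
benchmarkff = read_smiles(["c1ccccc1 a", "CC(=O)O b"])
report = compare_sets(hahn, benchmarkff)
print(report)
```

With real files, each list could be read with `open(path)` in place of the inline lists.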

Provenance?

  • Record process for converting 1.2.0 release package to SMI/PDF

    • Where? How?

      • Record versions: at minimum, the output of conda env export

      • List steps in the website PR? Paste script?

      • Upload script/smiles/pdf of this set to release assets?

        • No. Those should be immutable.

      • Attach to a new release of one of our repos? New repo?

      • How do we handle stuff like this? It’s always done ad hoc

      • We NEED some simple, consistent dataset guidelines!!! Uncertainty here adds a ton of effort and builds institutional debt!!!

  • Version Hahn’s molecules in case we update dataset?
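
As a starting point for the provenance discussion above, a small manifest could be written next to the SMI/PDF files, recording checksums of inputs and outputs plus the steps that were run. This is a sketch under assumptions, not an agreed convention: the manifest layout, the function names, and the file name `example.smi` are all hypothetical. The output of `conda env export` would be saved alongside it as a separate file.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs, outputs, steps):
    """Collect checksums, the Python version, and the steps that were run."""
    return {
        "python": platform.python_version(),
        "inputs": {Path(p).name: sha256_of(p) for p in inputs},
        "outputs": {Path(p).name: sha256_of(p) for p in outputs},
        "steps": list(steps),
    }


# Demo with a throwaway file standing in for the real SMI output.
with tempfile.TemporaryDirectory() as tmp:
    smi = Path(tmp) / "example.smi"  # hypothetical file name
    smi.write_bytes(b"CCO\n")
    manifest = build_manifest([], [smi], ["mol2 -> SMI", "SMI -> PDF"])

print(json.dumps(manifest, indent=2))
```

Because the manifest stores checksums rather than the files themselves, it can live in a PR description or a new repo without touching the immutable release assets.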

Future plans

  • Make this part of the release checklist in the future?

  • Replacing mol2 with SDF in future FB fits?

    • HJ will look into this

Final table for upload (with links)

Dataset: Training

  • PDF: Add to 1.2.0 release assets and link

  • SMILES: Add to 1.2.0 release assets and link

  • Structures: Point to release tarball

Dataset: 1.2.0 release benchmarking (need HJ to confirm)

  • PDF: https://github.com/openforcefield/release-1-benchmarking/blob/master/QM_molecule_selection/benchmark_set_pdfs/full_benchmark.pdf (need HJ to confirm), or could run the same script on the release-1-benchmarking tarball

  • SMILES: Could run the same script on the release-1-benchmarking tarball

  • Structures: release-1-benchmarking tarball, https://github.com/openforcefield/release-1-benchmarking/releases/tag/v1.0.0

Dataset: This round of benchmarking

  • PDF: Hahn’s PDF (is this the same as the PR PDF? Could link to that)

  • SMILES: Hahn’s SMI (is this the same as the PR SMI? Could link to that)

  • Structures: The SDF from this PR (molecules/set_v02_six_ffs/trim2_full_qcarchive.sdf)? Are the molecules collatable, i.e. is it possible to reconstruct relationships/relative energies? https://github.com/MobleyLab/benchmarkff/pull/14