2023-07-10 Meeting notes

Date

Jul 10, 2023

Participants

@Jeffrey Wagner
@Matt Thompson
@Lily Wang

Discussion topics

Notes

MT – Would love more structure on this - I work best when there’s work chunked out and I can see a lot of little steps accomplishing a goal.
LW – Would be good to start at the planning doc, for the tier 1 MVP. Goals are:
- Running QM benchmarks after pulling down from QCArchive
  - MT – This is done, but not in the sense that it’s polished. There’s some filtering/error handling that’s in the old code that isn’t in the new. In the last step there are 100 plots that are generated (split out for different functional groups).
  - LW – It’d be great to help get the new hires running the QM benchmarks regularly.
  - MT – Big blocker there is pulling down datasets from QCA - There are lots of issues with QCA datasets.
  - JW – use JSON blob and/or SDF in meantime
  - MT – I’m very unhappy about the state of QCA/QC datasets.
  - LW – Starting from serialized inputs could be good if QCA continues being unreliable. Lexie and Brent have learned that using locally-saved files from QCA is a more straightforward process than accessing QCF live.
  - MT – Doesn’t this conflict with the ethos that things need to be runnable by other people?
  - LW – Making this modular means that this will actually be good for scientists.
  - MT – I wonder if there’s an early starting point where there needs to be a structure for the data - Like QCA → (input state), on-disk → (input state), etc.
  - LW – Was hoping that this could be the same model as QCSubmit datasets - Basically a format defined by downloading from QCArchive.
  - MT – QCSubmit models are probably a good place to start for dataset storage.
  - (..JW walks through code changes, whether QCF datasets could replace QCS datasets…)
  - MT – Don’t get rid of QCSubmit datasets - QCF datasets don’t store OFFMols, they only store QCEl Molecules which aren’t suitable for our purposes.
- LW – Would you be up for this workflow plan, where we try to get a full benchmarking run going for Lexie and Brent, circumventing QCA pulldowns?
- MT – Possible steps:
  - Define which dataset we want to use - Either by dataset name or otherwise
    - Starting points:
      - "OpenFF Full Optimization Benchmark 1"
      - "OpenFF Industry Benchmark Season 1 v1.1"
      - "OpenFF-benchmark-ligand-fragments-v1.0"
  - Re-jigger code so that the QCSubmit classes are the core objects (hopefully easy, but may be tricky).
    - MT – LW, how do you extend/append datasets?
    - LW – I hack the dataset by manually modifying the list of dataset.entries[client.address], but I’m almost certain that’s not kosher
    - JW – Neither am I
  - Make a clean way to extend/combine/filter datasets.
  - Test workflow using data from disk.
- LW – Would it be helpful or a hindrance to get LM and BW involved in the feedback cycle at this point?
  - MT – Probably not both, though it may be helpful to get one of them in this meeting in the future. Once I have the points above done they could be a good audience. Late this week may be too early. But mid-next week could be a good time.
  - LW – Could I invite them to this meeting next week as an intro to the project?
  - MT – That sounds good.
  - LW – LM is working on conformer selection. BW is working on torsion multiplicities and has expressed an interest in software engineering, so he may make his own plugins.
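
The dataset steps MT listed above (a common input state whether data comes from QCA or from disk, a clean way to extend/combine/filter, and a test run from disk) could be sketched roughly as follows. This is only an illustration using the Python standard library: `Entry`, `LocalDataset`, and all of their fields and methods are hypothetical names invented for this sketch, not QCSubmit or QCFractal classes, and the energies are dummy placeholder values. The real implementation would presumably build on the QCSubmit dataset models discussed above.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one molecule's QM result."""
    record_id: str
    smiles: str
    final_energy: Optional[float]  # None marks a failed or incomplete record


@dataclass
class LocalDataset:
    """Hypothetical in-memory stand-in for a QCSubmit-style dataset."""
    name: str
    entries: List[Entry] = field(default_factory=list)

    def to_file(self, path: Path) -> None:
        # asdict() recurses into the list of Entry dataclasses.
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def from_file(cls, path: Path) -> "LocalDataset":
        raw = json.loads(path.read_text())
        return cls(name=raw["name"], entries=[Entry(**e) for e in raw["entries"]])

    def filter(self, keep: Callable[[Entry], bool]) -> "LocalDataset":
        # Return a new dataset; the original is left untouched.
        return LocalDataset(self.name, [e for e in self.entries if keep(e)])

    def combine(self, other: "LocalDataset", name: str) -> "LocalDataset":
        # Keep the first entry seen for each record_id, preserving order.
        seen = {}
        for entry in [*self.entries, *other.entries]:
            seen.setdefault(entry.record_id, entry)
        return LocalDataset(name, list(seen.values()))


# Dummy data: the energies are placeholders, not real QM results.
a = LocalDataset("set-a", [Entry("1", "CCO", -1.0), Entry("2", "CC", None)])
b = LocalDataset("set-b", [Entry("2", "CC", None), Entry("3", "C", -2.0)])

merged = a.combine(b, name="merged")
complete = merged.filter(lambda e: e.final_energy is not None)

# Round-trip through disk, mimicking a run that never touches QCA.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "merged.json"
    complete.to_file(path)
    reloaded = LocalDataset.from_file(path)
```

Returning new objects from `filter` and `combine`, rather than editing an entries list in place, is one possible way to avoid the manual hack LW described; whether that fits the QCSubmit classes is an open question.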

LW – JH has been working separately on his own benchmarking and has some tools that may be useful here. He wanted to contribute to this framework if possible. It’s some analyses from the bespokefit paper that may be useful for other people.
- MT – Can we see the code?
- LW – I’ll ask him for the link. He’ll be in the benchmarking channel.
- MT – I’m broadly optimistic about what he’s done, looking forward to seeing it.
