Participants

Goals

  • Updates from project team members

  • Identify and address development issues encountered

  • Aim for merge of existing PRs; demonstrate and coordinate connective tissue between workflow components

  • Identify and address project risks

Discussion topics

Item

Notes

Updates from team

  • JW: Conformers prototyped; can give demo

    • working with dataset from Thomas Fox as a test case; does not go in smoothly

    • of 600 molecules, 8 stereochemistry issues; RDKit is a bit sloppy with stereochemistry issues

  • DD:

    • JW: should state the units in the SDF file for energies

  • JH: deployment procedure document; gathering information from PRs as they develop

    • trying to use each component to give feedback

  • DH: Pushed analysis components, would like consolidation on read/write functions, seasons

    • For each method, separate SDF, with exactly same molecule; have to be able to relate molecules of the same ID across results

    • Use pandas DataFrames primarily

    • For CLI, separate analysis from plots/report generation

      • analysis artifact produced must be shareable

    • Could still output SDFs with relative metrics to QM included?

      • JW: may not want to do this, since it depends on another SDF somewhere in the results

      • DD: let’s give it a play and see if there is a way to make this less confusing in the output

    • Still need unit conversion; dependent on units being present in input SDFs to this component

    • DD: I’ll review today, push to merge
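As a rough illustration of the two points above (relating molecules of the same ID across per-method results, and converting units before comparing to QM), here is a minimal sketch. It uses plain dictionaries instead of pandas DataFrames only to stay dependency-free. The method names, molecule IDs, unit labels, energies, and function names are all hypothetical and are not taken from the actual analysis component.

```python
# Conversion factors to kcal/mol. The unit labels are hypothetical values
# that an energy-unit tag in an input SDF might carry.
TO_KCAL_PER_MOL = {
    "kcal/mol": 1.0,
    "kJ/mol": 1.0 / 4.184,   # thermochemical calorie
    "hartree": 627.509474,
}


def to_kcal_per_mol(value, unit):
    """Convert an energy to kcal/mol, failing loudly if the unit is unknown."""
    try:
        return value * TO_KCAL_PER_MOL[unit]
    except KeyError:
        raise ValueError(f"unknown or missing energy unit: {unit!r}") from None


def relative_to_reference(results, reference="qm"):
    """Map {method: {mol_id: (energy, unit)}} to {method: {mol_id: delta}}.

    Only molecule IDs present in both the method and the reference are kept;
    deltas are (method - reference) in kcal/mol.
    """
    ref = results[reference]
    deltas = {}
    for method, energies in results.items():
        if method == reference:
            continue
        shared = energies.keys() & ref.keys()
        deltas[method] = {
            mol_id: to_kcal_per_mol(*energies[mol_id]) - to_kcal_per_mol(*ref[mol_id])
            for mol_id in sorted(shared)
        }
    return deltas


# Made-up data: one molecule is shared, one appears in each set only.
results = {
    "qm": {"MOL-00001-00": (-10.0, "kcal/mol"), "MOL-00002-00": (-5.0, "kcal/mol")},
    "openff": {"MOL-00001-00": (-41.84, "kJ/mol"), "MOL-00003-00": (1.0, "kJ/mol")},
}
deltas = relative_to_reference(results)
```

Because the comparison is keyed on molecule ID, a molecule missing from either result set simply drops out of the comparison, and an input without a recognizable unit raises an error rather than being compared silently.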

Dataset object?

  • Do we want to consolidate on a Dataset object (a bundle of OpenFF molecules pulled from SDFs, which can also export SDFs)?

    • could do slicing based on ID components, would make analysis easier at the end; may add value at other components

  • For now, we’ll proceed with merging each PR, then search for places we can consolidate read/write, handling of OFF mols
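To make the "slicing based on ID components" idea concrete, here is a minimal sketch of what such a Dataset object could look like. No design was settled in the meeting; the class names, the `select` method, and the `<group>-<molecule>-<conformer>` ID format are assumptions made for illustration, and the `molecule` field is only a placeholder for an OpenFF molecule loaded from an SDF.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    mol_id: str       # hypothetical format: "<group>-<molecule>-<conformer>"
    molecule: object  # placeholder for an OpenFF molecule read from an SDF

    @property
    def id_parts(self):
        """Split the ID into its named components."""
        group, molecule, conformer = self.mol_id.split("-")
        return {"group": group, "molecule": molecule, "conformer": conformer}


@dataclass
class Dataset:
    records: list = field(default_factory=list)

    def select(self, **parts):
        """Return a new Dataset whose records match every given ID component."""
        keep = [
            r for r in self.records
            if all(r.id_parts[key] == value for key, value in parts.items())
        ]
        return Dataset(keep)

    def __len__(self):
        return len(self.records)


ds = Dataset([
    Record("ABC-00001-00", None),
    Record("ABC-00001-01", None),
    Record("XYZ-00002-00", None),
])
subset = ds.select(group="ABC", molecule="00001")  # both conformers of one molecule
```

Keeping the slicing on the object means each component could ask for "all conformers of molecule X" the same way, rather than re-parsing IDs in every read/write function.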

How to handle errored cases?

  • JW: 2 kinds:

    • partway through, can throw a message

    • failure with no output

  • Could add to e.g. the validate command:

    • @click.option('--error-directory',
                    default='1-errors')

      @click.option('--error-out',
                    default='1-errors.out')

  • DH: what do we do with “undefined stereochemistry”?

    • JW: these are opinionated parts of the toolkit, which may change/improve, but those improvements are tied to the release cycle of the toolkit, so a bit slower; will just have to use this experience to spin out issues for improvement

    • [decision] since warnings are loud, we’ll squelch warnings; errors still get raised

    • can make clear that in the validation step, some percentage are expected to be excluded (>1%)

      • the minimization step may also have some percentage of failures (>5%)

      • could also say that overall, up to 10% of your dataset may not make it end-to-end

  • [decision]: we’ll make a Slack channel for support; allows us to operate with low friction, loop in folks as needed for help understanding weird cases

    • need to make clear that the channel is public, and that error messages should be posted with care
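The error-handling decisions above could be sketched roughly as follows: warnings are silenced, while each molecule's error is still caught and written to an error file so the failed fraction can be reported. This is an illustration under assumptions, not the project's implementation. The `validate` and `run_step` functions and the molecule IDs are hypothetical, and recording errors per molecule instead of stopping the whole run is one possible reading of "errors still get raised".

```python
import tempfile
import warnings
from pathlib import Path


def validate(mol_id):
    """Hypothetical stand-in for a per-molecule validation step."""
    warnings.warn(f"{mol_id}: noisy toolkit warning")
    if mol_id.endswith("bad"):
        raise ValueError("undefined stereochemistry")
    return mol_id


def run_step(mol_ids, error_out):
    """Run validate() on each ID; write failures to error_out, one per line."""
    passed, failed = [], []
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # squelch warnings only
        for mol_id in mol_ids:
            try:
                passed.append(validate(mol_id))
            except Exception as exc:     # errors are still surfaced and recorded
                failed.append((mol_id, f"{type(exc).__name__}: {exc}"))
    Path(error_out).write_text(
        "".join(f"{mol_id}\t{message}\n" for mol_id, message in failed)
    )
    return passed, failed


with tempfile.TemporaryDirectory() as tmp:
    error_out = Path(tmp) / "1-errors.out"
    passed, failed = run_step(["mol-1", "mol-2-bad", "mol-3"], error_out)
    error_text = error_out.read_text()

# Fraction of the input that did not make it through this step.
failed_fraction = len(failed) / (len(passed) + len(failed))
```

Reporting `failed_fraction` at the end of each step would give users a direct comparison against the expected exclusion rates noted above. Since the support channel is public, anything written to the error file should be reviewed before it is posted.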

Structures from the PDB

  • Public submission of 6000 conformers

    • DH: will share on GDrive as a tarball for consumption by Jeff, DD

Basis set choice

  • DD: what are the goalposts for choice for basis?

  • JH: fast and accurate (big grin)

  • DH: Lim paper uses the default (DZVP) basis

    • [decision] DZVP will be our basis for this season

    • JH: Using DZVP will produce good results for OpenFF since it’s fit to this; GAFF wasn’t, so technically not as “fair”; we do want to start from this point for evaluating OpenFF though, since that’s the goal

    • DD: could make it fairly easy to inject compute specs for the curious

Action items

Decisions