Discussion topics

Item

Notes

Intro to benchmark

  • CH – Is there a region of chemical space that’s of particular interest?

    • DD – Diversity is basically our big priority. But generally we want people to run about 1000 molecules.

    • DM – We’re also happy to get contributions of molecules that pharma partners/you CAN release, that we can do the benchmarking on in public.

Starting to participate

  • NS – What level of calculations are being used for these molecules?

  • DD – Main challenge for most people is setting up the compute infrastructure for the QM.

    • CH – AbbVie is multi-site, we have some common compute facilities. But other sites may have local compute. We have access to NCSA at U Illinois, and we’ve since been split off to a separate queue.

    • NS – At my center, we have workstations with several GPUs, 40-50 CPUs. But we also have access to NCSA.

    • CH – It’ll be easier to run locally. Working with NCSA will be a bit more complex since we’ll need admins to get involved on software deployment/validation.

    • DD – Are local boxes connected to a queueing system, or would it be SSHing into workstations?

    • CH – SSHing, we don’t have a queueing system for local workstations any more.

    • NS – That’s also the situation at my site.

    • NS – We’ll also need to figure out which dataset we’ll use for calculations.

    • CH – Thinking about the composition of the set, is there chemistry that OpenFF particularly wants? Like more/less nitros?

    • DM – We have a lot of nitros in our public sets.

    • CH – What about ortho-substituted rings like biphenyl ethers?

    • (General) – We want molecules that are “relevant”, even if they’re really hard for FFs.

  • CH – To get a view of timing, how long would something like toluene take?

  • NS – Are there molecules to avoid?

    • Molecules with iodine (issue with QM)

    • Molecules with silicon or boron (not supported by FF)

    • Things where you don’t want RDKit to generate conformers (tricky macrocycles) – Can also just pass in 10 conformers of such macrocycles, so that RDKit doesn’t generate any

  • DD – Issues with conda-installing software on NCSA cluster?

    • CH – May not be difficult, I’ve just never tried it

    • DD – We support two installation routes in user-space

      • Installing from anaconda.org package repo

      • installing from whole-environment single-file-installer tarball

    • CH – I should be able to do either one technically, but it’s a security policy thing.

      • DD – 40-50 cores on a local machine is probably insufficient for a large set, so it may be good to start getting a single-file installer reviewed by IT.

      • CH – We could also do it on AWS; that may be easier policy-wise

      • (General) – We’ll get more information on how AWS compute is put together.

    • DD – There are a few different ways to set up the distributed compute. The best one for AWS may be to set up a small QCFractal server and plan for short-lived managers.

  • CH – Sounds good. I’ll work on assessing the feasibility of these options

  • NS – What’s the timeline for this first round of benchmarking?

    • (OpenFF) – We’re not sure what the final deadline will be, but we can let you know as more partners return results.

    • CH + NS – Great. We’ll work on getting this started, and will expect to hear from you about hard deadlines.

  • CH + NS were added to the benchmarks-support Slack channel

  • DD added CH + NS to email list.
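
As a rough illustration of the "molecules to avoid" guidance above, the following sketch pre-screens a list of SMILES strings for iodine, silicon, and boron before a set is submitted. It is not part of the benchmarking protocol discussed in the meeting: the function names (`elements_in_smiles`, `split_supported`) are hypothetical, and the regex-based atom scan is a simplification that assumes well-formed SMILES. A real workflow would more likely parse molecules with a cheminformatics toolkit such as RDKit.

```python
import re

# Elements flagged in the discussion: iodine (issue with QM),
# silicon and boron (not supported by the force field).
EXCLUDED_ELEMENTS = {"I", "Si", "B"}

# An atom is either a bracket atom ([...]) or a bare organic-subset atom.
# Two-letter symbols (Cl, Br) are listed first so "Br" is never read as boron.
_ATOM_TOKEN = re.compile(r"\[([^\]]+)\]|(Cl|Br|[BCNOPSFI]|[bcnops])")

# Inside brackets: optional isotope digits, then the element symbol
# (uppercase form, or lowercase aromatic form such as "n" or "se").
_BRACKET_SYMBOL = re.compile(r"^\d*([A-Z][a-z]?|[a-z]{1,2})")


def elements_in_smiles(smiles):
    """Return the set of element symbols that appear in a SMILES string."""
    found = set()
    for bracket, bare in _ATOM_TOKEN.findall(smiles):
        if bracket:
            match = _BRACKET_SYMBOL.match(bracket)
            if match:
                found.add(match.group(1).capitalize())
        else:
            found.add(bare.capitalize())
    return found


def split_supported(smiles_list):
    """Split SMILES into (kept, rejected).

    kept is a list of SMILES with no excluded elements; rejected maps each
    excluded SMILES to the sorted list of offending elements.
    """
    kept, rejected = [], {}
    for smi in smiles_list:
        bad = elements_in_smiles(smi) & EXCLUDED_ELEMENTS
        if bad:
            rejected[smi] = sorted(bad)
        else:
            kept.append(smi)
    return kept, rejected


if __name__ == "__main__":
    candidates = [
        "Cc1ccccc1",        # toluene
        "Brc1ccccc1",       # bromobenzene (bromine, not boron)
        "Ic1ccccc1",        # iodobenzene
        "C[Si](C)(C)C",     # tetramethylsilane
        "OB(O)c1ccccc1",    # phenylboronic acid
    ]
    kept, rejected = split_supported(candidates)
    print("kept:", kept)
    print("rejected:", rejected)
```

Macrocycles and other molecules where RDKit conformer generation is undesirable are not detected by this screen; per the notes, those can be handled by supplying pre-generated conformers instead.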

Action items

  •  David Dotson will send consolidated instructions to Nour and Charles on first steps, in particular running the protocol on the burn-in set

Decisions