
Item

Notes

Intro to benchmark

CH – Is there a region of chemical space that’s of particular interest?
- DD – Diversity is our main priority. Beyond that, we generally want people to run about 1000 molecules.
- DM – We’re also happy to receive molecules that pharma partners (or you) CAN release, so that we can do the benchmarking on them in public.
  - DD – More info on public compound sets here: Call for Public Compounds

Starting to participate

NS – What level of calculations are being used for these molecules?
- (DD walked through the “Scope” section of Geometry Optimization Benchmarking for Industry Partners.)
DD – The main challenge for most people is setting up the compute infrastructure for the QM calculations.
- CH – AbbVie is multi-site, we have some common compute facilities. But other sites may have local compute. We have access to NCSA at U Illinois, and we’ve since been split off to a separate queue.
- NS – At my center, we have workstations with several GPUs, 40-50 CPUs. But we also have access to NCSA.
- CH – It’ll be easier to run locally. Working with NCSA will be a bit more complex since we’ll need admins to get involved on software deployment/validation.
- DD – Are local boxes connected to a queueing system, or would it be SSHing into workstations?
- CH – SSHing; we don’t have a queueing system for local workstations anymore.
- NS – That’s also the situation at my site.
- NS – We’ll also need to figure out which dataset we’ll use for calculations.
- CH – Thinking about the composition of the set, is there chemistry that OpenFF particularly wants? For example, more or fewer nitro groups?
- DM – We have a lot of nitros in our public sets.
- CH – What about ortho-substituted rings, like biphenyl ethers?
- (General) – We want molecules that are “relevant”, even if they’re really hard for FFs.
CH – To get a view of timing, how long would something like toluene take?
- DD – It varies a lot by compute resources.
- DM – We can extrapolate from the burn-in set results.
- DD – I’ll send the burn-in set to NS and CH.
  - https://openforcefield.atlassian.net/wiki/spaces/PS/pages/873922575/Deployment+Procedure#Installation
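DM’s suggestion to extrapolate from the burn-in set can be made concrete with a back-of-the-envelope calculation. The sketch below assumes every optimization takes roughly the mean burn-in time, that optimizations are independent, and that a fixed number run concurrently. The function name and every number in the usage lines are hypothetical, not measured results.

```python
import math
from statistics import mean


def estimate_wall_hours(burnin_hours_per_opt, n_molecules,
                        conformers_per_molecule, concurrent_jobs):
    """Naive linear extrapolation of total wall-clock hours from burn-in timings.

    Assumes each optimization takes the mean burn-in time and that jobs run
    in full "waves" of `concurrent_jobs` at a time.
    """
    per_opt = mean(burnin_hours_per_opt)
    n_opts = n_molecules * conformers_per_molecule
    # Number of rounds needed when only `concurrent_jobs` run at once.
    waves = math.ceil(n_opts / concurrent_jobs)
    return waves * per_opt


# Made-up burn-in timings (hours per optimization), 1000 molecules,
# and a hypothetical 10 conformers per molecule.
burnin = [1.5, 2.0, 2.5]
print(estimate_wall_hours(burnin, 1000, 10, concurrent_jobs=6))    # 3334.0
print(estimate_wall_hours(burnin, 1000, 10, concurrent_jobs=100))  # 200.0
```

With six concurrent jobs (for instance a 48-core workstation at eight cores per job, again hypothetical), the naive estimate runs to months of wall time. The same arithmetic underlies the later point that 40-50 local cores are probably insufficient for a large set.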
NS – Are there molecules to avoid?
- Molecules with iodine (issue with QM)
- Silicon and boron (not supported by the FF)
- Molecules where you don’t want RDKit to generate conformers (e.g. tricky macrocycles). You can also pass in 10 pre-generated conformers of such macrocycles, so that RDKit doesn’t generate any.
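The element part of that list can be applied as a simple pre-screen before submission. Below is a minimal dependency-free sketch, assuming the candidate set is available as SMILES strings. The names `EXCLUDED_ELEMENTS`, `elements_in_smiles`, and `screen` are hypothetical, and the regex-based parser is a simplification: in practice RDKit or the OpenFF Toolkit would parse the molecules properly. It does not try to detect macrocycles, which still need a manual decision or pre-supplied conformers.

```python
import re

# Elements flagged in the notes: iodine (QM issues), silicon and boron
# (not supported by the force field).
EXCLUDED_ELEMENTS = {"I", "Si", "B"}

# Bracket atom, e.g. [Si], [nH], [13C@@H]: optional isotope, then a symbol
# (uppercase + optional lowercase, or lowercase aromatic such as "n" or "se").
_BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2}|\*)")
# Organic-subset atoms written outside brackets; two-letter symbols are tried
# first so that "Br" is not read as boron.
_ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")
_BRACKET_SPAN = re.compile(r"\[[^\]]*\]")


def elements_in_smiles(smiles):
    """Return the set of element symbols in a SMILES string (simplified)."""
    elements = set()
    for match in _BRACKET_ATOM.finditer(smiles):
        symbol = match.group(1)
        if symbol != "*":  # "*" is a wildcard atom, not an element
            elements.add(symbol.capitalize())
    # Drop bracket atoms so charges, H counts and chirality tags inside them
    # are not re-read as organic-subset atoms.
    outside = _BRACKET_SPAN.sub("", smiles)
    for token in _ORGANIC_ATOM.findall(outside):
        elements.add(token.capitalize())  # aromatic "c" -> "C", etc.
    return elements


def screen(smiles_list):
    """Split SMILES into (kept, rejected) using the excluded-element list."""
    kept, rejected = [], []
    for smi in smiles_list:
        if elements_in_smiles(smi) & EXCLUDED_ELEMENTS:
            rejected.append(smi)
        else:
            kept.append(smi)
    return kept, rejected


kept, rejected = screen(["Cc1ccccc1", "Ic1ccccc1", "C[Si](C)(C)C",
                         "OB(O)c1ccccc1", "Brc1ccccc1"])
print(kept)      # ['Cc1ccccc1', 'Brc1ccccc1']
print(rejected)  # ['Ic1ccccc1', 'C[Si](C)(C)C', 'OB(O)c1ccccc1']
```

Note that bromobenzene is kept: matching two-letter symbols first prevents "Br" from being mistaken for boron.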
DD – Any issues with conda-installing software on the NCSA cluster?
- CH – It may not be difficult; I’ve just never tried it.
- DD – We support two installation routes in user-space
  - Installing from anaconda.org package repo
  - Installing from a whole-environment, single-file installer tarball
- CH – I should technically be able to do either one, but it comes down to security policy.
  - DD – 40-50 cores on a local machine is probably insufficient for a large set, so it may be good to start getting a single-file installer reviewed by IT.
  - CH – We could also do it on AWS; that may be easier policy-wise
  - (General) – We’ll get more information on how AWS compute is put together.
- DD – There are a few different ways to set up the distributed compute. For AWS, the best option may be to set up a small QCFractal server and plan for short-lived managers.
CH – Sounds good. I’ll work on assessing the feasibility of these options.
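The server-plus-short-lived-managers layout DD describes can be illustrated with a toy model in plain Python. This is not QCFractal’s API: the class and function names are hypothetical. The sketch only assumes the property that matters for ephemeral AWS workers, namely that the server rather than the worker owns the task list, so work claimed by a manager that disappears can be reset and handed to a new one.

```python
from collections import deque


class ToyTaskServer:
    """Toy stand-in for a central server that owns the task queue.

    Not the QCFractal API: only the bookkeeping that makes short-lived
    managers safe (claimed-but-unfinished work can be handed out again).
    """

    def __init__(self, tasks):
        self.waiting = deque(tasks)
        self.running = {}    # task -> name of the manager that claimed it
        self.complete = {}   # task -> result

    def claim(self, manager, n):
        """Hand up to n waiting tasks to a manager and record the claim."""
        claimed = []
        while self.waiting and len(claimed) < n:
            task = self.waiting.popleft()
            self.running[task] = manager
            claimed.append(task)
        return claimed

    def finish(self, task, result):
        del self.running[task]
        self.complete[task] = result

    def reset_lost(self, manager):
        """Put tasks held by a manager that disappeared back on the queue."""
        lost = [t for t, m in self.running.items() if m == manager]
        for task in lost:
            del self.running[task]
            self.waiting.append(task)
        return lost


def run_manager(server, name, batch, budget):
    """Claim `batch` tasks, finish at most `budget`, then exit (e.g. instance reclaimed)."""
    for task in server.claim(name, batch)[:budget]:
        # Placeholder for running a QM geometry optimization.
        server.finish(task, f"optimized {task}")


server = ToyTaskServer([f"mol-{i}" for i in range(10)])
run_manager(server, "A", batch=4, budget=2)    # A finishes 2 of its 4, then vanishes
print(server.reset_lost("A"))                   # ['mol-2', 'mol-3']
run_manager(server, "B", batch=10, budget=10)  # a fresh manager picks up the rest
print(len(server.complete), len(server.running), len(server.waiting))  # 10 0 0
```

Because no state lives on the managers, losing one costs only the time spent on its unfinished tasks, which is what makes cheap, interruptible instances usable.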
NS – What’s the timeline for this first round of benchmarking?
- (OpenFF) – We’re not sure what the final deadline will be, but we can let you know as more partners return results.
- CH + NS – Great. We’ll work on getting this started, and will expect to hear from you about hard deadlines.
CH + NS were added to the benchmarks-support Slack channel.
DD added CH + NS to the email list.

Action items

David Dotson will send Nour and Charles consolidated instructions on first steps, in particular running the protocol on the burn-in set.
