Sage RC2 Benchmarking Protocol - Season 1

This document details the execution procedure for adding Sage RC2 (openff-2.0.0-rc.2) optimizations to datasets produced with the Optimization Benchmarking Protocol - Season 1.

All steps assume the deployed conda environment is active, e.g. with:

conda activate openff-benchmark-optimization

4. Optimization execution

Using your chosen approach from Optimization Benchmarking Protocol - Season 1 | 4. Optimization execution, run the MM optimizations with the "1:4" season specification.

Multi-node approaches

A. Multi-node, persistent server, long-lived manager

To submit and execute the MM stage using the QM stage data as input:

openff-benchmark optimize submit-molecules --fractal-uri "myserver.corp.local:7777" --dataset-name "Benchmark Optimization S1 - MM-SageRC2" --season "1:4" 4-compute-qm/b3lyp-d3bj/dzvp/

You can export the data from this dataset with:

openff-benchmark optimize export --fractal-uri "myserver.corp.local:7777" --dataset-name "Benchmark Optimization S1 - MM-SageRC2" -o 4-compute-mm
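As a quick sanity check after exporting, you can count the SDF files that landed in the output directory. The sketch below is self-contained (it simulates an export directory with hypothetical filenames); in practice you would point `find` at `4-compute-mm` from the export command above:

```shell
# Sketch: simulate an export directory and count its SDF files.
# In practice, replace "${outdir}" with 4-compute-mm from the export step.
outdir=$(mktemp -d)
touch "${outdir}/CCC-00001-00.sdf" "${outdir}/CCC-00002-00.sdf"
find "${outdir}" -name '*.sdf' | wc -l    # prints 2 for this simulated directory
rm -r "${outdir}"
```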

B. Multi-node, persistent server, short-lived managers

Same approach as (A) above.

C. Multi-node, no server/manager setup

If your cluster uses e.g. SLURM, create a submission script like the following, called submit_molecules.sh:

#! /usr/bin/bash
#SBATCH --partition=workstations
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

## EXAMPLE USAGE
#sbatch --cpus-per-task 4 -J <JOBNAME> -o <stdoutfile.o> -e <stderrfile.e> submit_molecules.sh --memory 24 --season "1:1" -o 4-compute-qm 3-coverage_report/XXX-00000-00.sdf

source $HOME/.bashrc
conda activate openff-benchmark-optimization

openff-benchmark optimize execute --nthreads ${SLURM_CPUS_PER_TASK} $@

For the MM stage, using 1 thread and 6 GiB of memory per job:

for conformer in 4-compute-qm/b3lyp-d3bj/dzvp/*.sdf
do
    sbatch --cpus-per-task 1 \
        -J $(basename ${conformer} ".sdf") \
        -o $(basename ${conformer} ".sdf").o \
        -e $(basename ${conformer} ".sdf").e \
        submit_molecules.sh --memory 6 --season "1:4" -o 4-compute-mm ${conformer}
done

You could batch multiple files into each submission with e.g. xargs:

echo 4-compute-qm/b3lyp-d3bj/dzvp/*.sdf | xargs -n 10 sbatch --cpus-per-task 1 submit_molecules.sh --memory 6 --season "1:4" -o 4-compute-mm
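To see how `xargs -n` forms the batches, here is a stand-in sketch that uses `echo` in place of `sbatch` (the filenames are illustrative):

```shell
# Stand-in: print the argument batches xargs -n 3 would hand to sbatch,
# substituting echo for the real submission command.
printf '%s\n' mol-01.sdf mol-02.sdf mol-03.sdf mol-04.sdf mol-05.sdf mol-06.sdf mol-07.sdf \
    | xargs -n 3 echo BATCH:
# BATCH: mol-01.sdf mol-02.sdf mol-03.sdf
# BATCH: mol-04.sdf mol-05.sdf mol-06.sdf
# BATCH: mol-07.sdf
```

With `-n 10` as above, each sbatch job therefore receives up to 10 SDF files as arguments.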

D. Single-node, persistent server

Same approach as (A) above.

E. Single-node, no server/manager setup

To execute the MM stage using the QM stage data as input:

openff-benchmark optimize execute --season "1:4" --nthreads 8 --memory 24 -o 4-compute-mm 4-compute-qm/b3lyp-d3bj/dzvp/

Because thread parallelism may not scale as well as simply running many optimizations concurrently on your machine, consider running multiple execute commands at once, giving each a batch of files to work on. You can do this in one shot with xargs. For example, for the MM step, we might choose 1 thread per command and 8 concurrent commands:

echo 4-compute-qm/b3lyp-d3bj/dzvp/*.sdf | xargs -n 10 -P 8 openff-benchmark optimize execute --season "1:4" --nthreads 1 --memory 3 -o 4-compute-mm
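The `-P 8` flag is what makes xargs keep up to 8 commands running concurrently. A self-contained stand-in (using `sh -c` with a short sleep in place of the optimize command) shows the mechanics:

```shell
# Stand-in: process 4 "inputs" with at most 2 concurrent workers;
# each worker sleeps briefly, then reports its input.
# Note: completion order may vary because the workers run in parallel.
printf '%s\n' job1 job2 job3 job4 \
    | xargs -n 1 -P 2 sh -c 'sleep 0.1; echo "done: $0"'
```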
