Sage RC2 Benchmarking Protocol - Season 1
This document details the execution procedure for adding Sage RC2 (openff-2.0.0-rc.2) optimizations to datasets produced with the Optimization Benchmarking Protocol - Season 1.
All steps assume the deployed conda environment is active, e.g. with:
conda activate openff-benchmark-optimization
4. Optimization execution
Using your chosen approach for executing optimizations in Optimization Benchmarking Protocol - Season 1 | 4. Optimization execution, you will run MM optimizations using the 1:4 season specification.
Multi-node approaches
A. Multi-node, persistent server, long-lived manager
To submit and execute the MM stage using the QM stage data as input:
openff-benchmark optimize submit-molecules --fractal-uri "myserver.corp.local:7777" --dataset-name "Benchmark Optimization S1 - MM-SageRC2" --season "1:4" 4-compute-qm/b3lyp-d3bj/dzvp/
You can export the data from this dataset with:
openff-benchmark optimize export --fractal-uri "myserver.corp.local:7777" --dataset-name "Benchmark Optimization S1 - MM-SageRC2" -o 4-compute-mm
B. Multi-node, persistent server, short-lived managers
Same approach as (A) above.
C. Multi-node, no server/manager setup
If your cluster uses e.g. SLURM, create a submission script like the following, called submit_molecules.sh:
#!/usr/bin/bash
#SBATCH --partition=workstations
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
## EXAMPLE USAGE
#sbatch --cpus-per-task 4 -J <JOBNAME> -o <stdoutfile.o> -e <stderrfile.e> submit_molecules.sh --memory 24 --season "1:1" -o 4-compute-qm 3-coverage_report/XXX-00000-00.sdf
source $HOME/.bashrc
conda activate openff-benchmark-optimization
openff-benchmark optimize execute --nthreads ${SLURM_CPUS_PER_TASK} "$@"
For the MM stage, submit each conformer file as its own job, with 1 thread and 6 GiB of memory each:
for conformer in 4-compute-qm/b3lyp-d3bj/dzvp/*.sdf
do
sbatch --cpus-per-task 1 -J $(basename ${conformer} ".sdf") -o $(basename ${conformer} ".sdf").o -e $(basename ${conformer} ".sdf").e submit_molecules.sh --memory 6 --season "1:4" -o 4-compute-mm ${conformer}
done
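Before submitting for real, it can help to dry-run the loop and inspect the sbatch commands it will generate. The sketch below prints each command instead of running it; the `dryrun/` directory and `MOL-00001-*.sdf` file names are mock stand-ins for your actual QM output files.

```shell
# Dry-run sketch: create a couple of mock conformer files, then print
# the sbatch command each iteration would submit (nothing is submitted).
mkdir -p dryrun && touch dryrun/MOL-00001-00.sdf dryrun/MOL-00001-01.sdf

for conformer in dryrun/*.sdf
do
    name=$(basename ${conformer} ".sdf")
    echo sbatch --cpus-per-task 1 -J ${name} -o ${name}.o -e ${name}.e \
        submit_molecules.sh --memory 6 --season "1:4" -o 4-compute-mm ${conformer}
done
```

Once the printed job names, log file names, and paths look right, drop the `echo` and point the loop at `4-compute-qm/b3lyp-d3bj/dzvp/*.sdf`.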
You could batch multiple files into each submission with e.g. xargs:
echo 4-compute-qm/b3lyp-d3bj/dzvp/*.sdf | xargs -n 10 sbatch --cpus-per-task 1 submit_molecules.sh --memory 6 --season "1:4" -o 4-compute-mm
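The effect of `xargs -n 10` here is to invoke `sbatch` once per batch of at most 10 files, rather than once per file. The sketch below makes that batching visible by substituting a small `echo`-based counter for `sbatch`; the `conf-*.sdf` names are mock stand-ins.

```shell
# Mock 23 input files, then show how `xargs -n 10` groups them:
# each printed line corresponds to one would-be submission and its file count
# (expect batches of 10, 10, and 3).
files=$(printf 'conf-%02d.sdf ' $(seq 1 23))
echo ${files} | xargs -n 10 sh -c 'echo "one submission with $# files:" "$@"' batch
```

Each batch becomes a single SLURM job working through its files serially, so larger `-n` values mean fewer, longer jobs.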
D. Single-node, persistent server
Same approach as (A) above.
E. Single-node, no server/manager setup
To execute the MM stage using the QM stage data as input:
openff-benchmark optimize execute --season "1:4" --nthreads 8 --memory 24 -o 4-compute-mm 4-compute-qm/b3lyp-d3bj/dzvp/
Because thread-parallelism may not scale as well as simply performing many optimizations at once on your machine, consider running multiple execute commands simultaneously, giving each a batch of files to work on. You can do this in one shot with xargs. For example, for the MM step, we might choose 1 thread per command, and 8 commands:
echo 4-compute-qm/b3lyp-d3bj/dzvp/*.sdf | xargs -n 10 -P 8 openff-benchmark optimize execute --season "1:4" --nthreads 1 --memory 3 -o 4-compute-mm
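Here `-n 10` again sets the batch size, while `-P 8` keeps up to 8 of those batched commands running concurrently. The sketch below replaces the optimize command with a one-second `sleep` so the parallelism is observable; all file names are mock stand-ins.

```shell
# 16 mock inputs, batches of 2, up to 8 concurrent workers: with full
# parallelism the 8 one-second "jobs" finish in roughly 1s, not 8s.
start=$(date +%s)
printf 'conf-%02d.sdf\n' $(seq 1 16) | xargs -n 2 -P 8 sh -c 'sleep 1; echo "batch of $# done"' batch
echo "elapsed: $(( $(date +%s) - start ))s"
```

A reasonable rule of thumb is to size `-P` to your core count when using `--nthreads 1`, and to check that `-P` multiplied by the per-command `--memory` value fits within your machine's RAM (here, 8 × 3 GiB = 24 GiB).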