This document is designed to help you set up an OpenFF QM+MM benchmark using your own set of molecules.

If you experience any issues during any stage of the procedure, please raise all issues here. You can also receive support on the Open Force Field Slack channel benchmarks-partnerssupport. NOTE: all discussions are public, so please do not disclose any molecule information you wish to remain private.


Installation using conda

To keep things simple, conda environments have been created that are fully self-consistent and contain, at a minimum, all of the required dependencies to run the calculations and generate the final analysis.

...

Create a new conda environment on any machines that will hold your local QCFractal server or run compute managers using the command:

Code Block
conda env create openforcefield/openff-benchmark-optimization

...

Activate the environment using:

Code Block
conda activate openff-benchmark-optimization

It is recommended to use this conda package, as any update to the workflow will be distributed via an update to this environment.

...

Whenever we need to update any of the software with bug fixes or improvements, we will update the conda environment above. To install these updates, we recommend that you remove the current working environment with conda env remove -n openff-benchmark-optimization and install again using the instructions above.
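The remove-and-reinstall cycle described above can be wrapped in a small script. This is a minimal sketch, not part of the official workflow: the function names run and update_benchmark_env and the DRY_RUN variable are hypothetical helpers introduced here, while the two conda commands are the ones given in this document.

```shell
#!/usr/bin/env bash
# Sketch: refresh the benchmark environment by removing and recreating it.
# The helper names and DRY_RUN are assumptions made for illustration.

ENV_NAME="openff-benchmark-optimization"

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

update_benchmark_env() {
    # Remove the existing environment, then recreate it from Anaconda Cloud.
    # The second command only runs if the first one succeeds.
    run conda env remove -n "$ENV_NAME" &&
    run conda env create "openforcefield/$ENV_NAME"
}
```

Running DRY_RUN=1 update_benchmark_env prints the two commands without touching any environment, which is a cheap way to check the script before running update_benchmark_env for real.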


Installation using a single-file installer

If your compute environment cannot reach Anaconda Cloud, you can install the conda environment via a single-file installer.

  1. Navigate to the Releases page for openff-benchmark.

  2. For the most recent release in the list, click on the Assets dropdown, and download the ubuntu-latest_py3.7.sh.zip file. This is ~1GB, so the download may take a moment.

  3. Upload the file to your compute environment/cluster.

  4. Unzip with unzip ubuntu-latest_py3.7.sh.zip.

  5. Run the installer with: bash openff-benchmark-*/openff-benchmark-*.sh -b -p openff-benchmark-miniconda, substituting the name of the unzipped directory and the install script.

  6. Add the conda executable to your environment with source openff-benchmark-miniconda/etc/profile.d/conda.sh

  7. Activate the base environment with conda activate. With this installer, the base environment takes the place of openff-benchmark-optimization, the environment name you will see used elsewhere.

To make it easy to load this environment at will, we recommend you create an openff-benchmark-optimization.sh file with the following contents:

Code Block
source openff-benchmark-miniconda/etc/profile.d/conda.sh
conda activate

Software Updates

If you used the single-file installer approach, you will need to remove or rename the openff-benchmark-miniconda directory created from installation and repeat the install instructions above.

Compute Approaches

Multiple compute options have been created to allow users to generate the QM and MM optimizations required for the benchmark. The main difference between these is the use of a long-running, self-hosted QCFractal server instance to store and collect all data during the benchmark.

...

If you have chosen a compute option that requires a self-hosted QCFractal server instance, follow the instructions below.

Setting up a QCFractal Server

In order to be able to use a local QCFractal server instance, you will need to be able to host the instance on a machine with a port (e.g. 7777) that is network-accessible from either the head node or compute nodes of your cluster. In the case where you do not have access to an HPC cluster and are using a single machine to host the server and perform compute, all network communication will occur in-host via localhost.
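Before starting managers, it can help to confirm that the server port is reachable from the node where they will run. The following is a minimal sketch, assuming bash with /dev/tcp support and the coreutils timeout command are available. The function name port_is_open is hypothetical, and 7777 is only the example port mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections from this machine.
# port_is_open is a hypothetical helper name, not part of QCFractal.

port_is_open() {
    local host="$1" port="$2"
    # Bash's /dev/tcp pseudo-device opens a TCP connection on redirect.
    # A child bash process is used so the descriptor closes when it exits,
    # and timeout guards against ports that silently drop packets.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, port_is_open localhost 7777 && echo "reachable" reports success only when something is listening on that port. It does not verify that the listener is a QCFractal server.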

...

There are several benefits to using a self-hosted server:

  1. There is a long-running database of results which can be queried at any time during execution for progress.

  2. QCFractal will automatically handle the parallelization and distribution of all tasks between connected managers.

  3. Error reporting is more verbose as it is captured into the database on failed tasks.

Drawbacks:

  1. There is a long-running service with an open network connection on the machine/head node.

Server/Manager setup and error cycling

Once you have a conda environment installed, you need to start up the server.

...

Server setup

Initialize your server instance with:

Code Block
qcfractal-server init

See qcfractal-server init --help for additional options. This will by default place the server data in ~/.qca/qcfractal.

Start up the server instance with:

Code Block
qcfractal-server start

See qcfractal-server start --help for additional options.

Manager setup

We will submit molecules and computation requests to the server, but the computations themselves must be performed by one or more QCFractal managers that can communicate with that server.

Generally, you can start a manager with:

Code Block
qcfractal-manager -v --verify False --config-file <config.yaml>

The content of your manager’s config.yaml, as well as whether you will launch it as a long-running process or as part of a batch job, depends on your cluster setup and the compute option you have chosen to use on it.
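As an illustration only, the sketch below writes a small config.yaml for the single-machine case, where the server and the manager share one host. The key names (common, adapter, tasks_per_worker, cores_per_worker, memory_per_worker, max_workers, server, fractal_uri) are recalled from the QCFractal manager documentation and should be treated as assumptions to check against your installed version. All values are placeholders, and write_manager_config is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal manager config for a single machine.
# Key names are assumptions to verify against the QCFractal docs;
# every value below is a placeholder to adapt to your hardware.

write_manager_config() {
    local path="${1:-config.yaml}"
    cat > "$path" <<'EOF'
common:
  adapter: pool
  tasks_per_worker: 1
  cores_per_worker: 4
  memory_per_worker: 8
  max_workers: 1
server:
  fractal_uri: localhost:7777
EOF
}
```

Calling write_manager_config config.yaml produces a file that can be passed to the qcfractal-manager command shown above. Cluster setups that submit workers through a batch scheduler will need additional, scheduler-specific sections that are not shown here.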

Troubleshooting

  • Where is the local QCArchive database stored?

    • You can find the database in a path like ~/.qca/qcfractal/ on the machine where you initialized the server.

  • Why do I see stereochemistry issues in the initial step in the workflow?

    • All file reading and writing is done via the Open Force Field Toolkit, with many cheminformatics operations powered by RDKit. When we read in a molecule, we check with RDKit whether the molecule has any missing stereochemistry information or other common formatting issues. We recommend using 3D SDF files where possible so that all stereochemistry is fully defined.

  • Missing molecules?

    • In each stage of the workflow, a small fraction of molecules will encounter issues with the various programs in the pipeline. When this happens, a warning is shown, and the molecule is moved to an error subdirectory of the output folder.

    • During the analysis, you may notice that some molecules are missing or that the number of conformers of a molecule is not as expected. During optimization, we sometimes see proton transfer, and the resulting optimized geometry is actually a tautomer of the input. In this case, the optimization is excluded from the analysis. This should not affect the results.

...