This document is designed to help you set up an OpenFF QM benchmark using your own set of molecules. If you experience any issues during any stage of the procedure, please raise them here. You can also receive support in the #benchmarks-partners channel on the Open Force Field Slack.
NOTE: all discussions are public, so please do not disclose any molecule information you wish to keep private.
Installation
To keep things simple, we provide conda environments that are fully self-consistent and contain, at a minimum, all of the dependencies required to run the calculations and generate the final analysis.
Create a new conda environment on every machine that will host your local QCArchive server or run compute managers, using the command:
conda env create openforcefield/openff-benchmark-optimization
Activate the environment using:
conda activate openff-benchmark-optimization
Any update to the workflow will be distributed via an update to this environment.
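To confirm that the environment was created, a small check like the one below may help. It is a sketch that relies only on the standard output of `conda env list`; the helper name `env_exists` is hypothetical and is not part of the benchmark tooling.

```shell
#!/usr/bin/env bash
ENV_NAME="openff-benchmark-optimization"

# Hypothetical helper: succeed if a conda environment with exactly this name exists.
# `conda env list` prints comment lines starting with "#", then one
# environment per line as "name [*] path"; we keep only the name column.
env_exists() {
  conda env list | awk '!/^#/ {print $1}' | grep -qx -- "$1"
}

if ! command -v conda >/dev/null 2>&1; then
  echo "conda is not on PATH" >&2
elif env_exists "$ENV_NAME"; then
  echo "$ENV_NAME is installed"
else
  echo "$ENV_NAME not found; create it with conda env create" >&2
fi
```

Matching the whole name with `grep -x` avoids false positives from environments whose names merely start with the same prefix.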
Software Updates
Whenever we need to update any of the software with bug fixes or improvements, we will update the conda environment above. To install these updates, we recommend removing the current environment with
conda env remove -n openff-benchmark-optimization
and installing it again using the instructions above.
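The remove-and-reinstall cycle can be wrapped in a small function so it is repeated the same way each time an update is announced. This is a minimal sketch; `reinstall_env` is a hypothetical helper name, and it only chains the conda commands already given in this document.

```shell
#!/usr/bin/env bash
ENV_NAME="openff-benchmark-optimization"

# Hypothetical helper: remove and recreate the benchmark environment.
reinstall_env() {
  # Load conda's shell functions so `conda activate`/`deactivate` work in a script.
  eval "$(conda shell.bash hook)"
  # conda refuses to remove the currently active environment, so leave it first.
  if [ "${CONDA_DEFAULT_ENV:-}" = "$ENV_NAME" ]; then
    conda deactivate
  fi
  # Stop at the first failing step instead of activating a half-built environment.
  conda env remove -n "$ENV_NAME" &&
    conda env create openforcefield/openff-benchmark-optimization &&
    conda activate "$ENV_NAME"
}
```

Source the file and call `reinstall_env` from an interactive shell so that the final `conda activate` takes effect in that shell.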
Compute Approaches
To accommodate the different needs of the participating parties, we provide multiple compute procedures, each of which follows one of our well-defined compute pathways to produce the final analysis of your molecules. The main difference between them is whether a long-running local QCArchive instance is used to store and collect all data during the benchmark.
Choosing Your Approach
To use a persistent local QCArchive instance, you will need to host it on a machine or head node with an open port through which it can talk to compute managers over your internal network. If you do not have access to an HPC cluster and are using a single machine to host both the server and the manager, they will communicate via localhost.
Benefits
A long-running database of results can be queried for progress at any time during execution.
QCArchive will automatically handle the parallelization and distribution of all tasks between connected managers.
Error reporting is more verbose, as errors from failed tasks are captured in the database.
Drawbacks
A long-running service with an open network connection must run on the machine/head node.
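Before connecting compute managers, it can be useful to check from a compute node that the server's port is reachable over your internal network. The sketch below uses bash's built-in `/dev/tcp` redirection. The host name is a placeholder, and port 7777 is an assumption (QCFractal's default); substitute whatever host and port your server actually uses. `port_open` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 2 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

SERVER_HOST="${SERVER_HOST:-localhost}"  # placeholder: your head node's hostname
SERVER_PORT="${SERVER_PORT:-7777}"       # assumption: QCFractal's default port

if port_open "$SERVER_HOST" "$SERVER_PORT"; then
  echo "Server reachable at ${SERVER_HOST}:${SERVER_PORT}"
else
  echo "Cannot reach ${SERVER_HOST}:${SERVER_PORT}" >&2
fi
```

A refused connection fails immediately, while a firewalled port is cut off by `timeout`, so both cases report the port as unreachable without hanging.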
Server/Manager setup and error cycling
Once you have installed the conda environment, you need to start the server.
Start the server together with the error-cycling machinery, which periodically checks all running tasks in the archive and restarts any that fail with an unexpected error, using the following command:
Troubleshooting
Where is the local QCArchive database stored?
You can find the database in a path like .qca/qcfractal/ on the machine where you started the server.
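If you are unsure where the database ended up, a search like the one below may help. It assumes the .qca/qcfractal/ layout described above and looks for it under the directory you pass, defaulting to your home directory; `find_qca_db` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any .qca/qcfractal directories found under a
# base directory (default: $HOME). Unreadable subdirectories are skipped.
find_qca_db() {
  find "${1:-$HOME}" -maxdepth 3 -type d -path '*/.qca/qcfractal' 2>/dev/null || true
}
```

For example, `find_qca_db` searches your home directory, and `find_qca_db /scratch/$USER` searches another location.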
Why do I see stereochemistry issues in the initial step in the workflow?
All file reading and writing is done via the Open Force Field Toolkit, which is powered by RDKit. When a molecule is read in, we use RDKit's definitions to check whether it has any undefined stereochemistry. We recommend using 3D SDF files where possible so that all stereochemistry is fully defined.
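To check an input file for undefined stereochemistry before starting the workflow, you can load it with the OpenFF Toolkit directly. This is a sketch assuming the active environment provides `openff.toolkit` and that `Molecule.from_file` raises `UndefinedStereochemistryError` by default when stereochemistry is undefined. The exception is matched by name because its import path differs between toolkit versions, and `check_stereo` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an SDF file loads without
# undefined stereochemistry (assumes the benchmark environment is active).
check_stereo() {
  python - "$1" <<'EOF'
import sys
from openff.toolkit.topology import Molecule

path = sys.argv[1]
try:
    # With the default allow_undefined_stereo=False, undefined stereo raises.
    Molecule.from_file(path)
except Exception as err:
    if type(err).__name__ == "UndefinedStereochemistryError":
        print(f"{path}: undefined stereochemistry")
        sys.exit(1)
    raise
print(f"{path}: OK")
EOF
}
```

For example, `check_stereo my_molecule.sdf` prints `OK` or exits non-zero, so it can be looped over a directory of inputs.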
Missing molecules?
During the workflow, several things can go wrong, mainly involving undefined stereochemistry. When this happens, a warning is shown and the molecule is moved to an error directory so it can be tracked.
During the analysis, you may notice that some molecules are missing or that the number of conformers of a molecule is not what you expected. During optimization we sometimes see proton transfer, so that the optimized geometry is actually a tautomer of the input. In this case, the optimization is excluded from the analysis. This should not affect the results.
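To see which molecules were set aside in the error directory mentioned above, you can list its contents. The directory's name depends on your run, so `ERROR_DIR` below is a hypothetical variable you set yourself, and `list_errored` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
ERROR_DIR="${ERROR_DIR:-}"  # hypothetical: path to the error directory from your run

# Hypothetical helper: list every file under a directory, sorted by path.
list_errored() {
  find "$1" -type f | sort
}

if [ -n "$ERROR_DIR" ] && [ -d "$ERROR_DIR" ]; then
  list_errored "$ERROR_DIR"
  echo "$(list_errored "$ERROR_DIR" | wc -l) file(s) in $ERROR_DIR"
fi
```

Comparing this list with your input set is a quick way to tell molecules rejected at the start of the workflow apart from those dropped later during the analysis.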