Discovery
Workflow components
Each workflow component from the diagram above is numbered below.
Options for software components are indicated for each.
Separating the required workflow components allows development activity to proceed in parallel. The DEV label on each workflow component indicates, qualitatively, how much development it requires.
DEV will require the most development, and so should be prioritized.
DEV may require some development.
DEV has a well-known and heavily-used software pathway.
Identifier assignment DEV
new, include in benchmarking library
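Since identifier assignment is new, a minimal sketch may help frame what the benchmarking library would need to provide. The identifier format (`GROUP-00000` for molecules, with a two-digit conformer suffix) and both function names are assumptions made for illustration only; the actual scheme is still to be decided.

```python
from itertools import count


def assign_identifiers(molecule_keys, group="GRP", start=0):
    """Map each unique molecule key (e.g. a SMILES string) to an identifier.

    The 'GROUP-00000' format is a hypothetical choice, not a settled scheme.
    """
    identifiers = {}
    counter = count(start)
    for key in molecule_keys:
        if key not in identifiers:  # duplicates keep their first identifier
            identifiers[key] = f"{group}-{next(counter):05d}"
    return identifiers


def conformer_identifier(molecule_id, conformer_index):
    """Extend a molecule identifier with a (hypothetical) conformer suffix."""
    return f"{molecule_id}-{conformer_index:02d}"


ids = assign_identifiers(["CCO", "CC", "CCO"], group="ABC")
print(ids)
print(conformer_identifier(ids["CC"], 3))
```

Keeping the molecule and conformer parts separate means a conformer can always be traced back to its parent molecule in later steps.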
Conformer generation (~10 conformers per molecule) DEV
openff-toolkit{rdkit}
Parameterization of molecules DEV
openff-toolkit{rdkit}
FF coverage report DEV
Reach out to Trevor, Jessica, Pavan for existing implementations
QCSubmit can give a list of all parameters used; doesn’t do counts currently, but could be made to
we’ll want counts, as this is richer information and allows us to prioritize coverage gaps
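As a rough illustration of why counts are richer than a flat parameter list, the sketch below tallies parameter usage across molecules and ranks parameters from least to most used. It assumes the applied parameter ids have already been collected per molecule by some other tool; the input structure, the function names, and the parameter ids shown are placeholders, not an existing QCSubmit interface.

```python
from collections import Counter


def coverage_counts(params_by_molecule):
    """Tally parameter usage.

    params_by_molecule: {molecule_id: iterable of parameter ids applied},
    one entry per application (so an id may repeat within a molecule).
    Returns (total applications, number of molecules using each id).
    """
    total = Counter()
    n_molecules = Counter()
    for params in params_by_molecule.values():
        params = list(params)
        total.update(params)             # every application counts
        n_molecules.update(set(params))  # each molecule counts once
    return total, n_molecules


def rank_by_coverage(all_parameter_ids, total):
    """Order parameter ids from least to most exercised (ties by id)."""
    return sorted(all_parameter_ids, key=lambda p: (total[p], p))


# Placeholder ids; a real report would use the force field's own ids.
labels = {"mol-a": ["b1", "b1", "a2"], "mol-b": ["b1", "t3"]}
total, n_molecules = coverage_counts(labels)
print(rank_by_coverage(["b1", "a2", "t3", "t9"], total))
```

Parameters that never appear sort to the front of the ranking, which is where coverage gaps would be prioritized.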
Energy minimization with Psi4 (QM), OpenMM (MM) DEV
multiple options
QCSubmit->QCFractal(->QCEngine->GeomeTRIC->QCEngine->Psi4/OpenMM)
this path allows easier extension to torsiondrives; not directly possible without significant development work with other paths
orchestration mostly solved in this path compared to the others
QCEngine->GeomeTRIC->QCEngine->Psi4/OpenMM
GeomeTRIC->QCEngine->Psi4/OpenMM
each option requires different considerations for deployment on queueing systems
paths that are simpler in terms of components may require additional development for deployment
Analysis and report generation DEV
can use components from benchmarkff; need to extract and fold into benchmarking library
no matter the approach chosen for optimizations in (5), we will need extraction tooling for flat file output, reports
Available software components for implementation
encoder of OpenFF's preferences for dataset submissions to QCArchive
no compute on its own; requires use of QCFractal if part of workflow
important to ensure CMILES metadata in place to allow seamless MM calculations
client+worker+server for executing and storing procedures, such as optimizations
perhaps not strictly necessary, but may still be easiest path
complex solution may present failure modes that we have a hard time pinning down
QCEngine {vital}
features wrapper procedure to GeomeTRIC taking as input QCElemental.OptimizationInput
no need for QCFractal
not certain of value-add vs. GeomeTRIC directly, unless simplifies input
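To make the input question concrete, here is a sketch of the dictionary-form optimization input that the QCEngine wrapper around GeomeTRIC accepts. The helper name, the method and basis values, and the approximate water coordinates are illustrative assumptions; the actual call is left commented out because it needs QCEngine, GeomeTRIC, and the chosen engine installed.

```python
def build_optimization_input(symbols, geometry, method, basis=None, program="psi4"):
    """Assemble an optimization input dict (hypothetical helper).

    geometry is a flat list of Cartesian coordinates in Bohr,
    three values per atom, in the same order as symbols.
    """
    if len(geometry) != 3 * len(symbols):
        raise ValueError("geometry needs three coordinates per atom")
    model = {"method": method}
    if basis is not None:
        model["basis"] = basis
    return {
        # which engine supplies gradients to the optimizer
        "keywords": {"program": program},
        "input_specification": {"driver": "gradient", "model": model},
        "initial_molecule": {"symbols": list(symbols), "geometry": list(geometry)},
    }


# Approximate water geometry, for illustration only.
opt_input = build_optimization_input(
    ["O", "H", "H"],
    [0.0, 0.0, 0.0, 0.0, 1.43, 1.11, 0.0, -1.43, 1.11],
    method="b3lyp",
    basis="dzvp",
)

# With the dependencies installed, the optimization would be run as:
# import qcengine
# result = qcengine.compute_procedure(opt_input, "geometric")
```

Building the input in one place like this would let the same structure feed either a QCFractal-backed path or direct local execution.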
GeomeTRIC {vital}
optimization protocol
can use QCEngine internally to optimize using gradients from a variety of programs (engines)
evaluation analyses high value
not currently installable as a package; only scripts/notebooks
dependent on OpenEye Toolkit
will likely pull functionality out and create infrastructure home in openff-benchmark
openff-toolkit {vital}
required for parameterization of molecules for OpenFF forcefields
required for GAFF, but also usable as abstraction layer for OpenFF forcefields, others
used in QCEngine for OpenMM execution
nouveau functionality for working with QCArchive data; utility functions in service to Trevor Gokey's research and work
possible to pull some prototype functionality we don't have in an infrastructure package
Restricted components
OpenEye Toolkit
cannot use for this purpose; must not be necessary for any part of the workflow
Packaging Options
openff-benchmark
Library components and entry points can be placed in openff.benchmark.geometry_optimizations.
openff-cli
Could introduce an entrypoint in this package for distribution. (optional, and for later)
Proposal
Interface
A command-line interface executable from any shell is preferable.
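One way such an interface might be laid out is a single executable with one subcommand per workflow step. The sketch below uses only the standard library; the program name, subcommand names, and options are all hypothetical and chosen to mirror the workflow components, not an agreed design.

```python
import argparse

# Hypothetical subcommands, one per workflow component.
STEPS = [
    ("assign-ids", "assign identifiers to input molecules"),
    ("generate-conformers", "generate conformers for each molecule"),
    ("parameterize", "apply force field parameters"),
    ("coverage-report", "report force field parameter coverage"),
    ("optimize", "run QM/MM energy minimizations"),
    ("report", "analyze results and generate reports"),
]


def build_parser():
    parser = argparse.ArgumentParser(prog="openff-benchmark")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in STEPS:
        step = subparsers.add_parser(name, help=help_text)
        step.add_argument("input", help="input file or directory")
        step.add_argument("-o", "--output-dir", default=".",
                          help="where to write results")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"would run step '{args.command}' on {args.input}")
```

A hypothetical invocation would then look like `openff-benchmark coverage-report molecules.sdf -o reports`, with each step reading the previous step's output directory.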
Identifier assignment
Conformer generation
Parameterization of molecules
Forcefield coverage report
Remaining questions
Should reports be aggregated? How feasible is it to back-calculate a molecular structure from the parameters used to parameterize it?
Energy minimization with Psi4 (QM), OpenMM (MM)
Proposing a three-pronged approach.
High-throughput (primary)
QCSubmit->QCFractal(->QCEngine->GeomeTRIC->QCEngine->Psi4/OpenMM)
output extraction executable at any time for pulling available data
need error cycling process
High-throughput debug approach (secondary)
Trevor's local optimization executor
add this to QCSubmit; generally usable for OpenFF QCArchive users in debugging
components shared with (3)
GeomeTRIC->QCEngine->Psi4/OpenMM
output still usable for reporting
Fully-local execution (alternative)
Like Horton's local TorsionDrive script, minus QCFractal execution if possible
components shared with (2)
GeomeTRIC->QCEngine->Psi4/OpenMM
output still usable for reporting
In principle, (2) and (3) could be served via the same entrypoint.
(1) would make use of QCFractal with a persistent server to handle most of the compute orchestration.
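The error cycling process needed for approach (1) could take roughly the following shape: on each pass, restart errored records until a per-record budget is exhausted, then flag the rest for debugging via (2) or (3). This is a sketch under stated assumptions: the status strings, the `restart` callable, and the restart budget are placeholders and do not correspond to a specific QCFractal client API.

```python
def error_cycle_pass(statuses, restart_counts, restart, max_restarts=3):
    """Run one error-cycling pass.

    statuses: {record_id: status string}; "ERROR" is an assumed marker.
    restart_counts: {record_id: restarts so far}, updated in place.
    restart: callable that resubmits a single record (placeholder).
    Returns the ids that have exhausted their restart budget.
    """
    abandoned = []
    for record_id, status in statuses.items():
        if status != "ERROR":
            continue  # only errored records need attention
        if restart_counts.get(record_id, 0) >= max_restarts:
            abandoned.append(record_id)  # hand off for local debugging
            continue
        restart(record_id)
        restart_counts[record_id] = restart_counts.get(record_id, 0) + 1
    return abandoned


restarted = []
counts = {"rec-b": 3}
gave_up = error_cycle_pass(
    {"rec-a": "ERROR", "rec-b": "ERROR", "rec-c": "COMPLETE"},
    counts,
    restarted.append,  # stand-in for a real resubmission call
)
print(restarted, gave_up)
```

Records returned as abandoned are natural candidates for the debug path, since their inputs can be rerun locally through the shared GeomeTRIC->QCEngine components.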