...

A command-line interface executable from any shell is preferable.

Identifier assignment

Implemented in openff.benchmark.utils. This can be as simple as a function that takes as input a three-letter group/company code and all molecules (with predefined conformers, if present). It then produces a mapping of identifiers to molecule objects, with identifiers of the form COM-XXXXX-YY:

  • three-letter company code (COM)

  • molecule-index (XXXXX)

  • numerical conformer-index (YY); 01, 02, 03,…

Note that with this approach, each molecule submitted in the dataset will have exactly one conformer. We would not be stacking multiple conformers into each Molecule.
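The identifier scheme above can be sketched as follows. This is a minimal illustration, not the openff.benchmark.utils implementation: the function name assign_identifiers, the nested-sequence input layout, and starting the molecule index at 00000 are all assumptions made for the example.

```python
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


def assign_identifiers(company_code: str, molecules: Sequence[Sequence[T]]) -> Dict[str, T]:
    """Map COM-XXXXX-YY identifiers to single-conformer molecule objects.

    Hypothetical helper: molecules[i] is assumed to hold one object per
    conformer of the i-th molecule (each object carrying exactly one conformer).
    """
    if len(company_code) != 3 or not company_code.isalpha():
        raise ValueError("company code must be exactly three letters")
    code = company_code.upper()

    mapping: Dict[str, T] = {}
    # Molecule index starts at 0 here (an assumption); conformer index starts at 01.
    for mol_index, conformers in enumerate(molecules):
        for conf_index, conformer in enumerate(conformers, start=1):
            mapping[f"{code}-{mol_index:05d}-{conf_index:02d}"] = conformer
    return mapping
```

Because each identifier maps to exactly one object, the conformer index is carried by the key rather than by stacking conformers inside a Molecule.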

Conformer generation

For molecules with fewer than 10 conformers predefined, additional conformers will be generated to give a total of 10. This can already be done with openforcefield.topology.Molecule.generate_conformers.

We will need the mapping from Identifier assignment after this, so it likely makes sense to switch the order of these workflow components.
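The top-up rule (generate only as many conformers as are missing to reach 10) can be sketched independently of any toolkit. The helper name top_up_conformers and the generate callback are hypothetical; in practice the callback might wrap openforcefield.topology.Molecule.generate_conformers, but that wiring is not shown here.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def top_up_conformers(
    predefined: Sequence[C],
    generate: Callable[[int], Sequence[C]],
    target: int = 10,
) -> List[C]:
    """Return the predefined conformers plus generated ones, up to `target`.

    `generate(n)` is a hypothetical callback that returns up to n new conformers.
    Predefined conformers are always kept, even if there are more than `target`.
    """
    conformers = list(predefined)
    n_missing = target - len(conformers)
    if n_missing > 0:
        # Guard against a generator that returns more than requested.
        conformers.extend(list(generate(n_missing))[:n_missing])
    return conformers
```

Keeping the predefined conformers first in the returned list means their order, and therefore any conformer index assigned afterwards, is stable.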

Remaining questions

  1. Do we care about easily distinguishing which conformers were provided vs. generated after the fact?

Parameterization of molecules

Parameterization of molecules will be performed with e.g.:

Code Block
from openforcefield.typing.engines.smirnoff import ForceField

# Load the OpenFF "Parsley" force field
forcefield = ForceField('openff-1.0.0.offxml')

# `molecule` is assumed to be an existing openforcefield.topology.Molecule
off_topology = molecule.to_topology()

# Label the topology; returns one dict of assigned parameters per molecule
molecule_labels = forcefield.label_molecules(off_topology)

The labels can then be fed directly to the Forcefield coverage report generator. An entry-point wrapping this and the coverage report can be placed in openff.benchmark.parameterization.

Forcefield coverage report

A function that takes multiple sets of molecule labels from Parameterization of molecules and generates coverage reports should go into openff.benchmark.parameterization. It will then produce a report giving the counts for each parameter in the forcefield, aggregated over the molecules provided.

Although it is possible to provide a report for each molecule, generating a single report for the whole dataset is recommended to mitigate privacy concerns about the molecules used.
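One way the aggregated report could be computed is sketched below. It assumes each set of labels is a mapping from a handler name (e.g. "Bonds") to a mapping of atom-index tuples to parameter objects exposing an id attribute, as returned per molecule by label_molecules. The function name coverage_report is hypothetical, and counting every occurrence of a parameter (rather than the number of molecules using it) is a design assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def coverage_report(
    labels_per_molecule: Iterable[Mapping[str, Mapping[Any, Any]]],
) -> Dict[str, Counter]:
    """Count parameter usage per handler, aggregated over all molecules.

    Hypothetical helper: only the aggregated counts are returned, so the
    report does not record which molecule used which parameter.
    """
    report: Dict[str, Counter] = {}
    for labels in labels_per_molecule:
        for handler_name, assignments in labels.items():
            counter = report.setdefault(handler_name, Counter())
            # Each value is a parameter object; its `id` (e.g. "b1") is the key.
            for parameter in assignments.values():
                counter[parameter.id] += 1
    return report
```

Parameters that are never assigned do not appear in the result; listing them with a count of zero would require iterating over the force field's own parameter lists as well.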

Remaining questions

  1. Should we enforce that reports be aggregated? Can we show how feasible it is to back-calculate a molecular structure from the parameters used to parameterize it?

Energy minimization with Psi4 (QM), OpenMM (MM)

...