...
We should make this process easier by curating a data set of systems, described below, together with the energy we expect for each (both the total potential energy and the per-term contributions). This would make it easier for users to validate their implementations and would take most of the error-prone details out of their hands. It also provides internal value: it can uncover errors in our own implementations and serve as a natural data set to run regression tests against. The toolkit already runs some regression tests against a non-exhaustive set of molecules in an effort to safeguard against critical bugs, but these leave many use cases uncovered (and could themselves be incorrect). The Interchange test suite has something similar; tracking reference energies from a canonical data set will reduce duplication of effort and increase the quality and reliability of these tests.
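To make the validation idea concrete, here is a minimal sketch of how a user or a regression test might compare computed energies against a curated reference, term by term. The function name `compare_energies`, the term labels, the 0.001 kJ/mol tolerance, and all numbers are assumptions for illustration only, not part of any existing toolkit or Interchange API.

```python
import math


def compare_energies(computed, reference, abs_tol=1e-3):
    """Return {term: computed - reference} for every reference term that
    disagrees by more than abs_tol (assumed kJ/mol); a term missing from
    `computed` is reported with the value None."""
    mismatches = {}
    for term, ref_value in reference.items():
        if term not in computed:
            mismatches[term] = None
            continue
        if not math.isclose(computed[term], ref_value, rel_tol=0.0, abs_tol=abs_tol):
            mismatches[term] = computed[term] - ref_value
    return mismatches


# Made-up numbers, purely illustrative; not real reference energies.
reference = {"total": 10.0, "bond": 2.0, "angle": 3.0,
             "torsion": 1.0, "vdW": 5.0, "electrostatics": -1.0}
computed = dict(reference, vdW=5.5, total=10.5)

for term, diff in compare_energies(computed, reference).items():
    print(f"{term}: off by {diff} kJ/mol")
```

Reporting per-term differences rather than a single pass/fail makes it easier to see which part of an implementation disagrees, which is the reason for tracking per-term contributions in the first place.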
Curation
...
This data set should probably be distributed as a version-controlled set of files, plus scripts to process those files, supplemented with a table describing everything in detail. I think the result could look something like this, with potentially many (tens to hundreds of) rows:
| Molecule(s) (file(s) + other descriptors such that the organic chemistry is unambiguous) | Force field | Box vectors (implied by file) | Total energy | E_bond | … | E_vdW | E_electrostatics |
|---|---|---|---|---|---|---|---|
| Single ligand: | | (Implied as none by file) | some number kJ/mol | … | … | … | … |
| Box of organic molecules in liquid phase: | | (Read from PDB file) | … | … | … | … | … |
| Protein in vacuo: | … | (Read from PDB file) | … | … | … | … | … |
| Protein in water with ions and a docked ligand: | … | (Read from PDB file) | | | | | |
Each record should specify, one way or another:
- Force field (i.e. specific file/DOI)
- Periodicity
- Periodic box vectors, if any
- Atomic coordinates
  - Specified in a file, not generated on the fly
- Bond graph/connectivity
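
The per-record requirements above could be captured in a small schema. The following is only a sketch: the class name `ReferenceRecord`, its field names, the file names, and the choice of nanometers and kJ/mol are all hypothetical, chosen for illustration rather than taken from any existing package.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Vector = Tuple[float, float, float]


@dataclass(frozen=True)
class ReferenceRecord:
    """One row of the curated table (hypothetical schema)."""

    name: str
    force_field: str       # a specific file name or DOI
    coordinate_file: str   # coordinates are read from a file, never generated
    connectivity: str      # file or descriptor that fixes the bond graph
    periodic: bool
    box_vectors_nm: Optional[Tuple[Vector, Vector, Vector]] = None
    energies_kj_mol: dict = field(default_factory=dict)  # term -> energy

    def __post_init__(self):
        # Box vectors must be present exactly when the system is periodic.
        if self.periodic and self.box_vectors_nm is None:
            raise ValueError("periodic records need box vectors")
        if not self.periodic and self.box_vectors_nm is not None:
            raise ValueError("non-periodic records must not carry box vectors")


# Placeholder file names, not real data set entries.
record = ReferenceRecord(
    name="single ligand",
    force_field="example-force-field.offxml",
    coordinate_file="ligand.sdf",
    connectivity="ligand.sdf",
    periodic=False,
)
print(record.name, record.periodic)
```

Validating periodicity against box vectors at construction time is one way to keep a record from being ambiguous about its boundary conditions.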
Each section of the SMIRNOFF spec should be covered. This includes:
- Multi-term torsions
- Improper torsions
- WBO-interpolated torsions and bonds
- AM1-BCC charges
- Library charges
- Custom charge increments
- Constrained and unconstrained force fields
- Anything specific to biopolymers?
- GBSA?
- Virtual sites?
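
One way to keep track of this coverage is to tag each record with the features it exercises and report what is still missing. This is a rough sketch: the tag strings simply mirror the list above (leaving out the items marked as open questions), and `REQUIRED_FEATURES`, `missing_features`, and the example records are hypothetical.

```python
# Feature tags mirroring the list above; items marked with "?" are left out
# because whether to include them is still an open question.
REQUIRED_FEATURES = {
    "multi-term torsions",
    "improper torsions",
    "WBO-interpolated torsions and bonds",
    "AM1-BCC charges",
    "library charges",
    "custom charge increments",
    "constrained",
    "unconstrained",
}


def missing_features(records):
    """Given {record name: set of feature tags}, return the required
    features that no record exercises."""
    covered = set()
    for features in records.values():
        covered |= set(features)
    return REQUIRED_FEATURES - covered


# Illustrative tagging only; not a statement about any real system.
example_records = {
    "single ligand": {"multi-term torsions", "AM1-BCC charges", "unconstrained"},
    "protein in vacuum": {"library charges", "improper torsions", "constrained"},
}

print(sorted(missing_features(example_records)))
```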
A reasonably broad range of chemistry should be covered:
- Single ligand(s) in vacuum
  - Molecules with charged (zwitterionic?) groups
  - Molecules that are perceived differently by different aromaticity models
  - Molecules with non-planar impropers (e.g. pyramidal nitrogens)
- Molecule dimers in vacuum
- Box of organic liquids
- Box of water?
- Protein(s) in vacuum
- Protein(s) in water or saltwater
- Ligand(s) bound to protein(s)?
- Other biomolecules
...